This chapter deals with some important examples of contrast functions on a space of density functions, such as: Bregman divergence, Kullback–Leibler relative entropy, f-divergence, Hellinger distance, Chernoff information, Jeffrey distance, Kagan divergence, and exponential contrast function. Their relation with the skewness tensor and the α-connections is also established. The goal of this chapter is to produce hands-on examples for the theoretical concepts introduced in Chap. 11.

1 A First Example

We start with a suggestive example of Bregman divergence. We show that the Kullback–Leibler relative entropy on a statistical model is a particular example of Bregman divergence.

Let \(\mathcal{S} = \mathcal{P}(\mathcal{X})\), where \(\mathcal{X} =\{ x^{1},\ldots,x^{n+1}\}\) and consider the global chart \(\phi: \mathcal{S}\rightarrow \mathbb{E} \subset \mathbb{R}^{n}\)

$$\displaystyle{\phi (p) = (\ln p_{1},\ldots,\ln p_{n}) = (\xi ^{1},\ldots,\xi ^{n}),}$$

with the parameter space

$$\displaystyle{\mathbb{E} =\{ (\xi ^{1},\ldots,\xi ^{n});\xi ^{k} < 0,\sum _{ k=1}^{n}e^{\xi ^{k}} < 1\}.}$$

The contrast function on \(\mathcal{S}\) is then given by

$$\displaystyle\begin{array}{rcl} D_{\mathcal{S}}(p\vert \vert q)& =& D\big(\phi (p)\vert \vert \phi (q)\big) {}\\ & =& D(\xi _{1}\vert \vert \xi _{2}), {}\\ \end{array}$$

where \(\xi _{1} =\phi (p)\), \(\xi _{2} =\phi (q)\), and \(D(\cdot \,\vert \vert \,\cdot )\) is the Bregman divergence on \(\mathbb{E}\) induced by the convex function \(\varphi (\xi ) =\sum _{ i=1}^{n}e^{\xi ^{i}}\), i.e.,

$$\displaystyle{D(\xi _{1}\vert \vert \xi _{2}) =\varphi (\xi _{2}) -\varphi (\xi _{1}) -\sum _{i=1}^{n}\partial _{ i}\varphi (\xi _{1})(\xi _{2}^{i} -\xi _{ 1}^{i}).}$$


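Substituting \(e^{\xi _{1}^{i}} = p_{i}\) and \(e^{\xi _{2}^{i}} = q_{i}\), we obtain

$$\displaystyle\begin{array}{rcl} D_{\mathcal{S}}(p\vert \vert q)& =& D(\xi _{1}\vert \vert \xi _{2}) {}\\ & =& \sum _{i}e^{\xi _{2}^{i} } -\sum _{i}e^{\xi _{1}^{i} } -\sum _{i}e^{\xi _{1}^{i} }(\xi _{2}^{i} -\xi _{ 1}^{i}) {}\\ & =& \sum _{i}q_{i} -\sum _{i}p_{i} -\sum _{i}p_{i}\ln \frac{q_{i}} {p_{i}} {}\\ & =& \sum _{i}p_{i}\ln \frac{p_{i}} {q_{i}} = D_{KL}(p\vert \vert q). {}\\ \end{array}$$

The last step uses \(\sum _{i}p_{i} =\sum _{i}q_{i}\), which holds when the sums run over all points of \(\mathcal{X}\).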

Hence, the induced contrast function \(D_{\mathcal{S}}\) on \(\mathcal{P}(\mathcal{X})\) in this case is the Kullback–Leibler relative entropy.
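The identity above can be checked numerically. The following is a minimal sketch, assuming the sums run over every point of \(\mathcal{X}\) so that both probability vectors add up to one; the function names `bregman_exp` and `kl` and the sample vectors are illustrative choices, not part of the text.

```python
import math

def bregman_exp(xi1, xi2):
    """Bregman divergence D(xi1 || xi2) induced by phi(xi) = sum_i exp(xi_i)."""
    phi = lambda xi: sum(math.exp(t) for t in xi)
    grad1 = [math.exp(t) for t in xi1]  # gradient of phi at xi1
    linear = sum(g * (b - a) for g, a, b in zip(grad1, xi1, xi2))
    return phi(xi2) - phi(xi1) - linear

def kl(p, q):
    """Kullback-Leibler relative entropy sum_i p_i ln(p_i / q_i)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

# Two strictly positive probability vectors (assumed sample data).
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

# Logarithmic coordinates xi = ln p, as in the chart phi.
xi1 = [math.log(a) for a in p]
xi2 = [math.log(b) for b in q]

print(bregman_exp(xi1, xi2), kl(p, q))
```

The two printed values should agree up to rounding, since the terms \(\sum q_i - \sum p_i\) cancel for normalized vectors.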

2 f-Divergence

An important class of contrast functions on statistical models was introduced by Csiszár [31, 32]. Let \(f: (0,\infty ) \rightarrow \mathbb{R}\) be a function satisfying the following conditions

  (a) f is convex;

  (b) f(1) = 0;

  (c) f″(1) = 1.

For any two probability distributions p and q, consider

$$\displaystyle{ D_{f}(p\vert \vert q) = E_{p}\Big[f\Big(\frac{q(x)} {p(x)}\Big)\Big] =\int _{\mathcal{X}}p(x)f\Big(\frac{q(x)} {p(x)}\Big)\,dx. }$$

We shall assume that the previous integral converges and we can differentiate under the integral sign.

Proposition 12.2.1

The operator \(D_{f}(\cdot \,\vert \vert \,\cdot )\) is a contrast function on the statistical model \(\mathcal{S} =\{ p_{\xi }\}\).


Proof. We check the properties of a contrast function.

  (i) Positivity: Jensen’s inequality applied to the convex function f provides

    $$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& E_{p}\Big[f\Big(\frac{q(x)} {p(x)}\Big)\Big] \geq f\Big(E_{p}\Big[\frac{q(x)} {p(x)}\Big]\Big) {}\\ & =& f\Big(\int _{\mathcal{X}}p(x)\frac{q(x)} {p(x)}\,dx\Big) = f(1) = 0. {}\\ \end{array}$$
  (ii) Non-degeneracy: Let p ≠ q. Since f is strictly convex at 1, Jensen’s inequality is strict, so

    $$\displaystyle{D_{f}(p\vert \vert q) = E_{p}\Big[f\Big(\frac{q(x)} {p(x)}\Big)\Big] > f\Big(E_{p}\Big[\frac{q(x)} {p(x)}\Big]\Big) = f(1) = 0,}$$

    and hence \(D_{f}(p\vert \vert q)\neq 0\), which proves the non-degeneracy.

  (iii) The first variation vanishes along the diagonal \(\{\xi _{1} =\xi _{2}\}\): by (i) and (ii), \(D_{f}\) attains its minimum value 0 exactly there.

  (iv) Let \(p = p_{{\xi _{ 0}}}\) and \(q = p_{{\xi }}\). We shall compute the Hessian of

    $$\displaystyle{ D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }}) =\int _{\mathcal{X}}p_{{\xi _{ 0}}}(x)f\Big( \frac{p_{{\xi }}(x)} {p_{{\xi _{ 0}}}(x)}\Big)\,dx }\tag{12.2.2}$$

    along the diagonal \(\xi _{0} =\xi\). Differentiating we have

    $$\displaystyle\begin{array}{rcl} \partial _{\xi ^{j}}f\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{1} {p_{{\xi _{ 0}}}} \partial _{\xi ^{j}}p_{{\xi }} {}\\ \partial _{\xi ^{i}}\partial _{\xi ^{j}}f\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)^{2}\partial _{\xi ^{ i}}(\ln p_{{\xi }})\partial _{\xi ^{j}}(\ln p_{{\xi }}) {}\\ & & +f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{1} {p_{{\xi _{ 0}}}}\partial _{\xi ^{i}}\partial _{\xi ^{j}}p_{{\xi }}. {}\\ \end{array}$$

    Differentiating under the integral we get

    $$\displaystyle\begin{array}{rcl} \partial _{\xi ^{i}}\partial _{\xi ^{j}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}& =& f''(1)\int p_{{\xi _{ 0}}}\partial _{\xi ^{i}}\ln p_{{\xi }}\,\partial _{\xi ^{j}}\ln p_{\xi }\,dx_{\vert \xi =\xi _{0}} {}\\ & & +f'(1)\partial _{\xi ^{i}}\partial _{\xi ^{j}}\int p_{{\xi }}(x)\,dx {}\\ & =& f''(1)E_{\xi }[\partial _{\xi ^{i}}\ell(\xi )\partial _{\xi ^{j}}\ell(\xi )] {}\\ & =& E_{\xi }[(\partial _{\xi ^{i}}\ell)(\partial _{\xi ^{j}}\ell)] = g_{ij}(\xi ), {}\\ \end{array}$$

    which is strictly positive definite, since it is the Fisher–Riemann information matrix. Hence \(D_{f}(\cdot \,\vert \vert \,\cdot )\) is a contrast function. ■


Theorem 12.2.2

The Riemannian metric induced by the contrast function \(D_{f}(\cdot \,\vert \vert \,\cdot )\) on the statistical model \(\mathcal{S} =\{ p_{{\xi }}\}\) is the Fisher–Riemann information matrix

$$\displaystyle{g_{ij}(\xi ) = \partial _{\xi ^{i}}\partial _{\xi ^{j}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}.}$$


Proof. It follows from the calculation performed in part (iv) above. ■ 
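As an illustration of the theorem, the Hessian of \(D_f\) along the diagonal can be approximated by finite differences and compared with the Fisher information. This is a minimal sketch, assuming a Bernoulli model \(p_\xi = (\xi, 1-\xi)\), whose Fisher information is \(1/(\xi(1-\xi))\); the helper names, the step size `h` and the three generators are illustrative choices.

```python
import math

def f_divergence(f, p, q):
    """Discrete analogue of D_f(p || q) = sum_x p(x) f(q(x) / p(x))."""
    return sum(a * f(b / a) for a, b in zip(p, q))

def bernoulli(xi):
    """Bernoulli model with success probability xi (assumed example model)."""
    return [xi, 1.0 - xi]

def hessian_on_diagonal(f, xi0, h=1e-4):
    """Central second difference of xi -> D_f(p_xi0 || p_xi) at xi = xi0."""
    d = lambda xi: f_divergence(f, bernoulli(xi0), bernoulli(xi))
    return (d(xi0 + h) - 2.0 * d(xi0) + d(xi0 - h)) / h**2

# Generators satisfying f(1) = 0 and f''(1) = 1.
generators = {
    "Kullback-Leibler": lambda u: -math.log(u),
    "Hellinger": lambda u: 4.0 * (1.0 - math.sqrt(u)),
    "Kagan": lambda u: 0.5 * (1.0 - u) ** 2,
}

xi0 = 0.3
fisher = 1.0 / (xi0 * (1.0 - xi0))  # Fisher information of the Bernoulli model
for name, f in generators.items():
    print(name, hessian_on_diagonal(f, xi0), fisher)
```

All three generators should return approximately the same value, in line with the statement that the induced metric does not depend on the choice of f.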

Let \(f^{{\ast}}(u) = uf\Big(\frac{1} {u}\Big)\). Since

$$\displaystyle\begin{array}{rcl} f^{{\ast}}(1)& =& f(1) = 0 {}\\ f^{{\ast\prime\prime}}(u)& =& \frac{1} {u^{3}}f''\Big(\frac{1} {u}\Big) \geq 0 {}\\ f^{{\ast\prime\prime}}(1)& =& f^{\prime\prime}(1) = 1, {}\\ \end{array}$$

then \(f^{{\ast}}\) satisfies properties (a)–(c), and hence \(D_{f^{{\ast}}}(\cdot \,\vert \vert \,\cdot )\) is a contrast function, which defines the same Riemannian metric as \(D_{f}(\cdot \,\vert \vert \,\cdot )\).

Proposition 12.2.3

The contrast function \(D_{f^{{\ast}}}(\cdot \,\vert \vert \,\cdot )\) is the dual of \(D_{f}(\cdot \,\vert \vert \,\cdot )\).


Proof. Consider the dual \(D_{f}^{{\ast}}(p\vert \vert q) = D_{f}(q\vert \vert p)\). Then we have

$$\displaystyle\begin{array}{rcl} D_{f^{{\ast}}}(p\vert \vert q)& =& \int _{\mathcal{X}}p(x)f^{{\ast}}\Big(\frac{q(x)} {p(x)}\Big)\,dx {}\\ & =& \int _{\mathcal{X}}p(x)\frac{q(x)} {p(x)}f\Big(\frac{p(x)} {q(x)}\Big)\,dx {}\\ & =& \int _{\mathcal{X}}q(x)f\Big(\frac{p(x)} {q(x)}\Big)\,dx {}\\ & =& D_{f}(q\vert \vert p) = D_{f}^{{\ast}}(p\vert \vert q),\qquad \forall p,q \in \mathcal{S}. {}\\ \end{array}$$

Therefore \(D_{f^{{\ast}}} = D_{f}^{{\ast}}\). ■ 
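The duality relation can be verified on a small discrete example. The sketch below assumes the generator \(f(u) = -\ln u\) and two arbitrary positive probability vectors; `conjugate` is a hypothetical helper name for the map \(f \mapsto f^{\ast}\).

```python
import math

def f_divergence(f, p, q):
    """Discrete analogue of D_f(p || q) = sum_x p(x) f(q(x) / p(x))."""
    return sum(a * f(b / a) for a, b in zip(p, q))

def conjugate(f):
    """Return f*(u) = u f(1/u)."""
    return lambda u: u * f(1.0 / u)

f = lambda u: -math.log(u)  # generator of D_KL(p || q)

# Assumed sample data.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

lhs = f_divergence(conjugate(f), p, q)  # D_{f*}(p || q)
rhs = f_divergence(f, q, p)             # D_f(q || p)
print(lhs, rhs)
```

Here `conjugate(f)` reduces to \(u\ln u\), so the check also reproduces the dual Kullback–Leibler divergence.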

In the following we shall find the induced connections. Let \(\nabla ^{(f)}\) be the linear connection induced by the contrast function \(D_{f}(\cdot \,\vert \vert \,\cdot )\), and denote by \(\varGamma _{ij,k}^{(f)}\) its components on a local basis.

Proposition 12.2.4

We have

$$\displaystyle{ \varGamma _{ij,k}^{(f)}(\xi ) = E_{\xi }\Big[\big(\partial _{ i}\partial _{j}\ell - (f'''(1) + 1)\partial _{i}\ell\,\partial _{j}\ell\big)\partial _{k}\ell\Big]. }\tag{12.2.3}$$


Proof. From formula (11.5.18) we find

$$\displaystyle{ \varGamma _{ij,k}^{(f)}(\xi ) = -\partial _{\xi _{ 0}^{i}}\partial _{\xi _{0}^{j}}\partial _{\xi ^{k}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}. }\tag{12.2.4}$$

We shall compute the derivatives on the right side. Differentiating in (12.2.2) yields

$$\displaystyle{ \partial _{\xi ^{k}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }}) =\int _{\mathcal{X}}f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)p_{{\xi }}\partial _{\xi ^{k}}\ell(\xi )\,dx. }\tag{12.2.5}$$

Before continuing the computation we note that

$$\displaystyle\begin{array}{rcl} \partial _{\xi _{0}^{j}}f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)\Big(\frac{-p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)\partial _{\xi _{0}^{j}}\ell(\xi _{0}) {}\\ & & {}\\ \partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f'''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{p_{{\xi }}^{2}} {p_{{\xi _{ 0}}}^{2}}\partial _{\xi _{0}^{i}}\ell(\xi _{0})\partial _{\xi _{0}^{j}}\ell(\xi _{0}) {}\\ & & +f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \partial _{\xi _{0}^{i}}\ell(\xi _{0})\partial _{\xi _{0}^{j}}\ell(\xi _{0}) {}\\ & & -f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}\ell(\xi _{0}). {}\\ \end{array}$$

Applying now \(\partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}\) to (12.2.5), using the foregoing formulas, and taking \(\xi _{0} =\xi\), yields

$$\displaystyle\begin{array}{rcl} \partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}\partial _{\xi ^{k}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}& =& \int _{\mathcal{X}}\Big[(f'''(1) + f''(1))(\partial _{\xi ^{i}}\ell)(\partial _{\xi ^{j}}\ell)(\partial _{\xi ^{k}}\ell) {}\\ & & -f''(1)(\partial _{\xi ^{i}}\partial _{\xi ^{j}}\ell)(\partial _{\xi ^{k}}\ell)\Big]p_{{\xi }}\,dx {}\\ & =& -E_{\xi }\Big[\big(\partial _{\xi ^{i}}\partial _{\xi ^{j}}\ell - (f'''(1) + 1)\partial _{\xi ^{i}}\ell\,\partial _{\xi ^{j}}\ell\big)\partial _{\xi ^{k}}\ell\Big]. {}\\ \end{array}$$

Applying (12.2.4) we arrive at (12.2.3). ■ 

The relation with the geometry of α-connections is given below.

Theorem 12.2.5

The connection induced by \(D_{f}(\cdot \,\vert \vert \,\cdot )\) is an α-connection

$$\displaystyle{\nabla ^{(f)} = \nabla ^{(\alpha )},}$$

with \(\alpha = 2f'''(1) + 3\) .


Proof. It suffices to show the identity in local coordinates. Recall first the components of the α-connection given by (1.11.34)

$$\displaystyle{ \varGamma _{ij,k}^{(\alpha )} = E_{\xi }\Big[\Big(\partial _{ i}\partial _{j}\ell + \frac{1-\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell\Big]. }\tag{12.2.6}$$

Comparing with (12.2.3) we see that \(\varGamma _{ij,k}^{(f)} =\varGamma _{ij,k}^{(\alpha )}\) if and only if \(\alpha = 2f'''(1) + 3\). ■ 

We make the remark that \(\nabla ^{(f^{{\ast}}) } = \nabla ^{(-\alpha )}\), which follows from the properties of dual connections induced by contrast functions. We shall show shortly that for any α there is a function f satisfying (a)–(c) and solving the equation \(\alpha = 2f'''(1) + 3\).

Proposition 12.2.6

The skewness tensor induced by the contrast function \(D_{f}(\cdot \,\vert \vert \,\cdot )\) is given in local coordinates by

$$\displaystyle{T_{ijk}^{(f)} = (2f'''(1) + 3)E_{\xi }[(\partial _{ i}\ell)(\partial _{j}\ell)(\partial _{k}\ell)].}$$


Proof. Using Theorem 12.2.5, formula (12.2.6) and the aforementioned remarks, we have

$$\displaystyle\begin{array}{rcl} T_{ijk}^{(f)}& =& \varGamma _{ ijk}^{(f^{{\ast}}) } -\varGamma _{ijk}^{(f)} =\varGamma _{ ijk}^{(-\alpha )} -\varGamma _{ ijk}^{(\alpha )} {}\\ & =& E_{\xi }\Big[\Big(\partial _{i}\partial _{j}\ell + \frac{1+\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell\Big] {}\\ & & -E_{\xi }\Big[\Big(\partial _{i}\partial _{j}\ell + \frac{1-\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell\Big] {}\\ & =& \alpha E_{\xi }[(\partial _{i}\ell)(\partial _{j}\ell)(\partial _{k}\ell)] {}\\ & =& (2f'''(1) + 3)E_{\xi }[(\partial _{i}\ell)(\partial _{j}\ell)(\partial _{k}\ell)]. {}\\ \end{array}$$


3 Particular Cases

This section presents a few classical examples of contrast functions as particular examples of \(D_{f}(\cdot \,\vert \vert \,\cdot )\). These are constructed by choosing several examples of functions f that satisfy conditions (a)–(c) and verify the equation \(\alpha = 2f'''(1) + 3\). We make the remark that if f is such a function, then \(f_{c}(u) = f(u) + c(u - 1)\), \(c \in \mathbb{R}\), is also a function that induces the same contrast function, \(D_{f_{c}} = D_{f}\). Therefore, the correspondence between functions f and contrast functions is not one-to-one.

3.1 Hellinger Distance

Consider \(f(u) = 4(1 -\sqrt{u})\) and the associated contrast function

$$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& 4\int _{\mathcal{X}}p(x)\Big(1 -\sqrt{\frac{q(x)} {p(x)}}\Big)dx = 4\Big(1 -\int _{\mathcal{X}}\sqrt{p(x)q(x)}\,dx\Big) {}\\ & =& 2\Big(2 -\int _{\mathcal{X}}2\sqrt{p(x)q(x)}\,dx\Big) {}\\ & =& 2\int _{\mathcal{X}}\Big(p(x) - 2\sqrt{p(x)q(x)} + q(x)\Big)\,dx {}\\ & =& 2\int _{\mathcal{X}}\big(\sqrt{p(x)} -\sqrt{q(x)}\big)^{2}\,dx {}\\ & =& H^{2}(p,q). {}\\ \end{array}$$

\(H(p,q)\) is called the Hellinger distance, and is a true distance on the statistical model \(\mathcal{S} =\{ p_{{\xi }}\}\). Since in this case \(\alpha = 2f'''(1) + 3 = 0\), the linear connection induced by \(H^{2}(p,q)\) is exactly the Levi–Civita connection, \(\nabla ^{(0)}\), on the Riemannian manifold \((\mathcal{S},g)\).

Example 12.3.1

Consider two exponential distributions, \(p(x) =\alpha e^{-\alpha x}\) and \(q(x) =\beta e^{-\beta x}\), x ≥ 0, α, β > 0. Then

$$\displaystyle\begin{array}{rcl} H^{2}(p,q)& =& 4 - 4\int _{ 0}^{\infty }\sqrt{p(x)q(x)}\,dx {}\\ & =& 4 - 4\sqrt{\alpha \beta }\int _{0}^{\infty }e^{-\frac{\alpha +\beta } {2} x}\,dx {}\\ & =& 4 -\frac{8\sqrt{\alpha \beta }} {\alpha +\beta }, {}\\ \end{array}$$

hence the Hellinger distance is \(H(p,q) = 2\sqrt{1 - \frac{2\sqrt{ \alpha \beta } } {\alpha +\beta }}\).

The Hellinger distance can also be defined between two discrete distributions \(p = (p_{k})\) and \(q = (q_{k})\), replacing the integral by a sum

$$\displaystyle{H(p,q) = 2\Big(1 -\sum _{k\geq 0}\sqrt{p_{k } q_{k}}\Big)^{1/2} =\Big (2\sum _{ k\geq 0}\big(\sqrt{p_{k}} -\sqrt{q_{k}}\big)^{2}\Big)^{1/2}.}$$

Example 12.3.2

Consider two Poisson distributions, \(p_{k} = \frac{\alpha ^{k}} {k!}e^{-\alpha }\) and \(q_{k} = \frac{\beta ^{k}} {k!}e^{-\beta }\), k ≥ 0. Then

$$\displaystyle\begin{array}{rcl} \sum _{k\geq 0}\sqrt{p_{ k}q_{k}}& =& \sum _{k\geq 0}\frac{(\sqrt{\alpha \beta })^{k}} {k!} e^{-\frac{\alpha +\beta } {2} } {}\\ & =& e^{-\frac{\alpha +\beta } {2} }e^{\sqrt{\alpha \beta }}\sum _{k\geq 0}\frac{(\sqrt{\alpha \beta })^{k}} {k!} e^{-\sqrt{\alpha \beta }} {}\\ & =& e^{\sqrt{\alpha \beta }-\frac{\alpha +\beta } {2} }. {}\\ \end{array}$$

Hence, the Hellinger distance becomes

$$\displaystyle{H(p,q) = 2\Big(1 -\sum _{k\geq 0}\sqrt{p_{k } q_{k}}\Big)^{1/2} = 2\sqrt{1 - e^{\sqrt{ \alpha \beta } -\frac{\alpha +\beta } {2} }}.}$$
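The closed form for the Poisson case can be compared with a direct evaluation of the series. The following is a minimal sketch, assuming the rates 2 and 5 and a truncation of the series at 200 terms; `a` and `b` stand for the parameters written α and β in the example, and all function names are illustrative.

```python
import math

def poisson_pmf(lam, k):
    """Poisson probability lam^k e^{-lam} / k!, computed in log space."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def hellinger_poisson_numeric(a, b, terms=200):
    """H(p, q) = 2 (1 - sum_k sqrt(p_k q_k))^{1/2} with a truncated series."""
    bc = sum(math.sqrt(poisson_pmf(a, k) * poisson_pmf(b, k))
             for k in range(terms))
    return 2.0 * math.sqrt(max(0.0, 1.0 - bc))

def hellinger_poisson_closed(a, b):
    """Closed form 2 sqrt(1 - exp(sqrt(ab) - (a + b)/2))."""
    return 2.0 * math.sqrt(1.0 - math.exp(math.sqrt(a * b) - 0.5 * (a + b)))

a, b = 2.0, 5.0  # assumed sample rates
print(hellinger_poisson_numeric(a, b), hellinger_poisson_closed(a, b))
```

The truncation level is an assumption that is adequate for moderate rates; larger rates would need more terms.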

3.2 Kullback–Leibler Relative Entropy

The contrast function associated with function \(f(u) = -\ln u\) is given by

$$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& \int _{\mathcal{X}}p(x)\ln \frac{p(x)} {q(x)}\,dx = D_{KL}(p\vert \vert q), {}\\ \end{array}$$

which is the Kullback–Leibler information or the relative entropy. In this case \(\alpha = 2f'''(1) + 3 = -1\), so the associated connection is \(\nabla ^{(-1)}\).

It is worth noting that the convex function \(f(u) = u\ln u\) induces the contrast function

$$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& \int _{\mathcal{X}}q(x)\ln \frac{q(x)} {p(x)}\,dx = D_{KL}(q\vert \vert p) = D_{KL}^{{\ast}}(p\vert \vert q), {}\\ \end{array}$$

which is the dual of the Kullback–Leibler information, see [51, 53]. Since \(\alpha = 2f'''(1) + 3 = 1\), the induced connection is \(\nabla ^{(1)}\).

3.3 Chernoff Information of Order α

The convex function

$$\displaystyle{f^{(\alpha )}(u) = \frac{4} {1 -\alpha ^{2}}(1 - u^{\frac{1+\alpha } {2} }),\qquad \alpha \not = \pm 1}$$

induces the contrast function

$$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 -\int _{\mathcal{X}}p(x)^{\frac{1-\alpha } {2} }q(x)^{\frac{1+\alpha } {2} }\,dx\Big\},}$$

see Chernoff [27]. For the computation of \(D^{(\alpha )}\) in the case of exponential, normal and Poisson distributions, see Problems 12.9, 12.10 and 12.11. We note that for α = 0 we retrieve the squared Hellinger distance, \(D^{(0)}(p\vert \vert q) = H^{2}(p,q)\).
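A discrete version of \(D^{(\alpha)}\) makes the case α = 0 easy to check. The sketch below assumes two arbitrary positive probability vectors and replaces the integral by a sum. It also probes the limit α → −1, where \(D^{(\alpha)}\) approaches \(D_{KL}(p\vert\vert q)\); that limit is a standard fact added here for illustration and is not stated in the text above.

```python
import math

def chernoff(p, q, alpha):
    """Discrete Chernoff information of order alpha (alpha != +1, -1)."""
    s = sum(a ** ((1.0 - alpha) / 2.0) * b ** ((1.0 + alpha) / 2.0)
            for a, b in zip(p, q))
    return 4.0 / (1.0 - alpha**2) * (1.0 - s)

def hellinger_squared(p, q):
    """H^2(p, q) = 2 sum_k (sqrt(p_k) - sqrt(q_k))^2."""
    return 2.0 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))

def kl(p, q):
    """Kullback-Leibler relative entropy sum_k p_k ln(p_k / q_k)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

# Assumed sample data.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

print(chernoff(p, q, 0.0), hellinger_squared(p, q))   # alpha = 0
print(chernoff(p, q, -1.0 + 1e-6), kl(p, q))          # alpha close to -1
```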

3.4 Jeffrey Distance

The function \(f(u) = \frac{1} {2}(u - 1)\ln u\) induces the contrast function

$$\displaystyle\begin{array}{rcl} J(p,q)& =& D_{f}(p\vert \vert q) = \frac{1} {2}\int _{\mathcal{X}}p(x)\Big(1 -\frac{q(x)} {p(x)}\Big)\ln \frac{p(x)} {q(x)}\,dx {}\\ & =& \frac{1} {2}\int _{\mathcal{X}}\big(p(x) - q(x)\big)\big(\ln p(x) -\ln q(x)\big)\,dx, {}\\ \end{array}$$

see Jeffrey [47]. A computation shows that α = 0, so the induced connection is the Levi–Civita connection \(\nabla ^{(0)}\). In fact, the Jeffrey contrast function is the same as the symmetric Kullback–Leibler relative entropy

$$\displaystyle\begin{array}{rcl} J(p,q)& =& \frac{1} {2}\int _{\mathcal{X}}p(x)\Big(1 -\frac{q(x)} {p(x)}\Big)\ln \frac{p(x)} {q(x)}\,dx {}\\ & =& \frac{1} {2}\int _{\mathcal{X}}p(x)\ln \frac{p(x)} {q(x)}\,dx + \frac{1} {2}\int _{\mathcal{X}}q(x)\ln \frac{q(x)} {p(x)}\,dx {}\\ & =& \frac{1} {2}\Big(D_{KL}(p\vert \vert q) + D_{KL}(q\vert \vert p)\Big). {}\\ \end{array}$$
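The three expressions for the Jeffrey distance can be compared on a discrete example. This is a minimal sketch, assuming two arbitrary positive probability vectors, with sums in place of integrals; the function names are illustrative.

```python
import math

def kl(p, q):
    """Kullback-Leibler relative entropy sum_k p_k ln(p_k / q_k)."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))

def jeffrey(p, q):
    """J(p, q) = (1/2) sum_k (p_k - q_k)(ln p_k - ln q_k)."""
    return 0.5 * sum((a - b) * (math.log(a) - math.log(b))
                     for a, b in zip(p, q))

def f_divergence(f, p, q):
    """Discrete analogue of D_f(p || q) = sum_x p(x) f(q(x) / p(x))."""
    return sum(a * f(b / a) for a, b in zip(p, q))

f_jeffrey = lambda u: 0.5 * (u - 1.0) * math.log(u)  # generator from the text

# Assumed sample data.
p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

direct = jeffrey(p, q)
via_generator = f_divergence(f_jeffrey, p, q)
symmetric_kl = 0.5 * (kl(p, q) + kl(q, p))
print(direct, via_generator, symmetric_kl)
```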

3.5 Kagan Divergence

Choosing \(f(u) = \frac{1} {2}(1 - u)^{2}\) yields

$$\displaystyle\begin{array}{rcl} D_{\chi ^{2}}(p\vert \vert q)& =& D_{f}(p\vert \vert q) = \frac{1} {2}\int _{\mathcal{X}}p(x)\Big(1 -\frac{q(x)} {p(x)}\Big)^{2}\,dx {}\\ & =& \frac{1} {2}\int _{\mathcal{X}}\frac{(p(x) - q(x))^{2}} {p(x)} \,dx, {}\\ \end{array}$$

called the Kagan contrast function, see Kagan [48]. In this case \(\alpha = 2f'''(1) + 3 = 3\), and therefore the induced connection is \(\nabla ^{(3)}\). It is worth noting the relation with the minimum chi-squared estimation in the discrete case, see Kass and Vos [49], p. 243. In this case the Kagan divergence becomes

$$\displaystyle{D_{\chi ^{2}}(p,q) = \frac{1} {2}\sum _{i=1}^{n}\frac{(p_{i} - q_{i})^{2}} {p_{i}}.}$$

3.6 Exponential Contrast Function

The contrast function associated with the convex function \(f(u) = \frac{1} {2}(\ln u)^{2}\) is

$$\displaystyle{\mathcal{E}(p\vert \vert q) = D_{f}(p\vert \vert q) = \frac{1} {2}\int _{\mathcal{X}}p(x)\big(\ln p(x) -\ln q(x)\big)^{2}\,dx.}$$

The induced connection in this case is \(\nabla ^{(-3)}\).
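The values of α quoted in this section can be recovered numerically from \(\alpha = 2f'''(1) + 3\). The sketch below assumes a central finite-difference approximation of the third derivative with step `h`, so the results are approximate; the dictionary keys and function names are illustrative labels.

```python
import math

def third_derivative_at_one(f, h=1e-2):
    """Central finite-difference approximation of f'''(1)."""
    return (f(1 + 2 * h) - 2 * f(1 + h) + 2 * f(1 - h) - f(1 - 2 * h)) / (2 * h**3)

def alpha_of(f):
    """alpha = 2 f'''(1) + 3 for a generator f with f(1) = 0, f''(1) = 1."""
    return 2.0 * third_derivative_at_one(f) + 3.0

# Generators taken from the subsections above.
generators = {
    "Hellinger": lambda u: 4.0 * (1.0 - math.sqrt(u)),
    "Kullback-Leibler": lambda u: -math.log(u),
    "dual Kullback-Leibler": lambda u: u * math.log(u),
    "Jeffrey": lambda u: 0.5 * (u - 1.0) * math.log(u),
    "Kagan": lambda u: 0.5 * (1.0 - u) ** 2,
    "exponential": lambda u: 0.5 * math.log(u) ** 2,
}

for name, f in generators.items():
    print(f"{name}: alpha is approximately {alpha_of(f):.3f}")
```

Up to discretization error, the printed values should match 0, −1, 1, 0, 3 and −3, respectively.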

We note that all function candidates of the form \(f(u) = K(\ln u)^{2k}\) are convex, but the condition f″(1) = 1 is verified only for k = 1, 2 (with appropriate constants K).

3.7 Product Contrast Function with (α, β)-Index

The following 2-parameter family of contrast functions is introduced and studied in Eguchi [40]

$$\displaystyle{D_{\alpha,\beta }(p\vert \vert q) = \frac{2} {(1-\alpha )(1-\beta )}\int \Big\{1 -\Big (\frac{p(x)} {q(x)}\Big)^{\frac{1-\alpha } {2} }\Big\}\Big\{1 -\Big (\frac{p(x)} {q(x)}\Big)^{\frac{1-\beta } {2} }\Big\}\,dx,}$$

and is induced by the function

$$\displaystyle{f_{\alpha,\beta }(u) = \frac{2} {(1-\alpha )(1-\beta )}(1 - u^{\frac{1-\alpha } {2} })(1 - u^{\frac{1-\beta } {2} }).}$$

This connects to the previous contrast functions, see Problem 12.3.

It is worth noting that the contrast function \(D_{\alpha,\beta }(\cdot \,\vert \vert \,\cdot )\) can be written as a combination of Chernoff informations with coefficients that sum to one, see Problem 12.3, part (e).

We end this section with a few suggestive examples. The computations are left as exercises to the reader.

Example 12.3.1

Consider the statistical model \(\mathcal{S} =\{ p_{\mu };\mu \in \mathbb{R}^{k}\}\), where

$$\displaystyle{p_{\mu }(x) = (2\pi )^{-k/2}e^{-\frac{\|x-\mu \|^{2}} {2} },\qquad x \in \mathbb{R}^{k}}$$

is a k-dimensional Gaussian density with σ = 1. Problem 12.4 provides exact formulas for the aforementioned contrast functions in terms of the Euclidean distance \(\|\cdot \|\).

Example 12.3.2 (Exponential Model)

Let \(\mathcal{S} =\{ p_{\xi }\}\), where

$$\displaystyle{p_{\xi }(x) =\xi e^{-\xi x},\qquad \xi > 0,x > 0.}$$

A computation shows

$$\displaystyle\begin{array}{rcl} D_{KL}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{\xi '} {\xi } -\ln \frac{\xi '} {\xi }- 1 {}\\ J(p_{\xi },p_{\xi '})& =& \frac{(\xi '-\xi )^{2}} {2\xi \xi '} {}\\ H^{2}(p_{\xi },p_{\xi '})& =& \frac{4(\sqrt{\xi }-\sqrt{\xi '})^{2}} {\xi +\xi '} {}\\ D^{(\alpha )}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{4} {1 -\alpha ^{2}}\Bigg\{1 - \frac{\xi ^{\frac{1-\alpha } {2} }\xi '^{ \frac{1+\alpha } {2} }} { \frac{1+\alpha } {2} \xi ' + \frac{1-\alpha } {2} \xi }\Bigg\} {}\\ & & {}\\ D_{\chi ^{2}}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{1} {2}\Bigg[ \frac{1} {\Big(2 -\frac{\xi }{\xi '}\Big)\frac{\xi }{\xi '}} - 1\Bigg] {}\\ \mathcal{E}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{1} {2}\Big\{\Big(\frac{\xi '} {\xi } -\ln \frac{\xi '} {\xi }- 1\Big)^{2} +\Big (\frac{\xi '} {\xi } - 1\Big)^{2}\Big\}. {}\\ \end{array}$$

It is worth noting that all these contrast functions provide the same Riemannian metric on \(\mathcal{S}\) given by \(g_{11} = \frac{1} {\xi ^{2}}\), which is the Fisher information. The induced distance between \(p_{\xi }\) and \(p_{\xi '}\) is a hyperbolic distance, i.e., \(dist(p_{\xi },p_{\xi '}) = \vert \ln \frac{\xi }{\xi '}\vert \).
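Several of the closed forms for the exponential model can be cross-checked by numerical integration. The following is a minimal sketch, assuming the parameters ξ = 1 and ξ′ = 1.5, a truncation of the integrals to a finite interval, and a composite Simpson rule; the Kagan integrand is taken directly from the definition \(\int p\,f(q/p)\,dx\) with \(f(u) = \frac{1}{2}(1-u)^{2}\), and all names are illustrative.

```python
import math

def simpson(g, a, b, n=20000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0

xi, xi2 = 1.0, 1.5             # assumed sample parameters (xi and xi')
upper = 60.0 / min(xi, xi2)    # the tail beyond this point is negligible
alpha = 0.5                    # assumed order for the Chernoff information

p = lambda x: xi * math.exp(-xi * x)
q = lambda x: xi2 * math.exp(-xi2 * x)
log_ratio = lambda x: math.log(xi / xi2) + (xi2 - xi) * x  # ln p(x) - ln q(x)

numeric = {
    "KL": simpson(lambda x: p(x) * log_ratio(x), 0.0, upper),
    "J": simpson(lambda x: 0.5 * (p(x) - q(x)) * log_ratio(x), 0.0, upper),
    "H2": simpson(lambda x: 2.0 * (math.sqrt(p(x)) - math.sqrt(q(x))) ** 2, 0.0, upper),
    "Chernoff": 4.0 / (1.0 - alpha**2) * (1.0 - simpson(
        lambda x: p(x) ** ((1 - alpha) / 2) * q(x) ** ((1 + alpha) / 2), 0.0, upper)),
    "Kagan": simpson(lambda x: 0.5 * p(x) * (1.0 - q(x) / p(x)) ** 2, 0.0, upper),
}

r = xi2 / xi
closed = {
    "KL": r - math.log(r) - 1.0,
    "J": (xi2 - xi) ** 2 / (2.0 * xi * xi2),
    "H2": 4.0 * (math.sqrt(xi) - math.sqrt(xi2)) ** 2 / (xi + xi2),
    "Chernoff": 4.0 / (1.0 - alpha**2) * (1.0 - xi ** ((1 - alpha) / 2)
        * xi2 ** ((1 + alpha) / 2) / ((1 + alpha) / 2 * xi2 + (1 - alpha) / 2 * xi)),
    "Kagan": 0.5 * (1.0 / ((2.0 - xi / xi2) * (xi / xi2)) - 1.0),
}

for name in numeric:
    print(name, numeric[name], closed[name])
```

The Kagan integral converges only when 2ξ′ > ξ, which holds for the sample parameters chosen here.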

4 Problems

  12.1. Consider the exponential family

    $$\displaystyle{p(x;\xi ) = e^{C(x)+\xi ^{i}F_{ i}(x)-\psi (\xi )},\quad i = 1,\cdots \,,n,}$$

    with \(\psi (\xi )\) a convex function, and define

    $$\displaystyle{D(\xi _{0}\vert \vert \xi ) =\psi (\xi ) -\psi (\xi _{0}) -\langle \partial \psi (\xi _{0}),\xi -\xi _{0}\rangle.}$$
    (a) Prove that \(D(\cdot \,\vert \vert \,\cdot )\) is a contrast function;

    (b) Find the dual contrast function \(D^{{\ast}}(\cdot \,\vert \vert \,\cdot )\);

    (c) Prove that the Riemann metric induced by the contrast function \(D(\cdot \,\vert \vert \,\cdot )\) is the Fisher–Riemann metric of the exponential family. Find a formula for it using the function \(\psi (\xi )\);

    (d) Find the components of the dual connections \(\nabla ^{(D)}\) and \(\nabla ^{(D^{{\ast}}) }\) induced by the contrast function \(D(\cdot \,\vert \vert \,\cdot )\);

    (e) Show that the skewness tensor induced by the contrast function \(D(\cdot \,\vert \vert \,\cdot )\) is \(T_{ijk}(\xi ) = \partial _{i}\partial _{j}\partial _{k}\psi (\xi )\).

  12.2. Prove that the Hellinger distance

    $$\displaystyle{H(p,q) = \sqrt{2\int _{\mathcal{X} } (\sqrt{p(x)} - \sqrt{q(x)} )^{2 } \,dx}}$$

    satisfies the distance axioms.

  12.3. Consider the Eguchi contrast function

    $$\displaystyle{D_{\alpha,\beta }(p\vert \vert q) = \frac{2} {(1-\alpha )(1-\beta )}\int \Big\{1 -\Big (\frac{p} {q}\Big)^{\frac{1-\alpha } {2} }\Big\}\Big\{1 -\Big (\frac{p} {q}\Big)^{\frac{1-\beta } {2} }\Big\}\,dx.}$$

    Let \(H(\cdot,\cdot )\), \(D^{(\alpha )}(\cdot \vert \vert \cdot )\), \(J(\cdot,\cdot )\), \(\mathcal{E}(\cdot \vert \vert \cdot )\) be the Hellinger distance, the Chernoff information of order α, the Jeffrey distance, and the exponential contrast function, respectively. Prove the following relations:

    $$\displaystyle\begin{array}{rcl} & & (a)\quad D_{0,0}(p\vert \vert q) = H^{2}(p,q) {}\\ & & (b)\quad D_{-\alpha,\alpha }(p\vert \vert q) = \frac{1} {2}\big(D^{(\alpha )}(p\vert \vert q) + D^{(-\alpha )}(p\vert \vert q)\big) {}\\ & & (c)\quad \lim _{\alpha \rightarrow 1}D_{-\alpha,\alpha }(p\vert \vert q) = J(p,q) {}\\ & & (d)\quad \lim _{\alpha \rightarrow 1}D_{\alpha,\alpha }(p\vert \vert q) = \mathcal{E}(p\vert \vert q) {}\\ & & (e)\quad D_{\alpha,\beta }(p\vert \vert q) =\lambda _{1}D^{(-\alpha )} +\lambda _{ 2}D^{(-\beta )} +\lambda _{ 3}D^{(1-\alpha -\beta )}, {}\\ \end{array}$$


    where

    $$\displaystyle{ \lambda _{1} = \frac{1+\alpha } {2(1-\beta )},\;\lambda _{2} = \frac{1+\beta } {2(1-\alpha )},\;\lambda _{3} = -\frac{(\alpha +\beta )(2 -\alpha -\beta )} {2(1-\alpha )(1-\beta )}, }$$

    and show that \(\lambda _{1} +\lambda _{2} +\lambda _{3} = 1\).

  12.4. Consider the statistical model defined by the k-dimensional Gaussian family, \(\mathcal{S} =\{ p_{\mu };\mu \in \mathbb{R}^{k}\}\),

    $$\displaystyle{p_{\mu }(x) = (2\pi )^{-k/2}e^{-\frac{\|x-\mu \|^{2}} {2} },\qquad x \in \mathbb{R}^{k}.}$$

    Prove the following relations:

    $$\displaystyle\begin{array}{rcl} & & (a)\quad D_{KL}(p_{\mu }\vert \vert p_{\mu '}) = \frac{1} {2}\|\mu -\mu '\|^{2} {}\\ & & (b)\quad J(p_{\mu },p_{\mu '}) = \frac{1} {2}\|\mu -\mu '\|^{2} {}\\ & & (c)\quad H^{2}(p_{\mu },p_{\mu '}) = 4\Big[1 - e^{-\frac{\|\mu -\mu '\|^{2}} {8} }\Big] {}\\ & & (d)\quad D^{(\alpha )}(p_{\mu }\vert \vert p_{\mu '}) = \frac{4} {1 -\alpha ^{2}}\Big[1 - e^{-\frac{1-\alpha ^{2}} {8} \|\mu -\mu '\|^{2} }\Big] {}\\ & & (e)\quad \mathcal{E}(p_{\mu }\vert \vert p_{\mu '}) = \frac{1} {2}\|\mu -\mu '\|^{2}\Big[1 + \frac{1} {4}\|\mu -\mu '\|^{2}\Big], {}\\ \end{array}$$

    where \(\|\cdot \|\) denotes the Euclidean norm on \(\mathbb{R}^{k}\).

  12.5. Let \(D_{f}(\cdot \,\vert \vert \,\cdot )\) be the f-divergence. Prove the following convexity property

    $$\displaystyle\begin{array}{rcl} D_{f}\Big(\lambda p_{1}+(1-\lambda )p_{2}\vert \vert \lambda q_{1}+(1-\lambda )q_{2}\Big)& \leq &\lambda D_{f}(p_{1}\vert \vert q_{1}) {}\\ & & +(1-\lambda )D_{f}(p_{2}\vert \vert q_{2}), {}\\ \end{array}$$

    \(\forall \lambda \in [0,1]\) and \(p_{1}\), \(p_{2}\), \(q_{1}\), \(q_{2}\) distribution functions.

  12.6. Prove the formulas for the contrast functions in the case of the exponential distribution presented in Example 12.3.2.

  12.7. Consider the normal distributions \(p(x) = \frac{1} {\sqrt{2\pi }\sigma _{1}} e^{-\frac{(x-\mu _{1})^{2}} {2\sigma _{1}^{2}} }\) and \(q(x) = \frac{1} {\sqrt{2\pi }\sigma _{2}} e^{-\frac{(x-\mu _{2})^{2}} {2\sigma _{2}^{2}} }\).

    (a) Show that

      $$\displaystyle{\int _{-\infty }^{\infty }\sqrt{p(x)q(x)}\,dx = \sqrt{ \frac{2\sigma _{1 } \sigma _{2 } } {\sigma _{1}^{2} +\sigma _{ 2}^{2}}}e^{A - B},}$$

      where

      $$\displaystyle{A = \frac{\Big( \frac{\mu _{1}} {2\sigma _{1}^{2}} + \frac{\mu _{2}} {2\sigma _{2}^{2}} \Big)^{2}} { \frac{1} {\sigma _{1}^{2}} + \frac{1} {\sigma _{2}^{2}} },\quad B = \frac{\mu _{1}^{2}} {4\sigma _{1}^{2}} + \frac{\mu _{2}^{2}} {4\sigma _{2}^{2}}.}$$

    (b) Find the Hellinger distance \(H(p,q)\).

  12.8. Find the Hellinger distance between two gamma distributions.

  12.9. Consider two exponential distributions, \(p(x) = ae^{-ax}\) and \(q(x) = be^{-bx}\), x ≥ 0. Show that the Chernoff information of order α is

    $$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 - \frac{2a^{\frac{1-\alpha } {2} }b^{\frac{1+\alpha } {2} }} {a(1-\alpha ) + b(1+\alpha )}\Big\},\quad \alpha \not = \pm 1.}$$

  12.10. Consider the normal distributions \(p(x) = \frac{1} {\sqrt{2\pi }\sigma _{1}} e^{-\frac{(x-\mu _{1})^{2}} {2\sigma _{1}^{2}} }\) and \(q(x) = \frac{1} {\sqrt{2\pi }\sigma _{2}} e^{-\frac{(x-\mu _{2})^{2}} {2\sigma _{2}^{2}} }\). Show that the Chernoff information of order α is

    $$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 - A\sqrt{ \frac{\pi } {a}}e^{\frac{b^{2}} {4a}-c}\Big\},\qquad \vert \alpha \vert < 1,}$$

    where

    $$\displaystyle\begin{array}{rcl} A& =& \frac{1} {\sqrt{2\pi }\,\sigma _{1}^{\frac{1-\alpha } {2} }\sigma _{2}^{\frac{1+\alpha } {2} }} {}\\ a& =& \frac{1-\alpha } {4\sigma _{1}^{2}} + \frac{1+\alpha } {4\sigma _{2}^{2}} {}\\ b& =& \frac{\mu _{1}(1-\alpha )} {2\sigma _{1}^{2}} + \frac{\mu _{2}(1+\alpha )} {2\sigma _{2}^{2}} {}\\ c& =& \frac{\mu _{1}^{2}(1-\alpha )} {4\sigma _{1}^{2}} + \frac{\mu _{2}^{2}(1+\alpha )} {4\sigma _{2}^{2}}. {}\\ \end{array}$$
  12.11. The Chernoff information of order α for discrete distributions \((p_{n})\) and \((q_{n})\) is given by

    $$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 -\sum _{n\geq 0}p_{n}^{\frac{1-\alpha } {2} }q_{n}^{\frac{1+\alpha } {2} }\Big\}.}$$

    Let \(p_{n} = \frac{\lambda _{1}^{n}} {n!} e^{-\lambda _{1}}\) and \(q_{n} = \frac{\lambda _{2}^{n}} {n!} e^{-\lambda _{2}}\) be two Poisson distributions.

    (a) Show that

      $$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1-\alpha ^{2}}\Big\{1-e^{\lambda _{1}^{(1-\alpha )/2}\lambda _{ 2}^{(1+\alpha )/2}-\lambda _{ 1}(1-\alpha )/2-\lambda _{2}(1+\alpha )/2}\Big\}.}$$

    (b) Show that the square of the Hellinger distance is given by

      $$\displaystyle{H^{2}(p,q) = 4\{1 - e^{\sqrt{\lambda _{1 } \lambda _{2}} -\frac{\lambda _{1}+\lambda _{2}} {2} }\}.}$$