
This chapter deals with some important examples of contrast functions on a space of density functions: the Bregman divergence, the Kullback–Leibler relative entropy, the f-divergence, the Hellinger distance, the Chernoff information, the Jeffrey distance, the Kagan divergence, and the exponential contrast function. The relation to the skewness tensor and the α-connections is also worked out. The goal of this chapter is to provide hands-on examples for the theoretical concepts introduced in Chap. 11.

1 A First Example

We start with a suggestive example, showing that the Kullback–Leibler relative entropy on a statistical model is a particular case of a Bregman divergence.

Let \(\mathcal{S} = \mathcal{P}(\mathcal{X})\), where \(\mathcal{X} =\{ x^{1},\ldots,x^{n+1}\}\) and consider the global chart \(\phi: \mathcal{S}\rightarrow \mathbb{E} \subset \mathbb{R}^{n}\)

$$\displaystyle{\phi (p) = (\ln p_{1},\ldots,\ln p_{n}) = (\xi ^{1},\ldots,\xi ^{n}),}$$

with the parameter space

$$\displaystyle{\mathbb{E} =\{ (\xi ^{1},\ldots,\xi ^{n});\ \sum _{k=1}^{n}e^{\xi ^{k}} < 1\},}$$

which corresponds to \(p_{k} = e^{\xi ^{k}} > 0\) and \(p_{n+1} = 1 -\sum _{k=1}^{n}p_{k} > 0\).

The contrast function on \(\mathcal{S}\) is then given by

$$\displaystyle\begin{array}{rcl} D_{\mathcal{S}}(p\vert \vert q)& =& D\big(\phi (p)\vert \vert \phi (q)\big) {}\\ & =& D(\xi _{1}\vert \vert \xi _{2}), {}\\ \end{array}$$

where D(⋅  | | ⋅ ) is the Bregman divergence on \(\mathbb{E}\) induced by the convex function \(\varphi (\xi ) =\sum _{ i=1}^{n}e^{\xi ^{i}}\), i.e.,

$$\displaystyle{D(\xi _{1}\vert \vert \xi _{2}) =\varphi (\xi _{2}) -\varphi (\xi _{1}) -\sum _{i=1}^{n}\partial _{ i}\varphi (\xi _{1})(\xi _{2}^{i} -\xi _{ 1}^{i}).}$$

Therefore

$$\displaystyle\begin{array}{rcl} D_{\mathcal{S}}(p\vert \vert q)& =& D(\xi _{1}\vert \vert \xi _{2}) {}\\ & =& \sum _{i}e^{\xi _{2}^{i} } -\sum _{i}e^{\xi _{1}^{i} } -\sum _{i}e^{\xi _{1}^{i} }(\xi _{2}^{i} -\xi _{ 1}^{i}) {}\\ & =& \sum _{i}q_{i} -\sum _{i}p_{i} -\sum _{i}p_{i}\ln \frac{q_{i}} {p_{i}} {}\\ & =& \sum _{i}p_{i}\ln \frac{p_{i}} {q_{i}} = D_{KL}(p\vert \vert q), {}\\ \end{array}$$

where we used \(e^{\xi _{1}^{i}} = p_{i}\) and \(e^{\xi _{2}^{i}} = q_{i}\); the first two sums cancel once the computation is extended over all n + 1 states, where \(\sum _{i}p_{i} =\sum _{i}q_{i} = 1\).

Hence, the induced contrast function \(D_{\mathcal{S}}\) on \(\mathcal{P}(\mathcal{X})\) in this case is the Kullback–Leibler relative entropy.
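
The identification above is easy to test numerically. The following Python sketch evaluates the Bregman divergence of \(\varphi\) at \(\xi _{1} =\ln p\), \(\xi _{2} =\ln q\), with the sums taken over all states, and compares it with \(D_{KL}(p\vert \vert q)\); the random distributions and the seed are illustrative choices, not prescribed by the text.

```python
import numpy as np

def bregman_divergence(xi1, xi2):
    """Bregman divergence induced by phi(xi) = sum_i exp(xi^i)."""
    phi = lambda xi: np.sum(np.exp(xi))
    grad_phi = np.exp(xi1)                      # since d phi / d xi^i = e^{xi^i}
    return phi(xi2) - phi(xi1) - grad_phi @ (xi2 - xi1)

def kl_divergence(p, q):
    return np.sum(p * np.log(p / q))

rng = np.random.default_rng(seed=0)
p = rng.random(5); p /= p.sum()                 # two points of P(X), |X| = 5
q = rng.random(5); q /= q.sum()

print(bregman_divergence(np.log(p), np.log(q)))  # agrees with the line below
print(kl_divergence(p, q))
```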

2 f-Divergence

An important class of contrast functions on statistical models was introduced by Csiszár [31, 32]. Let \(f: (0,\infty ) \rightarrow \mathbb{R}\) be a function satisfying the following conditions:

  1. (a)

    f is convex;

  2. (b)

    f(1) = 0; 

  3. (c)

    f″(1) = 1.

For each pair of probability densities p, q on \(\mathcal{X}\), consider

$$\displaystyle{ D_{f}(p\vert \vert q) = E_{p}\Big[f\Big(\frac{q(x)} {p(x)}\Big)\Big] =\int _{\mathcal{X}}p(x)f\Big(\frac{q(x)} {p(x)}\Big)\,dx. }$$
(12.2.1)

We shall assume that the previous integral converges and we can differentiate under the integral sign.
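
Definition (12.2.1) translates directly into a numerical recipe. The sketch below approximates \(D_{f}(p\vert \vert q)\) by quadrature; the unit-variance Gaussian densities and the choice f(u) = −ln u (which, as shown in Sect. 3.2, yields the Kullback–Leibler relative entropy) are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def f_divergence(p, q, f, lo=-np.inf, hi=np.inf):
    """D_f(p||q) = E_p[f(q/p)], computed by numerical quadrature."""
    return quad(lambda x: p(x) * f(q(x) / p(x)), lo, hi)[0]

p = norm(loc=0.0, scale=1.0).pdf
q = norm(loc=1.0, scale=1.0).pdf

# f(u) = -ln u recovers the Kullback-Leibler relative entropy (Sect. 3.2);
# for unit-variance Gaussians D_KL = |mu - mu'|^2 / 2 = 0.5 (Problem 12.4).
print(f_divergence(p, q, lambda u: -np.log(u)))
```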

Proposition 12.2.1

The operator D f (⋅ || ⋅) is a contrast function on the statistical model \(\mathcal{S} =\{ p_{\xi }\}\) .

Proof:

We check the properties of a contrast function.

  1. (i)

    positive: Jensen’s inequality applied to the convex function f provides

    $$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& E_{p}\Big[f\Big(\frac{q(x)} {p(x)}\Big)\Big] \geq f\Big(E_{p}\Big[\frac{q(x)} {p(x)}\Big]\Big) {}\\ & =& f\Big(\int _{\mathcal{X}}p(x)\frac{q(x)} {p(x)}\,dx\Big) = f(1) = 0. {}\\ \end{array}$$
  2. (ii)

non-degenerate: Let p ≠ q. Since f is strictly convex at u = 1, Jensen’s inequality is strict:

    $$\displaystyle{D_{f}(p\vert \vert q) = E_{p}\Big[f\Big(\frac{q(x)} {p(x)}\Big)\Big] > f\Big(E_{p}\Big[\frac{q(x)} {p(x)}\Big]\Big) = f(1) = 0,}$$

and hence D f (p | | q) > 0 for p ≠ q, which proves the non-degeneracy.

  3. (iii)

The vanishing of the first variation along the diagonal {ξ 1 = ξ 2} is a consequence of (i) and of \(D_{f}(p\vert \vert p) = f(1) = 0\), which make the diagonal a set of minima.

  4. (iv)

    Let \(p = p_{{\xi _{ 0}}}\) and \(q = p_{{\xi }}\). We shall compute the Hessian of

    $$\displaystyle{ D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }}) =\int _{\mathcal{X}}p_{{\xi _{ 0}}}(x)f\Big( \frac{p_{{\xi }}(x)} {p_{{\xi _{ 0}}}(x)}\Big)\,dx }$$
    (12.2.2)

    along the diagonal ξ 0 = ξ. Differentiating we have

$$\displaystyle\begin{array}{rcl} \partial _{\xi ^{j}}f\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{1} {p_{{\xi _{ 0}}}} \partial _{\xi ^{j}}p_{{\xi }} {}\\ \partial _{\xi ^{i}}\partial _{\xi ^{j}}f\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)^{2}\partial _{\xi ^{ i}}(\ln p_{{\xi }})\partial _{\xi ^{j}}(\ln p_{{\xi }}) {}\\ & & +f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{1} {p_{{\xi _{ 0}}}}\partial _{\xi ^{i}}\partial _{\xi ^{j}}p_{{\xi }}. {}\\ \end{array}$$

    Differentiating under the integral we get

    $$\displaystyle\begin{array}{rcl} \partial _{\xi ^{i}}\partial _{\xi ^{j}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}& =& f''(1)\int p_{{\xi _{ 0}}}\partial _{\xi ^{i}}\ln p_{{\xi }}\,\partial _{\xi ^{j}}\ln p_{\xi }\,dx_{\vert \xi =\xi _{0}} {}\\ & & +f'(1)\partial _{\xi ^{i}}\partial _{\xi ^{j}}\int p_{{\xi }}(x)\,dx {}\\ & =& f''(1)E_{\xi }[\partial _{\xi ^{i}}\ell(\xi )\partial _{\xi ^{j}}\ell(\xi )] {}\\ & =& E_{\xi }[(\partial _{\xi ^{i}}\ell)(\partial _{\xi ^{j}}\ell)] = g_{ij}(\xi ), {}\\ \end{array}$$

which is positive definite, since it is the Fisher–Riemann information matrix. Hence D f (⋅  | | ⋅ ) is a contrast function.

 ■ 
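
The computation in part (iv) can also be checked numerically: a finite-difference Hessian of D f along the diagonal should reproduce the Fisher information. A minimal sketch for the exponential model \(p_{\xi }(x) =\xi e^{-\xi x}\), where \(g_{11} = \frac{1} {\xi ^{2}}\); the choice f(u) = −ln u, the rate, and the step size are illustrative.

```python
import numpy as np
from scipy.integrate import quad

def D(xi0, xi):
    """D_f(p_xi0 || p_xi) for exponential densities, with f(u) = -ln u."""
    p = lambda x: xi0 * np.exp(-xi0 * x)
    q = lambda x: xi  * np.exp(-xi  * x)
    return quad(lambda x: p(x) * -np.log(q(x) / p(x)), 0, np.inf)[0]

xi0, h = 2.0, 1e-3
hessian = (D(xi0, xi0 + h) - 2 * D(xi0, xi0) + D(xi0, xi0 - h)) / h**2
print(hessian, 1 / xi0**2)          # both approximately 0.25
```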

Theorem 12.2.2

The Riemannian metric induced by the contrast function D f (⋅ || ⋅) on the statistical model \(\mathcal{S} =\{ p_{{\xi }}\}\) is the Fisher–Riemann information matrix

$$\displaystyle{g_{ij}(\xi ) = \partial _{\xi ^{i}}\partial _{\xi ^{j}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}.}$$

Proof:

It follows from the calculation performed in the part (iv) above. ■ 

Let \(f^{{\ast}}(u) = uf\Big(\frac{1} {u}\Big)\). Since

$$\displaystyle\begin{array}{rcl} f^{{\ast}}(1)& =& f(1) = 0 {}\\ f^{{\ast\prime\prime}}(u)& =& \frac{1} {u^{3}}f''\Big(\frac{1} {u}\Big) \geq 0 {}\\ f^{{\ast\prime\prime}}(1)& =& f^{\prime\prime}(1) = 1, {}\\ \end{array}$$

then f ∗ satisfies conditions (a)–(c), and hence \(D_{f^{{\ast}}}(\cdot \,\vert \vert \,\cdot )\) is a contrast function, which defines the same Riemannian metric as D f (⋅  | | ⋅ ).

Proposition 12.2.3

The contrast function \(D_{f^{{\ast}}}(\cdot \,\vert \vert \,\cdot )\) is the dual of D f (⋅ || ⋅).

Proof:

Consider the dual contrast function \(D_{f}^{{\ast}}(p\vert \vert q) = D_{f}(q\vert \vert p)\). Then we have

$$\displaystyle\begin{array}{rcl} D_{f^{{\ast}}}(p\vert \vert q)& =& \int _{\mathcal{X}}p(x)f^{{\ast}}\Big(\frac{q(x)} {p(x)}\Big)\,dx {}\\ & =& \int _{\mathcal{X}}p(x)\frac{q(x)} {p(x)}f\Big(\frac{p(x)} {q(x)}\Big)\,dx {}\\ & =& \int _{\mathcal{X}}q(x)f\Big(\frac{p(x)} {q(x)}\Big)\,dx {}\\ & =& D_{f}(q\vert \vert p) = D_{f}^{{\ast}}(p\vert \vert q),\qquad \forall p,q \in \mathcal{S}. {}\\ \end{array}$$

Therefore \(D_{f^{{\ast}}} = D_{f}^{{\ast}}\). ■ 
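
The proposition admits a quick numerical confirmation; in the sketch below (with the same illustrative Gaussian densities and f(u) = −ln u as before), the two printed values agree.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# D_f(p||q) = E_p[f(q/p)] by quadrature, as in the earlier sketch.
D = lambda p, q, f: quad(lambda x: p(x) * f(q(x) / p(x)), -np.inf, np.inf)[0]

f = lambda u: -np.log(u)             # illustrative choice of f
f_star = lambda u: u * f(1.0 / u)    # dual function f*(u) = u f(1/u)

p = norm(loc=0.0, scale=1.0).pdf
q = norm(loc=1.0, scale=1.0).pdf

print(D(p, q, f_star))               # D_{f*}(p||q)
print(D(q, p, f))                    # D_f(q||p): the same value
```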

In the following we shall find the induced connections. Let ∇(f) be the linear connection induced by the contrast function D f (⋅  | | ⋅ ), and denote by Γ ij, k (f) its components on a local basis.

Proposition 12.2.4

We have

$$\displaystyle{ \varGamma _{ij,k}^{(f)}(\xi ) = E_{\xi }\Big[\big(\partial _{i}\partial _{j}\ell - (f'''(1) + 1)\,\partial _{i}\ell\,\partial _{j}\ell\big)\partial _{k}\ell\Big]. }$$
(12.2.3)

Proof:

From formula (11.5.18) we find

$$\displaystyle{ \varGamma _{ij,k}^{(f)}(\xi ) = -\partial _{\xi _{ 0}^{i}}\partial _{\xi _{0}^{j}}\partial _{\xi ^{k}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}. }$$
(12.2.4)

We shall compute the derivatives on the right side. Differentiating in (12.2.2) yields

$$\displaystyle{ \partial _{\xi ^{k}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }}) =\int _{\mathcal{X}}f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)p_{{\xi }}\partial _{\xi ^{k}}\ell(\xi )\,dx. }$$
(12.2.5)

Before continuing the computation we note that

$$\displaystyle\begin{array}{rcl} \partial _{\xi _{0}^{j}}f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)\Big(\frac{-p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)\partial _{\xi _{0}^{j}}\ell(\xi _{0}) {}\\ & & {}\\ \partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}f'\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big)& =& f'''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{p_{{\xi }}^{2}} {p_{{\xi _{ 0}}}^{2}}\partial _{\xi _{0}^{i}}\ell(\xi _{0})\partial _{\xi _{0}^{j}}\ell(\xi _{0}) {}\\ & & +f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \partial _{\xi _{0}^{i}}\ell(\xi _{0})\partial _{\xi _{0}^{j}}\ell(\xi _{0}) {}\\ & & -f''\Big( \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \Big) \frac{p_{{\xi }}} {p_{{\xi _{ 0}}}} \partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}\ell(\xi _{0}). {}\\ \end{array}$$

Applying now \(\partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}\) to (12.2.5), using the foregoing formulas, and taking ξ 0 = ξ, yields

$$\displaystyle\begin{array}{rcl} \partial _{\xi _{0}^{i}}\partial _{\xi _{0}^{j}}\partial _{\xi ^{k}}D_{f}(p_{{\xi _{ 0}}}\vert \vert p_{{\xi }})_{\vert \xi =\xi _{0}}& =& \int _{\mathcal{X}}\Big[(f'''(1) + f''(1))p_{{\xi }}(\partial _{\xi ^{i}}\ell)(\partial _{\xi ^{j}}\ell)(\partial _{\xi ^{k}}\ell) {}\\ & & -f''(1)p_{{\xi }}(\partial _{\xi ^{i}}\partial _{\xi ^{j}}\ell)(\partial _{\xi ^{k}}\ell)\Big]\,dx {}\\ & =& -E_{\xi }\Big[\big(\partial _{\xi ^{i}}\partial _{\xi ^{j}}\ell - (f'''(1) + 1)\,\partial _{\xi ^{i}}\ell\,\partial _{\xi ^{j}}\ell\big)\partial _{\xi ^{k}}\ell\Big], {}\\ \end{array}$$

where we used f″(1) = 1.

Applying (12.2.4) we arrive at (12.2.3). ■ 

The relation with the geometry of α-connections is given below.

Theorem 12.2.5

The connection induced by D f (⋅ || ⋅) is an α-connection

$$\displaystyle{\nabla ^{(f)} = \nabla ^{(\alpha )},}$$

with \(\alpha = 2f'''(1) + 3\) .

Proof:

It suffices to show the identity in local coordinates. Recall first the components of the α-connection given by (1.11.34)

$$\displaystyle{ \varGamma _{ij,k}^{(\alpha )} = E_{\xi }\Big[\Big(\partial _{ i}\partial _{j}\ell + \frac{1-\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell\Big]. }$$
(12.2.6)

Comparing with (12.2.3) we see that Γ ij, k (f) = Γ ij, k (α) if and only if \(\alpha = 2f'''(1) + 3\). ■ 

We remark that \(\nabla ^{(f^{{\ast}}) } = \nabla ^{(-\alpha )}\), which follows from the properties of dual connections induced by contrast functions. We shall show shortly that for any α there is a function f satisfying (a)–(c) and solving the equation \(\alpha = 2f'''(1) + 3\).

Proposition 12.2.6

The skewness tensor induced by the contrast function D f (⋅ || ⋅) is given in local coordinates by

$$\displaystyle{T_{ijk}^{(f)} = (2f'''(1) + 3)E_{\xi }[(\partial _{ i}\ell)(\partial _{j}\ell)(\partial _{k}\ell)].}$$

Proof:

Using Theorem 12.2.5, formula (12.2.6) and the aforementioned remarks, we have

$$\displaystyle\begin{array}{rcl} T_{ijk}^{(f)}& =& \varGamma _{ ijk}^{(f^{{\ast}}) } -\varGamma _{ijk}^{(f)} =\varGamma _{ ijk}^{(-\alpha )} -\varGamma _{ ijk}^{(\alpha )} {}\\ & =& E_{\xi }\Big[\Big(\partial _{i}\partial _{j}\ell + \frac{1+\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell\Big] {}\\ & & -E_{\xi }\Big[\Big(\partial _{i}\partial _{j}\ell + \frac{1-\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell\Big] {}\\ & =& \alpha E_{\xi }[(\partial _{i}\ell)(\partial _{j}\ell)(\partial _{k}\ell)] {}\\ & =& (2f'''(1) + 3)E_{\xi }[(\partial _{i}\ell)(\partial _{j}\ell)(\partial _{k}\ell)]. {}\\ \end{array}$$

 ■ 

3 Particular Cases

This section presents a few classical examples of contrast functions as particular cases of D f (⋅  | | ⋅ ). They are constructed by choosing functions f that satisfy conditions (a)–(c), and each is classified by the value of \(\alpha = 2f'''(1) + 3\). We remark that if f is such a function, then \(f_{c}(u) = f(u) + c(u - 1)\), \(c \in \mathbb{R}\), induces the same contrast function, \(D_{f_{c}} = D_{f}\), since \(E_{p}\big[\frac{q} {p} - 1\big] = 0\). Therefore, the correspondence between functions f and contrast functions is not one-to-one.
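
The classification by \(\alpha = 2f'''(1) + 3\) of the generating functions used in Sects. 3.1–3.6 can be verified symbolically; a minimal sketch, where the assertions check condition (c):

```python
import sympy as sp

# Symbolic check of alpha = 2 f'''(1) + 3 for the generating functions
# used in this section.
u = sp.symbols('u', positive=True)
generators = [
    ('Hellinger',        4 * (1 - sp.sqrt(u))),
    ('Kullback-Leibler', -sp.log(u)),
    ('dual KL',          u * sp.log(u)),
    ('Jeffrey',          sp.Rational(1, 2) * (u - 1) * sp.log(u)),
    ('Kagan',            sp.Rational(1, 2) * (1 - u)**2),
    ('exponential',      sp.Rational(1, 2) * sp.log(u)**2),
]
for name, f in generators:
    assert sp.diff(f, u, 2).subs(u, 1) == 1            # condition (c)
    alpha = 2 * sp.diff(f, u, 3).subs(u, 1) + 3
    print(f'{name:16s} alpha = {alpha}')                # 0, -1, 1, 0, 3, -3
```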

3.1 Hellinger Distance

Consider \(f(u) = 4(1 -\sqrt{u})\) and the associated contrast function

$$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& 4\int _{\mathcal{X}}p(x)\Big(1 -\sqrt{\frac{q(x)} {p(x)}}\Big)dx = 4\Big(1 -\int _{\mathcal{X}}\sqrt{p(x)q(x)}\,dx\Big) {}\\ & =& 2\Big(2 -\int _{\mathcal{X}}2\sqrt{p(x)q(x)}\,dx\Big) {}\\ & =& 2\int _{\mathcal{X}}\Big(p(x) - 2\sqrt{p(x)q(x)} + q(x)\Big)\,dx {}\\ & =& 2\int _{\mathcal{X}}\big(\sqrt{p(x)} -\sqrt{q(x)}\big)^{2}\,dx {}\\ & =& H^{2}(p,q). {}\\ \end{array}$$

H(p, q) is called the Hellinger distance, and is a true distance on the statistical model \(\mathcal{S} =\{ p_{{\xi }}\}\). Since in this case \(\alpha = 2f'''(1) + 3 = 0\), the linear connection induced by H 2(p, q) is exactly the Levi–Civita connection, ∇(0), on the Riemannian manifold \((\mathcal{S},g)\).

Example 12.3.1

Consider two exponential distributions, \(p(x) =\alpha e^{-\alpha x}\) and \(q(x) =\beta e^{-\beta x}\), x ≥ 0, α, β > 0. Then

$$\displaystyle\begin{array}{rcl} H^{2}(p,q)& =& 4 - 4\int _{ 0}^{\infty }\sqrt{p(x)q(x)}\,dx {}\\ & =& 4 - 4\sqrt{\alpha \beta }\int _{0}^{\infty }e^{-\frac{\alpha +\beta } {2} x}\,dx {}\\ & =& 4 -\frac{8\sqrt{\alpha \beta }} {\alpha +\beta }, {}\\ \end{array}$$

hence the Hellinger distance is \(H(p,q) = 2\sqrt{1 - \frac{2\sqrt{ \alpha \beta } } {\alpha +\beta }}\).
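
This closed form is easy to confirm by quadrature; the rates below are illustrative.

```python
import numpy as np
from scipy.integrate import quad

a, b = 2.0, 5.0                                      # illustrative rates
p = lambda x: a * np.exp(-a * x)
q = lambda x: b * np.exp(-b * x)

affinity = quad(lambda x: np.sqrt(p(x) * q(x)), 0, np.inf)[0]
print(2 * np.sqrt(1 - affinity))                              # quadrature
print(2 * np.sqrt(1 - 2 * np.sqrt(a * b) / (a + b)))          # closed form
```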

The Hellinger distance can also be defined between two discrete distributions p = (p k ) and q = (q k ), replacing the integral by a sum

$$\displaystyle{H(p,q) = 2\Big(1 -\sum _{k\geq 0}\sqrt{p_{k } q_{k}}\Big)^{1/2} =\Big (2\sum _{ k\geq 0}\big(\sqrt{p_{k}} -\sqrt{q_{k}}\big)^{2}\Big)^{1/2}.}$$

Example 12.3.2

Consider two Poisson distributions, \(p_{k} = \frac{\alpha ^{k}} {k!}e^{-\alpha }\) and \(q_{k} = \frac{\beta ^{k}} {k!}e^{-\beta }\), k ≥ 0. Then

$$\displaystyle\begin{array}{rcl} \sum _{k\geq 0}\sqrt{p_{ k}q_{k}}& =& \sum _{k\geq 0}\frac{(\sqrt{\alpha \beta })^{k}} {k!} e^{-\frac{\alpha +\beta } {2} } {}\\ & =& e^{-\frac{\alpha +\beta } {2} }e^{\sqrt{\alpha \beta }}\sum _{k\geq 0}\frac{(\sqrt{\alpha \beta })^{k}} {k!} e^{-\sqrt{\alpha \beta }} {}\\ & =& e^{\sqrt{\alpha \beta }-\frac{\alpha +\beta } {2} }. {}\\ \end{array}$$

Hence, the Hellinger distance becomes

$$\displaystyle{H(p,q) = 2\Big(1 -\sum _{k\geq 0}\sqrt{p_{k } q_{k}}\Big)^{1/2} = 2\sqrt{1 - e^{\sqrt{ \alpha \beta } -\frac{\alpha +\beta } {2} }}.}$$
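
Again the formula can be spot-checked numerically; the means and the truncation point below are illustrative.

```python
import numpy as np
from scipy.stats import poisson

lam1, lam2 = 3.0, 7.0                                # illustrative means
n = np.arange(200)                                   # truncation; tail is negligible
affinity = np.sum(np.sqrt(poisson.pmf(n, lam1) * poisson.pmf(n, lam2)))

print(2 * np.sqrt(1 - affinity))                                  # truncated sum
print(2 * np.sqrt(1 - np.exp(np.sqrt(lam1 * lam2) - (lam1 + lam2) / 2)))
```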

3.2 Kullback–Leibler Relative Entropy

The contrast function associated with the function \(f(u) = -\ln u\) is given by

$$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& \int _{\mathcal{X}}p(x)\ln \frac{p(x)} {q(x)}\,dx = D_{KL}(p\vert \vert q), {}\\ \end{array}$$

which is the Kullback–Leibler information or the relative entropy. In this case \(\alpha = 2f'''(1) + 3 = -1\), so the associated connection is ∇(−1).

It is worth noting that the convex function f(u) = ulnu induces the contrast function

$$\displaystyle\begin{array}{rcl} D_{f}(p\vert \vert q)& =& \int _{\mathcal{X}}q(x)\ln \frac{q(x)} {p(x)}\,dx = D_{KL}(q\vert \vert p) = D_{KL}^{{\ast}}(p\vert \vert q), {}\\ \end{array}$$

which is the dual of the Kullback–Leibler information, see [51, 53]. Since \(\alpha = 2f'''(1) + 3 = 1\), the induced connection is ∇(1).

3.3 Chernoff Information of Order α

The convex function

$$\displaystyle{f^{(\alpha )}(u) = \frac{4} {1 -\alpha ^{2}}\big(1 - u^{\frac{1+\alpha } {2} }\big),\qquad \alpha \not = \pm 1}$$

induces the contrast function

$$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 -\int _{\mathcal{X}}p(x)^{\frac{1-\alpha } {2} }q(x)^{\frac{1+\alpha } {2} }\,dx\Big\},}$$

see Chernoff [27]; the factor 4∕(1 − α 2) in f (α) is exactly the normalization required by condition (c), f″(1) = 1. For the computation of D (α) in the case of the exponential, normal, and Poisson distributions, see Problems 12.9, 12.10, and 12.11. We note that for α = 0 we recover the squared Hellinger distance, D (0)(p | | q) = H 2(p, q).
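
Like the other divergences of this section, D (α) can be approximated by quadrature and compared against a closed form; the sketch below uses unit-variance Gaussian densities, for which Problem 12.4, part (d), supplies the exact value. The order α and the means are illustrative.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

alpha, mu1, mu2 = 0.3, 0.0, 1.5                      # illustrative values
p, q = norm(loc=mu1).pdf, norm(loc=mu2).pdf

I = quad(lambda x: p(x) ** ((1 - alpha) / 2) * q(x) ** ((1 + alpha) / 2),
         -np.inf, np.inf)[0]
print(4 / (1 - alpha**2) * (1 - I))                               # quadrature
print(4 / (1 - alpha**2) * (1 - np.exp(-(1 - alpha**2) / 8 * (mu1 - mu2)**2)))
```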

3.4 Jeffrey Distance

The function \(f(u) = \frac{1} {2}(u - 1)\ln u\) induces the contrast function

$$\displaystyle\begin{array}{rcl} J(p,q)& =& D_{f}(p\vert \vert q) = \frac{1} {2}\int _{\mathcal{X}}p(x)\Big(1 -\frac{q(x)} {p(x)}\Big)\ln \frac{p(x)} {q(x)}\,dx {}\\ & =& \frac{1} {2}\int _{\mathcal{X}}\big(p(x) - q(x)\big)\big(\ln p(x) -\ln q(x)\big)\,dx, {}\\ \end{array}$$

see Jeffrey [47]. A computation shows that α = 0, so the induced connection is the Levi–Civita connection ∇(0). In fact, the Jeffrey contrast function is the same as the symmetric Kullback–Leibler relative entropy

$$\displaystyle\begin{array}{rcl} J(p,q)& =& \frac{1} {2}\int _{\mathcal{X}}p(x)\Big(1 -\frac{q(x)} {p(x)}\Big)\ln \frac{p(x)} {q(x)}\,dx {}\\ & =& \frac{1} {2}\int _{\mathcal{X}}p(x)\ln \frac{p(x)} {q(x)}\,dx + \frac{1} {2}\int _{\mathcal{X}}q(x)\ln \frac{q(x)} {p(x)}\,dx {}\\ & =& \frac{1} {2}\Big(D_{KL}(p\vert \vert q) + D_{KL}(q\vert \vert p)\Big). {}\\ \end{array}$$

3.5 Kagan Divergence

Choosing \(f(u) = \frac{1} {2}(1 - u)^{2}\) yields

$$\displaystyle\begin{array}{rcl} D_{\chi ^{2}}(p\vert \vert q)& =& D_{f}(p\vert \vert q) = \frac{1} {2}\int _{\mathcal{X}}p(x)\Big(1 -\frac{q(x)} {p(x)}\Big)^{2}\,dx {}\\ & =& \frac{1} {2}\int _{\mathcal{X}}\frac{(p(x) - q(x))^{2}} {p(x)} \,dx, {}\\ \end{array}$$

called the Kagan contrast function, see Kagan [48]. In this case \(\alpha = 2f'''(1) + 3 = 3\), and therefore the induced connection is ∇(3). It is worth noting the relation with the minimum chi-squared estimation in the discrete case, see Kass and Vos [49], p. 243, where the Kagan divergence becomes

$$\displaystyle{D_{\chi ^{2}}(p\vert \vert q) = \frac{1} {2}\sum _{i=1}^{n}\frac{(p_{i} - q_{i})^{2}} {p_{i}}.}$$
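
A quadrature spot-check of this formula against the \(D_{\chi ^{2}}\) entry of Example 12.3.4 below (which requires 2ξ′ > ξ); the rates are illustrative.

```python
import numpy as np
from scipy.integrate import quad

xi, xi2 = 1.0, 1.6                                   # illustrative rates
p = lambda x: xi  * np.exp(-xi  * x)
q = lambda x: xi2 * np.exp(-xi2 * x)

d = quad(lambda x: 0.5 * (p(x) - q(x))**2 / p(x), 0, np.inf)[0]
s = xi / xi2
print(d, 0.5 * (1 / ((2 - s) * s) - 1))              # quadrature vs closed form
```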

3.6 Exponential Contrast Function

The contrast function associated with the convex function \(f(u) = \frac{1} {2}(\ln u)^{2}\) is

$$\displaystyle{\mathcal{E}(p\vert \vert q) = D_{f}(p\vert \vert q) = \frac{1} {2}\int _{\mathcal{X}}p(x)\big(\ln p(x) -\ln q(x)\big)^{2}\,dx.}$$

The induced connection in this case is ∇(−3).

We note that the candidate functions \(f(u) = K(\ln u)^{2k}\) vanish at u = 1, but the normalization f″(1) = 1 can be met only for k = 1 (with K = 1∕2), since f″(1) = 0 for every k ≥ 2.

3.7 Product Contrast Function with (α, β)-Index

The following 2-parameter family of contrast functions was introduced and studied by Eguchi [40]:

$$\displaystyle{D_{\alpha,\beta }(p\vert \vert q) = \frac{2} {(1-\alpha )(1-\beta )}\int _{\mathcal{X}}p(x)\Big\{1 -\Big (\frac{q(x)} {p(x)}\Big)^{\frac{1-\alpha } {2} }\Big\}\Big\{1 -\Big (\frac{q(x)} {p(x)}\Big)^{\frac{1-\beta } {2} }\Big\}\,dx,}$$

and is induced by the function

$$\displaystyle{f_{\alpha,\beta }(u) = \frac{2} {(1-\alpha )(1-\beta )}(1 - u^{\frac{1-\alpha } {2} })(1 - u^{\frac{1-\beta } {2} }).}$$

This family relates to the previous contrast functions, see Problem 12.3. In particular, the contrast function D α, β (⋅  | | ⋅ ) can be written as an affine combination (the coefficients sum to 1) of Chernoff informations, see Problem 12.3, part (e).

We end this section with a few suggestive examples. The computations are left as exercises to the reader.

Example 12.3.3

Consider the statistical model \(\mathcal{S} =\{ p_{\mu };\mu \in \mathbb{R}^{k}\}\), where

$$\displaystyle{p_{\mu }(x) = (2\pi )^{-k/2}e^{-\frac{\|x-\mu \|^{2}} {2} },\qquad x \in \mathbb{R}^{k}}$$

is a k-dimensional Gaussian density with σ = 1. Problem 12.4 provides exact formulas for the aforementioned contrast functions in terms of the Euclidean distance \(\|\cdot \|\).

Example 12.3.4 (Exponential Model)

Let \(\mathcal{S} =\{ p_{\xi }\}\), where

$$\displaystyle{p_{\xi } =\xi e^{-\xi x},\qquad \xi > 0,x > 0.}$$

A computation shows

$$\displaystyle\begin{array}{rcl} D_{KL}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{\xi '} {\xi } -\ln \frac{\xi '} {\xi }- 1 {}\\ J(p_{\xi },p_{\xi '})& =& \frac{(\xi '-\xi )^{2}} {2\xi \xi '} {}\\ H^{2}(p_{\xi },p_{\xi '})& =& \frac{4(\sqrt{\xi }-\sqrt{\xi '})^{2}} {\xi +\xi '} {}\\ D^{(\alpha )}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{4} {1 -\alpha ^{2}}\Bigg\{1 - \frac{\xi ^{\frac{1-\alpha } {2} }\xi '^{ \frac{1+\alpha } {2} }} { \frac{1+\alpha } {2} \xi ' + \frac{1-\alpha } {2} \xi }\Bigg\} {}\\ & & {}\\ D_{\chi ^{2}}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{1} {2}\Bigg[ \frac{1} {\Big(2 -\frac{\xi }{\xi '}\Big)\frac{\xi }{\xi '}} - 1\Bigg],\qquad 2\xi ' >\xi {}\\ \mathcal{E}(p_{\xi }\vert \vert p_{\xi '})& =& \frac{1} {2}\Big\{\ln ^{2}\frac{\xi '} {\xi } - 2\Big(\frac{\xi '} {\xi } - 1\Big)\ln \frac{\xi '} {\xi } + 2\Big(\frac{\xi '} {\xi } - 1\Big)^{2}\Big\}. {}\\ \end{array}$$

It is worth noting that all these contrast functions induce the same Riemannian metric on \(\mathcal{S}\), given by \(g_{11} = \frac{1} {\xi ^{2}}\), which is the Fisher information. The induced distance between p ξ and p ξ′ is a hyperbolic one, \(dist(p_{\xi },p_{\xi '}) = \vert \ln \frac{\xi }{\xi '}\vert \).
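
Since entries in such tables are easy to get wrong, a quadrature spot-check is worthwhile; the sketch below verifies the \(D_{KL}\) and \(\mathcal{E}\) rows with illustrative rates.

```python
import numpy as np
from scipy.integrate import quad

xi, xi2 = 1.0, 2.5                                   # illustrative rates
p = lambda x: xi  * np.exp(-xi  * x)
q = lambda x: xi2 * np.exp(-xi2 * x)
t = xi2 / xi

kl = quad(lambda x: p(x) * np.log(p(x) / q(x)), 0, np.inf)[0]
print(kl, t - np.log(t) - 1)                         # D_KL row

e = quad(lambda x: 0.5 * p(x) * (np.log(p(x)) - np.log(q(x)))**2, 0, np.inf)[0]
print(e, 0.5 * (np.log(t)**2 - 2 * (t - 1) * np.log(t) + 2 * (t - 1)**2))
```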

4 Problems

  1. 12.1.

    Consider the exponential family

    $$\displaystyle{p(x;\xi ) = e^{C(x)+\xi ^{i}F_{ i}(x)-\psi (\xi )},\quad i = 1,\cdots \,,n,}$$

where ψ(ξ) is a convex function, and define

    $$\displaystyle{D(\xi _{0}\vert \vert \xi ) =\psi (\xi ) -\psi (\xi _{0}) -\langle \partial \psi (\xi _{0}),\xi -\xi _{0}\rangle.}$$
    1. (a)

      Prove that D(⋅ | | ⋅ ) is a contrast function;

    2. (b)

      Find the dual contrast function D (⋅ | | ⋅ );

    3. (c)

      Prove that the Riemann metric induced by the contrast function D(⋅ | | ⋅ ) is the Fisher–Riemann metric of the exponential family. Find a formula for it using the function ψ(ξ);

    4. (d)

      Find the components of the dual connections ∇(D) and \(\nabla ^{(D^{{\ast}}) }\) induced by the contrast function D(⋅ | | ⋅ );

    5. (e)

Show that the skewness tensor induced by the contrast function D(⋅ | | ⋅ ) is \(T_{ijk}(\xi ) = \partial _{i}\partial _{j}\partial _{k}\psi (\xi )\).

  2. 12.2.

    Prove that the Hellinger distance

    $$\displaystyle{H(p,q) = \sqrt{2\int _{\mathcal{X} } (\sqrt{p(x)} - \sqrt{q(x)} )^{2 } \,dx}}$$

    satisfies the distance axioms.

  3. 12.3.

    Consider the Eguchi contrast function

$$\displaystyle{D_{\alpha,\beta }(p\vert \vert q) = \frac{2} {(1-\alpha )(1-\beta )}\int p\,\Big\{1 -\Big (\frac{q} {p}\Big)^{\frac{1-\alpha } {2} }\Big\}\Big\{1 -\Big (\frac{q} {p}\Big)^{\frac{1-\beta } {2} }\Big\}\,dx.}$$

Let H(⋅ , ⋅ ), D (α)(⋅ | | ⋅ ), J(⋅ , ⋅ ), \(\mathcal{E}(\cdot \vert \vert \cdot )\) be the Hellinger distance, the Chernoff information of order α, the Jeffrey distance, and the exponential contrast function, respectively. Prove the following relations:

$$\displaystyle\begin{array}{rcl} & & (a)\quad D_{0,0}(p\vert \vert q) = H^{2}(p,q) {}\\ & & (b)\quad D_{-\alpha,\alpha }(p\vert \vert q) = \frac{1} {2}\big(D^{(\alpha )}(p\vert \vert q) + D^{(-\alpha )}(p\vert \vert q)\big) {}\\ & & (c)\quad \lim _{\alpha \rightarrow 1}D_{-\alpha,\alpha }(p\vert \vert q) = J(p,q) {}\\ & & (d)\quad \lim _{\alpha \rightarrow 1}D_{\alpha,\alpha }(p\vert \vert q) = \mathcal{E}(p\vert \vert q) {}\\ & & (e)\quad D_{\alpha,\beta }(p\vert \vert q) =\lambda _{1}D^{(-\alpha )} +\lambda _{ 2}D^{(-\beta )} +\lambda _{ 3}D^{(1-\alpha -\beta )}, {}\\ \end{array}$$

    where

    $$\displaystyle{ \lambda _{1} = \frac{1+\alpha } {2(1-\beta )},\;\lambda _{2} = \frac{1+\beta } {2(1-\alpha )},\;\lambda _{3} = -\frac{(\alpha +\beta )(2 -\alpha -\beta )} {2(1-\alpha )(1-\beta )}, }$$

    and show that \(\lambda _{1} +\lambda _{2} +\lambda _{3} = 1\).

  4. 12.4.

    Consider the statistical model defined by the k-dimensional Gaussian family, \(\mathcal{S} =\{ p_{\mu };\mu \in \mathbb{R}^{k}\}\),

    $$\displaystyle{p_{\mu }(x) = (2\pi )^{-k/2}e^{-\frac{\|x-\mu \|^{2}} {2} },\qquad x \in \mathbb{R}^{k}.}$$

    Prove the following relations:

    $$\displaystyle\begin{array}{rcl} & & (a)\quad D_{KL}(p_{\mu }\vert \vert p_{\mu '}) = \frac{1} {2}\|\mu -\mu '\|^{2} {}\\ & & (b)\quad J(p_{\mu },p_{\mu '}) = \frac{1} {2}\|\mu -\mu '\|^{2} {}\\ & & (c)\quad H^{2}(p_{\mu },p_{\mu '}) = 4\Big[1 - e^{-\frac{\|\mu -\mu '\|^{2}} {8} }\Big] {}\\ & & (d)\quad D^{(\alpha )}(p_{\mu }\vert \vert p_{\mu '}) = \frac{4} {1 -\alpha ^{2}}\Big[1 - e^{-\frac{1-\alpha ^{2}} {8} \|\mu -\mu '\|^{2} }\Big] {}\\ & & (e)\quad \mathcal{E}(p_{\mu }\vert \vert p_{\mu '}) = \frac{1} {2}\|\mu -\mu '\|^{2}\Big[1 + \frac{1} {4}\|\mu -\mu '\|^{2}\Big], {}\\ \end{array}$$

    where \(\|\cdot \|\) denotes the Euclidean norm on \(\mathbb{R}^{k}\).

  5. 12.5.

    Let D f ( ⋅  | | ⋅ ) be the f-divergence. Prove the following convexity property

    $$\displaystyle\begin{array}{rcl} D_{f}\Big(\lambda p_{1}+(1-\lambda )p_{2}\vert \vert \lambda q_{1}+(1-\lambda )q_{2}\Big)& \leq &\lambda D_{f}(p_{1}\vert \vert q_{1}) {}\\ & & +(1-\lambda )D_{f}(p_{2}\vert \vert q_{2}), {}\\ \end{array}$$

\(\forall \lambda \in [0,1]\) and all density functions p 1, p 2, q 1, q 2.

  6. 12.6.

Prove the formulas for the contrast functions in the case of the exponential distribution presented in Example 12.3.4.

  7. 12.7.

    Consider the normal distributions \(p(x) = \frac{1} {\sqrt{2\pi }\sigma _{1}} e^{-\frac{(x-\mu _{1})^{2}} {2\sigma _{1}^{2}} }\) and \(q(x) = \frac{1} {\sqrt{2\pi }\sigma _{2}} e^{-\frac{(x-\mu _{2})^{2}} {2\sigma _{2}^{2}} }\).

    1. (a)

      Show that

      $$\displaystyle{\int _{-\infty }^{\infty }\sqrt{p(x)q(x)}\,dx = \sqrt{ \frac{2\sigma _{1 } \sigma _{2 } } {\sigma _{1}^{2} +\sigma _{ 2}^{2}}}e^{A - B},}$$

      where

      $$\displaystyle{A = \frac{\Big( \frac{\mu _{1}} {2\sigma _{1}^{2}} + \frac{\mu _{2}} {2\sigma _{2}^{2}} \Big)^{2}} { \frac{1} {\sigma _{1}^{2}} + \frac{1} {\sigma _{2}^{2}} },\quad B = \frac{\mu _{1}^{2}} {4\sigma _{1}^{2}} + \frac{\mu _{2}^{2}} {4\sigma _{2}^{2}}.}$$
    2. (b)

      Find the Hellinger distance H(p, q).

  8. 12.8.

    Find the Hellinger distance between two gamma distributions.

  9. 12.9.

    Consider two exponential distributions, \(p(x) = ae^{-ax}\) and \(q(x) = be^{-bx}\), x ≥ 0. Show that the Chernoff information of order α is

$$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 - \frac{2a^{\frac{1-\alpha } {2} }b^{\frac{1+\alpha } {2} }} {a(1-\alpha ) + b(1+\alpha )}\Big\},\quad \alpha \not = \pm 1.}$$
  10. 12.10.

    Consider the normal distributions \(p(x) = \frac{1} {\sqrt{2\pi }\sigma _{1}} e^{-\frac{(x-\mu _{1})^{2}} {2\sigma _{1}^{2}} }\) and \(q(x) = \frac{1} {\sqrt{2\pi }\sigma _{2}} e^{-\frac{(x-\mu _{2})^{2}} {2\sigma _{2}^{2}} }\). Show that the Chernoff information of order α is

$$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 - A\sqrt{ \frac{\pi } {a}}e^{\frac{b^{2}} {4a}-c}\Big\},\qquad \vert \alpha \vert < 1,}$$

    where

$$\displaystyle\begin{array}{rcl} A& =& \frac{1} {\sqrt{2\pi }\,\sigma _{1}^{\frac{1-\alpha } {2} }\sigma _{2}^{\frac{1+\alpha } {2} }} {}\\ a& =& \frac{1-\alpha } {4\sigma _{1}^{2}} + \frac{1+\alpha } {4\sigma _{2}^{2}} {}\\ b& =& \frac{\mu _{1}(1-\alpha )} {2\sigma _{1}^{2}} + \frac{\mu _{2}(1+\alpha )} {2\sigma _{2}^{2}} {}\\ c& =& \frac{\mu _{1}^{2}(1-\alpha )} {4\sigma _{1}^{2}} + \frac{\mu _{2}^{2}(1+\alpha )} {4\sigma _{2}^{2}}. {}\\ \end{array}$$
  11. 12.11.

    The Chernoff information of order α for discrete distributions (p n ) and (q n ) is given by

    $$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1 -\alpha ^{2}}\Big\{1 -\sum _{n\geq 0}p_{n}^{\frac{1-\alpha } {2} }q_{n}^{\frac{1+\alpha } {2} }\Big\}.}$$

    Let \(p_{n} = \frac{\lambda _{1}^{n}} {n!} e^{-\lambda _{1}}\) and \(q_{n} = \frac{\lambda _{2}^{n}} {n!} e^{-\lambda _{2}}\) be two Poisson distributions.

    1. (a)

      Show that

      $$\displaystyle{D^{(\alpha )}(p\vert \vert q) = \frac{4} {1-\alpha ^{2}}\Big\{1-e^{\lambda _{1}^{(1-\alpha )/2}\lambda _{ 2}^{(1+\alpha )/2}-\lambda _{ 1}(1-\alpha )/2-\lambda _{2}(1+\alpha )/2}\Big\}.}$$
    2. (b)

      Show that the square of the Hellinger distance is given by

      $$\displaystyle{H^{2}(p,q) = 4\{1 - e^{\sqrt{\lambda _{1 } \lambda _{2}} -\frac{\lambda _{1}+\lambda _{2}} {2} }\}.}$$
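
Both parts can be spot-checked numerically by a truncated sum; in the sketch below the order, the means, and the truncation point are illustrative.

```python
import numpy as np
from scipy.stats import poisson

alpha, lam1, lam2 = 0.4, 2.0, 6.0                    # illustrative values
n = np.arange(200)                                   # truncation; tail is negligible
s = np.sum(poisson.pmf(n, lam1) ** ((1 - alpha) / 2)
           * poisson.pmf(n, lam2) ** ((1 + alpha) / 2))

exact = np.exp(lam1 ** ((1 - alpha) / 2) * lam2 ** ((1 + alpha) / 2)
               - lam1 * (1 - alpha) / 2 - lam2 * (1 + alpha) / 2)
print(4 / (1 - alpha**2) * (1 - s))                  # truncated sum
print(4 / (1 - alpha**2) * (1 - exact))              # closed form of part (a)
```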