Abstract
It is shown that, on any Lie group, the density ratio of the right invariant measure to the left invariant measure is harmonic with respect to the left invariant Riemannian metric. This result is applied to the Bayesian prediction theory on group invariant statistical models. A method of constructing Bayesian prior distributions that asymptotically dominate the right invariant priors is provided.
1 Introduction
In Bayesian statistics, if the model has a group structure, inference based on the right invariant prior is known to have desirable properties; see [1] and references therein. The same holds true in Bayesian prediction [5, 10].
On the other hand, in the theory of Bayesian prediction, prior distributions that are superharmonic with respect to the Fisher metric have better performance than the Jeffreys prior [6].
These two facts raise the problem of the relation between right invariant priors and superharmonic priors. The Jeffreys prior corresponds to the left invariant prior in the models with group structures. In some examples such as location-scale models, the ratio of the right invariant prior to the left invariant prior is known to be harmonic with respect to the Fisher metric [7]. However, it was not known whether the harmonic property of the ratio of the right invariant prior to the left invariant prior holds in general. This paper proves the claim. The result is helpful for understanding the dominance property of the right invariant prior to the Jeffreys prior as shown in Lemma 2. We also provide a method of constructing prior distributions that asymptotically dominate the right invariant prior in Lemma 3, as we will demonstrate through examples.
In Sect. 2, we prove that the ratio of the right invariant measure to the left invariant measure is harmonic with respect to any left invariant metric. In Sect. 3, we apply the theorem to the Bayesian prediction problem.
2 Main result
Let G be a Lie group and e be its identity element. Choose a left invariant Riemannian metric h on G. We use the symbol h for Riemannian metrics to distinguish them from elements of G, which are usually denoted by g. In applications to statistics, h is the Fisher metric of group-invariant models; see Sect. 3.
Let \(\nu _\textrm{L}\) be the left invariant measure (left Haar measure) on G. Up to multiplicative constants, \(\nu _\textrm{L}\) is written in terms of h as
$$\begin{aligned} \nu _\textrm{L}(\textrm{d}x) = \sqrt{|h(x)|}\,\textrm{d}x^1\cdots \textrm{d}x^m, \qquad m = \dim G, \end{aligned}$$
where \(x^i\) is a local coordinate of \(x\in G\) and |h| is the determinant of the metric with respect to the local coordinate system. Denote the reciprocal of the modulus of G by \(\pi _\mathrm{R/L}\), that is,
$$\begin{aligned} \int _G f(xg)\,\nu _\textrm{L}(\textrm{d}x) = \pi _\mathrm{R/L}(g)\int _G f(x)\,\nu _\textrm{L}(\textrm{d}x) \end{aligned}$$
(1)
(Eq. (1.2) of [1]) for any \(g\in G\) and \(f\in C_0(G)\), where \(C_0(G)\) denotes the set of continuous functions with compact supports. The map \(\pi _\mathrm{R/L}:G\rightarrow \mathbb {R}_{>0}\) is a group homomorphism. Define the right invariant measure \(\nu _\textrm{R}\) by
$$\begin{aligned} \nu _\textrm{R}(\textrm{d}x) = \pi _\mathrm{R/L}(x)\,\nu _\textrm{L}(\textrm{d}x) \end{aligned}$$
(see Eq. (1.4) of [1]). It is said that G is unimodular if \(\pi _\mathrm{R/L}(x)=1\) for all \(x\in G\). We are interested in groups that are not unimodular.
Define the Laplace–Beltrami operator \(\Delta \) associated with the metric h by
$$\begin{aligned} \Delta f = \frac{1}{\sqrt{|h|}}\,\partial _i\!\left( \sqrt{|h|}\,h^{ij}\,\partial _j f\right) , \end{aligned}$$
where \(\partial _i\) denotes the partial derivative with respect to the local coordinate, \(h^{ij}\) is the inverse matrix of \(h_{ij}=h(\partial _i,\partial _j)\), and Einstein’s summation convention is used. We call \(\Delta \) the Laplacian for simplicity. The Laplacian does not depend on the choice of the local coordinate system. A function f is said to be harmonic if \(\Delta f=0\) everywhere and superharmonic if \(\Delta f\le 0\) everywhere.
Our main theorem is stated as follows.
Theorem 1
The function \(\pi _\mathrm{R/L}\) is harmonic.
Proof
Take a function \(f\in C_0^\infty (G)\) such that \(\int f(x)\nu _\textrm{L}(\textrm{d}x)\ne 0\). Equation (1) is written as
$$\begin{aligned} \pi _\mathrm{R/L}(g) = \frac{\int f(L_x(g))\,\nu _\textrm{L}(\textrm{d}x)}{\int f(x)\,\nu _\textrm{L}(\textrm{d}x)}, \end{aligned}$$
where \(L_x\) is the left translation by x. Applying the Laplacian to both sides with respect to g yields
$$\begin{aligned} \Delta \pi _\mathrm{R/L}(g) = \frac{\int \Delta _g[f(L_x(g))]\,\nu _\textrm{L}(\textrm{d}x)}{\int f(x)\,\nu _\textrm{L}(\textrm{d}x)} = \frac{\int (\Delta f)(xg)\,\nu _\textrm{L}(\textrm{d}x)}{\int f(x)\,\nu _\textrm{L}(\textrm{d}x)} = \pi _\mathrm{R/L}(g)\,\frac{\int (\Delta f)(x)\,\nu _\textrm{L}(\textrm{d}x)}{\int f(x)\,\nu _\textrm{L}(\textrm{d}x)} = 0, \end{aligned}$$
where the first equality uses Lebesgue’s convergence theorem, the second equality uses the isometric property of \(L_x\) and the invariance of \(\Delta \) with respect to isometries (see p. 246, Proposition 2.4 of [4]), the third equality uses Eq. (1) again, and the fourth equality uses an integral formula on the Laplacian (see p. 245, Proposition 2.3 of [4]). This proves \(\Delta \pi _\mathrm{R/L}(g)=0\). \(\square \)
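For intuition, the defining property of \(\pi _\mathrm{R/L}\) can be checked numerically on the affine group of Example 1, where \(\nu _\textrm{L}(\textrm{d}\mu \,\textrm{d}\sigma )=\sigma ^{-2}\,\textrm{d}\mu \,\textrm{d}\sigma \) and \(\pi _\mathrm{R/L}(g)=\sigma _g\). This is only a numerical sketch: a rapidly decaying Gaussian test function stands in for a compactly supported \(f\).

```python
import numpy as np

# Affine group G = {x -> sigma x + mu}, with group multiplication
# (mu_x, sigma_x)(mu_g, sigma_g) = (mu_x + sigma_x mu_g, sigma_x sigma_g).
# Check: int f(x g) nu_L(dx) = pi_{R/L}(g) int f(x) nu_L(dx),
# where nu_L(d mu d sigma) = sigma^{-2} d mu d sigma and pi_{R/L}(g) = sigma_g.

def f(mu, sigma):
    # rapidly decaying test function standing in for f in C_0(G)
    return np.exp(-mu**2 - np.log(sigma)**2)

t = np.linspace(-8.0, 8.0, 1501)      # t = log sigma_x
m = np.linspace(-40.0, 40.0, 1501)    # mu_x
M, T = np.meshgrid(m, t)
S = np.exp(T)
dm, dt = m[1] - m[0], t[1] - t[0]

def integrate(vals):
    # int vals sigma^{-2} d mu d sigma = int (vals / sigma) d mu d(log sigma)
    return (vals / S).sum() * dm * dt

mu_g, sigma_g = 0.3, 2.0
lhs = integrate(f(M + S * mu_g, S * sigma_g))
rhs = sigma_g * integrate(f(M, S))
print(lhs / rhs)  # -> 1.0 (up to quadrature error)
```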
Example 1
(Affine transformations) Consider the group of affine transformations
$$\begin{aligned} G = \{g = (\mu ,\sigma ) \mid \mu \in \mathbb {R},\ \sigma >0\}, \qquad g: x \mapsto \sigma x + \mu , \end{aligned}$$
which is used to analyze the location-scale family in statistics. The group multiplication is \((\mu _1,\sigma _1)(\mu _2,\sigma _2) = (\mu _1+\sigma _1\mu _2,\ \sigma _1\sigma _2)\). Let us directly show that \(\pi _\mathrm{R/L}\) is harmonic, as pointed out by [7]. It is widely known that the left and right invariant measures are
$$\begin{aligned} \nu _\textrm{L}(\textrm{d}g) = \sigma ^{-2}\,\textrm{d}\mu \wedge \textrm{d}\sigma \end{aligned}$$
and
$$\begin{aligned} \nu _\textrm{R}(\textrm{d}g) = \sigma ^{-1}\,\textrm{d}\mu \wedge \textrm{d}\sigma , \end{aligned}$$
respectively (p. 63 of [3]). The density function of \(\nu _\textrm{R}\) with respect to \(\nu _\textrm{L}\) is
$$\begin{aligned} \pi _\mathrm{R/L}(g) = \frac{\nu _\textrm{R}(\textrm{d}g)}{\nu _\textrm{L}(\textrm{d}g)} = \sigma . \end{aligned}$$
To derive the Laplacian, we determine a left invariant Riemannian metric. The metric tensor \(h_e\) at the identity element e is arbitrarily chosen. From the G-invariance, the metric tensor at \(g\in G\) is
$$\begin{aligned} h_g = (L_{g^{-1}})^*h_e = B^\top h_e B, \end{aligned}$$
where \((L_{g^{-1}})^*\) denotes the pull-back operator associated with the left translation \(L_{g^{-1}}\) and B is the Jacobian matrix of \(L_{g^{-1}}\). Indeed, the left translation
$$\begin{aligned} L_{g^{-1}}: (\mu ',\sigma ') \mapsto \left( \frac{\mu '-\mu }{\sigma },\ \frac{\sigma '}{\sigma }\right) \end{aligned}$$
has the Jacobian matrix
$$\begin{aligned} B = \sigma ^{-1}I_2, \end{aligned}$$
so that \(h_g = \sigma ^{-2}h_e\) in the \((\mu ,\sigma )\)-coordinate. The Laplacian is
$$\begin{aligned} \Delta f = \sigma ^2\,h_e^{ij}\,\partial _i\partial _j f, \qquad (\partial _1,\partial _2) = (\partial _\mu ,\partial _\sigma ), \end{aligned}$$
because \(\sqrt{|h|}\,h^{ij} = \sqrt{|h_e|}\,h_e^{ij}\) does not depend on \((\mu ,\sigma )\). It is immediate to see that \(\pi _\mathrm{R/L}(g)=\sigma \) is harmonic for any choice of \(h_e\).
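This computation can also be verified symbolically. The sketch below (illustrative only) builds the left invariant metric \(h_g = h_e/\sigma ^2\) in the \((\mu ,\sigma )\) chart for an arbitrary positive definite \(h_e\) and applies the Laplace–Beltrami operator to \(\pi _\mathrm{R/L}(g)=\sigma \).

```python
import sympy as sp

mu, sigma = sp.symbols('mu sigma', positive=True)
a, b, c = sp.symbols('a b c', positive=True)

# Arbitrary metric tensor h_e at the identity; b is the off-diagonal entry.
h_e = sp.Matrix([[a, b], [b, c]])
# Left invariant metric at g = (mu, sigma): the Jacobian of L_{g^{-1}} is
# (1/sigma) I, so h_g = B^T h_e B = h_e / sigma^2.
h = h_e / sigma**2
h_inv = h.inv()
sqrt_det = sp.sqrt(h.det())
coords = [mu, sigma]

def laplacian(f):
    """Laplace-Beltrami operator of the left invariant metric h."""
    total = 0
    for i in range(2):
        inner = sum(h_inv[i, j] * sp.diff(f, coords[j]) for j in range(2))
        total += sp.diff(sqrt_det * inner, coords[i])
    return sp.simplify(total / sqrt_det)

# pi_{R/L}(g) = sigma is harmonic for every choice of h_e:
print(laplacian(sigma))  # -> 0
```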
3 Application to Bayesian prediction
3.1 Bayesian prediction problem
We briefly recall the Bayesian prediction problem and its relation with geometric quantities such as the Fisher metric and Laplacian.
A statistical model, or simply called a model, is a set of probability measures on a given measurable space \((\mathcal {X},\mathcal {F})\) indexed by a parameter \(\theta \) as
$$\begin{aligned} \mathcal {P} = \{P_\theta \mid \theta \in \Theta \}. \end{aligned}$$
We assume that the model is identifiable, that is, \(\theta _1\ne \theta _2\) implies \(P_{\theta _1}\ne P_{\theta _2}\). The parameter space \(\Theta \) is assumed to be an orientable d-dimensional \(C^\infty \)-manifold. Let \(P_\theta \) be absolutely continuous with respect to a base measure \(v(\textrm{d}x)\) and its density function \(p(x|\theta )\) be positive everywhere and differentiable with respect to \(\theta \). The Fisher metric on \(\Theta \) is defined by
$$\begin{aligned} h_{ij}(\theta ) = \int _{\mathcal {X}} \partial _i\log p(x|\theta )\,\partial _j\log p(x|\theta )\, p(x|\theta )\,v(\textrm{d}x), \end{aligned}$$
where \(\partial _i\) is the partial derivative with respect to local coordinates of \(\theta \). We assume that the Fisher metric is of \(C^\infty \) class and positive definite everywhere. The Fisher metric does not depend on the choice of the base measure \(v(\textrm{d}x)\).
A Borel measure on \(\Theta \) is called a Bayesian prior distribution or just a prior. The volume element
$$\begin{aligned} J(\textrm{d}\theta ) = \sqrt{|h(\theta )|}\,\textrm{d}\theta ^1\cdots \textrm{d}\theta ^d \end{aligned}$$
induced from the Fisher metric is called the Jeffreys prior. The Jeffreys prior does not depend on the choice of the local coordinate system. We focus on priors \(\pi (\theta )J(\textrm{d}\theta )\) that are absolutely continuous with respect to the Jeffreys prior. We call \(\pi (\theta )\) the prior density. Since \(J(\textrm{d}\theta )\) does not depend on the local coordinate system, \(\pi (\theta )\) is a scalar function. The prior densities \(\pi \) are assumed to be positive-valued and of \(C^2\) class. We consider not only proper priors but also improper priors.
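For a concrete model, the Fisher metric and hence the Jeffreys prior can be computed symbolically. The sketch below uses the normal location-scale family \(N(\mu ,\sigma ^2)\) (chosen here only for illustration) and recovers, up to a multiplicative constant, the left invariant measure \(\sigma ^{-2}\,\textrm{d}\mu \,\textrm{d}\sigma \) of the affine group of Example 1.

```python
import sympy as sp

x = sp.Symbol('x', real=True)
mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

# Normal location-scale density p(x | mu, sigma)
p = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / (sp.sqrt(2 * sp.pi) * sigma)
score = [sp.diff(sp.log(p), t) for t in (mu, sigma)]

# Fisher metric h_ij = E[(d_i log p)(d_j log p)]
h = sp.Matrix(2, 2, lambda i, j: sp.simplify(
    sp.integrate(score[i] * score[j] * p, (x, -sp.oo, sp.oo))))
print(h)                 # diag(1/sigma^2, 2/sigma^2)
print(sp.sqrt(h.det()))  # sqrt(2)/sigma^2: the Jeffreys prior density
```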
A statistical prediction problem is to estimate the distribution of future observation \(y\in \mathcal {X}\) based on an independent sample \(x^n=(x_1,\ldots ,x_n)\in \mathcal {X}^n\) from \(P_\theta \). The Bayesian predictive density
$$\begin{aligned} p_\pi (y|x^n) = \int _\Theta p(y|\theta )\,\pi (\theta |x^n)\,J(\textrm{d}\theta ) \end{aligned}$$
(2)
based on the posterior density
$$\begin{aligned} \pi (\theta |x^n) = \frac{\pi (\theta )\prod _{i=1}^n p(x_i|\theta )}{\int _\Theta \pi (\tilde{\theta })\prod _{i=1}^n p(x_i|\tilde{\theta })\,J(\textrm{d}\tilde{\theta })} \end{aligned}$$
is of interest.
The Bayesian prediction problem is to find a prior density function that has smaller prediction risk. We adopt the following risk function.
Definition 1
(Asymptotic risk; Eq. (13) of [7]) The asymptotic risk function of the prior density \(\pi \in C^2(\Theta )\) is defined by
$$\begin{aligned} r(\pi ,\theta ) = \frac{2\,\Delta \sqrt{\pi (\theta )}}{\sqrt{\pi (\theta )}}, \end{aligned}$$
where \(\Delta \) denotes the Laplacian on \(\Theta \) with respect to the Fisher metric. A prior density \(\pi _1\) is said to dominate \(\pi _2\) asymptotically if \(r(\pi _1,\theta )\le r(\pi _2,\theta )\) for all \(\theta \) and \(r(\pi _1,\theta )<r(\pi _2,\theta )\) for some \(\theta \).
The asymptotic risk is the leading term of the asymptotic expansion of the Kullback–Leibler risk of the Bayesian predictive density (2) as \(n\rightarrow \infty \). See Eq. (4) of [6] and Eq. (13) of [7] for details. It is straightforward to see
$$\begin{aligned} r(\pi ,\theta ) = \frac{\Delta \pi (\theta )}{\pi (\theta )} - \frac{1}{2}\,h^{ij}\,\partial _i\log \pi (\theta )\,\partial _j\log \pi (\theta ) \le \frac{\Delta \pi (\theta )}{\pi (\theta )}. \end{aligned}$$
(3)
This is proved as
$$\begin{aligned} \frac{2\,\Delta \sqrt{\pi }}{\sqrt{\pi }} = \Delta \log \pi + \frac{1}{2}\,h^{ij}\,\partial _i\log \pi \,\partial _j\log \pi = \frac{\Delta \pi }{\pi } - \frac{1}{2}\,h^{ij}\,\partial _i\log \pi \,\partial _j\log \pi . \end{aligned}$$
Our problem is to find a prior density \(\pi \) that has smaller asymptotic risk. The asymptotic risk of the Jeffreys prior density is 0 from the definition. Non-constant superharmonic prior densities asymptotically dominate the Jeffreys prior density since (3) holds.
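The comparison between \(r(\pi ,\theta )\) and \(\Delta \pi /\pi \) rests on the pointwise identity \(2\Delta \sqrt{\pi }/\sqrt{\pi } = \Delta \pi /\pi - \tfrac{1}{2}h^{ij}\partial _i\log \pi \,\partial _j\log \pi \). The sympy sketch below checks it in the Euclidean plane; the flat metric is chosen only to keep the check short, since the identity has the same form for any metric.

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
u = sp.Function('u')(x, y)   # u = log pi for an arbitrary positive density pi
sqrt_pi = sp.exp(u / 2)
pi_ = sqrt_pi**2

def lap(f):
    """Euclidean Laplacian on R^2 (flat metric, h^{ij} = delta^{ij})."""
    return sp.diff(f, x, 2) + sp.diff(f, y, 2)

lhs = 2 * lap(sqrt_pi) / sqrt_pi
grad_sq = sp.diff(u, x)**2 + sp.diff(u, y)**2
rhs = lap(pi_) / pi_ - grad_sq / 2
print(sp.simplify(lhs - rhs))  # -> 0
```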
3.2 Group invariant models
We consider the Bayesian prediction problem over group invariant models. Refer to [1, 2, 13] for comprehensive textbooks on the invariant models.
For simplicity, we suppose that the sample space \(\mathcal {X}\) is also a \(C^\infty \) manifold. Let a Lie group G act on \(\mathcal {X}\) smoothly from the left. For a probability measure P on \(\mathcal {X}\) and \(g\in G\), the push-forward measure \(g_*P\) is defined by \(g_*P(B)=P(g^{-1}B)\) for Borel sets B. The group G acts on the set of all probability measures by the push-forward operation.
Definition 2
(Group invariant model; Definition 3.1 of [1]) A statistical model \(\mathcal {P}\) is said to be G-invariant if for each \(P\in \mathcal {P}\), \(g_*P\in \mathcal {P}\) for all \(g\in G\).
If a G-invariant statistical model is parameterized as \(\mathcal {P}=\{P_\theta \mid \theta \in \Theta \}\), the left action of G on \(\Theta \) is well defined by \(P_{g\theta }=g_*P_\theta \) under identifiability. We assume that G transitively acts on \(\Theta \).
Let \(v(\textrm{d}x)\) be the base measure of \(\mathcal {P}\) as in the preceding subsection. We say that tensors on \(\Theta \) are G-invariant if they are preserved under the group action.
Lemma 1
Let \(\mathcal {P}\) be a G-invariant model. Then, the Fisher metric h is G-invariant. In particular, the Jeffreys prior is a left G-invariant measure on \(\Theta \).
See “Appendix” for the proof. Lemma 1 is used to prove Lemma 2.
We say that G acts freely on \(\Theta \) if \(g\theta =\theta \) for some \(\theta \in \Theta \) implies \(g=e\). If the action is free, the parameter space \(\Theta =\{g\theta _0\mid g\in G\}\) is identified with G, where \(\theta _0\in \Theta \) is a fixed element. Under the identification, the left invariant measure \(\nu _\textrm{L}\) on G is equal to the Jeffreys prior and the right invariant measure \(\nu _\textrm{R}\) is a prior on \(\Theta \), which we call the right invariant prior. It is known that the right invariant prior provides the best invariant predictive distribution [5, 10], which means that the right invariant prior attains the minimum of the Kullback–Leibler risk in the class of invariant predictive distributions. In particular, the right invariant prior dominates the Jeffreys prior if G is not unimodular. This fact is reflected in the following lemma. We prove the lemma in “Appendix” without using the fact.
Lemma 2
Suppose that G is not unimodular and acts freely on \(\Theta \). Then, the asymptotic risk of the right invariant prior density \(\pi _\mathrm{R/L}(\theta )\) is a negative constant.
Even if the action of G is not free, Lemma 2 holds for any Lie subgroup \(G_1\) of G that acts freely and transitively on \(\Theta \). In that case, we can identify \(\Theta \) with \(G_1\) and construct harmonic prior densities from the right invariant measures on \(G_1\). Furthermore, since all the conjugate subgroups \(gG_1g^{-1}\) (\(g\in G\)) act freely as well, various harmonic prior densities are obtained. The prior densities have the same asymptotic risk because \(G_1\) and \(gG_1g^{-1}\) are isomorphic. We can reduce the asymptotic risk by aggregating the prior densities as follows.
Lemma 3
Let \(\pi _1\) and \(\pi _2\) be smooth positive functions on \(\Theta \). Define the generalized mean \(\bar{\pi }_\beta \) by
$$\begin{aligned} \bar{\pi }_\beta = \left( \frac{\pi _1^\beta + \pi _2^\beta }{2}\right) ^{1/\beta } \end{aligned}$$
for \(\beta \ne 0\) and \(\bar{\pi }_0 = (\pi _1\pi _2)^{1/2}\) for \(\beta =0\). If \(\beta <1/2\), then
$$\begin{aligned} r(\bar{\pi }_\beta ,\theta ) \le \frac{r(\pi _1,\theta )+r(\pi _2,\theta )}{2}. \end{aligned}$$
The equality holds for all \(\theta \in \Theta \) if and only if \(\pi _1/\pi _2\) is constant.
See “Appendix” for the proof. The case \(\beta =0\) is proved in [12].
We provide two applications of the lemma. In the applications, we first find a closed subgroup \(G_1\) that freely and transitively acts on \(\Theta \). Then, take \(g\in G\) and put \(G_2=gG_1g^{-1}\). Under the identifications \(G_1\simeq \Theta \), \(g_1\mapsto g_1\theta _0\), and \(G_2\simeq \Theta \), \(g_2\mapsto g_2g\theta _0\) as G-spaces, the following equality holds for any \(\theta \in \Theta \):
$$\begin{aligned} \pi _2(\theta ) = \pi _1(g^{-1}\theta ), \end{aligned}$$
where \(\pi _1\) and \(\pi _2\) are the densities of the right invariant priors of \(G_1\) and \(G_2\), respectively. Indeed, the left and right invariant measures on \(G_2\) are the push-forward of those on \(G_1\) by \(g_1\mapsto gg_1g^{-1}\). Then the density \(\pi _1(g_1\theta _0)\) is equal to \(\pi _2((gg_1g^{-1})g\theta _0)=\pi _2(gg_1\theta _0)\), which proves \(\pi _2(\theta )=\pi _1(g^{-1}\theta )\) for \(\theta =gg_1\theta _0\).
Example 2
(Cauchy location-scale family [11]) Consider the Cauchy density function
$$\begin{aligned} p(x|\mu ,\sigma ) = \frac{\sigma }{\pi \{(x-\mu )^2+\sigma ^2\}} \end{aligned}$$
with respect to the Lebesgue measure, where \(\mu \) and \(\sigma \) are called the location and scale parameters, respectively. The parameter space is \(\Theta =\{(\mu ,\sigma )\mid \mu \in \mathbb {R},\sigma >0\}\). The density function is written in terms of complex numbers as
$$\begin{aligned} p(x|\theta ) = \frac{|\theta -\bar{\theta }|}{2\pi |x-\theta |^2}, \qquad \theta = \mu + \textrm{i}\sigma . \end{aligned}$$
The general linear group \(G=\textrm{GL}^+(2,\mathbb {R})\) of real \(2\times 2\) matrices with positive determinant acts on this model through the linear fractional transformation
$$\begin{aligned} x \mapsto \frac{ax+b}{cx+d}, \qquad g = \begin{pmatrix} a &{} b \\ c &{} d \end{pmatrix} \in G. \end{aligned}$$
The action of G on the parameter space is
$$\begin{aligned} g\theta = \frac{a\theta +b}{c\theta +d}, \qquad \theta = \mu + \textrm{i}\sigma , \end{aligned}$$
for \((\mu ,\sigma )\in \Theta \). See [11] for details. Although the action of G on \(\Theta \) is not free, a subgroup
$$\begin{aligned} G_1 = \left\{ \begin{pmatrix} \sigma &{} \mu \\ 0 &{} 1 \end{pmatrix} \,\bigg |\, \mu \in \mathbb {R},\ \sigma >0 \right\} \end{aligned}$$
acts freely. We can identify \(G_1\) with \(\Theta \). As in Example 1, the left and right invariant measures of \(G_1\) are \(\sigma ^{-2}\textrm{d}\mu \wedge \textrm{d}\sigma \) and \(\sigma ^{-1}\textrm{d}\mu \wedge \textrm{d}\sigma \), respectively. The density of the right invariant prior on \(G_1\) is
$$\begin{aligned} \pi _1(\mu ,\sigma ) = \sigma . \end{aligned}$$
From Theorem 1 and Lemma 2, the asymptotic risk of \(\pi _1\) is a negative constant. Now consider a conjugate group
$$\begin{aligned} G_2 = gG_1g^{-1}, \qquad g = \begin{pmatrix} 0 &{} -1 \\ 1 &{} 0 \end{pmatrix}, \end{aligned}$$
which also acts freely on \(\Theta \). The density of the right invariant prior on \(G_2\) is
$$\begin{aligned} \pi _2(\mu ,\sigma ) = \pi _1(g^{-1}\theta ) = \frac{\sigma }{\mu ^2+\sigma ^2}. \end{aligned}$$
This prior density is discussed in [7].
Finally, by taking the geometric mean of \(\pi _1\) and \(\pi _2\), we obtain a prior density
$$\begin{aligned} (\pi _1\pi _2)^{1/2} = \frac{\sigma }{\sqrt{\mu ^2+\sigma ^2}}, \end{aligned}$$
which shrinks the signal-to-noise ratio \(\mu /\sigma \) to the origin. Lemma 3 implies that the asymptotic risk of \((\pi _1\pi _2)^{1/2}\) is smaller than those of the right invariant priors \(\pi _1\) and \(\pi _2\).
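The harmonicity claims can be checked directly. The Fisher metric of the Cauchy location-scale family is \((\textrm{d}\mu ^2+\textrm{d}\sigma ^2)/(2\sigma ^2)\), a conformal multiple of the Euclidean metric, so its Laplacian is \(2\sigma ^2(\partial _\mu ^2+\partial _\sigma ^2)\). The sketch below (the explicit densities \(\sigma \) and \(\sigma /(\mu ^2+\sigma ^2)\) are assumptions of this illustration) verifies that both right invariant prior densities are harmonic while their geometric mean is strictly superharmonic.

```python
import sympy as sp

mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

# Fisher metric of the Cauchy family: (d mu^2 + d sigma^2) / (2 sigma^2).
# For a 2-d conformal metric lambda * (Euclidean), the Laplace-Beltrami
# operator is (1/lambda) times the Euclidean Laplacian.
def lap(f):
    return sp.simplify(2 * sigma**2 * (sp.diff(f, mu, 2) + sp.diff(f, sigma, 2)))

pi1 = sigma                        # right invariant prior density for G_1
pi2 = sigma / (mu**2 + sigma**2)   # right invariant prior density for G_2

print(lap(pi1))               # -> 0
print(lap(pi2))               # -> 0
bar = sp.sqrt(pi1 * pi2)      # geometric mean sigma / sqrt(mu^2 + sigma^2)
print(sp.simplify(lap(bar)))  # strictly negative for sigma > 0
```

The second density is harmonic because it is the imaginary part of the holomorphic map \(\theta \mapsto -1/\theta \), and harmonicity is conformally invariant in two dimensions.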
For location-scale families other than the Cauchy family, the general linear group does not act because the family is not closed under the reciprocal 1/X of the random variable X. However, the dominance relationship on the asymptotic risk remains true because the asymptotic risk depends only on the Riemannian structure. See also [7] for this point.
Example 3
(Two-dimensional Wishart model [8, 12]) Suppose that a random variable X has the two-dimensional Wishart distribution \(W_2(n,\Sigma )\) with n degrees of freedom and the covariance parameter
$$\begin{aligned} \Sigma = \begin{pmatrix} \sigma _{11} &{} \sigma _{12} \\ \sigma _{12} &{} \sigma _{22} \end{pmatrix} . \end{aligned}$$
The model is G-invariant with respect to the general linear group \(G=\textrm{GL}(2,\mathbb {R})\), where the group action is defined by \((g,X)\mapsto gXg^\top \) and \((g,\Sigma )\mapsto g\Sigma g^\top \). The sample space \(\mathcal {X}\) and the parameter space \(\Theta \) are the set of \(2\times 2\) positive definite symmetric matrices. The subgroup
$$\begin{aligned} G_1 = \left\{ \begin{pmatrix} g_{11} &{} 0 \\ g_{21} &{} g_{22} \end{pmatrix} \,\bigg |\, g_{11}>0,\ g_{22}>0,\ g_{21}\in \mathbb {R} \right\} \end{aligned}$$
of G has a one-to-one correspondence with \(\Theta \) through the Cholesky decomposition \(\Sigma =gg^\top \) with \(g\in G_1\). The left and right invariant measures of \(G_1\) are
$$\begin{aligned} \nu _\textrm{L}(\textrm{d}g) = g_{11}^{-1}g_{22}^{-2}\,\textrm{d}g_{11}\wedge \textrm{d}g_{21}\wedge \textrm{d}g_{22} \end{aligned}$$
and
$$\begin{aligned} \nu _\textrm{R}(\textrm{d}g) = g_{11}^{-2}g_{22}^{-1}\,\textrm{d}g_{11}\wedge \textrm{d}g_{21}\wedge \textrm{d}g_{22}, \end{aligned}$$
respectively. The density of the right invariant prior on \(G_1\) is
$$\begin{aligned} \pi _1(\Sigma ) = \frac{g_{22}}{g_{11}} = \frac{|\Sigma |^{1/2}}{\sigma _{11}} \end{aligned}$$
in the \(\Sigma \)-coordinate. A conjugate group
$$\begin{aligned} G_2 = gG_1g^{-1}, \qquad g = \begin{pmatrix} 0 &{} 1 \\ 1 &{} 0 \end{pmatrix}, \end{aligned}$$
also acts freely on \(\Theta \). The density of the right invariant prior on \(G_2\) is
$$\begin{aligned} \pi _2(\Sigma ) = \frac{|\Sigma |^{1/2}}{\sigma _{22}}. \end{aligned}$$
The harmonic mean of \(\pi _1\) and \(\pi _2\) is
$$\begin{aligned} \left( \frac{\pi _1^{-1}+\pi _2^{-1}}{2}\right) ^{-1} = \frac{2\,|\Sigma |^{1/2}}{\sigma _{11}+\sigma _{22}} = \frac{2\,|\Sigma |^{1/2}}{\textrm{tr}\,\Sigma }, \end{aligned}$$
which is orthogonally invariant and shrinks the ratio of the two eigenvalues towards one. Lemma 3 implies that the prior asymptotically dominates the right invariant priors \(\pi _1\) and \(\pi _2\). The dominance relationship holds even in finite-sample cases as shown by [8].
Similarly, the geometric mean of \(\pi _1\) and \(\pi _2\) is
$$\begin{aligned} (\pi _1\pi _2)^{1/2} = \frac{|\Sigma |^{1/2}}{(\sigma _{11}\sigma _{22})^{1/2}} = \sqrt{1-\rho ^2}, \qquad \rho = \frac{\sigma _{12}}{\sqrt{\sigma _{11}\sigma _{22}}}, \end{aligned}$$
which is scale invariant and shrinks the correlation coefficient towards the origin. Again, Lemma 3 tells us that the prior asymptotically dominates the right invariant priors \(\pi _1\) and \(\pi _2\). This relation holds even in finite-sample cases as shown by [12].
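A quick numerical check of the invariance claims with numpy; the closed-form densities \(|\Sigma |^{1/2}/\sigma _{11}\) and \(|\Sigma |^{1/2}/\sigma _{22}\) used below are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def pi1(S):
    """|Sigma|^{1/2} / sigma_11, equal to g22/g11 for the Cholesky factor."""
    return np.sqrt(np.linalg.det(S)) / S[0, 0]

def pi2(S):
    """|Sigma|^{1/2} / sigma_22 (conjugate, upper triangular group)."""
    return np.sqrt(np.linalg.det(S)) / S[1, 1]

def hmean(S):
    return 2.0 / (1.0 / pi1(S) + 1.0 / pi2(S))  # = 2 |Sigma|^{1/2} / tr(Sigma)

def gmean(S):
    return np.sqrt(pi1(S) * pi2(S))             # = sqrt(1 - rho^2)

# a random positive definite Sigma
A = rng.normal(size=(2, 2))
S = A @ A.T + 2.0 * np.eye(2)

g = np.linalg.cholesky(S)  # lower triangular, Sigma = g g^T
assert np.isclose(pi1(S), g[1, 1] / g[0, 0])

t = 0.7                    # harmonic mean is orthogonally invariant
Q = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
assert np.isclose(hmean(Q @ S @ Q.T), hmean(S))

D = np.diag([1.7, 0.3])    # geometric mean is scale invariant
assert np.isclose(gmean(D @ S @ D), gmean(S))
print("all checks passed")
```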
The two examples show how Theorem 1 is useful in Bayesian inference.
We finally mention the predictive metric defined by [9], which appears in the asymptotic risk when the observed and predicted variables have different statistical models. The predictive metric is G-invariant whenever the statistical models for observed and predicted variables are G-invariant. The method of obtaining harmonic prior distributions is applicable to this case.
Data availability
The manuscript has no associated data.
References
1. Eaton, M.L.: Group Invariance Applications in Statistics. IMS, Hayward, California (1989)
2. Eaton, M.L.: Multivariate Statistics: A Vector Space Approach. IMS, Beachwood, Ohio (2007)
3. Fraser, D.A.S.: The Structure of Inference. John Wiley & Sons, New York (1968)
4. Helgason, S.: Groups and Geometric Analysis. Academic Press, Orlando, Florida (1984)
5. Komaki, F.: Bayesian predictive distribution with right invariant priors. Calcutta Stat. Assoc. Bull. 52, 171–179 (2002)
6. Komaki, F.: Shrinkage priors for Bayesian prediction. Ann. Stat. 34(2), 808–819 (2006)
7. Komaki, F.: Bayesian prediction based on a class of shrinkage priors for location-scale models. Ann. Inst. Stat. Math. 59, 135–146 (2007)
8. Komaki, F.: Bayesian predictive densities based on superharmonic priors for the 2-dimensional Wishart model. J. Multivar. Anal. 100(10), 2137–2154 (2009)
9. Komaki, F.: Asymptotic properties of Bayesian predictive densities when the distributions of data and target variables are different. Bayesian Anal. 10(1), 31–51 (2015)
10. Liang, F., Barron, A.: Exact minimax strategies for predictive density estimation, data compression, and model selection. IEEE Trans. Inf. Theory 50, 2708–2726 (2004)
11. McCullagh, P.: Möbius transformation and Cauchy parameter estimation. Ann. Stat. 24(2), 787–808 (1996)
12. Sei, T., Komaki, F.: A correlation-shrinkage prior for Bayesian prediction of the two-dimensional Wishart model. Biometrika 109(4), 1173–1180 (2022)
13. Wijsman, R.A.: Invariant Measures on Groups and Their Use in Statistics. IMS, Hayward, California (1990)
Acknowledgements
The authors are grateful to two anonymous referees for their careful reading and insightful suggestions.
Funding
Open Access funding provided by The University of Tokyo. This work was supported by JSPS KAKENHI Grant Number 21K11781, 22H00510, and AMED Grant Numbers JP23dm0207001 and JP23dm0307009.
Ethics declarations
Conflict of interest
T. Sei and F. Komaki are current members of the Editorial Board of Information Geometry. On behalf of all authors, the corresponding author states that there is no other Conflict of interest.
Communicated by Hiroshi Matsuzoe.
Proof of lemmas
1.1 Proof of Lemma 1
Recall that the Fisher metric is a symmetric (0, 2)-tensor
$$\begin{aligned} h = \int _{\mathcal {X}} \textrm{d}\log p(x|\theta )\otimes \textrm{d}\log p(x|\theta )\, p(x|\theta )\,v(\textrm{d}x), \end{aligned}$$
where \(\textrm{d}\log p(x|\theta )=\partial _i\log p(x|\theta )\textrm{d}\theta ^i\) is the exterior derivative with respect to \(\theta \in \Theta \). We prove \(g^*h=h\). The pull-back \(g^*h\) is
$$\begin{aligned} (g^*h)_\theta = \int _{\mathcal {X}} \textrm{d}\log p(x|g\theta )\otimes \textrm{d}\log p(x|g\theta )\, p(x|g\theta )\,v(\textrm{d}x), \end{aligned}$$
where the exterior derivative is again taken with respect to \(\theta \).
By assumption, the statistical model satisfies \(g_*P_\theta =P_{g\theta }\), which is equivalent to \(g_*(p(x|\theta )v(\textrm{d}x))=p(x|g\theta )v(\textrm{d}x)\) and therefore
$$\begin{aligned} p(x|g\theta ) = p(g^{-1}x|\theta )\,\frac{\textrm{d}(g_*v)}{\textrm{d}v}(x). \end{aligned}$$
In particular, \(g_*v\) and v are absolutely continuous with respect to each other because \(p(x|\theta )\) is assumed to be positive everywhere. We have
$$\begin{aligned} \log p(x|g\theta ) = \log p(g^{-1}x|\theta ) + \log \frac{\textrm{d}(g_*v)}{\textrm{d}v}(x). \end{aligned}$$
Since \(\textrm{d}(g_*v)/\textrm{d}v\) does not depend on \(\theta \), we obtain
$$\begin{aligned} \textrm{d}\log p(x|g\theta ) = \textrm{d}\log p(g^{-1}x|\theta ). \end{aligned}$$
Therefore
$$\begin{aligned} (g^*h)_\theta = \int _{\mathcal {X}} \textrm{d}\log p(g^{-1}x|\theta )\otimes \textrm{d}\log p(g^{-1}x|\theta )\, p(g^{-1}x|\theta )\,\frac{\textrm{d}(g_*v)}{\textrm{d}v}(x)\,v(\textrm{d}x) = \int _{\mathcal {X}} \textrm{d}\log p(y|\theta )\otimes \textrm{d}\log p(y|\theta )\, p(y|\theta )\,v(\textrm{d}y) = h_\theta , \end{aligned}$$
where the substitution \(y=g^{-1}x\) is used in the second equality. This proves the G-invariance of h.
1.2 Proof of Lemma 2
From Eq. (3) and Theorem 1, the asymptotic risk of \(\pi _\mathrm{R/L}\) is
$$\begin{aligned} r(\pi _\mathrm{R/L},\theta ) = -\frac{1}{2}\,h^{ij}\,\partial _i\log \pi _\mathrm{R/L}(\theta )\,\partial _j\log \pi _\mathrm{R/L}(\theta ). \end{aligned}$$
The G-invariance of the asymptotic risk follows from the facts that h is G-invariant and \(\pi _\mathrm{R/L}\) is a group homomorphism. Therefore, \(r(\pi _\mathrm{R/L})\) is a constant function on \(\Theta \) since G acts transitively by assumption. If G is not unimodular, the asymptotic risk is negative because \(\partial _i\log \pi _\mathrm{R/L}(\theta )\ne 0\) at some \(\theta \in \Theta \).
1.3 Proof of Lemma 3
Consider K smooth positive functions \(\pi _1,\ldots ,\pi _K\) for \(K\ge 2\). The lemma is the special case \(K=2\) with \(\lambda _1=\lambda _2=1/2\). Denote the generalized mean by
$$\begin{aligned} \bar{\pi } = \left( \sum _{k=1}^K \lambda _k\,\pi _k^\beta \right) ^{1/\beta }. \end{aligned}$$
We prove \(r(\bar{\pi })\le \sum _{k=1}^K \lambda _kr(\pi _k)\), where
$$\begin{aligned} \lambda _k > 0, \qquad \sum _{k=1}^K \lambda _k = 1. \end{aligned}$$
Define
It is straightforward to see
and
By using the formulas, we obtain
The last term is non-positive since \(\beta <1/2\). This proves the desired inequality \(r(\bar{\pi }) \le \sum _k \lambda _k r(\pi _k)\). The equality holds if and only if \((\partial _i\sqrt{\pi _k})/\sqrt{\pi _k}=\mu _i\) for all i and k, or equivalently, \(\pi _k/\pi _l\) are constants for all k, l.
Sei, T., Komaki, F. A harmonic property of right invariant priors. Info. Geo. (2024). https://doi.org/10.1007/s41884-024-00133-4