1 Introduction

Probability density estimation on Riemannian manifolds is the subject of several recent studies. The different approaches can be separated into two categories: parametric and non-parametric. The context of Riemannian manifolds brings difficulties of two kinds. Firstly, the theoretical results on distributions and on the convergence of estimators known for random variables valued in \(\mathbb {R}^n\) have to be adapted to random variables valued in Riemannian manifolds, see [1,2,3, 8, 9, 12,13,14,15]. Secondly, the construction of probability distributions and of density estimators should have a reasonable computational complexity, see [8, 12, 13, 16,17,18]. A generalization of the Gaussian distribution to manifolds was proposed in [8]. Although the expression of the proposed law is hard to compute on general manifolds, expressions of radial Gaussians on symmetric spaces can be found in [12,13,14]. On isotropic spaces, an isotropic density is simply a radial density. The anisotropy of a density can be evaluated with the notion of covariance proposed in [8].

In this paper, we are interested in the construction of anisotropic distributions on the hyperbolic space. The problem of anisotropic normal distributions on manifolds has been addressed in [19] through anisotropic diffusion. That construction is valid on arbitrary manifolds but requires substantial computation. The hyperbolic space is a very particular Riemannian manifold: it is at the same time isotropic and diffeomorphic to a vector space. These two specificities significantly ease the construction of probability distributions and of probability density estimators. It is generally difficult to control the covariance of a distribution on a Riemannian manifold, e.g. the covariance of the Gaussian law proposed in [8]. We propose a simple way of constructing distributions whose covariance is fully controlled. The method is derived from the density kernel proposed in [1]. These distributions can be used in the non-parametric kernel density estimator, but also to design mixture models for parametric density estimation.

The paper is organised as follows. Section 2 is a very brief introduction to the hyperbolic plane. Section 3 reviews some general facts about probabilities on Riemannian manifolds. Section 4 describes how to build anisotropic density functions on the hyperbolic space. Section 5 discusses the estimation of their parameters.

2 The Hyperbolic Space

Hyperbolic geometry results from a modification of Euclid's fifth postulate on parallel lines. In two dimensions, given a line D and a point \(p\notin D\), hyperbolic geometry is an example of a geometry in which at least two lines going through p do not intersect D. Let us consider the open unit disk of the Euclidean plane endowed with the Riemannian metric:

$$\begin{aligned} ds_{\mathbb {D}}^2=4\frac{dx^2+dy^2}{(1-x^2-y^2)^2} \end{aligned}$$
(1)

where x and y are the Cartesian coordinates. The unit disk \(\mathbb {D}\) endowed with \(ds_{\mathbb {D}}\) is called the Poincaré disk and is a model of the two-dimensional hyperbolic geometry. The construction is generalized to higher dimensions. Let ISO be the isometry group of \(\mathbb {D}\). It can be shown that:

  • \(\mathbb {D}\) is homogeneous: \(\forall p,q \in \mathbb {D},\exists \phi \in ISO, \phi (p)=q\), points are indistinguishable.

  • \(\mathbb {D}\) is isotropic: for any couple of geodesics \(\gamma _1\) and \(\gamma _2\) going through a point \(p\in \mathbb {D}\), there exists \(\phi \in ISO\) such that \(\phi (p)=p\) and \(\phi (\gamma _1)=\gamma _2\). In other words, directions are indistinguishable.

  • the Riemannian exponential maps are bijective.

  • \(\mathbb {D}\) has a constant negative curvature.
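As a concrete illustration of the model, the distance associated with the metric of Eq. 1 admits a closed form. A minimal numpy sketch, under a convention of ours representing points of \(\mathbb {D}\) as complex numbers (the function name is ours as well):

```python
import numpy as np

def poincare_distance(p, q):
    """Hyperbolic distance on the Poincare disk (curvature -1).

    p, q: points of the open unit disk, given as complex numbers.
    Uses cosh(d) = 1 + 2 |p - q|^2 / ((1 - |p|^2)(1 - |q|^2)).
    """
    num = 2.0 * abs(p - q) ** 2
    den = (1.0 - abs(p) ** 2) * (1.0 - abs(q) ** 2)
    return np.arccosh(1.0 + num / den)

# Example: the distance from the origin to 0.5 is 2 artanh(0.5) ~ 1.0986
print(poincare_distance(0j, 0.5 + 0j))
```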

Let x denote the coordinates of elements of \(T_p\mathbb {D}\) in an orthonormal basis. x is mapped to a point of \(\mathbb {D}\) by the Riemannian exponential map, noted \(exp_p\), and thus forms a chart of \(\mathbb {D}\). This chart is called an exponential chart at the point p.

Given a reference point p, the point of polar coordinates \((r,\alpha )\) of the hyperbolic space is defined as the point at distance r from p on the geodesic with initial direction \(\alpha \in \mathbb {S}^1\). Since the hyperbolic space is isotropic, the expression of the metric in polar coordinates only depends on r,

$$\begin{aligned} ds^2=dr^2+\sinh (r)^2d\alpha ^2, \end{aligned}$$
(2)

see [10, 11].
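Both the exponential chart and the polar coordinates can be computed in closed form. The following sketch, under the same complex-number convention as above, implements the exponential map at p and its inverse by moving p to the origin with a Möbius isometry (the helper names are ours):

```python
import numpy as np

def exp_p(p, v):
    """Riemannian exponential at p; v in an orthonormal basis of the tangent space."""
    r = abs(v)
    w = np.tanh(r / 2.0) * v / r if r > 0 else 0j   # exponential at the origin
    return (w + p) / (1.0 + np.conj(p) * w)          # Mobius translation 0 -> p

def log_p(p, q):
    """Inverse of exp_p: tangent coordinates of q in the exponential chart at p."""
    z = (q - p) / (1.0 - np.conj(p) * q)             # Mobius translation p -> 0
    r = abs(z)
    return 2.0 * np.arctanh(r) * z / r if r > 0 else 0j

# Sanity check: |log_p(p, q)| is the geodesic distance between p and q
p, q = 0.2 + 0.1j, -0.3 + 0.4j
print(abs(log_p(p, q)))
```

The orthonormal basis implicitly used at p is the pullback of the canonical basis at the origin; any other orthonormal basis differs from it by a rotation.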

3 Distributions on \(\mathbb {D}\)

3.1 Densities

The metric of a Riemannian manifold induces a volume measure vol. In a chart, if G is the matrix of the metric, the density of vol with respect to the Lebesgue measure of the chart is

$$\begin{aligned} \frac{dvol}{dLeb}=|det(\sqrt{G})| \end{aligned}$$

where \(\sqrt{G}\) is the matrix square root of G. Let \(\mu \) be a measure on a manifold \(\mathcal {M}\). If \(\mu \) has a density f with respect to the Lebesgue measure of a chart, then its density with respect to the Riemannian volume measure is given by

$$\begin{aligned} \frac{d\mu }{dvol}=\frac{d\mu }{dLeb}\frac{dLeb}{dvol}=\frac{1}{|det(\sqrt{G})|}f. \end{aligned}$$
(3)
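As an example, in the Cartesian chart of the Poincaré disk, Eq. 1 gives \(G=\frac{4}{(1-x^2-y^2)^2}I_2\), so that

$$\begin{aligned} \frac{dvol}{dLeb}=|det(\sqrt{G})|=\frac{4}{(1-x^2-y^2)^2}, \end{aligned}$$

and a density f with respect to the Lebesgue measure of this chart corresponds to the density \(\frac{(1-x^2-y^2)^2}{4}f\) with respect to vol.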

3.2 Intrinsic Means

Given a distribution \(\mu \), the variance at p is defined by

$$\begin{aligned} \sigma ^2(p)= \int _{\mathbb {D}}d(p,.)^2 d\mu . \end{aligned}$$

When the variance is finite everywhere, its minima are called mean points. The hyperbolic space is a Cartan-Hadamard manifold, that is to say it is complete, simply connected and of non-positive curvature. On Cartan-Hadamard manifolds, when the variance is finite everywhere, the mean exists and is unique, see [8], Corollary 2. It is achieved at the point p such that

$$\begin{aligned} \int _{T_p\mathbb {D}} x d\tilde{\mu }=0, \end{aligned}$$

where \(\tilde{\mu }\) is the image of the measure \(\mu \) by the inverse of the exponential map at p.
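On the hyperbolic space, this characterization suggests the usual fixed-point iteration for computing the mean of an empirical distribution: lift the samples to the tangent space at the current estimate, average, and follow the exponential map. A sketch of ours, redefining the exp_p/log_p helpers of the Sect. 2 sketch so that it runs on its own:

```python
import numpy as np

def exp_p(p, v):                                     # as in the Sect. 2 sketch
    r = abs(v)
    w = np.tanh(r / 2.0) * v / r if r > 0 else 0j
    return (w + p) / (1.0 + np.conj(p) * w)

def log_p(p, q):
    z = (q - p) / (1.0 - np.conj(p) * q)
    r = abs(z)
    return 2.0 * np.arctanh(r) * z / r if r > 0 else 0j

def karcher_mean(samples, iters=100, tol=1e-12):
    """Fixed-point iteration p <- exp_p(mean of the lifted samples)."""
    p = samples[0]
    for _ in range(iters):
        v = np.mean([log_p(p, q) for q in samples])
        p = exp_p(p, v)
        if abs(v) < tol:                             # mean condition above
            break
    return p
```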

3.3 Covariance on Manifold

The covariance of a random vector is the matrix formed by the covariances of its coordinates. In a vector space, the coordinates of a vector are given by projections on the corresponding axes. On a Riemannian manifold, the notion of projection on a geodesic usually does not lead to explicit expressions. Even though it does not retain all the properties of the covariance of vectors, the simplest generalisation to manifolds, when possible, is to take the Euclidean covariance after lifting the distribution to a tangent space by the inverse of the exponential map, see [8]. Since on the hyperbolic space the exponential map is a bijection, it is always possible to lift distributions to tangent spaces. Given a distribution \(\mu \) and an orthonormal basis of \(T_p\mathbb {D}\), the covariance at \(p\in \mathbb {D}\) is thus defined as

$$\begin{aligned} \varSigma _p(\mu )=\int _{T_p\mathbb {D}} xx^t d\tilde{\mu } \end{aligned}$$

This definition of covariance was used to define a notion of principal geodesic analysis on manifolds in [20]. It can be noted that the covariance at the point p can be seen as a point of the vector bundle \(T\mathbb {D} \otimes T\mathbb {D}\).
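For an empirical distribution, this covariance can be computed directly once the samples are lifted to \(T_p\mathbb {D}\); a short sketch, with log_p as in the Sect. 2 sketch:

```python
import numpy as np

def log_p(p, q):                                     # as in the Sect. 2 sketch
    z = (q - p) / (1.0 - np.conj(p) * q)
    r = abs(z)
    return 2.0 * np.arctanh(r) * z / r if r > 0 else 0j

def tangent_covariance(p, samples):
    """Average of x x^t over the samples lifted to T_p D."""
    vs = [log_p(p, q) for q in samples]
    X = np.array([[v.real, v.imag] for v in vs])     # tangent coordinates
    return X.T @ X / len(X)
```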

4 Constructing Anisotropic Distributions

The author of [8] proposes a generalization of Gaussian distributions to manifolds as the distribution that maximizes the entropy given its barycenter and covariance. This generalization leads to a density of the form

$$\begin{aligned} N_{(p,\varGamma )}(exp_{p}(x))= k \exp \left( -\frac{x^t \varGamma x}{2} \right) \end{aligned}$$

Given p and the covariance matrix \(\varSigma _p\), the main difficulties are to obtain expressions of the normalizing factor k and of the concentration matrix \(\varGamma \). Since the hyperbolic space is homogeneous, k and \(\varGamma \) only depend on the matrix \(\varSigma _p\). The expressions of k and \(\varGamma \) when \(\varSigma _p\) is a (positive) multiple of the identity matrix can be found in [12]. However, it is difficult to obtain these relations when \(\varSigma _p\) is not diagonal.

It might be interesting to define parametric families of distributions whose means and covariances can easily be controlled, even if they do not share all the statistical properties of the Gaussian distributions. Let \(K:\mathbb {R}_+\rightarrow \mathbb {R}_+\) be a function such that

  i. \(\int _{\mathbb {R}^2} K(\Vert y\Vert )\,dy=1\),

  ii. \(\int _{\mathbb {R}^2} \Vert y \Vert ^2 K(\Vert y\Vert )\,dy=2\).

Given a symmetric positive definite matrix \(\varGamma \), we then have

$$\begin{aligned} \int _{\mathbb {R}^2} \frac{1}{\sqrt{det(\varGamma )}} K(\sqrt{x^t\varGamma ^{-1} x})dx=1. \end{aligned}$$

Let \(\overline{p}\) be a point in \(\mathbb {D}\). Fix an orthonormal basis of the tangent space \(T_{\overline{p}}\mathbb {D}\) and consider the distribution \(\nu _{\overline{p},\varGamma }\) on \(T_{\overline{p}}\mathbb {D}\) whose density with respect to the Lebesgue measure of \(T_{\overline{p}}\mathbb {D}\) is given by \(\frac{1}{\sqrt{det(\varGamma )}} K(\sqrt{x^t\varGamma ^{-1} x})\), where x and \(\varGamma \) are expressed in the reference basis. Let \(\mu _{\overline{p},\varGamma }=exp_{\overline{p}*}(\nu _{\overline{p},\varGamma })\) be the pushforward of \(\nu _{\overline{p},\varGamma }\) by the Riemannian exponential map at \(\overline{p}\).
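As a concrete instance, the kernel \(K(t)=\frac{1}{2\pi }e^{-t^2/2}\) satisfies i and ii, and \(\nu _{\overline{p},\varGamma }\) is then the centered Gaussian of covariance \(\varGamma \) on \(T_{\overline{p}}\mathbb {D}\). Sampling from \(\mu _{\overline{p},\varGamma }\) then amounts to drawing a Gaussian vector in the tangent space and pushing it through the exponential map; a sketch under the conventions of the Sect. 2 sketch:

```python
import numpy as np

def exp_p(p, v):                                     # as in the Sect. 2 sketch
    r = abs(v)
    w = np.tanh(r / 2.0) * v / r if r > 0 else 0j
    return (w + p) / (1.0 + np.conj(p) * w)

def sample_mu(p_bar, Gamma, n, seed=0):
    """n samples of mu_{p_bar, Gamma} for K(t) = exp(-t^2/2) / (2 pi)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Gamma)                    # Gamma = L L^t
    Y = rng.standard_normal((n, 2)) @ L.T            # rows ~ N(0, Gamma)
    return [exp_p(p_bar, complex(a, b)) for a, b in Y]
```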

Theorem 1

\(\overline{p}\) is the unique mean of \(\mu _{\overline{p},\varGamma }\).

Proof

It can be checked that \(\mu _{\overline{p},\varGamma }\) has a finite variance everywhere. Moreover,

$$\begin{aligned} \int _{T_{\overline{p}}\mathbb {D}} \sqrt{\varGamma ^{-1}}x\frac{1}{\sqrt{\det \varGamma }}K(\sqrt{x^t\varGamma ^{-1} x})\,dx=0. \end{aligned}$$

The integrability of the integrand follows from i and ii, and its vanishing from symmetry. Therefore, according to Sect. 3.2, \(\overline{p}\) is the unique mean of \(\mu _{\overline{p},\varGamma }\).

Theorem 2

The covariance \(\varSigma _{\overline{p}}\) of \(\mu _{\overline{p},\varGamma }\) at \(\overline{p}\) and the concentration matrix \(\varGamma \) are equal.

Proof

In the reference basis, making use of ii with the change of variables \(y=\sqrt{\varGamma ^{-1}}x\),

$$\begin{aligned} \varSigma _{\overline{p}}= & {} \int _{\mathbb {R}^2} xx^t \frac{1}{\sqrt{\det (\varGamma )}} K(\sqrt{x^t\varGamma ^{-1} x})\,dx \\= & {} \varGamma ^{1/2} \left( \int _{\mathbb {R}^2}yy^t K(\sqrt{y^t y})\,dy\right) \varGamma ^{1/2} \\= & {} \varGamma ^{1/2}\left( \int _0^{+\infty }\int _0^{2\pi } r^2\begin{pmatrix} \cos (\theta )\\ \sin (\theta ) \end{pmatrix} \begin{pmatrix} \cos (\theta )\\ \sin (\theta ) \end{pmatrix}^t K(r)\,r\,d\theta \,dr \right) \varGamma ^{1/2} \\= & {} \varGamma ^{1/2} \left( \frac{1}{2}\int _0^{+\infty }r^2 I\, K(r)\, 2\pi r\,dr \right) \varGamma ^{1/2} \\= & {} \varGamma ^{1/2} I\left( \frac{1}{2}\int _{\mathbb {R}^2} \Vert y \Vert ^2 K(\Vert y\Vert )\,dy \right) \varGamma ^{1/2} \\= & {} \varGamma . \end{aligned}$$
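A quick Monte Carlo check of this identity, with the Gaussian kernel \(K(t)=\frac{1}{2\pi }e^{-t^2/2}\) as an example of a kernel satisfying i and ii:

```python
import numpy as np

# With this K, nu is N(0, Gamma); its second moment matrix should be ~ Gamma.
rng = np.random.default_rng(0)
Gamma = np.array([[1.0, 0.3], [0.3, 0.25]])
X = rng.standard_normal((500_000, 2)) @ np.linalg.cholesky(Gamma).T
print(X.T @ X / len(X))   # close to Gamma, up to Monte Carlo error
```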

The tangent space \(T_{\overline{p}}\mathbb {D}\) endowed with the reference basis provides a parametrization of the hyperbolic space. By definition, the density of \(\mu _{\overline{p},\varGamma }\) in this parametrization is given by \(\frac{1}{\sqrt{det(\varSigma )}} K(\sqrt{x^t\varSigma ^{-1} x})\), where \(\varSigma =\varGamma \) by Theorem 2. In order to obtain the density with respect to the Riemannian measure, this term should be multiplied by the density of the Lebesgue measure of the parametrization with respect to the Riemannian measure, see Eq. 3. In an adapted orthonormal basis of \(T_{\overline{p}}\mathbb {D}\), Eq. 2 leads to the following expression of the matrix of the metric,

$$ G=\begin{pmatrix} 1 & 0\\ 0 & \frac{\sinh (r)^2}{r^2} \end{pmatrix}. $$

Thus,

$$\begin{aligned} det(\sqrt{G})=\frac{\sinh (r)}{r}. \end{aligned}$$

Equation 3 leads to the density ratio,

$$\begin{aligned} \frac{dx}{dvol}(x)=\frac{||x||}{\sinh (||x||)}, \end{aligned}$$

where dx is the Lebesgue measure induced by the reference basis. Recall that in this parametrization, the Euclidean norm of x is the distance between \(exp_{\overline{p}}(x)\) and \(\overline{p}\). The density of \(\mu _{\overline{p},\varGamma }\) with respect to the Riemannian measure is given by

$$\begin{aligned} f(exp_{\overline{p}}(x))=\frac{||x||}{\sinh (||x||)\sqrt{det(\varSigma )}}K\left( \sqrt{x^t\varSigma ^{-1} x}\right) . \end{aligned}$$
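A sketch evaluating this density in the exponential chart at \(\overline{p}\), again with the Gaussian choice of K used in the sketches above:

```python
import numpy as np

def density_f(x, Sigma):
    """Density of mu_{p_bar, Sigma} at exp_{p_bar}(x), w.r.t. the Riemannian measure.

    x: tangent coordinates in the reference orthonormal basis, shape (2,).
    Assumes K(t) = exp(-t^2/2) / (2 pi), which satisfies i and ii.
    """
    x = np.asarray(x, dtype=float)
    r = np.linalg.norm(x)
    jac = r / np.sinh(r) if r > 0 else 1.0           # factor ||x|| / sinh(||x||)
    quad = x @ np.linalg.solve(Sigma, x)             # x^t Sigma^{-1} x
    return jac * np.exp(-quad / 2.0) / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma)))
```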

Figure 1 shows the level lines when K is Gaussian.

Fig. 1. In this example \(K(x)=\frac{1}{\sqrt{2\pi }}e^{-x^2}\) and \(\varSigma \) has 1 and \(\frac{1}{4}\) as eigenvalues. The level lines of the corresponding density f are flattened circles but are not ellipses.

5 Estimating the Mean and the Covariance

Let the function K and the distribution \(\mu _{\overline{p},\varGamma }\) be as defined in Sect. 4. Given a set of samples drawn from this distribution, it is important to have estimators of the two parameters: the mean and the covariance. In order to estimate the unknown parameters \((\overline{p},\varSigma _{\overline{p}})\) given a set of independent samples \((p_1,\ldots,p_n)\), it is usual to maximize the likelihood function. The \(\log \)-likelihood of a set of samples is defined as

$$\begin{aligned} \mathcal {L}(p_1,\ldots,p_n; (\hat{p},\hat{\varSigma }))= & {} \sum _i \log \left( \frac{||x_i||}{\sinh (||x_i||)\sqrt{det(\hat{\varSigma })}} K\left( \sqrt{x_i^t\hat{\varSigma }^{-1} x_i}\right) \right) \\= & {} \sum _i \log \left( \frac{||x_i||}{\sinh (||x_i||)\sqrt{det(\hat{\varSigma })}}\right) + \log \left( K\left( \sqrt{x_i^t\hat{\varSigma }^{-1} x_i}\right) \right) , \end{aligned}$$

where \(x_i\) denotes the coordinates of \(exp_{\hat{p}}^{-1}(p_i)\) in the reference basis of \(T_{\hat{p}}\mathbb {D}\).
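For the Gaussian choice of K used in the sketches above, this log-likelihood can be evaluated directly (log_p as in the Sect. 2 sketch):

```python
import numpy as np

def log_p(p, q):                                     # as in the Sect. 2 sketch
    z = (q - p) / (1.0 - np.conj(p) * q)
    r = abs(z)
    return 2.0 * np.arctanh(r) * z / r if r > 0 else 0j

def log_likelihood(samples, p_hat, Sigma_hat):
    """log-likelihood under mu_{p_hat, Sigma_hat} with K(t) = exp(-t^2/2)/(2 pi)."""
    total = 0.0
    for q in samples:
        v = log_p(p_hat, q)
        x = np.array([v.real, v.imag])
        r = np.linalg.norm(x)
        total += np.log(r / np.sinh(r)) if r > 0 else 0.0
        total -= x @ np.linalg.solve(Sigma_hat, x) / 2.0
        total -= np.log(2.0 * np.pi) + 0.5 * np.log(np.linalg.det(Sigma_hat))
    return total
```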

The major difficulty is that the mean and the covariance cannot be optimized separately. Thus there might not be an explicit expression of the maximum likelihood estimator. However, the mean and the covariance have natural estimators. It is already known that the empirical barycenter is a strongly consistent estimator of the barycenter, see [21], Theorem 2.3.

Given an estimate \(\hat{p}\) of the barycenter, it is possible to compute the empirical covariance in the corresponding tangent space,

$$\begin{aligned} \hat{\varSigma }_{\hat{p}}=\frac{1}{n}\sum _{i=1}^n x_i x_i^t \end{aligned}$$
(4)
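Combining the two estimators yields a complete estimation procedure; a self-contained sketch under the conventions of the previous sections:

```python
import numpy as np

def exp_p(p, v):                                     # as in the Sect. 2 sketch
    r = abs(v)
    w = np.tanh(r / 2.0) * v / r if r > 0 else 0j
    return (w + p) / (1.0 + np.conj(p) * w)

def log_p(p, q):
    z = (q - p) / (1.0 - np.conj(p) * q)
    r = abs(z)
    return 2.0 * np.arctanh(r) * z / r if r > 0 else 0j

def estimate_mean_cov(samples, iters=100):
    """Empirical barycenter by fixed-point iteration, then the covariance (4) at it."""
    p = samples[0]
    for _ in range(iters):
        p = exp_p(p, np.mean([log_p(p, q) for q in samples]))
    vs = [log_p(p, q) for q in samples]
    X = np.array([[v.real, v.imag] for v in vs])
    return p, X.T @ X / len(X)
```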

Using a construction similar to that of the Sasaki metric, see [22], the vector bundle \(T\mathbb {D} \otimes T\mathbb {D}\) can be endowed with a Riemannian metric. Although we do not prove it in this paper, we are convinced that almost surely

$$\begin{aligned} d((\hat{p},\hat{\varSigma }_{\hat{p}}),(\overline{p},\varSigma )) \underset{n\rightarrow +\infty }{\longrightarrow } 0, \end{aligned}$$

where d is the Riemannian distance on \(T\mathbb {D} \otimes T\mathbb {D}\).

6 Conclusion

In this paper we proposed a set of parametric families of anisotropic distributions on the hyperbolic plane. The main interest of these distributions is that the covariance matrix and the concentration matrix are equal. The empirical mean and covariance thus provide simple estimators of the parameters of the distribution. Working with anisotropic distributions is expected to reduce the number of components needed in mixture models and thus to reduce the computational complexity of the parameter estimation of mixture models. On the one hand, our future work will focus on deriving convergence rates for the estimation of the covariance. On the other hand, we will study the use of these distributions in problems of radar signal classification.