
1 Introduction

Information theory and Riemannian geometry have been widely developed in recent years in many different applications. In particular, Symmetric Positive Definite (SPD) matrices have been deeply studied through Riemannian geometry tools. Indeed, the space \(\mathcal {P}_m\) of \(m \times m\) SPD matrices can be equipped with a Riemannian metric. This metric, usually called the Rao-Fisher or affine-invariant metric, gives it the structure of a Riemannian manifold (specifically, a homogeneous space of non-positive curvature). SPD matrices are of great interest in several applications, like diffusion tensor imaging, brain-computer interfaces, radar signal processing, mechanics, computer vision and image processing [1,2,3,4,5]. Hence, it is very useful to develop statistical tools to analyze objects living in the manifold \(\mathcal {P}_m\). In this paper we focus on the study of mixtures of Riemannian Gaussian distributions, as defined in [6]. They have been successfully used to define probabilistic classifiers for texture images [7] or Electroencephalography (EEG) data [8]. In these examples the mixture parameters are estimated through suitable EM algorithms for Riemannian manifolds. In this paper we consider a particular situation, in which the observations arrive one at a time. Hence, an online estimation of the parameters is needed. Following Titterington's approach [9], we derive a novel method for the online estimation of the parameters of Riemannian Gaussian mixture distributions.

The paper is structured as follows. In Sect. 2 we describe the Riemannian Gaussian Mixture Model. In Sect. 3, we introduce the reference methods for the online estimation of mixture parameters in the Euclidean case, and we describe in detail our approach for the Riemannian framework. For lack of space, some proofs are omitted. Then, in Sect. 4, we present some simulations to validate the proposed method. Finally, we conclude with some remarks and future perspectives in Sect. 5.

2 Riemannian Gaussian Mixture Model

We consider a Riemannian Gaussian Mixture model \(g(x;\theta ) = \sum _{k=1}^K \omega _kp(x;\psi _k)\), with the constraint \(\sum _{k=1}^K\omega _k = 1\). Here \(p(x;\psi _k)\) is the Riemannian Gaussian distribution studied in [6], defined as \(p(x;\psi _k) = \frac{1}{\zeta (\sigma _k)}\exp \left( -\frac{d_R^2(x,\overline{x}_k)}{2\sigma _k^2}\right) \), where x is an SPD matrix, \(\overline{x}_k\) is also an SPD matrix representing the center of mass of the kth component of the mixture, \(\sigma _k\) is a positive number representing the dispersion parameter of the kth mixture component, \(\zeta (\sigma _k)\) is the normalization factor, and \(d_R(\cdot ,\cdot )\) is the Riemannian distance induced by the metric on \(\mathcal {P}_m\). \(g(x;\theta )\) is also called the incomplete likelihood. Indeed, in the typical mixture model approach, we consider some latent variables \(Z_i\), categorical variables over \(\{1,...,K\}\) with parameters \(\{\omega _k\}_{k=1}^K\), and assume \(X_i | Z_i=k \sim p(\cdot ;\psi _k)\). Thus, the complete likelihood is defined as \(f(x,z;\theta ) = \sum _{k=1}^K\omega _kp(x;\psi _k)\delta _{z,k}\), where \(\delta _{z,k} = 1\) if \(z=k\) and 0 otherwise. We deal here with the problem of estimating the model parameters, gathered in the vector \(\theta = [\omega _1,\overline{x}_1, \sigma _1, ..., \omega _K,\overline{x}_K, \sigma _K]\). Usually, given a set of N i.i.d. observations \(\chi = \{x_i\}_{i=1}^N\), we look for \(\widehat{\theta }_{N}^{MLE}\), the MLE of \(\theta \), i.e. the maximizer of the log-likelihood \(l(\theta ;\chi ) = \frac{1}{N}\sum _{i=1}^N\log \sum _{k=1}^K\omega _kp(x_i;\psi _k).\)
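For illustration, the Riemannian distance \(d_R\) and the (unnormalized) Gaussian kernel can be evaluated numerically as in the following minimal Python sketch; the normalizing factor \(\zeta (\sigma )\), derived in [6], is assumed to be computed separately, and the function names are ours.

```python
import numpy as np

def spd_power(x, p):
    """Matrix power of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * w ** p) @ v.T

def riemannian_distance(x, y):
    """Affine-invariant (Rao-Fisher) distance between SPD matrices x and y."""
    x_is = spd_power(x, -0.5)
    eigvals = np.linalg.eigvalsh(x_is @ y @ x_is)
    return np.sqrt(np.sum(np.log(eigvals) ** 2))

def gaussian_kernel(x, xbar, sigma):
    """Unnormalized Riemannian Gaussian density exp(-d_R^2(x, xbar) / (2 sigma^2));
    dividing by zeta(sigma) (not implemented here) gives p(x; xbar, sigma)."""
    return np.exp(-riemannian_distance(x, xbar) ** 2 / (2.0 * sigma ** 2))
```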

To obtain \(\widehat{\theta }_{N}^{MLE}\), EM or stochastic EM approaches are used, based on the complete dataset \(\chi _c = \{(x_i,z_i)\}_{i=1}^N\), where the variables \(Z_i\) are unobserved. In this case, the average complete log-likelihood can be written as:

$$\begin{aligned} l_c(\theta ;\chi _c) = \frac{1}{N}\sum _{i=1}^N\log \prod _{k=1}^K(\omega _kp(x_i;\psi _k))^{\delta _{z_i,k}} = \frac{1}{N}\sum _{i=1}^N\sum _{k=1}^K\delta _{z_i,k}\log (\omega _kp(x_i;\psi _k)). \end{aligned}$$
(1)

Here we consider a different situation, in which the dataset \(\chi \) is not available all at once; rather, the observations arrive one at a time. In this situation online estimation algorithms are needed.

3 Online Estimation

In the Euclidean case, the reference algorithms are Titterington's algorithm, introduced in [9], and the Cappé-Moulines algorithm, presented in [10].

We focus here on Titterington's approach. In classic EM algorithms, the Expectation step consists in computing \(Q(\theta ;\widehat{\theta }^{(r)},\chi ) = E_{\widehat{\theta }^{(r)}}[l_c(\theta ;\chi _c) | \chi ]\), and the Maximization step then maximizes Q over \(\theta \). These steps are performed iteratively and at each iteration r an estimate \(\widehat{\theta }^{(r)}\) of \(\theta \) is obtained exploiting the whole dataset. In the online framework, instead, the current estimate is indicated by \(\widehat{\theta }^{(N)}\), since in this setting, once \(x_1,x_2,...,x_N\) have been observed, we want to update our estimate when a new observation \(x_{N+1}\) arrives. Titterington's approach corresponds to the direct optimization of \(Q(\theta ;\widehat{\theta }^{(N)},\chi )\) using a Newton algorithm:

$$\begin{aligned} \widehat{\theta }^{(N+1)} = \widehat{\theta }^{(N)} + \gamma ^{(N+1)}I_c^{-1}(\widehat{\theta }^{(N)})u(x_{N+1};\widehat{\theta }^{(N)}), \end{aligned}$$
(2)

where \(\{\gamma ^{(N)}\}_N\) is a decreasing sequence of step sizes, the Hessian of Q is approximated by the Fisher Information matrix for the complete data, \(I_c(\widehat{\theta }^{(N)}) = -E_{\widehat{\theta }^{(N)}}\left[ \frac{\partial ^2\log f(x,z;\theta )}{\partial \theta \partial \theta ^T}\right] \), and the score \(u(x_{N+1};\widehat{\theta }^{(N)})\) is defined as \(u(x_{N+1};\widehat{\theta }^{(N)}) = \nabla _{\theta }\log g(x_{N+1};\theta )\big |_{\theta = \widehat{\theta }^{(N)}} = E_{\widehat{\theta }^{(N)}}[\nabla _{\theta }\log f(x_{N+1},z_{N+1};\theta )\big |_{\theta = \widehat{\theta }^{(N)}} \,|\, x_{N+1}]\) (the last equality is shown in [10]).
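For instance, for a single univariate Gaussian component with known variance \(\sigma ^2\) and unknown mean \(\mu \), one has \(I_c(\mu ) = 1/\sigma ^2\) and \(u(x_{N+1};\mu ) = (x_{N+1}-\mu )/\sigma ^2\), so that (2) reduces to \(\widehat{\mu }^{(N+1)} = \widehat{\mu }^{(N)} + \gamma ^{(N+1)}(x_{N+1}-\widehat{\mu }^{(N)})\); with \(\gamma ^{(N+1)} = \frac{1}{N+1}\) this is exactly the recursive computation of the sample mean.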

Geometrically speaking, Titterington's algorithm consists in modifying the current estimate \(\widehat{\theta }^{(N)}\) by adding the term \(\xi ^{(N+1)} = \gamma ^{(N+1)}I_c^{-1}(\widehat{\theta }^{(N)})u(x_{N+1};\widehat{\theta }^{(N)})\). If we want to consider parameters belonging to Riemannian manifolds, we have to suitably modify the update rule. Furthermore, even in the classical framework, the Titterington update does not necessarily constrain the estimates to stay in the parameter space. For instance, the weights could assume negative values. The approach we are going to introduce solves this problem and, furthermore, is suitable for Riemannian mixtures.

We modify the update rule, exploiting the Exponential map. That is:

$$\begin{aligned} \widehat{\theta }^{(N+1)} = \text {Exp}_{\widehat{\theta }^{(N)}}(\xi ^{(N+1)}), \end{aligned}$$
(3)

where our parameters become \(\theta _k = [s_k, \overline{x}_k, \eta _k]\). Specifically, \(s_k^2 = \omega _k\), so that \(s=[s_1,...,s_K] \in \mathbb {S}^{K-1}\) (i.e., the unit sphere), \(\overline{x}_k \in \mathcal {P}_m\) and \(\eta _k = -\frac{1}{2\sigma _k^2} < 0\).

Actually, we are not forced to choose the exponential map in the update formula (3); we can consider any retraction operator. Thus, we can generalize (3) as \(\widehat{\theta }^{(N+1)} = \mathcal {R}_{\widehat{\theta }^{(N)}}(\xi ^{(N+1)}).\)

In order to develop a suitable update rule, we have to define \(I(\theta )\) and the score \(u(\cdot )\) on the manifold, noting that every parameter belongs to a different manifold. Accordingly, the Fisher Information matrix \(I(\theta )\) and the score are computed separately for each parameter.

Now we can analyze separately the update rules for s, \(\overline{x}\), and \(\eta \). Since they belong to different manifolds, the exponential map (or the retraction) will be different, but the philosophy of the algorithm remains the same.

For the update of the weights \(s_k\), the Riemannian manifold considered is the sphere \(\mathbb {S}^{K-1}\), and, given a point \(s \in \mathbb {S}^{K-1}\), the tangent space \(T_s\mathbb {S}^{K-1}\) is identified as \(T_s\mathbb {S}^{K-1} = \{\xi \in \mathbb {R}^K : \xi ^Ts = 0\}.\) We can write the complete log-likelihood in terms of s alone: \(l(x,z;s) = \log f(x,z;s) = \sum _{k=1}^K\delta _{z,k}\log s_k^2.\) We start by evaluating I(s), which is the \(K \times K\) matrix of the quadratic form

$$\begin{aligned} I_s(u,w) = E[\langle u,v(z,s)\rangle \langle v(z,s),w\rangle ], \end{aligned}$$
(4)

for u, w elements of the tangent space at s, where v(z, s) is the Riemannian gradient, defined as \( v(z,s) = \frac{\partial l}{\partial s} - \left\langle \frac{\partial l}{\partial s},s\right\rangle s.\) In this case we obtain \(\frac{\partial l}{\partial s_k} = 2\frac{\delta _{z,k}}{s_k}\), hence \(v_k(z,s) = 2\left( \frac{\delta _{z,k}}{s_k} - s_k\right) .\) It is easy to see that the matrix of the quadratic form has elements

$$\begin{aligned} I_{kl}(s)&= E[v_k(z,s)v_l(z,s)] = E\left[ 4\left( \frac{\delta _{z,k}}{s_k} - s_k\right) \left( \frac{\delta _{z,l}}{s_l} - s_l\right) \right] \\&= E\left[ 4\left( \frac{\delta _{z,k}\delta _{z,l}}{s_ks_l} - \frac{s_l}{s_k}\delta _{z,k} - \frac{s_k}{s_l}\delta _{z,l} + s_ks_l\right) \right] = 4(\delta _{kl} - s_ks_l - s_ks_l + s_ks_l) = 4(\delta _{kl} - s_ks_l). \end{aligned}$$

Thus, the Fisher Information matrix I(s) applied to an element \(\xi \) of the tangent space gives \(I(s)\xi = 4\xi \); hence, on the tangent space, I(s) acts as 4 times the identity matrix. Thus, if we consider update rule (3), we have \(\xi ^{(N+1)} = \frac{\gamma ^{(N+1)}}{4} u(x_{N+1};\widehat{\theta }^{(N)})\). It remains to evaluate the score \(u(x_{N+1};\widehat{\theta }^{(N)})\), which we do as follows:

$$u_k(x_{N+1};\widehat{\theta }^{(N)}) = E[v_k(z,s) | x_{N+1}] = E\left[ 2\left( \frac{\delta _{z,k}}{s_k} - s_k\right) | x_{N+1}\right] = 2\left( \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{s_k} - s_k\right) ,$$

where \(h_k(x_{N+1};\widehat{\theta }^{(N)}) \propto s_k^2p(x_{N+1};\widehat{\theta }_k^{(N)})\) is the posterior probability \(E[\delta _{z,k} \,|\, x_{N+1}]\) that \(x_{N+1}\) belongs to the kth component. Thus we obtain

$$\begin{aligned} \widehat{s}^{(N+1)} = \text {Exp}_{\widehat{s}^{(N)}}\left( \frac{\gamma ^{(N+1)}}{2}\left( \frac{h_1(x_{N+1};\widehat{\theta }^{(N)})}{\widehat{s}_1^{(N)}} - \widehat{s}_1^{(N)}, ..., \frac{h_K(x_{N+1};\widehat{\theta }^{(N)})}{\widehat{s}_K^{(N)}} - \widehat{s}_K^{(N)}\right) \right) = \text {Exp}_{\widehat{s}^{(N)}}\left( \xi ^{(N+1)}\right) . \end{aligned}$$
(5)

Considering the classical exponential map on the sphere (i.e., the geodesic), the update rule (5) becomes

$$\begin{aligned} \widehat{s}_k^{(N+1)} = \widehat{s}_k^{(N)}\cos (\Vert \xi ^{(N+1)}\Vert ) + \frac{\frac{\gamma ^{(N+1)}}{2}\left( \frac{h_k}{\widehat{s}_k^{(N)}} - \widehat{s}_k^{(N)}\right) }{\Vert \xi ^{(N+1)}\Vert }\sin (\Vert \xi ^{(N+1)}\Vert ). \end{aligned}$$
(6)

Actually, as anticipated before, we are not forced to use the exponential map; we can consider other retractions. In particular, on the sphere, we could consider the “projection” retraction \(\mathcal {R}_x(\xi ) = \frac{x + \xi }{\Vert x + \xi \Vert },\) deriving the update rule accordingly.
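As an illustration, a minimal Python sketch of this weight update is given below; it assumes the posterior probabilities \(h_k(x_{N+1};\widehat{\theta }^{(N)})\) have already been computed, and it implements both the exponential-map update (6) and the projection retraction (function names are ours).

```python
import numpy as np

def update_weights_exp(s, h, gamma):
    """Online update of the square-root weights s on the sphere S^{K-1}, Eqs. (5)-(6).
    s: current estimate (unit norm), h: posterior probabilities of the K components."""
    xi = 0.5 * gamma * (h / s - s)          # tangent vector: <xi, s> = 0 since sum(h) = sum(s^2) = 1
    norm = np.linalg.norm(xi)
    if norm < 1e-12:                        # avoid division by zero for a vanishing step
        return s
    s_new = s * np.cos(norm) + (xi / norm) * np.sin(norm)
    return s_new / np.linalg.norm(s_new)    # guard against numerical round-off

def update_weights_retraction(s, h, gamma):
    """Same step, but using the projection retraction R_s(xi) = (s + xi) / ||s + xi||."""
    xi = 0.5 * gamma * (h / s - s)
    v = s + xi
    return v / np.linalg.norm(v)
```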

For the update of the barycenters, each \(\overline{x}_k\), \(k=1,...,K\), is an element of \(\mathcal {P}_m\), the Riemannian manifold of \(m \times m\) SPD matrices. Thus, we derive the update rule for a single k.

First of all, we have to derive the analogue of expression (4). That expression holds only for irreducible manifolds, such as the sphere. In the case of \(\mathcal {P}_m\) we have to introduce some theoretical results. Let \(\mathcal {M}\) be a symmetric space of non-positive curvature (like \(\mathcal {P}_m\)); it can be expressed as a product \(\mathcal {M} = \mathcal {M}_1 \times \cdots \times \mathcal {M}_R\), where each \(\mathcal {M}_r\) is an irreducible space [11]. Now let \(\overline{x}\) be an element of \(\mathcal {M}\), and v, w elements of the tangent space \(T_{\overline{x}}\mathcal {M}\). We can write \(\overline{x} = (\overline{x}_1,...,\overline{x}_R)\), \(v = (v_1,...,v_R)\) and \(w = (w_1,...,w_R).\) We can generalize (4) by the following expression:

$$\begin{aligned} I_{\overline{x}}(u,w) = \sum _{r=1}^{R}E[\langle u_r,v_r(\overline{x}_r)\rangle _{\overline{x}}\langle v_r(\overline{x}_r),w_r\rangle _{\overline{x}}], \end{aligned}$$
(7)

with \(v_r(\overline{x}_r)\) being the r-th component of the Riemannian score \(\nabla _{\overline{x}}l(\overline{x})\). In our case \(\mathcal {P}_m = \mathbb {R} \times \mathcal {SP}_m\), where \(\mathcal {SP}_m\) represents the manifold of SPD matrices with unit determinant, while \(\mathbb {R}\) accounts for the part relative to the determinant. Thus, if \(x \in \mathcal {P}_m\), we can consider the isomorphism \(\phi (x) = (x_1,x_2)\) with \(x_1 = \log \det x \in \mathbb {R}\) and \(x_2 = e^{-x_1/m}x \in \mathcal {SP}_m\) (so that \(\det x_2 = 1\)). The idea is to apply the procedure used to derive \(\widehat{s}^{(N+1)}\) to each component of \(\widehat{\overline{x}}_k^{(N+1)}\). Specifically, we proceed as follows:

  • we derive \(I(\overline{x}_k)\) through formula (7), with components \(I_r\).

  • we derive the Riemannian score \(u(x_{N+1};\widehat{\theta }^{(N)}) = E\left[ v(x_{N+1},z_{N+1}; \widehat{\overline{x}}_k^{(N)}, \widehat{\sigma }_k^{(N)}) \,|\, x_{N+1}\right] \), with components \(u_r\).

  • for each component \(r=1,2\) we evaluate \(\xi _r^{(N+1)} = \gamma ^{(N+1)}I_r^{-1}u_r\);

  • we update each component \(\left( \widehat{\overline{x}}_k^{(N+1)}\right) _r = \text {Exp}_{\left( \widehat{\overline{x}}_k^{(N)}\right) _r}\left( \xi _r^{(N+1)}\right) \) and we could use \(\phi ^{-1}(\cdot )\) to derive \(\widehat{\overline{x}}_k^{(N+1)}\) if needed.
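A minimal Python sketch of the isomorphism \(\phi \) and its inverse, used in the last step above, could look as follows; the function names are illustrative.

```python
import numpy as np

def decompose(x):
    """phi(x) = (x1, x2): x1 = log det(x), x2 = exp(-x1/m) * x, so that det(x2) = 1."""
    m = x.shape[0]
    x1 = np.linalg.slogdet(x)[1]      # log-determinant of the SPD matrix x
    x2 = np.exp(-x1 / m) * x
    return x1, x2

def recompose(x1, x2):
    """phi^{-1}(x1, x2): rebuild the SPD matrix from its two components."""
    m = x2.shape[0]
    return np.exp(x1 / m) * x2
```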

We start by deriving \(I(\overline{x}_k)\) for the complete model (see [12] for some derivations):

$$\begin{aligned} I_{\overline{x}_k}(u,w)&= E\left[ \langle u,v(x,z;\overline{x}_k,\sigma _{k})\rangle \langle v(x,z;\overline{x}_k,\sigma _{k}),w\rangle \right] = E\left[ \frac{\delta _{z,k}}{\sigma _k^4}\langle u,\text {Log}_{\overline{x}_k}x\rangle \langle \text {Log}_{\overline{x}_k}x,w\rangle \right] \nonumber \\&\qquad = E\left[ \frac{\delta _{z,k}}{\sigma _k^4}I(u,w)\right] = \frac{\omega _{k}}{\sigma _k^4}\sum _{r=1}^2\frac{\psi '_r(\eta _k)}{\dim (\mathcal {M}_r)}\langle u_r,w_r\rangle _{(\overline{x}_k)_r}, \end{aligned}$$
(8)

where \(\psi (\eta _k) = \log \zeta \), seen as a function of \(\eta _k = -\frac{1}{2\sigma _k^2}\), and we use the result introduced in [13], which states that if \(x \in \mathcal {M}\) follows a Riemannian Gaussian distribution on \(\mathcal {M}\), then \(x_r\) follows a Riemannian Gaussian distribution on \(\mathcal {M}_r\), with \(\zeta (\sigma _k) = \prod _{r=1}^R\zeta _r(\sigma _k)\). In our case \(\zeta _1(\sigma _k) = \sqrt{2\pi m\sigma _k^2}\) (so that \(\psi _1(\eta _k) = \frac{1}{2}\log (-\frac{\pi m}{\eta _k})\)), and we then obtain \(\zeta _2(\sigma _k) = \frac{\zeta (\sigma _k)}{\zeta _1(\sigma _k)}\) easily, since \(\zeta (\sigma _k)\) has been derived in [6, 8]. From (8), we observe that for both components \(r=1,2\) the Fisher Information matrix is proportional to the identity matrix with coefficient \(\frac{\omega _{k}}{\sigma _k^4}\frac{\psi '_r(\eta _k)}{\dim (\mathcal {M}_r)}\).

We now derive the Riemannian score \(u(x_{N+1};\widehat{\theta }_k^{(N)}) \in T_{\widehat{\overline{x}}_k^{(N)}}\mathcal {P}_m\):

$$u(x_{N+1};\widehat{\theta }_k^{(N)}) = E\left[ v(x,z;\widehat{\overline{x}}_k^{(N)},\widehat{\sigma }_{k}^{(N)}) \,\big |\, x_{N+1}\right] = \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\big (\widehat{\sigma }_{k}^{(N)}\big )^{2}}\text {Log}_{\widehat{\overline{x}}_k^{(N)}}x_{N+1}.$$

In order to find \(u_1\) and \(u_2\) we simply have to apply the Logarithmic maps of the Riemannian manifolds \(\mathcal {M}_1\) and \(\mathcal {M}_2\), which in our case are \(\mathbb {R}\) and \(\mathcal {SP}_m\) respectively, to the components 1 and 2 of \(x_{N+1}\) and \(\widehat{\overline{x}}_k^{(N)}\):

$$u_1 = \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\big (\widehat{\sigma }_{k}^{(N)}\big )^{2}}\left( (x_{N+1})_1 - (\widehat{\overline{x}}_k^{(N)})_1\right) ,$$
$$u_2 = \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\big (\widehat{\sigma }_{k}^{(N)}\big )^{2}}\left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2} \log \left( \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}(x_{N+1})_2\left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}\right) \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2}.$$

Writing \(\psi '_r(\eta _k)\) explicitly, namely \(\psi '_1(\eta _k) = -\frac{1}{2\eta _k} = \sigma _k^2\) and \(\psi '_2(\eta _k) = \psi '(\eta _k) + \frac{1}{2\eta _k}\), we can easily apply the inverse of the Fisher Information matrix to \(u_r\). In this way we derive \(\xi _1^{(N+1)} = \gamma ^{(N+1)}I_1^{-1}(\widehat{\theta }^{(N)})u_1\) and \(\xi _2^{(N+1)} = \gamma ^{(N+1)}I_2^{-1}(\widehat{\theta }^{(N)})u_2\). We are now able to obtain the update rules through the respective exponential maps:

$$\begin{aligned} \left( \widehat{\overline{x}}_k^{(N+1)}\right) _1 = \left( \widehat{\overline{x}}_k^{(N)}\right) _1 + \xi _1^{(N+1)} \end{aligned}$$
(9)
$$\begin{aligned} \left( \widehat{\overline{x}}_k^{(N+1)}\right) _2 = \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2} \exp \left( \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}\xi _2^{(N+1)}\left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}\right) \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2} \end{aligned}$$
(10)
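Putting these pieces together, a hedged Python sketch of one barycenter update follows; it operates on the \(\phi \)-components introduced above and assumes that the posterior \(h_k\), the step size \(\gamma ^{(N+1)}\) and the Fisher coefficients \(c_r = \frac{\omega _k}{\sigma _k^4}\frac{\psi '_r(\eta _k)}{\dim (\mathcal {M}_r)}\) of (8) are supplied by the caller (the computation of \(\psi '_2\) requires the normalizing factor \(\zeta \)).

```python
import numpy as np

def spd_power(x, p):
    """Matrix power of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * w ** p) @ v.T

def sym_funcm(x, f):
    """Apply the scalar function f to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(x)
    return (v * f(w)) @ v.T

def spd_log(a, b):
    """Logarithmic map on SPD matrices: Log_a(b) = a^{1/2} log(a^{-1/2} b a^{-1/2}) a^{1/2}."""
    r, r_inv = spd_power(a, 0.5), spd_power(a, -0.5)
    return r @ sym_funcm(r_inv @ b @ r_inv, np.log) @ r

def spd_exp(a, xi):
    """Exponential map on SPD matrices, as in Eq. (10)."""
    r, r_inv = spd_power(a, 0.5), spd_power(a, -0.5)
    return r @ sym_funcm(r_inv @ xi @ r_inv, np.exp) @ r

def update_barycenter(xbar1, xbar2, x1, x2, h, sigma2, gamma, c1, c2):
    """One online update of a single barycenter, Eqs. (9)-(10).
    (xbar1, xbar2), (x1, x2): phi-components of the current estimate and of x_{N+1};
    h: posterior h_k(x_{N+1}); sigma2: current sigma_k^2; c1, c2: Fisher coefficients."""
    u1 = h / sigma2 * (x1 - xbar1)                 # score component on R
    u2 = h / sigma2 * spd_log(xbar2, x2)           # score component on SP_m
    xi1, xi2 = gamma * u1 / c1, gamma * u2 / c2    # Newton-type steps xi_r = gamma * u_r / c_r
    return xbar1 + xi1, spd_exp(xbar2, xi2)
```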

For the update of the dispersion parameters \(\sigma _k\), we consider \(\eta _k = -\frac{1}{2\sigma _k^2}\). We thus deal with a real parameter, and the calculations can be carried out in the classical Euclidean framework. First of all we have \(l(x,z;\eta ) = \log f(x,z;\eta ) = \sum _{k=1}^K\delta _{z,k}\left( -\psi (\eta _k) + \eta _kd_R^2(x,\overline{x}_k)\right) \). Thus, we can derive \(v(x,z;\eta _k) = \frac{\partial l}{\partial \eta _k} = \delta _{z,k}(-\psi '(\eta _k) + d_R^2(x,\overline{x}_k)).\) Knowing that \(I(\eta _k) = \omega _k\psi ''(\eta _k)\), we can evaluate the score:

$$\begin{aligned} u(x_{N+1};\widehat{\theta }^{(N)}) = E[v(x,z;\eta _k) | x_{N+1}] = h_k(x_{N+1};\widehat{\theta }^{(N)})\left( d_R^2\left( x_{N+1},\widehat{\overline{x}}_k^{(N)}\right) - \psi '(\widehat{\eta }_k^{(N)})\right) . \end{aligned}$$
(11)

Hence we obtain the update formula for the dispersion parameter

$$\begin{aligned} \widehat{\eta }_k^{(N+1)} = \widehat{\eta }_k^{(N)} + \gamma ^{(N+1)}\frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\widehat{\omega }_k^{(N)}\psi ''(\widehat{\eta }_k^{(N)})}\left( d_R^2\left( x_{N+1},\widehat{\overline{x}}_k^{(N)}\right) - \psi '(\widehat{\eta }_k^{(N)})\right) , \end{aligned}$$
(12)

and, obviously, \(\big (\widehat{\sigma }_k^{(N+1)}\big )^2 = -\frac{1}{2\widehat{\eta }_k^{(N+1)}}.\)
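A corresponding Python sketch for the dispersion update (12), assuming that \(\psi '\) and \(\psi ''\) are available as functions (in practice they must be obtained numerically from \(\zeta \)), could be:

```python
def update_dispersion(eta, weight, h, d2, gamma, psi_prime, psi_second):
    """Online update of eta_k = -1/(2 sigma_k^2), Eq. (12).
    weight: current omega_k, h: posterior h_k(x_{N+1}),
    d2: squared Riemannian distance d_R^2(x_{N+1}, xbar_k),
    psi_prime, psi_second: callables evaluating psi'(eta) and psi''(eta)."""
    eta_new = eta + gamma * h / (weight * psi_second(eta)) * (d2 - psi_prime(eta))
    sigma2_new = -1.0 / (2.0 * eta_new)
    return eta_new, sigma2_new
```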

4 Simulations

We consider here two simulation frameworks to test the algorithm described in this paper.

The first framework corresponds to the simplest case. Indeed, we consider only one mixture component (i.e., \(K=1\)). Thus, this corresponds to a simple online estimation of the mean and dispersion parameter of a Riemannian Gaussian sample. We consider matrices in \(\mathcal {P}_3\) and we analyze three different simulations corresponding to three different values of the barycenter \(\overline{x}_1\):

The value of the dispersion parameter \(\sigma \) is set to 0.1 for the three simulations. We analyze different initial estimates \(\widehat{\theta }_{in}\), ranging from values close to the true ones to values further away. We focus only on the barycenter, while the initial estimate for \(\sigma \) is set to the true value. We consider two different initial values for each simulation. Specifically, for case (a), \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)})\) is smaller, varying between 0.11 and 0.14; for case (b) it is larger, varying between 1.03 and 1.16. For every simulation we generate \(N_{rep}=100\) samples, each one of \(N=100\) observations. Thus, at the end we obtain \(N_{rep}\) different estimates \((\widehat{\overline{x}}_{1r}, \widehat{\sigma }_{r})\) for every simulation, and we can evaluate the mean m and standard deviation s of the error, where the error is measured as the Riemannian distance between \(\widehat{\overline{x}}_{1r}\) and \(\overline{x}_1\) for the barycenter, and as \(|\sigma - \widehat{\sigma }|\) for the dispersion parameter. The results are summarized in Table 1.

Table 1. Mean and standard deviation of the error for the first framework

In the second framework we consider the mixture case, in particular \(K=2\). The true weights are 0.4 and 0.6, while \(\sigma _1 = \sigma _2 = 0.1\). The true barycenters are:

We let the initial estimates vary from the true barycenters to SPD matrices different from the true ones. In particular we analyze three cases: case (a), where \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)}) = d_R(\overline{x}_2,\widehat{\overline{x}}_2^{(0)}) = 0\); case (b), where \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)}) = 0.2\) and \(d_R(\overline{x}_2,\widehat{\overline{x}}_2^{(0)}) = 0.26\); case (c), where \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)}) = d_R(\overline{x}_2,\widehat{\overline{x}}_2^{(0)}) = 0.99\). The results obtained are shown in Table 2. In both frameworks it is clear that we obtain very good results when starting close to the true parameter values, while the quality of the estimates degrades as the starting points move further from the true values.

Table 2. Mean and standard deviation of the error for the second framework

5 Conclusion

This paper has addressed the problem of the online estimation of mixture model parameters in the Riemannian framework. In particular, we dealt with the case of mixtures of Gaussian distributions on the Riemannian manifold of SPD matrices. Starting from the classical approach proposed by Titterington for the Euclidean case, we extended the algorithm to the Riemannian case. The key point was to interpret the innovation term of the step-wise algorithm as the argument of an exponential map, or of a retraction, on the manifold. Another important contribution was the computation of the Fisher Information matrix on the Riemannian manifold, needed to implement the Newton algorithm. Finally, we presented some first simulations to validate the proposed method. We can state that, when the starting point of the algorithm is close to the true parameters, we are able to estimate the parameters very accurately. The simulation results point to the next steps of this work: investigating the influence of the starting point on the algorithm, in order to find ways to improve convergence towards the correct optimum. Another perspective is to apply this algorithm to real datasets where online estimation is needed.