
1 Introduction

Information theory and Riemannian geometry have been widely developed in recent years in many different applications. In particular, Symmetric Positive Definite (SPD) matrices have been deeply studied through Riemannian geometry tools. Indeed, the space \(\mathcal {P}_m\) of \(m \times m\) SPD matrices can be equipped with a Riemannian metric. This metric, usually called the Rao-Fisher or affine-invariant metric, gives it the structure of a Riemannian manifold (specifically, a homogeneous space of non-positive curvature). SPD matrices are of great interest in several applications, like diffusion tensor imaging, brain-computer interfaces, radar signal processing, mechanics, computer vision and image processing [1,2,3,4,5]. Hence, it is very useful to develop statistical tools to analyze objects living in the manifold \(\mathcal {P}_m\). In this paper we focus on the study of mixtures of Riemannian Gaussian distributions, as defined in [6]. They have been successfully used to define probabilistic classifiers for texture images [7] or Electroencephalography (EEG) data [8]. In these examples the mixture parameters are estimated through suitable EM algorithms for Riemannian manifolds. In this paper we consider a particular situation, in which the observations arrive one at a time. Hence, an online estimation of the parameters is needed. Following Titterington's approach [9], we derive a novel method for the online estimation of the parameters of Riemannian Gaussian mixture distributions.

The paper is structured as follows. In Sect. 2 we describe the Riemannian Gaussian Mixture Model. In Sect. 3, we introduce the reference methods for the online estimation of mixture parameters in the Euclidean case, and we describe in detail our approach for the Riemannian framework. For lack of space, some proofs are omitted. Then, in Sect. 4, we present some simulations to validate the proposed method. Finally, we conclude with some remarks and future perspectives in Sect. 5.

2 Riemannian Gaussian Mixture Model

We consider a Riemannian Gaussian Mixture model \(g(x;\theta ) = \sum _{k=1}^K \omega _kp(x;\psi _k)\), with the constraint \(\sum _{k=1}^K\omega _k = 1\). Here \(p(x;\psi _k)\) is the Riemannian Gaussian distribution studied in [6], defined as \(p(x;\psi _k) = \frac{1}{\zeta (\sigma _k)}\exp \left( -\frac{d_R^2(x,\overline{x}_k)}{2\sigma _k^2}\right) \), where x is an SPD matrix, \(\overline{x}_k\) is also an SPD matrix representing the center of mass of the kth component of the mixture, \(\sigma _k\) is a positive number representing the dispersion parameter of the kth mixture component, \(\zeta (\sigma _k)\) is the normalization factor, and \(d_R(\cdot ,\cdot )\) is the Riemannian distance induced by the metric on \(\mathcal {P}_m\). \(g(x;\theta )\) is also called the incomplete likelihood. Indeed, in the typical mixture model approach, we consider some latent variables \(Z_i\), categorical variables over \(\{1,...,K\}\) with parameters \(\{\omega _k\}_{k=1}^K\), and assume \(X_i | Z_i=k \sim p(\cdot ;\psi _k)\). Thus, the complete likelihood is defined as \(f(x,z;\theta ) = \sum _{k=1}^K\omega _kp(x;\psi _k)\delta _{z,k}\), where \(\delta _{z,k} = 1\) if \(z=k\) and 0 otherwise. We deal here with the problem of estimating the model parameters, gathered in the vector \(\theta = [\omega _1,\overline{x}_1, \sigma _1, ..., \omega _K,\overline{x}_K, \sigma _K]\). Usually, given a set of N i.i.d. observations \(\chi = \{x_i\}_{i=1}^N\), we look for \(\widehat{\theta }_{N}^{MLE}\), the MLE of \(\theta \), i.e. the maximizer of the log-likelihood \(l(\theta ;\chi ) = \frac{1}{N}\sum _{i=1}^N\log \sum _{k=1}^K\omega _kp(x_i;\psi _k).\)
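For illustration, the Riemannian distance \(d_R\) and the (unnormalized) Gaussian kernel can be evaluated numerically as in the following minimal Python sketch; the normalizing factor \(\zeta (\sigma )\), derived in [6], is assumed to be computed separately, and the function names are ours.

```python
import numpy as np

def spd_power(x, p):
    """Matrix power of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * w ** p) @ v.T

def riemannian_distance(x, y):
    """Affine-invariant (Rao-Fisher) distance between SPD matrices x and y."""
    x_is = spd_power(x, -0.5)
    eigvals = np.linalg.eigvalsh(x_is @ y @ x_is)
    return np.sqrt(np.sum(np.log(eigvals) ** 2))

def gaussian_kernel(x, xbar, sigma):
    """Unnormalized Riemannian Gaussian density exp(-d_R^2(x, xbar) / (2 sigma^2));
    dividing by zeta(sigma) (not implemented here) gives p(x; xbar, sigma)."""
    return np.exp(-riemannian_distance(x, xbar) ** 2 / (2.0 * sigma ** 2))
```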

To obtain \(\widehat{\theta }_{N}^{MLE}\), EM or stochastic EM approaches are used, based on the complete dataset \(\chi _c = \{(x_i,z_i)\}_{i=1}^N\), where the variables \(Z_i\) are unobserved. In this case, the average complete log-likelihood can be written as:

$$\begin{aligned} l_c(\theta ;\chi _c) = \frac{1}{N}\sum _{i=1}^N\log \prod _{k=1}^K(\omega _kp(x_i;\psi _k))^{\delta _{z_i,k}} = \frac{1}{N}\sum _{i=1}^N\sum _{k=1}^K\delta _{z_i,k}\log (\omega _kp(x_i;\psi _k)). \end{aligned}$$
(1)

Here we consider a different situation, in which the dataset \(\chi \) is not available all at once; rather, the observations arrive one at a time. In this situation online estimation algorithms are needed.

3 Online Estimation

In the Euclidean case, the reference algorithms are Titterington's algorithm, introduced in [9], and the Cappé-Moulines algorithm, presented in [10].

We focus here on Titterington's approach. In classic EM algorithms, the Expectation step consists in computing \(Q(\theta ;\widehat{\theta }^{(r)},\chi ) = E_{\widehat{\theta }^{(r)}}[l_c(\theta ;\chi _c) | \chi ]\), and the Maximization step then maximizes Q over \(\theta \). These steps are performed iteratively and at each iteration r an estimate \(\widehat{\theta }^{(r)}\) of \(\theta \) is obtained exploiting the whole dataset. In the online framework, instead, the current estimate is indicated by \(\widehat{\theta }^{(N)}\), since in this setting, once \(x_1,x_2,...,x_N\) have been observed, we want to update our estimate when a new observation \(x_{N+1}\) arrives. Titterington's approach corresponds to the direct optimization of \(Q(\theta ;\widehat{\theta }^{(N)},\chi )\) using a Newton algorithm:

$$\begin{aligned} \widehat{\theta }^{(N+1)} = \widehat{\theta }^{(N)} + \gamma ^{(N+1)}I_c^{-1}(\widehat{\theta }^{(N)})u(x_{N+1};\widehat{\theta }^{(N)}), \end{aligned}$$
(2)

where \(\{\gamma ^{(N)}\}_N\) is a decreasing sequence of step sizes, the Hessian of Q is approximated by the Fisher Information matrix for the complete data, \(I_c(\widehat{\theta }^{(N)}) = -E_{\widehat{\theta }^{(N)}}\left[ \frac{\partial ^2\log f(x,z;\theta )}{\partial \theta \partial \theta ^T}\right] \), and the score \(u(x_{N+1};\widehat{\theta }^{(N)})\) is defined as \(u(x_{N+1};\widehat{\theta }^{(N)}) = \nabla _{\theta }\log g(x_{N+1};\theta )\big |_{\theta = \widehat{\theta }^{(N)}} = E_{\widehat{\theta }^{(N)}}[\nabla _{\theta }\log f(x_{N+1},z_{N+1};\theta )\big |_{\theta = \widehat{\theta }^{(N)}} \,|\, x_{N+1}]\) (the last equality is shown in [10]).
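For instance, for a single univariate Gaussian component with known variance \(\sigma ^2\) and unknown mean \(\mu \), one has \(I_c(\mu ) = 1/\sigma ^2\) and \(u(x_{N+1};\mu ) = (x_{N+1}-\mu )/\sigma ^2\), so that (2) reduces to \(\widehat{\mu }^{(N+1)} = \widehat{\mu }^{(N)} + \gamma ^{(N+1)}(x_{N+1}-\widehat{\mu }^{(N)})\); with \(\gamma ^{(N+1)} = \frac{1}{N+1}\) this is exactly the recursive computation of the sample mean.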

Geometrically speaking, Titterington's algorithm consists in modifying the current estimate \(\widehat{\theta }^{(N)}\) by adding the term \(\xi ^{(N+1)} = \gamma ^{(N+1)}I_c^{-1}(\widehat{\theta }^{(N)})u(x_{N+1};\widehat{\theta }^{(N)})\). If we want to consider parameters belonging to Riemannian manifolds, we have to suitably modify the update rule. Furthermore, even in the classical framework, the Titterington update does not necessarily constrain the estimates to stay in the parameter space. For instance, the weights could assume negative values. The approach we are going to introduce solves this problem and, furthermore, is suitable for Riemannian mixtures.

We modify the update rule, exploiting the Exponential map. That is:

$$\begin{aligned} \widehat{\theta }^{(N+1)} = \text {Exp}_{\widehat{\theta }^{(N)}}(\xi ^{(N+1)}), \end{aligned}$$
(3)

where our parameters become \(\theta _k = [s_k, \overline{x}_k, \eta _k]\). Specifically, \(s_k^2 = \omega _k\), so that \(s=[s_1,...,s_K] \in \mathbb {S}^{K-1}\) (i.e., the unit sphere), \(\overline{x}_k \in \mathcal {P}_m\) and \(\eta _k = -\frac{1}{2\sigma _k^2} < 0\).

Actually, we are not forced to choose the exponential map in the update formula (3); we can consider any retraction operator. Thus, we can generalize (3) as \(\widehat{\theta }^{(N+1)} = \mathcal {R}_{\widehat{\theta }^{(N)}}(\xi ^{(N+1)}).\)

In order to develop a suitable update rule, we have to define \(I(\theta )\) and the score \(u(\cdot )\) on the manifold, noting that every parameter belongs to a different manifold. Accordingly, the Fisher Information matrix \(I(\theta )\) and the score are computed separately for each parameter.

Now we can analyze separately the update rules for s, \(\overline{x}\), and \(\eta \). Since they belong to different manifolds, the exponential map (or the retraction) will be different, but the philosophy of the algorithm remains the same.

For the update of the weights \(s_k\), the Riemannian manifold considered is the sphere \(\mathbb {S}^{K-1}\), and, given a point \(s \in \mathbb {S}^{K-1}\), the tangent space \(T_s\mathbb {S}^{K-1}\) is identified as \(T_s\mathbb {S}^{K-1} = \{\xi \in \mathbb {R}^K : \xi ^Ts = 0\}.\) We can write the complete log-likelihood in terms of s alone: \(l(x,z;s) = \log f(x,z;s) = \sum _{k=1}^K\delta _{z,k}\log s_k^2.\) We start by evaluating I(s), which is the \(K \times K\) matrix of the quadratic form

$$\begin{aligned} I_s(u,w) = E[\langle u,v(z,s)\rangle \langle v(z,s),w\rangle ], \end{aligned}$$
(4)

for u, w elements of the tangent space at s, where v(z, s) is the Riemannian gradient, defined as \( v(z,s) = \frac{\partial l}{\partial s} - \left\langle \frac{\partial l}{\partial s},s\right\rangle s.\) In this case we obtain \(\frac{\partial l}{\partial s_k} = 2\frac{\delta _{z,k}}{s_k}\), hence \(v_k(z,s) = 2\left( \frac{\delta _{z,k}}{s_k} - s_k\right) .\) It is easy to see that the matrix of the quadratic form has elements

$$\begin{aligned} I_{kl}(s)&= E[v_k(z,s)v_l(z,s)] = E\left[ 4\left( \frac{\delta _{z,k}}{s_k} - s_k\right) \left( \frac{\delta _{z,l}}{s_l} - s_l\right) \right] \\&= E\left[ 4\left( \frac{\delta _{z,k}\delta _{z,l}}{s_ks_l} - \frac{s_l}{s_k}\delta _{z,k} - \frac{s_k}{s_l}\delta _{z,l} + s_ks_l\right) \right] = 4(\delta _{kl} - s_ks_l - s_ks_l + s_ks_l) = 4(\delta _{kl} - s_ks_l). \end{aligned}$$

Thus, the Fisher Information matrix I(s) applied to an element \(\xi \) of the tangent space gives \(I(s)\xi = 4\xi \); hence, on the tangent space, I(s) acts as 4 times the identity matrix. Thus, if we consider update rule (3), we have \(\xi ^{(N+1)} = \frac{\gamma ^{(N+1)}}{4} u(x_{N+1};\widehat{\theta }^{(N)})\). It remains to evaluate the score \(u(x_{N+1};\widehat{\theta }^{(N)})\), which we do as follows:

$$u_k(x_{N+1};\widehat{\theta }^{(N)}) = E[v_k(z,s) | x_{N+1}] = E\left[ 2\left( \frac{\delta _{z,k}}{s_k} - s_k\right) | x_{N+1}\right] = 2\left( \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{s_k} - s_k\right) ,$$

where \(h_k(x_{N+1};\widehat{\theta }^{(N)}) \propto s_k^2p(x_{N+1};\widehat{\theta }_k^{(N)})\) is the posterior probability \(E[\delta _{z,k} \,|\, x_{N+1}]\) that \(x_{N+1}\) belongs to the kth component. Thus we obtain

$$\begin{aligned} \widehat{s}^{(N+1)} = \text {Exp}_{\widehat{s}^{(N)}}\left( \frac{\gamma ^{(N+1)}}{2}\left( \frac{h_1(x_{N+1};\widehat{\theta }^{(N)})}{\widehat{s}_1^{(N)}} - \widehat{s}_1^{(N)}, ..., \frac{h_K(x_{N+1};\widehat{\theta }^{(N)})}{\widehat{s}_K^{(N)}} - \widehat{s}_K^{(N)}\right) \right) = \text {Exp}_{\widehat{s}^{(N)}}\left( \xi ^{(N+1)}\right) . \end{aligned}$$
(5)

Considering the classical exponential map on the sphere (i.e., the geodesic), the update rule (5) becomes

$$\begin{aligned} \widehat{s}_k^{(N+1)} = \widehat{s}_k^{(N)}\cos (\Vert \xi ^{(N+1)}\Vert ) + \frac{\frac{\gamma ^{(N+1)}}{2}\left( \frac{h_k}{\widehat{s}_k^{(N)}} - \widehat{s}_k^{(N)}\right) }{\Vert \xi ^{(N+1)}\Vert }\sin (\Vert \xi ^{(N+1)}\Vert ). \end{aligned}$$
(6)

Actually, as anticipated before, we are not forced to use the exponential map; we can consider other retractions. In particular, on the sphere, we could consider the “projection” retraction \(\mathcal {R}_x(\xi ) = \frac{x + \xi }{\Vert x + \xi \Vert },\) deriving the update rule accordingly.
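As an illustration, a minimal Python sketch of this weight update is given below; it assumes the posterior probabilities \(h_k(x_{N+1};\widehat{\theta }^{(N)})\) have already been computed, and it implements both the exponential-map update (6) and the projection retraction (function names are ours).

```python
import numpy as np

def update_weights_exp(s, h, gamma):
    """Online update of the square-root weights s on the sphere S^{K-1}, Eqs. (5)-(6).
    s: current estimate (unit norm), h: posterior probabilities of the K components."""
    xi = 0.5 * gamma * (h / s - s)          # tangent vector: <xi, s> = 0 since sum(h) = sum(s^2) = 1
    norm = np.linalg.norm(xi)
    if norm < 1e-12:                        # avoid division by zero for a vanishing step
        return s
    s_new = s * np.cos(norm) + (xi / norm) * np.sin(norm)
    return s_new / np.linalg.norm(s_new)    # guard against numerical round-off

def update_weights_retraction(s, h, gamma):
    """Same step, but using the projection retraction R_s(xi) = (s + xi) / ||s + xi||."""
    xi = 0.5 * gamma * (h / s - s)
    v = s + xi
    return v / np.linalg.norm(v)
```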

For the update of the barycenters, each \(\overline{x}_k\), \(k=1,...,K\), is an element of \(\mathcal {P}_m\), the Riemannian manifold of \(m \times m\) SPD matrices. Thus, we derive the update rule for a single k.

First of all, we have to derive the analogue of expression (4). That expression holds only for irreducible manifolds, such as the sphere. In the case of \(\mathcal {P}_m\) we have to introduce some theoretical results. Let \(\mathcal {M}\) be a symmetric space of non-positive curvature (like \(\mathcal {P}_m\)); it can be expressed as a product \(\mathcal {M} = \mathcal {M}_1 \times \cdots \times \mathcal {M}_R\), where each \(\mathcal {M}_r\) is an irreducible space [11]. Now let \(\overline{x}\) be an element of \(\mathcal {M}\), and v, w elements of the tangent space \(T_{\overline{x}}\mathcal {M}\). We can write \(\overline{x} = (\overline{x}_1,...,\overline{x}_R)\), \(v = (v_1,...,v_R)\) and \(w = (w_1,...,w_R).\) We can generalize (4) by the following expression:

$$\begin{aligned} I_{\overline{x}}(u,w) = \sum _{r=1}^{R}E[\langle u_r,v_r(\overline{x}_r)\rangle _{\overline{x}}\langle v_r(\overline{x}_r),w_r\rangle _{\overline{x}}], \end{aligned}$$
(7)

with \(v_r(\overline{x}_r)\) being the r-th component of the Riemannian score \(\nabla _{\overline{x}}l(\overline{x})\). In our case \(\mathcal {P}_m = \mathbb {R} \times \mathcal {SP}_m\), where \(\mathcal {SP}_m\) represents the manifold of SPD matrices with unit determinant, while \(\mathbb {R}\) accounts for the part relative to the determinant. Thus, if \(x \in \mathcal {P}_m\), we can consider the isomorphism \(\phi (x) = (x_1,x_2)\) with \(x_1 = \log \det x \in \mathbb {R}\) and \(x_2 = e^{-x_1/m}x \in \mathcal {SP}_m\) (so that \(\det x_2 = 1\)). The idea is to apply the procedure used to derive \(\widehat{s}^{(N+1)}\) to each component of \(\widehat{\overline{x}}_k^{(N+1)}\). Specifically, we proceed as follows:

  • we derive \(I(\overline{x}_k)\) through formula (7), with components \(I_r\).

  • we derive the Riemannian score \(u(x_{N+1};\widehat{\theta }^{(N)}) = E\left[ v(x_{N+1},z_{N+1}; \widehat{\overline{x}}_k^{(N)}, \widehat{\sigma }_k^{(N)}) \,|\, x_{N+1}\right] \), with components \(u_r\).

  • for each component \(r=1,2\) we evaluate \(\xi _r^{(N+1)} = \gamma ^{(N+1)}I_r^{-1}u_r\);

  • we update each component \(\left( \widehat{\overline{x}}_k^{(N+1)}\right) _r = \text {Exp}_{\left( \widehat{\overline{x}}_k^{(N)}\right) _r}\left( \xi _r^{(N+1)}\right) \) and we could use \(\phi ^{-1}(\cdot )\) to derive \(\widehat{\overline{x}}_k^{(N+1)}\) if needed.
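A minimal Python sketch of the isomorphism \(\phi \) and its inverse, used in the last step above, could look as follows; the function names are illustrative.

```python
import numpy as np

def decompose(x):
    """phi(x) = (x1, x2): x1 = log det(x), x2 = exp(-x1/m) * x, so that det(x2) = 1."""
    m = x.shape[0]
    x1 = np.linalg.slogdet(x)[1]      # log-determinant of the SPD matrix x
    x2 = np.exp(-x1 / m) * x
    return x1, x2

def recompose(x1, x2):
    """phi^{-1}(x1, x2): rebuild the SPD matrix from its two components."""
    m = x2.shape[0]
    return np.exp(x1 / m) * x2
```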

We start by deriving \(I(\overline{x}_k)\) for the complete model (see [12] for some derivations):

$$\begin{aligned} I_{\overline{x}_k}(u,w)&= E\left[ \langle u,v(x,z;\overline{x}_k,\sigma _{k})\rangle \langle v(x,z;\overline{x}_k,\sigma _{k}),w\rangle \right] = E\left[ \frac{\delta _{z,k}}{\sigma _k^4}\langle u,\text {Log}_{\overline{x}_k}x\rangle \langle \text {Log}_{\overline{x}_k}x,w\rangle \right] \nonumber \\&\qquad = E\left[ \frac{\delta _{z,k}}{\sigma _k^4}I(u,w)\right] = \frac{\omega _{k}}{\sigma _k^4}\sum _{r=1}^2\frac{\psi '_r(\eta _k)}{\dim (\mathcal {M}_r)}\langle u_r,w_r\rangle _{(\overline{x}_k)_r}, \end{aligned}$$
(8)

where \(\psi (\eta _k) = \log \zeta \), seen as a function of \(\eta _k = -\frac{1}{2\sigma _k^2}\), and we use the result introduced in [13], which states that if \(x \in \mathcal {M}\) follows a Riemannian Gaussian distribution on \(\mathcal {M}\), then \(x_r\) follows a Riemannian Gaussian distribution on \(\mathcal {M}_r\), with \(\zeta (\sigma _k) = \prod _{r=1}^R\zeta _r(\sigma _k)\). In our case \(\zeta _1(\sigma _k) = \sqrt{2\pi m\sigma _k^2}\) (so that \(\psi _1(\eta _k) = \frac{1}{2}\log (-\frac{\pi m}{\eta _k})\)), and we then obtain \(\zeta _2(\sigma _k) = \frac{\zeta (\sigma _k)}{\zeta _1(\sigma _k)}\) easily, since \(\zeta (\sigma _k)\) has been derived in [6, 8]. From (8), we observe that for both components \(r=1,2\) the Fisher Information matrix is proportional to the identity matrix with coefficient \(\frac{\omega _{k}}{\sigma _k^4}\frac{\psi '_r(\eta _k)}{\dim (\mathcal {M}_r)}\).

We now derive the Riemannian score \(u(x_{N+1};\widehat{\theta }_k^{(N)}) \in T_{\widehat{\overline{x}}_k^{(N)}}\mathcal {P}_m\):

$$u(x_{N+1};\widehat{\theta }_k^{(N)}) = E\left[ v(x,z;\widehat{\overline{x}}_k^{(N)},\widehat{\sigma }_{k}^{(N)}) \,\big |\, x_{N+1}\right] = \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\big (\widehat{\sigma }_{k}^{(N)}\big )^{2}}\text {Log}_{\widehat{\overline{x}}_k^{(N)}}x_{N+1}.$$

In order to find \(u_1\) and \(u_2\) we simply have to apply the Logarithmic maps of the Riemannian manifolds \(\mathcal {M}_1\) and \(\mathcal {M}_2\), which in our case are \(\mathbb {R}\) and \(\mathcal {SP}_m\) respectively, to the components 1 and 2 of \(x_{N+1}\) and \(\widehat{\overline{x}}_k^{(N)}\):

$$u_1 = \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\big (\widehat{\sigma }_{k}^{(N)}\big )^{2}}\left( (x_{N+1})_1 - (\widehat{\overline{x}}_k^{(N)})_1\right) ,$$
$$u_2 = \frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\big (\widehat{\sigma }_{k}^{(N)}\big )^{2}}\left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2} \log \left( \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}(x_{N+1})_2\left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}\right) \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2}.$$

Writing \(\psi '_r(\eta _k)\) explicitly, namely \(\psi '_1(\eta _k) = -\frac{1}{2\eta _k} = \sigma _k^2\) and \(\psi '_2(\eta _k) = \psi '(\eta _k) + \frac{1}{2\eta _k}\), we can easily apply the inverse of the Fisher Information matrix to \(u_r\). In this way we derive \(\xi _1^{(N+1)} = \gamma ^{(N+1)}I_1^{-1}(\widehat{\theta }^{(N)})u_1\) and \(\xi _2^{(N+1)} = \gamma ^{(N+1)}I_2^{-1}(\widehat{\theta }^{(N)})u_2\). We are now able to obtain the update rules through the respective exponential maps:

$$\begin{aligned} \left( \widehat{\overline{x}}_k^{(N+1)}\right) _1 = \left( \widehat{\overline{x}}_k^{(N)}\right) _1 + \xi _1^{(N+1)} \end{aligned}$$
(9)
$$\begin{aligned} \left( \widehat{\overline{x}}_k^{(N+1)}\right) _2 = \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2} \exp \left( \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}\xi _2^{(N+1)}\left( \widehat{\overline{x}}_k^{(N)}\right) _2^{-1/2}\right) \left( \widehat{\overline{x}}_k^{(N)}\right) _2^{1/2} \end{aligned}$$
(10)
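Putting these pieces together, a hedged Python sketch of one barycenter update follows; it operates on the \(\phi \)-components introduced above and assumes that the posterior \(h_k\), the step size \(\gamma ^{(N+1)}\) and the Fisher coefficients \(c_r = \frac{\omega _k}{\sigma _k^4}\frac{\psi '_r(\eta _k)}{\dim (\mathcal {M}_r)}\) of (8) are supplied by the caller (the computation of \(\psi '_2\) requires the normalizing factor \(\zeta \)).

```python
import numpy as np

def spd_power(x, p):
    """Matrix power of an SPD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * w ** p) @ v.T

def sym_funcm(x, f):
    """Apply the scalar function f to the eigenvalues of a symmetric matrix."""
    w, v = np.linalg.eigh(x)
    return (v * f(w)) @ v.T

def spd_log(a, b):
    """Logarithmic map on SPD matrices: Log_a(b) = a^{1/2} log(a^{-1/2} b a^{-1/2}) a^{1/2}."""
    r, r_inv = spd_power(a, 0.5), spd_power(a, -0.5)
    return r @ sym_funcm(r_inv @ b @ r_inv, np.log) @ r

def spd_exp(a, xi):
    """Exponential map on SPD matrices, as in Eq. (10)."""
    r, r_inv = spd_power(a, 0.5), spd_power(a, -0.5)
    return r @ sym_funcm(r_inv @ xi @ r_inv, np.exp) @ r

def update_barycenter(xbar1, xbar2, x1, x2, h, sigma2, gamma, c1, c2):
    """One online update of a single barycenter, Eqs. (9)-(10).
    (xbar1, xbar2), (x1, x2): phi-components of the current estimate and of x_{N+1};
    h: posterior h_k(x_{N+1}); sigma2: current sigma_k^2; c1, c2: Fisher coefficients."""
    u1 = h / sigma2 * (x1 - xbar1)                 # score component on R
    u2 = h / sigma2 * spd_log(xbar2, x2)           # score component on SP_m
    xi1, xi2 = gamma * u1 / c1, gamma * u2 / c2    # Newton-type steps xi_r = gamma * u_r / c_r
    return xbar1 + xi1, spd_exp(xbar2, xi2)
```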

For the update of the dispersion parameters \(\sigma _k\), we consider \(\eta _k = -\frac{1}{2\sigma _k^2}\). We thus deal with a real parameter, and the calculations can be carried out in the classical Euclidean framework. First of all we have \(l(x,z;\eta ) = \log f(x,z;\eta ) = \sum _{k=1}^K\delta _{z,k}\left( -\psi (\eta _k) + \eta _kd_R^2(x,\overline{x}_k)\right) \). Thus, we can derive \(v(x,z;\eta _k) = \frac{\partial l}{\partial \eta _k} = \delta _{z,k}(-\psi '(\eta _k) + d_R^2(x,\overline{x}_k)).\) Knowing that \(I(\eta _k) = \omega _k\psi ''(\eta _k)\), we can evaluate the score:

$$\begin{aligned} u(x_{N+1};\widehat{\theta }^{(N)}) = E[v(x,z;\eta _k) | x_{N+1}] = h_k(x_{N+1};\widehat{\theta }^{(N)})\left( d_R^2\left( x_{N+1},\widehat{\overline{x}}_k^{(N)}\right) - \psi '(\widehat{\eta }_k^{(N)})\right) . \end{aligned}$$
(11)

Hence we obtain the update formula for the dispersion parameter

$$\begin{aligned} \widehat{\eta }_k^{(N+1)} = \widehat{\eta }_k^{(N)} + \gamma ^{(N+1)}\frac{h_k(x_{N+1};\widehat{\theta }^{(N)})}{\widehat{\omega }_k^{(N)}\psi ''(\widehat{\eta }_k^{(N)})}\left( d_R^2\left( x_{N+1},\widehat{\overline{x}}_k^{(N)}\right) - \psi '(\widehat{\eta }_k^{(N)})\right) , \end{aligned}$$
(12)

and, obviously, \(\big (\widehat{\sigma }_k^{(N+1)}\big )^2 = -\frac{1}{2\widehat{\eta }_k^{(N+1)}}.\)
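A corresponding Python sketch for the dispersion update (12), assuming that \(\psi '\) and \(\psi ''\) are available as functions (in practice they must be obtained numerically from \(\zeta \)), could be:

```python
def update_dispersion(eta, weight, h, d2, gamma, psi_prime, psi_second):
    """Online update of eta_k = -1/(2 sigma_k^2), Eq. (12).
    weight: current omega_k, h: posterior h_k(x_{N+1}),
    d2: squared Riemannian distance d_R^2(x_{N+1}, xbar_k),
    psi_prime, psi_second: callables evaluating psi'(eta) and psi''(eta)."""
    eta_new = eta + gamma * h / (weight * psi_second(eta)) * (d2 - psi_prime(eta))
    sigma2_new = -1.0 / (2.0 * eta_new)
    return eta_new, sigma2_new
```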

4 Simulations

We consider here two simulation frameworks to test the algorithm described in this paper.

The first framework corresponds to the simplest case. Indeed, we consider only one mixture component (i.e., \(K=1\)). Thus, this corresponds to a simple online estimation of the mean and dispersion parameter of a Riemannian Gaussian sample. We consider matrices in \(\mathcal {P}_3\) and we analyze three different simulations corresponding to three different values of the barycenter \(\overline{x}_1\):

The value of the dispersion parameter \(\sigma \) is set to 0.1 for the three simulations. We analyze different initial estimates \(\widehat{\theta }_{in}\), ranging from values close to the true ones to values further away. We focus only on the barycenter, while the initial estimate for \(\sigma \) is set to the true value. We consider two different initial values for each simulation. Specifically, for case (a), \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)})\) is smaller, varying between 0.11 and 0.14; for case (b) it is larger, varying between 1.03 and 1.16. For every simulation we generate \(N_{rep}=100\) samples, each one of \(N=100\) observations. Thus, at the end we obtain \(N_{rep}\) different estimates \((\widehat{\overline{x}}_{1r}, \widehat{\sigma }_{r})\) for every simulation, and we can evaluate the mean m and standard deviation s of the error, where the error is measured as the Riemannian distance between \(\widehat{\overline{x}}_{1r}\) and \(\overline{x}_1\) for the barycenter, and as \(|\sigma - \widehat{\sigma }|\) for the dispersion parameter. The results are summarized in Table 1.

Table 1. Mean and standard deviation of the error for the first framework

In the second framework we consider the mixture case, in particular \(K=2\). The true weights are 0.4 and 0.6, while \(\sigma _1 = \sigma _2 = 0.1\). The true barycenters are:

We let the initial estimates vary from the true barycenters to SPD matrices different from the true ones. In particular we analyze three cases: case (a), where \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)}) = d_R(\overline{x}_2,\widehat{\overline{x}}_2^{(0)}) = 0\); case (b), where \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)}) = 0.2\) and \(d_R(\overline{x}_2,\widehat{\overline{x}}_2^{(0)}) = 0.26\); case (c), where \(d_R(\overline{x}_1,\widehat{\overline{x}}_1^{(0)}) = d_R(\overline{x}_2,\widehat{\overline{x}}_2^{(0)}) = 0.99\). The results obtained are shown in Table 2. In both frameworks it is clear that we obtain very good results when starting close to the true parameter values, while the quality of the estimates degrades as the starting points move further from the true values.

Table 2. Mean and standard deviation of the error for the second framework

5 Conclusion

This paper has addressed the problem of the online estimation of mixture model parameters in the Riemannian framework. In particular, we dealt with the case of mixtures of Gaussian distributions on the Riemannian manifold of SPD matrices. Starting from the classical approach proposed by Titterington for the Euclidean case, we extended the algorithm to the Riemannian case. The key point was to interpret the innovation term of the step-wise algorithm as the argument of an exponential map, or of a retraction, on the manifold. Another important contribution was the computation of the Fisher Information matrix on the Riemannian manifold, needed to implement the Newton algorithm. Finally, we presented some first simulations to validate the proposed method. We can state that, when the starting point of the algorithm is close to the true parameters, we are able to estimate the parameters very accurately. The simulation results point to the next steps of this work: investigating the influence of the starting point on the algorithm, in order to find ways to improve convergence towards the correct optimum. Another perspective is to apply this algorithm to real datasets where online estimation is needed.