1 Introduction

We consider the problem of computing the (global) Riemannian center of mass of a data set on Hadamard manifolds. This notion and its variants have appeared under many different names in the literature, e.g., Riemannian mean or average and Fréchet or Karcher mean, but we prefer (cf. [1]) the former denomination. This concept was proposed by Grove and Karcher [2], and since then, it has been extensively studied in pure mathematics as well as in applied fields such as computer vision, statistical analysis of shapes, medical imaging, sensor networks and many other general data analysis applications. Several works have been devoted to studying this problem in the context of positive definite symmetric matrices, e.g., [3,4,5,6,7]. This manifold plays a key role in diffusion tensor imaging, as explained in [8, 9], and it admits an invariant Riemannian metric under which it becomes a Hadamard manifold (a complete, simply connected Riemannian manifold of nonpositive sectional curvature); see [10, Theorem 1.2]. For an approach to this problem in a more general setting, namely Finsler manifolds, see Kristály et al. [11]. Bačák [12] studied the convergence of the proximal point method for computing the center of mass in geodesic metric spaces of nonpositive curvature, which include Hadamard manifolds. Afsari et al. [13] investigated the applicability of the gradient descent method, one of the most popular and simplest optimization algorithms, for finding a center of mass in the particular case where the objective function is twice continuously differentiable (at least in a small region). As observed by the authors, the main challenge is that the underlying cost function is usually neither globally differentiable nor globally convex on a general manifold.

Computing the Riemannian center of mass is just one of the many optimization problems, arising in various applications, that are posed on manifolds and therefore require a manifold (not necessarily linear) structure. Consequently, a large number of classical notions and methods have been extended to problems formulated on Riemannian manifolds, as well as on other spaces with nonlinear structure, e.g., [12, 14,15,16,17,18,19,20,21]. As far as we know, one of the first proposals to deal with the steepest descent method for continuously differentiable functions in the Riemannian setting was presented by Luenberger [22] and later by Gabay [23], both in the particular case where M is the inverse image of a regular value. Udriste [24], Smith [25] and Rapcsák [26] also studied this method in the case of a general complete Riemannian manifold, and partial convergence results were obtained. A full convergence result, using Armijo’s rule, was obtained by Cruz Neto et al. [27] in the convex case, under the assumption that the Riemannian manifold has nonnegative sectional curvature. The role of the sign of the sectional curvature in the convergence analysis of [27], combined with the convexity assumption on the objective function, is directly related to the classical proof scheme adopted in the linear setting, namely quasi-Fejér convergence to the set of minimizers; manifolds of nonnegative curvature have geometric properties favorable to establishing the quasi-Fejér convergence of any sequence generated by the method.

Recent research papers have provided a unified framework for the convergence analysis of classical descent methods when the real-valued objective function is not necessarily convex, but satisfies the well-known Kurdyka–Łojasiewicz inequality; see, for instance, Absil et al. [28], Attouch et al. [29] and Lageman [30, Theorem 2.1.22]. This inequality was first introduced by Łojasiewicz [31] for real analytic functions and extended to differentiable functions definable in an o-minimal structure by Kurdyka [32]. For extensions of the Kurdyka–Łojasiewicz inequality to subanalytic nonsmooth functions (defined on Euclidean spaces), see, for example, Bolte et al. [33], Attouch and Bolte [34]. A more general extension, again in the Euclidean context, was developed by Bolte et al. [35] for the Clarke subdifferential of lower semicontinuous functions definable in an o-minimal structure. Lageman [30] extended the Kurdyka–Łojasiewicz inequality to analytic manifolds and differentiable \(\mathcal {C}\)-functions in an analytic–geometric category and, in particular, proved a full convergence result for discrete gradient-like algorithms on a general Riemannian manifold when the differentiable \(\mathcal {C}\)-functions satisfy certain descent conditions, namely the angle and Wolfe–Powell conditions; see [30, Example of Theorem 2.1.22, page 96]. It is important to note that Kurdyka et al. [36] had already established an extension of the Kurdyka–Łojasiewicz inequality for analytic manifolds and analytic functions in order to solve R. Thom’s conjecture. Bento et al. [37] presented an abstract convergence result for inexact descent methods in the Riemannian context and, in particular, extended the applicability of the gradient method (with Armijo’s rule) to any problem which may be formulated as the minimization of a definable (e.g., analytic) function restricted, for example, to a Hadamard manifold; see also [38].

Following [12], where the convergence of the proximal point method for computing the center of mass on Hadamard manifolds was studied, in the present paper we study the convergence of the steepest descent method for the same problem on Hadamard manifolds. Taking into account the results presented in [13], it is worth mentioning that the main novelty of this work is the convergence analysis of the steepest descent method for a specific case of the problem, the \(L^1\) center of mass, which is also frequently used and was not considered in [13].

The paper is organized as follows. In Sect. 2, we take a brief look at basic facts and notations regarding Riemannian geometry and the Kurdyka–Łojasiewicz property, and we present the steepest descent method. In Sect. 3, we use the steepest descent method for computing the Riemannian center of mass of a set of data points on Hadamard manifolds. In Sect. 4, the robustness and efficiency of the method are illustrated with some numerical experiments in the context of positive definite symmetric matrices, using both the fixed stepsize rule and Armijo’s rule.

2 Preliminaries

In this paper, M is assumed to be a finite-dimensional Hadamard manifold. The notations, results and concepts of Riemannian geometry used throughout the paper can be found in Udriste [24] and Ferreira and Oliveira [14].

Let \(f:M\rightarrow \mathbb {R}\) be a differentiable function. The function f is said to have the Kurdyka–Łojasiewicz property at a point \(\bar{x}\in M\) iff there exist \(\eta \in ]0,+\infty ]\), a neighborhood U of \(\bar{x}\) and a continuous concave function \(\varphi :[0,\eta [\rightarrow \mathbb {R}_+\) (called a desingularizing function) such that:

  (i) \(\varphi (0)=0\), \(\varphi \in C^1(]0,\eta [)\) and, for all \(s\in ]0,\eta [\), \(\varphi '(s)>0;\)

  (ii) For all \(x\in U\cap [f(\bar{x})<f<f(\bar{x})+\eta ]\), the Kurdyka–Łojasiewicz inequality holds

    $$\begin{aligned} \varphi '(f(x)-f(\bar{x}))||\hbox {grad}\, f(x)||\ge 1, \end{aligned}$$
    (1)

    where \([f(\bar{x})<f<f(\bar{x})+\eta ]:=\{x\in M: f(\bar{x})< f(x) < f(\bar{x})+\eta \}\).

If \(f(\bar{x})=0\), then (1) can be rewritten as

$$\begin{aligned} ||\hbox {grad}\,(\varphi \circ f)(x)||\ge 1. \end{aligned}$$

One can easily check that the Kurdyka–Łojasiewicz property is automatically satisfied at any noncritical point; see [39, Lemma 3.6]. The most fundamental works on this subject are of course due to Łojasiewicz [31] and Kurdyka [32].
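
For instance (an illustration added here, not taken from the references), the function \(f(x)=x^2\) on \(\mathbb {R}\) satisfies the property at the critical point \(\bar{x}=0\) with \(\eta =+\infty \), \(U=\mathbb {R}\) and desingularizing function \(\varphi (s)=\sqrt{s}\): for every \(x\ne 0\),

$$\begin{aligned} \varphi '(f(x)-f(\bar{x}))\,||\hbox {grad}\, f(x)||=\frac{1}{2\sqrt{x^2}}\,|2x|=1\ge 1. \end{aligned}$$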

Remark 2.1

A \(C^2\) function f is said to be a Morse function if, at each critical point \(\bar{x}\) of f, the Hessian of f at \(\bar{x}\), denoted by \(\hbox {Hess}\, f(\bar{x})\), has all its eigenvalues different from zero. The class of Morse functions forms a dense and open set in the space of \(C^2\)-functions; see Hirsch [40, Theorem 1.2]. It is known that a Morse function satisfies the Kurdyka–Łojasiewicz property at every point with desingularizing function \(\varphi (t)=c\sqrt{t}\), for some \(c>0\); see [39, Theorem 3.8]. In particular, it follows from [24, Theorem 1.2] that every \(C^2\) strongly convex function satisfies the Kurdyka–Łojasiewicz property at every point. A very rich class of functions satisfying the Kurdyka–Łojasiewicz property is that of analytic functions defined on analytic manifolds; see [36].

Next, we recall the steepest descent method for solving a minimization problem

$$\begin{aligned} \min _{x\in M}f(x), \end{aligned}$$
(2)

where \(f:M\rightarrow \mathbb {R}\) is continuously differentiable with Lipschitz gradient. The steepest descent method generates a sequence as follows:

Steepest Descent Method (SDM)

Step 1. Choose \(x^0\in M\).

Step 2. Given \(x^k\), if \(x^k\) is a critical point of f, then set \(x^{k+p}=x^k\) for all \(p\in \mathbb {N}\). Otherwise, compute some stepsize \(t_k >0\) and take as the next iterate \(x^{k+1}\in M\) such that

$$\begin{aligned} x^{k+1}=\exp _{x^k}(-t_k\hbox {grad} f(x^k)). \end{aligned}$$
(3)

Note that from (3), we have

$$\begin{aligned} t_k||\hbox {grad} f(x^k)||=d(x^{k+1},x^{k}). \end{aligned}$$
(4)

Thus, if \(x^{k+1}=x^k\), then one can easily check that \(x^k\) is a critical point of f, since \(t_k>0\).
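
For illustration only (this sketch is ours and not part of the original presentation), the iteration (3) can be written in Python as follows; grad_f, exp_map, rnorm and stepsize are placeholder callables, and tangent vectors are assumed to be NumPy-like arrays supporting scalar multiplication.

```python
def sdm(x0, grad_f, exp_map, rnorm, stepsize, tol=1e-8, max_iter=1000):
    """Sketch of SDM: x^{k+1} = exp_{x^k}(-t_k grad f(x^k)), cf. (3).

    grad_f(x)         -- Riemannian gradient of f at x (a tangent vector at x)
    exp_map(x, v)     -- exponential map exp_x(v)
    rnorm(x, v)       -- Riemannian norm of the tangent vector v at x
    stepsize(k, x, g) -- stepsize rule returning t_k > 0
    """
    x = x0
    for k in range(max_iter):
        g = grad_f(x)
        if rnorm(x, g) < tol:      # x^k is (numerically) a critical point
            return x
        t = stepsize(k, x, g)
        x = exp_map(x, -t * g)     # the update (3)
    return x
```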

The next convergence result for the steepest descent method, in the context of the Kurdyka–Łojasiewicz inequality on Hadamard manifolds, is proved in [37, Theorem 7.1]; for the particular setting of the Łojasiewicz inequality, it can be found in [38, Theorem 2.3]. In [37], two possibilities for the stepsize rule are considered (an illustrative sketch of both rules is given right after the list):

  • (Armijo’s rule) Let \(\{t_k\}\) be a sequence obtained by

    $$\begin{aligned} t_k := \max \left\{ 2^{-j} : j\in {\mathbb N},\ f\left( \exp _{x^{k}}\left( -2^{-j}\,\hbox {grad}\, f(x^k)\right) \right) \le f(x^{k}) -\alpha \, 2^{-j}\,||\hbox {grad}\, f(x^{k})||^2 \right\} , \end{aligned}$$

    with \(\alpha \in ]0,1[\).

  • (Fixed stepsize rule) Let \(\delta _1,\delta _2>0\) be such that \(L\delta _1+\delta _2<1\), where L is the Lipschitz constant associated with the gradient map of f, and take \(\{t_k\}\) such that

    $$\begin{aligned} t_k\in \left]\delta _1,\frac{2}{L}(1-\delta _2)\right[, \quad \forall k\ge 0. \end{aligned}$$
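
As indicated above, both rules can be sketched in code. The following is an illustrative Python sketch (not from the original paper), with placeholder arguments f, exp_map and rnorm as in the SDM sketch above.

```python
def armijo_stepsize(x, g, f, exp_map, rnorm, alpha=0.25):
    # Largest t_k = 2^{-j} satisfying the Armijo condition, found by backtracking.
    fx, g2 = f(x), rnorm(x, g) ** 2
    t = 1.0
    while f(exp_map(x, -t * g)) > fx - alpha * t * g2:
        t *= 0.5
    return t

def fixed_stepsize(L, delta1, delta2):
    # Any constant t_k in ]delta1, (2/L)(1 - delta2)[, with L*delta1 + delta2 < 1;
    # here, for concreteness, the midpoint of the interval.
    assert L * delta1 + delta2 < 1
    return 0.5 * (delta1 + (2.0 / L) * (1.0 - delta2))

# Usage with the SDM sketch above, e.g.:
# sdm(x0, grad_f, exp_map, rnorm,
#     stepsize=lambda k, x, g: armijo_stepsize(x, g, f, exp_map, rnorm))
```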

Theorem 2.1

Assume that the sequence \(\{x^k\}\) generated by SDM has an accumulation point \(x^*\in M\) and that f satisfies the Kurdyka–Łojasiewicz inequality at \(x^*\). Then, \(\lim _{k\rightarrow +\infty }f(x^k)=f(x^*)\) and \(\{x^k\}\) converges to \(x^*\), which is a critical point of f.

Remark 2.2

In the previous result, M is supposed to be a Hadamard manifold. However, the result also holds on any set where uniqueness of minimizing geodesics holds, for example on convex balls (without any assumption on the sign of the curvature); see [37, Remark 7.2]. Therefore, the results presented in the sequel also hold in this context. We choose to work with Hadamard manifolds just for the sake of simplicity.

3 Computing Riemannian \(L^p\) Center of Mass on Hadamard Manifolds

Consider the problem of computing the (global) Riemannian \(L^p\) center of mass of a data set \(\{a_i\}_{i=1}^{n}\subset M\) on a Hadamard manifold with respect to weights \(0\le w_i \le 1\) such that \(\sum _{i=1}^{n}w_i=1\). The Riemannian \(L^p\) center of mass is defined as the solution set of the following problem:

$$\begin{aligned} \min _{x\in M}f_p(x):=\frac{1}{p}\sum _{i=1}^{n}w_id^p(x,a_i), \end{aligned}$$
(5)

for \(1\le p < \infty \). If \(p=\infty \), the center of mass is defined as the set of minimizers of \(\max _i d(x,a_i)\) in M. As a convention, when referring to the center of mass of some data points, we usually do not mention the weights explicitly unless needed. In this paper, our focus is limited to the cases \(p=1\) and \(p=2\), which are the most commonly used.

As mentioned before, the problem of computing the Riemannian center of mass has been extensively studied in both theory and applications. In [13], the authors investigate the convergence of constant stepsize gradient descent algorithms for computing the \(L^2\) center of mass on Riemannian manifolds with curvature bounded from above and below. In [3, 4], incremental gradient methods are studied for computing the \(L^2\) center of mass in the context of positive definite symmetric matrices, a particular instance of a Hadamard manifold. Therefore, in some sense, our results generalize the previously mentioned works. The main novelty of this paper, in relation to [13], is the extension of the convergence analysis of the steepest descent method to the computation of the \(L^1\) center of mass. In this case, the main challenge is that the underlying cost function is not globally differentiable. Recently, Bačák [12] applied the proximal point method for computing the \(L^1\) and \(L^2\) centers of mass in a more general nonlinear setting, namely CAT(0) spaces, of which Hadamard manifolds are a particular instance. We provide an alternative way to solve this problem by using the steepest descent method, one of the most popular and simplest algorithms for differentiable minimization problems.

One can check that

$$\begin{aligned} M\ni x\longmapsto \hbox {grad}\, f_p(x)=-\sum _{i=1}^{n}w_id^{p-2}(x,a_i)\exp _{x}^{-1}a_i, \end{aligned}$$
(6)

keeping in mind that, in the particular case where \(p<2\), x must be different from all of the data points.
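
As a quick sanity check, added here for illustration, in the Euclidean space \(\mathbb {R}^m\) (the simplest Hadamard manifold) one has \(\exp _x^{-1}a=a-x\) and \(d(x,a)=||a-x||\), so (6) reduces to the familiar gradients of the weighted Fermat–Weber (\(p=1\)) and least-squares (\(p=2\)) objectives; the helper name below is ours.

```python
import numpy as np

def grad_fp_euclidean(x, data, weights, p):
    # Formula (6) in R^m: exp_x^{-1} a_i = a_i - x and d(x, a_i) = ||a_i - x||.
    g = np.zeros_like(x)
    for a, w in zip(data, weights):
        diff = a - x
        g -= w * np.linalg.norm(diff) ** (p - 2) * diff   # needs x != a_i when p < 2
    return g

# For p = 2, the gradient vanishes at the weighted arithmetic mean of the data:
data = [np.array([0.0, 0.0]), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
print(grad_fp_euclidean(np.array([2.0 / 3, 2.0 / 3]), data, [1 / 3] * 3, p=2))
```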

Next, we recall a useful estimate about the Hessian of the Riemannian distance function. Assume that \(x,y\in M\) (y distinct from x) and \(t\mapsto \gamma (t)\) with \(\gamma (0)=x\) is a unit speed geodesic making, at x, an angle \(\beta \) with the (minimal) geodesic from y to x. It can be proved that (see, e.g., [41, pp. 152–154])

$$\begin{aligned} \frac{\sin ^2\beta }{d(x,y)}\le \frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2}d(\gamma (t),y)|_{t=0}. \end{aligned}$$
(7)

Recall that the points \(a_1,\ldots ,a_n\) are said to be collinear if they reside on the same geodesic, i.e., there exist \(y\in M\), \(v\in T_yM\) and \(t_i \in \mathbb {R}\), \(i=1,\ldots , n\), such that \(a_i=\exp _y t_i v\), for each \(i=1,\ldots , n\).

Proposition 3.1

Let \(\{a_i\}_{i=1}^{n}\subset M\) be the data set and let \(\gamma \) be a unit speed geodesic such that \(\gamma (0)=x\), where \(x\not =a_i\), for \(i=1,\ldots ,n\). Then, there exists a constant \(\alpha \ge 0\) such that

$$\begin{aligned} \alpha \le \frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2} (f_1\circ \gamma )(t)|_{t=0}. \end{aligned}$$

Furthermore, if the points \(a_1,\ldots ,a_n\) are not collinear, then \(\alpha >0\).

Proof

Let \(\beta _i\) be the angle that the unit speed geodesic \(\gamma \), with \(\gamma (0)=x\), makes with the geodesic \(\xi _i\) connecting the points x and \(a_i\). Applying (7) with \(y=a_i\), for each \(i=1,\ldots ,n\), and summing up these inequalities, we obtain

$$\begin{aligned} \sum _{i=1}^{n} \frac{\sin ^2\beta _i}{d(x,a_i)}\le \sum _{i=1}^{n}\frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2}d(\gamma (t),a_i)|_{t=0}. \end{aligned}$$
(8)

Set

$$\begin{aligned} \alpha =\frac{1}{\max _{i}d(x,a_i)}\sum _{i=1}^{n} \sin ^2\beta _i\ge 0. \end{aligned}$$

Thus, from (8), we obtain

$$\begin{aligned} \alpha \le \frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2} (f_1\circ \gamma )(t)|_{t=0}. \end{aligned}$$

Now, if the points \(a_1,\ldots ,a_n\) are not collinear, then there exists at least one index \(i_0\in \{1,\ldots ,n\}\) such that \(\sin ^2 \beta _{i_0} >0\), which implies that \(\alpha >0\). \(\square \)

Before analyzing the convergence of an algorithm for computing Riemannian \(L^1\) center of mass, we state and prove some preliminary results.

Proposition 3.2

The following statements hold:

  (a) The function \(f_1(x)=\sum _{i=1}^{n}w_id(x,a_i)\) is convex;

  (b) The problem (5), for \(p=1\), always has a solution. Furthermore, if the points \(a_1,\ldots ,a_n\) are not collinear, then the solution is unique;

  (c) Let \(i_0\in \{1,\ldots ,n\}\) be an index such that \(f_1(a_{i_0})=\min _{i=1,\ldots ,n}f_1(a_i)\). Then, \(a_{i_0}\) is a minimizer of \(f_1\) on M if and only if

    $$\begin{aligned} \left| \left| \sum _{i=1 , i\not ={i_0}}^{n}w_i\frac{\exp _{a_{i_0}}^{-1}a_{i}}{d(a_i,a_{i_0})}\right| \right| \le w_{i_0}. \end{aligned}$$

Proof

  (a) The proof follows from the convexity of the distance function and properties of convex functions; see, for instance, [17, Lemma 5.2] and [24, Theorem 3.3].

  (b) Since \(f_1\) is a continuous and coercive function bounded from below, \(f_1\) has a minimizer. From the second part of Proposition 3.1, we have that the eigenvalues of \(\hbox {Hess}\, f_1(x)\) are positive, which implies that \(f_1\) is strictly convex. Thus, \(f_1\) has a unique minimizer.

  (c) Since \(f_1\) is convex, we have that \(a_{i_0}\) is a minimizer of \(f_1\) if and only if \(0\in \partial f_1(a_{i_0})\). On the other hand, combining (6) with [17, Theorem 5.3], we obtain

$$\begin{aligned} \partial f_1(a_{i_0})=\sum _{i=1 , i\not ={i_0}}^{n}w_i\frac{\exp _{a_{i_0}}^{-1}a_{i}}{d(a_i,a_{i_0})}+w_{i_0}\mathbb {B}_{a_{i_0}}, \end{aligned}$$

where \(\mathbb {B}_{a_{i_0}}\) denotes the closed unit ball of the tangent space \(T_{a_{i_0}}M\). Thus, \(a_{i_0}\) is a minimizer of \(f_1\) if and only if there exists \(\theta _{i_0}\in \mathbb {B}_{a_{i_0}}\) such that

$$\begin{aligned} \sum _{i=1,i\not ={i_0}}^{n}w_i\frac{\exp _{a_{i_0}}^{-1}a_{i}}{d(a_i,a_{i_0})}=-w_{i_0}\theta _{i_0}, \end{aligned}$$

and the assertion follows by taking norms on both sides of the last equation, having in mind that \(||\theta _{i_0}||\le 1\).\(\square \)
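
To make the criterion in item (c) concrete, here is a small illustrative check in the Euclidean case, where \(\exp _{a_{i_0}}^{-1}a_i=a_i-a_{i_0}\); the helper name is ours.

```python
import numpy as np

def anchor_is_l1_center(data, weights, i0):
    # Criterion of Proposition 3.2 (c) in R^m: a_{i0} minimizes f_1 iff
    # || sum_{i != i0} w_i (a_i - a_{i0}) / ||a_i - a_{i0}|| || <= w_{i0}.
    a0 = data[i0]
    s = sum(w * (a - a0) / np.linalg.norm(a - a0)
            for i, (a, w) in enumerate(zip(data, weights)) if i != i0)
    return np.linalg.norm(s) <= weights[i0]

# A data point carrying weight 1/2 here is already an L^1 center of mass:
data = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(anchor_is_l1_center(data, [0.5, 0.25, 0.25], i0=0))   # True
```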

Remark 3.1

We conclude directly from Proposition 3.1 that if the points of the data set \(\{a_i\}_{i=1}^{n}\) are not collinear, then the Hessian of the function \(f_1(x)=\sum _{i=1}^{n}w_id(x,a_i)\) has all its eigenvalues positive, which means that \(f_1\) is a Morse function. Thus, from [39, Theorem 3.8], we have that \(f_1\) satisfies the Kurdyka–Łojasiewicz property at every point.

Proposition 3.3

Let C be a compact set such that \(a_i\notin C\), for each \(i=1,\ldots ,n\). Then, the vector field \(\hbox {grad}\, f_1:M\rightarrow TM\) is Lipschitz continuous on C.

Proof

Since C is compact, the sectional curvature of M is bounded from above and below on C by \(\varDelta \) and \(\delta \), respectively. Take any \(x\in C\), whence \(x\not =a_i\) for all \(i=1,\ldots ,n\), and let \(\gamma (t)\) be a unit speed geodesic, with \(\gamma (0)=x\), making at x an angle \(\beta _i\) with the minimal geodesic from \(a_i\) to x. From Proposition 3.1, there exists a constant \(\alpha \ge 0\) such that

$$\begin{aligned} \alpha \le \frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2} (f_1\circ \gamma )(t)|_{t=0}. \end{aligned}$$

On the other hand, since M is a Hadamard manifold, we have that \(\delta \le 0\). It follows from [41, pp. 152–154] that

$$\begin{aligned} \frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2} (f_1\circ \gamma )(t)|_{t=0}\le \sum _{i=1}^{n} ct_{\delta }(d(x,a_i))\sin ^2\beta _i, \end{aligned}$$
(9)

where \(ct_{\delta }(l)=\displaystyle \frac{1}{l}\), if \(\delta =0\), and \(ct_{\delta }(l)=\sqrt{|\delta |}\coth (\sqrt{|\delta |}l)\), if \(\delta <0\). Since \(a_i\notin C\), for each \(i=1,\ldots ,n\), and C is compact, the distances \(d(x,a_i)\) are positive and bounded away from zero on C, so the right-hand side of (9) has an upper bound. Thus, the Hessian of \(f_1\) at every \(x\in C\) is bounded, which means that \(\hbox {grad}\, f_1\) is Lipschitz continuous on C.\(\square \)

Now, we present a version of the steepest descent method for computing the Riemannian \(L^1\) center of mass of the data set \(\{a_i\}_{i=1}^{n}\subset M\). To this end, we denote by q the index in \(\{1,\ldots ,n\}\) such that \(f_1(a_q)=\min _{i=1,\ldots ,n}f_1(a_i)\). We also assume that \(a_q\) is not a solution (Riemannian \(L^1\) center of mass). Next, we consider a sequence \(\{x^k\}\) generated by SDM for computing the Riemannian \(L^1\) center of mass, i.e., taking \(f=f_1\). Note that for any \(x^0\in M\), we have \(x^k\in L_{f_1}(x^0)\), for all \(k \in \mathbb {N}\), where \(L_{f_1}(x^0):=\{x\in M: f_1(x)\le f_1(x^0)\}\) is a nonempty and compact sublevel set. Then, we take a direction \(d_q\in T_{a_q}M\) and \(t_q>0\) small enough such that \(f_1(\exp _{a_q}t_q d_q)<f_1(a_q)\). Setting \(x^0:=\exp _{a_q}t_q d_q\), we have \(a_i\notin L_{f_1}(x^0)\), for each \(i=1,\ldots ,n\).

Theorem 3.1

The sequence \(\{x^k\}\) converges to the unique Riemannian \(L^1\) center of mass of the data set \(\{a_i\}_{i=1}^{n}\) as long as the points \(a_i\), for \(i=1,\ldots ,n\), are not collinear.

Proof

Since \(x^k\in L_{f_1}(x^0)\), for all \(k \in \mathbb {N}\), and \(L_{f_1}(x^0)\) is a compact set, we have that \(\{x^k\}\) is bounded. Let \(\hat{x}\) be a cluster point of \(\{x^k\}\). Since the points \(a_i\), for \(i=1,\ldots ,n\), are not collinear, from Remark 3.1 we have that \(f_1(x)=\sum _{i=1}^{n}w_id(x,a_i)\) satisfies the Kurdyka–Łojasiewicz inequality at \(\hat{x}\). Thus, from Theorem 2.1, we obtain that \(\{x^k\}\) converges to \(\hat{x}\), which is a critical point of \(f_1\). From Proposition 3.2 (a), we have that \(f_1\) is convex, which means that \(\hat{x}\) is a minimizer of \(f_1\). From Proposition 3.2 (b), such a minimizer is unique. This completes the proof.\(\square \)

Before analyzing the convergence of an algorithm for computing Riemannian \(L^2\) center of mass, we state and prove some preliminary results.

Proposition 3.4

The following statements hold:

  (a) The function \(f_2(x)=\frac{1}{2}\sum _{i=1}^{n}w_id^2(x,a_i)\) is strictly convex and continuously differentiable, with its gradient Lipschitz on compact sets;

  (b) The problem of computing the Riemannian \(L^2\) center of mass always has a unique solution.

Proof

  (a) The strict convexity and continuous differentiability of \(f_2\) can be found, for instance, in [24, page 111]. Let \(C\subset M\) be a compact set. Then, the sectional curvature of M is bounded from above and below on C by \(\varDelta \) and \(\delta \), respectively. Take any \(x\in C\) and let \(\gamma (t)\) be a unit speed geodesic, with \(\gamma (0)=x\), making at x an angle \(\beta _i\) with the minimal geodesic from \(a_i\) to x. One can verify from [41, pp. 152–154] that

    $$\begin{aligned} \frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2} (f_2\circ \gamma )(t)|_{t=0}\le \sum _{i=1}^{n} c_{\delta }(d(x,a_i))\sin ^2\beta _i, \end{aligned}$$
    (10)

    where \(c_{\delta }(l)=1\), if \(\delta =0\), and \(c_{\delta }(l)=\sqrt{|\delta |}\coth (\sqrt{|\delta |}l)\), if \(\delta <0\). Therefore, an upper bound for the right-hand side of (10) follows from the fact that C is compact. From [42, Corollary 3.1], we have that \(d^2(\cdot ,z)\) is a strongly convex function, and hence

    $$\begin{aligned} 0 \le \frac{{\mathrm{d}}^2}{{\mathrm{d}}t^2} (f_2\circ \gamma )(t)|_{t=0}. \end{aligned}$$

    Thus, the Hessian of \(f_2\) at every \(x\in C\) is bounded, which means that \(\hbox {grad}\, f_2\) is Lipschitz continuous on C.

  (b) Since \(f_2\) is continuous and coercive, it has a minimizer. The uniqueness follows from the strict convexity of \(f_2\). \(\square \)

Remark 3.2

As already mentioned in the Introduction, in [11] the authors dealt with problem (5). It is worth mentioning that, among other things, the authors studied the existence of solutions in the \(L^1\) case, investigated in Proposition 3.2 (b), and in the \(L^2\) case, dealt with in Proposition 3.4 (b). We chose to keep a proof of each of these items for the sake of completeness.

It follows from [42, Corollary 3.1] that \(d^2(\cdot ,z)\) is a strongly convex function. Since a finite sum of strongly convex functions is still strongly convex, from Remark 2.1 one has the following result:

Proposition 3.5

The function \(f_2(x)=\frac{1}{2}\sum _{i=1}^{n}w_id^2(x,a_i)\) satisfies the Kurdyka–Łojasiewicz inequality at every point of M.

Next, we consider the sequence \(\{x^k\}\) generated by SDM for computing the Riemannian \(L^2\) center of mass, i.e., taking \(f=f_2\). Similarly to the case \(p=1\), for any \(x^0\in M\) we have \(x^k\in L_{f_2}(x^0)\), for all \(k \in \mathbb {N}\), and \(L_{f_2}(x^0)\) is a nonempty and compact set. Then, we take a direction \(d_q\) and \(t_q>0\) small enough such that \(f_2(\exp _{a_q}t_q d_q)<f_2(a_q)\), where q denotes the index in \(\{1,\ldots ,n\}\) such that \(f_2(a_q)=\min _{i=1,\ldots ,n}f_2(a_i)\). Setting \(x^0:=\exp _{a_q}t_q d_q\), we have \(a_i\notin L_{f_2}(x^0)\), for each \(i=1,\ldots ,n\).

Theorem 3.2

The sequence \(\{x^k\}\) converges to the unique Riemannian \(L^2\) center of mass of the data set \(\{a_i\}_{i=1}^{n}\).

Proof

Since \(x^k\in L_{f_2}(x^0)\), for all \(k \in \mathbb {N}\), and \(L_{f_2}(x^0)\) is a compact set, we have that \(\{x^k\}\) is bounded. Let \(\hat{x}\) be a cluster point of \(\{x^k\}\). From Proposition 3.5, we have that \(f_2(x)=\frac{1}{2}\sum _{i=1}^{n}w_id^2(x,a_i)\) satisfies the Kurdyka–Łojasiewicz inequality at \(\hat{x}\). Thus, from Theorem 2.1, we obtain that \(\{x^k\}\) converges to \(\hat{x}\), which is a critical point of \(f_2\). Proposition 3.4 (a) implies that \(f_2\) is strictly convex, which means that \(\hat{x}\) is the unique minimizer of \(f_2\), and the proof is completed. \(\square \)

4 Numerical Experiments

This section is devoted to some numerical experiments. We coded our simulations in MATLAB on a machine with an AMD Athlon(tm) X2 Dual Core Processor at 1.20 GHz.

Let \({\mathbb S}^{m}\) be the set of symmetric matrices, \({\mathbb S}^{m}_{+}\) the cone of symmetric positive semi-definite matrices and \({\mathbb S}^{m}_{++}\) the cone of symmetric positive definite matrices, all of size \(m\times m\). For \(X, \, Y\in {\mathbb S}^{m}_{+}\), \(Y\succeq X\) (or \(X \preceq Y\)) means that \(Y-X \in {\mathbb S}^{m}_{+}\), and \(Y\succ X\) (or \(X \prec Y\)) means that \(Y-X \in {\mathbb S}^{m}_{++}\). Let \(M:=({\mathbb S}^m_{++}, \langle \, , \, \rangle )\) be the Riemannian manifold endowed with the Riemannian metric induced by the Euclidean Hessian of \(\varPsi (X)=-\ln \det X\),

$$\begin{aligned} \langle U,V \rangle =\hbox {tr}\, (V\varPsi ''(X)U)=\hbox {tr}\, (VX^{-1}UX^{-1}),\quad X\in M, \quad U,V\in T_XM, \end{aligned}$$
(11)

where \(\hbox {tr}\,(A)\) denotes the trace of matrix \(A\in {\mathbb S}^m\) and \(T_XM\approx ~\mathbb {S}^m\), with the corresponding norm denoted by \(\Vert \; .\; \Vert \); see Rothaus [43]. In this case, for any \(X,Y\in M\) the unique geodesic joining those two points is given by:

$$\begin{aligned} \gamma (t)=X^{1/2}\left( X^{-1/2}YX^{-1/2}\right) ^tX^{1/2},\qquad t\in [0,1]; \end{aligned}$$

see, for instance, [44]. Moreover, M is a Hadamard manifold; see, for example, [10, Theorem 1.2., page 325]. One can compute the curvature of M and verify that M has nonconstant curvature; see [45, page 428]. From the above equality, it is immediate that

$$\begin{aligned} \gamma '(0)=X^{1/2}\ln \left( X^{-1/2}YX^{-1/2}\right) X^{1/2}. \end{aligned}$$

Thus, for each \(X\in M\), \(\exp _X:T_XM\rightarrow M\) and \(\exp ^{-1}_X:M\rightarrow T_XM\) are given, respectively, by

$$\begin{aligned} \exp _XV=X^{1/2}e^{\left( X^{-1/2}VX^{-1/2}\right) }X^{1/2},\;\; \exp ^{-1}_XY=X^{1/2}\ln \left( X^{-1/2}YX^{-1/2}\right) X^{1/2}. \end{aligned}$$
(12)

Now, since the Riemannian distance d is given by \(d({X},Y)=\,||\exp ^{-1}_{X}Y||\), from (11) along with the second expression in (12), we conclude

$$\begin{aligned} d^2(X,Y)=\hbox {tr}\,\left( \ln ^2 X^{-1/2}YX^{-1/2}\right) =\sum _{i=1}^{m}\ln ^2\lambda _i\left( X^{-\frac{1}{2}}YX^{-\frac{1}{2}}\right) , \end{aligned}$$
(13)

where \(\lambda _i(X^{-\frac{1}{2}}YX^{-\frac{1}{2}})\) denotes the ith eigenvalue of the matrix \(X^{-\frac{1}{2}}YX^{-\frac{1}{2}}\).
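
The maps in (12), the distance (13) and the norm induced by (11) can be implemented directly from eigendecompositions. The following Python/NumPy sketch is an illustrative reimplementation (the original experiments were coded in MATLAB, and all function names below are ours):

```python
import numpy as np

def _spd_fun(A, fun):
    # Apply a scalar function to the eigenvalues of a symmetric matrix A.
    w, V = np.linalg.eigh(A)
    return (V * fun(w)) @ V.T

def exp_map(X, V):
    # exp_X V = X^{1/2} exp(X^{-1/2} V X^{-1/2}) X^{1/2}, cf. (12); V is symmetric.
    Xs, Xis = _spd_fun(X, np.sqrt), _spd_fun(X, lambda w: 1.0 / np.sqrt(w))
    return Xs @ _spd_fun(Xis @ V @ Xis, np.exp) @ Xs

def log_map(X, Y):
    # exp_X^{-1} Y = X^{1/2} log(X^{-1/2} Y X^{-1/2}) X^{1/2}, cf. (12).
    Xs, Xis = _spd_fun(X, np.sqrt), _spd_fun(X, lambda w: 1.0 / np.sqrt(w))
    return Xs @ _spd_fun(Xis @ Y @ Xis, np.log) @ Xs

def dist(X, Y):
    # d(X, Y)^2 = sum_i ln^2 lambda_i(X^{-1/2} Y X^{-1/2}), cf. (13).
    Xis = _spd_fun(X, lambda w: 1.0 / np.sqrt(w))
    lam = np.linalg.eigvalsh(Xis @ Y @ Xis)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def rnorm(X, V):
    # Norm of a tangent vector V at X induced by the metric (11): tr(V X^{-1} V X^{-1})^{1/2}.
    Xi = np.linalg.inv(X)
    return np.sqrt(np.trace(V @ Xi @ V @ Xi))
```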

The numerical experiments aim to verify the efficiency of the gradient method in locating the \(L^1\) and \(L^2\) centers of mass on Hadamard manifolds. In our simulations, we consider different scenarios taking into account three parameters: the number of matrices n in the data set \(\{Q_i\}_{i=1}^{n}\), the size \(m\times m\) of the matrices and the stopping tolerance \(\varepsilon > 0\). The random matrices we use for our tests are generated with a uniform (well-conditioned) and a non-uniform (ill-conditioned) distribution of the eigenvalues of each matrix of the data set. In the non-uniform case, each matrix satisfies \(\displaystyle \frac{\lambda _{\max }}{\lambda _{\min }}>10^2\), where \(\lambda _{\max }\) and \(\lambda _{\min }\) denote its largest and smallest eigenvalues, respectively. Table 1 reports the number of iterations, CPU time in seconds and gradient norm of the sequence generated by SDM for finding the \(L^2\) center of mass of the data set \(\{Q_i\}_{i=1}^{n}\) with \(m\times m\) matrices. The constant stepsize \(\{t_k\}\) satisfying the fixed stepsize rule and the weights \(\{w_i\}_{i=1}^{n}\) were taken as \(t_k=w_i=\frac{1}{n}\), for all \(k\in \mathbb {N}\) and \(i=1,\ldots ,n\), with \(\delta _1=\frac{1}{2n}\) and \(\delta _2=\frac{1}{4}\). In these simulations, we use the stopping rule \(d(x^{k+1},x^k)<10^{-8}\) (Figs. 1, 2).
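
Building on the sketch above, and again only as an illustration of the procedure described here rather than the authors' MATLAB code, the \(L^2\) run with constant stepsize \(t_k=1/n\) might look as follows; the data generator is a simplified stand-in.

```python
import numpy as np

def random_spd(m, rng):
    # A simplified generator of a random m x m symmetric positive definite matrix.
    A = rng.standard_normal((m, m))
    return A @ A.T + m * np.eye(m)

def l2_center_of_mass(Q, w, tol=1e-8, max_iter=5000):
    # SDM for f_2 with the constant stepsize t_k = 1/n and weights w_i.
    n = len(Q)
    X = Q[0]
    for _ in range(max_iter):
        # grad f_2(X) = - sum_i w_i exp_X^{-1} Q_i, cf. (6) with p = 2.
        G = -sum(wi * log_map(X, Qi) for wi, Qi in zip(w, Q))
        X_next = exp_map(X, -(1.0 / n) * G)
        if dist(X_next, X) < tol:          # stopping rule d(x^{k+1}, x^k) < tol
            return X_next
        X = X_next
    return X

rng = np.random.default_rng(0)
Q = [random_spd(3, rng) for _ in range(5)]
Xbar = l2_center_of_mass(Q, w=[1.0 / 5] * 5)
```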

Table 1 Computing \(L^2\) center of mass

Fig. 1 Non-uniform distribution

Fig. 2 Uniform distribution

We point out the substantial reduction in the number of iterations when passing from the non-uniform to the uniform distribution of the eigenvalues.

Next, we run SDM for computing the \(L^1\) center of mass. The main challenge is that the underlying function is often not globally differentiable, and despite this one would like to guarantee the effectiveness of the method. Therefore, before starting the method, we determine a direction \(d_q\) and \(t_q\) small enough such that \(f(\exp _{a_q}t_q d_q)<f(a_q)\), where \(q\in \{1,\ldots ,n\}\) satisfies \(f(a_q)=\min _{i=1,\ldots ,n}f(a_i)\), and set \(x^0=\exp _{a_q}t_q d_q\). Thus, we avoid the points of non-differentiability of the function under consideration. Table 2 reports the number of iterations, CPU time in seconds and gradient norm of the sequence generated by SDM for finding the \(L^1\) center of mass of the data set \(\{Q_i\}_{i=1}^{n}\) with \(m\times m\) matrices, using the stepsize given by Armijo's rule. We use the stopping rule \(d(x^{k+1},x^k)<10^{-6}\).
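
In the same illustrative spirit (all helper names are ours, and the routines log_map, exp_map, dist and rnorm come from the sketches above), the \(L^1\) run with the initialization just described and Armijo backtracking can be sketched as:

```python
def f1(X, Q, w):
    return sum(wi * dist(X, Qi) for wi, Qi in zip(w, Q))

def grad_f1(X, Q, w):
    # grad f_1(X) = - sum_i w_i exp_X^{-1} Q_i / d(X, Q_i), cf. (6) with p = 1.
    return -sum(wi * log_map(X, Qi) / dist(X, Qi) for wi, Qi in zip(w, Q))

def l1_center_of_mass(Q, w, alpha=0.25, tol=1e-6, max_iter=1000):
    # Initialization: start from a_q, the data point with smallest f_1 value, and
    # move slightly along a descent direction so the iterates avoid the points Q_i,
    # where f_1 is not differentiable (assumes a_q is not already a solution).
    n = len(Q)
    q = min(range(n), key=lambda i: f1(Q[i], Q, w))
    d_q = sum(w[i] * log_map(Q[q], Q[i]) / dist(Q[q], Q[i])
              for i in range(n) if i != q)   # descent direction by Proposition 3.2 (c)
    t_q = 1.0
    while f1(exp_map(Q[q], t_q * d_q), Q, w) >= f1(Q[q], Q, w) and t_q > 1e-12:
        t_q *= 0.5
    X = exp_map(Q[q], t_q * d_q)
    for _ in range(max_iter):
        G = grad_f1(X, Q, w)
        fX, g2 = f1(X, Q, w), rnorm(X, G) ** 2
        t = 1.0                              # Armijo backtracking, cf. Sect. 2
        while f1(exp_map(X, -t * G), Q, w) > fX - alpha * t * g2:
            t *= 0.5
        X_next = exp_map(X, -t * G)
        if dist(X_next, X) < tol:            # stopping rule d(x^{k+1}, x^k) < tol
            return X_next
        X = X_next
    return X
```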

Table 2 Computing \(L^1\) center of mass

Comparing Tables 1 and 2, we note that, even with a uniform distribution of the eigenvalues, computing the \(L^1\) center of mass demands more effort, in both number of iterations and CPU time, than computing the \(L^2\) center of mass. The next figures plot side by side the number of iterations and the CPU time needed for computing the \(L^1\) and \(L^2\) centers of mass of a data set. In this simulation, we consider the eight scenarios presented in Table 2 for the values of n and m (Figs. 3, 4).

Fig. 3 Non-uniform distribution

Fig. 4 Uniform distribution

5 Conclusions

In this paper, we have considered the steepest descent method for computing the Riemannian center of mass on Hadamard manifolds. As mentioned in Remark 2.2, we chose to work with Hadamard manifolds just for the sake of simplicity, because our results also hold on any set where the uniqueness of minimizing geodesics holds, for example on convex balls (without any assumption on the sign of the curvature). Some numerical experiments computing the \(L^1\) and \(L^2\) centers of mass in the context of positive definite symmetric matrices were presented, using two different stepsize rules.