1 Introduction

This paper is concerned with a characterization of the solution sets of convex optimization problems on Riemannian manifolds. Characterizations of the solution sets of nonlinear programming problems with multiple solutions, provided that one minimizer is known, play an important role in many fields, such as optimization problems, variational inequalities, and equilibrium problems. Mangasarian [16] presented several characterizations of the solution sets of differentiable convex programs on linear spaces and applied them to the study of monotone linear complementarity problems. Further investigation was carried out by Burke and Ferris [3]. In the last decade, Mangasarian-type characterizations have been derived for several smooth and nonsmooth convex or generalized convex problems on linear spaces; see [12, 13, 23] and the references therein.

Extensions of concepts and techniques from Euclidean spaces to Riemannian manifolds are natural and lead to successful tools in optimization; such topics, with both practical and theoretical purposes, have therefore been the subject of several research papers. Udriste [22] and Rapcsák [17] introduced the theory of convex functions on Riemannian manifolds, motivated by the fact that some constrained optimization problems can be seen as unconstrained ones from the Riemannian geometry point of view. Another advantage is that optimization problems with nonconvex objective functions can be rewritten as convex optimization problems by endowing the space with an appropriate Riemannian metric (see Example 2.1). Recently, a number of important results have been obtained on various aspects of optimization theory and applications on Riemannian manifolds, introducing several important techniques and methods for the existence of solutions of optimization problems on Riemannian manifolds; see [1, 2, 8, 20].

The purpose of this paper is to present a simple characterization of the solution sets of convex optimization problems on Riemannian manifolds in terms of the Riemannian gradient of the cost function. To the best of our knowledge, such a characterization has not been given before, and formulating and proving it on Riemannian manifolds requires several tools and techniques from Riemannian geometry. One application of this characterization is to the problem of minimizing convex quadratic functions defined on a convex subset of a sphere, which arises in proving fixed point theorems, surjectivity theorems, and existence theorems for complementarity problems and variational inequalities by calculating scalar derivatives; for more details about these theorems and their applications see [6] and the references therein. In particular, some existence theorems can be reduced to optimizing a quadratic function on a convex subset of the sphere. Moreover, minimization problems for quadratic functions defined on the sphere occur as subproblems in methods of nonlinear programming; see [18].

2 Preliminaries

In this section, we introduce some fundamental properties and notations of Riemannian manifolds. These basic facts can be found in any introductory book on Riemannian geometry; see for example [19]. Throughout this paper, M is an n-dimensional Riemannian manifold with a Riemannian metric \(\langle \cdot , \cdot \rangle _{x}\) on the tangent space \(T_{x}M\cong {{\mathbb {R}}}^{n}\) for every \(x\in M\). The corresponding norm is denoted by \(\Vert \cdot \Vert \). Let us recall that the length of a piecewise \(C^{1}\) curve \(\gamma :[a,b]\rightarrow M\) is defined by

$$\begin{aligned} L(\gamma )=\int \limits ^{b}_{a}\Vert \gamma ^{\prime }(t)\Vert \,dt. \end{aligned}$$

By minimizing the length functional over the set of all piecewise \(C^{1}\) curves with \(\gamma (a)=x\) and \(\gamma (b)=y\) for \(x, y \in M\), we obtain a Riemannian distance on M denoted by d(x, y). The space of vector fields on M is denoted by \({\mathcal X}(M)\) and \(\nabla \) is the Levi-Civita connection associated to M. A geodesic is a smooth curve \(\gamma \) satisfying the equation \(\nabla _{\gamma ^{\prime }(t)} \gamma ^{\prime }(t)=0\). The exponential mapping \( \exp : {\tilde{T}} M \rightarrow M\) is defined as \(\exp (v) = \gamma (1)\), where \(\gamma \) is the geodesic determined by its starting point x and the velocity \(\gamma ^{\prime }(0)=v\) at x, and \({\tilde{T}} M\) is an open neighborhood in TM. The restriction of \(\exp \) to \(T_{x}M\) in \({\tilde{T}} M\) is denoted by \(\exp _{x}\) for every \(x\in M\). For a minimizing geodesic \(\gamma :[0,1]\rightarrow M\) connecting x to y in M, and for a vector \(v \in T_xM\), there is a unique parallel vector field P along \(\gamma \) such that \(P(0)=v\); it is called the parallel translation of v along \(\gamma \). The mapping \( T_xM\ni v\mapsto P(1)\in T_yM\) is a linear isometry from \(T_xM\) onto \(T_yM\), denoted by \(P_{x}^{y}\).

The Riemannian metric induces a map \(f \mapsto \mathrm{grad}\, f \in {\mathcal X}(M)\) which assigns to each differentiable function f its gradient at \(x\in M\) via the rule

$$\begin{aligned} \langle \mathrm{grad}\, f( x),v \rangle _{x}= df(v)=\frac{d}{dt}f(\exp _{x}(tv))|_{t=0}, \quad v\in T_{x}M. \end{aligned}$$

The Riemannian Hessian of f at a point \(x\in M\) is the linear mapping

$$\begin{aligned} \mathrm{Hess}\, f(x): T_{x}M\rightarrow T_{x}M \end{aligned}$$

defined by

$$\begin{aligned} \mathrm{Hess}\, f(x)[v]=\nabla _{v}~\mathrm{grad}\, f \end{aligned}$$

for every \(v\in T_{x}M.\) Note that \(\mathrm{Hess}\,f(x)\) satisfies

$$\begin{aligned} \langle \mathrm{Hess}\, f(x)[v],v\rangle _{x}=\frac{d^{2}}{dt^{2}}f(\exp _{x}(tv))|_{t=0}, \quad v\in T_{x}M, \end{aligned}$$

and, since \(\mathrm{Hess}\, f(x)\) is self-adjoint, this formula fully determines \(\mathrm{Hess}\, f(x)\) by polarization.

A subset S of a Riemannian manifold is called convex if any two points x, \(y\in S\) can be joined by a unique minimizing geodesic (denoted by \(\gamma _{xy}\)) which lies entirely in S. Note that there is little consistency in the meanings attached to the terms “convex set” and “strongly convex set” (see page 105 in [9] and page 2488 in [15] and references therein).

It is known that \(\exp _{x}^{-1}\) is well-defined on a convex set S for every \(x\in S\), that \(d(x, y)=\Vert \exp _{x}^{-1}(y)\Vert \) for every \(x, y\in S\), and that

$$\begin{aligned} \gamma _{xy}(t)=\exp _{x}(t\exp _{x}^{-1}y)\quad \text {for all} ~t\in [0,1]; \end{aligned}$$

see [11]. Let S be a nonempty convex subset of M. A function \(f: S\rightarrow {\mathbb {R}}\) is said to be convex if for every x, \(y\in S\) and every \(t\in [0,1]\),

$$\begin{aligned} f(\gamma _{xy}(t))\le (1-t)f(x)+tf(y). \end{aligned}$$

The following example illustrates a nonconvex function that becomes convex when \({{\mathbb {R}}}^{2}\) is endowed with an appropriate Riemannian metric (see [4]).

Example 2.1

The function \(f: {{\mathbb {R}}}^{2}\rightarrow {\mathbb {R}}\) defined by

$$\begin{aligned} f(x)= e^{x_{1}}(\cosh (x_{2})-1),\quad x=(x_{1},x_{2}), \end{aligned}$$

is not convex. Endowing \({{\mathbb {R}}}^{2}\) with the metric \(g(x):=\mathrm{diag}(1,e^{2x_{1}})\), we obtain a Riemannian manifold \(M_{g}\). The Riemannian Hessian matrix of f on \(M_{g}\),

$$\begin{aligned} \mathrm{Hess}\, f(x)=\mathrm{diag}(e^{x_{1}}(\cosh (x_{2})-1),e^{x_{1}}\cosh (x_{2})+e^{3x_{1}}(\cosh (x_{2})-1)), \end{aligned}$$

is positive semidefinite; therefore f is convex on \(M_{g}\).
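These Hessian computations are easy to check numerically. The following is a minimal sketch (ours, not from [4]; the function names are ad hoc) that evaluates both Hessians, confirming that the Riemannian Hessian is positive semidefinite everywhere while the Euclidean Hessian has a negative eigenvalue off the axis \(x_{2}=0\):

```python
import numpy as np

def euclidean_hessian(x1, x2):
    """Euclidean Hessian of f(x) = e^{x1} (cosh(x2) - 1)."""
    return np.array([
        [np.exp(x1) * (np.cosh(x2) - 1), np.exp(x1) * np.sinh(x2)],
        [np.exp(x1) * np.sinh(x2),       np.exp(x1) * np.cosh(x2)],
    ])

def riemannian_hessian(x1, x2):
    """Riemannian Hessian of f on M_g with g = diag(1, e^{2 x1}); diagonal, as above."""
    return np.diag([
        np.exp(x1) * (np.cosh(x2) - 1),
        np.exp(x1) * np.cosh(x2) + np.exp(3 * x1) * (np.cosh(x2) - 1),
    ])

rng = np.random.default_rng(0)
for _ in range(1000):
    x1, x2 = rng.uniform(-2, 2, size=2)
    # f is convex on M_g: the Riemannian Hessian is positive semidefinite ...
    assert np.linalg.eigvalsh(riemannian_hessian(x1, x2)).min() >= -1e-12
# ... but f is not convex on R^2: the Euclidean Hessian has a negative eigenvalue.
print(np.linalg.eigvalsh(euclidean_hessian(0.0, 1.0)))  # approx [-0.23, 2.32]
```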

3 Characterization of the solution sets

Our aim is to characterize the solution set of the following optimization problem

$$\begin{aligned} \min _{x\in S} f(x), \end{aligned}$$
(1)

where \(S\subseteq M\) is a convex subset of M and f is a twice continuously differentiable convex function on some open convex set containing S. We denote the solution set of the optimization problem (1) by

$$\begin{aligned} \bar{S}= \mathop {\hbox {argmin}}\limits _{x\in S}f(x), \end{aligned}$$

and assume that \(\bar{S}\ne \emptyset \). If \(\bar{x}\in \bar{S}\), then

$$\begin{aligned} \bar{S}=\{x\in S : f(x)=f(\bar{x})\}, \end{aligned}$$

and \(\bar{S}\) is a convex subset of S. The following theorem, which describes the relation between the solution set of (1) and a variational inequality, is a generalization of [7, Proposition 15] from the sphere \(S^{n}\) to the general setting.

Theorem 3.1

Let \(S\subseteq M\) be a convex subset of M and f be a twice continuously differentiable convex function on some open convex set containing S. Then \(\bar{x}\in \bar{S}\) if and only if

$$\begin{aligned} \langle \mathrm{grad}\, f(\bar{x}),\exp _{\bar{x}}^{-1}y\rangle _{\bar{x}}\ge 0\quad \text {for all }y\in S. \end{aligned}$$
(2)

Proof

Let \(\bar{x}\in \bar{S}\), \(y\in S\), and let \(\gamma _{\bar{x} y}\) be the minimal geodesic connecting \(\bar{x}\) and y. By convexity of S, the geodesic \(\gamma _{\bar{x} y}\) lies in S, and hence \(f(\gamma _{\bar{x} y}(t))\ge f(\bar{x})\) for all \(t\in [0,1]\). Therefore

$$\begin{aligned} \langle \mathrm{grad}\, f(\bar{x}),\exp _{\bar{x}}^{-1}y\rangle _{\bar{x}}=\lim _{t\rightarrow 0} \frac{f(\gamma _{\bar{x}y}(t))-f(\bar{x})}{t}\ge 0. \end{aligned}$$

Now suppose that (2) holds. By convexity of f, we have

$$\begin{aligned} f(y)-f(\bar{x})\ge \langle \mathrm{grad}\, f(\bar{x}),\gamma ^{\prime }_{\bar{x} y}(0)\rangle _{\bar{x}} =\langle \mathrm{grad}\, f(\bar{x}),\exp _{\bar{x}}^{-1}y\rangle _{\bar{x}}\ge 0\quad \text {for all }y\in S, \end{aligned}$$

that is, \(\bar{x}\in \bar{S}\).

\(\square \)

Now we present a characterization for the solution set of a convex optimization problem on a convex subset of a Riemannian manifold which is our main result.

Theorem 3.2

Let \(S\subseteq M\) be a convex subset of M, f be a twice continuously differentiable convex function on some open convex set containing S, and \(\bar{x}\in \bar{S}\). Then

$$\begin{aligned} \bar{S}=\{x\in S:\langle \mathrm{grad}\, f(\bar{x}),\exp _{\bar{x}}^{-1}x\rangle _{\bar{x}}=0,\ P^{\bar{x}}_{x}[\mathrm{grad} f( x)]=\mathrm{grad}\, f(\bar{x})\}. \end{aligned}$$
(3)

Proof

We denote the right-hand side of (3) by \(S^{*}\). To show that \(S^{*}\subseteq \bar{S}\), suppose on the contrary that there exists \(x\in S^{*}\setminus \bar{S}\); then \(f(\bar{x})<f(x)\), and by convexity of f,

$$\begin{aligned} \langle \mathrm{grad}\, f(x),\exp _x^{-1}(\bar{x})\rangle _{x}\le f(\bar{x})-f(x)<0. \end{aligned}$$

Now, using the isometry property of the parallel translation together with the identity \(P_{x}^{\bar{x}}[\exp _{x}^{-1}\bar{x}]=-\exp _{\bar{x}}^{-1}x\) and the fact that \(x\in S^{*}\), we get

$$\begin{aligned} \begin{aligned} \langle \mathrm{grad}\, f(x),\exp _x^{-1}(\bar{x})\rangle _{x}&=\langle P_{x}^{\bar{x}}[\mathrm{grad}\, f( x)], P_{x}^{\bar{x}}[\exp _{x}^{-1}\bar{x}]\rangle _{\bar{x}}\\&=\langle \mathrm{grad}\, f( \bar{x}), -\exp _{\bar{x}}^{-1}x\rangle _{\bar{x}}=0, \end{aligned} \end{aligned}$$

which is a contradiction.

For the converse, let \(x\in \bar{S}\) and let \(\gamma _{\bar{x} x}(t)\) be the minimal geodesic connecting \(\bar{x}\) and x. Since \(\bar{S}\) is convex, this geodesic lies entirely in \(\bar{S}\), so \(f(\gamma _{\bar{x} x }(t))=f(\bar{x})\) for all \(t\in [0,1]\). Therefore

$$\begin{aligned} \langle \mathrm{grad}\, f( \bar{x}),\exp _{\bar{x}}^{-1}x\rangle _{\bar{x}}=\lim _{t\rightarrow 0}\frac{f(\gamma _{\bar{x}x}(t))-f(\bar{x})}{t}=0. \end{aligned}$$

Similarly, we have \(\langle \mathrm{grad}\, f(x),\exp _{x}^{-1}\bar{x}\rangle _{x}=0.\) These two equations and properties of the parallel transport imply

$$\begin{aligned} \langle \mathrm{grad}\, f( \bar{x})-P_{x}^{\bar{x}}[\mathrm{grad}\, f(x)],\exp _{\bar{x}}^{-1}x\rangle _{\bar{x}}=0. \end{aligned}$$
(4)

Now, we define \(F: [0,1]\rightarrow T_{\bar{x}}M\) as follows

$$\begin{aligned} F(t)= P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}[\mathrm{grad}\, f( \gamma _{x \bar{x}}(t))]. \end{aligned}$$

Note that \(\gamma _{x \bar{x}}^{\prime }(t)=P_{x}^{\gamma _{x \bar{x}}(t)}[\exp _{x}^{-1}\bar{x}]=-P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}[\exp _{\bar{x}}^{-1}x]\) and

$$\begin{aligned} F(t-s)=P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}P_{\gamma _{x \bar{x}}(t-s)}^{\gamma _{x \bar{x}}(t)}[\mathrm{grad}\, f( \gamma _{x \bar{x}}(t-s))]\quad \text {for all }t\in [0,1], \end{aligned}$$

hence

$$\begin{aligned} \begin{aligned} F^{\prime }(t)&=-\frac{d}{ds}F(t-s)|_{s=0}\\&=-P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}\frac{d}{ds}P_{\gamma _{x \bar{x}}(t-s)}^{\gamma _{x \bar{x}}(t)}[\mathrm{grad}\, f( \gamma _{x \bar{x}}(t-s))]|_{s=0}\\&=P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}[\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t))(\gamma ^{\prime }_{x \bar{x}}(t))]. \end{aligned} \end{aligned}$$

Since F is \(C^{1}\), the previous equality gives

$$\begin{aligned} \mathrm{grad}\, f( \bar{x})- P^{\bar{x}}_{x}[\mathrm{grad}\, f( x)]&= F(1)-F(0)=\int \limits _{0}^{1}F^{\prime }(t)\,dt\nonumber \\&= \int \limits _{0}^{1}P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t))(\gamma ^{\prime }_{x \bar{x}}(t))\,dt\nonumber \\&= -\int \limits _{0}^{1}P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t))\big (P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}(\exp _{\bar{x}}^{-1}x)\big )\,dt\nonumber \\&= -\left( \int \limits _{0}^{1}P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}\,\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t))\,P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}( \cdot )\, dt\right) \exp _{\bar{x}}^{-1}x\nonumber \\&= -A\exp _{\bar{x}}^{-1}x, \end{aligned}$$
(5)

where

$$\begin{aligned} \int \limits _{0}^{1}P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t))P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}(\cdot )dt: T_{\bar{x}}M\rightarrow T_{\bar{x}}M \end{aligned}$$

is a linear map, which we denote by A. Since \(\mathrm{Hess}\, f\) is self-adjoint and the parallel translation is an isometry, A is symmetric; we claim that A is also positive semidefinite. To prove the claim, note that \(\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t))\) is a positive semidefinite linear map, hence

$$\begin{aligned} \begin{aligned}&\langle w,P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}[\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t)) (P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}(w))]\rangle _{\bar{x}}\\&\quad =\langle P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}(w),\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t)) [P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}(w)]\rangle _{\gamma _{x \bar{x}}(t)} \ge 0 \end{aligned} \end{aligned}$$

for every \(w\in T_{\bar{x}}M\); integrating with respect to t therefore yields

$$\begin{aligned} \langle w,Aw\rangle _{\bar{x}}=\left\langle w,\int \limits _{0}^{1}P_{\gamma _{x \bar{x}}(t)}^{\bar{x}}[\mathrm{Hess}\, f(\gamma _{x \bar{x}}(t)) (P_{\bar{x}}^{\gamma _{x \bar{x}}(t)}(w))]\,dt\right\rangle _{\bar{x}}\ge 0, \end{aligned}$$

which proves our claim. Combining (4) and (5), we deduce that

$$\begin{aligned} \langle \mathrm{grad}\, f( \bar{x})- P^{\bar{x}}_{x}[\mathrm{grad}\, f( x)],\exp _{\bar{x}}^{-1}x\rangle _{\bar{x}} =-\langle A\exp _{\bar{x}}^{-1}x,\exp _{\bar{x}}^{-1}x\rangle _{\bar{x}}=0. \end{aligned}$$
(6)

Since A is symmetric positive semidefinite, it follows from (6) and [10, p. 431] that

$$\begin{aligned} \mathrm{grad}\, f( \bar{x})- P^{\bar{x}}_{x}[\mathrm{grad}\, f( x)]=-A\exp _{\bar{x}}^{-1}x=0, \end{aligned}$$

which completes the proof. \(\square \)
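To relate Theorem 3.2 to the linear-space setting of [16], note that for \(M={{\mathbb {R}}}^{n}\) we have \(\exp _{\bar{x}}^{-1}x=x-\bar{x}\) and the parallel translation is the identity, so (3) reduces to a Mangasarian-type characterization \(\bar{S}=\{x\in S: \langle \nabla f(\bar{x}), x-\bar{x}\rangle =0,\ \nabla f(x)=\nabla f(\bar{x})\}\). The following minimal sketch (ours; the function and the set are ad hoc illustrations) checks this reduction numerically:

```python
import numpy as np

# Convex f(x) = x1^2 on S = [-1, 1]^2; the solution set is the segment {0} x [-1, 1].
grad_f = lambda x: np.array([2.0 * x[0], 0.0])
x_bar = np.array([0.0, 0.0])          # one known minimizer

def in_characterized_set(x):
    # Euclidean specialization of (3): exp_{x_bar}^{-1} x = x - x_bar and
    # the parallel translation is the identity map.
    return (abs(grad_f(x_bar) @ (x - x_bar)) < 1e-12
            and np.allclose(grad_f(x), grad_f(x_bar)))

solutions = [np.array([0.0, t]) for t in np.linspace(-1.0, 1.0, 5)]
non_solutions = [np.array([0.5, 0.0]), np.array([-1.0, 1.0])]

assert all(in_characterized_set(x) for x in solutions)
assert not any(in_characterized_set(x) for x in non_solutions)
```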

The following corollary is an immediate consequence of Theorem 3.2.

Corollary 3.1

Let \(S\subseteq M\) be a convex subset of M, f be a twice continuously differentiable convex function on some open convex set containing S, and \(\bar{x}\in \bar{S}\). Then

(i) the function \(x\mapsto \Vert \mathrm{grad}\, f(x)\Vert \) is constant on \(\bar{S}\);

(ii) \(\bar{S}=\tilde{S}=\{x\in S:\langle \mathrm{grad}\, f(\bar{x}),\exp _{\bar{x}}^{-1}x\rangle _{\bar{x}}\le 0,\ P^{\bar{x}}_{x}[\mathrm{grad}\, f( x)]=\mathrm{grad}\, f(\bar{x})\}.\)

Proof

The first part follows from Theorem 3.2 and the isometry property of the parallel translation. For the second part, the inclusion \(\bar{S}\subseteq \tilde{S}\) holds by Theorem 3.2. For the converse, assume that \(x\in \tilde{S}\). Since the parallel translation is an isometry and f is convex, we deduce that

$$\begin{aligned} \begin{aligned} f(\bar{x})- f(x)\ge \langle \mathrm{grad}\, f( x),\exp _{ x}^{-1}\bar{x}\rangle _{{x}}&=\langle P^{\bar{x}}_{x}[\mathrm{grad}\, f( x)],P^{\bar{x}}_{x}[\exp _{ x}^{-1}\bar{x}]\rangle _{\bar{x}}\\&=-\langle \mathrm{grad}\, f( \bar{x}),\exp _{\bar{x}}^{-1} x\rangle _{{\bar{x}}} \ge 0. \end{aligned} \end{aligned}$$

This shows that \(f(\bar{x})\ge f( x)\). Since \(\bar{x}\) is a minimizer of f on S, we have \(f(\bar{x})= f( x)\), which proves that \( \tilde{S}\subseteq \bar{S}\). \(\square \)

Now, we present some examples to illustrate how our characterization of the solution sets of optimization problems works in particular nontrivial Riemannian settings.

Example 3.1

Let M be a Hadamard manifold and U be a nonempty open convex subset of M. Pick \(x_{0},y_{0} \in U\) with \(x_{0}\ne y_{0}.\) Choose \(\varepsilon >0\) such that \( S:= {\bar{B}}(x_{0},\varepsilon )\subset U\), \(A:={\bar{B}}({y_{0}},\varepsilon )\subset U\) are convex and \(S\cap A=\emptyset .\) Define the function \(f: S\rightarrow {\mathbb {R}}\) by

$$\begin{aligned} f(x):=\frac{1}{2}d_{A}^{2}(x). \end{aligned}$$

Note that the function \(d_{A}^{2}(\cdot )\) is convex and twice continuously differentiable on U. Let \(\bar{x}\in \bar{S}\) and \(p:=\pi _{A}(\bar{x}),\) where \(\pi _{A}(\bar{x})\) is the metric projection of \(\bar{x}\) on A. We have that

$$\begin{aligned} \mathrm{grad}\, f(\bar{x})=-\exp ^{-1}_{\bar{x}}p,\quad \Vert \mathrm{grad}\, f(\bar{x})\Vert =d(\bar{x}, p)=d(\bar{x}, A). \end{aligned}$$

Thus, by using Theorem 3.2, we deduce

$$\begin{aligned} \bar{S}=\big \{x\in S:\langle \exp ^{-1}_{\bar{x}}p, \exp ^{-1}_{\bar{x}}x\rangle _{\bar{x}}=0,\ \mathrm{grad}\, f( x)=-P_{\bar{x}}^{x}[\exp ^{-1}_{\bar{x}}p]\big \}. \end{aligned}$$
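In the flat special case \(M={{\mathbb {R}}}^{n}\), which is a Hadamard manifold, \(\exp _{x}^{-1}y=y-x\) and the parallel translation is the identity, so the description of \(\bar{S}\) above can be tested directly. A minimal sketch (ours; the centers and radius are ad hoc):

```python
import numpy as np

# R^2 as a flat Hadamard manifold: exp_x^{-1} y = y - x, parallel translation = identity.
x0, y0, eps = np.array([0.0, 0.0]), np.array([4.0, 0.0]), 1.0
# S = closed ball around x0, A = closed ball around y0; the two balls are disjoint.

proj_A = lambda x: y0 + eps * (x - y0) / np.linalg.norm(x - y0)  # metric projection onto A
grad_f = lambda x: x - proj_A(x)                                 # gradient of 0.5 * d_A^2

x_bar = x0 + eps * (y0 - x0) / np.linalg.norm(y0 - x0)           # the minimizer over S
p = proj_A(x_bar)

def in_characterized_set(x):
    # <exp^{-1}_{x_bar} p, exp^{-1}_{x_bar} x> = 0 and grad f(x) = -(p - x_bar).
    return (abs((p - x_bar) @ (x - x_bar)) < 1e-10
            and np.allclose(grad_f(x), x_bar - p))

assert in_characterized_set(x_bar)
# A point of S satisfying the orthogonality condition but not the gradient condition:
assert not in_characterized_set(x_bar + np.array([0.0, 0.5]))
```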

Example 3.2

The general matrix rank minimization problem (RMP), expressed as

$$\begin{aligned} \min \mathrm{rank}\, X,\quad \quad X\in S\subset P_n, \end{aligned}$$

where \(X\in {{\mathbb {R}}}^{n\times n}\), \(P_n\) is the set of positive semidefinite \(n\times n\) matrices, and S is a convex set, is computationally hard to solve. This problem arises in many areas such as control, system identification, statistics, signal processing, and computational geometry; see [5] and references therein. Rather than solving the RMP, one can use the function

$$\begin{aligned} \log \det (X +\delta I), \end{aligned}$$

as a smooth surrogate for \(\mathrm{rank}\, X\) and instead solve the following problem

$$\begin{aligned} \min \log \det (X +\delta I),\quad X\in S, \end{aligned}$$

where \(\delta >0\) can be interpreted as a small regularization constant; see [5]. Note that this surrogate is not convex on the linear space \({{\mathbb {R}}}^{n\times n}\). This application motivated us to consider the problem

$$\begin{aligned} \min \log \det (X),\qquad X\in S=\{X\in P_n^+: 0<A\le X\}, \end{aligned}$$
(7)

which is not convex with respect to the Euclidean metric on \({{\mathbb {R}}}^{n\times n}\). Here \(P_n^+\) is the set of positive definite \(n\times n\) matrices and \(A\in P_n^+\).

The set of symmetric positive definite matrices, as a Riemannian manifold, is the most studied example of manifolds of nonpositive curvature. The tangent space to \(P_n^+\) at any of its points P is the space \(T_PP_n^+=\{P\}\times S_n\), where \(S_n\) is the space of symmetric \(n\times n\) matrices. On each tangent space \(T_PP_n^+\), the inner product is defined by

$$\begin{aligned} \langle A,B\rangle _{P}=\mathrm{tr} (P^{-1}AP^{-1}B). \end{aligned}$$

The Riemannian distance between \(P,Q\in P_n^+\) is given by

$$\begin{aligned} d(P,Q)=\left( \mathop \sum \limits _{i=1}^n \ln ^2(\lambda _i)\right) ^{1/2}, \end{aligned}$$

where \(\lambda _i\), \(i=1,\ldots ,n\), are the eigenvalues of \(P^{-1}Q\). The exponential map

$$\begin{aligned} \exp _P:S_n\rightarrow P_n^+ \end{aligned}$$

is defined by

$$\begin{aligned} \exp _P(v)=P^{1/2}\exp (P^{-1/2}vP^{-1/2})P^{1/2}. \end{aligned}$$

Moreover, if \(P\in P_n^+\), then

$$\begin{aligned} \exp _P^{-1}:P_n^+\rightarrow S_n \end{aligned}$$

is defined by

$$\begin{aligned} \exp _P^{-1}(Q)=P\log (P^{-1}Q), \end{aligned}$$

where \(\log \) and \(\exp \) denote the matrix logarithm and the matrix exponential; for more details see [21]. The parallel translation along the unique geodesic connecting X and Y is given by

$$\begin{aligned} P_X^Y(Z)=(YX^{-1})^{1/2}Z(X^{-1}Y)^{1/2}. \end{aligned}$$

Moreover, the Riemannian gradient of a function f defined on \(P_n^+\) is obtained from the Euclidean gradient, denoted by \(\nabla f\), via the formula

$$\begin{aligned} \mathrm{grad}\, f (X) = X\mathrm{symm}(\nabla f(X))X, \end{aligned}$$

where \(\mathrm{symm}(\nabla f(X))= \frac{1}{2}(\nabla f(X)+\nabla f(X)^T)\). For \(f(X)=\log \det (X)\) we have \(\nabla f(X)=X^{-1}\), and therefore \(\mathrm{grad}\, f(X)=X\). First we claim that S is a convex subset of \(P_n^+\). Assume that \(X, Y\in S\); then \(X\ge A\) and \(Y\ge A\). The unique geodesic connecting these two points is defined by

$$\begin{aligned} \gamma (t):= X^{1/2}(X^{-1/2}YX^{-1/2})^tX^{1/2}, \end{aligned}$$

by the Löwner–Heinz inequality (see [14, Lemma 2.1]), we have \(\gamma (t)\ge A\), and therefore S is convex in \(P_n^+\). We claim that A is a solution of problem (7). Since \(\mathrm{grad}\, f(A)=A\), by Theorem 3.1 it suffices to prove that \(\langle A,\exp _A^{-1}(Y)\rangle _A\ge 0\) for all \(Y\in S\). Note that

$$\begin{aligned} \begin{aligned} \langle A,\exp _A^{-1}(Y)\rangle _A&=\mathrm{tr} (A^{-1}AA^{-1}A\log (A^{-1}Y))\\&=\mathrm{tr} \log (A^{-1}Y)=\log \det (A^{-1}Y)\ge 0, \end{aligned} \end{aligned}$$

where the last inequality holds because \(Y\ge A\) implies that all eigenvalues of \(A^{-1}Y\) are at least 1. Therefore,

$$\begin{aligned} \bar{S}= \{X\in S: \log \det A=\log \det X\}. \end{aligned}$$

To illustrate Theorem 3.2, we will see that

$$\begin{aligned} \bar{S} = \{X\in S:\langle \mathrm{grad}\, f(A),\exp _{A}^{-1}X\rangle _{A}=0,~ P^{A}_{X}[\mathrm{grad}\, f( X)]=\mathrm{grad} f(A)\}. \end{aligned}$$

Note that \(\mathrm{grad}\, f(X)=X\), \(\mathrm{grad}\, f(A)=A\), and

$$\begin{aligned} P_X^A(X)=(AX^{-1})^{1/2}X(X^{-1}A)^{1/2}=A \quad \text {for all } X\in S. \end{aligned}$$

Moreover,

$$\begin{aligned} \langle \mathrm{grad}\, f(A),\exp _{A}^{-1}X\rangle _{A}&=\langle A, A\log (A^{-1}X)\rangle _A\\&=\mathrm{tr} (A^{-1}AA^{-1}A\log (A^{-1}X))\\&=\mathrm{tr} \log (A^{-1}X)=\log \det (A^{-1}X), \end{aligned}$$

which vanishes exactly when \(\log \det X=\log \det A\); together with the identity \(P_X^A(X)=A\) above, this shows the required equation.
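The computations in this example can be verified numerically. The sketch below (ours; the data are random, and sqrtm and logm are SciPy's matrix square root and logarithm) checks the variational inequality (2) at A and the parallel translation identity \(P_X^A(X)=A\) for a random \(X\ge A\):

```python
import numpy as np
from scipy.linalg import sqrtm, logm

rng = np.random.default_rng(2)
n = 4
B = rng.standard_normal((n, n)); A = B @ B.T + n * np.eye(n)      # A > 0
C = rng.standard_normal((n, n)); X = A + C @ C.T + n * np.eye(n)  # X > A, so X in S

inner = lambda P, U, V: np.trace(np.linalg.solve(P, U) @ np.linalg.solve(P, V))
exp_inv = lambda P, Q: P @ logm(np.linalg.solve(P, Q))            # exp_P^{-1}(Q)
transport = lambda X, Y, Z: (sqrtm(Y @ np.linalg.inv(X))          # P_X^Y(Z)
                             @ Z @ sqrtm(np.linalg.inv(X) @ Y))

# Variational inequality (2) at A: <grad f(A), exp_A^{-1} X>_A = log det(A^{-1} X) >= 0.
lhs = inner(A, A, exp_inv(A, X))
assert np.isclose(lhs, np.log(np.linalg.det(np.linalg.solve(A, X)))) and lhs > 0

# Parallel translation identity: P_X^A(grad f(X)) = P_X^A(X) = A = grad f(A).
assert np.allclose(transport(X, A, X), A)
```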

Recall that the unit sphere \(S^{2}:=\{x\in {{\mathbb {R}}}^3:~||x||=1\}\) is a 2-dimensional manifold with the usual Riemannian distance function defined as

$$\begin{aligned} d(x,y)=\arccos \langle x,y\rangle \quad \text {for all } x,y\in S^{2}. \end{aligned}$$

For every \(\bar{x} \in S^{2},\) it follows from the definition of the Riemannian metric on \(S^{2}\) that

$$\begin{aligned} \langle u,v \rangle _{\bar{x}}=\langle u,v \rangle \quad \text {for all } u,v\in T_{\bar{x}}S^{2}, \end{aligned}$$

where \(\langle \cdot , \cdot \rangle \) denotes the standard inner product in \({{\mathbb {R}}}^{3}.\) For \(x\in S^{2}\), the exponential map \(\exp _{x}:T_{x} S^{2}\rightarrow S^{2}\) is defined by

$$\begin{aligned} \exp _{x}(v)=\cos (||v||)x+\sin (||v||)\frac{v}{||v||}, \quad v\in T_{x} S^{2}. \end{aligned}$$
(8)

Moreover, \(\exp ^{-1}_{x}:S^{2}\rightarrow T_{x} S^{2}\) is

$$\begin{aligned} \exp ^{-1}_{x}(y)=\frac{\theta }{\sin \theta }(y-x\cos \theta ), \quad y \in S^{2}, \end{aligned}$$
(9)

where \(\theta =\arccos \langle x,y\rangle .\) Let \(t\mapsto \gamma (t)\) be the unique minimal geodesic in \(S^{2}\) joining \(\gamma (0)=x\) to \(\gamma (1)=y\), and let \(u:=\frac{\gamma ^{\prime }(0)}{||\gamma ^{\prime }(0)||}.\) The parallel translation of a vector \(v\in T_{x} S^{2}\) along the geodesic \(\gamma \) is given by

$$\begin{aligned} \begin{aligned} P_{x}^{\gamma (t)}(v)&=-\big (\sin (||\gamma ^{\prime }(0)||t)u^{\prime }v\big )x\\&\quad +\big (\cos (||\gamma ^{\prime }(0)||t)u^{\prime }v\big )u+(I-uu^{\prime })v; \end{aligned} \end{aligned}$$
(10)

see [1]. In the following example, which is an improvement of [16, Corollary 1], we consider an optimization problem with a quadratic cost function on a convex subset of the unit sphere; the sketch below first gives a small numerical check of formulas (8)–(10).
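The following minimal sketch (ours; the helper names are ad hoc) implements (8)–(10) with NumPy and verifies that \(\exp _{x}\) inverts \(\exp _{x}^{-1}\), that \(d(x,y)=\Vert \exp _{x}^{-1}y\Vert \), and that the parallel translation is an isometry onto \(T_{y}S^{2}\):

```python
import numpy as np

def exp_map(x, v):
    """exp_x(v) on S^2, formula (8)."""
    nv = np.linalg.norm(v)
    return x if nv < 1e-15 else np.cos(nv) * x + np.sin(nv) * v / nv

def exp_inv(x, y):
    """exp_x^{-1}(y) on S^2, formula (9)."""
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    return np.zeros(3) if theta < 1e-15 else (theta / np.sin(theta)) * (y - np.cos(theta) * x)

def transport(x, y, v):
    """Parallel translation P_x^y(v) along the minimal geodesic, formula (10) at t = 1."""
    g0 = exp_inv(x, y)                       # gamma'(0), whose norm is theta
    theta = np.linalg.norm(g0)
    u = g0 / theta
    return (-np.sin(theta) * (u @ v) * x
            + np.cos(theta) * (u @ v) * u
            + (v - (u @ v) * u))

rng = np.random.default_rng(3)
x, y = (w / np.linalg.norm(w) for w in rng.standard_normal((2, 3)))
v = np.cross(x, rng.standard_normal(3))      # a tangent vector at x

assert np.allclose(exp_map(x, exp_inv(x, y)), y)
assert np.isclose(np.linalg.norm(exp_inv(x, y)), np.arccos(np.clip(x @ y, -1, 1)))
assert np.isclose(np.linalg.norm(transport(x, y, v)), np.linalg.norm(v))  # isometry
assert abs(transport(x, y, v) @ y) < 1e-10   # the result lies in T_y S^2
```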

Example 3.3

Let S be a convex subset of \(S^{2}\) and let f be the convex quadratic function, defined on an open convex subset of \(S^{2}\) containing S, given by

$$\begin{aligned} \begin{aligned} f(x):=\langle Ax,x\rangle =x^{T}Ax, \end{aligned} \end{aligned}$$
(11)

where \(A\in {{\mathbb {R}}}^{3\times 3}\) is a symmetric matrix. Suppose that \(\bar{x}\in \bar{S}\). By using formulas (8)–(10) and [1, p. 74], for every \(x\in S\), we get

$$\begin{aligned} \mathrm{grad}\, f( x)=2(Ax-(xx^{T})Ax), \end{aligned}$$

and

$$\begin{aligned} \exp ^{-1}_{\bar{x}}(x)=\frac{\theta }{\sin \theta }(x-{\bar{x}}\cos \theta ), \end{aligned}$$

where \(\theta =\arccos \langle {\bar{x}},x\rangle .\) Therefore, by Theorem 3.2, we obtain

$$\begin{aligned} \bar{S}=\{x\in S:\langle c, x- \bar{x}\cos \theta \rangle _{\bar{x}}=0, (I-xx^{T})Ax=d\}, \end{aligned}$$

where \(c=A\bar{x}-(\bar{x}{\bar{x}}^{T})A\bar{x}\) and \(d=P^{x}_{\bar{x}}(c).\)
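As a concrete instance (our choice of data, not from [16]), take \(A=\mathrm{diag}(1,1,3)\) and \(\bar{x}=e_{1}\), and let S be a convex geodesic ball of radius less than \(\pi /2\) around \(\bar{x}\); the minimum value \(\lambda _{\min }=1\) is then attained exactly on the arc of the equator \(\{x_{3}=0\}\) inside S. Since \(\mathrm{grad}\, f(\bar{x})=0\) and the parallel translation is a linear isometry, the transport condition in (3) reduces to \(\mathrm{grad}\, f(x)=0\). A minimal numerical check:

```python
import numpy as np

A = np.diag([1.0, 1.0, 3.0])
grad_f = lambda x: 2 * (A @ x - np.outer(x, x) @ A @ x)    # 2(Ax - (x x^T) A x)

def exp_inv(x, y):
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    return np.zeros(3) if theta < 1e-15 else (theta / np.sin(theta)) * (y - np.cos(theta) * x)

x_bar = np.array([1.0, 0.0, 0.0])                          # a known minimizer
c = A @ x_bar - np.outer(x_bar, x_bar) @ A @ x_bar         # here c = 0

def in_characterized_set(x):
    # Conditions of (3); since grad f(x_bar) = 2c = 0, the transport
    # condition P_x^{x_bar}[grad f(x)] = grad f(x_bar) reduces to grad f(x) = 0.
    return (abs(2 * c @ exp_inv(x_bar, x)) < 1e-12
            and np.allclose(grad_f(x), np.zeros(3)))

on_equator = np.array([np.cos(0.7), np.sin(0.7), 0.0])     # lies in the solution set
off_equator = np.array([np.cos(0.4), 0.0, np.sin(0.4)])    # lies in S, not a minimizer

assert in_characterized_set(on_equator)
assert not in_characterized_set(off_equator)
```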