1 Introduction

The aim of this paper is to study a semivectorial bilevel optimization problem on Riemannian manifolds.

A semivectorial bilevel optimization problem is a bilevel problem, where the upper level is a scalar optimization problem, and the lower level is a multiobjective optimization problem.

Multiobjective optimization finds its origin in the nineteenth century in the economic works of Edgeworth [1] and Pareto [2], but, as a branch of mathematics, it began in 1951 with the famous paper of Kuhn and Tucker [3]. Due to its many applications to real-life problems, vector optimization has been researched intensively over the last 50 years. When several conflicting objectives are involved, a Pareto solution (also called efficient) is one for which none of the objective values can be improved without deteriorating another.

The lower level of the semivectorial bilevel optimization problem is a parametric multiobjective (vector) optimization problem and may be considered as a single follower having to optimize several objectives, or as several followers, each of them having to optimize one (scalar) objective. The latter situation corresponds to the so-called greatest coalition multiplayer game. The parameter in the lower-level problem is the (vector) variable chosen by the leader, and, for each choice of the leader's variable, the followers choose a Pareto solution.

In a semivectorial bilevel optimization problem, the upper level is a scalar optimization problem to be solved by the leader. The leader objective depends on two (vector) variables: one chosen by the leader, and the second one represents the response of the followers.

If, for each choice of the leader, the followers choose among their best responses (Pareto solutions) one which is best for the leader, i.e., when the followers cooperate with the leader, we deal with the so-called optimistic problem.

In the case when there is no cooperation between the leader and the followers, the leader may consider the worst scenario, i.e., the situation when, for each choice of the leader, the followers choose, among their best responses, one which is the most unfavorable for the leader, leading to the so-called pessimistic problem.

The study of semivectorial bilevel optimization problems in Euclidean or Hilbert spaces began in 2006; see [4, 5], and it was continued by several authors; see [6–10]. The case of semivectorial bilevel optimal control problems was considered in [11, 12].

The semivectorial bilevel optimization problem includes as particular cases the following problems, which have been intensively studied in the last decades, so we give only a few earlier references:

  • Optimizing a scalar function over the Pareto set (introduced in [13] and investigated in [14–29]; see [30] for a survey);

  • Bilevel optimization problems where the upper level and lower level are scalar optimization problems (see, e.g., [31–36] for an extensive bibliography).

Extending optimization problems from Euclidean spaces to Riemannian manifolds is natural and nontrivial (see, e.g., [37–41] and the references therein). Such extensions have several advantages. For example, some constrained optimization problems can be seen as unconstrained ones from the Riemannian geometry viewpoint. Moreover, some nonconvex optimization problems in the Euclidean setting may become convex by introducing an appropriate Riemannian metric (see, e.g., [37–41]).

In recent years, researchers have begun to study multiobjective optimization problems on Riemannian manifolds [42–44].

It seems that our paper is the first in the literature to address a semivectorial bilevel problem on Riemannian manifolds. Moreover, in contrast to the multiobjective problems on Riemannian manifolds presented in several papers (see, e.g., [42, 43]), we consider the multiobjective lower-level problem in the more general setting where the partial order is given by an arbitrary closed, convex and pointed cone, and not only by the positive orthant (Pareto cone). This generalization is motivated by the fact that many applications require the use of arbitrary closed, convex and pointed cones (see, for instance, [45–50] and the references therein).

It is worth noting that some of our results are new even in the Euclidean case. In our finite-dimensional approach, inspired by the techniques used in [11, 12] for semivectorial bilevel convex optimal control problems, where the optimality conditions concern only the linear-quadratic case for the lower level, we now deal, on the one hand, with the general nonlinear case for both the upper and the lower level, obtaining more explicit optimality conditions, and, on the other hand, with an arbitrary convex, closed and pointed ordering cone.

Our paper is organized as follows. In Sect. 2, we first present some basic facts of Riemannian geometry and then some aspects of vector optimization on Riemannian manifolds. We deal with weakly as well as properly Pareto solutions with respect to an arbitrary partial order and introduce different kinds of convexity on Riemannian manifolds with respect to the ordering cone. The classical convex scalarization theorem is proved in the Riemannian setting.

Section 3 contains the main results. It is divided into six subsections. First, we state the optimistic and pessimistic semivectorial bilevel problems. In the second subsection, based on the scalarization theorem for convex multiobjective problems presented in the previous section, we rewrite our problems as bilevel optimization problems with a scalar lower-level problem which has a unique solution. Moreover, we show that, under our hypotheses, the weakly Pareto set coincides with the Pareto set. In the third subsection, we give optimality conditions for this scalar lower-level problem. In the fourth subsection, we present optimality conditions for the optimistic semivectorial problem, for the (weakly) Pareto case as well as for the properly Pareto one. In the fifth subsection, we give an existence result for the more difficult case of the semivectorial bilevel pessimistic problem. In the last subsection, we give an illustrative example, where the lower level is given by a nonconvex multiobjective problem on the initial Euclidean space, but becomes convex for a suitable Riemannian metric. Thus, taking advantage of the results presented in Sects. 2 and 3, we are able to find the optimal solution.

Some concluding remarks are presented in the last section.

2 Preliminaries

2.1 Basics in Riemannian Geometry

An n-dimensional Riemannian manifold is a pair (M, g), where M stands for an n-dimensional smooth manifold and g stands for a smooth, symmetric, positive definite (0, 2)-tensor field on M, called a Riemannian metric on M. If (M, g) is a Riemannian manifold, then, for any point \(p \in M\), the restriction \(g_p : T_pM \times T_p M \longrightarrow \mathbb {R}\) is an inner product on the tangent space \(T_pM\).

The tangent bundle TM over M is \(TM := \bigcup _{x \in M} T_xM\), and a vector field on M is a section of the tangent bundle, that is a map \(X : M \longrightarrow TM\) such that for any \(p \in M\), \(X(p) \equiv X_p \in T_pM\). The set of all \(C^\infty \) vector fields on M is denoted \({\varGamma } (TM)\) or \(\chi (M)\).

By identifying a vector field with its image set, the sets \({\varGamma } (TM)\) and TM can be identified. Throughout the paper, we use the Einstein summation convention: the summation symbol is omitted when the same index occurs twice in a product, once as an upper index and once as a lower index. Moreover, we will consider only connected Riemannian manifolds.

In local coordinates \((x^1, \ldots , x^n)\) around the point x on M, the Riemannian metric g is written \(g = g_{ij}\mathrm{d}x^i \mathrm{d}x^j\) with \(g_{ij} = g_x (\frac{\partial }{\partial x^i} , \frac{\partial }{\partial x^j})\), where \((\frac{\partial }{\partial x^1}, \ldots , \frac{\partial }{\partial x^n})\) is the natural basis of \(T_xM\) , and \((\mathrm{d}x^1 , \ldots , \mathrm{d}x^n)\) is the corresponding dual basis on the cotangent vector space \(T^*_xM\).

The length of a tangent vector \(v \in T_p M\) is defined by \(\Vert v\Vert _p := g_p (v , v)^{\frac{1}{2}}\).

If \(\gamma : [a ,b] \subset {\mathbb {R}} \longrightarrow M\) is a piecewise smooth curve in M, then its length is defined by \(L (\gamma ) := \int _a^b \Vert \dot{\gamma }(t)\Vert _{\gamma (t)} \mathrm{d}t\), where \(\dot{\gamma }\) denotes the first derivative of \(\gamma \) w.r.t. t.

Let p and q be two points in (M, g) and \({\varGamma }_{pq}\) the set of all piecewise smooth curves joining p and q. The function

$$\begin{aligned} d:M\times M \rightarrow \mathbb {R}, \; \; { d(p,q) := \inf \{L(\gamma ) : \, \gamma \in {\varGamma }_{pq}\}} \end{aligned}$$

is a distance on M, and the induced metric topology on M coincides with the topology of M as manifold.

Let \(M_1\) and \(M_2\) be two manifolds. The tangent bundle of the product manifold \(M_1 \times M_2\) can be written as \(T(M_1 \times M_2) = TM_1 \times TM_2\), and if \(g_1\) and \(g_2\) are two Riemannian metrics on \(M_1\) and \(M_2\), respectively, then the corresponding Riemannian metric product g on \(M_1 \times M_2\) is defined by

$$\begin{aligned} g ((X_1 , X_2) , (Y_1 , Y_2)) = g_1 (X_1 , Y_1) + g_2 (X_2 , Y_2) \end{aligned}$$

for all \( X_1 , Y_1 \in TM_1 \) and \(X_2 , Y_2 \in TM_2.\)

A linear connection, or equivalently a covariant derivative, on an n-dimensional smooth manifold M is a map \(D : TM \times TM \longrightarrow TM\) such that, by putting \(D (X,Y) = D_XY\), we have

\(D_{X + fY}Z = D_XZ + fD_YZ\), \(D_X (Y + Z) = D_XY + D_XZ\) and \(D_X (fY) = \mathrm{d}f(X)\, Y + f\, D_XY\), for any \(X,Y,Z \in TM\) and any \(f \in C^\infty (M)\), the set of smooth real-valued functions on M.

Let \(x \in M\), \((x^1, \ldots , x^n)\) be a local coordinates system on M around the point x and let \((\frac{\partial }{\partial x^1}, \ldots , \frac{\partial }{\partial x^n})\) be the induced basis of \(T_xM\). Then for \(i , j = 1 , \ldots , n\), \(D_{\frac{\partial }{\partial x^i}}\frac{\partial }{\partial x^j}\) is a tangent vector to M at the point x. Thus, there exist \(n^3\) smooth functions \({\varGamma }_{ij}^k\), \(i,j,k = 1 , \ldots , n\) on M, such that: \(D_{\frac{\partial }{\partial x^i}}\frac{\partial }{\partial x^j} = {\varGamma }_{ij}^k \frac{\partial }{\partial x^k}\).

The functions \({\varGamma }_{ij}^k\) are called the Christoffel symbols of the connection D.

If (M, g) is a Riemannian manifold, there exists exactly one connection \(\nabla \) on M which is torsion-free, i.e., \(\nabla _XY - \nabla _YX = [X,Y]\), where [X, Y] denotes the Lie bracket of the vector fields X and Y, and compatible with the metric, i.e., \(X . g(Y , Z) = g (\nabla _XY , Z) + g(Y , \nabla _XZ)\), \(\forall \; X , Y , Z \in TM\). This privileged connection is called the Levi-Civita connection, and its Christoffel symbols contain all the geometric information on the Riemannian manifold (M, g). In a local coordinate system \((x^1, \ldots , x^n)\), the Christoffel symbols of the Levi-Civita connection on (M, g) are given by:

\({\varGamma }_{ij}^k = \frac{1}{2}g^{kl} \left( \frac{\partial g_{jl}}{\partial x^i} + \frac{\partial g_{il}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^l}\right) \), where the matrix \((g^{kl})\) is the inverse of the matrix \((g_{ij})\).

A piecewise smooth curve \(\gamma : [a , b] \longrightarrow M\) is said to be parametrized by arclength if \(\Vert \dot{\gamma }(t)\Vert _{\gamma (t)}\) is constant on [a, b], and \(\gamma \) is called a geodesic joining the points \(\gamma (a)\) and \(\gamma (b)\) if for any \(t \in [a , b]\), \(\nabla _{\dot{\gamma }(t)} \dot{\gamma }(t) = 0\). If, moreover, the length \(L(\gamma )\) is equal to the distance between the points \(\gamma (a)\) and \(\gamma (b)\), then \(\gamma \) is said to be a minimizing geodesic. A geodesic curve is always parametrized by arclength. In a local coordinate system \((x^1, \ldots , x^n)\), the curve \(t\mapsto \gamma (t)=(\gamma ^1(t) , \ldots , \gamma ^n (t))\) is a geodesic if and only if

$$\begin{aligned} \ddot{\gamma }^k(t) + {\varGamma }_{ij}^k (\gamma (t))\dot{\gamma }^i(t) \dot{\gamma }^j(t) = 0, \quad \; \forall \; t \in [a , b], \quad \forall \; k = 1, \ldots , n, \end{aligned}$$

where \(\dot{\gamma }^k\) denotes the first derivative of \(\gamma ^k\) w.r.t. t.

Given a point \(p \in M\) and a tangent vector \(v \in T_pM\), there exists \(\varepsilon > 0\) and precisely one geodesic \(\gamma _v : [0 , \varepsilon ] \longrightarrow M\), depending smoothly on p and v, such that \(\gamma _v (0) = p\) and \(\dot{\gamma }_v(0) = v\).

Let \(p \in M\) and \(V_p := \{v \in T_pM : \, \gamma _v \text{ is } \text{ defined } \text{ on } [0,1]\}\). Then, the map \(\exp _p : V_p \longrightarrow M\) defined by \(\exp _p (v) = \gamma _v (1)\), \(\forall v \in V_p\), is called the exponential map of M at the point p. The exponential map \(\exp _p\) maps a neighborhood of the null tangent vector \(0 \in T_pM\) diffeomorphically onto a neighborhood of p in M.
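For example, in the Euclidean space \({\mathbb {R}}^n\) with its canonical metric, the geodesics are the straight lines and \(\exp _p (v) = p + v\) for every \(p \in {\mathbb {R}}^n\) and \(v \in T_p{\mathbb {R}}^n \cong {\mathbb {R}}^n\), so \(V_p = T_p{\mathbb {R}}^n\).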

The Riemannian manifold (M, g) is said to be geodesically complete if for any point \(p \in M\), the exponential map \(\exp _p\) is defined on all of \(T_pM\), that is, if any geodesic \(\gamma \) containing the point p is defined on all of \({\mathbb {R}}\). We have the following result, known as the Hopf–Rinow theorem.

Theorem 2.1

(Hopf-Rinow) Let M be a connected Riemannian manifold. The following statements are equivalent:

  (i) M is complete as a metric space.

  (ii) M is geodesically complete.

  (iii) The closed and bounded subsets of M are compact.

Furthermore, each of the statements (i)–(iii) implies that any two points of M can be joined by a minimizing geodesic.

Let (M, g) and (N, h) be two Riemannian manifolds and \(F : M \longrightarrow N\) be a differentiable map between M and N. The differential \(\mathrm{d}F(p)\) of F at a point \(p \in M\) is a linear map from \(T_pM\) into \(T_{F(p)}N\). The map F is said to be conformal if there exists a positive function \(\alpha : M \longrightarrow {\mathbb {R}}\), called the conformal factor, such that for any \(p \in M\) and any \(u , v \in T_pM\), \(h_{F(p)}(\mathrm{d}F(p)(u) , \mathrm{d}F(p)(v)) = \alpha (p) g_p (u,v)\).

A conformal map preserves angles between tangent vectors.

If the conformal factor is a constant function equal to one, then F is called an isometry.

The gradient of a differentiable function \(f : M \longrightarrow {\mathbb {R}}\) w.r.t. the Riemannian metric g is the vector field \(\mathrm{grad}f\) defined by \(g(\mathrm{grad}f , X) = \mathrm{d}f (X),\; \forall X \in TM\), where df denotes the differential of the function f.

The Hessian of a smooth function \(f : M \longrightarrow {\mathbb {R}}\) is the 2-form \(\hbox {Hess}(f)\) defined by \(\hbox {Hess}(f) (X,Y)= \nabla df (X,Y) = g (\nabla _X \mathrm{grad}f , Y),\; \forall X , Y \in TM\).

In local coordinates \((x^1, \ldots , x^n)\) around \(p \in M\), we can write:

$$\begin{aligned} \mathrm{grad}f(p)= & {} g^{ij}\frac{\partial f}{\partial x^i} \frac{\partial }{\partial x^j},\\ {\hbox {Hess}} (f)= & {} \nabla \hbox {d}f =\Big (\frac{\partial ^2 f}{\partial x^i \partial x^j} - {\varGamma }_{ij}^k\frac{\partial f}{\partial x^k}\Big )\; \hbox {d}x^i \hbox {d}x^j. \end{aligned}$$

Let \(p\in M\) be fixed. The inverse \(\exp _p^{-1}\) of the exponential map maps a neighborhood of p diffeomorphically onto a neighborhood of the origin of \(T_pM\). Considering an orthonormal basis \((\frac{\partial }{\partial x^1}, \ldots , \frac{\partial }{\partial x^n})\) of \(T_pM\) with respect to the scalar product \(g_p(\cdot , \cdot )\), this diffeomorphism establishes a local coordinate system \((x^1, \ldots , x^n)\) around the point p, called a normal coordinate system. In this normal coordinate system, the geodesics through p are represented by lines passing through the origin. Moreover, the matrix \((g_{ij})\) associated with the bilinear form g at the point p in this orthonormal basis reduces to the identity matrix, and the Christoffel symbols vanish at p. Thus, for any smooth function \(f:M\rightarrow \mathbb {R}\), in normal coordinates around p, we obtain

$$\begin{aligned} \mathrm{grad}f(p)=\sum _i \frac{\partial f}{\partial x^i}(p)\frac{\partial }{\partial x^i}, \end{aligned}$$

and, for all \(u=u^i\frac{\partial }{\partial x^i}, \, v=v^i\frac{\partial }{\partial x^i} \in T_pM\),

$$\begin{aligned} {\hbox {Hess}}_p (f)(u,v)=\frac{\partial ^2 f}{\partial x^i \partial x^j}(p)u^iv^j . \end{aligned}$$

Now consider a smooth function \(f:M\rightarrow \mathbb {R}\) and the real-valued function \(T_pM \ni v\mapsto f_p(v):=f(\exp _pv) \) defined around 0 in \(T_pM\).

It is easy to see that

$$\begin{aligned} \frac{\partial f_p}{\partial x^i}(0)=\frac{\partial f}{\partial x^i}(p), \; \; \frac{\partial ^2f_p}{\partial x^i\partial x^j}(0)=\frac{\partial ^2 f}{\partial x^i\partial x^j}(p). \end{aligned}$$

The Taylor–Young formula (for Euclidean spaces) applied to \(f_p\) around the origin can be written using matrices as

$$\begin{aligned} f_p(v)=f_p(0)+J_{f_p}(0)v+\frac{1}{2}v^TH_{f_p}(0)v +o(\Vert v\Vert ^2), \end{aligned}$$
(1)

where

$$\begin{aligned} v= & {} [v^1 \; \ldots \; v^n]^T, \quad J_{f_p}(0)=\Big [ \frac{\partial f}{\partial x^1}(p) \; \ldots \; \frac{\partial f}{\partial x^n}(p)\Big ], \\ H_{f_p}(0)= & {} \Big (\frac{\partial ^2f}{\partial x^i\partial x^j}(p)\Big ). \end{aligned}$$

In other words, we have the following Taylor–Young expansion for f around p

$$\begin{aligned} f(\exp _pv)=f(p)+g_p(\mathrm{grad}f,v)+\frac{1}{2}{\hbox {Hess}} _pf(v,v) +o(\Vert v\Vert _p ^2) \end{aligned}$$
(2)

which holds in any coordinate system.

A function \(f : M \longrightarrow {\mathbb {R}}\) is said to be (strictly) convex if its restriction to any geodesic curve \(\gamma : [a , b] \longrightarrow M\) is convex in the classical sense, i.e., the one-real-variable function \(f \circ \gamma : [a , b] \longrightarrow {\mathbb {R}}\) is (strictly) convex.

A function f on M is convex if and only if its Hessian is positive semidefinite; if its Hessian is positive definite, then f is strictly convex.

For more detailed and complete information on the fundamentals in Riemannian geometry, we refer to [51, 52].

2.2 Basics in Vector Optimization on Riemannian Manifolds

Let (M, g) be a complete Riemannian manifold of dimension n; hence, by the Hopf–Rinow theorem, for any two points p and q in M, there exists a length-minimizing geodesic connecting these two points.

Throughout the paper, \(C\subset \mathbb {R}^r\) will denote a convex and pointed cone (i.e., \(\mathbb {R}_+C\subset C, \, C+C\subset C, \, C\cap (-C)=\{ 0\}\)). We suppose that C is closed and \(\mathrm{int}\; (C)\ne \emptyset \), where \(\mathrm{int}\; (A)\) stands for the topological interior of any subset \(A\subset \mathbb {R}^r\).

For any \(y, y'\in \mathbb {R}^r\), we denote

$$\begin{aligned}&y \preccurlyeq y' \Longleftrightarrow y'-y \in C; \\&\quad y\prec y' \Longleftrightarrow y'-y\in \mathrm{int}\; (C); \quad y \precneqq y'\Longleftrightarrow y'-y\in C \setminus \{ 0\}. \end{aligned}$$

Obviously,

$$\begin{aligned} y\prec y' \Longrightarrow y\precneqq y'\Longrightarrow y\preccurlyeq y'. \end{aligned}$$

Note that \(\preccurlyeq \) is a partial order relation on \(\mathbb {R}^r\), i.e., a reflexive, antisymmetric and transitive binary relation. Also \(\prec \) and \(\precneqq \) are transitive relations.

Consider a vector function \(G=(G_1, \ldots , G_r):M\rightarrow \mathbb {R}^r\), and the multiobjective optimization problem

$$\begin{aligned} \hbox {(MOP)} \qquad \qquad \qquad \qquad \qquad \text{ MIN }_C\, G(x)\quad \text{ s.t. }\, x\in M. \end{aligned}$$

For (MOP), the point \(a\in M\) is called:

  • Pareto solution iff there is no \(x\in M\) such that \(G(x) \precneqq G(a)\);

  • weakly Pareto solution iff there is no \(x\in M\) such that \(G(x) \prec G(a)\);

  • properly Pareto solution iff a is a Pareto solution, and there exists a pointed convex cone K such that \(C\setminus \{ 0\}\subset \mathrm{int}\; (K)\) and a is a Pareto solution for the problem \(\text{ MIN }_K G(x)\quad \text{ s.t. }\, x\in M \), i.e., \(G(M)\cap (G(a)-K)=\{ G(a)\}\).

When \(C=\mathbb {R}^r_+:=\{ \lambda =(\lambda _1, \ldots , \lambda _r)\in \mathbb {R}^r : \; \lambda _i\ge 0, \; i=1, \ldots ,r \}\) (the Pareto cone), the previous definitions can be stated as follows.

  • Pareto solution iff there is no \(x\in M\) such that, for all \(i\in \{1, \ldots , r\}\), \(G_i(x) \le G_i(a), \; \) and \(G(x)\ne G(a)\);

  • Weakly Pareto solution iff there is no \(x\in M\) such that, for all \(i\in \{1, \ldots , r\}\), \(G_i(x) < G_i(a)\);

  • Properly Pareto solution (provided that \(G(M)+\mathbb {R}_+^r\) is convex) iff a is a Pareto solution, and there exists a real number \(\mu >0\) such that for each \(i\in \{ 1, \ldots , r\}\) and every \(x\in M\) with \(G_i(x)<G_i(a)\) at least one \(j\in \{ 1, \ldots , r\}\) exists with \(G_j(x)>G_j(a)\) and

    $$\begin{aligned} \frac{G_i(a)-G_i(x)}{G_j(x)-G_j(a)}\le \mu . \end{aligned}$$

We denote the set of all Pareto (resp. weakly Pareto and properly Pareto) solutions of (MOP) by \(\mathcal {P}\) (resp. \(\mathcal {P}_w\) and \(\mathcal {P}_p\)). Obviously, we have \(\mathcal {P}_p\subset \mathcal {P}\subset \mathcal {P}_w\).

In order to simplify the notation, we will use the symbol \(\sigma \in \{ w,p\}\) for weak (if \(\sigma =w\)) or proper (if \(\sigma =p\)), i.e., we write \(\mathcal {P}_\sigma \) for the weakly or properly Pareto solution set.

We recall that a real function \(h:M\rightarrow \mathbb {R}\) is called convex if for any two distinct points a and b in M, and for any geodesic segment \(\gamma : [0,1]\rightarrow M\) with \(\gamma (0)=a, \, \gamma (1)=b\), the function \(h\circ \gamma \) is convex in the usual way, i.e., for all \(t\in ]0,1[\)

$$\begin{aligned} h(\gamma (t))\le (1-t)h(a)+th(b). \end{aligned}$$

If the last inequality is strict, we say that h is strictly convex.

Now, the vector-valued function \(G=(G_1, \ldots , G_r):M\rightarrow \mathbb {R}^r\) is called C-convex (resp. w-strictly C-convex or p-strictly C-convex) iff for any two distinct points a and b in M, and for any geodesic segment \(\gamma : [0,1]\rightarrow M\) with \(\gamma (0)=a, \, \gamma (1)=b\), we have, respectively,

$$\begin{aligned} \forall t\in ]0,1[ \qquad \qquad \qquad \quad G(\gamma (t))\preccurlyeq (1-t)G(a)+tG(b), \end{aligned}$$
(3)
$$\begin{aligned} \forall t\in ]0,1[ \quad \qquad \qquad \qquad G(\gamma (t))\prec (1-t)G(a)+tG(b), \end{aligned}$$
(4)
$$\begin{aligned} \forall t\in ]0,1[\quad \qquad \qquad \qquad G(\gamma (t))\precneqq (1-t)G(a)+tG(b). \end{aligned}$$
(5)

In the case \(C=\mathbb {R}^r_+\), it is easy to see that

  • G is \(\mathbb {R}^r_+\)-convex iff \(G_i\) is convex for all \(i=1,\ldots ,r\);

  • G is w-strictly \(\mathbb {R}^r_+\)-convex iff \(G_i\) is strictly convex for all \(i=1, \ldots ,r\);

  • G is p-strictly \(\mathbb {R}^r_+\)-convex if G is \(\mathbb {R}^r_+\)-convex, and there exists \(i\in \{ 1, \ldots , r\}\) such that \(G_i\) is strictly convex.

Throughout the paper, \(\mathbb {R}^r\) is considered with its usual Euclidean structure and identified to its dual space, and we denote by \(\left\langle \cdot ,\cdot \right\rangle \) its usual inner product (which coincides with the duality product with our identification) and by \(\Vert \cdot \Vert \) the induced norm.

The dual cone of C (or positive polar cone) is the set

$$\begin{aligned} C^*:=\{ \lambda \in \mathbb {R}^r: \, \left\langle \lambda ,y\right\rangle \ge 0 \quad \forall y\in C\}, \end{aligned}$$

and its quasi-interior is given by

$$\begin{aligned} C^*_\sharp :=\{ \lambda \in \mathbb {R}^r: \, \left\langle \lambda ,y\right\rangle > 0 \quad \forall y\in C\setminus \{ 0\}\}. \end{aligned}$$

Note that

$$\begin{aligned} (\mathbb {R}^r_+)^*=\mathbb {R}^r_+, \; \text{ and } (\mathbb {R}^r_+)^*_\sharp =\mathrm{int}\; (\mathbb {R}^r_+)=\{ \lambda \in \mathbb {R}^r| \, \lambda _i>0 \quad i=1, \ldots , r\}. \end{aligned}$$

Let us denote

$$\begin{aligned} \Lambda _\sigma =\left\{ \begin{array}{ll} \{ \lambda \in C^*: \; {\Vert \lambda \Vert _1} =1\}, \qquad \qquad &{} \text{ if } \sigma =w, \\ C^*_\sharp , \qquad \qquad &{} \text{ if } \sigma =p.\end{array} \right. \end{aligned}$$
(6)
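For instance, when \(C=\mathbb {R}^r_+\) (the Pareto cone), using the identities \((\mathbb {R}^r_+)^*=\mathbb {R}^r_+\) and \((\mathbb {R}^r_+)^*_\sharp =\mathrm{int}\; (\mathbb {R}^r_+)\) recalled above, the sets (6) become

$$\begin{aligned} \Lambda _w=\Big \{ \lambda \in \mathbb {R}^r_+ : \; \sum _{i=1}^r\lambda _i=1\Big \} \quad \text{(the unit simplex)}, \qquad \Lambda _p=\mathrm{int}\; (\mathbb {R}^r_+). \end{aligned}$$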

Proposition 2.1

The following properties hold.

  • A. The dual cone \(C^*\) is a closed set in \(\mathbb {R}^r\).

  • B. The set \(C^*_\sharp \) (the quasi-interior of \(C^*\)) is a nonempty open set, and it is in fact the topological interior of \(C^*\).

  • C. The set \(\Lambda _w\) is compact.

Proof

A. We can write \(C^*\) as an intersection of a family of closed sets:

$$\begin{aligned} C^*=\bigcap _{y\in C}\{\lambda \in \mathbb {R}^r: \; \left\langle \lambda ,y\right\rangle \ge 0\}. \end{aligned}$$

B. Let \(\lambda _0\in C^*_\sharp \). Let \(K=C\cap \{ y\in \mathbb {R}^r: \, \Vert y\Vert =1\}\). Obviously, K is a compact set; hence, by Weierstrass' theorem, there is some real \(\alpha >0\) such that \(\left\langle \lambda _0,y\right\rangle \ge \alpha \) for all \(y\in K\). Let \(R=\alpha /2\). For any \(\lambda \in B(\lambda _0;R)\) (the open ball centered at \(\lambda _0\) with radius R), and for any \(y\in C \setminus \{ 0\}\), by the Cauchy–Schwarz inequality, we have

$$\begin{aligned} \left\langle \lambda ,\frac{y}{\Vert y\Vert }\right\rangle =\left\langle \lambda _0,\frac{y}{\Vert y\Vert }\right\rangle +\left\langle \lambda -\lambda _0,\frac{y}{\Vert y\Vert }\right\rangle \ge \alpha -\Vert \lambda -\lambda _0\Vert \ge \alpha -R >0, \end{aligned}$$

hence \(\left\langle \lambda ,y\right\rangle >0\); therefore \(B(\lambda _0;R)\subset C^*_\sharp \). Thus, \(C^*_\sharp \) is an open set included in \(C^*\).

Let now \(\lambda _0\in \mathrm{int}\; (C^*)\). Arguing by contradiction, if \(\lambda _0\notin C^*_\sharp \), then \(\left\langle \lambda _0,y\right\rangle =0\) for some \(y\in C \setminus \{ 0\}\). There exists some real \(R>0\) such that \(\lambda _0+h\in C^*\) for all \(h\in \mathbb {R}^r\) with \(\Vert h \Vert <R\). Therefore,

$$\begin{aligned} 0\le \left\langle \lambda _0+h,y\right\rangle =\left\langle h,y\right\rangle \quad \forall h\in \mathbb {R}^r, \; \Vert h\Vert <R, \end{aligned}$$

which implies the contradiction \(y=0\).

Finally, \(C^*_\sharp \) is nonempty because \(C\setminus \{ 0\}\) is contained in an open half-space (using a separation theorem since C is closed convex and pointed).

C. It is obvious. \(\square \)

Now we give a scalarization theorem on manifolds, which allows us to replace the vector optimization problem with a family of scalar optimization problems.

Theorem 2.2

Let (M, g) be a complete and connected Riemannian manifold in \(\mathbb {R}^n\), and consider a function \(G=(G_1, \ldots , G_r):M \rightarrow \mathbb {R}^r\). Then, for each \(\sigma \in \{ w,p\}\), we have

$$\begin{aligned} \bigcup _{\lambda \in \Lambda _\sigma }\; \text{ arg }\min _{x\in M}\left\langle \lambda ,G(x)\right\rangle \; \subset \; \mathcal {P}_\sigma . \end{aligned}$$
(7)

Moreover, if G is C-convex on M, then the previous inclusion becomes an equality, i.e.,

$$\begin{aligned} \bigcup _{\lambda \in \Lambda _\sigma }\; \text{ arg }\min _{x\in M}\left\langle \lambda ,G(x)\right\rangle \; = \; \mathcal {P}_\sigma . \end{aligned}$$
(8)

Proof

Inclusion (7) is a standard result (see, e.g., [46, 49]). However, for the reader's convenience, we give a short proof here.

Let \(\sigma =p\), \(\lambda \in \Lambda _p\), and \(a\in \displaystyle \text{ arg }\min _{x\in M}\left\langle \lambda ,G(x)\right\rangle .\) Let us consider the set \(K=\{ y\in \mathbb {R}^r| \, \left\langle \lambda ,y\right\rangle >0\} \cup \{0\}\). Obviously, K is a pointed convex cone, and \(C\setminus \{ 0\}\subset \mathrm{int}\; (K)\). We claim that a is a Pareto solution for the problem \(\text{ MIN }_K G(x)\; \text{ s.t. }\, x\in M\). Indeed, arguing by contradiction, if this were not the case, then there is some \(x\in M\) such that \(G(a)-G(x)\in K\setminus \{ 0\}\); hence, \(\left\langle \lambda ,G(a)-G(x)\right\rangle >0\), which contradicts the choice of a. Since \(C\setminus \{ 0\}\subset K\setminus \{ 0\}\), a is also a Pareto solution of (MOP); therefore, \(a\in \mathcal {P}_p\).

Let now \(\sigma =w\), \(\lambda \in \Lambda _w\), and \(a\in \displaystyle \text{ arg }\min _{x\in M}\left\langle \lambda ,G(x)\right\rangle .\) Since \(\lambda \in C^*\setminus \{ 0\}\), it is easy to see that \(\left\langle \lambda ,c\right\rangle >0\) for all \(c\in \mathrm{int}\; (C)\). If, by contradiction, \(a\notin \mathcal {P}_w\), then we can find \(x\in M\) such that \(G(x) \prec G(a)\), i.e., \( G(a)-G(x)\in \mathrm{int}\; (C)\); therefore, \(\left\langle \lambda ,G(a)-G(x)\right\rangle >0\), which contradicts the choice of a. Thus, inclusion (7) holds.

Suppose now that G is C-convex. Let us prove that

$$\begin{aligned} G(M)+C \; \text{ is } \text{ convex } \text{ in }\, \mathbb {R}^r. \end{aligned}$$
(9)

Let \(y,y'\in G(M)+C\), thus \(y=G(x)+c, \, y'=G(x')+c'\) for some \(x,x'\in M\) and \(c,c'\in C\). There is a geodesic segment \(\gamma :[0,1]\rightarrow M\) with \(\gamma (0)=x\), \(\gamma (1)=x'\). Then, for each \(t\in ]0,1[\), we have \(G(\gamma (t)) \preccurlyeq (1-t)G(x)+tG(x')\), thus

$$\begin{aligned} c'':=(1-t)(y-c)+t(y'-c')-G(\gamma (t)) \in C. \end{aligned}$$

Therefore,

$$\begin{aligned} (1-t)y+ty'=G(\gamma (t))+ (1-t)c+tc'+c'' \in G(M)+C, \end{aligned}$$

hence (9) holds.

Let \(\sigma = w\) and let \(a\in \mathcal {P}_w\). Thus, \((G(a)-\mathrm{int}\; (C))\cap G(M)=\emptyset \). Since \(\mathrm{int}\; (C)+C\subset \mathrm{int}\; (C)\), it follows that the convex sets \(G(a)-\mathrm{int}\; (C)\) and \(G(M)+C\) are disjoint. Using the separation theorem, we can find \(\lambda \in \mathbb {R}^r\setminus \{ 0\}\) such that

$$\begin{aligned} \forall x\in M, \, c\in \mathrm{int}\; (C), \, c'\in C \quad \quad \left\langle \lambda ,G(a)-c\right\rangle \le \left\langle \lambda ,G(x)+c'\right\rangle . \end{aligned}$$
(10)

Taking above \(x=a, c'=0\), we obtain that \(\left\langle \lambda ,c\right\rangle \ge 0\) for all \(c\in \mathrm{int}\; (C)\), hence for all \(c\in C\), because \(\text{ cl }\,({\mathrm{int}\; (C)})=\text{ cl }\, (C)\), where \(\text{ cl }\, (A)\) stands for the topological closure of a set A. Thus, \(\lambda \in C^*\setminus \{ 0\}\). Finally, \(\left\langle \lambda ,G(a)\right\rangle \le \left\langle \lambda ,G(x)\right\rangle \) for all \(x\in M\), and replacing without loss of generality \(\lambda \) by \(\frac{\lambda }{\Vert \lambda \Vert _1}\) we find that \(a\in \displaystyle \text{ arg }\min _{x\in M}\left\langle \lambda ,G(x)\right\rangle \) with \(\lambda \in \Lambda _w\).

Take now \(\sigma =p\). Let \(a\in \mathcal {P}_p\). Then there exists a pointed convex cone K such that \(C\setminus \{ 0\}\subset \mathrm{int}\; (K)\) and a is a Pareto solution for the problem \(\text{ MIN }_K G(x)\; \text{ s.t. }\, x\in M\); obviously, a is then also a weakly Pareto solution for this problem. Using the case \(\sigma =w\) with K instead of C, we can find \(\lambda \in K^*\setminus \{ 0\}\) such that \(\displaystyle a\in \text{ arg }\min _{x\in M}\left\langle \lambda ,G(x)\right\rangle .\) Since \(C\setminus \{ 0\}\subset \mathrm{int}\; (K)\), and \(\lambda \ne 0\), we have \(\left\langle \lambda ,c\right\rangle >0\) for all \(c\in C\setminus \{ 0\}\), hence \(\lambda \in C^*_\sharp \). \(\square \)
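As a simple illustration of Theorem 2.2 in the Euclidean case, take \(M=\mathbb {R}\) with the usual metric, \(r=2\), \(C=\mathbb {R}^2_+\) and \(G(x)=(x^2, (x-1)^2)\). For \(\lambda =(\theta ,1-\theta )\in \Lambda _w\), the function \(x\mapsto \theta x^2+(1-\theta )(x-1)^2\) has the unique minimizer \(x=1-\theta \); letting \(\theta \) run over [0, 1], the union of these minimizers is exactly the segment [0, 1], which is precisely the (weakly) Pareto solution set of \(\text{ MIN }_{\mathbb {R}^2_+}\, G(x)\; \text{ s.t. }\, x\in \mathbb {R}\), in agreement with equality (8), since G is \(\mathbb {R}^2_+\)-convex.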

For more details about vector optimization in Euclidean spaces see, e.g., the monographs [45–47, 49].

3 The Semivectorial Bilevel Problem

3.1 Statement of the Problem

Let \((M_1,g_1)\) (the leader decision variables set) and \((M_2,g_2)\) (the follower decision variables set) be two connected Riemannian manifolds of dimension m and n, respectively. Moreover, \((M_2,g_2)\) is supposed to be complete. The corresponding Riemannian metrics will be denoted by \(g_1(\cdot ,\cdot )\) and \(g_2(\cdot ,\cdot )\), respectively. Let also \(f:M_1\times M_2\rightarrow \mathbb {R}\) be the leader objective function, and let \(F=(F_1, \ldots , F_r):M_1\times M_2\rightarrow \mathbb {R}^r\) be the follower multiobjective function.

For each \(x\in M_1\), we denote by \(\psi (x)\) the set of weakly or properly (i.e., \(\sigma \)-) Pareto solutions of the follower multiobjective optimization problem \(\text{ MIN }_C\, F(x,y)\quad \text{ s.t. }\, y\in M_2\).

Thus, \(\psi :M_1 \rightrightarrows M_2\) is a set-valued function.

We deal with two semivectorial bilevel problems.

  • The “optimistic semivectorial bilevel problem”

    $$\begin{aligned} \hbox {(OSB)} \qquad \qquad \qquad \qquad \qquad \displaystyle \min _{x\in M_1} \min _{y\in \psi (x)}f(x,y). \end{aligned}$$

    In this case, the follower cooperates with the leader; i.e., for each \(x\in M_1\), the follower chooses among all its \(\sigma \)-Pareto solutions (his best responses) one which is the best for the leader (assuming that such a solution exists).

  • The “pessimistic semivectorial bilevel problem”

    $$\begin{aligned} \hbox {(PSB)} \qquad \qquad \qquad \qquad \qquad \displaystyle \min _{x\in M_1}{ \sup _{y\in \psi (x)}}f(x,y). \end{aligned}$$

    In this case, there is no cooperation between the leader and the follower, and the leader expects the worst scenario; i.e., for each \(x\in M_1\), the follower may choose among all its \(\sigma \)-Pareto solutions (his best responses) one which is unfavorable for the leader (in this case we prefer to use “sup” instead of “max”).

Consider the following hypotheses:

(HC)\(_\sigma \) :

For each \(x\in M_1\), the function \(F(x,\cdot )\) is \(\sigma \)-strictly C-convex on \(M_2\), \(\sigma \in \{ w,p\}\).

(HCC)\(_\sigma \) :

For all \(x\in M_1\) and \(\lambda \in \Lambda _\sigma \), the function \(y\mapsto \left\langle \lambda ,F(x,y)\right\rangle \) has bounded sublevel sets, i.e., for every real \(\alpha \), the set

$$\begin{aligned} \{ y\in M_2: \, \left\langle \lambda ,F(x,y)\right\rangle \le \alpha \} \end{aligned}$$

is bounded.

3.2 A Useful Equivalent Form

Proposition 3.1

If \((HC)_\sigma \) holds, then, for any \(x\in M_1\) and any \( \lambda \in \Lambda _\sigma \), where \(\Lambda _\sigma \) is given in (6), the real-valued function \(M_2\ni y\mapsto \left\langle \lambda ,F(x,y)\right\rangle \) is strictly convex.

Proof

Let \(a,b\in M_2\), \(a\ne b\), and \(\gamma :[0,1]\rightarrow M_2\) be the geodesic segment with \(\gamma (0)=a, \, \gamma (1)=b\). Let \(t\in ]0,1[\).

If \(\sigma =w\), then \(\lambda \in C^*\setminus \{ 0\}\), and \(F(x,\gamma (t))\prec (1-t)F(x,a)+tF(x,b)\), i.e., \((1-t)F(x,a)+tF(x,b)-F(x,\gamma (t))\in \mathrm{int}\; (C)\), hence

$$\begin{aligned} \left\langle \lambda ,(1-t)F(x,a)+tF(x,b)-F(x,\gamma (t))\right\rangle >0. \end{aligned}$$

If \(\sigma =p\), then \(\lambda \in C^*_\sharp \), and \(F(x,\gamma (t))\precneqq (1-t)F(x,a)+tF(x,b)\), i.e., \((1-t)F(x,a)+tF(x,b)-F(x,\gamma (t))\in C\setminus \{ 0\}\), hence

$$\begin{aligned} \left\langle \lambda ,(1-t)F(x,a)+tF(x,b)-F(x,\gamma (t))\right\rangle >0. \end{aligned}$$

\(\square \)

Proposition 3.2

Assume that \((HC)_w\) holds. Then, for each \(x\in M_1\), the set of weakly Pareto solutions of the follower problem coincides with the set of its Pareto solutions.

Proof

Let \(x\in M_1\) be fixed, and let \(\hat{y}\) be a weakly Pareto solution of the follower problem. Theorem 2.2 and Proposition 3.1 imply that there exists \(\lambda \in \Lambda _w\) such that \(\hat{y} \) is the unique global minimizer of the function \(M_2\ni y\mapsto \left\langle \lambda ,F(x,y)\right\rangle \). If \(\hat{y}\) were not a Pareto solution, then we could find \(\bar{y} \in M_2\) such that \(F(x,\bar{y}) \precneqq F(x,\hat{y})\). It follows that \(\bar{y} \ne \hat{y}\) and \( \left\langle \lambda ,F(x,\bar{y})\right\rangle \le \left\langle \lambda ,F(x, \hat{y})\right\rangle \), which contradicts the fact that \(\hat{y}\) is the unique minimizer of \(\left\langle \lambda ,F(x,\cdot )\right\rangle \). \(\square \)

From now on, we assume for the rest of the paper that \(\hbox {(HC)}_\sigma \) and \(\hbox {(HCC)}_\sigma \) are satisfied.

Proposition 3.3

For each \(x\in M_1\) and \(\lambda \in \Lambda _\sigma \), the minimization problem

$$\begin{aligned} \min _{y\in M_2}\left\langle \lambda ,F(x,y)\right\rangle \end{aligned}$$

admits a unique solution, which will be denoted hereafter \(y(x,\lambda )\).

Proof

Uniqueness is provided by Proposition 3.1. Since any convex function on a Riemannian manifold is continuous, and closed bounded sets on a complete and connected Riemannian manifold are compact, by the hypothesis (HCC)\(_\sigma \) and Weierstrass’ theorem, we can find a minimizer over a nonempty sublevel set of the function \(y\mapsto \left\langle \lambda ,F(x,y)\right\rangle \) which in fact is a minimizer over \(M_2\). \(\square \)

Thus, we obviously have the following.

Corollary 3.1

For each fixed \(x\in M_1\), the map \(\lambda \mapsto y(x,\lambda )\) is a surjection from \(\Lambda _\sigma \) to \(\psi (x)\), hence

$$\begin{aligned} \psi (x)=\bigcup _{\lambda \in \Lambda _\sigma }\{ y(x,\lambda ) \}. \end{aligned}$$
(11)

Finally, we obtain the following equivalent form with scalar bilevel problems where the lower level has a unique response.

Theorem 3.1

Problem (OSB) is equivalent to the following problem

$$\begin{aligned}&\displaystyle \min _{x\in M_1}\min _{\lambda \in \Lambda _\sigma }f\big (x,y(x,\lambda )\big ), \nonumber \\&\hbox {where}\, y(x, \lambda )\, \hbox {is the unique solution to the problem} \nonumber \\&\displaystyle \min _{y\in M_2}\left\langle \lambda ,F(x,y)\right\rangle . \end{aligned}$$
(12)

Problem (PSB) is equivalent to the following problem

$$\begin{aligned}&\displaystyle \min _{x\in M_1}{\displaystyle \sup _{\lambda \in \Lambda _\sigma }}f\big (x,y(x,\lambda )\big ), \nonumber \\&\hbox {where}\, y(x, \lambda )\, \hbox {is the unique solution to the problem} \nonumber \\&\displaystyle \min _{y\in M_2}\left\langle \lambda ,F(x,y)\right\rangle . \end{aligned}$$
(13)

3.3 Optimality Conditions for the Problem \(\displaystyle \min _{y\in M_2}\left\langle \lambda ,F(x,y)\right\rangle \)

From now on, we suppose that \(F(\cdot , \cdot )\) is a smooth function.

Let us denote by \(\mathrm{grad}_i\) the gradient operator on \((M_i,g_i)\), \(i=1,2\), and by \(F^a\) (resp. \(\lambda _a\)), \(a = 1, \ldots , r\), the component functions of the map

\(F : M_1 \times M_2 \longrightarrow {\mathbb {R}}^r\) (resp. the canonical coordinates of the vector \(\lambda \in {\mathbb {R}}^r\)).

The proof of the following result is well known.

Proposition 3.4

(Necessary and sufficient conditions for \(y(x,\lambda ))\) Let \(\lambda \in \Lambda _\sigma \) and \(x\in M_1\) be given. Then \(y=y(x,\lambda )\) is the unique solution of the problem \(\displaystyle \min _{y\in M_2}\left\langle \lambda ,F(x,y)\right\rangle \) iff (if and only if)

$$\begin{aligned} \lambda _a \,\mathrm{grad}_2\, F^a (x,y) = 0. \end{aligned}$$
(14)

\(\square \)

Now let us consider the map

$$\begin{aligned} G : {\mathbb {R}}^r \times M_1 \times M_2 \longrightarrow TM_2, \; \; G (\lambda , x , y) = \lambda _a \mathrm{grad}_2 F^a (x,y). \end{aligned}$$

Note that, for each \((\lambda , x)\in \Lambda _\sigma \times M_1\), the solution \(y=y(x,\lambda )\) to the problem \(\displaystyle \min _{y\in M_2}\left\langle \lambda ,F(x,y)\right\rangle \) satisfies the equation \(G (\lambda , x , y) = 0\).

Denote by \(\delta _2 G (\lambda , x , y) : T_yM_2 \longrightarrow T_yM_2\) the partial differential of G w.r.t. y at the point \((\lambda , x , y)\).

The following is a direct consequence of the implicit function theorem.

Proposition 3.5

Let \((\lambda _0, x_0)\in \Lambda _\sigma \times M_1\), and let \(y_0=y(x_0,\lambda _0)\) be the unique solution of the problem \(\displaystyle \min _{y\in M_2}\left\langle \lambda _0,F(x_0,y)\right\rangle \).

Suppose that \(\delta _2 G (\lambda _0 , x_0 , y_0)\) is an isomorphism. Then, in a neighborhood of \((\lambda _0 , x_0)\), the function \(y(\cdot , \cdot )\) is smooth and

$$\begin{aligned} \frac{\partial }{\partial \lambda }y(\lambda , x) = - (\delta _2 G (\lambda , x , y))^{-1} \circ \frac{\partial G}{\partial \lambda }(\lambda , x , y) \end{aligned}$$

and

$$\begin{aligned} \delta _1 y (\lambda , x) = - (\delta _2 G (\lambda , x , y))^{-1} \circ \delta _1 G (\lambda , x , y), \end{aligned}$$

where \(\delta _1\) denotes the partial differential operator w.r.t. \(x \in M_1\).

Remark 3.1

Note that for a tangent vector \(u \in T_xM_1\), we have

$$\begin{aligned} \delta _1 G (\lambda , x , y) (u) = (\delta _1 G^1(\lambda , x , y) (u), \ldots , \delta _1 G^n(\lambda , x , y) (u)), \end{aligned}$$

with

$$\begin{aligned} \delta _1 G^\alpha (\lambda , x , y) (u) = g_1 (\mathrm{grad}_1 G^\alpha (\lambda , x , y) , u), \end{aligned}$$

and for a tangent vector \(v \in T_yM_2\), we have

$$\begin{aligned} \delta _2 G (\lambda , x , y) (v) = (d_2 G^1(\lambda , x , y) (v), \ldots , d_2 G^n(\lambda , x , y) (v)), \end{aligned}$$

with

$$\begin{aligned} d_2 G^\alpha (\lambda , x , y) (v) = g_2 (\mathrm{grad}_2 G^\alpha (\lambda , x , y) , v), \end{aligned}$$

where \(G^\alpha \), \(\alpha = 1 , \ldots , n\), are the component functions of G.

In local coordinates \((x^i)_{i=1}^m\) on \(M_1\) and \((y^\alpha )_{\alpha = 1}^n\) on \(M_2\), we can write

$$\begin{aligned} \mathrm{grad}_1 G^\alpha (\lambda , x , y) = g_1^{ij}\frac{\partial G^\alpha }{\partial x^i}(\lambda , x , y) \frac{\partial }{\partial x^j}, \end{aligned}$$

and

$$\begin{aligned} \mathrm{grad}_2 G^\alpha (\lambda , x , y) = g_2^{\beta \mu }\frac{\partial G^\alpha }{\partial y^\beta }(\lambda , x , y) \frac{\partial }{\partial y^\mu }. \end{aligned}$$

3.4 Optimality Conditions for the Optimistic Problem

In the sequel, we denote for all \(x\in M_1\) and \(\lambda \in \Lambda _\sigma \)

$$\begin{aligned} \varphi (x,\lambda )=f\big (x,y(x,\lambda )\big ). \end{aligned}$$

In this subsection, we suppose that \(f(\cdot , \cdot )\) is a smooth function.

Theorem 3.2

(Necessary optimality conditions for the optimistic problem when we deal with properly Pareto solutions) Let \((x^*,\lambda ^*)\in M_1\times \Lambda _p\) be a (local) solution of the problem

$$\begin{aligned} \min _{x\in M_1}\min _{\lambda \in \Lambda _p}f\Big (x,y(x,\lambda )\Big ). \end{aligned}$$

Let \(y^*=y(x^*,\lambda ^*)\). Then

$$\begin{aligned} \mathrm{grad}_1f(x^*,y^*)+\mathrm{grad}_2f( x^*,y^*)\circ \delta _1y(x^*,\lambda ^*)= & {} 0 \\ \\ \mathrm{grad}_2f(x^*,y^*)\circ \frac{\partial y}{\partial \lambda }(x^*,\lambda ^*)= & {} 0, \end{aligned}$$

where \(\delta _1y(x^*,\lambda ^*)\) and \(\displaystyle \frac{\partial y}{\partial \lambda }(x^*,\lambda ^*)\) are given in Proposition 3.5.

Moreover, the point \((x^*,\lambda ^*)\) has the property that the quadratic form associated with the bilinear form \(\hbox {Hess}(\varphi )(x^*, \lambda ^*)\) is positive semidefinite.

Proof

Note that our problem written in short \(\min _{x\in M_1}\min _{\lambda \in \Lambda _p}\varphi (x,\lambda )\) is (locally) equivalent to the problem

$$\begin{aligned} \min _{(x,\lambda )\in M_1\times \Lambda _p}\varphi (x,\lambda ), \end{aligned}$$

and the set \(M_1\times \Lambda _p\) is open in \(M_1\times \mathbb {R}^r\) according to Proposition 2.1. Then, using Fermat’s rule, chain rule and the necessary second-order optimality conditions (which can be justified as in the Euclidean case using Taylor–Young formula (2) around \((x^*,\lambda ^*)\)), we obtain easily the result. \(\square \)

Theorem 3.3

(Necessary optimality conditions for the optimistic problem when we deal with (weakly) Pareto solutions) Let \((x^*,\lambda ^*)\in M_1\times \Lambda _w\) be a (local) solution of the problem

$$\begin{aligned} \min _{x\in M_1}\min _{\lambda \in \Lambda _w}f\Big (x,y(x,\lambda )\Big ). \end{aligned}$$

Let \(y^*=y(x^*,\lambda ^*)\). Then

$$\begin{aligned} \mathrm{grad}_1f(x^*,y^*)+\mathrm{grad}_2f( x^*,y^*)\circ \delta _1y(x^*,\lambda ^*)= & {} 0 \end{aligned}$$
(15)
$$\begin{aligned} \mathrm{grad}_2f(x^*,y^*)\circ \frac{\partial y}{\partial \lambda }(x^*,\lambda ^*)+N^C_{\Lambda _w}(\lambda ^*)&\ni&0, \end{aligned}$$
(16)

where \(N^C_{\Lambda _w}(\lambda ^*)\) is the Clarke normal cone to the set \(\Lambda _w\) at the point \(\lambda ^*\) (see [53, page 212]), and \(\delta _1y(x^*,\lambda ^*)\), \(\displaystyle \frac{\partial y}{\partial \lambda }(x^*,\lambda ^*)\) are given in Proposition 3.5.

Proof

Note that \(\Lambda _w\) is compact according to Proposition 2.1; hence for each fixed \(x\in M_1\), the minimum \(\min _{\lambda \in \Lambda _w}\varphi (x,\lambda )\) is attained. Thus, our problem is equivalent to

$$\begin{aligned} \min _{(x,\lambda )\in M_1\times \Lambda _w}\varphi (x,\lambda ). \end{aligned}$$

The function \(\varphi \) is smooth; hence, we can find an open and bounded neighborhood \(O_1\times O_2\) of \((x^*,\lambda ^*)\) with \(\Lambda _w\subset O_2\) (because \(\Lambda _w\) is compact) and a positive real k such that \(\varphi \) is Lipschitz of rank k on \(O_1\times O_2\).

Thus, \((x^*,\lambda ^*)\) solves the unconstrained problem

$$\begin{aligned} \min _{(x,\lambda )\in O_1\times O_2}\Big (\varphi (x,\lambda )+kd_{\Lambda _w}(\lambda )\Big ), \end{aligned}$$

where

$$\begin{aligned} d_{\Lambda _w}(\lambda )=\inf _{\mu \in \Lambda _w}\Vert \lambda -\mu \Vert \end{aligned}$$

is the distance function associated with the set \(\Lambda _w\). Then using Clarke’s subgradient calculus (see [53, Proposition 10.36]), we obtain the result. \(\square \)

Remark 3.2

In the particular and important case when \(C=\mathbb {R}^r_+\) (the Pareto cone), the set \(\Lambda _w\) is convex and closed, so the Clarke normal cone is given by (see [53, Theorem 10.39])

$$\begin{aligned} N^C_{\Lambda _w}(\lambda ^*)=\{ \alpha \in \mathbb {R}^r | \, \left\langle \alpha ,\lambda -\lambda ^*\right\rangle \le 0 \quad \forall \lambda \in \Lambda _w\}. \end{aligned}$$

Then it is easy to see that,

$$\begin{aligned} N^C_{\Lambda _w}(\lambda ^*)=\{ \alpha \in \mathbb {R}^r| \; \alpha _i=\alpha _j \; \forall i,j \in I_+(\lambda ^*); \; \alpha _k\le \alpha _i \; \forall (k,i)\in I_0(\lambda ^*)\times I_+(\lambda ^*) \}, \end{aligned}$$

where \(I_+(\lambda ^*)=\{ i | \lambda ^*_i>0\}\) and \(I_0(\lambda ^*)=\{ k | \lambda ^*_k=0\}\). Therefore, (16) becomes:

there exists a real \(\nu \) such that

$$\begin{aligned} \mathrm{grad}_2f(x^*,y^*)\circ \frac{\partial y}{\partial \lambda _i}(x^*,\lambda ^*)= & {} \nu \quad \forall i\in I_+(\lambda ^*) \end{aligned}$$
(17)
$$\begin{aligned} \mathrm{grad}_2f(x^*,y^*)\circ \frac{\partial y}{\partial \lambda _k}(x^*,\lambda ^*)\ge & {} \nu \quad \forall k\in I_0(\lambda ^*). \end{aligned}$$
(18)

These relations can also be obtained by using the Karush–Kuhn–Tucker theorem.
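For instance, for \(r=2\) and \(\lambda ^*=(1,0)\), we have \(I_+(\lambda ^*)=\{ 1\}\) and \(I_0(\lambda ^*)=\{ 2\}\), so the formula above gives \(N^C_{\Lambda _w}(\lambda ^*)=\{ \alpha \in \mathbb {R}^2 : \, \alpha _2\le \alpha _1\}\).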

3.5 An Existence Result for the Pessimistic Problem

For the more difficult case of the pessimistic problem, we will deal only with (weakly) Pareto solutions.

Theorem 3.4

Suppose that the function \((x,y)\mapsto f(x, y)\) is continuous on \(M_1\times M_2\) and sublevel bounded in x uniformly in y, i.e., for each \(\alpha \in \mathbb {R}\), there exists a bounded set \(B\subset M_1\) such that \(\{ x\in M_1: \, f(x,y) \le \alpha \} \subset B\) for all \(y\in M_2\) (see [54, Definition 1.16]). Then the pessimistic problem

$$\begin{aligned} \min _{x\in M_1}\sup _{\lambda \in \Lambda _w}f\Big (x, y(x,\lambda )\Big ) \end{aligned}$$

has at least one global solution.

Proof

According to Proposition 2.1, the set \(\Lambda _w\) is compact, and, by Proposition 3.5, the function \((x,\lambda )\mapsto f\Big (x, y(x,\lambda )\Big )\) is continuous; hence for each fixed x, the supremum \(\sup _{\lambda \in \Lambda _w}f\Big (x, y(x,\lambda )\Big )\) is finite and attained by Weierstrass’ theorem.

On the other hand, the function \(x\mapsto \sup _{\lambda \in \Lambda _w}f\Big (x, y(x,\lambda )\Big )\) is lower semicontinuous as a supremum of a family of (lower semi)continuous functions. Moreover, the function \(x\mapsto \sup _{\lambda \in \Lambda _w}f\Big (x, y(x,\lambda )\Big )\) has bounded sublevel sets by the hypothesis about f; hence, by Rockafellar [54, Theorem 1.9], we obtain the conclusion. \(\square \)

3.6 An Illustrative Example

With the notation used in Sect. 3.1, we consider in this subsection the particular case \(C=\mathbb {R}^2_+\), \(M_1=\mathbb {R}_{++}:=]0,+\infty [\) with the Euclidean metric, and \(M_2=\mathbb {R}^2_{++}:=\{ (y_1,y_2)\in \mathbb {R}^2| \, y_1>0, \, y_2>0\}\) with the metric \(g_2\) given in Cartesian coordinates \((y_1,y_2)\) around the point \(y\in M_2\) by the matrix

$$\begin{aligned} M_2\ni y\mapsto (g_{ij})_{y}= \left( g_2\left( \frac{\partial }{\partial y_i}, \frac{\partial }{\partial y_j}\right) \right) :=\text{ diag } \,\left( y_1^{-2} , \, y_2^{-2}\right) . \end{aligned}$$

In other words, for any vectors \(u=(u_1,u_2)\) and \(v=(v_1,v_2)\) in the tangent plane at \(y\in M_2\), denoted \(T_yM_2\), which coincides with \(\mathbb {R}^2\), we have

$$\begin{aligned} g_2(u,v)=\frac{u_1v_1}{y_1^2}+\frac{u_2v_2}{y_2^2}. \end{aligned}$$

Let \(a=(a_1,a_2) \in M_2\) and \(v=(v_1,v_2) \in T_aM_2\). It is easy to see that the (minimizing) geodesic curve \(t\mapsto \gamma (t)\) verifying \(\gamma (0) =a, \, \dot{\gamma }(0)=v\) is given by

$$\begin{aligned} \mathbb {R}\ni t\mapsto \big (a_1 e^{\frac{v_1}{a_1}t}, \, a_2 e^{\frac{v_2}{a_2}t}\big ). \end{aligned}$$

Hence, \(M_2\) is a complete Riemannian manifold. Also, the (minimizing) geodesic segment \(\gamma :[0,1] \rightarrow M_2\) joining the points \(a=(a_1,a_2)\) and \(b=(b_1,b_2)\), i.e., \(\gamma (0)=a, \, \gamma (1) =b\), is given by \(\gamma _i(t)=a_i^{1-t}b_i^t\), \(i=1,2\). Thus, the distance d on the metric space \((M_2,g_2)\) is given by

$$\begin{aligned} d(a,b)= & {} \int _0^1\Vert \dot{\gamma }(t)\Vert _{\gamma (t)}\hbox {d}t=\int _0^1\sqrt{\Big (\frac{\dot{\gamma }_1(t)}{\gamma _1(t)}\Big )^2+\Big (\frac{\dot{\gamma }_2(t)}{\gamma _2(t)}\Big )^2} \hbox {d}t \\= & {} \sqrt{\Big (\ln \frac{a_1}{b_1}\Big )^2+\Big (\ln \frac{a_2}{b_2}\Big )^2}. \end{aligned}$$

For more details about this particular Riemannian manifold, see for instance [38, 39, 42].
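As a quick check of these geodesic formulas, using the expression of the Christoffel symbols from Sect. 2.1 with \(g_{11}=y_1^{-2}\), \(g_{22}=y_2^{-2}\), \(g_{12}=g_{21}=0\), the only nonvanishing symbols are

$$\begin{aligned} {\varGamma }_{11}^1=\frac{1}{2}\,y_1^2\,\frac{\partial }{\partial y_1}\big (y_1^{-2}\big )=-\frac{1}{y_1}, \qquad {\varGamma }_{22}^2=-\frac{1}{y_2}, \end{aligned}$$

so the geodesic equations read \(\ddot{\gamma }_i-\dot{\gamma }_i^2/\gamma _i=0\), \(i=1,2\), and a direct computation shows that the curves \(t\mapsto \big (a_1 e^{\frac{v_1}{a_1}t}, \, a_2 e^{\frac{v_2}{a_2}t}\big )\) given above satisfy them together with \(\gamma (0)=a\) and \(\dot{\gamma }(0)=v\).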

It follows easily that the closed ball B(a; R) centered at \(a\in M_2\) of radius \(R\ge 0\) satisfies

$$\begin{aligned}{}[a_1e^{-\frac{R}{\sqrt{2}}},a_1e^{\frac{R}{\sqrt{2}}}]\times [a_2e^{-\frac{R}{\sqrt{2}}},a_2e^{\frac{R}{\sqrt{2}}}]\subset B(a;R), \end{aligned}$$
(19)

thus every closed rectangle \([\rho _1,\eta _1]\times [\rho _2, \eta _2]\) (\(\rho _1>0, \rho _2>0\)) is bounded in the metric space \((M_2,g_2)\) with the distance d.

Consider now the functions \(F:M_1\times M_2\rightarrow \mathbb {R}^2\) and \(f:M_1\times M_2\rightarrow \mathbb {R}\) given for any \((x,y)\in M_1\times M_2\) by

$$\begin{aligned} F(x,y)= & {} \big (F_1(x,y), F_2(x,y)\big )=\big ( \ln ^2(y_1)+x\sqrt{y_2}, \, x\ln (y_1)-\ln (y_2)\big ),\\ f(x,y)= & {} \frac{1}{x}\ln (y_1)-x-\frac{1}{\sqrt{y_2}}(1+y_2)\ln (y_1). \end{aligned}$$

Let us solve the optimistic semivectorial bilevel optimization problem in the properly Pareto case

$$\begin{aligned} (OSB)_p \qquad \qquad \qquad \min _{x\in M_1}\min _{y\in \psi (x)}f(x,y), \end{aligned}$$

where, for each \(x\in M_1\), \(\psi (x)\) denotes the set of properly Pareto solutions of the follower problem \(\text{ MIN }_C\, F(x,y)\quad \text{ s.t. }\, y\in M_2\).

It is obvious that for each fixed \(x\in M_1\), none of the functions \(F_1(x, \cdot )\) and \(F_2(x, \cdot )\) is convex on \(M_2\) with the Euclidean metric. Therefore, the vector function \(F(x, \cdot )\) is not C-convex on \(\mathbb {R}^2_{++}\) with the Euclidean metric.

It is easy to see that, for any geodesic segment \(\gamma :[0,1]\rightarrow M_2\) with \(\gamma (0)=a, \, \gamma (1) =b\), the functions \( F_i(x, \cdot ) \circ \gamma :[0,1]\rightarrow \mathbb {R}\) are convex \((i=1,2)\). Moreover, the function \(F_1(x, \cdot ) \circ \gamma \) is strictly convex; hence, the function \(F(x, \cdot )\) is p-strictly C-convex on the Riemannian manifold \((M_2,g_2)\) (see Sect. 2.2). So the hypothesis \(\hbox {(HC)}_p\) holds.

On the other hand, \(\Lambda _p=\mathrm{int}\; (C^*)=\mathbb {R}^2_{++}\), so for every \(\lambda =(\lambda _1,\lambda _2)\in \Lambda _p\) and \((x,y)\in M_1\times M_2\), we have

$$\begin{aligned} \left\langle \lambda ,F(x,y)\right\rangle =\lambda _1F_1(x,y)+\lambda _2F_2(x,y). \end{aligned}$$

Let \(\alpha \in \mathbb {R}\), \(x\in M_1\) and \(\lambda \in \Lambda _p\) be fixed and consider the sublevel set

$$\begin{aligned} A:= \{ y\in M_2| \, \left\langle \lambda ,F(x,y)\right\rangle \le \alpha \}. \end{aligned}$$

The inequality \(\left\langle \lambda ,F(x,y)\right\rangle \le \alpha \) can be written \(\phi _1(y_1)+\phi _2(y_2)\le \alpha \), with

$$\begin{aligned} \phi _1(y_1):= \lambda _1\ln ^2(y_1)+ \lambda _2x\ln (y_1); \quad \phi _2(y_2)=\lambda _1x\sqrt{y_2}-\lambda _2\ln (y_2). \end{aligned}$$

A simple computation shows that, for \(i=1,2\), the function \(\phi _i\) tends to \(+\infty \) as \(y\rightarrow 0^+\) and as \(y\rightarrow +\infty \), is decreasing on \(]0,\xi _i]\) and increasing on \([\xi _i,+\infty [\), and attains its minimum value \(\mu _i\) at \(\xi _i\),

where \(\xi _1=e^{-\frac{\lambda _2x}{2\lambda _1}}\), \(\mu _1=\phi _1(\xi _1)\), \(\xi _2=\frac{4\lambda _2^2}{\lambda _1^2x^2}\), \(\mu _2=\phi _2(\xi _2)\). Thus, if \(y\in A\), then \(\phi _1(y_1)\le \alpha -\mu _2\) and \(\phi _2(y_2)\le \alpha -\mu _1\). Hence, if \( \alpha <\mu _1+\mu _2\), the set A is empty; otherwise, \(A \subset [\eta _1,\nu _1]\times [\eta _2, \nu _2]\) with \(0<\eta _i\le \xi _i \le \nu _i\), \(\phi _i(\eta _i)=\phi _i(\nu _i)=\alpha -\mu _j\), \(j= i+(-1)^{i-1}\), \(i=1,2\). Hence, A is bounded with respect to the distance d of the metric space \((M_2,g_2)\). Therefore, the hypothesis (HCC)\(_p\) is satisfied. Notice also that the functions \(\phi _1\) and \(\phi _2\) are not convex on \(\mathbb {R}_{++}\) in the usual sense, but the scalar function \(y\mapsto \left\langle \lambda ,F(x,y)\right\rangle =\phi _1(y_1)+\phi _2(y_2)\) is strictly convex on the Riemannian manifold \((M_2,g_2)\). Since f and F are smooth, all the hypotheses of the previous sections are fulfilled.

So we can take advantage of the fact that the lower level (followers) objective is convex on this suitable Riemannian manifold, and therefore, we can transform equivalently our semivectorial bilevel problem into a usual bilevel problem (i.e., upper level and lower level are scalar problems) such that, for each choice of the new leader, the best response of the new lower level is unique (Theorem 3.1)!

Let us now find the function \((x,\lambda ) \mapsto y(x,\lambda )\), which represents the unique solution of the problem \(\min _{y\in M_2}\left\langle \lambda ,F(x,y)\right\rangle \). To simplify the computations, let us remark that this minimization problem is equivalent to the problem

$$\begin{aligned} \min _{y\in M_2}\frac{1}{\lambda _1+\lambda _2}\left\langle \lambda ,F(x,y)\right\rangle =\min _{y\in M_2}\big ( \theta F_1(x,y)+(1-\theta ) F_2(x,y)\big ), \end{aligned}$$

where \(\theta := \frac{\lambda _1}{\lambda _1+\lambda _2}\in ]0,1[\).

Thus, \(y(x,\lambda )=y(x, \theta )\), where \(y(x, \theta )\) is the unique solution of the problem \(\min _{y\in M_2}\big ( \theta F_1(x,y)+(1-\theta ) F_2(x,y)\big )\).

Since \(\{ \big ( \frac{\lambda _1}{\lambda _1+\lambda _2}, \frac{\lambda _2}{\lambda _1+\lambda _2}\big ) \big | \, \lambda \in \Lambda _p\}=\{(\theta , 1-\theta )| \, \theta \in ]0,1[\}\), by Theorem 3.1 problem \((OSB)_p\) is equivalent to the problem

$$\begin{aligned} \min _{x\in M_1}\min _{\theta \in ]0,1[}f\big (x,y(x,\theta )\big ). \end{aligned}$$
(20)

So \(y(x,\theta )\) solves the equation \(\mathrm{grad}\big (y\mapsto \theta F_1(x,y)+(1-\theta ) F_2(x,y)\big )=0\), i.e.,

$$\begin{aligned} y_1^2 \Big (\theta \frac{\partial F_1}{\partial y_1}(y)+(1-\theta )\frac{\partial F_2}{\partial y_1}(y)\Big )\frac{\partial }{\partial y_1}+ y_2^2\Big (\theta \frac{\partial F_1}{\partial y_2}(y)+(1-\theta )\frac{\partial F_2}{\partial y_2}(y)\Big )\frac{\partial }{\partial y_2}=0. \end{aligned}$$

Thus, \(y(x,\theta )=\big (y_1(x,\theta ), y_2(x,\theta )\big ) \) solves the system of equations

$$\begin{aligned} \theta \frac{\partial F_1}{\partial y_1}(y)+(1-\theta )\frac{\partial F_2}{\partial y_1}(y)= & {} 0 \\ \theta \frac{\partial F_1}{\partial y_2}(y)+(1-\theta )\frac{\partial F_2}{\partial y_2}(y)= & {} 0, \end{aligned}$$

and some simple computations give

$$\begin{aligned} y(x,\theta )=\Big ( \exp \big (-\frac{1-\theta }{2\theta }x\big ), \frac{4(1-\theta )^2}{\theta ^2x^2}\Big ). \end{aligned}$$

To simplify notations, put \(\eta =\frac{1-\theta }{2\theta }\) and notice that the function \(\theta \mapsto \eta \) is a bijection from ]0, 1[ to \(]0,+\infty [\). Thus, if we denote

$$\begin{aligned} \hat{y}(x,\eta )=y(x,\theta )=\big ( e^{-\eta x}, \frac{16\eta ^2}{x^2}\big ), \end{aligned}$$

problem (20) becomes

$$\begin{aligned} \min _{x\in M_1}\min _{\eta \in ]0,+\infty [}f\big (x,\hat{y}(x,\eta )\big )=\min _{(x,\eta )\in M_1\times ]0,+\infty [}f\big (x,\hat{y}(x,\eta )\big ). \end{aligned}$$

But

$$\begin{aligned} f\big (x,\hat{y}(x,\eta )\big )=\frac{x^2}{4}+4\eta ^2-x-\eta , \end{aligned}$$

which obviously has \((2, \frac{1}{8})\) as unique global minimizer over \(M_1\times ]0,+\infty [=\mathbb {R}^2_{++}\).
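The expression above follows by direct substitution: since \(\ln y_1=-\eta x\) and \(\sqrt{y_2}=4\eta /x\) for \(y=\hat{y}(x,\eta )\), we get

$$\begin{aligned} f\big (x,\hat{y}(x,\eta )\big )=\frac{1}{x}(-\eta x)-x-\frac{x}{4\eta }\Big (1+\frac{16\eta ^2}{x^2}\Big )(-\eta x) =-\eta -x+\frac{x^2}{4}+4\eta ^2. \end{aligned}$$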

Finally, the global minimizer of our original problem \((OSB)_p\) is given by

$$\begin{aligned} (x,y_1,y_2)= \big (2, e^{-\frac{1}{4}}, \frac{1}{16}\big ). \end{aligned}$$
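The example also lends itself to a quick numerical sanity check. The following sketch is our own illustration (not part of the method above); it assumes NumPy and SciPy are available, uses the closed-form best response \(y(x,\theta )\) computed earlier, and minimizes the leader objective of the equivalent problem (20) over \((x,\theta )\):

```python
# Numerical sanity check for the example of Sect. 3.6 (illustrative sketch only).
# We plug the closed-form follower response y(x, theta) derived above into the
# leader objective f and minimize over (x, theta) with a generic local solver.
import numpy as np
from scipy.optimize import minimize

def f(x, y):
    # leader objective: f(x, y) = (1/x) ln(y1) - x - (1/sqrt(y2)) (1 + y2) ln(y1)
    y1, y2 = y
    return np.log(y1) / x - x - (1.0 + y2) * np.log(y1) / np.sqrt(y2)

def follower_response(x, theta):
    # unique solution y(x, theta) of  min_y  theta*F1(x, y) + (1 - theta)*F2(x, y)
    return (np.exp(-(1.0 - theta) * x / (2.0 * theta)),
            4.0 * (1.0 - theta) ** 2 / (theta ** 2 * x ** 2))

def leader_value(z):
    x, theta = z
    return f(x, follower_response(x, theta))

res = minimize(leader_value, x0=[1.0, 0.5],
               bounds=[(1e-3, 10.0), (1e-3, 1.0 - 1e-3)])
x_opt, theta_opt = res.x
eta_opt = (1.0 - theta_opt) / (2.0 * theta_opt)
print(x_opt, eta_opt, follower_response(x_opt, theta_opt))
# expected (up to solver tolerance): x ~ 2, eta ~ 1/8, y ~ (exp(-1/4), 1/16)
```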

4 Conclusions

We have considered the generalization in three directions of the techniques presented in [11, 12] for semivectorial bilevel convex optimal control problems. Thus, our problem has nonlinear upper and lower levels, for which we obtain more explicit optimality conditions; the order is given by a convex, pointed and closed cone; and, most importantly, we deal with the Riemannian manifold setting. The results obtained are closely related to the geometric structure of the leader's Riemannian manifold as well as the follower's one. Thus, explicit solutions can be obtained on specific Riemannian manifolds. The results presented are also relevant for the Euclidean case.

Even if some constrained optimization problems on linear spaces may become unconstrained over a suitable Riemannian manifold, we consider that further research avenues should include the case of explicit constraints in the upper and lower levels.