1 Introduction

We will consider the constrained multiobjective optimization problem (MOP) of the form:

$$\begin{aligned} \text{Minimize } F(x) \text{ subject to } x \in C \end{aligned}$$
(1)

where \(F: \mathbb {R}^n \rightarrow \mathbb {R}^r, F(x)=(F_1(x), \ldots , F_r(x))\) is a continuously differentiable vectorial function in \(\mathbb {R}^n\) and \(C \subseteq \mathbb {R}^n\) is a closed and convex set.

In a multicriteria setting there are many optimality definitions. Throughout this paper, we are interested in the Pareto and weak Pareto optimality concepts. A feasible point \(x^* \in C\) of problem (1) is called a Pareto optimum or efficient solution [19] if there is no \(x \in C \) such that \(F(x) \le F(x^*)\) and \(F(x) \ne F(x^*)\). A point \(x^* \in C\) is said to be a weak Pareto optimum point or a weakly efficient solution if there is no \(x \in C \) such that \(F(x) < F(x^*)\), where the inequalities < and \(\le \) must be understood in the component-wise sense.

The procedure we propose here is based on the projected gradient method (PGM) for vector optimization defined by Graña Drummond and Iusem [10] and analyzed by Fukuda and Graña Drummond in [7,8,9]. The methods in those works share two essential features: (a) they are all descent methods: the objective value of the vectorial function F decreases at each iteration in the partial order induced by the underlying cones; and (b) full convergence of the sequences generated by the proposed methods is established under reasonable assumptions.

In the present work, we combine the classical PGM for vector optimization with two ingredients from the scalar case: (1) the search direction is computed with a variable steplength instead of a fixed parameter; and (2) the classical Armijo line search is replaced by a nonmonotone strategy developed by Zhang and Hager [26].

The projected gradient method with different choices of the stepsize has been extensively studied in the scalar case; see for example [1, 3, 4, 11, 16, 23] and references therein. From the practical point of view, the spectral choice of the steplength, introduced by Barzilai and Borwein [1], later analyzed by Raydan [21], and used in [3], requires little computational work, which makes it especially attractive for large-scale problems. From the theoretical point of view, it has been proved that this scheme greatly speeds up the convergence of methods based on gradient ideas. As mentioned in [4], it is natural to combine the spectral gradient idea with a nonmonotone line search.

Nonmonotone line search techniques have been extensively studied in the scalar case with great success; see for example [6, 11,12,13, 18, 22, 24,25,26]. In the present work, we propose to use the nonmonotone technique defined in [26], which is based on an average of successive functional values. We prove that, without convexity assumptions, accumulation points of sequences generated by the proposed algorithm are stationary for problem (1). A feasible point \(\bar{x} \in C\) is stationary for F if, and only if,

$$\begin{aligned} JF(\bar{x}) (C-\{\bar{x}\}) \cap \left[ -\mathbb {R}_{++}^r\right] = \emptyset \end{aligned}$$

where \(JF(\bar{x}) (C-\{\bar{x}\}) := \{JF(\bar{x}) (x-\bar{x}): x \in C\}\). This is a necessary condition for Pareto optimality [14]. Clearly, if \(\bar{x} \in C\) is stationary for F, then \(JF(\bar{x}) v \not < 0\) for all \(v \in C-\{\bar{x}\}\).

In [5] it is proved that the steepest descent method for smooth (scalar) convex minimization, with stepsize obtained using the Armijo rule implemented by a backtracking procedure, is globally convergent to a solution (essentially, under the sole assumption of existence of optima). Using a similar technique, we extend this result to the multiobjective setting, that is to say, we show that any sequence produced by the proposed algorithm converges to a weakly Pareto optimal point of problem (1), no matter how poor the initial guess might be. Full convergence of PGM for MOP was established in [7,8,9,10] under the monotone Armijo line search.

To the best of our knowledge, this is the first time that a nonmonotone line search based on [26] is used to establish the global convergence of a PGM for MOP. In [17] the authors introduce nonmonotone line searches for MOP within steepest descent methods. In [20] the authors propose two nonmonotone gradient algorithms for vector optimization with a convex objective function, based on the nonmonotone line search given by Grippo et al. [11] in the scalar context.

This paper is organized as follows. In Sect. 2 we present the nonmonotone projected gradient method for MOP. In Sect. 3 it is proved that accumulation points of sequences generated by the proposed algorithm are stationary for (1). In Sect. 4 the convex case is analyzed. Conclusions and lines for future research are given in Sect. 5.

Notation We denote \(\mathbb {R}_+ = \{ t \in \mathbb {R} \;|\; t \ge 0 \}\), \(\mathbb {R}_{++} = \{t \in \mathbb {R} \;|\; t > 0 \}\), \(\mathbb {N} = \{0, 1, 2, \ldots \}\), and by \(\Vert \cdot \Vert \) an arbitrary vector norm. If x and y are two vectors in \(\mathbb {R}^n\), we write \(x \le y\) if \(x_i \le y_i\), \( i =1, \ldots , n\), and \(x < y\) if \(x_i < y_i\), \( i =1, \ldots , n\), where \(v_i\) is the \(i\)-th component of the vector v. If \(F: \mathbb {R}^n \rightarrow \mathbb {R}^m\), \(F = (f_1, \ldots , f_m)\), then JF(x) stands for the Jacobian matrix of F at x, that is, the matrix in \(\mathbb {R}^{m\times n}\) with entries \( (JF(x))_{ij}= \frac{\partial f_i(x)}{\partial x_j}\). If \(K = \{k_0, k_1, k_2, \ldots \} \subset \mathbb {N}\) (with \(k_{j+1}>k_j\) for all j), we denote \(\lim _{k \in K} x^k = \lim _{j \rightarrow \infty } x^{k_j}\). The canonical inner product is written as \(\langle \cdot , \cdot \rangle \), and the transpose sign is denoted by \(^T\).

2 The nonmonotone projected gradient method for MOP

The projected gradient method we consider in the present work uses the projected gradient direction \(v_{\beta }(x)\), proposed in [10] and analyzed in [7], which is defined, for a given point \(x \in C\), by

$$\begin{aligned} v_{\beta }(x)= \displaystyle \mathop {\mathrm {argmin}}_{v \in C-\{x\}} \beta \varphi _x(v)+ \frac{\Vert v\Vert ^2}{2} \end{aligned}$$
(2)

where \(\beta >0\) is a parameter and

$$\begin{aligned} \varphi _x(v) = \displaystyle \max _{i =1,\ldots ,r } \{\nabla F_i (x)^T v\}. \end{aligned}$$
(3)

We also consider the optimal value function \(\theta _{\beta }: C \rightarrow \mathbb {R}\) given by

$$\begin{aligned} \theta _{\beta }(x)= \beta \varphi _{x} (v_{\beta }(x)) +\frac{\Vert v_{\beta }(x)\Vert ^2}{2}. \end{aligned}$$
(4)

See for example [9, 10]. It is well established that a point \(x \in C\) is stationary for F if, and only if, \(\theta _{\beta }(x) = 0\) [10].
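For illustration only, the subproblem (2)–(3) can be rewritten as a quadratic program by introducing an epigraph variable for the max term. The sketch below assumes, purely for concreteness, that C is a box \([\ell , u]\) (the paper only requires C closed and convex); the function name and the use of the cvxpy modeling package are our own choices, not part of the original method.

```python
# Illustrative sketch (not the authors' code) of the direction subproblem
# (2)-(3): an epigraph variable t models phi_x(v) = max_i grad F_i(x)^T v,
# and C is assumed to be a box [lb, ub].
import cvxpy as cp
import numpy as np

def projected_gradient_direction(x, JF, beta, lb, ub):
    """Return v_beta(x) and theta_beta(x); JF is the r x n Jacobian at x."""
    n = x.size
    v = cp.Variable(n)   # v ranges over C - {x}
    t = cp.Variable()    # epigraph variable for the max term
    constraints = [JF @ v <= t,          # grad F_i(x)^T v <= t for all i
                   x + v >= lb,          # x + v must stay in the box C
                   x + v <= ub]
    problem = cp.Problem(cp.Minimize(beta * t + 0.5 * cp.sum_squares(v)),
                         constraints)
    problem.solve()
    v_opt = v.value
    theta = beta * np.max(JF @ v_opt) + 0.5 * float(v_opt @ v_opt)  # (4)
    return v_opt, theta  # theta == 0 (up to solver tolerance) signals stationarity
```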

2.1 The nonmonotone line search

It is well-known in the literature that nonmonotone schemes can improve the efficiency of descent methods in the scalar case [4, 6, 15, 18, 22, 24, 26].

The nonmonotone line search framework developed by Grippo et al. [11] is based on the maximum functional value over some previous iterates: given \(\sigma \in (0,1)\) and an integer \(M \ge 1\), the step \(\alpha _k\) must satisfy

$$\begin{aligned} f(x^k + \alpha _k d^k) \le M_k + \sigma \alpha _k \nabla f(x^k)^T d^k \end{aligned}$$
(5)

where \(\displaystyle M_k= \max \nolimits _{0 \le j \le m(k)} f(x^{k-j})\) and \(m(k) = \min \{m(k-1)+1, M\}\).

As already mentioned in the Introduction, in [20] the authors propose two nonmonotone gradient algorithms based on the nonmonotone line search (5), but we have chosen Zhang and Hager's strategy, as it is known to be more efficient, at least in the scalar case. In [26] the authors propose the following scheme: given \(\sigma \in (0,1)\), \(C_0 = f(x^0)\), \(Q_0 = 1\), \(0 \le \eta _{\min }\le \eta _{\max } \le 1\) and \(\eta _k \in \left[ \eta _{\min }, \eta _{\max }\right] \), the step \(\alpha _k\) must satisfy

$$\begin{aligned} f(x^k + \alpha _k d^k) \le C_k + \sigma \alpha _k \nabla f(x^k)^T d^k \end{aligned}$$

where \( C_{k+1} =\frac{\left( \eta _k Q_k C_k+f(x^{k+1})\right) }{Q_{k+1}} \text{ and } Q_{k+1}=\eta _k Q_k+1\).

As mentioned in [26], if \(\eta _k=0\) for each k, then the line search is the usual monotone Armijo line search. If \(\eta _k=1\) for each k, then \(C_k=A_k\), where \(A_k=\frac{1}{k+1} \sum \nolimits _{i=0}^k f(x^i)\) is the average of the successive functional values. Thus, \(C_k\) can be seen as a combination of the functional values \(f(x^0), \ldots , f(x^k)\), with the parameter \(\eta _k\) controlling the degree of nonmonotonicity.
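A minimal sketch of this scalar scheme, using backtracking by halving and illustrative default values for \(\sigma \) and \(\eta _k\) (these defaults, and the handles f and grad_f, are placeholders of ours, not quantities prescribed in [26]), could read as follows.

```python
# Minimal sketch of the Zhang-Hager nonmonotone line search in the scalar
# case, as described above. Start from C_0 = f(x^0) and Q_0 = 1 and call
# this routine once per iteration. All names and defaults are illustrative.
def zhang_hager_step(f, grad_f, x, d, C_k, Q_k, sigma=1e-4, eta=0.85):
    slope = float(grad_f(x) @ d)      # directional derivative, assumed < 0
    alpha = 1.0
    while f(x + alpha * d) > C_k + sigma * alpha * slope:
        alpha *= 0.5                  # backtrack until the condition holds
    x_new = x + alpha * d
    Q_new = eta * Q_k + 1.0                            # Q_{k+1} = eta_k Q_k + 1
    C_new = (eta * Q_k * C_k + f(x_new)) / Q_new       # C_{k+1} as in [26]
    return x_new, C_new, Q_new
```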

In order to use a nonmonotone line search in the descent method for MOP we propose the following scheme: given \(\sigma \in (0,1)\), \(C^0 = F(x^0)\), \(Q_0 = 1\), \(0 \le \eta _{\min }\le \eta _{\max } \le 1\), \(\eta _k \in \left[ \eta _{\min }, \eta _{\max }\right] \), and \(v^k\) such that \(J F(x^k) v^k <0\), the step \(\alpha _k\) must satisfy

$$\begin{aligned} F(x^k + \alpha _k v^k) \le C^k + \sigma \alpha _k J F(x^k) v^k \end{aligned}$$
(6)

where

$$\begin{aligned} C^{k+1} =\frac{\left( \eta _k Q_k C^k+F(x^{k+1})\right) }{Q_{k+1}} \text{ and } Q_{k+1}=\eta _k Q_k+1. \end{aligned}$$
(7)

Observe that we can write

$$\begin{aligned} C^{k+1} =\frac{\left( \eta _k Q_k+1\right) C^k+F(x^{k+1})-C^k}{Q_{k+1}}= C^k+\frac{F(x^{k+1})-C^k}{Q_{k+1}} \end{aligned}$$

and then

$$\begin{aligned} C^k-C^{k+1} = \frac{C^k- F(x^{k+1})}{Q_{k+1}}. \end{aligned}$$
(8)

2.2 The nonmonotone multiobjective projected gradient algorithm

We can now define an extension of the classical PGM using the nonmonotone line search technique (6)–(7) for the constrained MOP (1).

Algorithm 2.1. Choose \(\sigma \in (0,1)\), \(Q_0=1\), \(0 \le \eta _{\min }\le \eta _{\max } \le 1\), \(\eta _0 \in \left[ \eta _{\min }, \eta _{\max }\right] \), \(0<\beta _{\min }< \beta _{\max }< \infty \) and \(\beta _0 \in \left[ \beta _{\min }, \beta _{\max }\right] \). Let \(x^0\in C\) be an arbitrary initial point. Set \(C^{0} = F(x^0)\) and \(k=0.\)

  1. Compute the search direction.

    $$\begin{aligned} v_{\beta _k}(x^k):=\displaystyle \mathop {\mathrm {argmin}}_{v \in C-\{x^k\}} \beta _k\varphi _{x^k}(v)+ \frac{\Vert v\Vert ^2}{2 } \end{aligned}$$

    where

    $$\begin{aligned} \varphi _{x^k}(v) = \displaystyle \max _{i =1,\ldots ,r } \{\nabla F_i(x^k)^T v\}. \end{aligned}$$
  2. Stopping criterion. Compute \(\theta _{\beta _k} (x^k) = \beta _k \varphi _{x^k} (v_{\beta _k}(x^k)) +\frac{1}{2}\Vert v_{\beta _k}(x^k)\Vert ^2.\) If \(\theta _{\beta _k}(x^k)=0,\) then stop.

  3. Compute the steplength. Choose \(\alpha _k\) as the largest \(\alpha \in \left\{ \frac{1}{2^j}: j = 0, 1, 2, \ldots \right\} \) such that

    $$\begin{aligned} \displaystyle F(x^k+\alpha v_{\beta _k}(x^k))\le C^k+\sigma \alpha JF(x^k) v_{\beta _k}(x^k). \end{aligned}$$
    (9)
  4. Set \(x^{k+1}=x^k+\alpha _k v_{\beta _k}(x^k).\) Define \(\beta _{k+1}\) such that \(\beta _{k+1} \in \left[ \beta _{\min }, \beta _{\max }\right] \). Choose \(\eta _k \in \left[ \eta _{\min }, \eta _{\max }\right] \) and set \(Q_{k+1}\) and \(C^{k+1}\) as in (7). Set \(k=k+1\) and go to Step 1.

Observe that by (7) and (9)

$$\begin{aligned} C^{k+1} =\frac{\eta _k Q_k C^k+F(x^{k+1})}{Q_{k+1}} \le C^k+ \frac{ \sigma \alpha _k JF(x^k) v_{\beta _k}(x^k)}{Q_{k+1}} \le C^k \end{aligned}$$

which implies that \(\{C^k\}\) is a nonincreasing sequence in \(\mathbb {R}^r\).

The differences between Algorithm 2.1 and a classical extension of PGM for the constrained MOP (see for example Algorithm 4.2 in [9]) are: (a) the presence of a variable steplength \(\beta _k\) instead of a fixed parameter to compute the search direction in Step 1; and (b) the nonmonotone line search in Step 3.

From the classical characterization of stationarity in terms of \(v_{\beta }(\cdot )\) and \(\theta _{\beta }(\cdot )\) [10], we have that \(x^k\) is a stationary point for F if Algorithm 2.1 stops with the stopping criterion in Step 2.
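To fix ideas, the following sketch assembles the steps of Algorithm 2.1 in Python. It is only an illustration: it reuses the helper projected_gradient_direction sketched earlier under the same box-constraint assumption, keeps \(\beta _k\) fixed for simplicity (Step 4 merely requires \(\beta _{k+1} \in [\beta _{\min }, \beta _{\max }]\)), and the values of \(\sigma \), \(\eta \) and the tolerance are placeholders.

```python
# Illustrative sketch of Algorithm 2.1 (not the authors' implementation).
# F returns an r-vector, JF its r x n Jacobian; C is the box [lb, ub].
import numpy as np

def nonmonotone_pgm(F, JF, x0, lb, ub, beta=1.0, sigma=1e-4, eta=0.85,
                    tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    C_ref = F(x)                       # C^0 = F(x^0), an r-vector
    Q = 1.0                            # Q_0 = 1
    for _ in range(max_iter):
        # Steps 1-2: search direction and stopping criterion
        v, theta = projected_gradient_direction(x, JF(x), beta, lb, ub)
        if theta >= -tol:              # theta_beta(x^k) = 0: x^k is stationary
            return x
        # Step 3: nonmonotone backtracking line search (9), component-wise
        Jv = JF(x) @ v
        alpha = 1.0
        while np.any(F(x + alpha * v) > C_ref + sigma * alpha * Jv):
            alpha *= 0.5
        # Step 4: update the iterate and the references C^{k+1}, Q_{k+1} as in (7)
        x = x + alpha * v
        Q_next = eta * Q + 1.0
        C_ref = (eta * Q * C_ref + F(x)) / Q_next
        Q = Q_next
    return x
```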

3 Convergence analysis

In this section we assume that Algorithm 2.1 does not terminate finitely and therefore generates infinite sequences \(\{x^k\}\), \(\{v_{\beta _k}(x^k)\}\) and \(\{\alpha _k\}\).

The following Lemma follows from Lemma 1.1 in [26].

Lemma 1

If \(v_{\beta _k}(x^k) \in C-\{x^k\}\) and \(JF(x^k) v_{\beta _k}(x^k)<0\) for each k,  then \(F(x^k)\le C^k \le A^k\) for each k,  where \(A^k=\frac{1}{k+1}\sum \nolimits _{i=0}^k F(x^i).\)

Lemma 2

Let \(\{x^k\} \subset \mathbb {R}^n\) be a sequence generated by Algorithm 2.1. Then:

  (a) \(\{x^k\} \subset C\).

  (b) Algorithm 2.1 is well defined.

Proof

  (a) follows from Proposition 5 in [10].

  (b) follows from Proposition 1 in [10] and the fact that \(F(x^k)\le C^k\) for each k.

\(\square \)

Lemma 3

The scalar sequence \(\{\Vert v_{\beta _k}(x)\Vert \}\) is bounded.

Proof

By the definition of \(v_{\beta _k}(x),\) we have that

$$\begin{aligned} \beta _k \varphi _{x}(v_{\beta _k}(x))+\frac{\Vert v_{\beta _k}(x)\Vert ^2}{2}= \theta _{\beta _k}(x)\le 0 \end{aligned}$$

where the last inequality holds since \(0 \in C-\{x\}\).

Then, by the Cauchy–Schwarz inequality we have that, for each \(i=1, \ldots , r\),

$$\begin{aligned} - \beta _k \Vert \nabla F_i(x)\Vert \Vert v_{\beta _k}(x)\Vert +\frac{\Vert v_{\beta _k}(x)\Vert ^2}{2}\le 0. \end{aligned}$$

Therefore, by the continuous differentiability of F, Lemma 2(a) and the boundedness of \(\{\beta _k\}\) we obtain the desired result. \(\square \)

Proposition 1

Let \(\{x^k\} \subset \mathbb {R}^n\) be a sequence generated by Algorithm 2.1 and \(K \subset \mathbb {N}\), \(\bar{\beta } > 0, \bar{x} \in C\) such that \(\lim _{k \in K} x^k = \bar{x}, \lim _{k \in K} \beta _k = \bar{\beta }\). Then, there exists \(\bar{K}\subset K\) such that

$$\begin{aligned} \liminf _{k\in \bar{K}} \theta _{\beta _k}(x^k)= \theta _{\bar{\beta }}(\bar{x}). \end{aligned}$$
(10)

Proof

First, let us show that there exists \(\bar{K}\subset K\) such that

$$\begin{aligned} \lim _{k\in \bar{K}} \theta _{\beta _k}(\bar{x})= \theta _{\bar{\beta }}(\bar{x}). \end{aligned}$$
(11)

Since \(v_{{\beta }_k}(\bar{x}) \in C-\{\bar{x}\}\), from the definition of \(\theta _{\bar{\beta }}(\bar{x})\) we have that

$$\begin{aligned} \theta _{\bar{\beta }}(\bar{x}) \le \bar{\beta } \varphi _{\bar{x}} (v_{{\beta }_k}(\bar{x})) +\frac{1}{2}\Vert v_{{\beta }_k}(\bar{x})\Vert ^2 = (\bar{\beta } - \beta _k) \varphi _{\bar{x}} (v_{{\beta }_k}(\bar{x})) + \theta _{{\beta }_k}(\bar{x}). \end{aligned}$$

Then, using Lemma 3 (the sequence \(\{\Vert v_{\beta _k}(\bar{x})\Vert \}\) is bounded) and the continuity of the maximum function \(u\mapsto \max \left\{ u_1,\ldots , u_r\right\} \), and taking the lim inf along an appropriate subsequence \(\bar{K} \subset K\) as \(k \in \bar{K}\) goes to infinity, we obtain that

$$\begin{aligned} \theta _{\bar{\beta }}(\bar{x}) \le \liminf _{k \in \bar{K}} \theta _{{\beta }_k}(\bar{x}). \end{aligned}$$
(12)

Now, from the definition of \(\theta _{{\beta }_k}(\bar{x})\), since \(v_{\bar{\beta }}(\bar{x}) \in C-\{\bar{x}\}\) we have that \(\theta _{{\beta }_k}(\bar{x}) \le {\beta }_k \varphi _{\bar{x}} (v_{\bar{{\beta }}}(\bar{x})) +\frac{1}{2}\Vert v_{\bar{{\beta }}}(\bar{x})\Vert ^2\). Then, by taking lim sup when \(k \in \bar{K}\) goes to infinity we obtain that \( \limsup \nolimits _{k \in \bar{K}} \theta _{{\beta }_k}(\bar{x}) \le \theta _{\bar{\beta }}(\bar{x}) \) which, together with the inequality (12) proves (11).

Secondly, since \(v_{\bar{\beta }}(\bar{x})\in C- \{\bar{x}\}\) we have that \(v_{\bar{\beta }}(\bar{x})+\bar{x}-x^k \in C- \{ x^k\}\) and from the definition of \( \theta _{\beta _k}(x^k)\)

$$\begin{aligned} \theta _{\beta _k}(x^k) \le \beta _k \varphi _{x^k} (v_{\bar{\beta }}(\bar{x})+\bar{x}-x^k) +\frac{1}{2}\Vert v_{\bar{\beta }}(\bar{x})+\bar{x}-x^k\Vert ^2. \end{aligned}$$

Hence, since \(\varphi _{x}(u+v)\le \varphi _{x}(u)+\varphi _{x}(v)\) for all \(u, v \in \mathbb {R}^n\) we obtain

$$\begin{aligned} \theta _{\beta _k}(x^k)\le & {} \beta _k \varphi _{x^k} (v_{\bar{\beta }}(\bar{x})) +\beta _k \varphi _{x^k} (\bar{x}-x^k)\\&+\frac{1}{2}\Vert v_{\bar{\beta }}(\bar{x})\Vert ^2+ \frac{1}{2}\Vert \bar{x}-x^k\Vert ^2+\langle v_{\bar{\beta }}(\bar{x}), \bar{x}-x^k\rangle . \end{aligned}$$

By the continuity of the maximum function \(u\mapsto \max \left\{ u_1,\ldots , u_r\right\} \) and the continuous differentiability of F, taking lim sup for \(k \in \bar{K}\) in the above inequality we have

$$\begin{aligned} \begin{array}{c}\limsup \limits _{k \in \bar{K}} \theta _{\beta _k}(x^k) \le \bar{\beta } \varphi _{\bar{x}} (v_{\bar{\beta }}(\bar{x})) +\frac{1}{2}\Vert v_{\bar{\beta }}(\bar{x})\Vert ^2 =\theta _{\bar{\beta }}(\bar{x}).\end{array} \end{aligned}$$
(13)

Once again, since \(v_{\beta _k}(x^k) \in C-\{x^k\}\) we have that \(v_{\beta _k}(x^k)+ x^k- \bar{x} \in C-\{\bar{x}\}\) and

$$\begin{aligned} \theta _{\beta _k}(\bar{x})\le & {} \beta _k \varphi _{\bar{x}} (v_{\beta _k}(x^k)+x^k-\bar{x}) +\frac{1}{2}\Vert v_{\beta _k}(x^k)+x^k-\bar{x}\Vert ^2 \\\le & {} \beta _k \varphi _{\bar{x}} (v_{\beta _k}(x^k))+ \beta _k \varphi _{\bar{x}}(x^k-\bar{x})\\&+\frac{1}{2}\Vert v_{\beta _k}(x^k)\Vert ^2+\frac{1}{2}\Vert x^k- \bar{x}\Vert ^2+\langle v_{\beta _k}(x^k),x^k- \bar{x}\rangle . \end{aligned}$$

So, by Lemma 3 and the continuity of the maximum function, taking lim inf for \(k \in \bar{K}\) on both sides of the above inequality, we get, from (11),

$$\begin{aligned} \theta _{\bar{\beta }}(\bar{x})\le & {} \liminf _{k \in \bar{K}} \left[ \beta _k \varphi _{\bar{x}} (v_{\beta _k}(x^k)) +\frac{1}{2}\Vert v_{\beta _k}(x^k)\Vert ^2+ \beta _k \varphi _{x^k} (v_{\beta _k}(x^k)) -\beta _k \varphi _{x^k} (v_{\beta _k}(x^k))\right] \\= & {} \liminf _{k \in \bar{K}}\left[ \theta _{\beta _k}(x^k)+ \beta _k \left( \varphi _{\bar{x}} (v_{\beta _k}(x^k))- \varphi _{x^k} (v_{\beta _k}(x^k))\right) \right] . \end{aligned}$$

Now, using that the maximum function \(u\mapsto \max \left\{ u_1,\ldots , u_r\right\} \) is Lipschitz continuous with constant 1:

$$\begin{aligned} |\varphi _{x} (v)-\varphi _{y} (w)|\le \Vert JF(x)v - JF(y)w\Vert \end{aligned}$$

for all \(v, w \in \mathbb {R}^n\), we obtain that

$$\begin{aligned} \theta _{\bar{\beta }}(\bar{x})\le & {} \liminf _{k \in \bar{K}} [\theta _{\beta _k}(x^k)+\beta _k \Vert JF(\bar{x})- JF(x^k)\Vert \Vert v_{\beta _k}(x^k)\Vert ] \nonumber \\\le & {} \liminf _{k \in \bar{K}} \theta _{\beta _k}(x^k) \end{aligned}$$
(14)

where we use Lemma 3 and the continuous differentiability of F in the last inequality. Thus, by combining (13) with (14) we obtain that

$$\begin{aligned} \limsup \limits _{k \in \bar{K}}\theta _{\beta _k}(x^k)\le \theta _{\bar{\beta }}(\bar{x})\le \liminf _{k \in \bar{K}}\theta _{\beta _k}(x^k) \end{aligned}$$

which implies (10). \(\square \)

The following theorem is the main result of this section: it establishes stationarity of accumulation points of the sequences generated by Algorithm 2.1.

Theorem 1

Assume that F is bounded from below. Let \(\{x^k\} \subset \mathbb {R}^n\) be a sequence generated by Algorithm 2.1. Then, every accumulation point of \(\{x^k\}\) is a feasible stationary point of (1).

Proof

Let \(\bar{x}\) be an accumulation point of the sequence \(\{x^k\}\). Then, there exists \( K \subset \mathbb {N}\) such that \( \lim \nolimits _{ k \in K} x^k = \bar{x}.\) The feasibility of \(\bar{x}\) follows by combining the fact that C is closed with Lemma 2(a). Since the sequence \(\{\beta _k\}_{k \in K}\) is bounded, there exist \( K_0 \subset K\) and \(\bar{\beta }>0\) such that \(\lim \nolimits _{k \in K_0}\beta _k=\bar{\beta }.\)

Considering that \(\alpha _k \in (0, 1]\) for all \(k \in K_0,\) we have the following two possibilities: (a) \(\limsup \nolimits _{k \in K_0} \alpha _k> 0, \) or (b) \(\limsup \nolimits _{k \in K_0} \alpha _k=0\).

First, assume that (a) holds. Then, there exist \(K_1 \subset K_0\) and \(\bar{\alpha }>0\) such that \(\lim \nolimits _{k \in K_1} \alpha _{k}= \bar{\alpha }.\) Since \(\bar{\alpha }>0\), there exist \({K}_2 \subset K_1\) and \(\epsilon >0\) such that \(\alpha _k\ge \epsilon \) for all \(k \in {K}_2.\) Therefore, for all \(k \in {K}_2,\) by (9) and the fact that \( JF(x^k) v_{\beta _k}(x^k)<0,\) we have that

$$\begin{aligned} F(x^{k+1})\le \displaystyle C^k- \sigma \alpha _k (-JF(x^k) v_{\beta _k}(x^k)) \le \displaystyle C^k- \sigma \epsilon (-JF(x^k) v_{\beta _k}(x^k)). \end{aligned}$$

Then

$$\begin{aligned} \begin{array}{c} \displaystyle \sigma \epsilon (-JF(x^k) v_{\beta _k}(x^k))\le C^k-F(x^{k+1}). \end{array} \end{aligned}$$
(15)

Now, using (8) and (15) we obtain

$$\begin{aligned} C^k-C^{k+1} \ge \frac{ \sigma \epsilon (-JF(x^k) v_{\beta _k}(x^k))}{Q_{k+1}}. \end{aligned}$$
(16)

Then, since \(\sum \nolimits _{k=0}^m (C^k-C^{k+1})= C^0- C^{m+1}\) and F is bounded from below, say \(R \le F(x)\) for all \(x \in C\), we have, by Lemma 1, that \(R \le F(x^k)\le C^k\) for all k, and we can conclude that \(\{C^k\}\) is bounded from below. It follows from (16) that

$$\begin{aligned} \begin{array}{c}\displaystyle \sum _{k=0}^{\infty }\frac{ \sigma \epsilon (-JF(x^k) v_{\beta _k}(x^k))}{Q_{k+1}}< \infty .\end{array} \end{aligned}$$
(17)

If we assume that \(\displaystyle \liminf \nolimits _{k\in {K}_2} (-JF(x^k) v_{\beta _k}(x^k)) \ne 0,\) then \(\displaystyle \liminf \nolimits _{k\in {K}_2} (-JF(x^k) v_{\beta _k}(x^k)) \ge c>0.\) Therefore, using the second equality in (7)

$$\begin{aligned} \displaystyle \sum _{k=0}^{\infty }\frac{ (-JF(x^k) v_{\beta _k}(x^k))}{Q_{k+1}}\ge \sum _{k=0}^{\infty }\frac{ c}{k+2} , \end{aligned}$$

in contradiction with (17).

Hence \(\liminf _{k \in {K}_2}(-JF(x^k) v_{\beta _k}(x^k))=0\), which, component-wise, can be written as

$$\begin{aligned} \displaystyle \liminf _{k\in {K}_2} (-\nabla F_i(x^k)^T v_{\beta _k}(x^k))=0, \text{ for } \text{ all } i =1, \ldots , r. \end{aligned}$$
(18)

Therefore, using (18) and (10) there exists \(\bar{K} \subset K_2\) such that

$$\begin{aligned} 0= & {} \liminf _{k\in \bar{K}} \beta _k \nabla F_i(x^k)^T v_{\beta _k}(x^k) \le \liminf _{k\in \bar{K}} \beta _k \varphi _{x^k} (v_{\beta _k}(x^k)) +\frac{1}{2 }\Vert v_{\beta _k}(x^k)\Vert ^2 \\= & {} \liminf _{k\in \bar{K}} \theta _{\beta _k} (x^k) =\theta _{\bar{\beta }}(\bar{x}) \end{aligned}$$

and \(\theta _{\bar{\beta }}(\bar{x})=0\). Thus, since \(\bar{\beta }\) is fixed, in view of what was mentioned immediately after (4) we conclude that \(\bar{x}\) is a stationary point of (1).

Now assume that (b) holds. Due to Lemma 3, there exist \(K_1 \subset K_0\) and \(\bar{v} \in \mathbb {R}^n\) such that \(\lim _{k \in K_1}v_{\beta _k}(x^k) =\bar{v}\) and \(\lim _{k \in K_1}\alpha _{k} =0\). Note that we have \(\max _{i = 1, \ldots , r} \{\beta _k \nabla F_i(x^{k})^T v_{\beta _k}(x^k)\}\le \theta _{\beta _k}(x^{k})<0\) so, letting \(k \in K_1\) go to infinity we get

$$\begin{aligned} \max _{i = 1, \ldots , r} \{\bar{\beta } \nabla F_i(\bar{x})^T\bar{v}\}\le \theta _{\bar{\beta }}(\bar{x})\le 0. \end{aligned}$$
(19)

Take now a fixed but arbitrary positive integer q. Since \(\alpha _{k} \rightarrow 0\), for \(k \in K_1\) large enough, we have \(\alpha _{k}<\frac{1}{2^q},\) which means that the nonmonotone condition in Step 3 of Algorithm 2.1 is not satisfied for \(\alpha =\frac{1}{2^q}\) at \(x^k\):

$$\begin{aligned} F(x^{k} + \left( \frac{1}{2}\right) ^q v_{\beta _k}(x^k)) \not \le C^k+\sigma \left( \frac{1}{2}\right) ^q J F (x^{k}) v_{\beta _k}(x^k). \end{aligned}$$

So, for each such \(k \in K_1\), there exists \(i = i (k)\in \left\{ 1, \ldots , r\right\} \) such that

$$\begin{aligned} F_i(x^{k} + \left( \frac{1}{2}\right) ^q v_{\beta _k}(x^k))\ge C^k_i +\sigma \left( \frac{1}{2}\right) ^q \nabla F_i (x^{k})^T v_{\beta _k}(x^k) \end{aligned}$$

where \(C^{k}_i =\frac{\left( \eta _{k-1} Q_{k-1} C^{k-1}_i+F_i(x^{k})\right) }{Q_{k}}. \) Since \(\left\{ i (k)\right\} _{k \in K_1} \subset \left\{ 1, \ldots , r\right\} ,\) there exist \(K_2 \subset K_1\) and an index \(i_0\) such that \(i_0 =i(k)\) for all \(k \in K_2\),

$$\begin{aligned} F_{i_0}(x^{k} + \left( \frac{1}{2}\right) ^q v_{\beta _k}(x^k))\ge C^k_{i_0} + \sigma \left( \frac{1}{2}\right) ^q \nabla F_{i_0} (x^{k})^T v_{\beta _k}(x^k) \end{aligned}$$

and since, by Lemma 1, \(F(x^k)\le C^k\) we have

$$\begin{aligned} F_{i_0}(x^{k} + \left( \frac{1}{2}\right) ^q v_{\beta _k}(x^k))\ge F_{i_0}(x^{k}) +\sigma \left( \frac{1}{2}\right) ^q \nabla F_{i_0} (x^{k})^T v_{\beta _k}(x^k). \end{aligned}$$

Taking the limit as \(k \in K_2\) goes to infinity in the above inequality, we obtain \(F_{i_0}(\bar{x} + (1/2)^q \bar{v})\ge F_{i_0} (\bar{x}) +\sigma (1/2)^q \nabla F_{i_0} (\bar{x})^T \bar{v}.\) Since this inequality holds for any positive integer q and for some \(i_0\) (depending on q), by Proposition 1 in [10] it follows that \(J F (\bar{x}) \bar{v} \not < 0\); hence \(\max _{i=1, \ldots , r} \{\nabla F_i(\bar{x})^T \bar{v}\}\ge 0,\) which, together with (19), implies \(\theta _{\bar{\beta }}(\bar{x}) = 0.\)

Therefore, we conclude that \(\bar{x}\) is a stationary point of (1). \(\square \)

4 The convex case

The objective of this section is to prove, under convexity assumptions, that in the case in which Algorithm 2.1 does not have a finite termination, the sequence \(\{x^k\}\) converges to a weak Pareto optimum point.

We say that the mapping \(F:\mathbb {R}^n \rightarrow \mathbb {R}^r\) is \(\mathbb {R}^r_+\)-convex if

$$\begin{aligned} F(\lambda x + (1 - \lambda )z)\le \lambda F(x) + (1 - \lambda )F(z) \end{aligned}$$

for all \(x, z \in \mathbb {R}^n\) and all \(\lambda \in [0, 1].\) Clearly, \(F:\mathbb {R}^n \rightarrow \mathbb {R}^r\) is \(\mathbb {R}^r_+\)-convex if and only if its components \(F_i : \mathbb {R}^n \rightarrow \mathbb {R}\) are all convex.

The proof of the full convergence to a weak Pareto optimum point follows from the analysis made in [7]. From the theory of convex analysis and optimization we know that, in view of (2), there exists \(w(x) \in \mathbb {R}^r\) with

$$\begin{aligned} w(x) \ge 0,\quad \displaystyle \sum _{i=1}^r w_i (x) = 1 \end{aligned}$$
(20)

such that \(v_{\beta }(x) = P_C(x - \beta JF(x)^T w (x))-x\), where \(P_C(z)\) is the orthogonal projection of z on C, satisfying

$$\begin{aligned} \langle \beta J F(x)^T w(x)+v_{\beta }(x), v - v_{\beta }(x)\rangle \ge 0 \end{aligned}$$
(21)

for all \(v \in C -\{x\}.\) See [2, 8, 9].
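For a box-shaped C this representation is straightforward to evaluate once weights \(w(x)\) satisfying (20) are available; the sketch below is purely illustrative and takes the weights as an input (they arise as multipliers associated with the subproblem (2), see [2, 8, 9]).

```python
# Sketch of v_beta(x) = P_C(x - beta JF(x)^T w(x)) - x for a box C = [lb, ub],
# where the orthogonal projection reduces to clipping. Not code from the paper.
import numpy as np

def direction_from_weights(x, JF, w, beta, lb, ub):
    g = JF.T @ w                          # weighted combination of the gradients
    proj = np.clip(x - beta * g, lb, ub)  # P_C(x - beta JF(x)^T w) on the box
    return proj - x                       # v_beta(x)
```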

We will use the following technical result in order to prove the full convergence of the sequence \(\{x^k\} \) generated by Algorithm 2.1.

Lemma 4

Suppose that F is \(\mathbb {R}^r_+\)-convex and let \(\{x^k\} \) be an infinite sequence generated by Algorithm 2.1. Let \(w^k := w(x^k )\in \mathbb {R}^r\) be such that (20) holds and \(v^k:= v_{\beta _k}(x^k) = P_C(x^k - \beta _k JF(x^k)^T w^k)-x^k\). If, for \(x^* \in C\) and \(k \ge 0\), we have \(F(x^*) \le C^k,\) then

$$\begin{aligned} \Vert x^{k+1}- x^*\Vert ^2\le & {} \Vert x^{k}- x^*\Vert ^2 + 2 \beta _{\max } \alpha _k |\langle w^k , J F(x^k)v^k \rangle |\nonumber \\&+2 \beta _{\max } \alpha _k \langle w^k ,C^k-F(x^k)\rangle . \end{aligned}$$
(22)

Proof

Since \(x^{k+1} = x^k +\alpha _k v^k,\) we have

$$\begin{aligned} \Vert x^{k+1}- x^*\Vert ^2= & {} \Vert x^k-x^*+ \alpha _k v^k \Vert ^2 \nonumber \\= & {} \Vert x^k - x^*\Vert ^2 + \alpha _k^2 \Vert v^k\Vert ^2 -2 \alpha _k \langle v^k , x^*-x^k \rangle . \end{aligned}$$
(23)

By using (21) with \(x = x^k\) and \(\beta =\beta _k\) we get

$$\begin{aligned} \langle \beta _k J F(x^k )^T w^k + v^k , v - v^k\rangle \ge 0 \text{ for } \text{ all } v \in C - \{x^k \}. \end{aligned}$$

Taking \(v = x^* - x^k \in C - \{x^k \}\) on the above inequality, we obtain

$$\begin{aligned} \begin{array}{c} -\langle v^k, x^* - x^k \rangle \le \beta _k \langle w^k , J F(x^k)(x^* - x^k) \rangle - \beta _k \langle w^k , J F(x^k)v^k \rangle - \Vert v^k\Vert ^2. \end{array} \end{aligned}$$
(24)

By the convexity of F we know that \(J F(x^k )(x^* - x^k ) \le F(x^*)- F(x^k )\). This fact, together with \(w^k \ge 0\) and \(F(x^*) \le C^k\), implies

$$\begin{aligned} \langle w^k , J F(x^k)(x^* - x^k) \rangle \le \langle w^k , F(x^*)- F(x^k ) \rangle \le \langle w^k , C^k- F(x^k) \rangle . \end{aligned}$$

Now, since \(x^k\) is a nonstationary point, we have that \(J F(x^k )v^k < 0\). Then, \(\langle w^k , J F(x^k)v^k \rangle <0\) and from (24) it follows that

$$\begin{aligned} -\langle v^k, x^* - x^k \rangle\le & {} \beta _k \langle w^k , C^k- F(x^k ) \rangle + \beta _k | \langle w^k , J F(x^k)v^k \rangle |- \Vert v^k\Vert ^2\\\le & {} \beta _{\max } \langle w^k , C^k- F(x^k) \rangle + \beta _{\max } | \langle w^k , J F(x^k)v^k \rangle |- \Vert v^k\Vert ^2. \end{aligned}$$

Thus, from (23)

$$\begin{aligned} \Vert x^{k+1}- x^*\Vert ^2\le & {} \Vert x^k - x^*\Vert ^2 + \alpha _k^2 \Vert v^k\Vert ^2 + 2 \alpha _k \beta _k \langle w^k , J F(x^k)(x^* - x^k) \rangle \\&\quad +2 \alpha _k \beta _k | \langle w^k , J F(x^k)v^k \rangle |- 2 \alpha _k \Vert v^k\Vert ^2 \le \Vert x^k - x^*\Vert ^2 \\&\quad +2 \alpha _k \beta _{\max } \langle w^k , C^k- F(x^k ) \rangle + 2 \alpha _k \beta _{\max } | \langle w^k , J F(x^k)v^k \rangle |. \end{aligned}$$

And the result is obtained because \( \alpha _k^2 - 2 \alpha _k \le 0\) since \(0 < \alpha _k \le 1\). \(\square \)

The following theorem is established in order to prove the full convergence of the sequence regardless of the initial feasible iterate.

Theorem 2

Assume that \(F : \mathbb {R}^n \rightarrow \mathbb {R}^r\) is \(\mathbb {R}^r_+\)-convex. Suppose that \(\{x^k\}\) is a sequence generated by Algorithm 2.1 with \(\eta _{\max } < 1\), and define

$$\begin{aligned} L := \lbrace x \in C : F(x) \le C^k \text{ for } \text{ all } k\rbrace . \end{aligned}$$
(25)

Assume that \(L \ne \emptyset \). Then, the sequence \(\{x^k\}\) is quasi-Fejér convergent to L, which means that, for every \(x^* \in L\), there exists a sequence \(\{ \epsilon _k\}\subset \mathbb {R}\), with \(\epsilon _k \ge 0\) for all k, such that \(\Vert x^{k+1} - x^*\Vert ^2 \le \Vert x^k - x^*\Vert ^2 + \epsilon _k\) and \(\sum \nolimits _{k=0}^{\infty } \epsilon _k < \infty \).

Proof

By the hypothesis there is \(x^* \in L\). Since F is \(\mathbb {R}^r_+\)-convex it follows from Lemma 4 that, for all k

$$\begin{aligned} \Vert x^{k+1}-x^*\Vert ^2\le & {} \Vert x^k-x^*\Vert ^2 + 2\beta _{\max } \alpha _k | \langle w^k , J F(x^k)v^k \rangle |\nonumber \\&+2 \beta _{\max } \alpha _k \langle w^k ,C^k-F(x^k)\rangle . \end{aligned}$$
(26)

Considering the canonical basis \(\lbrace e_1, \ldots , e_r\rbrace \) of \(\mathbb {R}^r\), we can write \(w^k =\sum \nolimits _{i=1}^r w^k_i e_i\) for each k. Then, from (26) and the facts that \(0\le w^k_i \le 1\) for all i and k and that \(C^k \ge F(x^k)\), we have that, for all k,

$$\begin{aligned} \Vert x^{k+1}-x^*\Vert ^2\le & {} \Vert x^k- x^*\Vert ^2 \\&\quad + 2\beta _{\max } \alpha _k \sum _{i=1}^r \left( | \langle e_i , J F(x^k)v^k \rangle |+\langle e_i ,C^k-F(x^k)\rangle \right) . \end{aligned}$$

Defining

$$\begin{aligned} \epsilon _k := 2\beta _{\max } \alpha _k \sum _{i=1}^r \left( | \langle e_i , J F(x^k )v^k \rangle |+\langle e_i ,C^k-F(x^k)\rangle \right) \end{aligned}$$

we have that \(\epsilon _k \ge 0\) and \(\Vert x^{k+1}-x^*\Vert ^2 \le \Vert x^k-x^*\Vert ^2 +\epsilon _k.\) Thus, let us prove that \(\sum \nolimits _{k=0}^{\infty } \epsilon _k < \infty \). To do that, let us define \(\epsilon _k^1 = 2\beta _{\max } \alpha _k \sum _{i=1}^r | \langle e_i , J F(x^k )v^k \rangle |\) and \(\epsilon _k^2 = 2\beta _{\max } \alpha _k \sum _{i=1}^r \langle e_i ,C^k-F(x^k)\rangle .\) Note that \(\epsilon _k^1\) depends on \(\beta _k\) through \(v^k\); our aim is to show that \(\epsilon _k^1\) can be bounded independently of the steplength \(\beta _k\), which is why we prefer to write it in this way.

On the one hand, from the line search condition in Step 3, we obtain \(-\alpha _k \langle e_i , J F(x^k )v^k \rangle \le \frac{1}{\sigma }( C^k_i- F_i(x^{k+1})).\) Since \(x^k\) is nonstationary, we also have, for all i and k, that \(\langle e_i , J F(x^k )v^k \rangle = \nabla F_i(x^k)^T v^k<0,\) which means that \(-\alpha _k \langle e_i , J F(x^k )v^k \rangle = \alpha _k |\langle e_i , J F(x^k )v^k \rangle |.\) Hence, by the definition of \(\epsilon _k^1\), \( \epsilon _k^1 \le \frac{2 \beta _{\max }}{\sigma }\sum \nolimits _{i=1}^r (C^k_i - F_i(x^{k+1}))\) and, by (8), \( \epsilon _k^1 \le \frac{2 \beta _{\max }}{\sigma }\sum \nolimits _{i=1}^r Q_{k+1} (C^k_i-C^{k+1}_i). \) Since

$$\begin{aligned} Q_{k+1}= 1+\displaystyle \sum _{i=0}^k \prod _{m=0}^i \eta _{k-m}\le 1+\displaystyle \sum _{i=0}^k \eta _{\max }^{i+1}\le \displaystyle \sum _{i=0}^{\infty }\eta _{\max }^i \le \frac{1}{1-\eta _{\max }} \end{aligned}$$

we obtain that

$$\begin{aligned} 0 \le \epsilon _k^1 \le \frac{ 2 \beta _{\max }}{\sigma (1-\eta _{\max })}\sum _{i=1}^r (C^k_i-C^{k+1}_i). \end{aligned}$$
(27)

On the other hand, by (8), \(C^{k}_i=C^{k-1}_i +\frac{ F_i(x^{k})-C^{k-1}_i}{Q_{k}}\) for all \(k \ge 1\), and we have that \(C^k_i-F_i(x^k)=(1-\frac{1}{Q_{k}} )(C^{k-1}_i-F_i(x^{k}))\). Then, by using (8) again:

$$\begin{aligned} \epsilon _k^2= & {} 2 \beta _{\max } \alpha _k\frac{(Q_{k}-1 )}{Q_k} \sum _{i=1}^r ( C^{k-1}_i-F_i(x^{k})) \\= & {} 2 \beta _{\max } \alpha _k (Q_{k}-1 ) \sum _{i=1}^r ( C^{k-1}_i-C^{k}_i). \end{aligned}$$

Since \(Q_{k}-1 \le \frac{\eta _{\max }}{1-\eta _{\max }}\) and \(\alpha _k \le 1\) we get

$$\begin{aligned} \epsilon _k^2 \le \frac{2 \beta _{\max } \eta _{\max }}{(1-\eta _{\max })}\displaystyle \displaystyle \sum _{i=1}^r ( C^{k-1}_i-C^{k}_i). \end{aligned}$$
(28)

Adding up the inequalities (27) and (28) from \(k = 0\) to \(k = N\), where N is any positive integer, and defining \(C^{-1} = C^0\), we get

$$\begin{aligned} \displaystyle \sum _{k=0}^N \epsilon _k \le \frac{2 \beta _{\max }}{\sigma (1-\eta _{\max })} \sum _{i=1}^r (C_i^0 - C^{N+1}_i) +\frac{2 \beta _{\max } \eta _{\max }}{(1-\eta _{\max })} \sum _{i=1}^r (C_i^0 - C^{N}_i). \end{aligned}$$

Since N is an arbitrary positive integer, \(\beta _{\max }> 0 , \sigma > 0, \eta _{\max } <1\) and \(F(x^*) \le C^k\) for all k,  we conclude that \(\sum \nolimits _{k=0}^{\infty } \epsilon _k < \infty \) as we wanted to prove. \(\square \)

The previous theorem formulates conditions under which the sequence generated by the nonmonotone Algorithm 2.1 is a quasi-Fejér sequence. Some comments concerning the assumption \(L \ne \emptyset \) are in order. When the Armijo line search is considered, the assumption used is the following [8, 9]:

Assumption

Every \(\mathbb {R}^r_+\)-decreasing sequence \(\{y^k\} \subset F(C) := \{F(x): x \in C \}\) is \(\mathbb {R}^r_+\)-bounded from below by an element of F(C),  i.e., for any \(\{y^k\}\) contained in F(C) with \(y^{k+1} < y^k\) for all k,  there exists \(\bar{x} \in C\) such that \(F(\bar{x}) \le y^k\) for all k.

It is well known that, in the classical unconstrained (convex) optimization case, this condition, known as \(\mathbb {R}^r_+\)-completeness, is equivalent to the existence of solutions of the optimization problem, and it is the standard assumption for ensuring the existence of efficient points for vector optimization problems.

Thus, in the monotone case [8, 9], since the sequence \(\{F(x^k)\}\) is \(\mathbb {R}^r_+\)-decreasing, by the above assumption there exists a point \(\hat{x} \in \{x \in C : F(x) \le F(x^k) \text{ for } \text{ all } k\}\), and an analogue of Lemma 4 is proved with the inequality \(\Vert x^{k+1}- \hat{x}\Vert ^2 \le \Vert x^{k}- \hat{x}\Vert ^2 + 2 \beta _{\max } \alpha _k |\langle w^k , J F(x^k)v^k \rangle |\) instead of (22).

In the nonmonotone case, we consider the set L defined in (25), with \(C^k\) in place of \(F(x^k)\), since \(\{C^k\}\) is an \(\mathbb {R}^r_+\)-decreasing sequence. This fact is responsible for the appearance of the last term in (22) and of \(\epsilon _k^2 \) in the proof of Theorem 2.

Observe finally that the assumption \(L \ne \emptyset \) holds, for instance, when F is a linear mapping, when F is nonlinear but F(C) is a convex set, or when the original problem has a solution.

Theorem 3

Assume that \(F : \mathbb {R}^n \rightarrow \mathbb {R}^r\) is \(\mathbb {R}^r_+\)-convex and that the hypotheses of the previous theorem hold. Then, any sequence \(\{x^k\}\) generated by Algorithm 2.1 converges to a weak Pareto optimum point.

Proof

By Theorem 2, \(\{x^k\}\) is quasi-Fejér convergent to L, and it follows from Theorem 1 in [5] that \(\lbrace x^k\rbrace \) is bounded. So, since C is closed, \(\lbrace x^k\rbrace \subset C \) has at least one feasible limit point. Thus, there exist \(x^* \in C\) and \(K \subset \mathbb {N}\) such that \(\lim _{k \in K} x^k = x^*\). By Theorem 1, \(x^*\) is stationary for problem (1). Using the \(\mathbb {R}^r_+\)-convexity of F, it follows from Lemma 5.2 in [7] that \(x^* \in C\) is a weak Pareto optimum solution of (1).

Let us now prove that \(x^* \in L.\) Since \(C^{k+1} \le C^k \) for all k, for each fixed \(k_0\) we have that \( C^{k} \le C^{k_0}\) for all \(k > k_0\). Then, \(F(x^k) \le C^{k_0}\) for all \(k > k_0.\) Thus, taking the limit for \(k \in K\) and using the continuity of F, we obtain that \( F(x^*) \le C^{k_0}.\) Since \(k_0\) was arbitrary, \( F(x^*) \le C^k\) for all k, and hence \(x^* \in L.\) Then, by Theorem 1 in [5], since \(\{x^k\}\) has a cluster point \(x^* \in L\), we conclude that \(\{x^k\}\) converges to \(x^* \in C\), as we wanted to prove. \(\square \)

5 Conclusions

We have presented a new algorithm for multiobjective optimization with convex constraints. At each iteration the search direction is computed using a variable steplength instead of a fixed parameter. The main novelty is the use of a nonmonotone line search technique instead of the classical Armijo strategy.

Stationarity of the accumulation points of the sequences generated by the proposed algorithm has been established in the general case, and, under standard convexity assumptions, full convergence to weak Pareto points of any sequence has been proved.

Future work will focus on a numerical implementation of Algorithm 2.1 incorporating a variable steplength \(\beta _k\), which may be updated using the ideas underlying the spectral parameter from the scalar case [1, 3, 21]. The use of the nonmonotone PGM to solve the subproblems within an Augmented Lagrangian-type method for multiobjective problems with general constraints is also left as a topic for future work.
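As a pointer to this line of work, a conceivable safeguarded adaptation of the scalar Barzilai–Borwein formula of [1] is sketched below; it is only an assumption-laden illustration of the idea and is neither part of Algorithm 2.1 nor covered by the present analysis.

```python
# Hypothetical spectral (Barzilai-Borwein-type) update for beta_k, adapted
# from the scalar formula of [1]. Illustrative only.
import numpy as np

def spectral_beta(x_new, x_old, g_new, g_old, beta_min, beta_max):
    s = x_new - x_old                  # iterate difference
    y = g_new - g_old                  # difference of (weighted) gradients
    sty = float(s @ y)
    if sty <= 0.0:                     # safeguard against nonpositive curvature
        return beta_max
    return float(np.clip((s @ s) / sty, beta_min, beta_max))
```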