Abstract
We propose new descent methods for unconstrained multiobjective optimization problems, where each objective function can be written as the sum of a continuously differentiable function and a proper convex but not necessarily differentiable one. The methods extend the well-known proximal gradient algorithms for scalar-valued nonlinear optimization, which are known to be efficient for particular classes of problems. Here, we consider two types of algorithms: with and without line searches. Under mild assumptions, we prove that each accumulation point of the sequences generated by these algorithms, if one exists, is Pareto stationary. Moreover, we present applications in constrained multiobjective optimization and in robust multiobjective optimization, which explicitly accounts for uncertainties. In particular, for the robust case, we show that the subproblems of the proximal gradient algorithms can be written as quadratic programming, second-order cone programming, or semidefinite programming problems. For these cases, we also carry out numerical experiments that demonstrate the validity of the proposed methods.
1 Introduction
Multiobjective optimization consists in minimizing (or maximizing) several objective functions at the same time, possibly under constraints. In most cases, no single point minimizes all objective functions at once, so the concept of Pareto optimality becomes essential. A point is called Pareto optimal, or efficient, if there exists no other point with the same or smaller objective function values and with at least one strictly smaller objective function value. Multiobjective optimization problems have applications in many fields. We refer to [15] for a list of applications.
One of the most well-known methods for multiobjective optimization problems is the scalarization approach [16, 20, 23]. It consists in solving one or several parametrized single-objective optimization problems to find a solution of the original multiobjective problem. However, in the non-convex case, we may not necessarily get all Pareto optimal solutions using this approach.
In recent years, descent methods for multiobjective optimization problems have attracted a lot of attention in the optimization community. For example, a steepest descent method for differentiable unconstrained multiobjective optimization problems was proposed in [13]. Afterwards, a proximal point method [8], that can be applied to nondifferentiable problems, was considered. However, this method is just a conceptual scheme and does not necessarily generate subproblems that are easy to solve. For nondifferentiable problems, a subgradient method was also developed in [10]. However, for some particular problems, these methods may not be efficient.
In this paper, we consider the following unconstrained multiobjective optimization problem:
$$\begin{aligned} \min _{x \in {\mathbf {R}}^n} \quad F(x), \end{aligned}$$(1.1)
where \(F:{\mathbf {R}}^n \rightarrow {\mathbf {R}}^m\) is a vector-valued function with \(F :=(F_1,\ldots ,F_m)^\top \) and \(\top \) denotes transpose. We assume that each component \(F_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is defined by
$$\begin{aligned} F_i(x) :=f_i(x) + g_i(x), \end{aligned}$$(1.2)
where \(f_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is continuously differentiable and \(g_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{\infty \}\) is proper convex and lower semicontinuous but not necessarily differentiable.
In order to solve (1.1), we propose a proximal gradient method, which combines the proximal point and gradient methods [8, 13]. The proximal point method solves subproblems iteratively, whose objective functions are defined as the sum of the original function F and a regularization term. However, these subproblems are nonlinear in general, so the computational cost of solving them is possibly high. On the other hand, a proximal gradient method applies a gradient step to the differentiable part \(f_i\) and a proximal point step to the convex part \(g_i\). As can be seen in the single-objective case [3, 22], this method is efficient when \(g_i\) has some special structure.
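For intuition in the scalar case (\(m = 1\)), consider \(f(x) = \frac{1}{2}\Vert Ax - b\Vert ^2\) and \(g(x) = \lambda \Vert x\Vert _1\): the proximal step then reduces to soft-thresholding, which is the special structure alluded to above. The following is a minimal illustrative sketch in Python (our own example; the paper itself contains no code):

```python
import numpy as np

def soft_threshold(v, tau):
    # Closed-form proximal operator of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def scalar_proximal_gradient(A, b, lam, step, iters=500):
    # Minimize f(x) + g(x), with f(x) = 0.5*||Ax - b||^2 (smooth part)
    # and g(x) = lam*||x||_1 (convex, nondifferentiable part), via
    #   x <- prox_{step*g}(x - step * grad f(x)).
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

With `step = 1 / ||A||_2^2`, each iteration decreases the objective; the cheap closed-form prox evaluation is precisely what makes exploiting the structure of \(g_i\) worthwhile.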
We also observe that the problem and the proposed methods have many applications. For example, when \(g_i\) is the indicator function of a convex set S, (1.1) is equivalent to the optimization problem with the constraint \(x \in S\). Also, as shown in Sect. 5.2, we can deal with robust optimization problems. Such problems include uncertain parameters and basically consist in optimizing against the worst scenario. Although the literature on robust optimization is vast, studies on robust multiobjective optimization are relatively recent [11, 14, 19].
The outline of this paper is as follows. We present some notations and notions of Pareto optimality and Pareto stationarity in Sect. 2. In Sect. 3, we propose the proximal gradient methods for unconstrained multiobjective optimization. We show the global convergence of the proposed algorithms in Sect. 4. In Sect. 5, we apply the proposed method to constrained problems and to robust optimization. Finally, we report some numerical experiments by solving robust multiobjective optimization problems in Sect. 6.
2 Preliminaries
Let us first present some notations that will be used in this paper. Let \({\mathbf {R}}\) denote the set of real numbers and let \({\mathbf {N}}\) be the set of positive integers. The symbol \(\Vert \cdot \Vert \) stands for the Euclidean norm in \({\mathbf {R}}^n\). We also define the relation \(\le \) (<) in \({\mathbf {R}}^m\) as \(u \le v\) (\(u < v\)) if and only if \(u_i \le v_i\) (\(u_i < v_i\)) for all \(i = 1, \ldots , m\). Moreover, we call
$$\begin{aligned} h^\prime (x; d) :=\lim _{\alpha \searrow 0} \frac{h(x + \alpha d) - h(x)}{\alpha } \end{aligned}$$
the directional derivative of \(h :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) at x in the direction d. It is easy to see that \(h^\prime (x; d) = \nabla h(x)^\top d\) when h is differentiable at x, where \(\nabla h(x)\) denotes the gradient of h at x. The following well-known result shows a non-decreasing property when h is convex.
Lemma 2.1
[7, Section 4.3] Let \(h :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) be a convex function.
Then, the function \(\tilde{h} :(0, +\infty ) \rightarrow {\mathbf {R}}\) defined by
$$\begin{aligned} \tilde{h}(\alpha ) :=\frac{h(x + \alpha d) - h(x)}{\alpha } \end{aligned}$$
is non-decreasing. In particular, it follows that
$$\begin{aligned} h^\prime (x; d) \le h(x + d) - h(x). \end{aligned}$$
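As a concrete instance of Lemma 2.1 (our own illustrative example, not from the paper), take \(h(x) = |x|\) on \({\mathbf {R}}\), \(x = 0\), and \(d = 1\):

```latex
% h(x) = |x| is convex; at x = 0 in direction d = 1:
\tilde{h}(\alpha)
  = \frac{h(0 + \alpha \cdot 1) - h(0)}{\alpha}
  = \frac{|\alpha|}{\alpha} = 1
  \quad \text{for all } \alpha > 0,
% which is (weakly) non-decreasing on (0, +\infty), and
h'(0; 1) = \lim_{\alpha \searrow 0} \tilde{h}(\alpha) = 1
         \le h(0 + 1) - h(0) = 1.
```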
Now, we introduce the concepts of optimality for the multiobjective optimization problem (1.1). Recall that \(x^*\in {\mathbf {R}}^n\) is a Pareto optimal point for F if there is no \(x \in {\mathbf {R}}^n\) such that \(F(x) \le F(x^*)\) and \(F(x) \ne F(x^*)\). The set of all Pareto optimal values is called the Pareto frontier. Likewise, \(x^*\in {\mathbf {R}}^n\) is a weakly Pareto optimal point for F if there is no \(x \in {\mathbf {R}}^n\) such that \(F(x) < F(x^*)\). It is known that Pareto optimal points are always weakly Pareto optimal, while the converse is not always true. We also say that \(\bar{x} \in {\mathbf {R}}^n\) is Pareto stationary (or critical) if and only if
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (\bar{x}; d) \ge 0 \quad \text {for all } d \in {\mathbf {R}}^n. \end{aligned}$$
Observe that this definition generalizes the one given in [13], because here we have to deal with possibly nondifferentiable \(F_i\) (or, in particular, \(g_i\)). Moreover, instead of considering subdifferentials as in [10], we use the directional derivative notion. Still, with this definition, we can show in the next lemma that weakly Pareto optimal points are always Pareto stationary, but the converse is not always true. However, if every component \(F_i\) is convex, then Pareto stationarity implies weak Pareto optimality. Furthermore, if every component \(F_i\) is strictly convex, then Pareto stationary points are also Pareto optimal.
Lemma 2.2
-
1.
If \(x \in {\mathbf {R}}^n\) is a weakly Pareto optimal point of F, then x is Pareto stationary.
-
2.
Let every component \(F_i\) of F be convex. If \(x \in {\mathbf {R}}^n\) is a Pareto stationary point of F, then x is weakly Pareto optimal.
-
3.
Let every component \(F_i\) of F be strictly convex. If \(x \in {\mathbf {R}}^n\) is a Pareto stationary point of F, then x is Pareto optimal.
Proof
We show the contrapositions of statements 1, 2 and 3.
-
1.
Suppose that x is not Pareto stationary. Then, there exists \(d \in {\mathbf {R}}^n\) such that
$$\begin{aligned} F_i^\prime (x;d) < 0,\quad i = 1, \ldots , m. \end{aligned}$$By the definition of directional derivative, for a sufficiently small scalar \(\alpha > 0\) we obtain
$$\begin{aligned} F_i(x + \alpha d) - F_i(x) < 0,\quad i = 1, \ldots , m, \end{aligned}$$which means that x is not weakly Pareto optimal.
-
2.
Suppose that x is not weakly Pareto optimal. Then, there exists a point \(y \in {\mathbf {R}}^n\) such that \(y \ne x\) and
$$\begin{aligned} F_i(y) < F_i(x),\quad i = 1, \ldots , m. \end{aligned}$$(2.1)Since \(F_i\) is convex, it follows from Lemma 2.1 that
$$\begin{aligned} \frac{F_i(x + \alpha (y - x)) - F_i(x)}{\alpha } \le F_i(y) - F_i(x) \end{aligned}$$for an arbitrary i and \(\alpha \in (0, 1)\). From the convexity of \(F_i\) and Lemma 2.1, the left-hand side is non-decreasing with respect to \(\alpha \). Therefore, we have
$$\begin{aligned} F_i^\prime (x; y - x) \le F_i(y) - F_i(x) < 0, \end{aligned}$$where the second inequality follows from (2.1). Since this inequality holds for all \(i = 1, \ldots , m\), we have
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (x; d) < 0 \quad \text {for } d = y - x. \end{aligned}$$Consequently, we conclude that x is not Pareto stationary.
-
3.
Suppose that x is not Pareto optimal. Then, there exists a point \(y \in {\mathbf {R}}^n\) such that \(y \ne x\) and \(F_i(y) \le F_i(x)\) for all \(i = 1, \ldots , m\). Since \(F_i\) is strictly convex, for each i we have
$$\begin{aligned} F_i(x + \alpha (y - x)) < F_i(x) + \alpha (F_i(y) - F_i(x)) \quad \text {for all } \alpha \in (0,1). \end{aligned}$$Therefore, it follows that
$$\begin{aligned} \frac{F_i(x + \alpha (y - x)) - F_i(x)}{\alpha } < F_i(y) - F_i(x) \quad \text {for all } \alpha \in (0,1). \end{aligned}$$In the same way as in statement 2, we obtain \(F_i^\prime (x; y - x) < 0\) for all \(i = 1, \ldots , m\), which concludes the proof. \(\square \)
3 Proximal gradient methods for multiobjective optimization
In this section, we propose two types of proximal gradient methods for unconstrained multiobjective optimization. Both generate a sequence \(\{x^k\}\) iteratively by the following procedure:
$$\begin{aligned} x^{k + 1} :=x^k + t_k d^k, \end{aligned}$$
where \(d^k\) is a search direction and \(t_k\) is a step size. For the method without line searches, we set \(t_k = 1\) in each iteration. Now, define the function \(\psi _x :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) by
$$\begin{aligned} \psi _x(d) :=\max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + g_i(x + d) - g_i(x) \right\} , \end{aligned}$$(3.1)
where \(\nabla f_i(x)\) denotes the gradient of \(f_i\) at x. The term inside the maximum approximates \(F_i(x + d) - F_i(x)\). It is clear that \(\psi _x\) is convex and \(\psi _x(0)=0\). The following lemma shows an important property of \(\psi _x\).
Lemma 3.1
For all \(d \in {\mathbf {R}}^n\), the following equality holds:
$$\begin{aligned} \psi _x^\prime (0; d) = \max _{i = 1,\ldots ,m} F_i^\prime (x; d). \end{aligned}$$
Proof
Since \(\psi _x(0) = 0\), by the definition of the directional derivative, we get
$$\begin{aligned} \psi _x^\prime (0; d) = \lim _{\alpha \searrow 0} \frac{\psi _x(\alpha d)}{\alpha }. \end{aligned}$$
Moreover, the definition of \(\psi _x\) in (3.1) shows that
$$\begin{aligned} \lim _{\alpha \searrow 0} \frac{\psi _x(\alpha d)}{\alpha }&= \lim _{\alpha \searrow 0} \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + \frac{g_i(x + \alpha d) - g_i(x)}{\alpha } \right\} \\&= \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + \lim _{\alpha \searrow 0} \frac{g_i(x + \alpha d) - g_i(x)}{\alpha } \right\} \\&= \max _{i = 1,\ldots ,m} F_i^\prime (x; d), \end{aligned}$$
where the second equality follows from the continuity of the max function and the third one comes from the definition of the directional derivative. Thus, the claim holds. \(\square \)
Now, let \(\ell \) be a positive constant. Here, we define \(\phi _{\ell , x} :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) as
$$\begin{aligned} \phi _{\ell , x}(d) :=\psi _x(d) + \frac{\ell }{2} \Vert d \Vert ^2, \end{aligned}$$
where the function \(\psi _x\) is defined in (3.1). Clearly, \(\phi _{\ell , x}\) is strongly convex and \(\phi _{\ell , x}(0) = 0\). Using this function, we define the search direction at an iteration k, which we call the proximal gradient direction, as \(d^k = d_\ell (x^k)\), where
$$\begin{aligned} d_\ell (x) :=\mathop {\mathrm{arg\,min}}_{d \in {\mathbf {R}}^n} \phi _{\ell , x}(d). \end{aligned}$$(3.2)
Remark 3.1
-
1.
When \(g_i = 0\), \(d_\ell (x)\) given in (3.2) corresponds to the search direction of the multiobjective steepest descent method [13]. On the other hand, when \(f_i = 0\), \(d_\ell (x)\) given in (3.2) corresponds to the search direction of the multiobjective proximal point method [8].
-
2.
Since \(\phi _{\ell , x}\) is strongly convex, (3.2) has a unique solution, and so \(d_\ell (x)\) is well-defined.
-
3.
Since \(\phi _{\ell , x}(0) = 0\), we have \(\phi _{\ell , x}(d_\ell (x)) \le 0\).
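Subproblem (3.2) can be checked numerically. In the smooth case \(g_i = 0\) it is equivalent to the epigraph form \(\min _{(d, \gamma )} \gamma + \frac{\ell }{2} \Vert d \Vert ^2\) subject to \(\nabla f_i(x)^\top d \le \gamma \) for all i. Below is a minimal Python sketch (our own code; SciPy's SLSQP is an arbitrary solver choice, not part of the paper):

```python
import numpy as np
from scipy.optimize import minimize

def proximal_gradient_direction(grads, ell):
    # Solve subproblem (3.2) with g_i = 0 in epigraph form:
    #   min_{d, gamma}  gamma + (ell/2)*||d||^2
    #   s.t.            grads[i] @ d <= gamma  for every objective i.
    grads = np.asarray(grads, dtype=float)
    m, n = grads.shape
    z0 = np.zeros(n + 1)  # z = (d, gamma)
    obj = lambda z: z[-1] + 0.5 * ell * z[:-1] @ z[:-1]
    cons = [{"type": "ineq", "fun": (lambda z, g=g: z[-1] - g @ z[:-1])}
            for g in grads]
    res = minimize(obj, z0, constraints=cons, method="SLSQP")
    return res.x[:-1], res.fun  # d_ell(x), beta_ell(x)
```

For m = 1 this returns \(d = -\nabla f(x) / \ell \), the scaled steepest descent direction of Remark 3.1; at a Pareto stationary point it returns \(d \approx 0\) and \(\beta \approx 0\), matching Lemma 3.2 below.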
Let \(\beta _\ell (x)\) be the optimal value of (3.2), i.e.,
$$\begin{aligned} \beta _\ell (x) :=\min _{d \in {\mathbf {R}}^n} \phi _{\ell , x}(d) = \phi _{\ell , x}\bigl (d_\ell (x)\bigr ). \end{aligned}$$(3.3)
The following lemma characterizes the stationarity in terms of \(d_\ell (\cdot )\) and \(\beta _\ell (\cdot )\).
Lemma 3.2
Let \(d_\ell (x)\) and \(\beta _\ell (x)\) be defined in (3.2) and (3.3), respectively. Then, the following statements hold.
-
1.
If x is Pareto stationary, then \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\). Conversely, if \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\), then x is Pareto stationary.
-
2.
If x is not Pareto stationary, then \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\). Conversely, if \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\), then x is not Pareto stationary.
-
3.
The mappings \(d_\ell (\cdot )\) and \(\beta _\ell (\cdot )\) are continuous.
Proof
-
1.
Let x be Pareto stationary. Suppose, for the purpose of contradiction, that \(d_\ell (x) \ne 0\) or \(\beta _\ell (x) < 0\). From statements 2 and 3 in Remark 3.1 it follows that \(d_\ell (x) \ne 0\) if and only if \(\beta _\ell (x) < 0\). This means that \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\). Therefore, we see that
$$\begin{aligned} \beta _\ell (x) = \psi _x(d_\ell (x)) + \frac{\ell }{2} \Vert d_\ell (x)\Vert ^2 < 0. \end{aligned}$$(3.4)
Since \(\psi _x\) is convex and \(\psi _x(0) = 0\), we get
$$\begin{aligned} \psi _x\bigl (\alpha d_\ell (x)\bigr ) \le \alpha \psi _x\bigl (d_\ell (x)\bigr ) < -\alpha \frac{\ell }{2} \Vert d_\ell (x)\Vert ^2, \end{aligned}$$
where the last inequality follows from (3.4). Thus, for all \(\alpha \in (0, 1)\) we have
$$\begin{aligned} \frac{\psi _x\bigl (\alpha d_\ell (x)\bigr )}{\alpha } < -\frac{\ell }{2} \Vert d_\ell (x)\Vert ^2. \end{aligned}$$
Since \(d_\ell (x) \ne 0\) and \(\ell > 0\), letting \(\alpha \searrow 0\) we obtain
$$\begin{aligned} \psi _x^\prime \bigl (0; d_\ell (x)\bigr ) \le -\frac{\ell }{2} \Vert d_\ell (x)\Vert ^2 < 0. \end{aligned}$$
It then follows from Lemma 3.1 that
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime \bigl (x; d_\ell (x)\bigr ) < 0, \end{aligned}$$
which contradicts the Pareto stationarity of x.
Let us now prove the converse and suppose that \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\). From the definition of \(\beta _\ell (x)\) given in (3.3), we have
$$\begin{aligned} 0 = \beta _\ell (x) \le \psi _x(d) + \frac{\ell }{2} \Vert d \Vert ^2 \quad \text {for all } d \in {\mathbf {R}}^n. \end{aligned}$$
Let \(\alpha \in (0,1)\). Replacing d with \(\alpha d\) in the above inequality, we get
$$\begin{aligned} \frac{\psi _x(\alpha d)}{\alpha } \ge -\frac{\ell }{2} \alpha \Vert d \Vert ^2. \end{aligned}$$
Letting \(\alpha \searrow 0\) and using Lemma 3.1, we obtain
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (x; d) = \psi _x^\prime (0; d) \ge 0 \quad \text {for all } d \in {\mathbf {R}}^n, \end{aligned}$$
which is our claim.
-
2.
This statement is equivalent to statement 1.
-
3.
It is easy to see that the function
$$\begin{aligned} \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + g_i(x+d) - g_i(x) \right\} + \frac{\ell }{2} \Vert d\Vert ^2 \end{aligned}$$(3.5)
is continuous with respect to x and d. Therefore, the optimal value function \(\beta _\ell (\cdot )\) is also continuous by [5, Maximum Theorem]. Moreover, since the solution of (3.2) is unique (Remark 3.1), the solution mapping \(d_\ell (\cdot )\) is single-valued, and hence continuous from [18, Corollary 8.1]. \(\square \)
3.1 A proximal gradient method with line searches
In this section, we present the proposed method with line searches. To compute the step length \(t_k > 0\), we use an Armijo rule. Let \(\rho \in (0,1)\) be a prespecified constant. The condition to accept \(t_k\) is given by
$$\begin{aligned} F_i(x^k + t_k d^k) \le F_i(x^k) + \rho t_k \psi _{x^k}(d^k),\quad i = 1, \ldots , m. \end{aligned}$$(3.6)
We begin with \(t_k = 1\) and, while (3.6) is not satisfied, we update
$$\begin{aligned} t_k \leftarrow \xi t_k, \end{aligned}$$
where \(\xi \in (0,1)\). The following lemma demonstrates the finiteness of this procedure.
where \(\xi \in (0,1)\). The following lemma demonstrates the finiteness of this procedure.
Lemma 3.3
Let \(d^k\) be defined in (3.2) with \(x = x^k\) and assume that \(x^k\) is not Pareto stationary. If \(\rho \in (0, 1)\), then there exists some \(\bar{t}_k > 0\) such that
$$\begin{aligned} F_i(x^k + t d^k) \le F_i(x^k) + \rho t \psi _{x^k}(d^k),\quad i = 1, \ldots , m, \end{aligned}$$
for any \(t \in (0, \bar{t}_k]\).
Proof
Let \(t \in (0, 1]\). Since \(g_i\) is convex for all \(i = 1, \ldots , m\), we have
$$\begin{aligned} g_i(x^k + t d^k) - g_i(x^k) = g_i\bigl ((1 - t) x^k + t (x^k + d^k)\bigr ) - g_i(x^k) \le t \bigl ( g_i(x^k + d^k) - g_i(x^k) \bigr ). \end{aligned}$$
Therefore, from the differentiability of \(f_i\) we obtain
$$\begin{aligned} F_i(x^k + t d^k) - F_i(x^k)&= f_i(x^k + t d^k) - f_i(x^k) + g_i(x^k + t d^k) - g_i(x^k) \\&\le t \nabla f_i(x^k)^\top d^k + o(t) + t \bigl ( g_i(x^k + d^k) - g_i(x^k) \bigr ) \\&\le t \psi _{x^k}(d^k) + o(t), \end{aligned}$$
where the last inequality comes from the definition (3.1) of \(\psi _{x^k}\). Since \(x^k\) is not Pareto stationary, we have \(\psi _{x^k} (d^k) < 0\) from Lemma 3.2. Thus, if \(\rho \in (0,1)\), then there exists some \(\bar{t}_k > 0\) such that
$$\begin{aligned} F_i(x^k + t d^k) \le F_i(x^k) + \rho t \psi _{x^k}(d^k),\quad i = 1, \ldots , m, \end{aligned}$$
for any \(t \in (0, \bar{t}_k]\). \(\square \)
Finally, based on the previous discussions, we state below the proposed method, considering line searches.
Algorithm 3.1
- Step 1 :
-
Choose \(\ell > 0,\ \rho \in (0,1),\ \xi \in (0,1),\ x^0 \in {\mathbf {R}}^n\) and set \(k :=0\).
- Step 2 :
-
Compute \(d^k\) by solving subproblem (3.2) with \(x = x^k\).
- Step 3 :
-
If \(d^k = 0\), then stop.
- Step 4 :
-
Compute the step length \(t_k \in (0, 1]\) as the maximum of
$$\begin{aligned} T_k :=\{ t = \xi ^j \mid j \in {\mathbf {N}}\cup \{0\},\, F_i(x^k + t d^k) \le F_i(x^k) + t \rho \psi _{x^k}(d^k),\, i = 1,\ldots ,m \} \end{aligned}$$ - Step 5 :
-
Set \(x^{k+1} :=x^k + t_k d^k\), \(k :=k + 1\), and go to Step 2.
Observe that from Lemma 3.2, the algorithm either stops at Step 3 with a Pareto stationary point or produces an infinite sequence of nonstationary points \(\{x^k\}\). If Step 4 is reached in some iteration k, then \(d^k \ne 0\) in Step 3, or equivalently, \(\beta _\ell (x^k) < 0\). Thus, we have \(\psi _{x^k}(d^k) < 0\). From the Armijo condition, we conclude that all objective functions decrease, i.e.,
$$\begin{aligned} F_i(x^{k + 1}) < F_i(x^k),\quad i = 1, \ldots , m. \end{aligned}$$
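To make the scheme concrete, here is a hedged Python sketch of Algorithm 3.1 for the smooth case \(g_i = 0\) with m = 2 (the toy objectives and the closed-form subproblem solution via the minimum-norm convex combination of the two gradients are our own illustrative choices, not the paper's code):

```python
import numpy as np

def direction(g1, g2, ell):
    # Closed-form solution of subproblem (3.2) for m = 2 and g_i = 0:
    # d = -(1/ell) * (minimum-norm point of the segment [g1, g2]).
    diff = g1 - g2
    denom = diff @ diff
    lam = 0.5 if denom == 0.0 else float(np.clip(g2 @ (g2 - g1) / denom, 0.0, 1.0))
    d = -(lam * g1 + (1.0 - lam) * g2) / ell
    psi = max(g1 @ d, g2 @ d)  # psi_x(d) when g_i = 0
    return d, psi

def algorithm31(f_fns, grad_fns, x0, ell=1.0, rho=0.1, xi=0.5,
                tol=1e-6, max_iter=1000):
    # Algorithm 3.1: proximal gradient with Armijo backtracking (3.6).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d, psi = direction(grad_fns[0](x), grad_fns[1](x), ell)
        if np.linalg.norm(d) < tol:  # Step 3: (approximately) Pareto stationary
            break
        t = 1.0
        while any(f(x + t * d) > f(x) + rho * t * psi for f in f_fns):
            t *= xi                  # shrink until (3.6) holds for all i
        x = x + t * d
    return x
```

On the biobjective instance \(f_1(x) = \frac{1}{2}\Vert x - a\Vert ^2\), \(f_2(x) = \frac{1}{2}\Vert x - b\Vert ^2\), the iterates stop on the Pareto set, which is the segment between a and b.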
3.2 A proximal gradient method without line searches
In this section, we assume that \(\nabla f_i\) is Lipschitz continuous with constant L. When we set \(\ell > L / 2\) in (3.2), we can fix the step length \(t_k = 1\) for every iteration. We then state below the proposed method.
Algorithm 3.2
- Step 1 :
-
Choose \(\ell > L / 2,\ x^0 \in {\mathbf {R}}^n\) and set \(k :=0\).
- Step 2 :
-
Compute \(d^k\) by solving subproblem (3.2) with \(x = x^k\).
- Step 3 :
-
If \(d^k = 0\), then stop.
- Step 4 :
-
Set \(x^{k+1} :=x^k + d^k\), \(k :=k + 1\), and go to Step 2.
Similarly to Algorithm 3.1, it stops with a Pareto stationary point or generates an infinite sequence of nonstationary points. Moreover, as we can see in Lemma 4.3, the objective function values also decrease in each iteration.
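A sketch of Algorithm 3.2 in the same smooth two-objective setting as before (again our own toy instance: for \(f_i\) with Hessians bounded by \(L = 1\), any \(\ell > 1/2\) works, and we take \(\ell = 1\)):

```python
import numpy as np

def direction_fixed(g1, g2, ell):
    # Subproblem (3.2) for m = 2, g_i = 0: scaled minimum-norm convex combination.
    diff = g1 - g2
    denom = diff @ diff
    lam = 0.5 if denom == 0.0 else float(np.clip(g2 @ (g2 - g1) / denom, 0.0, 1.0))
    return -(lam * g1 + (1.0 - lam) * g2) / ell

def algorithm32(grad_fns, x0, ell, tol=1e-6, max_iter=5000):
    # Algorithm 3.2: no line search; requires ell > L/2 for convergence.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = direction_fixed(grad_fns[0](x), grad_fns[1](x), ell)
        if np.linalg.norm(d) < tol:  # Step 3
            break
        x = x + d  # Step 4: t_k = 1 in every iteration
    return x
```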
4 Convergence analysis
In this section, we prove that every accumulation point of the sequences generated by Algorithms 3.1 and 3.2 is Pareto stationary. From now on, we assume that an infinite sequence is generated. To begin with, we recall the so-called three points property, which is a key tool for showing the convergence of proximal point type algorithms.
Theorem 4.1
(Three points property) [9, Lemma 3.2] Let \(\theta :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) be proper convex and, for \(y \in {\mathbf {R}}^n\), define
$$\begin{aligned} x^*:=\mathop {\mathrm{arg\,min}}_{x \in {\mathbf {R}}^n} \left\{ \theta (x) + \frac{1}{2} \Vert x - y \Vert ^2 \right\} . \end{aligned}$$
Then, for all \(z \in {\mathbf {R}}^n\), we have
$$\begin{aligned} \theta (z) + \frac{1}{2} \Vert z - y \Vert ^2 \ge \theta (x^*) + \frac{1}{2} \Vert x^*- y \Vert ^2 + \frac{1}{2} \Vert z - x^*\Vert ^2. \end{aligned}$$
In order to show the convergence of the search direction, we first prove the following lemma.
Lemma 4.1
Let \(\{d^k\}\) be generated by Algorithm 3.1 or 3.2 and recall the definition of \(\psi _x\) in (3.1). Then, we have
$$\begin{aligned} \psi _{x^k}(d^k) \le -\ell \Vert d^k \Vert ^2 \quad \text {for all } k. \end{aligned}$$
Proof
Defining \(\theta :=\psi _{x^k} / \ell \), we can rewrite (3.2) with \(x = x^k\) as
$$\begin{aligned} d^k = \mathop {\mathrm{arg\,min}}_{d \in {\mathbf {R}}^n} \left\{ \theta (d) + \frac{1}{2} \Vert d - 0 \Vert ^2 \right\} . \end{aligned}$$
Thus, substituting \(x^*= d^k\) and \(y = z = 0\) into Theorem 4.1, we get \(\theta (d^k) - \theta (0) \le -\Vert d^k\Vert ^2\). Therefore, recalling that \(\psi _{x^k} (0) = 0\), we have
$$\begin{aligned} \psi _{x^k}(d^k) \le -\ell \Vert d^k \Vert ^2. \end{aligned}$$
\(\square \)
4.1 Convergence of Algorithm 3.1
Let us recall the algorithm with line searches. We first show the convergence of the length of steps \(\Vert x^{k + 1} - x^k\Vert \).
Lemma 4.2
Let \(\{d^k\}\) be generated by Algorithm 3.1 and suppose that \(\{F_i (x^k)\}\) is bounded from below for all \(i = 1,\ldots ,m\). Then, it follows that
$$\begin{aligned} \sum _{k = 0}^\infty t_k \Vert d^k \Vert ^2 < \infty . \end{aligned}$$
Proof
From Lemma 4.1 and from (3.6), it follows that
$$\begin{aligned} F_i(x^{k + 1}) \le F_i(x^k) + \rho t_k \psi _{x^k}(d^k) \le F_i(x^k) - \rho \ell t_k \Vert d^k \Vert ^2,\quad i = 1, \ldots , m. \end{aligned}$$
Since \(\{F_i (x^k)\}\) is bounded from below, for each i there exists \(\tilde{F}_i\) such that \(\tilde{F}_i \le F_i(x^k)\) for all k. Adding up the above inequality from \(k = 0\) to \(k = \tilde{k}\), we obtain
$$\begin{aligned} \rho \ell \sum _{k = 0}^{\tilde{k}} t_k \Vert d^k \Vert ^2 \le F_i(x^0) - F_i(x^{\tilde{k} + 1}) \le F_i(x^0) - \tilde{F}_i. \end{aligned}$$
Thus, we have
$$\begin{aligned} \sum _{k = 0}^{\tilde{k}} t_k \Vert d^k \Vert ^2 \le \frac{F_i(x^0) - \tilde{F}_i}{\rho \ell }. \end{aligned}$$
Taking \(\tilde{k} \rightarrow \infty \), we have \(\sum _{k = 0}^\infty t_k \Vert d^k \Vert ^2 < \infty \). \(\square \)
The next theorem is our main result. Basically, if the sequence produced by Algorithm 3.1 has accumulation points, then they are all Pareto stationary.
Theorem 4.2
Every accumulation point of the sequence \(\{x^k\}\) generated by Algorithm 3.1, if it exists, is a Pareto stationary point. In particular, if the level set of each \(F_i\) is bounded, then \(\{x^k\}\) has accumulation points and they are all Pareto stationary.
Proof
The second statement follows immediately from the first. Let \(\bar{x}\) be an accumulation point of \(\{x^k\}\) and let \(\{x^{k_j}\}\) be a subsequence converging to \(\bar{x}\). From statement 3 of Lemma 3.2, we have \(d^{k_j} = d_\ell (x^{k_j}) \rightarrow d_\ell (\bar{x})\). Here, it is sufficient to show that \(d_\ell (\bar{x}) = 0\) because of statements 1 and 3 of Lemma 3.2. Suppose for contradiction that \(d_\ell (\bar{x}) \ne 0\). Then, it follows from Lemma 4.2 that \(t_{k_j} \rightarrow 0\), since the existence of an accumulation point of \(\{x^k\}\) implies boundedness of \(\{F_i(x^k)\}\) for all i. Therefore, by the definition of \(t_{k_j}\) in Step 4 of Algorithm 3.1, for sufficiently large j there exists some \(i_{k_j} \in \{1,\ldots ,m\}\) such that
Since \(i_{k_j}\) takes only a finite number of values in \(\{1,\ldots ,m\}\), we can assume that \(i_{k_j} = \bar{i}\) for all j without loss of generality. We thus obtain
Recall that \(0< \xi ^{-1} t_{k_j} < 1\). It follows from the definition (3.1) of \(\psi _{x^{k_j}}\) that
where the second inequality comes from the convexity of \(g_{\bar{i}}\) and Lemma 2.1, and the first equality follows from the differentiability of \(f_{\bar{i}}\). Therefore, we get
We thus get
On the other hand, Lemma 4.1 yields
Since \(d^{k_j} \rightarrow d_\ell (\bar{x}) \ne 0\), there exists some \(\delta > 0\) such that
where the last inequality comes from (4.2). Therefore, we obtain
From (4.3) and (4.4), it follows that
Taking \(j \rightarrow \infty \), we have \(0 \le - \delta \), which contradicts the fact that \(\delta > 0\). Therefore, we conclude that \(d_\ell (\bar{x}) = 0\). \(\square \)
4.2 Convergence of Algorithm 3.2
Let us now show the convergence of the search direction for the algorithm without line searches. Recall that we have to assume Lipschitz continuity of \(\nabla f_i\), with a constant \(L > 0\).
Lemma 4.3
Let \(\{d^k\}\) be generated by Algorithm 3.2 and suppose that \(\{F_i (x^k)\}\) is bounded from below for all \(i = 1,\ldots ,m\). Then, we have
$$\begin{aligned} \lim _{k \rightarrow \infty } \Vert d^k \Vert = 0. \end{aligned}$$
Proof
From the so-called descent lemma [6, Proposition A.24] and the Lipschitz continuity of \(\nabla f_i\), we obtain
$$\begin{aligned} f_i(x^k + d^k) \le f_i(x^k) + \nabla f_i(x^k)^\top d^k + \frac{L}{2} \Vert d^k \Vert ^2. \end{aligned}$$(4.5)
At the kth iteration, we have
$$\begin{aligned} F_i(x^{k + 1})&= f_i(x^k + d^k) + g_i(x^k + d^k) \\&\le f_i(x^k) + \nabla f_i(x^k)^\top d^k + \frac{L}{2} \Vert d^k \Vert ^2 + g_i(x^k + d^k) \\&\le F_i(x^k) + \psi _{x^k}(d^k) + \frac{L}{2} \Vert d^k \Vert ^2 \\&\le F_i(x^k) - \left( \ell - \frac{L}{2} \right) \Vert d^k \Vert ^2. \end{aligned}$$
Here, the first inequality follows from (4.5), the second one follows from the definition (3.1) of \(\psi _{x^k}\), and the third one comes from Lemma 4.1. Since \(\{F_i (x^k)\}\) is bounded from below, for each i there exists \(\tilde{F}_i\) such that \(\tilde{F}_i \le F_i(x^k) = f_i(x^k) + g_i(x^k)\) for all k. Adding up the above inequality from \(k = 0\) to \(k = \tilde{k}\), we obtain
$$\begin{aligned} \left( \ell - \frac{L}{2} \right) \sum _{k = 0}^{\tilde{k}} \Vert d^k \Vert ^2 \le F_i(x^0) - F_i(x^{\tilde{k} + 1}) \le F_i(x^0) - \tilde{F}_i. \end{aligned}$$
Since \(\ell > L / 2\), we have
$$\begin{aligned} \sum _{k = 0}^{\tilde{k}} \Vert d^k \Vert ^2 \le \frac{F_i(x^0) - \tilde{F}_i}{\ell - L / 2}. \end{aligned}$$
Taking \(\tilde{k} \rightarrow \infty \), we obtain
$$\begin{aligned} \sum _{k = 0}^{\infty } \Vert d^k \Vert ^2 < \infty , \end{aligned}$$
and hence \(\lim _{k \rightarrow \infty } \Vert d^k \Vert = 0\). \(\square \)
The next theorem directly follows from Lemmas 4.3 and 3.2.
Theorem 4.3
Every accumulation point of the sequence \(\{x^k\}\) generated by Algorithm 3.2, if it exists, is a Pareto stationary point. In particular, if the level set of each \(F_i\) is bounded, then \(\{x^k\}\) has accumulation points and they are all Pareto stationary.
5 Applications
In this section, we consider two applications of multiobjective optimization problem (1.1) with (1.2), and discuss how to solve subproblems (3.2) in these particular applications.
5.1 Application to constrained multiobjective optimization
In this section, we consider the following constrained multiobjective optimization problem:
$$\begin{aligned} \min _{x \in S} \quad f(x), \end{aligned}$$
where \(f:{\mathbf {R}}^n \rightarrow {\mathbf {R}}^m\) is a vector-valued function with \(f :=(f_1,\ldots ,f_m)^\top \) and \(S \subset {\mathbf {R}}^n\) is convex. Suppose that each component \(f_i\) of f is continuously differentiable. Let \(g:{\mathbf {R}}^n \rightarrow ({\mathbf {R}}\cup \{\infty \})^m\) be a vector-valued function with \(g :=(g_1,\ldots ,g_m)^\top \), where each \(g_i\) is the indicator function of S, i.e.,
$$\begin{aligned} g_i(x) :={\left\{ \begin{array}{ll} 0, &{} x \in S, \\ \infty , &{} x \notin S. \end{array}\right. } \end{aligned}$$
Then, for \(x^k \in S\), we can rewrite the search direction given in (3.2) with \(x = x^k\) as
$$\begin{aligned} d^k = \mathop {\mathrm{arg\,min}}_{d :x^k + d \in S} \left\{ \max _{i = 1,\ldots ,m} \nabla f_i(x^k)^\top d + \frac{\ell }{2} \Vert d \Vert ^2 \right\} , \end{aligned}$$
which coincides with the projected gradient direction for multiobjective optimization [17].
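For m = 1 this reduces to the classical projected gradient direction \(d = \mathrm{proj}_S(x - \nabla f(x) / \ell ) - x\), which can be verified numerically on a box S (our own instance; SciPy's L-BFGS-B is used only because it handles bound constraints, and is not part of the paper):

```python
import numpy as np
from scipy.optimize import minimize

lo, hi, ell = -1.0, 1.0, 2.0
x = np.array([0.9, -0.5])
grad = np.array([3.0, -0.2])  # gradient of the single smooth objective at x

# Subproblem (3.2) with g = indicator of the box S = [lo, hi]^2:
#   min_d  grad^T d + (ell/2)*||d||^2   s.t.   x + d in S
res = minimize(lambda d: grad @ d + 0.5 * ell * d @ d,
               np.zeros(2),
               bounds=[(lo - x_j, hi - x_j) for x_j in x],
               method="L-BFGS-B")
d_subproblem = res.x

# Projected gradient direction in closed form:
d_projection = np.clip(x - grad / ell, lo, hi) - x

print(d_subproblem, d_projection)  # the two directions agree
```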
5.2 Application to robust multiobjective optimization
Now, let us apply the proposed algorithms to robust multiobjective optimization. Here, we suppose that the problems include uncertain parameters and that we can estimate the set of these parameters. Then, we optimize considering the worst scenario. We observe that studies on robust multiobjective optimization are relatively recent [11, 14, 19].
Here, we consider the convex function \(g_i\) defined as follows:
$$\begin{aligned} g_i(x) :=\max _{u \in {\mathcal {U}}_i} \hat{g}_i(x, u). \end{aligned}$$
We call \({\mathcal {U}}_i \subseteq {\mathbf {R}}^n\) an uncertainty set. From now on, we assume that \(\hat{g}_i :{\mathbf {R}}^n \times {\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is convex with respect to x. It is then easy to see that \(g_i\) is also convex. However, \(g_i\) is not necessarily differentiable even if \(\hat{g}_i\) is. First, let us reformulate the subproblem (3.2) by using an extra variable \(\gamma \in {\mathbf {R}}\) as
Note that \(g_i\) is not easy to calculate, and thus, the subproblem is difficult to solve. When \(\hat{g}_i\) and \(\mathcal {U}_i\) have some special structure, the constraints can be written as explicit formulae by using the duality of (5.2). Now, assume that the dual problem of the maximization problem (5.2) is written as follows:
where \(\tilde{g}_i :{\mathbf {R}}^n \times {\mathbf {R}}^m \rightarrow {\mathbf {R}}\) and \(\displaystyle \tilde{\mathcal {U}}_i :{\mathbf {R}}^n \rightarrow 2^{{\mathbf {R}}^m}\). If strong duality holds, then we see that the subproblem (3.2) is equivalent to
When \(\tilde{g}_i\) and \(\tilde{\mathcal {U}}_i\) have some explicit form, this problem is tractable. As we mention below, in this case we can convert the above subproblem into some well-known convex optimization problems. This idea can also be seen in [4]. In the following, we introduce some robust multiobjective optimization problems whose subproblems can be written as quadratic programming, second-order cone programming, or semidefinite programming problems.
5.2.1 Linearly constrained quadratic programming problem case
Suppose that \(\hat{g}_i (x,u) = u^\top x\) and \({\mathcal {U}}_i = \{ u \in {\mathbf {R}}^n \mid A_i u \le b_i \}\), where \(A_i \in {\mathbf {R}}^{d \times n}\) and \(b_i \in {\mathbf {R}}^d\), that is, \(\hat{g}_i\) is linear in x, and \(\mathcal {U}_i\) is a polyhedron. Suppose also that \(\mathcal {U}_i\) is nonempty and bounded. Then, we can rewrite (5.2) as the following linear programming problem:
Its dual problem is given by
Since strong duality holds for linear programming, we can convert the subproblem (3.2) [or, equivalently, (5.3)] into a linearly constrained quadratic programming problem:
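The LP strong duality used above is easy to verify numerically. A small Python check with `scipy.optimize.linprog` (our own instance, not from the paper: the box \(-1 \le u \le 1\), whose support function is \(\Vert x \Vert _1\)):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
x = rng.standard_normal(3)

# U = {u : A u <= b} is the box -1 <= u <= 1 (nonempty and bounded).
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.ones(6)

# Primal (5.4): max u^T x  s.t.  A u <= b  (linprog minimizes, so negate x).
primal = linprog(-x, A_ub=A, b_ub=b, bounds=[(None, None)] * 3)

# Dual: min b^T y  s.t.  A^T y = x, y >= 0.
dual = linprog(b, A_eq=A.T, b_eq=x, bounds=[(0, None)] * 6)

g_value = -primal.fun  # g_i(x) = max_{u in U} u^T x
print(g_value, dual.fun)  # equal by strong duality; here both equal ||x||_1
```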
5.2.2 Second-order cone programming problem case
Suppose that \(\hat{g}_i (x,u) = u^\top x\) and \({\mathcal {U}}_i = \{ a_i + P_i v \in {\mathbf {R}}^n \mid \Vert v \Vert \le 1,\, v \in {\mathbf {R}}^n \}\), where \(a_i \in {\mathbf {R}}^n\) and \(\ P_i \in {\mathbf {R}}^{n \times n}\), that is, \(\hat{g}_i\) is once again linear in x and \(\mathcal {U}_i\) is an ellipsoid. Then, for all \(i = 1,\ldots ,m\) we have
If \(P_i^\top x = 0\), then \(\displaystyle \max _{v :\Vert v \Vert \le 1} (P_i^\top x)^\top v = 0 = \Vert P_i^\top x \Vert \). If \(P_i^\top x \ne 0\), then \(v = P_i^\top x / \Vert P_i^\top x \Vert \) attains the maximum of \(\displaystyle \max _{v :\Vert v \Vert \le 1} (P_i^\top x)^\top v\), and hence \(\displaystyle \max _{v :\Vert v \Vert \le 1} (P_i^\top x)^\top v = \Vert P_i^\top x \Vert \). Consequently, we have
Therefore, introducing slack variables \(\gamma \in {\mathbf {R}}\) and \(\tau \in {\mathbf {R}}\), the subproblem (3.2) can be written as
Note that convex quadratic constraints can be converted to second-order cone constraints. Using the expression given in [1, Section 2.1], we get the following second-order cone programming problem (SOCP):
where \(\mathcal {K}_q :=\{(y_0, \bar{y}) \in {\mathbf {R}}\times {\mathbf {R}}^{q - 1} \mid y_0 \ge \Vert \bar{y} \Vert \}\) is the second-order cone in \({\mathbf {R}}^{q}\). The above SOCP can be solved efficiently with an interior point method [1].
5.2.3 Semidefinite programming problem case
Suppose thatFootnote 1 \(\hat{g}_i (x,u) = (x + u)^\top A_i (x + u)\) and \({\mathcal {U}}_i = \{ a_i + P_i v \in {\mathbf {R}}^n \mid \Vert v \Vert \le 1 \}\), where \(A_i \in {\mathbf {R}}^{n \times n}\) with \(A_i \succeq O,\, a_i \in {\mathbf {R}}^n\) and \(P_i \in {\mathbf {R}}^{n \times n}\). Then, there exists a matrix \(M_i \in {\mathbf {R}}^{n \times n}\) such that \(A_i = M_i M_i^\top \). Note that \(\hat{g}_i\) is convex quadratic and \(\mathcal {U}_i\) is an ellipsoid. Here, without loss of generality, we can assume that \(A_i\) is symmetric, since \((x + u)^\top A_i (x + u) = (x + u)^\top \tilde{A}_i (x + u)\), where \(\tilde{A}_i :=(A_i+A_i^\top ) / 2\). Then, \(g_i(x)\) can be given as
Since problem (5.7) is a maximization problem of a convex function, it is not a convex optimization problem. Fortunately, it can be seen as a subproblem of a trust region method, so its optimal value \(g_i(x)\) can be obtained efficiently. Considering (5.7), we observe that
From [2, Section 3], the Lagrangian dual of the maximization problem (5.8) is given by
where \(I_n\) stands for the identity matrix of dimension n. Let \((\alpha ^*, w^*)\) be an optimal solution of (5.9) and assume thatFootnote 2 \(\dim (\ker (A_i + \alpha ^*I_n)) \ne 1\). Since both (5.8) and (5.9) have strictly feasible solutions and \(I_n \succ O\), strong duality holds from [2, Theorem 3.5]. Therefore, recalling (5.3), the subproblem (3.2) is equivalent to
Now, by using slack variables \(\tau \in {\mathbf {R}}\) and \(\zeta _i \in {\mathbf {R}}\) and converting the convex quadratic constraints to second-order cone ones, we get the following semidefinite programming problem:
where O stands for a zero matrix with appropriate dimension. Note that the second-order cone constraints can be converted further into semidefinite constraints.
6 Numerical experiments
In this section, we present some numerical results using Algorithm 3.2 for the problems in Sect. 5.2. The experiments are carried out on a machine with a 1.8 GHz Intel Core i5 CPU and 8 GB of memory, and all codes are implemented in MATLAB R2017a. We consider the problem (1.1), where \(n = 5,\, m = 2,\, f_i(x) = \frac{1}{2} x^\top A_i x + a_i^\top x,\, g_i(x) = \max _{u \in {\mathcal {U}}_i } \hat{g}_i (x,u),\ A_i \in {\mathbf {R}}^{n \times n},\, a_i \in {\mathbf {R}}^n\), and \(\hat{g}_i:{\mathbf {R}}^n \times {\mathbf {R}}^n \rightarrow {\mathbf {R}},\, i = 1,\ldots , m\). Here, we assume that each \(A_i\) is positive semidefinite, so it can be decomposed as \(A_i = M_i M_i^\top \), where \(M_i \in {\mathbf {R}}^{n \times n}\). We generate \(M_i\) and \(a_i\) by choosing every component randomly from the standard normal distribution. To implement Algorithm 3.2, we make the following choices.
Remark 6.1
-
Every component of \(x^0\) is chosen randomly from the standard normal distribution.
-
In Experiments 1 and 3, we set the constant \(\ell = 5\). In Experiment 2, we set the constant \(\ell = 7\).
-
The termination criterion is replaced by \(\Vert d^k\Vert < \varepsilon :=10^{-6}\).
Also, we run each of the following experiments 100 times from different initial points, with \(\delta = 0, 0.05, 0.1\). Naturally, when \(\delta = 0\), no uncertainties are considered.
6.1 Experiment 1
In the first experiment, we solve the problem of Sect. 5.2.1. We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } u^\top x,\, i = 1,2\), where \(\mathcal {U}_1 = \{u \in {\mathbf {R}}^5\mid -\delta \le u_i \le \delta ,\, i = 1,\ldots ,5 \}\) and \(\mathcal {U}_2 = \{u \in {\mathbf {R}}^5\mid -\delta \le (Bu)_i \le \delta ,\, i = 1,\ldots ,5 \}\). Here, every component of \(B \in {\mathbf {R}}^{5 \times 5}\) is chosen randomly from the standard normal distribution and \(\delta \ge 0\). We use the MATLAB solver linprog to solve (5.4) and quadprog to solve (5.5). Figure 1 shows the result of this experiment. For each \(\delta \), we obtained part of the Pareto frontier, and as \(\delta \) gets smaller, the objective values become smaller.
6.2 Experiment 2
In the second experiment, we solve the problem of Sect. 5.2.2. We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } u^\top x\), where \(\mathcal {U}_i = \{u \in {\mathbf {R}}^5 \mid \Vert u\Vert \le \delta \},\, i = 1, 2\). We use the MATLAB solver SeDuMi [21] to solve (5.6). Figure 2 shows the result of this experiment. Once again, we obtained part of the Pareto frontier for the problems with and without uncertainties.
6.3 Experiment 3
Now, in the last experiment, we solve the problem of Sect. 5.2(c). We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } (u + x)^\top B B^\top (u + x)\), where \(\mathcal {U}_i = \{u \in {\mathbf {R}}^5 \mid \Vert u\Vert \le \delta \},\, i = 1, 2\). Here, once again, every component of \(B \in {\mathbf {R}}^{5 \times 5}\) is chosen randomly from the standard normal distribution and \(\delta \ge 0\). We use the MATLAB solver fmincon to solve (5.7) and SeDuMi to solve (5.10). As can be seen in Fig. 3, we also obtain the Pareto frontier in this case.
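Evaluating \(g_i\) here amounts to maximizing a convex quadratic over a ball. A simple numerical sketch, not the paper's reformulation (5.10), is to run a local solver started on the boundary of the ball; we use SciPy's SLSQP as a stand-in, and the function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def g_ball_quadratic(x, B, delta):
    """Approximate  max_{||u|| <= delta} (u + x)^T B B^T (u + x)
    with a local solver; assumes x != 0 for the boundary start."""
    Q = B @ B.T
    obj = lambda u: -(u + x) @ Q @ (u + x)           # negate to maximize
    cons = {"type": "ineq", "fun": lambda u: delta**2 - u @ u}
    u0 = delta * x / np.linalg.norm(x)               # start on the boundary
    res = minimize(obj, u0, constraints=cons, method="SLSQP")
    return -res.fun
```

Since a local solver applied to a maximization of a convex function only guarantees a lower bound, a sanity check is useful: for \(B = I\) the exact value is \((\Vert x\Vert + \delta )^2\), attained at \(u = \delta x/\Vert x\Vert \).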
7 Conclusion
We proposed proximal gradient methods for unconstrained multiobjective optimization problems, with and without line searches. Under reasonable assumptions, we proved that each accumulation point of the sequences generated by the proposed algorithms is Pareto stationary. Moreover, we presented some applications in constrained optimization and robust multiobjective optimization. In constrained optimization, the proposed search direction is equivalent to the projected gradient direction, and for some robust optimization problems the subproblems can be converted into well-known optimization problems. Finally, we carried out some numerical experiments for robust multiobjective optimization problems and observed that the Pareto frontier changes when the uncertainty set is modified.
We have not analyzed the convergence rate of the proposed methods. It is known that the scalar-valued proximal gradient method converges sublinearly under reasonable assumptions [3]. In recent years, faster methods, such as Newton's method [12] for differentiable multiobjective optimization problems, have also been proposed. Therefore, an interesting topic for future research is to investigate the convergence rate of the proposed methods and to propose a proximal Newton-type algorithm for multiobjective optimization.
Notes
We denote \(A \succeq (\succ ) O\) when A is positive semidefinite (positive definite). Also, \(A \succeq (\succ ) B\) if and only if \(A - B \succeq (\succ ) O\).
Here, \(\dim \) denotes the dimension of a space and \(\ker \) the kernel of a matrix.
References
Alizadeh, F., Goldfarb, D.: Second-order cone programming. Math. Program. 95(1), 3–51 (2003)
Beck, A., Eldar, Y.: Strong duality in nonconvex quadratic optimization with two quadratic constraints. SIAM J. Optim. 17(3), 844–860 (2006)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res. 23(4), 769–805 (1998)
Berge, C., Patterson, E.M.: Topological Spaces. Dover Publications, Edinburgh (1963)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
Bertsekas, D.P.: Convex Analysis and Optimization. Athena Scientific, Belmont (2003)
Bonnel, H., Iusem, A.N., Svaiter, B.F.: Proximal methods in vector optimization. SIAM J. Optim. 15(4), 953–970 (2005)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Cruz Neto, J.X., Silva, G.J.P., Ferreira, O.P., Lopes, J.O.: A subgradient method for multiobjective optimization. Comput. Optim. Appl. 54(3), 461–472 (2013)
Ehrgott, M., Ide, J., Schöbel, A.: Minmax robustness for multi-objective optimization problems. Eur. J. Oper. Res. 239(1), 17–31 (2014)
Fliege, J., Graña Drummond, L.M., Svaiter, B.F.: Newton’s method for multiobjective optimization. SIAM J. Optim. 20(2), 602–626 (2009)
Fliege, J., Svaiter, B.F.: Steepest descent methods for multicriteria optimization. Math. Methods Oper. Res. 51(3), 479–494 (2000)
Fliege, J., Werner, R.: Robust multiobjective optimization and applications in portfolio optimization. Eur. J. Oper. Res. 234(2), 422–433 (2014)
Fukuda, E.H., Graña Drummond, L.M.: A survey on multiobjective descent methods. Pesquisa Operacional 34(3), 585–620 (2014)
Geoffrion, A.M.: Proper efficiency and the theory of vector maximization. J. Math. Anal. Appl. 22(3), 618–630 (1968)
Graña Drummond, L.M., Iusem, A.N.: A projected gradient method for vector optimization problems. Comput. Optim. Appl. 28(1), 5–29 (2004)
Hogan, W.: Point-to-set maps in mathematical programming. SIAM Rev. 15(3), 591–603 (1973)
Morishita, M.: A descent method for robust multiobjective optimization in the presence of implementation errors. Master’s thesis, Kyoto University (2016)
Gass, S., Saaty, T.: The computational algorithm for the parametric objective function. Naval Res. Logist. Q. 2(1–2), 39–45 (1955)
Sturm, J.F.: Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optim. Methods Softw. 11(1–4), 625–653 (1999)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
Zadeh, L.: Optimality and non-scalar-valued performance criteria. IEEE Trans. Autom. Control 8(1), 59–60 (1963)
Acknowledgements
This work was supported by the Kyoto University Foundation, and the Grant-in-Aid for Scientific Research (C) (17K00032) from Japan Society for the Promotion of Science. We are also grateful to the anonymous referees for their useful comments.
Tanabe, H., Fukuda, E.H. & Yamashita, N. Proximal gradient methods for multiobjective optimization and their applications. Comput Optim Appl 72, 339–361 (2019). https://doi.org/10.1007/s10589-018-0043-x