1 Introduction

Multiobjective optimization consists in minimizing (or maximizing) more than one objective function at the same time, possibly under constraints. In most cases, no single point minimizes all objective functions at once, so the concept of Pareto optimality becomes essential. A point is called Pareto optimal, or efficient, if there does not exist another point with the same or smaller objective function values and with at least one objective function value strictly smaller. Multiobjective optimization problems have applications in many fields; we refer to [15] for a list of applications.

One of the most well-known approaches for multiobjective optimization problems is scalarization [16, 20, 23]. It consists in solving one or several parametrized single-objective optimization problems in order to find solutions of the original multiobjective problem. However, in the nonconvex case, this approach does not necessarily recover all Pareto optimal solutions.

In recent years, descent methods for multiobjective optimization problems have attracted a lot of attention in the optimization community. For example, a steepest descent method for differentiable unconstrained multiobjective optimization problems was proposed in [13]. Afterwards, a proximal point method [8], which can be applied to nondifferentiable problems, was considered. However, this method is just a conceptual scheme and does not necessarily generate subproblems that are easy to solve. For nondifferentiable problems, a subgradient method was also developed in [10]. However, for some particular problems, these methods may not be efficient.

In this paper, we consider the following unconstrained multiobjective optimization problem:

$$\begin{aligned} \begin{aligned} \min \quad&F(x) \\ \mathrm {s.t.}\quad&x \in {\mathbf {R}}^n, \end{aligned} \end{aligned}$$
(1.1)

where \(F:{\mathbf {R}}^n \rightarrow {\mathbf {R}}^m\) is a vector-valued function with \(F :=(F_1,\ldots ,F_m)^\top \) and \(\top \) denotes transpose. We assume that each component \(F_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is defined by

$$\begin{aligned} F_i(x) :=f_i(x) + g_i(x),\quad i = 1,\ldots ,m, \end{aligned}$$
(1.2)

where \(f_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is continuously differentiable and \(g_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{\infty \}\) is proper convex and lower semicontinuous but not necessarily differentiable.

In order to solve (1.1), we propose a proximal gradient method, which combines the proximal point and gradient methods [8, 13]. The proximal point method solves subproblems iteratively, whose objective functions are defined as the sum of the original function F and regularization terms. However, these subproblems are nonlinear in general, so the computational cost of solving them can be high. A proximal gradient method, on the other hand, applies a gradient step to the differentiable part \(f_i\) and a proximal step to the convex part \(g_i\). As in the single-objective case [3, 22], this method is efficient when \(g_i\) has some special structure.

We also observe that the problem and the proposed methods have many applications. For example, when \(g_i\) is the indicator function of a convex set S, (1.1) is equivalent to the optimization problem with constraint \(x \in S\). Also, as shown in Sect. 5.2, we can deal with robust optimization problems. These problems include uncertain parameters and basically consist in optimizing under the worst scenario. Although the literature on robust optimization is vast, the study of robust multiobjective optimization is relatively new [11, 14, 19].

The outline of this paper is as follows. We present some notations and notions of Pareto optimality and Pareto stationarity in Sect. 2. In Sect. 3, we propose the proximal gradient methods for unconstrained multiobjective optimization. We show the global convergence of the proposed algorithms in Sect. 4. In Sect. 5, we apply the proposed method to constrained problems and to robust optimization. Finally, we report some numerical experiments by solving robust multiobjective optimization problems in Sect. 6.

2 Preliminaries

Let us first present some notations that will be used in this paper. Let \({\mathbf {R}}\) denote the set of real numbers and let \({\mathbf {N}}\) be the set of positive integers. The symbol \(\Vert \cdot \Vert \) stands for the Euclidean norm in \({\mathbf {R}}^n\). We also define the relation \(\le \) (<) in \({\mathbf {R}}^m\) as \(u \le v\) (\(u < v\)) if and only if \(u_i \le v_i\) (\(u_i < v_i\)) for all \(i = 1, \ldots , m\). Moreover, we call

$$\begin{aligned} h^\prime (x; d) :=\lim _{\alpha \searrow 0} \frac{h(x + \alpha d) - h(x)}{\alpha } \end{aligned}$$

the directional derivative of \(h :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) at x in the direction d. It is easy to see that \(h^\prime (x; d) = \nabla h(x)^\top d\) when h is differentiable at x, where \(\nabla h(x)\) denotes the gradient of h at x. The following well-known result shows a non-decreasing property when h is convex.
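For instance, for the nondifferentiable convex function \(h(x) :=\Vert x\Vert _1\), a direct computation from the definition above gives, for any x and d,

$$\begin{aligned} h^\prime (x; d) = \sum _{i :x_i \ne 0} \mathrm {sign}(x_i)\, d_i + \sum _{i :x_i = 0} |d_i|, \end{aligned}$$

so the directional derivative exists everywhere even though h is not differentiable at points with zero components.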

Lemma 2.1

 [7, Section 4.3] Let \(h :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) be a convex function.

Then, for fixed \(x\) and \(d\), the function \(\tilde{h} :(0, +\infty ) \rightarrow {\mathbf {R}}\) defined by

$$\begin{aligned} \tilde{h}(\alpha ) :=\frac{h(x + \alpha d) - h(x)}{\alpha } \end{aligned}$$

is non-decreasing. In particular, it follows that

$$\begin{aligned} h(x + d) - h(x) \ge \frac{h(x + \alpha d) - h(x)}{\alpha } \quad \text {for all } \alpha \in (0, 1). \end{aligned}$$

Now, we introduce the concepts of optimality for the multiobjective optimization problem (1.1). Recall that \(x^*\in {\mathbf {R}}^n\) is a Pareto optimal point for F if there is no \(x \in {\mathbf {R}}^n\) such that \(F(x) \le F(x^*)\) and \(F(x) \ne F(x^*)\). The set of all Pareto optimal values is called the Pareto frontier. Likewise, \(x^*\in {\mathbf {R}}^n\) is a weakly Pareto optimal point for F if there is no \(x \in {\mathbf {R}}^n\) such that \(F(x) < F(x^*)\). It is known that Pareto optimal points are always weakly Pareto optimal, while the converse is not always true. We also say that \(\bar{x} \in {\mathbf {R}}^n\) is Pareto stationary (or critical) if and only if

$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (\bar{x};d) \ge 0 \quad \text {for all } d \in {\mathbf {R}}^n. \end{aligned}$$

Observe that this definition generalizes the one given in [13], because here we have to deal with possibly nondifferentiable \(F_i\) (or, in particular, \(g_i\)). Moreover, instead of considering subdifferentials as in [10], we use the directional derivative notion. Still, with this definition, we can show in the next lemma that weakly Pareto optimal points are always Pareto stationary, but the converse is not always true. However, if every component \(F_i\) is convex, then Pareto stationarity implies weak Pareto optimality. Furthermore, if every component \(F_i\) is strictly convex, then Pareto stationary points are also Pareto optimal.

Lemma 2.2

  1.

    If \(x \in {\mathbf {R}}^n\) is a weakly Pareto optimal point of F, then x is Pareto stationary.

  2.

    Let every component \(F_i\) of F be convex. If \(x \in {\mathbf {R}}^n\) is a Pareto stationary point of F, then x is weakly Pareto optimal.

  3.

    Let every component \(F_i\) of F be strictly convex. If \(x \in {\mathbf {R}}^n\) is a Pareto stationary point of F, then x is Pareto optimal.

Proof

We prove the contrapositives of statements 1, 2 and 3.

  1.

    Suppose that x is not Pareto stationary. Then, there exists \(d \in {\mathbf {R}}^n\) such that

    $$\begin{aligned} F_i^\prime (x;d) < 0,\quad i = 1, \ldots , m. \end{aligned}$$

    By the definition of directional derivative, for a sufficiently small scalar \(\alpha > 0\) we obtain

    $$\begin{aligned} F_i(x + \alpha d) - F_i(x) < 0,\quad i = 1, \ldots , m, \end{aligned}$$

    which means that x is not weakly Pareto optimal.

  2.

    Suppose that x is not weakly Pareto optimal. Then, there exists a point \(y \in {\mathbf {R}}^n\) such that \(y \ne x\) and

    $$\begin{aligned} F_i(y) < F_i(x),\quad i = 1, \ldots , m. \end{aligned}$$
    (2.1)

    Since \(F_i\) is convex, it follows from Lemma 2.1 that

    $$\begin{aligned} \frac{F_i(x + \alpha (y - x)) - F_i(x)}{\alpha } \le F_i(y) - F_i(x) \end{aligned}$$

    for an arbitrary i and \(\alpha \in (0, 1)\). From the convexity of \(F_i\) and Lemma 2.1, the left-hand side is non-decreasing with respect to \(\alpha \). Therefore, we have

    $$\begin{aligned} F_i^\prime (x; y - x) \le F_i(y) - F_i(x) < 0, \end{aligned}$$

    where the second inequality follows from (2.1). Since this inequality holds for all \(i = 1, \ldots , m\), we have

    $$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (x; d) < 0 \quad \text {for } d = y - x. \end{aligned}$$

    Consequently, we conclude that x is not Pareto stationary.

  3.

    Suppose that x is not Pareto optimal. Then, there exists a point \(y \in {\mathbf {R}}^n\) such that \(y \ne x\) and \(F_i(y) \le F_i(x)\) for all \(i = 1, \ldots , m\). Since \(F_i\) is strictly convex, for each i we have

    $$\begin{aligned} F_i(x + \alpha (y - x)) < F_i(x) + \alpha (F_i(y) - F_i(x)) \quad \text {for all } \alpha \in (0,1). \end{aligned}$$

    Therefore, it follows that

    $$\begin{aligned} \frac{F_i(x + \alpha (y - x)) - F_i(x)}{\alpha } < F_i(y) - F_i(x) \quad \text {for all } \alpha \in (0,1). \end{aligned}$$

    In the same way as in statement 2, we obtain \(F_i^\prime (x; y - x) < 0\) for all \(i = 1, \ldots , m\), which concludes the proof. \(\square \)

3 Proximal gradient methods for multiobjective optimization

In this section, we propose two types of proximal gradient methods for unconstrained multiobjective optimization. Both generate a sequence \(\{x^k\}\) iteratively by the following procedure:

$$\begin{aligned} x^{k + 1} = x^k + t_k d^k, \end{aligned}$$

where \(d^k\) is a search direction, and \(t_k\) is a step size. For the method without line searches, we set \(t_k = 1\) in each iteration. Now, define the function \(\psi _x :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) by

$$\begin{aligned} \psi _x(d) :=\max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + g_i(x + d) - g_i(x) \right\} , \end{aligned}$$
(3.1)

where \(\nabla f_i(x)\) denotes the gradient of \(f_i\) at x. The term inside the maximum is an approximation of \(F_i(x + d) - F_i(x)\). It is clear that \(\psi _x\) is convex and that \(\psi _x(0)=0\). The following lemma shows an important property of \(\psi _x\).

Lemma 3.1

For all \(d \in {\mathbf {R}}^n\), the following equality holds:

$$\begin{aligned} \psi _x^\prime (0;d) = \max _{i = 1,\ldots ,m} F_i^\prime (x; d). \end{aligned}$$

Proof

Since \(\psi _x(0) = 0\), by the definition of directional derivative, we get

$$\begin{aligned} \psi _x^\prime (0;d) = \lim _{\alpha \searrow 0} \frac{\psi _x(\alpha d)}{\alpha }. \end{aligned}$$

Moreover, the definition of \(\psi _x\) in (3.1) shows that

$$\begin{aligned} \lim _{\alpha \searrow 0} \frac{\psi _x(\alpha d)}{\alpha }&= \lim _{\alpha \searrow 0} \max _{i = 1,\ldots ,m} \frac{\nabla f_i(x)^\top (\alpha d) + g_i(x+\alpha d) - g_i(x)}{\alpha } \\&= \max _{i = 1,\ldots ,m} \lim _{\alpha \searrow 0} \frac{\nabla f_i(x)^\top (\alpha d) + g_i(x+\alpha d) - g_i(x)}{\alpha } \\&= \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + g_i^\prime (x;d)\right\} \\&= \max _{i = 1,\ldots ,m} F_i^\prime (x; d), \end{aligned}$$

where the second equality follows from the continuity of the max function and the third one comes from the definition of directional derivative. Thus, the claim holds. \(\square \)

Now, let \(\ell \) be a positive constant. Here, we define \(\phi _{\ell , x} :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) as

$$\begin{aligned} \phi _{\ell , x}(d) :=\psi _x(d) + \frac{\ell }{2} \Vert d\Vert ^2, \end{aligned}$$

where the function \(\psi _x\) is defined in (3.1). Clearly, \(\phi _{\ell , x}\) is strongly convex and \(\phi _{\ell , x}(0) = 0\). Using this function, we define the search direction at an iteration k, which we call proximal gradient direction, as \(d^k = d_\ell (x^k)\), where

$$\begin{aligned} d_\ell (x) :=\mathop {\mathrm{argmin}}\limits _{d\in {\mathbf {R}}^{n}} \phi _{\ell , x}(d). \end{aligned}$$
(3.2)

Remark 3.1

  1.

    When \(g_i = 0\), \(d_\ell (x)\) given in (3.2) corresponds to the search direction of the multiobjective steepest descent method [13]. On the other hand, when \(f_i = 0\), \(d_\ell (x)\) given in (3.2) corresponds to the search direction of the multiobjective proximal point method [8].

  2.

    Since \(\phi _{\ell , x}\) is strongly convex, (3.2) has a unique solution, and so \(d_\ell (x)\) is well-defined.

  3.

    Since \(\phi _{\ell , x}(0) = 0\), we have \(\phi _{\ell , x}(d_\ell (x)) \le 0\).

Let \(\beta _\ell (x)\) be the optimal value in (3.2), i.e.,

$$\begin{aligned} \beta _\ell (x) :=\min _{d \in {\mathbf {R}}^n} \phi _{\ell , x}(d) = \phi _{\ell , x}(d_\ell (x)). \end{aligned}$$
(3.3)
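When every \(g_i = 0\), the subproblem (3.2) reduces, via an epigraph variable \(\gamma \), to the convex quadratic program \(\min _{(d, \gamma )} \{\gamma + \frac{\ell }{2}\Vert d\Vert ^2 \mid \nabla f_i(x)^\top d \le \gamma ,\, i = 1,\ldots ,m\}\). The following minimal MATLAB sketch (our illustration, assuming the Optimization Toolbox) computes \(d_\ell (x)\) and \(\beta _\ell (x)\) in this smooth case; Jf denotes the m-by-n Jacobian of f at x.

```matlab
% Sketch: d_ell(x) and beta_ell(x) of (3.2)-(3.3) when every g_i = 0,
% computed via the epigraph QP over z = [d; gamma].
function [d, beta] = prox_grad_direction(Jf, ell)
    [m, n] = size(Jf);
    H = blkdiag(ell * eye(n), 0);      % (1/2) z'*H*z = (ell/2)*||d||^2
    f = [zeros(n, 1); 1];              % linear part: gamma
    A = [Jf, -ones(m, 1)];             % grad f_i(x)'*d - gamma <= 0
    b = zeros(m, 1);
    opts = optimoptions('quadprog', 'Display', 'off');
    z = quadprog(H, f, A, b, [], [], [], [], [], opts);
    d = z(1:n);
    beta = z(n + 1) + (ell / 2) * norm(d)^2;   % phi_{ell,x}(d_ell(x))
end
```

At an optimal solution, the epigraph variable equals \(\psi _x(d_\ell (x))\), so adding \(\frac{\ell }{2}\Vert d_\ell (x)\Vert ^2\) recovers \(\beta _\ell (x)\).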

The following lemma characterizes the stationarity in terms of \(d_\ell (\cdot )\) and \(\beta _\ell (\cdot )\).

Lemma 3.2

Let \(d_\ell (x)\) and \(\beta _\ell (x)\) be defined in (3.2) and (3.3), respectively. Then, the following statements hold.

  1.

    If x is Pareto stationary, then \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\). Conversely, if \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\), then x is Pareto stationary.

  2.

    If x is not Pareto stationary, then \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\). Conversely, if \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\), then x is not Pareto stationary.

  3.

    The mappings \(d_\ell (\cdot )\) and \(\beta _\ell (\cdot )\) are continuous.

Proof

  1.

    Let x be Pareto stationary. Suppose, for the purpose of contradiction, that \(d_\ell (x) \ne 0\) or \(\beta _\ell (x) < 0\). From statements 2 and 3 in Remark 3.1 it follows that \(d_\ell (x) \ne 0\) if and only if \(\beta _\ell (x) < 0\). This means that \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\). Therefore, we see that

    $$\begin{aligned} \beta _\ell (x) = \psi _x(d_\ell (x)) + \frac{\ell }{2} \Vert d_\ell (x)\Vert ^2 < 0. \end{aligned}$$
    (3.4)

Since \(\psi _x\) is convex and \(\psi _x(0) = 0\), we get

$$\begin{aligned} \psi _x(\alpha d_\ell (x))&= \psi _x(\alpha d_\ell (x) + (1 - \alpha ) \cdot 0) \\&\le \alpha \psi _x(d_\ell (x)) + (1 - \alpha ) \psi _x(0) \\&= \alpha \psi _x(d_\ell (x)) \\&< - \frac{\alpha \ell }{2} \Vert d_\ell (x)\Vert ^2 \quad \text {for all } \alpha \in (0, 1), \end{aligned}$$

where the last inequality follows from (3.4). Thus, for all \(\alpha \in (0, 1)\) we have

$$\begin{aligned} \frac{\psi _x(\alpha d_\ell (x))}{\alpha } < - \frac{\ell }{2} \Vert d_\ell (x)\Vert ^2. \end{aligned}$$

Since \(d_\ell (x) \ne 0\) and \(\ell > 0\), letting \(\alpha \searrow 0\) we obtain

$$\begin{aligned} \psi _x^\prime (0; d_\ell (x)) \le - \frac{\ell }{2} \Vert d_\ell (x)\Vert ^2 < 0. \end{aligned}$$

It then follows from Lemma 3.1 that

$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (x; d_\ell (x)) < 0, \end{aligned}$$

which contradicts the Pareto stationarity of x.

Let us now prove the converse. Suppose that \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\). From the definition of \(\beta _\ell (x)\) given in (3.3), we have

$$\begin{aligned} \phi _{\ell , x}(d) = \psi _x(d) + \frac{\ell }{2} \Vert d\Vert ^2 \ge \beta _\ell (x) = 0 \quad \text {for all } d. \end{aligned}$$

Let \(\alpha \in (0,1)\). Replacing d with \(\alpha d\) above and dividing by \(\alpha \), we get

$$\begin{aligned} \frac{\psi _x(\alpha d) + \frac{\ell }{2} \Vert \alpha d\Vert ^2}{\alpha } \ge 0 \quad \text {for all } d. \end{aligned}$$

Letting \(\alpha \searrow 0\) and using Lemma 3.1, we obtain

$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (x; d) \ge 0, \end{aligned}$$

which is our claim.

  2.

    This statement is equivalent to statement 1.

  3.

    It is easy to see that the function

    $$\begin{aligned} \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + g_i(x+d) - g_i(x) \right\} + \frac{\ell }{2} \Vert d\Vert ^2 \end{aligned}$$
    (3.5)

is continuous with respect to x and d. Therefore, the optimal value function \(\beta _\ell (\cdot )\) is also continuous from [5, Maximum Theorem]. Moreover, since the solution of (3.2) is unique, the optimal set mapping reduces to the single-valued mapping \(d_\ell (\cdot )\), which is continuous from [18, Corollary 8.1]. \(\square \)

3.1 A proximal gradient method with line searches

In this section, we present the proposed method with line searches. To compute the step length \(t_k > 0\), we use an Armijo rule. Let \(\rho \in (0,1)\) be a prespecified constant. The condition to accept \(t_k\) is given by

$$\begin{aligned} F_i(x^k + t_k d^k) \le F_i(x^k) + t_k \rho \psi _{x^k}(d^k),\quad i = 1,\ldots ,m. \end{aligned}$$
(3.6)

We begin with \(t_k = 1\) and while (3.6) is not satisfied, we update

$$\begin{aligned} t_k :=\xi t_k, \end{aligned}$$

where \(\xi \in (0,1)\). The following lemma shows that this procedure terminates after finitely many steps.

Lemma 3.3

Let \(d^k\) be defined in (3.2) with \(x = x^k\) and assume that \(x^k\) is not Pareto stationary. If \(\rho \in (0, 1)\), then there exists some \(\bar{t}_k > 0\) such that

$$\begin{aligned} F_i(x^k + t d^k) \le F_i(x^k) + t \rho \psi _{x^k}(d^k),\quad i = 1,\ldots ,m \end{aligned}$$

for any \(t \in (0, \bar{t}_k]\).

Proof

Let \(t \in (0, 1]\). Since \(g_i\) is convex for all \(i = 1, \ldots , m\), we have

$$\begin{aligned} g_i(x^k + td^k) - g_i(x^k)&= g_i ( (1-t)x^k + t(x^k + d^k) ) - g_i(x^k) \\&\le (1-t)g_i(x^k) + tg_i(x^k + d^k) - g_i(x^k) \\&= t(g_i(x^k + d^k) - g_i(x^k)). \end{aligned}$$

Therefore, from the differentiability of \(f_i\) we obtain

$$\begin{aligned}&f_i(x^k + td^k) + g_i(x^k + td^k) \\&\quad \le \; f_i(x^k) + g_i(x^k) + t\nabla f_i(x^k)^\top d^k + t \left( g_i(x^k + d^k) - g_i(x^k) \right) + o(t) \\&\quad \le \; f_i(x^k) + g_i(x^k) + t\psi _{x^k} (d^k) + o(t), \end{aligned}$$

where the last inequality comes from the definition (3.1) of \(\psi _{x^k}\). Since \(x^k\) is not Pareto stationary, we have \(\psi _{x^k} (d^k) < 0\) from Lemma 3.2. Thus, if \(\rho \in (0,1)\), then there exists some \(\bar{t}_k > 0\) such that

$$\begin{aligned} f_i(x^k + t d^k) + g_i(x^k + t d^k) \le f_i(x^k) + g_i(x^k) + t \rho \psi _{x^k}(d^k),\quad i = 1,\ldots ,m \end{aligned}$$

for any \(t \in (0, \bar{t}_k]\). \(\square \)

Finally, based on the previous discussions, we state below the proposed method, considering line searches.

Algorithm 3.1

Step 1 :

Choose \(\ell > 0,\ \rho \in (0,1),\ \xi \in (0,1),\ x^0 \in {\mathbf {R}}^n\) and set \(k :=0\).

Step 2 :

Compute \(d^k\) by solving subproblem (3.2) with \(x = x^k\).

Step 3 :

If \(d^k = 0\), then stop.

Step 4 :

Compute the step length \(t_k \in (0, 1]\) as the maximum of

$$\begin{aligned} T_k :=\{ t = \xi ^j \mid j \in {\mathbf {N}}\cup \{0\},\, F_i(x^k + t d^k) \le F_i(x^k) + t \rho \psi _{x^k}(d^k),\, i = 1,\ldots ,m \}. \end{aligned}$$
Step 5 :

Set \(x^{k+1} :=x^k + t_k d^k\), \(k :=k + 1\), and go to Step 2.

Observe that from Lemma 3.2, the algorithm stops at Step 3 with a Pareto stationary point or produces an infinite sequence of nonstationary points \(\{x^k\}\). If Step 4 is reached in some iteration k, it means that in Step 3, \(d^k \ne 0\), or equivalently, \(\beta _\ell (x^k) < 0\). Thus, we have \(\psi _{x^k}(d^k) < 0\). From the Armijo condition, we conclude that all objective functions decrease, i.e.,

$$\begin{aligned} F_i(x^k + t_k d^k) \le F_i(x^k), \quad i = 1,\ldots ,m. \end{aligned}$$
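A minimal sketch of the whole algorithm in the smooth case \(g_i = 0\), reusing the subroutine prox_grad_direction given after (3.3), could look as follows; the handles F and JF (returning the objective vector and its m-by-n Jacobian) and the iteration cap maxit are our illustrative assumptions.

```matlab
% Sketch of Algorithm 3.1 for g_i = 0, with Armijo backtracking.
function x = prox_grad_linesearch(F, JF, x0, ell, rho, xi, eps, maxit)
    x = x0;
    for k = 1:maxit
        d = prox_grad_direction(JF(x), ell);   % Step 2: solve (3.2)
        if norm(d) < eps, break; end           % Step 3: Pareto stationarity
        psi = max(JF(x) * d);                  % psi_x(d) of (3.1), g_i = 0
        t = 1;                                 % Step 4: Armijo rule (3.6)
        while any(F(x + t * d) > F(x) + t * rho * psi)
            t = xi * t;
        end
        x = x + t * d;                         % Step 5
    end
end
```

For a bi-objective problem one would call it, e.g., with F = @(x) [f1(x); f2(x)] and JF stacking the two gradients as rows.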

3.2 A proximal gradient method without line searches

In this section, we assume that \(\nabla f_i\) is Lipschitz continuous with constant L. When we set \(\ell > L / 2\) in (3.2), we can fix the step length \(t_k = 1\) for every iteration. We then state below the proposed method.

Algorithm 3.2

Step 1 :

Choose \(\ell > L / 2,\ x^0 \in {\mathbf {R}}^n\) and set \(k :=0\).

Step 2 :

Compute \(d^k\) by solving subproblem (3.2) with \(x = x^k\).

Step 3 :

If \(d^k = 0\), then stop.

Step 4 :

Set \(x^{k+1} :=x^k + d^k\), \(k :=k + 1\), and go to Step 2.

Similarly to Algorithm 3.1, it either stops with a Pareto stationary point or generates an infinite sequence of nonstationary points. Moreover, as we can see in Lemma 4.3, the objective function values also decrease in each iteration.
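For m = 1, the subproblem (3.2) even has a closed form whenever the proximal operator of g is available: completing the square shows that \(x^k + d^k = \mathrm {prox}_{g/\ell }\left( x^k - \nabla f(x^k)/\ell \right) \), so Algorithm 3.2 reduces to a classical proximal gradient (ISTA-type) iteration. A minimal sketch for \(g(x) = \lambda \Vert x\Vert _1\), with the hypothetical names gradf and lambda, is:

```matlab
% Sketch: one iteration of Algorithm 3.2 for m = 1 and g = lambda*||.||_1.
% The prox of (lambda/ell)*||.||_1 is componentwise soft-thresholding.
soft = @(y, c) sign(y) .* max(abs(y) - c, 0);
x = soft(x - gradf(x) / ell, lambda / ell);    % x^{k+1} = x^k + d^k
```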

4 Convergence analysis

In this section, we prove that the sequences generated by Algorithms 3.1 and 3.2 converge to Pareto stationary points, in the sense that every accumulation point is Pareto stationary. From now on, we assume that an infinite sequence is generated. To begin with, we recall the so-called three points property, which is a key to showing convergence of proximal point type algorithms.

Theorem 4.1

(Three points property) [9, Lemma 3.2] Let \(\theta :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) be proper convex and define

$$\begin{aligned} x^*= \mathop {\mathrm{argmin}}\limits _{x \in {\mathbf {R}}^n} \left\{ \theta (x) + \frac{1}{2} \Vert x - y \Vert ^2 \right\} . \end{aligned}$$

Then, for all \(z \in {\mathbf {R}}^n\), we have

$$\begin{aligned} \theta (x^*) - \theta (z) \le - \frac{1}{2} \Vert z - x^*\Vert ^2 - \frac{1}{2} \Vert y - x^*\Vert ^2 + \frac{1}{2} \Vert z - y \Vert ^2. \end{aligned}$$

In order to show the convergence of the search direction, we first prove the following lemma.

Lemma 4.1

Let \(\{d^k\}\) be generated by Algorithms 3.1 or 3.2 and recall the definition of \(\psi _x\) in (3.1). Then, we have

$$\begin{aligned} \psi _{x^k}(d^k) \le -\ell \Vert d^k\Vert ^2 \quad \text {for all } k. \end{aligned}$$

Proof

Defining \(\theta :=\psi _{x^k} / \ell \) we can rewrite (3.2) with \(x = x^k\) as

$$\begin{aligned} d^k = \mathop {\mathrm{argmin}}\limits _{d \in {\mathbf {R}}^n} \left\{ \theta (d) + \frac{1}{2} \Vert d-0\Vert ^2 \right\} . \end{aligned}$$

Thus, substituting \(x^*= d^k\) and \(y = z = 0\) into Theorem 4.1, we get \(\theta (d^k) - \theta (0) \le -\Vert d^k\Vert ^2\). Therefore, recalling that \(\psi _{x^k} (0) = 0\), we have

$$\begin{aligned} \psi _{x^k}(d^k) \le -\ell \Vert d^k\Vert ^2 \quad \text {for all } k. \end{aligned}$$

\(\square \)

4.1 Convergence of Algorithm 3.1

Let us recall the algorithm with line searches. We first show a summability property of the steps \(t_k \Vert d^k \Vert ^2\).

Lemma 4.2

Let \(\{d^k\}\) be generated by Algorithm 3.1 and suppose that \(\{F_i (x^k)\}\) is bounded from below for all \(i = 1,\ldots ,m\). Then, it follows that

$$\begin{aligned} \sum _{k = 0}^\infty t_k \Vert d^k \Vert ^2 < \infty . \end{aligned}$$

Proof

From Lemma 4.1 and from (3.6) it follows that

$$\begin{aligned} F_i(x^k + t_k d^k) \le F_i(x^k) - t_k \rho \ell \Vert d^k\Vert ^2,\quad i = 1,\ldots ,m. \end{aligned}$$

Since \(\{F_i (x^k)\}\) is bounded from below, there exists \(\tilde{F}_i\) such that \(\tilde{F}_i \le F_i(x^k)\) for all i and k. Adding up the above inequality from \(k = 0\) to \(k = \tilde{k}\), we obtain

$$\begin{aligned} F_i(x^{\tilde{k} + 1}) \le F_i(x^0) - \rho \ell \sum _{k = 0}^{\tilde{k}} t_k \Vert d^k \Vert ^2. \end{aligned}$$

Thus, we have

$$\begin{aligned} \sum _{k = 0}^{\tilde{k}} t_k \Vert d^k \Vert ^2 \le \frac{1}{\rho \ell } (F_i(x^0) - \tilde{F}_i). \end{aligned}$$

Taking \(\tilde{k} \rightarrow \infty \), we have \(\sum _{k = 0}^\infty t_k \Vert d^k \Vert ^2 < \infty \). \(\square \)

The next theorem is our main result. Basically, if the sequence produced by Algorithm 3.1 has accumulation points, then they are all Pareto stationary.

Theorem 4.2

Every accumulation point of the sequence \(\{x^k\}\) generated by Algorithm 3.1, if it exists, is a Pareto stationary point. In particular, if the level set of each \(F_i\) is bounded, then \(\{x^k\}\) has accumulation points and they are all Pareto stationary.

Proof

The second statement follows immediately from the first. Let \(\bar{x}\) be an accumulation point of \(\{x^k\}\) and let \(\{x^{k_j}\}\) be a subsequence converging to \(\bar{x}\). From statement 3 of Lemma 3.2, we have \(d^{k_j} = d_\ell (x^{k_j}) \rightarrow d_\ell (\bar{x})\). Here, it is sufficient to show that \(d_\ell (\bar{x}) = 0\) because of statements 1 and 3 of Lemma 3.2. Suppose for contradiction that \(d_\ell (\bar{x}) \ne 0\). Then, it follows from Lemma 4.2 that \(t_{k_j} \rightarrow 0\), since the existence of an accumulation point of \(\{x^k\}\) implies boundedness of \(\{F_i(x^k)\}\) for all i. Therefore, by the definition of \(t_{k_j}\) in Step 4 of Algorithm 3.1, for sufficiently large j there exists some \(i_{k_j} \in \{1,\ldots ,m\}\) such that

$$\begin{aligned} F_{i_{k_j}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) \ge F_{i_{k_j}}(x^{k_j}) + \xi ^{-1} t_{k_j} \rho \psi _{x^{k_j}}(d^{k_j}). \end{aligned}$$

Since \(i_{k_j}\) takes values in the finite set \(\{1,\ldots ,m\}\), taking a subsequence if necessary, we can assume that \(i_{k_j} = \bar{i}\) without loss of generality. We thus obtain

$$\begin{aligned} \frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} \ge \rho \psi _{x^{k_j}}(d^{k_j}). \end{aligned}$$
(4.1)

Note that \(0< \xi ^{-1} t_{k_j} < 1\) for sufficiently large j. It follows from the definition (3.1) of \(\psi _{x^{k_j}}\) that

$$\begin{aligned}&\psi _{x^{k_j}}(d^{k_j}) \\&\quad \ge \; \{\nabla f_{\bar{i}}(x^{k_j})^\top d^{k_j} + g_{\bar{i}}(x^{k_j} + d^{k_j}) - g_{\bar{i}}(x^{k_j})\} \\&\quad \ge \; \frac{\xi ^{-1} t_{k_j} \nabla f_{\bar{i}}(x^{k_j})^\top d^{k_j} + g_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - g_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} \\&\quad =\; \frac{f_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) + g_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - f_{\bar{i}}(x^{k_j}) - g_{\bar{i}}(x^{k_j}) + o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}} \\&\quad =\; \frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} + \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}}, \end{aligned}$$

where the second inequality comes from the convexity of \(g_{\bar{i}}\) and Lemma 2.1, and the first equality follows from the differentiability of \(f_{\bar{i}}\). Therefore, we get

$$\begin{aligned} \psi _{x^{k_j}}(d^{k_j}) \ge \frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} + \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}}. \end{aligned}$$
(4.2)

From (4.1) and (4.2), we have

$$\begin{aligned}&\frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} \\&\quad \ge \; \rho \frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} + \rho \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}}. \end{aligned}$$

We thus get

$$\begin{aligned} \frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} \ge \left( \frac{\rho }{1 - \rho } \right) \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}}. \end{aligned}$$
(4.3)

On the other hand, Lemma 4.1 yields

$$\begin{aligned} \psi _{x^{k_j}}(d^{k_j}) \le -\ell \Vert d^{k_j}\Vert ^2. \end{aligned}$$

Since \(d^{k_j} \rightarrow d_\ell (\bar{x}) \ne 0\), there exists some \(\delta > 0\) such that, for sufficiently large j,

$$\begin{aligned} - \delta&\ge \psi _{x^{k_j}}(d^{k_j}) \\&\ge \frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} + \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}}, \end{aligned}$$

where the last inequality comes from (4.2). Therefore, we obtain

$$\begin{aligned} \frac{F_{\bar{i}}(x^{k_j} + \xi ^{-1} t_{k_j} d^{k_j}) - F_{\bar{i}}(x^{k_j})}{\xi ^{-1} t_{k_j}} \le -\delta - \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}}. \end{aligned}$$
(4.4)

From (4.3) and (4.4), it follows that

$$\begin{aligned} \left( \frac{\rho }{1 - \rho } \right) \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}} \le -\delta - \frac{o(\xi ^{-1} t_{k_j} \Vert d^{k_j}\Vert )}{\xi ^{-1} t_{k_j}}. \end{aligned}$$

Taking \(j \rightarrow \infty \), we have \(0 \le - \delta \), which contradicts the fact that \(\delta > 0\). Therefore, we conclude that \(d_\ell (\bar{x}) = 0\). \(\square \)

4.2 Convergence of Algorithm 3.2

Let us now show the convergence of the search direction for the algorithm without line searches. Recall that we have to assume Lipschitz continuity of \(\nabla f_i\), with a constant \(L > 0\).

Lemma 4.3

Let \(\{d^k\}\) be generated by Algorithm 3.2 and suppose that \(\{F_i (x^k)\}\) is bounded from below for all \(i = 1,\ldots ,m\). Then, we have

$$\begin{aligned} \lim _{k \rightarrow \infty } \Vert d^k \Vert = 0. \end{aligned}$$

Proof

From the so-called descent Lemma [6, Proposition A.24] and by Lipschitz continuity of \(\nabla f_i\), we obtain

$$\begin{aligned} f_i(x^k + d^k) \le f_i(x^k) + \nabla f_i(x^k)^\top d^k + \frac{L}{2} \Vert d^k\Vert ^2 . \end{aligned}$$
(4.5)

At the kth iteration, we have

$$\begin{aligned}&f_i(x^k + d^k) + g_i(x^k + d^k) \\&\quad =\; f_i(x^k) + g_i(x^k) + f_i(x^k + d^k) - f_i(x^k) + g_i(x^k + d^k) - g_i(x^k) \\&\quad \le \; f_i(x^k) + g_i(x^k) + \nabla f_i(x^k)^\top d^k + g_i(x^k + d^k) - g_i(x^k) + \frac{L}{2} \Vert d^k\Vert ^2 \\&\quad \le \; f_i(x^k) + g_i(x^k) + \psi _{x^k}(d^k) + \frac{L}{2} \Vert d^k\Vert ^2 \\&\quad \le \; f_i(x^k) + g_i(x^k) + \frac{L - 2 \ell }{2} \Vert d^k\Vert ^2. \end{aligned}$$

Here, the first inequality follows from (4.5), the second one follows from the definition (3.1) of \(\psi _{x^k}\), and the third one comes from Lemma 4.1. Since \(\{F_i (x^k)\}\) is bounded from below, there exists \(\tilde{F}_i\) such that \(\tilde{F}_i \le F_i(x^k) = f_i(x^k) + g_i(x^k)\) for all \(i,\, k\). Adding up the above inequality from \(k = 0\) to \(k = \tilde{k}\), we obtain

$$\begin{aligned} f_i(x^{\tilde{k} + 1}) + g_i(x^{\tilde{k} + 1}) \le f_i(x^0) + g_i(x^0) + \frac{L - 2 \ell }{2} \sum _{k = 0}^{\tilde{k}} \Vert d^k \Vert ^2. \end{aligned}$$

Since \(\ell > L / 2\), we have

$$\begin{aligned} \sum _{k = 0}^{\tilde{k}} \Vert d^k \Vert ^2 \le \frac{2}{2 \ell - L} (f_i(x^0) + g_i(x^0) - \tilde{F}_i). \end{aligned}$$

Taking \(\tilde{k} \rightarrow \infty \), we obtain

$$\begin{aligned} \sum _{k = 0}^\infty \Vert d^k \Vert ^2 < \infty \end{aligned}$$

and hence \(\lim _{k \rightarrow \infty } \Vert d^k \Vert = 0\). \(\square \)

The next theorem directly follows from Lemmas 4.3 and 3.2.

Theorem 4.3

Every accumulation point of the sequence \(\{x^k\}\) generated by Algorithm 3.2, if it exists, is a Pareto stationary point. In particular, if the level set of each \(F_i\) is bounded, then \(\{x^k\}\) has accumulation points and they are all Pareto stationary.

5 Applications

In this section, we consider two applications of the multiobjective optimization problem (1.1) with (1.2), and discuss how to solve the corresponding subproblems (3.2) in these particular applications.

5.1 Application to constrained multiobjective optimization

In this section, we consider the following constrained multiobjective optimization problem:

$$\begin{aligned} \begin{aligned} \min \quad&f(x) \\ \mathrm {s.t.}\quad&x \in S, \end{aligned} \end{aligned}$$
(5.1)

where \(f:{\mathbf {R}}^n \rightarrow {\mathbf {R}}^m\) is a vector-valued function with \(f :=(f_1,\ldots ,f_m)^\top \) and \(S \subset {\mathbf {R}}^n\) is convex. Suppose that each component \(f_i\) of f is continuously differentiable. Let \(g:{\mathbf {R}}^n \rightarrow {\mathbf {R}}^m\) be a vector-valued function with \(g :=(g_1,\ldots ,g_m)^\top \), where each \(g_i\) is the indicator function of S, i.e.,

$$\begin{aligned} g_i(x) = {\left\{ \begin{array}{ll} 0 &{}\quad x \in S, \\ \infty &{}\quad x \notin S. \end{array}\right. } \end{aligned}$$

Then, we can rewrite the search direction given in (3.2) with \(x = x^k\) as

$$\begin{aligned} d^k :=\mathop {\mathrm{argmin}}\limits _{d \in S - x^k} \left\{ \max _{i = 1,\ldots ,m} \{\nabla f_i(x^k)^\top d\} + \frac{\ell }{2} \Vert d\Vert ^2 \right\} , \end{aligned}$$

which coincides with the projected gradient direction for multiobjective optimization [17].
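In the single-objective case m = 1, this direction has the closed form \(d^k = P_S\left( x^k - \nabla f(x^k)/\ell \right) - x^k\), where \(P_S\) denotes the Euclidean projection onto S. A minimal sketch for the illustrative choice of a box \(S = \{x \mid lb \le x \le ub\}\), with a hypothetical gradient handle gradf, is:

```matlab
% Sketch: projected gradient direction for m = 1 and a box constraint set.
proj_box = @(y, lb, ub) min(max(y, lb), ub);   % componentwise projection
d = proj_box(x - gradf(x) / ell, lb, ub) - x;  % d_ell(x) for m = 1
```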

5.2 Application to robust multiobjective optimization

Now, let us apply the proposed algorithms to robust multiobjective optimization. Here, we suppose that the problems include uncertain parameters and that we can estimate the set in which these parameters lie. Then, we try to optimize considering the worst scenario. We observe that, although the literature on robust optimization is vast, the study of robust multiobjective optimization is relatively new [11, 14, 19].

Here, we consider the convex function \(g_i\) defined as follows:

$$\begin{aligned} g_i(x) :=\max _{u \in {\mathcal {U}}_i } \hat{g}_i (x,u). \end{aligned}$$
(5.2)

We call \({\mathcal {U}}_i \subseteq {\mathbf {R}}^n\) an uncertainty set. From now on, we assume \(\hat{g}_i :{\mathbf {R}}^n \times {\mathbf {R}}^n \rightarrow {\mathbf {R}}\) to be convex with respect to x. It is easy to see that \(g_i\) is then also convex. However, \(g_i\) is not necessarily differentiable even if \(\hat{g}_i\) is differentiable. First, let us reformulate the subproblem (3.2) by using an extra variable \(\gamma \in {\mathbf {R}}\) as

$$\begin{aligned} \begin{aligned} \min _{\gamma , d} \quad&\gamma + \frac{\ell }{2}\Vert d\Vert ^2 \\ \mathrm {s.t.}\quad&\nabla f_i(x)^\top d + g_i(x+d) - g_i(x) \le \gamma , \quad i = 1,\ldots ,m. \end{aligned} \end{aligned}$$

Note that \(g_i\) is not easy to calculate, and thus, the subproblem is difficult to solve. When \(\hat{g}_i\) and \(\mathcal {U}_i\) have some special structure, the constraints can be written as explicit formulae by using the duality of (5.2). Now, assume that the dual problem of the maximization problem (5.2) is written as follows:

$$\begin{aligned} \min _{w_i} \quad&\tilde{g}_i (x, w_i) \\ \mathrm {s.t.}\quad&w_i \in \tilde{\mathcal {U}}_i (x), \end{aligned}$$

where \(\tilde{g}_i :{\mathbf {R}}^n \times {\mathbf {R}}^m \rightarrow {\mathbf {R}}\) and \(\displaystyle \tilde{\mathcal {U}}_i :{\mathbf {R}}^n \rightarrow 2^{{\mathbf {R}}^m}\). If strong duality holds, then we see that the subproblem (3.2) is equivalent to

$$\begin{aligned} \begin{aligned} \min _{\gamma , d, w_i} \quad&\gamma + \frac{\ell }{2} \Vert d\Vert ^2 \\ \mathrm {s.t.}\quad&\nabla f_i(x)^\top d + \tilde{g}_i (x + d, w_i) - g_i(x) \le \gamma ,\\&w_i \in \tilde{\mathcal {U}}_i (x + d), \quad i = 1,\ldots ,m. \end{aligned} \end{aligned}$$
(5.3)

When \(\tilde{g}_i\) and \(\tilde{\mathcal {U}}_i\) have some explicit form, this problem is tractable. As we mention below, in this case, we can convert the above subproblem into some well-known convex optimization problems. This idea can also be seen in [4]. In the following, we introduce some robust multiobjective optimization problems for which the subproblems can be written as quadratic programming, second-order cone programming or semidefinite programming problems.

5.2.1 Linearly constrained quadratic programming problem case

Suppose that \(\hat{g}_i (x,u) = u^\top x\) and \({\mathcal {U}}_i = \{ u \in {\mathbf {R}}^n \mid A_i u \le b_i \}\), where \(A_i \in {\mathbf {R}}^{p \times n}\) and \(b_i \in {\mathbf {R}}^p\), that is, \(\hat{g}_i\) is linear in x, and \(\mathcal {U}_i\) is a polyhedron. Suppose also that \(\mathcal {U}_i\) is nonempty and bounded. Then, we can rewrite (5.2) as the following linear programming problem:

$$\begin{aligned} \begin{aligned} \max _u \quad&x^\top u \\ \mathrm {s.t.}\quad&A_i u \le b_i. \end{aligned} \end{aligned}$$
(5.4)

Its dual problem is given by

$$\begin{aligned} \begin{aligned} \min _w \quad&b_i^\top w \\ \mathrm {s.t.}\quad&A_i^\top w = x, \\&w \ge 0. \end{aligned} \end{aligned}$$

Since strong duality holds, we can convert the subproblem (3.2) [or, equivalently, (5.3)] into the following linearly constrained quadratic programming problem:

$$\begin{aligned} \begin{aligned} \min _{\gamma , d, w_i} \quad&\gamma + \frac{\ell }{2}\Vert d\Vert ^2 \\ \mathrm {s.t.}\quad&\nabla f_i(x)^\top d + b_i^\top w_i - g_i(x) \le \gamma , \\&A_i^\top w_i = x + d, \\&w_i \ge 0, \quad i = 1,\ldots ,m. \end{aligned} \end{aligned}$$
(5.5)
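A sketch assembling (5.5) for quadprog is given below. The input conventions are ours: grads{i} holds \(\nabla f_i(x)\), gvals(i) holds \(g_i(x)\), and the cell arrays A, b hold the polyhedral data with \(A_i \in {\mathbf {R}}^{p_i \times n}\).

```matlab
% Sketch: solve subproblem (5.5) with quadprog over z = [gamma; d; w_1; ...; w_m].
function [d, gamma] = subproblem_qp(x, ell, grads, gvals, A, b)
    n = numel(x);  m = numel(grads);
    p = cellfun(@(Ai) size(Ai, 1), A);          % rows of each A_i
    N = 1 + n + sum(p);
    H = blkdiag(0, ell * eye(n), zeros(sum(p)));
    f = [1; zeros(N - 1, 1)];                   % objective: gamma + (ell/2)||d||^2
    Ain = zeros(m, N);      bin = zeros(m, 1);
    Aeq = zeros(m * n, N);  beq = repmat(x, m, 1);
    lb = [-inf(1 + n, 1); zeros(sum(p), 1)];    % w_i >= 0
    off = 1 + n;
    for i = 1:m
        cols = off + 1 : off + p(i);
        Ain(i, 1) = -1;                         % grad f_i(x)'*d + b_i'*w_i
        Ain(i, 2:1 + n) = grads{i}';            %   - gamma <= g_i(x)
        Ain(i, cols) = b{i}';
        bin(i) = gvals(i);
        rows = (i - 1) * n + 1 : i * n;
        Aeq(rows, 2:1 + n) = -eye(n);           % A_i'*w_i - d = x
        Aeq(rows, cols) = A{i}';
        off = off + p(i);
    end
    opts = optimoptions('quadprog', 'Display', 'off');
    z = quadprog(H, f, Ain, bin, Aeq, beq, lb, [], [], opts);
    gamma = z(1);  d = z(2:1 + n);
end
```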

5.2.2 Second-order cone programming problem case

Suppose that \(\hat{g}_i (x,u) = u^\top x\) and \({\mathcal {U}}_i = \{ a_i + P_i v \in {\mathbf {R}}^n \mid \Vert v \Vert \le 1,\, v \in {\mathbf {R}}^n \}\), where \(a_i \in {\mathbf {R}}^n\) and \(P_i \in {\mathbf {R}}^{n \times n}\), that is, \(\hat{g}_i\) is once again linear in x and \(\mathcal {U}_i\) is an ellipsoid. Then, for all \(i = 1,\ldots ,m\) we have

$$\begin{aligned} g_i (x)= & {} \max _{u \in {\mathcal {U}}_i } \hat{g}_i (x,u) \\= & {} \max _{v : \Vert v \Vert \le 1} (a_i + P_i v)^\top x \\= & {} a_i^\top x + \max _{v : \Vert v \Vert \le 1} (P_i^\top x)^\top v. \end{aligned}$$

If \(P_i^\top x = 0\), then \(\displaystyle \max _{v : \Vert v \Vert \le 1} (P_i^\top x)^\top v = 0 = \Vert P_i^\top x \Vert \). If \(P_i^\top x \ne 0\), then \(v = P_i^\top x / \Vert P_i^\top x\Vert \) attains the maximum, and hence \(\displaystyle \max _{v : \Vert v \Vert \le 1} (P_i^\top x)^\top v = \Vert P_i^\top x \Vert \). Consequently, we have

$$\begin{aligned} g_i(x) = a_i^\top x + \Vert P_i^\top x \Vert . \end{aligned}$$

Therefore, introducing slack variables \(\gamma \in {\mathbf {R}}\) and \(\tau \in {\mathbf {R}}\), the subproblem (3.2) can be written as

$$\begin{aligned} \begin{aligned} \min _{\tau , \gamma , d} \quad&\tau \\ \mathrm {s.t.}\quad&\nabla f_i (x)^\top d + \Vert P_i^\top (x + d) \Vert - \Vert P_i^\top x\Vert + a_i^\top d \le \gamma , \quad i = 1,\ldots ,m, \\&\gamma + \frac{\ell }{2} \Vert d \Vert ^2 \le \tau . \end{aligned} \end{aligned}$$

Note that convex quadratic constraints can be converted to second-order cone constraints. Using the expression given in [1, Section 2.1], we get the following second-order cone programming problem (SOCP):

$$\begin{aligned} \begin{aligned} \min _{\tau , \gamma , d} \quad&\tau \\ \mathrm {s.t.}\quad&\begin{bmatrix} -(\nabla f_i(x) + a_i)^\top d + \gamma + \Vert P_i^\top x\Vert \\ P_i^\top (x + d) \end{bmatrix} \in \mathcal {K}_{n + 1}, \\&\begin{bmatrix} 1 - \gamma + \tau \\ 1 + \gamma - \tau \\ \sqrt{2 \ell } d \end{bmatrix} \in \mathcal {K}_{n + 2}, \end{aligned} \end{aligned}$$
(5.6)

where \(\mathcal {K}_q :=\{(y_0, \bar{y}) \in {\mathbf {R}}\times {\mathbf {R}}^{q - 1} \mid y_0 \ge \Vert \bar{y} \Vert \}\) is the second-order cone in \({\mathbf {R}}^{q}\). The above SOCP can be solved efficiently with an interior point method [1].
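Instead of assembling the cones of (5.6) by hand, a modeling layer can perform the reformulation automatically. The following is a minimal sketch with CVX (assumed installed), where the cell arrays gradf, a and P hold \(\nabla f_i(x)\), \(a_i\) and \(P_i\):

```matlab
% Sketch: the subproblem of this subsection stated directly in CVX,
% which internally produces a conic program equivalent to (5.6).
cvx_begin quiet
    variable d(n)
    expression r(m)
    for i = 1:m
        r(i) = gradf{i}' * d + norm(P{i}' * (x + d)) ...
               - norm(P{i}' * x) + a{i}' * d;
    end
    minimize( max(r) + (ell / 2) * sum_square(d) )
cvx_end
```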

5.2.3 Semidefinite programming problem case

Suppose that \(\hat{g}_i (x,u) = (x + u)^\top A_i (x + u)\) and \({\mathcal {U}}_i = \{ a_i + P_i v \in {\mathbf {R}}^n \mid \Vert v \Vert \le 1 \}\), where \(A_i \in {\mathbf {R}}^{n \times n}\) with \(A_i \succeq O\), \(a_i \in {\mathbf {R}}^n\) and \(P_i \in {\mathbf {R}}^{n \times n}\). Then, there exists a matrix \(M_i \in {\mathbf {R}}^{n \times n}\) such that \(A_i = M_i M_i^\top \). Note that \(\hat{g}_i\) is convex quadratic and \(\mathcal {U}_i\) is an ellipsoid. Here, without loss of generality, we can assume that \(A_i\) is symmetric, since \((x + u)^\top A_i (x + u) = (x + u)^\top \tilde{A}_i (x + u)\), where \(\tilde{A}_i :=(A_i+A_i^\top ) / 2\). Then, \(g_i(x)\) can be written as

$$\begin{aligned} g_i(x) = \max _{v : \Vert v \Vert \le 1} (x + a_i + P_i v)^\top A_i (x + a_i + P_i v). \end{aligned}$$
(5.7)

Since problem (5.7) is a maximization problem of a convex function, it is not a convex optimization problem. Fortunately, it can be seen as a subproblem of a trust region method, so its optimal value \(g_i(x)\) can be obtained efficiently. Considering (5.7), we observe that

$$\begin{aligned} g_i(x + d) = \max _{v : \Vert v \Vert \le 1} (x + d + a_i + P_i v)^\top A_i (x + d + a_i + P_i v). \end{aligned}$$
(5.8)

From [2, Section 3], the Lagrangian dual of the maximization problem (5.8) is given by

$$\begin{aligned} \begin{aligned} \min _{\alpha , w} \quad&-w \\ \mathrm {s.t.}\quad&\begin{bmatrix} -\,P_i^\top A_i P_i&\quad -\,P_i^\top A_i (x + d + a_i) \\ -\,(x + d + a_i)^\top A_i^\top P_i&\quad -\,(x + d + a_i)^\top A_i (x + d + a_i) - w \end{bmatrix} \\&\qquad \qquad \qquad \succeq \alpha \begin{bmatrix} -I_n&\quad 0 \\ 0&\quad 1 \end{bmatrix}, \\&\alpha \ge 0, \end{aligned} \end{aligned}$$
(5.9)

where \(I_n\) stands for the identity matrix of dimension n. Let \((\alpha ^*, w^*)\) be an optimal solution of (5.9) and assume that \(\dim (\ker (A_i + \alpha ^*I_n)) \ne 1\). Since both (5.8) and (5.9) have strictly feasible solutions and \(I_n \succ O\), strong duality holds from [2, Theorem 3.5]. Therefore, recalling (5.3), the subproblem (3.2) is equivalent to

$$\begin{aligned} \begin{aligned} \min _{\gamma , d, w_i, \alpha _i} \quad&\gamma + \frac{\ell }{2} \Vert d \Vert ^2 \\ \mathrm {s.t.}\quad&\nabla f_i (x)^\top d - w_i - g_i (x) \le \gamma , \\&\begin{bmatrix} -P_i^\top A_i P_i + \alpha _i I_n&\quad -P_i^\top A_i (x + d + a_i) \\ -(x + d + a_i)^\top A_i^\top P_i&\quad -(x + d + a_i)^\top A_i (x + d + a_i) - w_i -\alpha _i \end{bmatrix} \\&\qquad \qquad \qquad \succeq O, \\&\alpha _i \ge 0, \quad i = 1,\ldots ,m. \end{aligned} \end{aligned}$$

Now, by using slack variables \(\tau \in {\mathbf {R}}\) and \(\zeta _i \in {\mathbf {R}}\) and converting the convex quadratic constraints to second-order cone ones, we get the following semidefinite programming problem:

$$\begin{aligned} \begin{aligned} \min _{\tau , \alpha _i, w_i, \gamma , d} \quad&\tau \\ \mathrm {s.t.}\quad&\nabla f_i(x)^\top d - w_i - g_i(x) \le \gamma , \\&\begin{bmatrix} 1 - \gamma + \tau \\ 1 + \gamma - \tau \\ \sqrt{2 \ell } d \end{bmatrix} \in \mathcal {K}_{n + 2}, \\&\begin{bmatrix} -\,P_i^\top A_i P_i + \alpha _i I_n&\quad -\,P_i^\top A_i (x + d + a_i) \\ -\,(x + d + a_i)^\top A_i^\top P_i&\quad \zeta _i \end{bmatrix} \succeq O, \\&\begin{bmatrix} \dfrac{1 - \zeta _i - w_i - \alpha _i}{2} \\ \dfrac{1 + \zeta _i + w_i + \alpha _i}{2} \\ M_i^\top (x + d + a_i) \end{bmatrix} \in \mathcal {K}_{n + 2}, \\&\alpha _i \ge 0, \quad i = 1,\ldots ,m, \end{aligned} \end{aligned}$$
(5.10)

where O stands for a zero matrix with appropriate dimension. Note that the second-order cone constraints can be converted further into semidefinite constraints.

6 Numerical experiments

In this section, we present some numerical results obtained by applying Algorithm 3.2 to the problems of Sect. 5.2. The experiments are carried out on a machine with a 1.8 GHz Intel Core i5 CPU and 8 GB of memory, and all codes are implemented in MATLAB R2017a. We consider the problem (1.1), where \(n = 5,\, m = 2,\, f_i(x) = \frac{1}{2} x^\top A_i x + a_i^\top x,\, g_i(x) = \max _{u \in {\mathcal {U}}_i } \hat{g}_i (x,u),\ A_i \in {\mathbf {R}}^{n \times n},\, a_i \in {\mathbf {R}}^n\), and \(\hat{g}_i:{\mathbf {R}}^n \times {\mathbf {R}}^n \rightarrow {\mathbf {R}},\, i = 1,\ldots , m\). Here, we assume that each \(A_i\) is positive semidefinite, so it can be decomposed as \(A_i = M_i M_i^\top \), where \(M_i \in {\mathbf {R}}^{n \times n}\). We generate \(M_i\) and \(a_i\) by choosing every component randomly from the standard normal distribution. To implement Algorithm 3.2, we make the choices listed in Remark 6.1; a sketch of the data generation is given below.
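```matlab
% Sketch: random problem data described above (n = 5, m = 2).
% The variable names and the seed are illustrative assumptions.
n = 5;  m = 2;
rng(0);                                    % fix the seed for repeatability
M = cell(m, 1);  A = cell(m, 1);  a = cell(m, 1);
for i = 1:m
    M{i} = randn(n);
    A{i} = M{i} * M{i}';                   % positive semidefinite A_i
    a{i} = randn(n, 1);
end
fi     = @(x, i) 0.5 * x' * A{i} * x + a{i}' * x;   % f_i(x)
gradfi = @(x, i) A{i} * x + a{i};                   % gradient of f_i
```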

Remark 6.1

  • Every component of \(x^0\) is chosen randomly from the standard normal distribution.

  • In Experiments 1 and 3, we set the constant \(\ell = 5\). In Experiment 2, we set the constant \(\ell = 7\).

  • The stopping criterion in Step 3 is replaced by \(\Vert d^k\Vert < \varepsilon :=10^{-6}\).

Also, we run each of the following experiments 100 times from different initial points, with \(\delta = 0, 0.05, 0.1\). Naturally, when \(\delta = 0\), no uncertainty is considered.

6.1 Experiment 1

In the first experiment, we solve the problem of Sect. 5.2.1. We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } u^\top x,\, i = 1,2\), where \(\mathcal {U}_1 = \{u \in {\mathbf {R}}^5\mid -\delta \le u_i \le \delta ,\, i = 1,\ldots ,5 \}\) and \(\mathcal {U}_2 = \{u \in {\mathbf {R}}^5\mid -\delta \le (Bu)_i \le \delta ,\, i = 1,\ldots ,5 \}\). Here, every component of \(B \in {\mathbf {R}}^{5 \times 5}\) is chosen randomly from the standard normal distribution and \(\delta \ge 0\). We use the MATLAB solver linprog to solve (5.4) and quadprog to solve (5.5). Figure 1 shows the result of this experiment. For each \(\delta \), we obtained part of the Pareto frontier, and as \(\delta \) gets smaller the objective function values also become smaller.
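For the box \(\mathcal {U}_1\), the value of (5.4) has the closed form \(g_1(x) = \delta \Vert x\Vert _1\), which makes a convenient check for the linprog evaluation; a minimal sketch (with x a column vector and delta as above) is:

```matlab
% Sketch: evaluate g_1(x) = max{u'*x : -delta <= u_i <= delta} via linprog.
% linprog minimizes, so the objective is negated.
Abox = [eye(n); -eye(n)];                  % A_1*u <= b_1 describes U_1
bbox = delta * ones(2 * n, 1);
opts = optimoptions('linprog', 'Display', 'off');
[~, fval] = linprog(-x, Abox, bbox, [], [], [], [], opts);
g1 = -fval;                                % agrees with delta*norm(x, 1)
```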

Fig. 1: Result for Experiment 1

Fig. 2: Result for Experiment 2

6.2 Experiment 2

In the second experiment, we solve the problem of Sect. 5.2.2. We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } u^\top x\), where \(\mathcal {U}_i = \{u \in {\mathbf {R}}^5 \mid \Vert u\Vert \le \delta \},\, i = 1, 2\). We use the MATLAB solver SeDuMi [21] to solve (5.6). Figure 2 shows the result of this experiment. Once again, we obtained part of the Pareto frontier for the problems with and without uncertainties.

Fig. 3: Result for Experiment 3

6.3 Experiment 3

Now, in the last experiment, we solve the problem of Sect. 5.2.3. We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } (u + x)^\top B B^\top (u + x)\), where \(\mathcal {U}_i = \{u \in {\mathbf {R}}^5 \mid \Vert u\Vert \le \delta \},\, i = 1, 2\). Here, once again, every component of \(B \in {\mathbf {R}}^{5 \times 5}\) is chosen randomly from the standard normal distribution and \(\delta \ge 0\). We use the MATLAB solver fmincon to solve (5.7) and SeDuMi to solve (5.10). As can be seen in Fig. 3, we also obtained the Pareto frontier in this case.

7 Conclusion

We proposed proximal gradient methods for unconstrained multiobjective optimization problems, with and without line searches. Under reasonable assumptions, we proved that every accumulation point of the sequences generated by the proposed algorithms is Pareto stationary. Moreover, we presented some applications to constrained optimization and robust multiobjective optimization. In constrained optimization, the proposed search direction is equivalent to the projected gradient direction, and in some robust optimization problems we can convert the subproblems into well-known optimization problems. Finally, we carried out some numerical experiments on robust multiobjective optimization problems and observed that the Pareto frontier changes when the uncertainty set is modified.

We have not analyzed the convergence rate of the proposed methods. It is known that the scalar-valued proximal gradient method converges sublinearly under reasonable assumptions [3]. In recent years, faster methods such as Newton's method [12] for differentiable multiobjective optimization problems have also been proposed. Therefore, an interesting topic for future research is to investigate the convergence rate of the proposed methods and to propose a proximal Newton-type algorithm for multiobjective optimization.