Abstract
We propose new descent methods for unconstrained multiobjective optimization problems, where each objective function can be written as the sum of a continuously differentiable function and a proper convex but not necessarily differentiable one. The methods extend the well-known proximal gradient algorithms for scalar-valued nonlinear optimization, which are known to be efficient for particular classes of problems. Here, we consider two types of algorithms: with and without line searches. Under mild assumptions, we prove that each accumulation point of the sequences generated by these algorithms, if one exists, is Pareto stationary. Moreover, we present applications in constrained multiobjective optimization and in robust multiobjective optimization, which explicitly accounts for uncertainties. In particular, for the robust case, we show that the subproblems of the proximal gradient algorithms can be written as quadratic programming, second-order cone programming, or semidefinite programming problems. For these cases, we also carry out numerical experiments that demonstrate the validity of the proposed methods.
1 Introduction
Multiobjective optimization consists in minimizing (or maximizing) several objective functions at the same time, possibly under constraints. In most cases, no single point minimizes all objective functions at once, so the concept of Pareto optimality becomes essential. A point is called Pareto optimal, or efficient, if there exists no other point with the same or smaller objective function values and with at least one strictly smaller objective function value. Multiobjective optimization problems have applications in many fields. We refer to [15] for a list of applications.
One of the most well-known methods for multiobjective optimization problems is the scalarization approach [16, 20, 23]. It consists in solving one or several parametrized single-objective optimization problems to find a solution of the original multiobjective problem. However, in the non-convex case, we may not necessarily get all Pareto optimal solutions using this approach.
In recent years, descent methods for multiobjective optimization problems have attracted a lot of attention in the optimization community. For example, a steepest descent method for differentiable unconstrained multiobjective optimization problems was proposed in [13]. Afterwards, a proximal point method [8], that can be applied to nondifferentiable problems, was considered. However, this method is just a conceptual scheme and does not necessarily generate subproblems that are easy to solve. For nondifferentiable problems, a subgradient method was also developed in [10]. However, for some particular problems, these methods may not be efficient.
In this paper, we consider the following unconstrained multiobjective optimization problem:
$$\begin{aligned} \min _{x \in {\mathbf {R}}^n} \quad F(x), \end{aligned}$$(1.1)
where \(F:{\mathbf {R}}^n \rightarrow {\mathbf {R}}^m\) is a vector-valued function with \(F :=(F_1,\ldots ,F_m)^\top \) and \(\top \) denotes transpose. We assume that each component \(F_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is defined by
$$\begin{aligned} F_i(x) :=f_i(x) + g_i(x), \end{aligned}$$(1.2)
where \(f_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is continuously differentiable and \(g_i :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{\infty \}\) is proper convex and lower semicontinuous but not necessarily differentiable.
In order to solve (1.1), we propose a proximal gradient method, which combines the proximal point and gradient methods [8, 13]. The proximal point method solves subproblems iteratively, whose objective functions are defined as the sum of the original function F and a regularization term. However, these subproblems are nonlinear in general, so the computational cost of solving them is possibly high. On the other hand, a proximal gradient method applies a gradient step to the differentiable part \(f_i\) and a proximal point step to the convex part \(g_i\). As can be seen in the single-objective case [3, 22], this method is efficient when \(g_i\) has some special structure.
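For intuition in the scalar case (\(m = 1\)), consider \(f(x) = \frac{1}{2}\Vert Ax - b\Vert ^2\) and \(g(x) = \lambda \Vert x\Vert _1\): the proximal step then reduces to soft-thresholding, which is the special structure alluded to above. The following is a minimal illustrative sketch in Python (our own example; the paper itself contains no code):

```python
import numpy as np

def soft_threshold(v, tau):
    # Closed-form proximal operator of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def scalar_proximal_gradient(A, b, lam, step, iters=500):
    # Minimize f(x) + g(x), with f(x) = 0.5*||Ax - b||^2 (smooth part)
    # and g(x) = lam*||x||_1 (convex, nondifferentiable part), via
    #   x <- prox_{step*g}(x - step * grad f(x)).
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

With `step = 1 / ||A||_2^2`, each iteration decreases the objective; the cheap closed-form prox evaluation is precisely what makes exploiting the structure of \(g_i\) worthwhile.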
We also observe that the problem and the proposed methods have many applications. For example, when \(g_i\) is the indicator function of a convex set S, (1.1) is equivalent to the optimization problem with the constraint \(x \in S\). Also, as shown in Sect. 5.2, we can deal with robust optimization problems. Such problems include uncertain parameters and basically consist in optimizing against the worst scenario. Although the literature on robust optimization is vast, studies on robust multiobjective optimization are relatively recent [11, 14, 19].
The outline of this paper is as follows. We present some notations and notions of Pareto optimality and Pareto stationarity in Sect. 2. In Sect. 3, we propose the proximal gradient methods for unconstrained multiobjective optimization. We show the global convergence of the proposed algorithms in Sect. 4. In Sect. 5, we apply the proposed method to constrained problems and to robust optimization. Finally, we report some numerical experiments by solving robust multiobjective optimization problems in Sect. 6.
2 Preliminaries
Let us first present some notations that will be used in this paper. Let \({\mathbf {R}}\) denote the set of real numbers and let \({\mathbf {N}}\) be the set of positive integers. The symbol \(\Vert \cdot \Vert \) stands for the Euclidean norm in \({\mathbf {R}}^n\). We also define the relation \(\le \) (<) in \({\mathbf {R}}^m\) as \(u \le v\) (\(u < v\)) if and only if \(u_i \le v_i\) (\(u_i < v_i\)) for all \(i = 1, \ldots , m\). Moreover, we call
$$\begin{aligned} h^\prime (x; d) :=\lim _{\alpha \searrow 0} \frac{h(x + \alpha d) - h(x)}{\alpha } \end{aligned}$$
the directional derivative of \(h :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) at x in the direction d. It is easy to see that \(h^\prime (x; d) = \nabla h(x)^\top d\) when h is differentiable at x, where \(\nabla h(x)\) denotes the gradient of h at x. The following well-known result shows a non-decreasing property when h is convex.
Lemma 2.1
[7, Section 4.3] Let \(h :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) be a convex function.
Then, the function \(\tilde{h} :(0, +\infty ) \rightarrow {\mathbf {R}}\) defined by
$$\begin{aligned} \tilde{h}(\alpha ) :=\frac{h(x + \alpha d) - h(x)}{\alpha } \end{aligned}$$
is non-decreasing. In particular, it follows that
$$\begin{aligned} h^\prime (x; d) \le h(x + d) - h(x). \end{aligned}$$
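As a concrete instance of Lemma 2.1 (our own illustrative example, not from the paper), take \(h(x) = |x|\) on \({\mathbf {R}}\), \(x = 0\), and \(d = 1\):

```latex
% h(x) = |x| is convex; at x = 0 in direction d = 1:
\tilde{h}(\alpha)
  = \frac{h(0 + \alpha \cdot 1) - h(0)}{\alpha}
  = \frac{|\alpha|}{\alpha} = 1
  \quad \text{for all } \alpha > 0,
% which is (weakly) non-decreasing on (0, +\infty), and
h'(0; 1) = \lim_{\alpha \searrow 0} \tilde{h}(\alpha) = 1
         \le h(0 + 1) - h(0) = 1.
```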
Now, we introduce the concepts of optimality for the multiobjective optimization problem (1.1). Recall that \(x^*\in {\mathbf {R}}^n\) is a Pareto optimal point for F if there is no \(x \in {\mathbf {R}}^n\) such that \(F(x) \le F(x^*)\) and \(F(x) \ne F(x^*)\). The set of all Pareto optimal values is called the Pareto frontier. Likewise, \(x^*\in {\mathbf {R}}^n\) is a weakly Pareto optimal point for F if there is no \(x \in {\mathbf {R}}^n\) such that \(F(x) < F(x^*)\). It is known that Pareto optimal points are always weakly Pareto optimal, while the converse is not always true. We also say that \(\bar{x} \in {\mathbf {R}}^n\) is Pareto stationary (or critical) if and only if
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (\bar{x}; d) \ge 0 \quad \text {for all } d \in {\mathbf {R}}^n. \end{aligned}$$
Observe that this definition generalizes the one given in [13], because here we have to deal with possibly nondifferentiable \(F_i\) (or, in particular, \(g_i\)). Moreover, instead of considering subdifferentials as in [10], we use the directional derivative notion. Still, with this definition, we can show in the next lemma that weakly Pareto optimal points are always Pareto stationary, but the converse is not always true. However, if every component \(F_i\) is convex, then Pareto stationarity implies weak Pareto optimality. Furthermore, if every component \(F_i\) is strictly convex, then Pareto stationary points are also Pareto optimal.
Lemma 2.2
-
1.
If \(x \in {\mathbf {R}}^n\) is a weakly Pareto optimal point of F, then x is Pareto stationary.
-
2.
Let every component \(F_i\) of F be convex. If \(x \in {\mathbf {R}}^n\) is a Pareto stationary point of F, then x is weakly Pareto optimal.
-
3.
Let every component \(F_i\) of F be strictly convex. If \(x \in {\mathbf {R}}^n\) is a Pareto stationary point of F, then x is Pareto optimal.
Proof
We show the contrapositions of statements 1, 2 and 3.
-
1.
Suppose that x is not Pareto stationary. Then, there exists \(d \in {\mathbf {R}}^n\) such that
$$\begin{aligned} F_i^\prime (x;d) < 0,\quad i = 1, \ldots , m. \end{aligned}$$By the definition of directional derivative, for a sufficiently small scalar \(\alpha > 0\) we obtain
$$\begin{aligned} F_i(x + \alpha d) - F_i(x) < 0,\quad i = 1, \ldots , m, \end{aligned}$$which means that x is not weakly Pareto optimal.
-
2.
Suppose that x is not weakly Pareto optimal. Then, there exists a point \(y \in {\mathbf {R}}^n\) such that \(y \ne x\) and
$$\begin{aligned} F_i(y) < F_i(x),\quad i = 1, \ldots , m. \end{aligned}$$(2.1)Since \(F_i\) is convex, it follows from Lemma 2.1 that
$$\begin{aligned} \frac{F_i(x + \alpha (y - x)) - F_i(x)}{\alpha } \le F_i(y) - F_i(x) \end{aligned}$$for an arbitrary i and \(\alpha \in (0, 1)\). From the convexity of \(F_i\) and Lemma 2.1, the left-hand side is non-decreasing with respect to \(\alpha \). Therefore, we have
$$\begin{aligned} F_i^\prime (x; y - x) \le F_i(y) - F_i(x) < 0, \end{aligned}$$where the second inequality follows from (2.1). Since this inequality holds for all \(i = 1, \ldots , m\), we have
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (x; d) < 0 \quad \text {for } d = y - x. \end{aligned}$$Consequently, we conclude that x is not Pareto stationary.
-
3.
Suppose that x is not Pareto optimal. Then, there exists a point \(y \in {\mathbf {R}}^n\) such that \(y \ne x\) and \(F_i(y) \le F_i(x)\) for all \(i = 1, \ldots , m\). Since \(F_i\) is strictly convex, for each i we have
$$\begin{aligned} F_i(x + \alpha (y - x)) < F_i(x) + \alpha (F_i(y) - F_i(x)) \quad \text {for all } \alpha \in (0,1). \end{aligned}$$Therefore, it follows that
$$\begin{aligned} \frac{F_i(x + \alpha (y - x)) - F_i(x)}{\alpha } < F_i(y) - F_i(x) \quad \text {for all } \alpha \in (0,1). \end{aligned}$$In the same way as in statement 2, we obtain \(F_i^\prime (x; y - x) < 0\) for all \(i = 1, \ldots , m\), which concludes the proof. \(\square \)
3 Proximal gradient methods for multiobjective optimization
In this section, we propose two types of proximal gradient methods for unconstrained multiobjective optimization. Both generate a sequence \(\{x^k\}\) iteratively by the following procedure:
$$\begin{aligned} x^{k + 1} :=x^k + t_k d^k, \end{aligned}$$
where \(d^k\) is a search direction and \(t_k\) is a step size. For the method without line searches, we set \(t_k = 1\) in each iteration. Now, define the function \(\psi _x :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) by
$$\begin{aligned} \psi _x(d) :=\max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + g_i(x + d) - g_i(x) \right\} , \end{aligned}$$(3.1)
where \(\nabla f_i(x)\) denotes the gradient of \(f_i\) at x. The term inside the maximum approximates \(F_i(x + d) - F_i(x)\). It is clear that \(\psi _x\) is convex and \(\psi _x(0)=0\). The following lemma shows an important property of \(\psi _x\).
Lemma 3.1
For all \(d \in {\mathbf {R}}^n\), the following equality holds:
$$\begin{aligned} \psi _x^\prime (0; d) = \max _{i = 1,\ldots ,m} F_i^\prime (x; d). \end{aligned}$$
Proof
Since \(\psi _x(0) = 0\), by the definition of the directional derivative, we get
$$\begin{aligned} \psi _x^\prime (0; d) = \lim _{\alpha \searrow 0} \frac{\psi _x(\alpha d)}{\alpha }. \end{aligned}$$
Moreover, the definition of \(\psi _x\) in (3.1) shows that
$$\begin{aligned} \lim _{\alpha \searrow 0} \frac{\psi _x(\alpha d)}{\alpha }&= \lim _{\alpha \searrow 0} \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + \frac{g_i(x + \alpha d) - g_i(x)}{\alpha } \right\} \\&= \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + \lim _{\alpha \searrow 0} \frac{g_i(x + \alpha d) - g_i(x)}{\alpha } \right\} \\&= \max _{i = 1,\ldots ,m} F_i^\prime (x; d), \end{aligned}$$
where the second equality follows from the continuity of the max function and the third one comes from the definition of the directional derivative. Thus, the claim holds. \(\square \)
Now, let \(\ell \) be a positive constant. Here, we define \(\phi _{\ell , x} :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\) as
$$\begin{aligned} \phi _{\ell , x}(d) :=\psi _x(d) + \frac{\ell }{2} \Vert d \Vert ^2, \end{aligned}$$
where the function \(\psi _x\) is defined in (3.1). Clearly, \(\phi _{\ell , x}\) is strongly convex and \(\phi _{\ell , x}(0) = 0\). Using this function, we define the search direction at an iteration k, which we call the proximal gradient direction, as \(d^k = d_\ell (x^k)\), where
$$\begin{aligned} d_\ell (x) :=\mathop {\mathrm{arg\,min}}_{d \in {\mathbf {R}}^n} \phi _{\ell , x}(d). \end{aligned}$$(3.2)
Remark 3.1
-
1.
When \(g_i = 0\), \(d_\ell (x)\) given in (3.2) corresponds to the search direction of the multiobjective steepest descent method [13]. On the other hand, when \(f_i = 0\), \(d_\ell (x)\) given in (3.2) corresponds to the search direction of the multiobjective proximal point method [8].
-
2.
Since \(\phi _{\ell , x}\) is strongly convex, (3.2) has a unique solution, and so \(d_\ell (x)\) is well-defined.
-
3.
Since \(\phi _{\ell , x}(0) = 0\), we have \(\phi _{\ell , x}(d_\ell (x)) \le 0\).
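Subproblem (3.2) can be checked numerically. In the smooth case \(g_i = 0\) it is equivalent to the epigraph form \(\min _{(d, \gamma )} \gamma + \frac{\ell }{2} \Vert d \Vert ^2\) subject to \(\nabla f_i(x)^\top d \le \gamma \) for all i. Below is a minimal Python sketch (our own code; SciPy's SLSQP is an arbitrary solver choice, not part of the paper):

```python
import numpy as np
from scipy.optimize import minimize

def proximal_gradient_direction(grads, ell):
    # Solve subproblem (3.2) with g_i = 0 in epigraph form:
    #   min_{d, gamma}  gamma + (ell/2)*||d||^2
    #   s.t.            grads[i] @ d <= gamma  for every objective i.
    grads = np.asarray(grads, dtype=float)
    m, n = grads.shape
    z0 = np.zeros(n + 1)  # z = (d, gamma)
    obj = lambda z: z[-1] + 0.5 * ell * z[:-1] @ z[:-1]
    cons = [{"type": "ineq", "fun": (lambda z, g=g: z[-1] - g @ z[:-1])}
            for g in grads]
    res = minimize(obj, z0, constraints=cons, method="SLSQP")
    return res.x[:-1], res.fun  # d_ell(x), beta_ell(x)
```

For m = 1 this returns \(d = -\nabla f(x) / \ell \), the scaled steepest descent direction of Remark 3.1; at a Pareto stationary point it returns \(d \approx 0\) and \(\beta \approx 0\), matching Lemma 3.2 below.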
Let \(\beta _\ell (x)\) be the optimal value of (3.2), i.e.,
$$\begin{aligned} \beta _\ell (x) :=\min _{d \in {\mathbf {R}}^n} \phi _{\ell , x}(d) = \phi _{\ell , x}\bigl (d_\ell (x)\bigr ). \end{aligned}$$(3.3)
The following lemma characterizes the stationarity in terms of \(d_\ell (\cdot )\) and \(\beta _\ell (\cdot )\).
Lemma 3.2
Let \(d_\ell (x)\) and \(\beta _\ell (x)\) be defined in (3.2) and (3.3), respectively. Then, the following statements hold.
-
1.
If x is Pareto stationary, then \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\). Conversely, if \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\), then x is Pareto stationary.
-
2.
If x is not Pareto stationary, then \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\). Conversely, if \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\), then x is not Pareto stationary.
-
3.
The mappings \(d_\ell (\cdot )\) and \(\beta _\ell (\cdot )\) are continuous.
Proof
-
1.
Let x be Pareto stationary. Suppose, for the purpose of contradiction, that \(d_\ell (x) \ne 0\) or \(\beta _\ell (x) < 0\). From statements 2 and 3 in Remark 3.1 it follows that \(d_\ell (x) \ne 0\) if and only if \(\beta _\ell (x) < 0\). This means that \(d_\ell (x) \ne 0\) and \(\beta _\ell (x) < 0\). Therefore, we see that
$$\begin{aligned} \beta _\ell (x) = \psi _x(d_\ell (x)) + \frac{\ell }{2} \Vert d_\ell (x)\Vert ^2 < 0. \end{aligned}$$(3.4)
Since \(\psi _x\) is convex and \(\psi _x(0) = 0\), we get
$$\begin{aligned} \psi _x\bigl (\alpha d_\ell (x)\bigr ) \le \alpha \psi _x\bigl (d_\ell (x)\bigr ) < -\alpha \frac{\ell }{2} \Vert d_\ell (x)\Vert ^2, \end{aligned}$$
where the last inequality follows from (3.4). Thus, for all \(\alpha \in (0, 1)\) we have
$$\begin{aligned} \frac{\psi _x\bigl (\alpha d_\ell (x)\bigr )}{\alpha } < -\frac{\ell }{2} \Vert d_\ell (x)\Vert ^2. \end{aligned}$$
Since \(d_\ell (x) \ne 0\) and \(\ell > 0\), letting \(\alpha \searrow 0\) we obtain
$$\begin{aligned} \psi _x^\prime \bigl (0; d_\ell (x)\bigr ) \le -\frac{\ell }{2} \Vert d_\ell (x)\Vert ^2 < 0. \end{aligned}$$
It then follows from Lemma 3.1 that
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime \bigl (x; d_\ell (x)\bigr ) < 0, \end{aligned}$$
which contradicts the Pareto stationarity of x.
Let us now prove the converse and suppose that \(d_\ell (x)=0\) and \(\beta _\ell (x)=0\). From the definition of \(\beta _\ell (x)\) given in (3.3), we have
$$\begin{aligned} 0 = \beta _\ell (x) \le \psi _x(d) + \frac{\ell }{2} \Vert d \Vert ^2 \quad \text {for all } d \in {\mathbf {R}}^n. \end{aligned}$$
Let \(\alpha \in (0,1)\). Replacing d with \(\alpha d\) in the above inequality, we get
$$\begin{aligned} \frac{\psi _x(\alpha d)}{\alpha } \ge -\frac{\ell }{2} \alpha \Vert d \Vert ^2. \end{aligned}$$
Letting \(\alpha \searrow 0\) and using Lemma 3.1, we obtain
$$\begin{aligned} \max _{i = 1,\ldots ,m} F_i^\prime (x; d) = \psi _x^\prime (0; d) \ge 0 \quad \text {for all } d \in {\mathbf {R}}^n, \end{aligned}$$
which is our claim.
-
2.
This statement is equivalent to statement 1.
-
3.
It is easy to see that the function
$$\begin{aligned} \max _{i = 1,\ldots ,m} \left\{ \nabla f_i(x)^\top d + g_i(x+d) - g_i(x) \right\} + \frac{\ell }{2} \Vert d\Vert ^2 \end{aligned}$$(3.5)
is continuous with respect to x and d. Therefore, the optimal value function \(\beta _\ell (\cdot )\) is also continuous by [5, Maximum Theorem]. Moreover, since the solution of (3.2) is unique (Remark 3.1), the solution mapping \(d_\ell (\cdot )\) is single-valued, and hence continuous from [18, Corollary 8.1]. \(\square \)
3.1 A proximal gradient method with line searches
In this section, we present the proposed method with line searches. To compute the step length \(t_k > 0\), we use an Armijo rule. Let \(\rho \in (0,1)\) be a prespecified constant. The condition to accept \(t_k\) is given by
$$\begin{aligned} F_i(x^k + t_k d^k) \le F_i(x^k) + \rho t_k \psi _{x^k}(d^k),\quad i = 1, \ldots , m. \end{aligned}$$(3.6)
We begin with \(t_k = 1\) and, while (3.6) is not satisfied, we update
$$\begin{aligned} t_k \leftarrow \xi t_k, \end{aligned}$$
where \(\xi \in (0,1)\). The following lemma demonstrates the finiteness of this procedure.
where \(\xi \in (0,1)\). The following lemma demonstrates the finiteness of this procedure.
Lemma 3.3
Let \(d^k\) be defined in (3.2) with \(x = x^k\) and assume that \(x^k\) is not Pareto stationary. If \(\rho \in (0, 1)\), then there exists some \(\bar{t}_k > 0\) such that
$$\begin{aligned} F_i(x^k + t d^k) \le F_i(x^k) + \rho t \psi _{x^k}(d^k),\quad i = 1, \ldots , m, \end{aligned}$$
for any \(t \in (0, \bar{t}_k]\).
Proof
Let \(t \in (0, 1]\). Since \(g_i\) is convex for all \(i = 1, \ldots , m\), we have
$$\begin{aligned} g_i(x^k + t d^k) - g_i(x^k) = g_i\bigl ((1 - t) x^k + t (x^k + d^k)\bigr ) - g_i(x^k) \le t \bigl ( g_i(x^k + d^k) - g_i(x^k) \bigr ). \end{aligned}$$
Therefore, from the differentiability of \(f_i\) we obtain
$$\begin{aligned} F_i(x^k + t d^k) - F_i(x^k)&= f_i(x^k + t d^k) - f_i(x^k) + g_i(x^k + t d^k) - g_i(x^k) \\&\le t \nabla f_i(x^k)^\top d^k + o(t) + t \bigl ( g_i(x^k + d^k) - g_i(x^k) \bigr ) \\&\le t \psi _{x^k}(d^k) + o(t), \end{aligned}$$
where the last inequality comes from the definition (3.1) of \(\psi _{x^k}\). Since \(x^k\) is not Pareto stationary, we have \(\psi _{x^k} (d^k) < 0\) from Lemma 3.2. Thus, if \(\rho \in (0,1)\), then there exists some \(\bar{t}_k > 0\) such that
$$\begin{aligned} F_i(x^k + t d^k) \le F_i(x^k) + \rho t \psi _{x^k}(d^k),\quad i = 1, \ldots , m, \end{aligned}$$
for any \(t \in (0, \bar{t}_k]\). \(\square \)
Finally, based on the previous discussions, we state below the proposed method, considering line searches.
Algorithm 3.1
- Step 1 :
-
Choose \(\ell > 0,\ \rho \in (0,1),\ \xi \in (0,1),\ x^0 \in {\mathbf {R}}^n\) and set \(k :=0\).
- Step 2 :
-
Compute \(d^k\) by solving subproblem (3.2) with \(x = x^k\).
- Step 3 :
-
If \(d^k = 0\), then stop.
- Step 4 :
-
Compute the step length \(t_k \in (0, 1]\) as the maximum of
$$\begin{aligned} T_k :=\{ t = \xi ^j \mid j \in {\mathbf {N}}\cup \{0\},\, F_i(x^k + t d^k) \le F_i(x^k) + t \rho \psi _{x^k}(d^k),\, i = 1,\ldots ,m \} \end{aligned}$$ - Step 5 :
-
Set \(x^{k+1} :=x^k + t_k d^k\), \(k :=k + 1\), and go to Step 2.
Observe that from Lemma 3.2, the algorithm either stops at Step 3 with a Pareto stationary point or produces an infinite sequence of nonstationary points \(\{x^k\}\). If Step 4 is reached in some iteration k, then \(d^k \ne 0\) in Step 3, or equivalently, \(\beta _\ell (x^k) < 0\). Thus, we have \(\psi _{x^k}(d^k) < 0\). From the Armijo condition, we conclude that all objective functions decrease, i.e.,
$$\begin{aligned} F_i(x^{k + 1}) < F_i(x^k),\quad i = 1, \ldots , m. \end{aligned}$$
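To make the scheme concrete, here is a hedged Python sketch of Algorithm 3.1 for the smooth case \(g_i = 0\) with m = 2 (the toy objectives and the closed-form subproblem solution via the minimum-norm convex combination of the two gradients are our own illustrative choices, not the paper's code):

```python
import numpy as np

def direction(g1, g2, ell):
    # Closed-form solution of subproblem (3.2) for m = 2 and g_i = 0:
    # d = -(1/ell) * (minimum-norm point of the segment [g1, g2]).
    diff = g1 - g2
    denom = diff @ diff
    lam = 0.5 if denom == 0.0 else float(np.clip(g2 @ (g2 - g1) / denom, 0.0, 1.0))
    d = -(lam * g1 + (1.0 - lam) * g2) / ell
    psi = max(g1 @ d, g2 @ d)  # psi_x(d) when g_i = 0
    return d, psi

def algorithm31(f_fns, grad_fns, x0, ell=1.0, rho=0.1, xi=0.5,
                tol=1e-6, max_iter=1000):
    # Algorithm 3.1: proximal gradient with Armijo backtracking (3.6).
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d, psi = direction(grad_fns[0](x), grad_fns[1](x), ell)
        if np.linalg.norm(d) < tol:  # Step 3: (approximately) Pareto stationary
            break
        t = 1.0
        while any(f(x + t * d) > f(x) + rho * t * psi for f in f_fns):
            t *= xi                  # shrink until (3.6) holds for all i
        x = x + t * d
    return x
```

On the biobjective instance \(f_1(x) = \frac{1}{2}\Vert x - a\Vert ^2\), \(f_2(x) = \frac{1}{2}\Vert x - b\Vert ^2\), the iterates stop on the Pareto set, which is the segment between a and b.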
3.2 A proximal gradient method without line searches
In this section, we assume that \(\nabla f_i\) is Lipschitz continuous with constant L. When we set \(\ell > L / 2\) in (3.2), we can fix the step length \(t_k = 1\) for every iteration. We then state below the proposed method.
Algorithm 3.2
- Step 1 :
-
Choose \(\ell > L / 2,\ x^0 \in {\mathbf {R}}^n\) and set \(k :=0\).
- Step 2 :
-
Compute \(d^k\) by solving subproblem (3.2) with \(x = x^k\).
- Step 3 :
-
If \(d^k = 0\), then stop.
- Step 4 :
-
Set \(x^{k+1} :=x^k + d^k\), \(k :=k + 1\), and go to Step 2.
Similarly to Algorithm 3.1, it stops with a Pareto stationary point or generates an infinite sequence of nonstationary points. Moreover, as we can see in Lemma 4.3, the objective function values also decrease in each iteration.
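A sketch of Algorithm 3.2 in the same smooth two-objective setting as before (again our own toy instance: for \(f_i\) with Hessians bounded by \(L = 1\), any \(\ell > 1/2\) works, and we take \(\ell = 1\)):

```python
import numpy as np

def direction_fixed(g1, g2, ell):
    # Subproblem (3.2) for m = 2, g_i = 0: scaled minimum-norm convex combination.
    diff = g1 - g2
    denom = diff @ diff
    lam = 0.5 if denom == 0.0 else float(np.clip(g2 @ (g2 - g1) / denom, 0.0, 1.0))
    return -(lam * g1 + (1.0 - lam) * g2) / ell

def algorithm32(grad_fns, x0, ell, tol=1e-6, max_iter=5000):
    # Algorithm 3.2: no line search; requires ell > L/2 for convergence.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = direction_fixed(grad_fns[0](x), grad_fns[1](x), ell)
        if np.linalg.norm(d) < tol:  # Step 3
            break
        x = x + d  # Step 4: t_k = 1 in every iteration
    return x
```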
4 Convergence analysis
In this section, we prove that every accumulation point of the sequences generated by Algorithms 3.1 and 3.2 is Pareto stationary. From now on, we assume that an infinite sequence is generated. To begin with, we recall the so-called three points property, which is a key tool for showing the convergence of proximal point type algorithms.
Theorem 4.1
(Three points property) [9, Lemma 3.2] Let \(\theta :{\mathbf {R}}^n \rightarrow {\mathbf {R}}\cup \{ \infty \}\) be proper convex and, for \(y \in {\mathbf {R}}^n\), define
$$\begin{aligned} x^*:=\mathop {\mathrm{arg\,min}}_{x \in {\mathbf {R}}^n} \left\{ \theta (x) + \frac{1}{2} \Vert x - y \Vert ^2 \right\} . \end{aligned}$$
Then, for all \(z \in {\mathbf {R}}^n\), we have
$$\begin{aligned} \theta (z) + \frac{1}{2} \Vert z - y \Vert ^2 \ge \theta (x^*) + \frac{1}{2} \Vert x^*- y \Vert ^2 + \frac{1}{2} \Vert z - x^*\Vert ^2. \end{aligned}$$
In order to show the convergence of the search direction, we first prove the following lemma.
Lemma 4.1
Let \(\{d^k\}\) be generated by Algorithm 3.1 or 3.2 and recall the definition of \(\psi _x\) in (3.1). Then, we have
$$\begin{aligned} \psi _{x^k}(d^k) \le -\ell \Vert d^k \Vert ^2 \quad \text {for all } k. \end{aligned}$$
Proof
Defining \(\theta :=\psi _{x^k} / \ell \), we can rewrite (3.2) with \(x = x^k\) as
$$\begin{aligned} d^k = \mathop {\mathrm{arg\,min}}_{d \in {\mathbf {R}}^n} \left\{ \theta (d) + \frac{1}{2} \Vert d - 0 \Vert ^2 \right\} . \end{aligned}$$
Thus, substituting \(x^*= d^k\) and \(y = z = 0\) into Theorem 4.1, we get \(\theta (d^k) - \theta (0) \le -\Vert d^k\Vert ^2\). Therefore, recalling that \(\psi _{x^k} (0) = 0\), we have
$$\begin{aligned} \psi _{x^k}(d^k) \le -\ell \Vert d^k \Vert ^2. \end{aligned}$$
\(\square \)
4.1 Convergence of Algorithm 3.1
Let us recall the algorithm with line searches. We first show the convergence of the length of steps \(\Vert x^{k + 1} - x^k\Vert \).
Lemma 4.2
Let \(\{d^k\}\) be generated by Algorithm 3.1 and suppose that \(\{F_i (x^k)\}\) is bounded from below for all \(i = 1,\ldots ,m\). Then, it follows that
$$\begin{aligned} \sum _{k = 0}^\infty t_k \Vert d^k \Vert ^2 < \infty . \end{aligned}$$
Proof
From Lemma 4.1 and from (3.6), it follows that
$$\begin{aligned} F_i(x^{k + 1}) \le F_i(x^k) + \rho t_k \psi _{x^k}(d^k) \le F_i(x^k) - \rho \ell t_k \Vert d^k \Vert ^2,\quad i = 1, \ldots , m. \end{aligned}$$
Since \(\{F_i (x^k)\}\) is bounded from below, for each i there exists \(\tilde{F}_i\) such that \(\tilde{F}_i \le F_i(x^k)\) for all k. Adding up the above inequality from \(k = 0\) to \(k = \tilde{k}\), we obtain
$$\begin{aligned} \rho \ell \sum _{k = 0}^{\tilde{k}} t_k \Vert d^k \Vert ^2 \le F_i(x^0) - F_i(x^{\tilde{k} + 1}) \le F_i(x^0) - \tilde{F}_i. \end{aligned}$$
Thus, we have
$$\begin{aligned} \sum _{k = 0}^{\tilde{k}} t_k \Vert d^k \Vert ^2 \le \frac{F_i(x^0) - \tilde{F}_i}{\rho \ell }. \end{aligned}$$
Taking \(\tilde{k} \rightarrow \infty \), we have \(\sum _{k = 0}^\infty t_k \Vert d^k \Vert ^2 < \infty \). \(\square \)
The next theorem is our main result. Basically, if the sequence produced by Algorithm 3.1 has accumulation points, then they are all Pareto stationary.
Theorem 4.2
Every accumulation point of the sequence \(\{x^k\}\) generated by Algorithm 3.1, if it exists, is a Pareto stationary point. In particular, if the level set of each \(F_i\) is bounded, then \(\{x^k\}\) has accumulation points and they are all Pareto stationary.
Proof
The second statement follows immediately from the first. Let \(\bar{x}\) be an accumulation point of \(\{x^k\}\) and let \(\{x^{k_j}\}\) be a subsequence converging to \(\bar{x}\). From statement 3 of Lemma 3.2, we have \(d^{k_j} = d_\ell (x^{k_j}) \rightarrow d_\ell (\bar{x})\). Here, it is sufficient to show that \(d_\ell (\bar{x}) = 0\) because of statements 1 and 3 of Lemma 3.2. Suppose for contradiction that \(d_\ell (\bar{x}) \ne 0\). Then, it follows from Lemma 4.2 that \(t_{k_j} \rightarrow 0\), since the existence of an accumulation point of \(\{x^k\}\) implies boundedness of \(\{F_i(x^k)\}\) for all i. Therefore, by the definition of \(t_{k_j}\) in Step 4 of Algorithm 3.1, for sufficiently large j there exists some \(i_{k_j} \in \{1,\ldots ,m\}\) such that
Since \(i_{k_j}\) takes only a finite number of values in \(\{1,\ldots ,m\}\), we can assume that \(i_{k_j} = \bar{i}\) for all j without loss of generality. We thus obtain
Recall that \(0< \xi ^{-1} t_{k_j} < 1\). It follows from the definition (3.1) of \(\psi _{x^{k_j}}\) that
where the second inequality comes from the convexity of \(g_{\bar{i}}\) and Lemma 2.1, and the first equality follows from the differentiability of \(f_{\bar{i}}\). Therefore, we get
We thus get
On the other hand, Lemma 4.1 yields
Since \(d^{k_j} \rightarrow d_\ell (\bar{x}) \ne 0\), there exists some \(\delta > 0\) such that
where the last inequality comes from (4.2). Therefore, we obtain
From (4.3) and (4.4), it follows that
Taking \(j \rightarrow \infty \), we have \(0 \le - \delta \), which contradicts the fact that \(\delta > 0\). Therefore, we conclude that \(d_\ell (\bar{x}) = 0\). \(\square \)
4.2 Convergence of Algorithm 3.2
Let us now show the convergence of the search direction for the algorithm without line searches. Recall that we have to assume Lipschitz continuity of \(\nabla f_i\), with a constant \(L > 0\).
Lemma 4.3
Let \(\{d^k\}\) be generated by Algorithm 3.2 and suppose that \(\{F_i (x^k)\}\) is bounded from below for all \(i = 1,\ldots ,m\). Then, we have
$$\begin{aligned} \lim _{k \rightarrow \infty } \Vert d^k \Vert = 0. \end{aligned}$$
Proof
From the so-called descent lemma [6, Proposition A.24] and the Lipschitz continuity of \(\nabla f_i\), we obtain
$$\begin{aligned} f_i(x^k + d^k) \le f_i(x^k) + \nabla f_i(x^k)^\top d^k + \frac{L}{2} \Vert d^k \Vert ^2. \end{aligned}$$(4.5)
At the kth iteration, we have
$$\begin{aligned} F_i(x^{k + 1})&= f_i(x^k + d^k) + g_i(x^k + d^k) \\&\le f_i(x^k) + \nabla f_i(x^k)^\top d^k + \frac{L}{2} \Vert d^k \Vert ^2 + g_i(x^k + d^k) \\&\le F_i(x^k) + \psi _{x^k}(d^k) + \frac{L}{2} \Vert d^k \Vert ^2 \\&\le F_i(x^k) - \left( \ell - \frac{L}{2} \right) \Vert d^k \Vert ^2. \end{aligned}$$
Here, the first inequality follows from (4.5), the second one follows from the definition (3.1) of \(\psi _{x^k}\), and the third one comes from Lemma 4.1. Since \(\{F_i (x^k)\}\) is bounded from below, for each i there exists \(\tilde{F}_i\) such that \(\tilde{F}_i \le F_i(x^k) = f_i(x^k) + g_i(x^k)\) for all k. Adding up the above inequality from \(k = 0\) to \(k = \tilde{k}\), we obtain
$$\begin{aligned} \left( \ell - \frac{L}{2} \right) \sum _{k = 0}^{\tilde{k}} \Vert d^k \Vert ^2 \le F_i(x^0) - F_i(x^{\tilde{k} + 1}) \le F_i(x^0) - \tilde{F}_i. \end{aligned}$$
Since \(\ell > L / 2\), we have
$$\begin{aligned} \sum _{k = 0}^{\tilde{k}} \Vert d^k \Vert ^2 \le \frac{F_i(x^0) - \tilde{F}_i}{\ell - L / 2}. \end{aligned}$$
Taking \(\tilde{k} \rightarrow \infty \), we obtain
$$\begin{aligned} \sum _{k = 0}^{\infty } \Vert d^k \Vert ^2 < \infty , \end{aligned}$$
and hence \(\lim _{k \rightarrow \infty } \Vert d^k \Vert = 0\). \(\square \)
The next theorem directly follows from Lemmas 4.3 and 3.2.
Theorem 4.3
Every accumulation point of the sequence \(\{x^k\}\) generated by Algorithm 3.2, if it exists, is a Pareto stationary point. In particular, if the level set of each \(F_i\) is bounded, then \(\{x^k\}\) has accumulation points and they are all Pareto stationary.
5 Applications
In this section, we consider two applications of multiobjective optimization problem (1.1) with (1.2), and discuss how to solve subproblems (3.2) in these particular applications.
5.1 Application to constrained multiobjective optimization
In this section, we consider the following constrained multiobjective optimization problem:
$$\begin{aligned} \min _{x \in S} \quad f(x), \end{aligned}$$
where \(f:{\mathbf {R}}^n \rightarrow {\mathbf {R}}^m\) is a vector-valued function with \(f :=(f_1,\ldots ,f_m)^\top \) and \(S \subset {\mathbf {R}}^n\) is convex. Suppose that each component \(f_i\) of f is continuously differentiable. Let \(g:{\mathbf {R}}^n \rightarrow ({\mathbf {R}}\cup \{\infty \})^m\) be a vector-valued function with \(g :=(g_1,\ldots ,g_m)^\top \), where each \(g_i\) is the indicator function of S, i.e.,
$$\begin{aligned} g_i(x) :={\left\{ \begin{array}{ll} 0, &{} x \in S, \\ \infty , &{} x \notin S. \end{array}\right. } \end{aligned}$$
Then, for \(x^k \in S\), we can rewrite the search direction given in (3.2) with \(x = x^k\) as
$$\begin{aligned} d^k = \mathop {\mathrm{arg\,min}}_{d :x^k + d \in S} \left\{ \max _{i = 1,\ldots ,m} \nabla f_i(x^k)^\top d + \frac{\ell }{2} \Vert d \Vert ^2 \right\} , \end{aligned}$$
which coincides with the projected gradient direction for multiobjective optimization [17].
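For m = 1 this reduces to the classical projected gradient direction \(d = \mathrm{proj}_S(x - \nabla f(x) / \ell ) - x\), which can be verified numerically on a box S (our own instance; SciPy's L-BFGS-B is used only because it handles bound constraints, and is not part of the paper):

```python
import numpy as np
from scipy.optimize import minimize

lo, hi, ell = -1.0, 1.0, 2.0
x = np.array([0.9, -0.5])
grad = np.array([3.0, -0.2])  # gradient of the single smooth objective at x

# Subproblem (3.2) with g = indicator of the box S = [lo, hi]^2:
#   min_d  grad^T d + (ell/2)*||d||^2   s.t.   x + d in S
res = minimize(lambda d: grad @ d + 0.5 * ell * d @ d,
               np.zeros(2),
               bounds=[(lo - x_j, hi - x_j) for x_j in x],
               method="L-BFGS-B")
d_subproblem = res.x

# Projected gradient direction in closed form:
d_projection = np.clip(x - grad / ell, lo, hi) - x

print(d_subproblem, d_projection)  # the two directions agree
```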
5.2 Application to robust multiobjective optimization
Now, let us apply the proposed algorithms to robust multiobjective optimization. Here, we suppose that the problems include uncertain parameters and that we can estimate the set of these parameters. Then, we optimize considering the worst scenario. We observe that studies on robust multiobjective optimization are relatively recent [11, 14, 19].
Here, we consider the convex function \(g_i\) defined as follows:
$$\begin{aligned} g_i(x) :=\max _{u \in {\mathcal {U}}_i} \hat{g}_i(x, u). \end{aligned}$$
We call \({\mathcal {U}}_i \subseteq {\mathbf {R}}^n\) an uncertainty set. From now on, we assume that \(\hat{g}_i :{\mathbf {R}}^n \times {\mathbf {R}}^n \rightarrow {\mathbf {R}}\) is convex with respect to x. It is then easy to see that \(g_i\) is also convex. However, \(g_i\) is not necessarily differentiable even if \(\hat{g}_i\) is. First, let us reformulate the subproblem (3.2) by using an extra variable \(\gamma \in {\mathbf {R}}\) as
Note that \(g_i\) is not easy to calculate, and thus, the subproblem is difficult to solve. When \(\hat{g}_i\) and \(\mathcal {U}_i\) have some special structure, the constraints can be written as explicit formulae by using the duality of (5.2). Now, assume that the dual problem of the maximization problem (5.2) is written as follows:
where \(\tilde{g}_i :{\mathbf {R}}^n \times {\mathbf {R}}^m \rightarrow {\mathbf {R}}\) and \(\displaystyle \tilde{\mathcal {U}}_i :{\mathbf {R}}^n \rightarrow 2^{{\mathbf {R}}^m}\). If strong duality holds, then we see that the subproblem (3.2) is equivalent to
When \(\tilde{g}_i\) and \(\tilde{\mathcal {U}}_i\) have some explicit form, this problem is tractable. As we mention below, in this case we can convert the above subproblem into some well-known convex optimization problems. This idea can also be seen in [4]. In the following, we introduce some robust multiobjective optimization problems whose subproblems can be written as quadratic programming, second-order cone programming, or semidefinite programming problems.
5.2.1 Linearly constrained quadratic programming problem case
Suppose that \(\hat{g}_i (x,u) = u^\top x\) and \({\mathcal {U}}_i = \{ u \in {\mathbf {R}}^n \mid A_i u \le b_i \}\), where \(A_i \in {\mathbf {R}}^{d \times n}\) and \(b_i \in {\mathbf {R}}^d\), that is, \(\hat{g}_i\) is linear in x, and \(\mathcal {U}_i\) is a polyhedron. Suppose also that \(\mathcal {U}_i\) is nonempty and bounded. Then, we can rewrite (5.2) as the following linear programming problem:
Its dual problem is given by
Since strong duality holds for linear programming, we can convert the subproblem (3.2) [or, equivalently, (5.3)] into a linearly constrained quadratic programming problem:
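The LP strong duality used above is easy to verify numerically. A small Python check with `scipy.optimize.linprog` (our own instance, not from the paper: the box \(-1 \le u \le 1\), whose support function is \(\Vert x \Vert _1\)):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
x = rng.standard_normal(3)

# U = {u : A u <= b} is the box -1 <= u <= 1 (nonempty and bounded).
A = np.vstack([np.eye(3), -np.eye(3)])
b = np.ones(6)

# Primal (5.4): max u^T x  s.t.  A u <= b  (linprog minimizes, so negate x).
primal = linprog(-x, A_ub=A, b_ub=b, bounds=[(None, None)] * 3)

# Dual: min b^T y  s.t.  A^T y = x, y >= 0.
dual = linprog(b, A_eq=A.T, b_eq=x, bounds=[(0, None)] * 6)

g_value = -primal.fun  # g_i(x) = max_{u in U} u^T x
print(g_value, dual.fun)  # equal by strong duality; here both equal ||x||_1
```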
5.2.2 Second-order cone programming problem case
Suppose that \(\hat{g}_i (x,u) = u^\top x\) and \({\mathcal {U}}_i = \{ a_i + P_i v \in {\mathbf {R}}^n \mid \Vert v \Vert \le 1,\, v \in {\mathbf {R}}^n \}\), where \(a_i \in {\mathbf {R}}^n\) and \(\ P_i \in {\mathbf {R}}^{n \times n}\), that is, \(\hat{g}_i\) is once again linear in x and \(\mathcal {U}_i\) is an ellipsoid. Then, for all \(i = 1,\ldots ,m\) we have
If \(P_i^\top x = 0\), then \(\displaystyle \max _{v :\Vert v \Vert \le 1} (P_i^\top x)^\top v = 0 = \Vert P_i^\top x \Vert \). If \(P_i^\top x \ne 0\), then \(v = P_i^\top x / \Vert P_i^\top x \Vert \) attains the maximum of \(\displaystyle \max _{v :\Vert v \Vert \le 1} (P_i^\top x)^\top v\), and hence \(\displaystyle \max _{v :\Vert v \Vert \le 1} (P_i^\top x)^\top v = \Vert P_i^\top x \Vert \). Consequently, we have
Therefore, introducing slack variables \(\gamma \in {\mathbf {R}}\) and \(\tau \in {\mathbf {R}}\), the subproblem (3.2) can be written as
Note that convex quadratic constraints can be converted to second-order cone constraints. Using the expression given in [1, Section 2.1], we get the following second-order cone programming problem (SOCP):
where \(\mathcal {K}_q :=\{(y_0, \bar{y}) \in {\mathbf {R}}\times {\mathbf {R}}^{q - 1} \mid y_0 \ge \Vert \bar{y} \Vert \}\) is the second-order cone in \({\mathbf {R}}^{q}\). The above SOCP can be solved efficiently with an interior point method [1].
5.2.3 Semidefinite programming problem case
Suppose thatFootnote 1 \(\hat{g}_i (x,u) = (x + u)^\top A_i (x + u)\) and \({\mathcal {U}}_i = \{ a_i + P_i v \in {\mathbf {R}}^n \mid \Vert v \Vert \le 1 \}\), where \(A_i \in {\mathbf {R}}^{n \times n}\) with \(A_i \succeq O,\, a_i \in {\mathbf {R}}^n\) and \(P_i \in {\mathbf {R}}^{n \times n}\). Then, there exists a matrix \(M_i \in {\mathbf {R}}^{n \times n}\) such that \(A_i = M_i M_i^\top \). Note that \(\hat{g}_i\) is convex quadratic and \(\mathcal {U}_i\) is an ellipsoid. Here, without loss of generality, we can assume that \(A_i\) is symmetric, since \((x + u)^\top A_i (x + u) = (x + u)^\top \tilde{A}_i (x + u)\), where \(\tilde{A}_i :=(A_i+A_i^\top ) / 2\). Then, \(g_i(x)\) can be given as
Since problem (5.7) is a maximization problem of a convex function, it is not a convex optimization problem. Fortunately, it can be seen as a subproblem of a trust region method, so its optimal value \(g_i(x)\) can be obtained efficiently. Considering (5.7), we observe that
From [2, Section 3], the Lagrangian dual of the maximization problem (5.8) is given by
where \(I_n\) stands for the identity matrix of dimension n. Let \((\alpha ^*, w^*)\) be an optimal solution of (5.9) and assume thatFootnote 2 \(\dim (\ker (A_i + \alpha ^*I_n)) \ne 1\). Since both (5.8) and (5.9) have strictly feasible solutions and \(I_n \succ O\), strong duality holds from [2, Theorem 3.5]. Therefore, recalling (5.3), the subproblem (3.2) is equivalent to
Now, by using slack variables \(\tau \in {\mathbf {R}}\) and \(\zeta _i \in {\mathbf {R}}\) and converting the convex quadratic constraints to second-order cone ones, we get the following semidefinite programming problem:
where O stands for a zero matrix with appropriate dimension. Note that the second-order cone constraints can be converted further into semidefinite constraints.
6 Numerical experiments
In this section, we present some numerical results using Algorithm 3.2 for the problems in Sect. 5.2. The experiments are carried out on a machine with a 1.8 GHz Intel Core i5 CPU and 8 GB of memory, and all codes are implemented in MATLAB R2017a. We consider the problem (1.1), where \(n = 5,\, m = 2,\, f_i(x) = \frac{1}{2} x^\top A_i x + a_i^\top x,\, g_i(x) = \max _{u \in {\mathcal {U}}_i } \hat{g}_i (x,u),\ A_i \in {\mathbf {R}}^{n \times n},\, a_i \in {\mathbf {R}}^n\), and \(\hat{g}_i:{\mathbf {R}}^n \times {\mathbf {R}}^n \rightarrow {\mathbf {R}},\, i = 1,\ldots , m\). Here, we assume that each \(A_i\) is positive semidefinite, so it can be decomposed as \(A_i = M_i M_i^\top \), where \(M_i \in {\mathbf {R}}^{n \times n}\). We generate \(M_i\) and \(a_i\) by choosing every component randomly from the standard normal distribution. To implement Algorithm 3.2, we make the following choices.
Remark 6.1
-
Every component of \(x^0\) is chosen randomly from the standard normal distribution.
-
In Experiments 1 and 3, we set the constant \(\ell = 5\). In Experiment 2, we set the constant \(\ell = 7\).
-
The termination criterion is replaced by \(\Vert d^k\Vert < \varepsilon :=10^{-6}\).
Also, we run each of the following experiments 100 times from different initial points, with \(\delta = 0, 0.05, 0.1\). Naturally, when \(\delta = 0\), no uncertainties are considered.
6.1 Experiment 1
In the first experiment, we solve the problem of Sect. 5.2.1. We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } u^\top x,\, i = 1,2\), where \(\mathcal {U}_1 = \{u \in {\mathbf {R}}^5\mid -\delta \le u_i \le \delta ,\, i = 1,\ldots ,5 \}\) and \(\mathcal {U}_2 = \{u \in {\mathbf {R}}^5\mid -\delta \le (Bu)_i \le \delta ,\, i = 1,\ldots ,5 \}\). Here, every component of \(B \in {\mathbf {R}}^{5 \times 5}\) is chosen randomly from the standard normal distribution and \(\delta \ge 0\). We use the MATLAB solver linprog to solve (5.4) and quadprog to solve (5.5). Figure 1 shows the result of this experiment. For each \(\delta \), we obtained part of the Pareto frontier, and as \(\delta \) gets smaller, the objective values become smaller.
6.2 Experiment 2
In the second experiment, we solve the problem of Sect. 5.2.2. We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } u^\top x\), where \(\mathcal {U}_i = \{u \in {\mathbf {R}}^5 \mid \Vert u\Vert \le \delta \},\, i = 1, 2\). We use the MATLAB solver SeDuMi [21] to solve (5.6). Figure 2 shows the result of this experiment. Once again, we obtained part of the Pareto frontier for the problems with and without uncertainties.
6.3 Experiment 3
Now, in the last experiment, we solve the problem of Sect. 5.2(c). We assume that \(g_i(x) = \max _{u \in \mathcal {U}_i } (u + x)^\top B B^\top (u + x)\), where \(\mathcal {U}_i = \{u \in {\mathbf {R}}^5 \mid \Vert u\Vert \le \delta \},\, i = 1, 2\). Here, once again, every component of \(B \in {\mathbf {R}}^{5 \times 5}\) is chosen randomly from the standard normal distribution and \(\delta \ge 0\). We use the MATLAB solver fmincon to solve (5.7) and SeDuMi to solve (5.10). As can be seen in Fig. 3, we also obtain the Pareto frontier in this case.
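Evaluating \(g_i\) here amounts to maximizing a convex quadratic over a ball. A simple numerical sketch, not the paper's reformulation (5.10), is to run a local solver started on the boundary of the ball; we use SciPy's SLSQP as a stand-in, and the function names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def g_ball_quadratic(x, B, delta):
    """Approximate  max_{||u|| <= delta} (u + x)^T B B^T (u + x)
    with a local solver; assumes x != 0 for the boundary start."""
    Q = B @ B.T
    obj = lambda u: -(u + x) @ Q @ (u + x)           # negate to maximize
    cons = {"type": "ineq", "fun": lambda u: delta**2 - u @ u}
    u0 = delta * x / np.linalg.norm(x)               # start on the boundary
    res = minimize(obj, u0, constraints=cons, method="SLSQP")
    return -res.fun
```

Since a local solver applied to a maximization of a convex function only guarantees a lower bound, a sanity check is useful: for \(B = I\) the exact value is \((\Vert x\Vert + \delta )^2\), attained at \(u = \delta x/\Vert x\Vert \).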
7 Conclusion
We proposed proximal gradient methods for unconstrained multiobjective optimization problems, with and without line searches. Under reasonable assumptions, we proved that each accumulation point of the sequences generated by the proposed algorithms is Pareto stationary. Moreover, we presented some applications in constrained optimization and robust multiobjective optimization. In constrained optimization, the proposed search direction is equivalent to the projected gradient direction, and for some robust optimization problems the subproblems can be converted into well-known optimization problems. Finally, we carried out some numerical experiments for robust multiobjective optimization problems and observed that the Pareto frontier changes when the uncertainty set is modified.
We have not analyzed the convergence rate of the proposed methods. It is known that the scalar-valued proximal gradient method converges sublinearly under reasonable assumptions [3]. In recent years, faster methods, such as Newton's method [12] for differentiable multiobjective optimization problems, have also been proposed. Therefore, an interesting topic for future research is to investigate the convergence rate of the proposed methods and to propose a proximal Newton-type algorithm for multiobjective optimization.
Notes
We denote \(A \succeq (\succ ) O\) when A is positive semidefinite (positive definite). Also, \(A \succeq (\succ ) B\) if and only if \(A - B \succeq (\succ ) O\).
Here, \(\dim \) denotes the dimension of a space and \(\ker \) the kernel of a matrix.
References
Alizadeh, F., Goldfarb, D.: Second-order cone programming. Math. Program. 95(1), 3–51 (2003)
Beck, A., Eldar, Y.: Strong duality in nonconvex quadratic optimization with two quadratic constraints. SIAM J. Optim. 17(3), 844–860 (2006)
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009)
Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res. 23(4), 769–805 (1998)
Berge, C., Patterson, E.M.: Topological Spaces. Dover Publications, Edinburgh (1963)
Bertsekas, D.P.: Nonlinear Programming, 2nd edn. Athena Scientific, Belmont (1999)
Bertsekas, D.P.: Convex Analysis and Optimization. Athena Scientific, Belmont (2003)
Bonnel, H., Iusem, A.N., Svaiter, B.F.: Proximal methods in vector optimization. SIAM J. Optim. 15(4), 953–970 (2005)
Chen, G., Teboulle, M.: Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim. 3(3), 538–543 (1993)
Cruz Neto, J.X., Silva, G.J.P., Ferreira, O.P., Lopes, J.O.: A subgradient method for multiobjective optimization. Comput. Optim. Appl. 54(3), 461–472 (2013)
Ehrgott, M., Ide, J., Schöbel, A.: Minmax robustness for multi-objective optimization problems. Eur. J. Oper. Res. 239(1), 17–31 (2014)
Fliege, J., Graña Drummond, L.M., Svaiter, B.F.: Newton’s method for multiobjective optimization. SIAM J. Optim. 20(2), 602–626 (2009)
Fliege, J., Svaiter, B.F.: Steepest descent methods for multicriteria optimization. Math. Methods Oper. Res. 51(3), 479–494 (2000)
Fliege, J., Werner, R.: Robust multiobjective optimization and applications in portfolio optimization. Eur. J. Oper. Res. 234(2), 422–433 (2014)
Fukuda, E.H., Graña Drummond, L.M.: A survey on multiobjective descent methods. Pesquisa Operacional 34(3), 585–620 (2014)
Geoffrion, A.M.: Proper efficiency and the theory of vector maximization. J. Math. Anal. Appl. 22(3), 618–630 (1968)
Graña Drummond, L.M., Iusem, A.N.: A projected gradient method for vector optimization problems. Comput. Optim. Appl. 28(1), 5–29 (2004)
Hogan, W.: Point-to-set maps in mathematical programming. SIAM Rev. 15(3), 591–603 (1973)
Morishita, M.: A descent method for robust multiobjective optimization in the presence of implementation errors. Master’s thesis, Kyoto University (2016)
Gass, S., Saaty, T.: The computational algorithm for the parametric objective function. Naval Res. Logist. Q. 2(1–2), 39–45 (1955)
Sturm, J.F.: Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optim. Methods Softw. 11(1–4), 625–653 (1999)
Tseng, P.: Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Program. 125(2), 263–295 (2010)
Zadeh, L.: Optimality and non-scalar-valued performance criteria. IEEE Trans. Autom. Control 8(1), 59–60 (1963)
Acknowledgements
This work was supported by the Kyoto University Foundation, and the Grant-in-Aid for Scientific Research (C) (17K00032) from Japan Society for the Promotion of Science. We are also grateful to the anonymous referees for their useful comments.
Tanabe, H., Fukuda, E.H. & Yamashita, N. Proximal gradient methods for multiobjective optimization and their applications. Comput Optim Appl 72, 339–361 (2019). https://doi.org/10.1007/s10589-018-0043-x