1 Introduction

Constrained multicriteria minimization problems appear frequently in many different areas, such as statistics (Carrizosa and Frenk 1998), management science (Gravel et al. 1992; White 1998), environmental analysis (Leschine et al. 1992), space exploration (Tavana 2004), design (Fu and Diwekar 2004) and engineering (Eschenauer et al. 1990). There are many strategies for solving such problems; one of the most popular techniques is the weighting method, where one minimizes a linear combination of the objectives. Its main drawback is the fact that we do not know a priori which weights of this combination are suitable, i.e., which ones do not lead to unbounded problems. Some extensions of classical real-valued methods to the vector-valued setting have recently been proposed to overcome that disadvantage. In this work, we propose an extension of Zangwill’s external penalty method (Zangwill 1967), a well-known technique for constrained scalar optimization problems. It is an iterative real-valued method that consists of adding a certain term to the objective function, in such a way that its value increases with the violation of the original constraints, and then minimizing this penalized function over the whole space. In general, solutions of the penalized problems are infeasible, but, for large values of the so-called penalty parameters, the iterates are close to the constraint set. In other words, one replaces a constrained problem with a sequence of unconstrained ones, for which we do have efficient solving techniques. Under reasonable assumptions, the sequence of those unconstrained minimizers converges to an optimal point of the original problem.

As far as we know, there was one attempt to generalize this strategy to multiobjective optimization: In 1984, White proposed a method (White 1984) that, at each iteration, requires the Pareto unconstrained minimization of the penalized objective. In order to obtain Pareto optimality of the accumulation points of the generated sequences, an extra condition (not necessary in the scalar case) is required.

Here, we present a vector-valued version of Zangwill’s procedure, which, at each iteration, requires solving an unconstrained scalar problem. This new extension shares some features with its real-valued counterpart. For instance, in general, the generated sequence is also infeasible and, along it, the penalized objective values are nondecreasing. Moreover, as in the real-valued case, all accumulation points, if any, are optima of the original problem. Finally, the conditions under which the sequence fully converges to an optimal point are generalizations of the hypotheses required in the scalar case, and no additional assumptions are needed. Besides the (diverging) parameter sequence of positive real numbers and the penalty function, the proposed method uses an auxiliary function (whose presence in the scalar case would be irrelevant) and, depending on which one is chosen, the convergence will be to Pareto or weak Pareto optimal points.

The strategy proposed here has properties similar to those of other extensions of classical scalar-valued methods, such as the steepest descent (Fliege and Svaiter 2000; Drummond 2005), the projected gradient (Fukuda and Graña Drummond 2011, 2013; Graña Drummond and Iusem 2004), Newton (Fliege et al. 2009; Graña Drummond et al. 2014) and the proximal point (Bonnel et al. 2005) vector-valued methods. All these extensions have in common an important feature: The iterates are computed by solving scalar optimization subproblems; moreover, each iterate could be implicitly obtained by applying the corresponding real-valued method to a certain (a priori unknown) linear combination of the objectives. All of these procedures seek just a single Pareto (or weak Pareto) point, and the convergence results are natural extensions of their scalar counterparts. Nevertheless, as shown through numerical experiments in Fliege et al. (2009), by initializing them with randomly chosen points, in some cases we can expect to obtain good approximations of the optimal set.

The outline of this paper is as follows. In Sect. 2, we introduce the problem, as well as the notion of vector external penalty function with some examples. In Sect. 3, we define a couple of auxiliary function classes, state their properties and show examples. In Sect. 4, we present the external penalty method for multiobjective optimization; we make some comments, study its behavior and exhibit a very simple example in which it works far better than the weighting method. The convergence analysis is in Sect. 5; basically, it consists of the extension of the classical results for the single objective case. We also propose an implementable local version of the method, analyzing its convergence. Additionally, a brief comparison with White’s method is presented. In Sect. 6, we present another family of auxiliary functions and show an example in which, by simply varying a single parameter of the auxiliary function, we can retrieve the whole optimal set. Finally, in Sect. 7, we make some final remarks, briefly commenting on when the whole (weak) Pareto frontier can be obtained by using the method.

2 The problem and the penalty-type functions

Let \(\mathbb {R}^m\) be endowed with the partial order induced by \(\mathbb {R}^m_+\), the nonnegative orthant of \(\mathbb {R}^m\), given by

$$\begin{aligned} u \le v \quad \text{ if } \quad u_j \le v_j \quad \text{ for } \text{ all } \quad j = 1,\dots ,m. \end{aligned}$$

We also consider the following stronger relation defined in \(\mathbb {R}^m\):

$$\begin{aligned} u < v \quad \text{ if } \quad u_j < v_j \quad \text{ for } \text{ all } \quad j = 1,\dots ,m. \end{aligned}$$

Given a continuous function \(f :\mathbb {R}^n \rightarrow \mathbb {R}^m\) and a nonempty closed set \(D \subseteq \mathbb {R}^n\), consider the following constrained vector-valued optimization problem:

$$\begin{aligned} \begin{array}{ll} \text{ minimize } &{} f(x) \\ \text{ subject } \text{ to } &{} x \in D, \end{array} \end{aligned}$$
(1)

understood in the Pareto or weak Pareto sense.

We recall that a point \(x^* \in D\) is a weak Pareto optimal solution of (1) if there does not exist \(x \in D\) such that \(f(x) < f(x^*)\). The point \(x^* \in D\) is a Pareto optimal solution of (1) if there does not exist \(x \in D\) such that \(f(x) \le f(x^*)\) and \(f_{j_0}(x) < f_{j_0}(x^*)\) for at least one \(j_0 \in \{1,\dots ,m\}\). Note that a Pareto optimal solution is a weak Pareto optimum.
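For readers who wish to experiment numerically, the two dominance relations underlying these definitions can be coded directly; the following is a minimal sketch (the function names are ours, not part of the paper):

```python
def dominates_weakly(fu, fv):
    """fu < fv componentwise: fu strictly improves on fv in every objective."""
    return all(a < b for a, b in zip(fu, fv))

def dominates(fu, fv):
    """fu <= fv componentwise, with strict improvement in at least one objective."""
    return all(a <= b for a, b in zip(fu, fv)) and any(a < b for a, b in zip(fu, fv))
```

A point \(x^* \in D\) is then weak Pareto optimal when no feasible x yields `dominates_weakly(f(x), f(x*))`, and Pareto optimal when no feasible x yields `dominates(f(x), f(x*))`.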

We intend to find Pareto or weak Pareto solutions of the above problem by means of an external penalty-type method. For the vector-valued case, we may define a penalty function for the feasible set as follows.

Definition 1

A vector external penalty function for the set D is a continuous function \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\) such that \(P(x) = 0\) if and only if \(x \in D\).

Note that, when \(m = 1\), we retrieve the classical notion of external penalty function used in scalar-valued optimization (Luenberger 2003). In order to ensure that \(\tilde{x} \in D\), this point must satisfy the following system of nonlinear equations

$$\begin{aligned} P_j(\tilde{x}) = 0, \quad j = 1,\dots ,m. \end{aligned}$$

But, whenever each \(P_j\) is a (scalar-valued) external penalty function for D, then, for guaranteeing that \(\tilde{x} \in D\), it suffices that \(P_j(\tilde{x}) = 0\) for just one j (no matter which one). A particular case of this situation is when \(P_j = \hat{P}\) for all j, where \(\hat{P}\) is a scalar-valued external penalty function for D.

Differently from the scalar case, a vector external penalty function \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\) does not necessarily satisfy \(P(x) > 0\) if and only if \(x \notin D\). Indeed, for \( m \ge 2\), it suffices to take, for instance, \(P_1(x) = 0\) for all \(x \in \mathbb {R}^n\) and, for \(j \ne 1\), \(P_j\) such that P is a vector external penalty for D. But, if we take P such that, for all i, \(P_i\) is a (scalar-valued) external penalty function for D, then we clearly do have that \(P(x) > 0\) if and only if \(x \notin D\); in particular, this happens to those vector external penalty functions that have all components equal to a certain real-valued external penalty function for D.

2.1 Examples of external penalty functions

We now give some examples of external penalty-type functions.

Example 1

Consider the following classical case of a constraint set for both scalar and vector-valued optimization:

$$\begin{aligned} D := \big \{ x \in \mathbb {R}^n :g(x) \le 0, \, h(x) = 0 \big \}, \end{aligned}$$
(2)

where \(g :\mathbb {R}^n \rightarrow \mathbb {R}^q\) and \(h :\mathbb {R}^n \rightarrow \mathbb {R}^r\) are continuous functions. For simplicity, let us suppose that \(r+q \le m\). Take

$$\begin{aligned} p(x)^\top := \Big ( \max \{0,g_1(x)\}, \dots , \max \{0,g_q(x)\}, h_1(x), \dots , h_r(x) \Big ) \end{aligned}$$

for all \(x \in \mathbb {R}^n\), with \(\top \) denoting the transpose. Define \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\) as follows:

$$\begin{aligned} \begin{array}{ll} P_i(x) = |p_i(x)|^\beta , &{} \quad i = 1,\dots ,r+q,\\ P_i(x) = 0, &{} \quad i= r+q+1,\dots ,m \end{array} \end{aligned}$$

for all \(x \in \mathbb {R}^n\), where \(\beta \ge 1\). Since P is continuous and \(P(x) = 0\) if and only if \(x \in D\), P is an external penalty function for D. As we know, if g and h are smooth, P is also differentiable for \(\beta > 1\). If \( r+q > m\), we can simply take \(P(x)^\top = (\sum _{i=1}^{r+q} |p_i(x)|^\beta ,0,\dots ,0)\) or \(P(x) = \sum _{i=1}^{r+q} |p_i(x)|^\beta e\), where \(e = (1,\dots ,1)^\top \in \mathbb {R}^m\).
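A possible implementation of this construction, under the stated assumption \(r+q \le m\), is sketched below; the helper name `make_penalty` and the sample constraints are our own choices, not part of the paper:

```python
import numpy as np

def make_penalty(g, h, m, beta=2.0):
    """Vector external penalty of Example 1 for D = {x : g(x) <= 0, h(x) = 0},
    assuming q + r <= m (g maps into R^q, h into R^r)."""
    def P(x):
        p = np.concatenate([np.maximum(0.0, g(x)), h(x)])
        out = np.zeros(m)
        out[:p.size] = np.abs(p) ** beta      # P_i = |p_i|^beta, remaining entries zero
        return out
    return P

# Hypothetical instance: D = {x in R^2 : x_1 + x_2 - 1 <= 0, x_1 - x_2 = 0}.
g = lambda x: np.array([x[0] + x[1] - 1.0])
h = lambda x: np.array([x[0] - x[1]])
P = make_penalty(g, h, m=3)
```

By construction, P vanishes exactly on D and is positive in at least one component outside it.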

In the next example, we show that, as in some scalar problems, the distance from a point to a constraint set D can be taken as an external penalty function. First, let us mention that, from now on, \(\Vert \cdot \Vert \) will always stand for the Euclidean norm, i.e., \(\Vert x\Vert ^2 : = \langle x,x \rangle \), where \(\langle x,y \rangle : = \sum _{i=1}^n x_iy_i\) for all \(x,y \in \mathbb {R}^n\).

Example 2

Let us consider the case in which D is defined by finitely many homogeneous linear equations. Assume that \(D : = \bigcap _{i=1}^r D_i\), where \(D_i \subset \mathbb {R}^n\) is a hyperplane, say the orthogonal of a norm one vector \(w^i \in \mathbb {R}^n\) for all \(i = 1,\dots ,r\), with \(r \le m\). We can take \(P_i(x) = 0\) for \(i=r+1,\dots ,m\) and \(P_i(x)\) as the distance between x and the \((n-1)\)-dimensional subspace \(D_i\), i.e., \(P_i(x) = \Vert x-\Pi _{D_i} (x)\Vert \) for \(i=1,\dots ,r\), where \(\Pi _{D_i} :\mathbb {R}^n \rightarrow \mathbb {R}^n\) is the orthogonal projector onto \(D_i\), i.e., \(\Pi _{D_i}(x) : = x - \langle w^i,x \rangle w^i\), so

$$\begin{aligned} \begin{array}{ll} P_i(x) = |\langle w^i,x \rangle |, &{} \quad i=1,\dots ,r,\\ P_i(x) = 0, &{} \quad i= r+1,\dots ,m. \end{array} \end{aligned}$$

In particular, if we have just one hyperplane, say the orthogonal of a norm one vector \(w\in \mathbb {R}^n\), we can take \(P(x) = |\langle w,x \rangle |e\), with \(e = (1,\dots ,1)^\top \).

In general, if \(D_i\) is a closed convex nonempty set of \( \mathbb {R}^n\) for \(i =1,\dots ,r\), with \(r \le m\), we can take the following external penalty function for D:

$$\begin{aligned} \begin{array}{ll} P_i(x) = \text{ dist }(x, D_i), &{} \quad i=1,\dots ,r, \\ P_i(x) = 0, &{} \quad i= r+1,\dots ,m, \end{array} \end{aligned}$$

where, as usual, dist\((x, D_i) : = \min _{y \in D_i} \Vert x-y\Vert \).

In the case that \(r > m\), we can take \(P(x)^\top : = \big (\sum _{i=1}^{r}\)dist\((x,D_i),0,\dots ,0\big )\) or, simply, \(P(x): = \sum _{i=1}^{r}\)dist\((x,D_i)e\), where, once again, e stands for the m-vector of ones.

3 Auxiliary functions

In this section, we introduce some auxiliary functions necessary for the vector-valued external penalty method. First, we consider a generalization of the notion of increasing function for the vector-valued case, and then, we present some examples.

3.1 Assumptions on auxiliary functions

We now present different notions of monotonic vector-valued functions. We point out that these concepts are not new. For a more general setting, we can find them in Jahn (2003, Chapter 5, Definition 5.1 (b), (c)) as well as in Luc (1989, Chapter 1, Definition 4.1), where even more general cases are considered.

Definition 2

A function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) is called strictly (monotonically) increasing or weakly increasing (w-increasing) if, for all \(u, v \in \mathbb {R}^m\),

$$\begin{aligned} u < v \quad \Rightarrow \quad \Phi (u) < \Phi (v), \end{aligned}$$

and \(\Phi \) is called strongly (monotonically) increasing (s-increasing) if for all \(u, v \in \mathbb {R}^m\),

$$\begin{aligned} u \le v\quad \text{ and } \quad u_{j_0} < v_{j_0} \, \text{ for } \text{ at } \text{ least } \text{ one } j_0 \quad \Rightarrow \quad \Phi (u) < \Phi (v). \end{aligned}$$

Clearly, an s-increasing function is also w-increasing. A simple example of a w-increasing function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\), which is not s-increasing, is given by \(\Phi (u) : = \max _{i=1,\dots ,m} \{u_i\}\). A general example of w-increasing function which is not s-increasing is the following: \(\Phi (u): = \sum _{i=1}^m \phi _i(u_i)\), where \(\phi _i :\mathbb {R}\rightarrow \mathbb {R}\) is nondecreasing for \(i=1,\dots ,m\) and there exists at least one index \(i_0\) such that \(\phi _{i_0}\) is increasing and not all \(\phi _i\)’s are increasing functions. Moreover, an example of an s-increasing (bounded) function is \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\), given by \(\Phi (u): = \sum _{i=1}^m \arctan (u_i)\). More generally, \(\Phi (u): = \sum _{i=1}^m \phi _i(u_i)\), where \(\phi _i :\mathbb {R}\rightarrow \mathbb {R}\) is increasing for \(i=1,\dots ,m\), is an s-increasing function (not necessarily bounded).
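These monotonicity properties are easy to check numerically; the snippet below (our own illustration) contrasts the max function, which is w-increasing but not s-increasing, with the bounded s-increasing arctangent sum:

```python
import numpy as np

Phi_max = lambda u: float(np.max(u))               # w-increasing, not s-increasing
Phi_atan = lambda u: float(np.sum(np.arctan(u)))   # s-increasing and bounded

# u <= v with strict inequality only in the second component:
u = np.array([1.0, 0.0])
v = np.array([1.0, 1.0])
# Phi_max(u) == Phi_max(v), so the s-increasing implication fails for the max,
# while Phi_atan(u) < Phi_atan(v), as s-monotonicity requires.
```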

It is easy to see that if the function \(\Phi \) is w-increasing and continuous, then we have

$$\begin{aligned} u \le v \Rightarrow \Phi (u) \le \Phi (v) \quad \text{ for } \text{ any } u, v \in \mathbb {R}^m. \end{aligned}$$
(3)

For the sake of completeness, we state a simple result for monotonic functions which relates optimality for the scalar-valued problem \(\min _{x \in D} \Phi (f(x))\) to weak Pareto and Pareto optimality for the vector-valued Problem (1).

Lemma 1

  1.

    If \(\Phi \) is a w-increasing function and \(x^* \in \mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big )\), then \(x^*\) is a weak Pareto optimal solution for Problem (1).

  2.

    If \(\Phi \) is an s-increasing function and \(x^* \in \mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big )\), then \(x^*\) is a Pareto optimal solution for Problem (1).

Proof

The results follow from Jahn (2003, Chapter 5, Lemmas 5.14 and 5.24). \(\square \)

For the definition of the method, we need to introduce two classes of auxiliary functions. We begin with those which are continuous, monotonic and overestimate the canonical projections.

Definition 3

A continuous w-increasing function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) is of weak type (or w-type) if it satisfies the following property:

$$\begin{aligned} u_j \le \Phi (u) \quad \quad \text{ for } \text{ any } u \in \mathbb {R}^m \text{ and } \text{ all } j=1,\dots ,m.\qquad (\mathcal {P}) \end{aligned}$$

Similarly, a continuous s-increasing function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) is of strong type (or s-type) if it satisfies \((\mathcal {P})\).

Clearly, an s-type function is necessarily of w-type, because s-increasing functions are w-increasing. Let us now introduce the other class of auxiliary functions which will also be used in the method we are about to present.

Definition 4

A continuous, w-increasing and subadditive function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) (i.e., one with \(\Phi (u+v) \le \Phi (u) + \Phi (v)\) for all \(u, v \in \mathbb {R}^m\)) is weakly subadditive (or w-subadditive) if, for every sequence \(\{u^k\} \subset \mathbb {R}^m_+\) and every \(M>0\), there exists \(T>0\) such that

$$\begin{aligned} \Phi (u^k) \le M \quad \text{ for } \text{ all } \ k \Longrightarrow \,\, \Vert u^k\Vert \le T \ \text{ for } \text{ all } \ k.\qquad (\mathcal {Q}) \end{aligned}$$

If \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) is continuous, s-increasing, subadditive, and satisfies property \((\mathcal {Q})\), it is a strongly subadditive (or s-subadditive) function.

Property \((\mathcal {Q})\) tells us that there is no unbounded set in \(\mathbb {R}^m_+\) with bounded image via \(\Phi \). Note also that, even though we now ask a stronger property on the auxiliary function \(\Phi \), namely, its subadditivity, we relax property \((\mathcal {P})\). Indeed, property \((\mathcal {Q})\) is weaker than \((\mathcal {P})\), since any function \(\Phi \) that satisfies property \((\mathcal {P})\) also verifies \((\mathcal {Q})\) with \(T = \sqrt{m}M\).

3.2 Examples of auxiliary functions

First, let us exhibit some general examples of w-type and s-type functions.

Example 3

Let \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) be defined by

$$\begin{aligned} \Phi (u) := \max _{i=1,\dots ,m} \{u_i + a_i\} + b, \end{aligned}$$

with \(a_i + b \ge 0\) for all \(i = 1,\dots ,m\). Then, \(\Phi \) is a w-type function but not an s-type one. In particular, \(\Phi (u) := \max _{i=1,\dots ,m} \{u_i\}\) is also a w-type function but not an s-type one.

Example 4

Let \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) be given by

$$\begin{aligned} \Phi (u) := \max _{i=1,\dots ,m} \big \{ \phi _i(u) + \xi _i(u) \big \} + \zeta (u), \end{aligned}$$

where \(\phi _i, \xi _i, \zeta :\mathbb {R}^m \rightarrow \mathbb {R}\) are continuous functions, with \( \xi _i(u) + \zeta (u) \ge 0\), such that \(\phi _i\) satisfies \((\mathcal {P})\) for all i and \(\Phi (u)\) is w-increasing (e.g., if \(\phi _i, \xi _i\) and \(\zeta \) are w-increasing for all i). Then \(\Phi \) is a w-type function, but not necessarily an s-type function. Note that, if one of these three functions is s-increasing for all \(i=1,\dots ,m\) and the other two are w-increasing, then \(\Phi \) is an s-type function.

Example 5

Let \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) be a function defined by

$$\begin{aligned} \Phi (u) := \psi _1(u_1) + \dots + \psi _m(u_m), \end{aligned}$$

where \(\psi _i :\mathbb {R}\rightarrow \mathbb {R}_+\) is a continuous increasing function satisfying property \((\mathcal {P})\) for all \(i = 1,\dots ,m\), i.e., \(t \le \psi _i(t)\) for all \(t \in \mathbb {R}\) and all i (e.g., \(\psi _i(u_i) := a_i \exp (u_i)\), with \(a_i \ge 1\)). Then, \(\Phi \) is an s-type function.

Example 6

If \(\Phi _1,\dots ,\Phi _r :\mathbb {R}^m \rightarrow \mathbb {R}\) are w-type (s-type) functions and \(\alpha _1,\dots ,\alpha _r\) are nonnegative scalars adding up to 1, then

$$\begin{aligned} \Phi : = \sum _{i=1}^r \alpha _i \Phi _i \end{aligned}$$

is also of weak type (strong type). Clearly, linear combinations of nonnegative w-type (s-type) functions with all scalars greater than or equal to 1 are also of the same type.

Example 7

Assume that \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) is a w-type function. Let \(\omega \in \mathbb {R}^m\) and define \(\hat{\omega }:= \max _{i=1,\dots ,m} |\omega _i|\) or \(\hat{\omega }:= a \exp (|\omega _1|+\dots +|\omega _m|)\), where \(a \ge 1\). Then, the function \(\Phi _\omega :\mathbb {R}^m \rightarrow \mathbb {R}\), defined by

$$\begin{aligned} \Phi _\omega (u) := \Phi (u + \omega ) + \hat{\omega }\end{aligned}$$

is of w-type. Moreover, if \(\Phi \) is an s-type function, then \(\Phi _\omega \) is of s-type.

Example 8

Let \(\Psi :\mathbb {R}^m \rightarrow \mathbb {R}\) be an s-type function, \(\Phi , \Upsilon :\mathbb {R}^m \rightarrow \mathbb {R}\) w-type functions, and \(\mathrm {e} := (1,\dots ,1)^\top \in \mathbb {R}^m\). Then,

$$\begin{aligned} u \mapsto \Phi \big ( \Upsilon (u) \mathrm {e} \big ) \quad \quad \text{ and }\quad \quad u \mapsto \Phi \big ( \Psi (u) \mathrm {e} \big ) \end{aligned}$$

are w-type and s-type functions, respectively. If \(\psi :\mathbb {R}\rightarrow \mathbb {R}\) is a continuous increasing function satisfying property \((\mathcal {P})\), then the compositions of functions

$$\begin{aligned} \psi \circ \Psi \quad \quad \text{ and } \quad \quad \psi \circ \Phi \end{aligned}$$

are s-type and w-type functions, respectively.

Besides the max-type ones, the previous examples of w-type auxiliary functions are basically compositions of inner products with continuous increasing scalar-valued functions. The next example shows that these are not all the possibilities.

Example 9

Let \(\Psi :\mathbb {R}^m \rightarrow \mathbb {R}\) be defined by

$$\begin{aligned} \Psi (u)= \Phi (u)\Upsilon (u), \end{aligned}$$

where \(\Phi \) is as in Example 5 and \(\Upsilon :\mathbb {R}^m \rightarrow \mathbb {R}\) is a continuous w-increasing function such that \(\Upsilon (u) \ge 1\) for all u (e.g., \(\Upsilon (u)= \sum _{i=1}^m \gamma _i(u_i)\), with \(\gamma _i(t)=\arctan (t)+\pi \) for all i). Then, \(\Psi \) is an s-type function.

Now, we exhibit some examples of weakly and strongly subadditive auxiliary functions.

Example 10

Consider \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\), defined by \(\Phi (u) = \max _{i=1,\dots ,m} \{u_i\}\). Clearly, \(\Phi \) is subadditive and, as we saw in Example 3, it is of w-type, so \(\Phi \) is a w-subadditive function. And, since \(\Phi \) is not s-increasing, it is not an s-subadditive function.

Example 11

Take \(a \in \mathbb {R}^m\) such that \(a > 0\) and let the linear function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) be defined by

$$\begin{aligned} \Phi (u) = \sum _{i=1}^m a_i u_i. \end{aligned}$$

For any \(u \in \mathbb {R}^m\) such that \(u \ge 0\), we have that \(\Phi (u) \ge \kappa \sum _{i=1}^m u_i = \kappa \sum _{i=1}^m |u_i| =: \kappa \Vert u\Vert _1\), where \(\kappa : = \min _{i=1,\dots ,m} a_i > 0\). So, since \(\Vert u\Vert \le \Vert u\Vert _1\), property \((\mathcal {Q})\) holds with \(T = M/\kappa \). As \(\Phi \) is s-increasing and satisfies \((\mathcal {Q})\), it is an s-subadditive function. Observe that \(\Phi \) is not an s-type function, since \(\Phi (\frac{1}{a_1},\frac{-1}{a_2},0,\dots ,0) = 0 \ngeq \frac{1}{a_1}\), and so property \((\mathcal {P})\) does not hold.
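The two claims of this example, namely that property \((\mathcal {Q})\) holds with \(T = M/\kappa \) while property \((\mathcal {P})\) fails, can be checked numerically; the vector a, the bound M and the sampling scheme below are our own choices:

```python
import numpy as np

a = np.array([2.0, 0.5, 1.0])            # a > 0, our choice
kappa = float(a.min())                   # kappa = min_i a_i
Phi = lambda u: float(a @ u)             # linear, s-increasing, subadditive

M = 10.0
rng = np.random.default_rng(0)
samples = rng.uniform(0.0, 5.0, size=(1000, 3))
# Property (Q): for u >= 0 with Phi(u) <= M, ||u|| <= ||u||_1 <= M / kappa.
bound_holds = all(np.linalg.norm(u) <= M / kappa
                  for u in samples if Phi(u) <= M)

# Property (P) fails: Phi(1/a_1, -1/a_2, 0) = 0, which is below the first component.
u0 = np.array([1.0 / a[0], -1.0 / a[1], 0.0])
```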

Example 12

Let \(\Psi , \Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) be w-subadditive (s-subadditive) functions such that \(\Phi (x) \ge 0\) for all \(x \in \mathbb {R}^m\). Then the composition \(x \mapsto \Psi (\Phi (x)e)\), where e is the m-vector of ones, is also w-subadditive (s-subadditive).

As we will see in the next section, any of these w-type (s-type), w-subadditive (s-subadditive) functions can be employed in the algorithm. Properties \((\mathcal {P})\) and \((\mathcal {Q})\) will be used in the convergence proofs. Nevertheless, in practical terms, we do not always need them. Indeed, let us recall Example 7, and note that the minimizers of \(x \mapsto \Phi _\omega \big (f(x) + \rho _kP(x)\big ) = \Phi \big (f(x) + \rho _kP(x) + \omega \big ) + \hat{\omega }\) are the same as those of \(x \mapsto \Psi _\omega \big (f(x) + \rho _kP(x)\big )\), where \(\Psi _\omega : = \Phi _\omega - \hat{\omega }\) is a continuous w-increasing function which does not satisfy property \((\mathcal {P})\) and is not subadditive, so it is neither of w-type nor w-subadditive. But \(\Psi _\omega \) can be used to generate the same sequence of iterates (and, therefore, with the same convergence properties) as the one produced by the auxiliary function \(\Phi _\omega \).

4 An external penalty-type method for multiobjective optimization

In this section, we define the multicriteria external penalty method (MEPM) for solving Problem (1). First, let us consider the weak version of the method. Let \(\mathbb {R}_{++}\) be the set of positive real numbers. Take a vector external penalty function \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\) for \(D \subseteq \mathbb {R}^n\), a w-type or w-subadditive function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\), and a divergent sequence \(\{\rho _k\} \subset \mathbb {R}_{++}\) such that \(\rho _{k+1} > \rho _k\) for all k. The method is iterative and generates a sequence \(\{x^k\} \subset \mathbb {R}^n\) by

$$\begin{aligned} x^k \in \mathop {\mathrm {argmin}}_{x \in \mathbb {R}^n} \Phi \Big (f(x) + \rho _k P(x)\Big ), \quad k = 1,2,\dots . \end{aligned}$$
(4)

The strong version of the method is formally identical to the weak one, but with \(\Phi \) as an s-type or s-subadditive function.
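As an illustration only, iteration (4) can be sketched with a crude grid search playing the role of the inner unconstrained solver (a serious implementation would use a proper optimization routine); the test problem, penalty and parameter sequence below are our own choices, not taken from the paper:

```python
def mepm(objectives, P, Phi, rhos, grid):
    """One unconstrained scalar minimization per penalty parameter rho_k,
    as in iteration (4); here the 'solver' is a plain grid search."""
    iterates = []
    for rho in rhos:
        x = min(grid, key=lambda t: Phi([f(t) + rho * P(t) for f in objectives]))
        iterates.append(x)
    return iterates

# Illustrative problem (our choice): f(t) = (t^2, (t - 2)^2), D = [1.5, +oo),
# Phi = max (a w-type function), with the same scalar penalty in every component.
f1 = lambda t: t * t
f2 = lambda t: (t - 2.0) ** 2
P = lambda t: max(0.0, 1.5 - t) ** 2              # vanishes exactly on D
grid = [i / 1000.0 for i in range(-2000, 4001)]
xs = mepm([f1, f2], P, max, [1.0, 10.0, 100.0, 1000.0], grid)
```

On this problem, the iterates stay infeasible and approach \(D = [1.5,+\infty )\) from outside as \(\rho _k\) grows, in line with the external character of the method.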

Let us make some comments and observations concerning both versions of the method.

  1.

    Note that, as in the classical real-valued external penalty method, some hypotheses are needed in order to guarantee the existence of \(x^k\) for all k. For instance, we can apply the method for any functions f, \(\Phi \) and P such that \(x \mapsto \Phi \big (f(x) + \rho _k P(x)\big )\) is coercive in \(\mathbb {R}^n\) for all k. In this case, by continuity, we have \(\mathop {\mathrm {argmin}}_{x \in \mathbb {R}^n} \Phi \big (f(x) + \rho _k P(x)\big ) \ne \emptyset \).

  2.

    In both versions of MEPM, a necessary condition for the well-definedness of the whole sequence \(\{x^k\}\) is the following:

    $$\begin{aligned} -\infty < \inf _{x \in D} \Phi \big (f(x)\big ). \end{aligned}$$
    (5)

    Indeed, for any \(\tilde{x} \in D\), from the fact that P is a penalty function for D, we get

    $$\begin{aligned} -\infty < \Phi \big (f(x^k) + \rho _k P(x^k)\big )= & {} \min _{x \in \mathbb {R}^n} \Phi \big (f(x) + \rho _k P(x)\big )\\\le & {} \Phi \big (f(\tilde{x}) + \rho _k P(\tilde{x})\big ) = \Phi \big (f(\tilde{x})\big ). \end{aligned}$$

    Since \(\tilde{x}\) is an arbitrary element of D, condition (5) follows.

  3.

    This method inherits some features and drawbacks of its real-valued counterpart. Firstly, we mention that it does not have any kind of “memory”, i.e., the former iterate is not used to compute the current one; nevertheless, in order to obtain \(x^k\), it seems reasonable to initialize the subroutine used to (approximately) solve subproblem (4) with the former iterate \(x^{k-1}\). Secondly, the benefit of applying MEPM is to replace a constrained (vector-valued) problem with a sequence of unconstrained (scalar-valued) ones with continuous objective functions.

  4.

    When \(m=1\), taking \(\Phi (u) = \max _{i=1,\dots ,m} \{u_i\}\), we retrieve the classical (scalar-valued) external penalty method. Actually, in the scalar case, any auxiliary function \(\Phi :\mathbb {R}\rightarrow \mathbb {R}\) is increasing, so iteration (4) generates the same sequence as Zangwill’s method.

    One may ask why it is worthwhile to use this method instead of others. We observe that Problem (1) may have a very poor structure: f is just required to be continuous. Whenever \(\Phi (u) = \max _{i=1,\dots ,m} \{u_i\}\) and \(P_j = \hat{P}\) for all j, where \(\hat{P} :\mathbb {R}^n \rightarrow \mathbb {R}\) is a scalar-valued penalty function for D, MEPM is just the scalar-valued external penalty method applied to the minimization of the continuous function \(x \mapsto \max _{i=1,\dots ,m} \{f_i(x)\}\) in D, with \(\hat{P}\) as a penalty function.

  5.

    One may ask why we should use this method instead of applying the classical scalar external penalty method to problem \(\min _{x \in D} \Phi \big (f(x)\big )\). An answer to this question is that MEPM has more degrees of freedom: We do not always need to choose a max-type auxiliary function \(\Phi \), nor do we have to use a penalty of the type \(P = (\hat{P},\dots ,\hat{P})\), where \(\hat{P}\) is a scalar-valued penalty for D.

  6.

    As mentioned in the introduction, MEPM shares the following feature with other extensions of classical scalar methods to the vectorial setting: Under certain regularity conditions, all iterates are implicitly obtained by the application of the corresponding real-valued algorithm to a certain weighted scalarization. In order to see this assertion, assume that f and \(P = (\hat{P},\dots ,\hat{P})^\top \) are \(\mathbb {R}^m_+\)-convex (i.e., \(f_j\) and \(\hat{P}\) are convex for all j) differentiable functions, with \(\hat{P}\) a scalar-valued penalty for D. Let \(\Phi \) be defined by \(\Phi (u) = \max _{i=1,\dots ,m} \{ u_i \}\) for all \(u \in \mathbb {R}^m\). Next, reformulate \(\min _{x \in \mathbb {R}^n} \max _{i=1,\dots ,m} \{f_i(x) + \rho _k P_i(x)\}\) as the minimization of t subject to \(f_j(x) + \rho _k P_j(x) \le t,\; j= 1,\dots ,m\), a smooth problem in \((x,t) \in \mathbb {R}^n \times \mathbb {R}\). Now, from the first-order optimality condition of the above reformulation, we see that

    $$\begin{aligned} x^k \in \mathop {\mathrm {argmin}}_{x \in \mathbb {R}^n} \langle \lambda ^k, f(x) \rangle + \rho _k \hat{P}(x), \end{aligned}$$

    which means that \(\{x^k\}\) can be obtained via the application of the classical (scalar) external penalty method to the real-valued function \(x \mapsto \langle \lambda ^k, f(x) \rangle \), a weighted scalarization of the vector-valued objective f, with weighting vector \(\lambda ^{k} \in \mathbb {R}^m_+\), using the scalar-valued penalty \(\hat{P}(x)\) for D and \(\{\rho _k\}\) as the parameter sequence. Of course, we do not know a priori the nonnegative weights \(\lambda _1^{k},\dots ,\lambda _m^{k}\), which add up to one.

The next proposition establishes a simple condition under which both versions of MEPM converge to optimal points in its very first iteration.

Proposition 1

Consider MEPM implemented with an external penalty function \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\), a sequence of parameters \(\{\rho _k\} \subset \mathbb {R}_{++}\) and a w-type or w-subadditive function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\). If we have

$$\begin{aligned} \mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big ) = \mathop {\mathrm {argmin}}_{x \in \mathbb {R}^n} \Phi \big (f(x)\big ), \end{aligned}$$

then the method converges in one iteration to a weak Pareto solution of Problem (1). If \(\Phi \) is an s-type or s-subadditive function, MEPM converges in a single iteration to a Pareto optimum of (1).

Proof

If \(x^* \in \mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big )\), then \(P(x^*) = 0\) and so, combining the optimality of \(x^*\) in \(\mathbb {R}^n\), the facts that \(P(x) \ge 0\) for all \(x \in \mathbb {R}^n\) and \(\rho _1 >0\) with (3), we get

$$\begin{aligned} \Phi \big (f(x^*) + \rho _1 P(x^*)\big ) = \Phi \big (f(x^*)\big ) \le \Phi \big (f(x)\big ) \le \Phi \big (f(x) + \rho _1 P(x)\big ) \quad \text{ for } \text{ all } x \in \mathbb {R}^n. \end{aligned}$$

Therefore, \(x^* \in \mathop {\mathrm {argmin}}_{x \in \mathbb {R}^n} \Phi \big (f(x) + \rho _1 P(x)\big )\), and so \(\Phi \big (f(x^1) + \rho _1 P(x^1)\big ) = \Phi \big (f(x^*) + \rho _1 P(x^*)\big ) = \Phi \big (f(x^*)\big )\). Hence, once again by (3), we obtain

$$\begin{aligned} \Phi \big (f(x^1)\big ) \le \Phi \big (f(x^1) + \rho _1 P(x^1)\big ) = \Phi \big (f(x^*)\big ) \le \Phi \big (f(x)\big ) \quad \text{ for } \text{ all } x \in \mathbb {R}^n. \end{aligned}$$

Whence, \(x^1 \in \mathop {\mathrm {argmin}}_{x \in \mathbb {R}^n} \Phi \big (f(x)\big ) = \mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big )\). The result then follows from Lemma 1. The strong result is also a consequence of Lemma 1. \(\square \)

Let us show a very simple application of the above proposition.

Example 13

Consider \(n = 1\), \(m=2\), \(D = [-1,+\infty )\) and \(f :\mathbb {R}\rightarrow \mathbb {R}^2\), given by \(f(t) = (t,-\ell t)^\top \), where \(\ell = 1,2,\dots \) In order to apply MEPM to this problem, we take \(\Phi (u) = \max _{i=1,\dots ,m} \{u_i\}\), a penalty function P, and \(\{\rho _k\}\) an increasingly divergent sequence of positive real numbers. It is easy to see that the condition required in Proposition 1 holds, and so MEPM converges in its first iteration to a Pareto point. We point out that the weighting method with scalarization parameter \(\alpha \in [0,1]\) applied to this problem fails when \(\alpha \in [0,\ell /(1+\ell )]\), which means that, for large \(\ell \), it fails for a large set of weights.
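The claims of this example can be checked numerically with a simple grid search as the inner solver; \(\ell = 5\), the quadratic penalty and the weight \(\alpha \) below are our own choices:

```python
ell = 5                                   # our choice of the slope parameter
f1 = lambda t: t
f2 = lambda t: -ell * t
P = lambda t: max(0.0, -1.0 - t) ** 2     # scalar penalty for D = [-1, +oo)
rho = 1.0

grid = [i / 100.0 for i in range(-300, 301)]
x1 = min(grid, key=lambda t: max(f1(t) + rho * P(t), f2(t) + rho * P(t)))
# The first iterate already lies in D, as Proposition 1 predicts.

# In contrast, the weighted objective is unbounded below on D for small alpha:
alpha = 0.5                               # alpha < ell / (1 + ell) = 5/6
weighted = lambda t: alpha * f1(t) + (1 - alpha) * f2(t)
```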

We now show some elementary properties of sequences generated by MEPM which will be needed in the sequel. Since s-increasing functions are w-increasing, we just prove them for the weak version of the method.

Lemma 2

Let \(\tilde{x} \in D \subseteq \mathbb {R}^n\) and \(\{x^k\} \subset \mathbb {R}^n\) be a sequence generated by the weak version of MEPM implemented with a penalty function \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\), a parameter sequence \(\{\rho _k\} \subset \mathbb {R}_{++}\) and a w-type or w-subadditive function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\). Then, for all \(k=1,2,\dots \), the following statements hold.

  1.

    For any auxiliary function \(\Phi \), we have

    $$\begin{aligned} \Phi \big (f(x^k) + \rho _k P(x^k)\big ) \le \Phi \big (f(x^{k+1}) + \rho _{k+1} P(x^{k+1})\big ) \le \Phi \big (f(\tilde{x})\big ). \end{aligned}$$

    If \(\Phi \) is of w-type, then we also have

    $$\begin{aligned} f_j(x^k) + \rho _k P_j(x^k) \le \Phi \big (f(x^k) + \rho _k P(x^k)\big ) \quad \text{ for } \text{ all } \,\,\, j =1,\dots ,m. \end{aligned}$$
    (6)
  2.

    For any auxiliary function \(\Phi \), we have \(\Phi \big (f(x^k)\big ) \le \Phi \big (f(\tilde{x})\big )\). If \(\Phi \) is of w-type, then we also have that \(f_j(x^k) \le \Phi \big (f(\tilde{x})\big )\) for all \(j =1,\dots ,m\).

  3.

    For any auxiliary function \(\Phi \), there exists \(\eta \in \mathbb {R}\) such that

    $$\begin{aligned} \lim _{k \rightarrow \infty } \Phi \big (f(x^k) + \rho _k P(x^k)\big ) = \eta . \end{aligned}$$

Proof

  1.

    Using the properties of \(\Phi \) and the definitions of \(x^k\) and \(x^{k+1}\), we obtain

    $$\begin{aligned} \Phi \big (f(x^k) + \rho _k P(x^k)\big )&= \min _{x \in \mathbb {R}^n} \Phi \big (f(x) + \rho _k P(x)\big ) \\&\le \Phi \big (f(x^{k+1}) + \rho _k P(x^{k+1})\big ) \\&\le \Phi \big (f(x^{k+1}) + \rho _{k+1} P(x^{k+1})\big ) \\&= \min _{x \in \mathbb {R}^n} \Phi \big (f(x) + \rho _{k+1} P(x)\big ) \\&\le \Phi \big (f(\tilde{x}) + \rho _{k+1} P(\tilde{x})\big ) \\&= \Phi \big (f(\tilde{x})\big ), \end{aligned}$$

    where the first equality follows from (4), the second inequality is a consequence of the weak monotonic behavior of \(\Phi \) combined with the facts that \(0 < \rho _k < \rho _{k+1}\) for all k and \(P(x) \ge 0\) for all \(x \in \mathbb {R}^n\), and the last equality follows from the facts that P is a vector external penalty function for D and \(\tilde{x} \in D\). If \(\Phi \) is of w-type, then (6) follows immediately from property \((\mathcal {P})\).

  2.

    From the proof of item 1, we have that \(\Phi \big (f(x^k) + \rho _k P(x^k)\big ) \le \Phi \big (f(\tilde{x})\big )\), so, from the facts that \(P \ge 0\), \(\rho _k > 0\) and \(\Phi \) is a w-increasing function, it follows that

    $$\begin{aligned} \Phi \big (f(x^k)\big ) \le \Phi \big (f(\tilde{x})\big ). \end{aligned}$$

    So, if \(\Phi \) is of w-type, from property \((\mathcal {P})\), \(f_j(x^k) \le \Phi (f(\tilde{x}))\) for all \(j = 1,\dots ,m\).

  3.

    By item 1, \(\big \{\Phi \big (f(x^k) + \rho _k P(x^k)\big )\big \}\) is a nondecreasing, bounded sequence of real numbers; hence, as \(k \rightarrow \infty \), it converges to some \(\eta \in \mathbb {R}\).

\(\square \)

As in the classical real-valued method, from item 1 of the above lemma, we observe that in the vector-valued case we have

$$\begin{aligned} \Phi \big (f(x^k) + \rho _k P(x^k)\big ) \le \Phi \big (f(x^{k+1}) + \rho _{k+1} P(x^{k+1})\big ) \quad k = 1,2,\dots \end{aligned}$$

for any sequence generated by MEPM implemented with a w- or s-type, or a w- or s-subadditive, auxiliary function \(\Phi \). We also know that, in the scalar case, the real sequences \(\{P(x^k)\}\) and \(\{f(x^k) \}\) are nonincreasing and nondecreasing, respectively. In the general case (\(m\ge 2\)), however, such properties may fail. When we choose an arbitrary auxiliary function \(\Phi \), even though we cannot ensure that the sequence of functional values is nondecreasing, we can at least say that it remains bounded above by the optimal value. Indeed, from item 2 of the last lemma, we have

$$\begin{aligned} \sup _{k=1,2,\dots } \Phi \big (f(x^k)\big ) \le \inf _{x\in D} \Phi \big (f(x)\big ) \quad \text{ and } \quad \sup _{k=1,2,\dots } f_j(x^k) \le \inf _{x\in D} \Phi \big (f(x)\big ) \quad \text{ for } \text{ all } \,\,\, j=1,\dots ,m. \end{aligned}$$

5 Convergence analysis

Let us now study the convergence properties of sequences produced by both versions of MEPM. We begin with an extension of a classical result for real-valued optimization which establishes that accumulation points, if any, of a sequence generated by the external penalty method are optima of the original constrained minimization problem. We also show that, as in the real-valued method, whenever the sequence \(\{x^k\}\) has infinitely many iterates, all of them are infeasible points.

Theorem 1

Let \(\{x^k\} \subset \mathbb {R}^n\) be a sequence generated by MEPM implemented with a penalty function \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\), a parameter sequence \(\{\rho _k\} \subset \mathbb {R}_{++}\) and a w-type or w-subadditive function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\).

  1.

    The point \(x^{k_0}\) belongs to D for some \(k_0\) if and only if \(x^{k_0}\) is a weak Pareto solution for Problem (1).

  2.

    If \(\bar{x}\) is an accumulation point of \(\{x^k\}\), then \(\bar{x}\) is a weak Pareto optimum for Problem (1).

If MEPM is implemented with an s-type or an s-subadditive function \(\Phi \), then items 1 and 2 hold with Pareto optimal solutions instead of weak Pareto optima.

Proof

  1.

    If \(x^{k_0}\) is a weak Pareto solution of (1), then, in particular, \(x^{k_0}\) belongs to D.

    Conversely, let us now assume that \(x^{k_0} \in D\). By item 2 of Lemma 2, we have

    $$\begin{aligned} \Phi \big (f(x^{k_0})\big ) \le \inf _{x \in D} \Phi \big (f(x)\big ), \end{aligned}$$

    and so \(x^{k_0}\) is a minimizer of \(\Phi \big (f(x)\big )\) in D. Then, from Lemma 1, \(x^{k_0}\) is a weak Pareto solution for Problem (1).

  2.

    Assume now that MEPM is implemented with a w-type function \(\Phi \). Let K be an infinite subset of \(\{1,2,\dots \}\) such that \(\lim _{K \ni k \rightarrow \infty } x^k = \bar{x}\). Since f is a continuous function, we have that

    $$\begin{aligned} \lim _{K \ni k \rightarrow \infty } f_j(x^k) = f_j(\bar{x}) \quad \text{ for } \text{ all } j = 1,\dots ,m. \end{aligned}$$
    (7)

    Therefore, there exists \(M > 0\) such that

    $$\begin{aligned} |f_j(x^k)| \le M \quad \text{ for } \text{ all } j=1,\dots ,m \text{ and } \text{ all } k \in K. \end{aligned}$$
    (8)

    On the other hand, by item 1 of Lemma 2,

    $$\begin{aligned} f_j(x^k) + \rho _k P_j(x^k) \le \Phi \big (f(\tilde{x})\big ) =: \tilde{f} \quad \text{ for } \text{ all } j =1,2,\dots ,m \text{ and } \text{ any } \tilde{x} \in D. \end{aligned}$$

    So, from (8) and the above inequality, for all \( j = 1,2,\dots ,m\) and all \( k \in K\), we get

    $$\begin{aligned} 0 \le \rho _k P_j(x^k) = [f_j(x^k) + \rho _k P_j(x^k)] - f_j(x^k) \le \tilde{f} + M. \end{aligned}$$
    (9)

    Therefore, since \(\rho _k \rightarrow +\infty \) and \(P_j(x^k) \ge 0\) for all j and \(k \in K\), necessarily,

    $$\begin{aligned} \lim _{K \ni k \rightarrow \infty } P_j(x^k) = 0 \quad \text{ for } \text{ all } j=1,\dots ,m \end{aligned}$$

    and, since all \(P_j\) are continuous,

    $$\begin{aligned} P_j(\bar{x}) = 0 \quad \text{ for } \text{ all } j=1,\dots ,m. \end{aligned}$$

    Hence, we have

    $$\begin{aligned} P(\bar{x}) = 0, \end{aligned}$$

    and so \(\bar{x} \in D\).

    Let us call \(\hat{v} : = \inf _{x \in D} \Phi \big (f(x)\big )\), which is a real number, in view of (5). Applying item 2 of Lemma 2, we get

    $$\begin{aligned} \Phi \big (f(x^k)\big ) \le \hat{v} \quad \text{ for } \text{ all } k = 1,2,\dots \end{aligned}$$

    Letting \(K \ni k \rightarrow \infty \) in the above inequality, we obtain

    $$\begin{aligned} \Phi \big (f(\bar{x})\big ) \le \hat{v}. \end{aligned}$$

    Using the fact that \(\bar{x} \in D\), we conclude that \(\bar{x} \in \mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big )\), and the result follows from item 1 of Lemma 1.

    Now let us suppose that MEPM is implemented with a w-subadditive function \(\Phi \). Let K be an infinite subset of \(\{1,2,\dots \}\) such that \(\lim _{K \ni k \rightarrow \infty } x^k = \bar{x}\). From item 1 of Lemma 2, \( \Phi \big (f(x^k) + \rho _kP(x^k)\big ) \le \inf _{x \in D} \Phi \big (f(x)\big ) =: \hat{v}\) for all k and so, due to the facts that \(\rho _1 \le \rho _k\) for all k, P is nonnegative and \(\Phi \) is w-increasing, we have

    $$\begin{aligned} \Phi \big (f(x^k) + \rho _1P(x^k)\big ) \le \hat{v} \quad \text{ for } \text{ all } \,\, k \in K. \end{aligned}$$
    (10)

    Whence, since \(\Phi \) is subadditive and continuous, f is also continuous and \(\lim _{K \ni k \rightarrow \infty } x^k = \bar{x}\), we get

    $$\begin{aligned} \limsup _{K \ni k \rightarrow \infty } \Phi \big (\rho _k P(x^k)\big )&\le \limsup _{K \ni k \rightarrow \infty } \Big [\Phi \big (f(x^k) + \rho _k P(x^k)\big ) + \Phi \big (-f(x^k)\big )\Big ] \nonumber \\&\le \hat{v} + \Phi \big (-f(\bar{x})\big ). \end{aligned}$$
    (11)

    So, from the properties of \(\limsup \), there exists \(K_1 \subset K\) and \(k_0 \in K_1\) such that

    $$\begin{aligned} \Phi \big (\rho _k P(x^k)\big ) < |\hat{v} + \Phi \big (-f(\bar{x})\big )| + 1 \quad \text{ for } \text{ all } K_1 \ni k \ge k_0. \end{aligned}$$

    Since \(\rho _kP(x^k) \ge 0\) for all k and \(\Phi \) satisfies condition \((\mathcal {Q})\), from the above inequality we see that \(\Vert \rho _kP(x^k)\Vert \le T\) for some \(T > 0\) and all \(K_1 \ni k \ge k_0\). Using the fact that \(\{\rho _k\}\) is a divergent sequence of positive real numbers, it follows that

    $$\begin{aligned} \limsup _{K_1 \ni k \rightarrow \infty } P(x^k) = 0. \end{aligned}$$

    Whence, the continuity of P yields

    $$\begin{aligned} P(\bar{x}) = 0, \end{aligned}$$

    which means that \(\bar{x} \in D\). Therefore, since \(\Phi \) is w-increasing and \(\rho _kP(x^k) \ge 0\) for all k, letting \(K \ni k \rightarrow \infty \) in (10), we obtain

    $$\begin{aligned} \Phi \big (f(\bar{x})\big ) \le \hat{v}. \end{aligned}$$

    Since \(\hat{v} = \inf _{x \in D} \Phi \big (f(x)\big )\) and \(\bar{x} \in D\), we conclude that \(\bar{x} \in \mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big )\) and, once again, the result follows from item 1 of Lemma 1.

The proof for the strong version of MEPM is formally identical to the one we just saw, but using item 2 instead of item 1 of Lemma 1. \(\square \)

The following result establishes that both versions of MEPM are convergent whenever \(\Phi \circ f\) has a strict minimizer in D, which happens, for instance, if this composition is strictly convex and coercive.

Corollary 1

Assume that \(\{x^k\} \subset \mathbb {R}^n\), as in the first part of Theorem 1, has an accumulation point and that the auxiliary function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) is such that \(x \mapsto \Phi \big (f(x)\big )\) has a strict minimizer \(\bar{x}\) in D. Then \(x^k \rightarrow \bar{x}\) and \(\bar{x}\) is a weak Pareto solution for Problem (1). If \(\Phi \) is an s-type or an s-subadditive function, the generated sequence converges to \(\bar{x}\), a Pareto optimal solution of (1).

Proof

From Theorem 1 and the strict optimality of \(\bar{x} \in D\), any subsequential limit of \(\{x^k\}\) is equal to \(\bar{x}\) and so \(x^k \rightarrow \bar{x}\), a weak Pareto optimal solution for (1). The strong convergence result also follows from Theorem 1. \(\square \)

5.1 Local implementation of MEPM and its convergence

We now sketch a practical implementation of both the strong and the weak versions of MEPM. Take P, an external penalty function for D, a w-type or w-subadditive auxiliary function \(\Phi \), an increasingly divergent sequence of positive penalty parameters \(\{\rho _k\}\) and a compact set \(V \subset \mathbb {R}^n\) with nonempty interior (e.g., a closed ball). Let

$$\begin{aligned} x^k \in \mathop {\mathrm {argmin}}_{x \in V} \Phi \big (f(x) + \rho _k P(x)\big ), \quad k = 1,2,\dots \end{aligned}$$
(12)

The strong version of this local implementation of the method is formally identical to the weak one, but with a strong type or strongly subadditive auxiliary function \(\Phi \).

Note that, by the Weierstrass theorem, \(\mathop {\mathrm {argmin}}_{x \in V} \Phi \big (f(x) + \rho _k P(x)\big ) \ne \emptyset \), and so \(x^k\) exists for every k. Therefore, unlike the global MEPM, its local variant does not require any additional assumption on the penalized objective functions and/or on the constraint set in order to be well-defined.
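To illustrate iteration (12), the following sketch runs the local scheme on a toy instance of our own choosing (not from the paper): \(f(x) = (x^2, (x-2)^2)\), \(D = (-\infty , 0.5]\), \(\Phi = \max \) (a w-type function), \(V = [0,2]\), with a grid search over \(V\) standing in for the exact minimization.

```python
# Local MEPM sketch, iteration (12), on an assumed toy problem:
# f(x) = (x^2, (x-2)^2), D = (-inf, 0.5], Phi = max, V = [0, 2].
def phi_pen(x, rho):
    P = max(0.0, x - 0.5) ** 2                   # external penalty for D, same in both components
    return max(x**2 + rho * P, (x - 2.0)**2 + rho * P)

V = [i / 100000.0 for i in range(0, 200001)]     # grid over the compact set V = [0, 2]
iterates = []
for rho in (10.0, 100.0, 1000.0, 10000.0):       # increasingly divergent parameters
    xk = min(V, key=lambda x: phi_pen(x, rho))   # x^k in argmin over V, as in (12)
    iterates.append(xk)
print([round(v, 3) for v in iterates])           # infeasible iterates decreasing toward 0.5
```

Every iterate violates the constraint \(x \le 0.5\), yet the sequence approaches the boundary point \(\bar{x} = 0.5\) as \(\rho _k\) grows, matching the behavior established for the method.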

Assuming the existence of an isolated local minimizer of \(\Phi \circ f\) within D, we will prove that both the weak and the strong versions of the local method are fully convergent to weak Pareto and Pareto optimal solutions, respectively.

Theorem 2

Suppose that \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\) is a w-type or a w-subadditive auxiliary function and that \(\bar{x} \in D \subseteq \mathbb {R}^n\) is a strict local minimizer of \(\Phi \circ f\) in D, say \(\{\bar{x}\} = \mathop {\mathrm {argmin}}_{x \in U \cap D} \Phi \big (f(x)\big )\), for some vicinity \(U \subset \mathbb {R}^n\) of \(\bar{x}\). Let \(\{x^k\} \subset \mathbb {R}^n\) be a sequence generated by local MEPM implemented with a penalty function \(P :\mathbb {R}^n \rightarrow \mathbb {R}^m_+\), a parameter sequence \(\{\rho _k\} \subset \mathbb {R}_{++}\), the auxiliary function \(\Phi \) and a compact vicinity \(V \subset U\) of \(\bar{x}\). Then \(x^k \rightarrow \bar{x}\) and \(\bar{x}\) is a weak Pareto optimal solution for Problem (1). If \(\Phi \) is an s-type or an s-subadditive auxiliary function, then \(x^k \rightarrow \bar{x}\) and \(\bar{x}\) is a Pareto optimal solution for Problem (1).

Proof

The sequence \(\{x^k\} \subset V\) has an accumulation point \(\tilde{x} \in V\) because V is compact. As in item 2 of Theorem 1, we see that \(\tilde{x} \in \mathop {\mathrm {argmin}}_{x \in D \cap V} \Phi (f(x))\). Since \(\bar{x}\) is the unique minimizer of \(\Phi \circ f\) within \(D \cap U\) and \(V \subset U\), we conclude that \(\tilde{x} = \bar{x}\); so the unique accumulation point of \(\{x^k\}\) is \(\bar{x}\) and the proof is complete. \(\square \)

5.2 Comparison with White’s method

The method proposed by White (1984) for solving Problem (1) applies only to the case in which the constraint set D is compact. Moreover, at each iteration, it requires the computation of a Pareto solution of the penalized subproblem. For D compact and \(P = (\hat{P},\dots ,\hat{P})^\top \), with \(\hat{P}\) a scalar-valued external penalty function for D, by item 2 of Lemma 1, the strong version of MEPM falls within the scheme proposed by White (1984). From a computational point of view, penalty functions such as those proposed by White, i.e., \(\hat{P}e\), with \(e^\top =(1,\dots ,1)\), seem to be a natural choice. However, if some components \(f_i\) of the objective function are on quite different scales, we may compensate for this drawback by appropriate choices of the \(P_i\). For general problems, our proposal is another possibility of a penalty approach.

So, MEPM extends White’s procedure, since it admits larger classes of constraint sets and of penalty functions. Moreover, in practical terms, iteration (4) has the advantage of giving a precise definition of the whole sequence. In order to guarantee convergence to a Pareto optimal solution, White’s procedure requires convexity of the constraint set, as well as of the objective and the penalty functions, whereas the convergence results for MEPM rely on the existence of accumulation points of the generated sequence and of (local) strict minimizers of \(\Phi \circ f\), both conditions also required in Zangwill’s procedure for scalar optimization.

6 Other families of auxiliary functions

Up to now, we have studied the weak and strong versions of global and local MEPM, i.e., implemented with a w-type or w-subadditive, s-type or s-subadditive auxiliary function \(\Phi \). Let us now analyze why we need these kinds of functions. The reason is quite simple: As we know, by Lemma 1, the monotonic behavior of the auxiliary function guarantees that the minimizers of \(\Phi \circ f\) within D are weak Pareto or Pareto optimal points for Problem (1), while \((\mathcal {P})\) jointly with the continuity of \(\Phi \), or \((\mathcal {Q})\) together with the subadditivity and continuity of \(\Phi \), allows us to prove that \(\{\rho _kP(x^k)\}_{k \in K}\) has a bounded subsequence [see (9) and (11)]. So, proceeding as in the proofs of Theorems 1 and 2, we can see that the accumulation points of \(\{x^k\}\) are feasible and then establish that these limits are in \(\mathop {\mathrm {argmin}}_{x \in D} \Phi \big (f(x)\big )\), which means that they are (weak or strong) optima for Problem (1).

We restricted ourselves to those four kinds of auxiliary functions (of weak or strong type, weakly or strongly subadditive) because they seem very natural and, mainly, because they allowed us to exhibit many simple examples that can be used in practical implementations of the method. Moreover, we did not want to complicate the statements of the convergence results (and, perhaps, their proofs too) by using larger classes of auxiliary functions.

In order to give an example of application of the method, let us now present other (parameterized) classes of auxiliary functions that can be used in the method and for which our convergence results also hold. First, let us study a class of w-increasing auxiliary functions that are neither of w-type nor w-subadditive. Take \(\omega \in \mathbb {R}^m\), \(\alpha \in \mathbb {R}_{++}\), a w-type function \(\Phi :\mathbb {R}^m \rightarrow \mathbb {R}\), and define

$$\begin{aligned} \Phi _{\alpha ,\omega } :\mathbb {R}^m \rightarrow \mathbb {R}, \quad \Phi _{\alpha ,\omega }(u) : = \Phi (\alpha u + \omega ). \end{aligned}$$

Clearly, for any \((\alpha ,\omega )^\top \in \mathbb {R}_{++} \times \mathbb {R}^m_+\), the function \(\Phi _{\alpha ,\omega }\) is continuous, w-increasing and satisfies property \((\mathcal {Q})\). But, in view of the presence of \(\alpha \), it does not necessarily satisfy property \((\mathcal {P})\). So, in general, these functions are not of w-type and, in principle, they are not subadditive. Nevertheless, even for \(\omega \in \mathbb {R}^m \setminus \mathbb {R}^m_+\), the function \(\Phi _{\alpha ,\omega }\) can be used in the method and the produced sequence will have the same convergence properties as those generated with w-type or w-subadditive auxiliary functions. Indeed, all we need to show is that \(\rho _k P(x^k) \le M\) for all \(k \in K\), an infinite subset of \(\{1,2,\dots \}\), and some \(M> 0\), where \(x^k \in \mathop {\mathrm {argmin}}_{x \in \mathbb {R}^n} \Phi _{\alpha ,\omega }(f(x) + \rho _k P(x))\) for all \(k=1,2,\dots \) As in item 2 of Theorem 1, assume that \(\lim _{K \ni k \rightarrow \infty } x^k = \bar{x}\). Since \(\Phi \) satisfies property \((\mathcal {P})\), for any \(\tilde{x} \in D\), all \(j=1,\dots ,m\) and all \(k \in K\), we have

$$\begin{aligned} \alpha \big (f_j(x^k) + \rho _kP_j(x^k)\big ) + \omega _j\le & {} \Phi \big (\alpha [f(x^k) + \rho _kP(x^k)] + \omega \big ) \\= & {} \Phi _{\alpha ,\omega }\big (f(x^k) + \rho _kP(x^k)\big ) \\\le & {} \Phi _{\alpha ,\omega }(f(\tilde{x})). \end{aligned}$$

Hence,

$$\begin{aligned} 0 \le \limsup _{k \in K} \rho _kP_j(x^k) \le \frac{1}{\alpha } \big [\Phi _{\alpha ,\omega }\big (f(\tilde{x})\big ) - \omega _j\big ] - f_j(\bar{x}) \end{aligned}$$

for all \(j=1,\dots ,m\) and any \(\tilde{x} \in D\). Whence, \(\{\rho _kP(x^k)\}_{k \in K}\) has a bounded subsequence. The convergence and the optimality of the subsequential limit point follow as in the proof of Theorem 1, item 2.

Now we analyze a particular case of those new auxiliary function families. Observe that for \(\Phi (u) : = \max _{i=1,\dots ,m}\{u_i\}\), which is of w-type and w-subadditive, the function given by \(\Phi _{\alpha ,\omega } (u) : = \max _{i=1,\dots ,m}\{\alpha u_i + \omega _i\}\) is continuous, w-increasing and satisfies property \((\mathcal {Q})\), for any \(\alpha \in \mathbb {R}_{++}\) and \(\omega \in \mathbb {R}^m\), but, due to the presence of \(\omega \in \mathbb {R}^m\), it is not subadditive and, for \(\omega \in \mathbb {R}^m\setminus \mathbb {R}^m_+\), in principle it is not of w-type. However, as we will now show, functions of this kind can be used in the local version of MEPM and the results of Theorem 2 remain valid. Let \(x^k \in \mathop {\mathrm {argmin}}_{x \in V} \Phi _{\alpha ,\omega }(f(x) + \rho _k P(x))\), where \(V \subset \mathbb {R}^n\) is a compact vicinity of a point \(\bar{x} \in \mathbb {R}^n\). Assume that a subsequence \(\{x^k\}_{k \in K}\) is such that \(x^k \rightarrow \bar{x}\) as \(K \ni k \rightarrow \infty \). As we know, all we need to show is that \(\{\rho _kP(x^k)\}_{k \in K}\) has a bounded subsequence. For any \(\tilde{x} \in D\cap V\), we have

$$\begin{aligned} \limsup _{K \ni k \rightarrow \infty } \Phi _{\alpha ,\omega }\big (\rho _k P(x^k)\big )&\le \limsup _{K \ni k \rightarrow \infty } \Big [\Phi _{\alpha ,\omega } \big (f(x^k) + \rho _k P(x^k)\big ) + \Phi \big (-\alpha f(x^k)\big )\Big ] \nonumber \\&\le \Phi _{\alpha ,\omega }\big (f(\tilde{x})\big ) + \Phi \big (-\alpha f(\bar{x})\big ), \end{aligned}$$
(13)

where we used the subadditivity of \(\Phi \). We conclude that all results of Theorem 2 hold for \(\{x^k\}\) generated with \(\Phi _{\alpha ,\omega }\) as auxiliary function.

Note that, in the very same way, we can see that any auxiliary function of the form \(u \mapsto \Phi _{\alpha ,\omega }(u) : = \Phi (\alpha u + \omega )\), where \(\Phi \) is a continuous w-increasing function satisfying \(\Phi _{\alpha ,\omega }(u+v) \le \Phi _{\alpha ,\omega }(u) + \Gamma (v)\) for some continuous \(\Gamma :\mathbb {R}^m \rightarrow \mathbb {R}\), will also allow us to prove that \(\{\Phi _{\alpha ,\omega }(\rho _kP(x^k))\}_{k \in K}\) has a bounded subsequence, where, of course, \(x^k \in \mathop {\mathrm {argmin}}_{x \in V}\Phi _{\alpha ,\omega }(f(x) + \rho _kP(x))\). Actually, for \(\Phi (u) = \max _{i=1,\dots ,m}\{u_i\}\), we have \(\Gamma = \Phi \) [see (13)].

So, w-increasing continuous auxiliary functions, different from those used in (4) and in (12), can also be used, by means of iterations like (4) or (12), in order to produce sequences \(\{x^k\}\) which enjoy good convergence properties. (And, clearly, we could have shown similar examples with strong type or strongly subadditive auxiliary functions.) We just need them to be monotonic and such that, in the global case, the iterates exist and, in both global and local cases, they allow us to prove that \(\{\rho _kP(x^k)\}\) has a bounded subsequence.

Let us finish this section with an application of the method for sequences produced with auxiliary functions such as those we have just examined. Actually, we will exhibit a very simple instance of Problem (1) for which there exists a family of auxiliary functions \(\{\Psi _\omega \}_{\omega \in \Omega }\), \(\Omega \subset \mathbb {R}^m\), such that, using local MEPM implemented with these functions, any penalty function and any parameter sequence, by varying \(\omega \in \Omega \), we can retrieve the whole optimal set.

Example 14

Consider \(n=1\), \(m=2\), \(D =[-2,+\infty )\) and \(f :\mathbb {R}\rightarrow \mathbb {R}^2\) defined by \(f_1(x) := x^2 +1\), \(f_2(x) := x^2-2x+1\). In Fig. 1a, we see that, in the interval [0, 1], whenever \(f_2\) decreases, \(f_1\) increases and this happens only in this interval, that is to say, [0, 1] is the weak Pareto optimal set.

Fig. 1 Objective and auxiliary functions from Example 14: a functions \(f_1\) and \(f_2\); b function \(\Psi _{(0,0)}\); c function \(\Psi _{(-1,0)}\); d function \(\Psi _{(-2,0)}\)

We will apply MEPM with \(\Psi _\omega (u) : = \Phi _{1,\omega }(u) = \max _{i=1,2}\{u_i + \omega _i\}\), \(\omega \in \mathbb {R}^2\). First of all, note that, in Fig. 1a, we can also verify that \(f_1(x) \le f_2(x)\) if and only if \(x \le 0\), so \(x \mapsto \Psi _0(f(x))\) has a strict minimizer at \(x=0\). Let us investigate \(\mathop {\mathrm {argmin}}_{x \in D} \Psi _\omega (f(x))\) not just for \(\omega = 0\). It is easy to see that \(x \mapsto \Psi _\omega (f(x))\) has a unique minimizer in [0, 1] at the sole point \(\bar{x} \in [0,1]\) where \(f_1(\bar{x}) +\omega _1 = f_2(\bar{x}) + \omega _2\), that is to say, \(\omega _1 = \omega _2 -2\bar{x}\). Taking \(\omega ^\top : = (-2\bar{x},0)\), we have \(\mathop {\mathrm {argmin}}_{x \in D} \Psi _\omega (f(x)) = \{\bar{x}\}\), since \(f_1(x) +\omega _1 \le f_2(x) +\omega _2\) if and only if \(x \le \bar{x}\) (see Fig. 1b–d for \(\bar{x} = 0, 0.5, 1\), respectively).

Following the proof of Theorem 2, for the auxiliary function \(\Psi _\omega \), with \(\omega ^\top : = (-2\bar{x},0)\), where \(\bar{x} \in [0,1]\), any vector external penalty function P for D, as well as any \(V \subset \mathbb {R}\) compact vicinity of \(\bar{x}\) and any parameter sequence \(\{\rho _k\}\), the generated sequence \(\{x^k\}\) converges to \(\bar{x}\), a weak Pareto optimal point for f in D. This means that, by varying the parameter \(\omega \in \Omega : = [-2,0] \times \{0\}\), the family of auxiliary functions \(\{\Psi _\omega \}_{\omega \in \Omega }\) allows us to retrieve the whole weak Pareto optimal set of the original problem.
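The retrieval of the whole weak Pareto set by varying \(\omega \) can be checked numerically. The sketch below is our own illustration of Example 14, under stated assumptions: a grid over \([-3,3]\) replaces the exact local minimization over a vicinity V, the penalty is fixed as \(P(x) = \max \{0,-(x+2)\}^2\) in both components, and a single large \(\rho \) stands in for the parameter sequence.

```python
# Numerical sketch of Example 14 (assumptions: grid search over [-3, 3]
# replaces the exact local minimization; rho is fixed large).
def recover(xbar, rho=1e4):
    w1 = -2.0 * xbar                              # omega = (-2*xbar, 0)
    best, best_val = None, float("inf")
    for i in range(-30000, 30001):
        x = i / 10000.0
        pen = rho * max(0.0, -(x + 2.0)) ** 2     # external penalty for D = [-2, +inf)
        # Psi_omega(f(x) + rho*P(x)) with f1 = x^2 + 1, f2 = (x-1)^2
        val = max(x**2 + 1.0 + pen + w1, (x - 1.0)**2 + pen)
        if val < best_val:
            best, best_val = x, val
    return best

targets = [0.0, 0.25, 0.5, 0.75, 1.0]             # points of the weak Pareto set [0, 1]
recovered = [recover(t) for t in targets]
print(recovered)                                  # each omega recovers its target xbar
```

Each choice \(\omega = (-2\bar{x}, 0)\) indeed returns its target \(\bar{x}\), in line with the claim that the family \(\{\Psi _\omega \}\) traces out the whole weak Pareto optimal set.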

Of course, this is an ad hoc example, but it may be useful in order to investigate when we have families of auxiliary functions such that, by varying the parameters, we can obtain the whole optimal frontier by means of the corresponding sequences.

7 Final remarks

For the multicriteria optimization setting, we developed an extension of Zangwill’s scalar-valued method. As expected, the multiobjective convergence results are not stronger than those for the classical method. Actually, when restricted to single-objective optimization, they coincide.

An important subject to be examined, which will be left for future research, is when we can obtain the whole Pareto (weak Pareto) frontier by using MEPM. Even though we do not intend to delve into this matter here, let us make some comments on it. Example 14 suggests that Theorem 2 can shed some light on the subject: The fact that MEPM only converges to minimizers of the scalar representations induced by the auxiliary functions (which could be considered a drawback of the method) can be very useful in order to study necessary and/or sufficient conditions for the existence of families of auxiliary functions \(\{\Phi _\omega \}_{\omega \in \Omega }\) with such a property.

Another matter worth studying is the following. Recall that the main hypothesis in Theorem 2 is that \(\Phi \circ f\) has a strict local minimizer within D. This convergence result may also hold under a weaker condition, namely whenever \(S : = \mathop {\mathrm {argmin}}_{x \in U \cap D} \Phi (f(x))\) is an isolated subset of \(V^* : = \{ x \in \mathbb {R}^n :\Phi (f(x)) = v^*\}\), where \(v^* = \inf _{x \in D} \Phi (f(x))\); this means that there exists a closed set \(G \subset \mathbb {R}^n\) such that \(\emptyset \ne S \subset \mathrm {int}(G)\) and \(G \setminus S\subset \mathbb {R}^n \setminus V^*\), where \(\mathrm {int}(G)\) stands for the interior of G. Finally, it would also be interesting to study the generalization of MEPM to the vector optimization case.