1 Introduction

When optimizing in engineering, additional challenges are commonly present. Functions evaluated by running large numerical simulations often lack analytical expressions for derivatives, and numerical approximations are impractical due to the large computational cost involved. There are also cases where the function to be optimized is nonsmooth, which prevents the use of derivative-based techniques. Often there is more than one objective in the optimization process, generally conflicting ones, which motivates the use of multiobjective derivative-free algorithms.

Solution techniques depend on the moment at which the decision maker establishes preferences relating the different objectives to optimize [22]. Methods can be classified as having an a priori articulation of preferences, when objectives are aggregated into a single objective function which is then optimized, generating a single point as the solution to the multiobjective optimization problem. Changes in preferences will cause changes in the aggregating function, and the optimization procedure will need to be reapplied. Additionally, it is well known that this procedure can fail to capture nonconvex parts of the Pareto front [12].

Another approach consists in an a posteriori articulation of preferences. Algorithms belonging to this class attempt to capture the whole Pareto front of the problem, never establishing preferences among the several objectives. We will focus on this last class of methods, which provides the decision maker with a set of alternative solutions, such that selecting one instead of another will always compromise the quality of at least one of the objectives (while improving, at least, another). Typical approaches include evolutionary algorithms, as is the case of NSGA-II [15], CMA-ES [17], or MOPSO [7]. These are stochastic algorithms, for which general convergence proofs have not yet been established. Different classes, presenting a well-established convergence analysis, include trust-region interpolation-based methods, until now only developed for biobjective optimization with the algorithm BOTR [25], and direct search methods, as is the case of DFMO [21], BIMADS [4], or MultiMADS [5]. A review of some classes of multiobjective derivative-free optimization methods can be found in [9].

Direct MultiSearch (DMS) [11] is a well-established direct search method, based on a posteriori articulation of preferences. In [11] convergence results were derived, stating that at least one limit point of the sequence of iterates generated by DMS lies in a stationary form of a Pareto front. Intensive numerical testing showed its competitiveness with other state-of-the-art algorithms, such as NSGA-II [15] or BIMADS [4]. As a result of its good performance, DMS continues to be used to benchmark new derivative-free multiobjective optimization algorithms [21].

Nevertheless, as mentioned in [14], the optimization of multimodal functions raises issues regarding the convergence to the true Pareto front of a problem (the global Pareto front). In fact, this question is not specific to multiobjective optimization, since global optimization is an active field of research for single objective optimization, both in the presence and in the absence of derivatives.

Recently, in the context of single objective derivative-free optimization, the algorithm GLODS [10] was proposed as a strategy for identifying global minimizers. GLODS is a directional direct search method [8] equipped with a clever multistart strategy. The different searches initialized by the multistart strategy are not all conducted until the end. Rather, when points generated by different searches start to be close to each other, GLODS will merge the searches, giving up on the ones that are not promising. This procedure proved to be competitive when compared with state-of-the-art solvers in global derivative-free optimization. Moreover, numerical experiments showed its capability of identifying all the local (and global) optima of a problem, a distinguishing feature with respect to the remaining global derivative-free optimization solvers.

The goal of the present work is to address the computation of the global Pareto front of the problem:

$$\begin{aligned} \begin{aligned} \min \quad&F(x)\equiv (f_1(x),\ldots ,f_m(x))\\ \text{ s.t. } \quad&x\in \varOmega \subset \mathbb {R}^n, \end{aligned} \end{aligned}$$
(1)

where \(F:\mathbb {R}^n\rightarrow \mathbb {R}^m\cup \{(+\infty ,\ldots ,+\infty )\}\), \(n,m\ge 1\), represents an extended real-valued vector function, and the compact set \(\varOmega \subset \mathbb {R}^n\) denotes the feasible region of the problem. Additionally, we would like to identify local Pareto fronts of the problem, when they exist. We assume that derivatives are not available for use, nor can they be numerically approximated.

Constraints will be addressed through an extreme barrier approach, meaning that only feasible points will be evaluated. In a context of expensive function evaluation, this avoids unnecessary computations. We also note that when the objective function represents a real application, the evaluation of infeasible points could be impossible, corresponding to points with no physical meaning. The extreme barrier approach will be implemented through the use of a barrier function:

$$\begin{aligned} F_\varOmega (x)=\left\{ \begin{array}{ll}F(x), &\quad \text {if }\; x \in \varOmega ,\\ (+\infty ,\ldots ,+\infty ), &\quad \text {otherwise.} \end{array}\right. \end{aligned}$$
(2)
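As an illustration, the extreme barrier (2) admits a very direct implementation. The sketch below (in Python, with hypothetical names `F` and `is_feasible` supplied by the user) simply maps infeasible points to a vector of \(+\infty \), so that they are never preferred in any dominance comparison.

```python
import numpy as np

def extreme_barrier(F, is_feasible, x, m):
    """Evaluate F_Omega(x) as in (2): return F(x) when x is feasible and a
    vector of +inf otherwise, so infeasible points are never preferred in a
    dominance comparison. F, is_feasible and m (number of objectives) are
    user-provided."""
    if is_feasible(x):
        return np.asarray(F(x), dtype=float)
    return np.full(m, np.inf)
```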

Using the concept of Pareto dominance, we will generalize the approach followed by GLODS to multiobjective directional direct search, conferring a global flavor to DMS. In Sect. 2 we describe the proposed algorithmic structure. The theoretical results associated with the new algorithm are presented in Sect. 3, and Sect. 4 illustrates the corresponding global features. We end in Sect. 5 with some conclusions.

2 MultiGLODS: global and local multiobjective optimization using direct search

In multiobjective optimization, where the objective function presents several components, the concept of dominance is used for comparing pairs of points. We say that point x dominates point y, and represent it by \(x\prec y\), if the following condition is satisfied:

$$\begin{aligned} x\prec y \Leftrightarrow F_{\varOmega }(y)-F_{\varOmega }(x)\in \mathbb {R}^m_+{\setminus }\{0\}. \end{aligned}$$
(3)
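In code, the dominance relation (3) reduces to a componentwise comparison. A minimal sketch, assuming the objective vectors have already been evaluated through the barrier (2), could read:

```python
import numpy as np

def dominates(Fx, Fy):
    """Return True if x dominates y in the sense of (3): F(y) - F(x) belongs
    to the nonnegative orthant of R^m excluding the origin, i.e. Fx <= Fy
    componentwise with at least one strict inequality."""
    Fx, Fy = np.asarray(Fx), np.asarray(Fy)
    return bool(np.all(Fx <= Fy) and np.any(Fx < Fy))
```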

It is now possible to define what a solution of problem (1) is.

Definition 1

A point \(x_*\in \varOmega \) is said to be a global Pareto minimizer of F in \(\varOmega \) if \(\not \exists y\in \varOmega \) such that \(y\prec x_*\). If there exists a neighborhood \(\mathcal {N}(x_*)\) of \(x_*\) such that the previous property holds in \(\varOmega \cap \mathcal {N}(x_*)\), then \(x_*\) is called a local Pareto minimizer of F.

In general, a problem will have several points satisfying Definition 1. The set of all these points will define a Pareto front for the problem (global or local, depending on the type of condition satisfied).

As in any other directional direct search method, each iteration of MultiGLODS is organized into a search step and a poll step. The convergence properties of the algorithm are a direct consequence of the poll step. In the context of global optimization, the search step is mainly responsible for spreading the search across the feasible region, ensuring that promising subregions will be located. The quality of the computed Pareto front (as being local or global to the problem) is, in general, a consequence of the search step.

The algorithm keeps a list of feasible points, \(L_k\), which can be updated both at the search and at the poll steps. Points are stored in this list as tuples \((x;\alpha _x;r_x;i_x)\), where x represents the point to be stored, \(\alpha _x\) a step size, \(r_x\) a comparison radius, and \(i_x\) a binary indicator, which takes the value 1 if the point is active and 0 otherwise.
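A minimal representation of these tuples, used only for illustration in the sketches below (the field names are our own), could be:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ListPoint:
    """One entry (x; alpha_x; r_x; i_x) of the list L_k."""
    x: np.ndarray   # the stored point
    alpha: float    # step size parameter alpha_x
    radius: float   # comparison radius r_x
    active: bool    # i_x = 1 (active) or 0 (inactive)
```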

Similarly to GLODS [10], a point is added to the list as active or inactive. A point is added as active if it is far from all the points already stored, meaning that it is located in a part of the feasible region not yet explored (see lines 1 and 2 in Algorithm 2.2). The radius \(r_y\) is used in point comparisons, as a measure of closeness (see lines 1 and 3 in Algorithm 2.2). Figure 1 illustrates this situation for a biobjective optimization problem. On the left we have a plot in the variable space, where the shaded box represents the feasible region. On the right we have a plot in the objective function space. Point \(L_1\) is already in the list. Point \(P_1\) will be added to the list as an active point, since it is not comparable with \(L_1\) (the comparison radius of \(L_1\) corresponds to the blue circle). This is a distinguishing feature of MultiGLODS when compared with DMS. In DMS, since point \(L_1\) dominates \(P_1\), point \(P_1\) would not be added to the list, regardless of being located in a different part of the feasible region.

Fig. 1 Adding active points to the list in MultiGLODS: points far from all the points already stored. (Color figure online)

Alternatively, points close to points already stored, which dominate active points, are also added to the list. If a point dominates an active point and is not dominated by any other close point already in the list, it will be added to the list as active (see lines 5 and 6 in Algorithm 2.2). This situation is represented in Fig. 2. In this case, point \(P_2\) is comparable with \(L_1\), an active point already in the list, and dominates it. This means that \(P_2\) will be added to the list as an active point and \(L_1\) will remain in the list, but its status will change to inactive. An active point already in the list can change its status to inactive if it is dominated by a new point added to the list (see line 4 in Algorithm 2.2). An inactive point will never change its status back to active.

Fig. 2 Adding active points to the list in MultiGLODS: points close to some points already stored, dominating active points and being nondominated. (Color figure online)

A point can also be added to the list as inactive when it dominates an active point but is itself dominated by another point (see line 6 in Algorithm 2.2). In Fig. 3 the blue point is comparable both with points \(A_5\) and \(B_2\), dominates the active point \(B_2\), but is dominated by \(A_5\). This means that the blue point will be added to the list as an inactive point and point \(B_2\) will change its status to inactive. This procedure motivates the definition of a merging iteration. The sequences of points indexed by \(A_i\) and \(B_i\) have been merged.

Fig. 3 Adding inactive points to the list in MultiGLODS: merging searches. (Color figure online)

The possibility of adding to the list points that are dominated by other points already stored, because they are located in a different part of the feasible region, is one of the key features that distinguishes MultiGLODS from DMS. Adding inactive points to the list, or keeping points in the list after their status changes to inactive, is another one. This allows the algorithm to track which parts of the feasible region have already been explored, avoiding unnecessary initializations of new searches, unless there is clear evidence of an improvement in the corresponding region (when a new point that dominates an active point is found).

The relevance of the active indicator \(i_x\) and of the step size parameter \(\alpha _x\) relates to the poll step of the algorithm. Each time this step is performed, one active point x will be selected as a poll center and a local search around it will be conducted. This local search corresponds to testing directions belonging to a positive spanning set D [13], scaled by the step size parameter \(\alpha _x\). As will become clear in the convergence analysis (see Sect. 3), the set of poll directions is not required to positively span \(\mathbb {R}^n\) (although for coherency with the smooth case we will write it so in the algorithm below), and it is not necessarily drawn from a finite set of directions. Opportunistic or complete approaches can be used in the polling procedure. In the first case, the procedure stops once a new active point is added to the list. In the latter, all poll directions are tested.
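The mechanics just described can be sketched as follows (Python, hypothetical names; `try_to_add` stands for the list-updating procedure of Algorithm 2.2, described further below, and is assumed to report whether a new active point was added):

```python
import numpy as np

def poll(center, D, F_Omega, try_to_add, opportunistic=False):
    """Poll around the active point `center` (a ListPoint-like object),
    testing the directions in D scaled by the step size of the poll center."""
    success = False
    for d in D:
        trial = center.x + center.alpha * np.asarray(d, dtype=float)
        if try_to_add(trial, F_Omega(trial)):
            success = True
            if opportunistic:   # stop polling at the first new active point
                break
    return success
```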

Different strategies can be used for generating new points at the search step. Random sampling [26], Latin hypercube sampling [23], Sobol sequences [19], Halton sequences [19], or \(2^n\)-Centers [10] are some possibilities. The major requirement is the use of a sequence of points that is asymptotically dense in a compact set. Additionally, points could be required to belong to an implicit mesh, depending on the globalization strategy considered (see Sect. 3).
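Any of these generators can be used, as long as the resulting sequence is asymptotically dense in the feasible compact set. A simple random-sampling sketch over a box \([l,u]\), one of the possibilities mentioned above, is:

```python
import numpy as np

def search_sample(l, u, n_points, rng=None):
    """Draw n_points uniformly at random in the box [l, u]; with probability
    one the generated sequence of points is asymptotically dense in the box."""
    rng = np.random.default_rng() if rng is None else rng
    l, u = np.asarray(l, dtype=float), np.asarray(u, dtype=float)
    return l + rng.random((n_points, l.size)) * (u - l)
```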

As a result of the two steps described, each iteration is classified as successful, unsuccessful, or merging. Successful iterations correspond to at least one active point being added to the list. In this case, the corresponding step size parameter can be maintained or increased. Unsuccessful iterations occur when the list does not change. In this case, the step size corresponding to the poll center must be decreased. Adding only inactive points to the list corresponds to a merging iteration. In this case step size parameters are kept unchanged.

The comparison radius should always allow the comparison between the poll points and the corresponding poll center. Thus, if after a successful iteration this property no longer holds, as a consequence of updating the step size parameter, the comparison radius will be increased to an adequate value.
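One sufficient way of enforcing this property, sketched below under the assumption that every poll point lies at distance at most \(\alpha _x\max _{d\in D}\Vert d\Vert \) from the poll center, is to keep \(r_x\ge \alpha _x\max _{d\in D}\Vert d\Vert \); the function and attribute names are the illustrative ones introduced above.

```python
import numpy as np

def update_after_success(point, gamma, D):
    """After a successful iteration, possibly expand the step size of `point`
    (gamma >= 1) and make sure its comparison radius still covers every poll
    point, by keeping r_x >= alpha_x * max_{d in D} ||d|| (sketch only)."""
    point.alpha *= gamma
    max_norm = max(np.linalg.norm(np.asarray(d, dtype=float)) for d in D)
    point.radius = max(point.radius, point.alpha * max_norm)
```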

At each iteration, if the search step fails to add a new active point to the list of points, the poll step needs to be performed. The search step does not need to be executed at every iteration (when it is not conducted, it is considered a failure). Different strategies can be implemented to decide whether the search step should be performed or not. Some possibilities could consider the frequency of unsuccessful iterations or the size of the step size parameters of the active points.

A detailed description of the method proposed can be found in Algorithm 2.1.

[Algorithm 2.1]

Using (3), let us denote by Dom(x) the subset of \(\mathbb {R}^m\) corresponding to the images of the set of points dominated by x. Algorithm 2.2 corresponds to the procedure used both in the search and poll steps to add new points to the list.

[Algorithm 2.2]

Function \(\bar{\rho }(.)\) is related to the type of globalization strategy considered in the algorithm (see Sect. 3). If globalization is based on the use of integer lattices, it represents the constant zero function. When globalization results from requiring sufficient decrease for accepting new points, \(\bar{\rho }(.)\equiv \rho (.)\) will be a forcing function \(\rho :(0,+\infty )\rightarrow (0,+\infty )\), i.e., a continuous and nondecreasing function, satisfying \(\rho (t)/t \rightarrow 0\) when \(t \downarrow 0\) (see [20]). Typical examples of forcing functions are \(\rho (t) = t^{1+a}\), for \(a > 0\).
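For instance, with \(a=1\) the forcing function and its defining property are immediate to verify; the tiny sketch below is only illustrative.

```python
def rho(t, a=1.0):
    """Forcing function rho(t) = t^(1+a), a > 0: continuous, nondecreasing
    on (0, +inf), and rho(t)/t = t^a -> 0 as t -> 0."""
    return t ** (1.0 + a)
```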

Considering integer lattices as the globalization strategy allows an additional situation in which points are added to the list as active. This occurs when the new point is comparable with other points already stored in the list (independently of them being active or inactive points) and it is not dominated by any of them (see line 6 in Algorithm 2.2).

When adding points to the list, the corresponding step size parameters and comparison radii need to be defined. Similarly to GLODS [10], if the point is not comparable with any of the points already in the list, meaning that it belongs to a part of the feasible region not yet explored, the algorithm uses the initialization values (line 2 in Algorithm 2.2). Otherwise, if the point was generated at the poll step, both parameters will be equal to the ones of the poll center (line 8 in Algorithm 2.2). When the new point x was generated in the search step, it inherits the parameters of the comparable point y with the largest step size for which \(F(y)\in Dom(x)+\bar{\rho }(\alpha _y)\) (line 7 in Algorithm 2.2).
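The essence of this list-updating procedure can be condensed into the following simplified sketch (Python, hypothetical names, reusing `dominates` and `rho` from above). It is not a substitute for Algorithm 2.2: objective values are cached in the list entries for convenience, the additional acceptance rule available under integer lattices is omitted, and the same inheritance rule is applied to both search and poll points.

```python
import numpy as np
from types import SimpleNamespace

def try_to_add(x, Fx, L, alpha0, r0, rho_bar, dominates):
    """Simplified sketch of the list-updating procedure (Algorithm 2.2).
    Each entry of L carries x, F, alpha, radius and active.
    Returns True if x was added to L as an active point."""
    close = [y for y in L if np.linalg.norm(x - y.x) <= y.radius]
    if not close:
        # far from every stored point: new region of the feasible set,
        # add as active with the initialization values
        L.append(SimpleNamespace(x=x, F=Fx, alpha=alpha0, radius=r0, active=True))
        return True
    # comparable active points dominated by x (with margin rho_bar)
    beaten = [y for y in close
              if y.active and dominates(Fx + rho_bar(y.alpha), y.F)]
    if not beaten:
        return False                      # x is discarded
    dominated = any(dominates(y.F, Fx) for y in close)
    for y in beaten:
        y.active = False                  # deactivate the dominated active points
    parent = max(beaten, key=lambda y: y.alpha)  # inherit from the comparable
    L.append(SimpleNamespace(x=x, F=Fx,          # point with the largest step size
                             alpha=parent.alpha,
                             radius=parent.radius,
                             active=not dominated))
    return not dominated
```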

3 Convergence analysis

The convergence analysis of MultiGLODS relies on the properties of the poll step, which begins with the choice of a poll center from the active points stored in the list. Merging iterations in MultiGLODS correspond to situations where no new active points are added to the list and some stored active points change their status to inactive. Thus, it is crucial to ensure that at each iteration of MultiGLODS there is always an active point in the list that could be selected as poll center. This is a major difference with respect to DMS, where all the points kept in the list are candidates for poll centers.

Proposition 1

At the end of each iteration of Algorithm 2.1, all elements of the set of nondominated points in the list are active.

Proof

Suppose not. Let z be one inactive point of the set of nondominated points in the list, computed at the end of the current iteration. Two situations need to be analyzed:

  • z could have been added as inactive to the list (and to the set of nondominated points), during the current iteration;

  • z was an active point already in the list (and in the set of nondominated points), but has changed its status to inactive during the current iteration.

In the latter situation, there must have been a point x such that \(F(z)\in Dom(x)+\bar{\rho }(\alpha _z)\subseteq Dom(x)\). Since z was active, x will be added to the list of points, contradicting the fact that z is nondominated.

In the former situation, there must have been a point y already in the list such that \(F(z)\in Dom(y)\), again contradicting the fact that z is nondominated. \(\square \)

Similarly to any other directional direct search method, the convergence analysis of MultiGLODS starts by establishing the existence of a subsequence of step size parameters that converges to zero. This will allow us to ensure the existence of at least one limit point for the sequence of iterates generated by the algorithm. The stationarity properties of this limit point will be further analyzed in Sect. 3.4.

Two globalization strategies can be adopted in order to ensure the existence of a subsequence of step size parameters with the above mentioned property. The first considers that all points generated by the algorithm lie in an implicit mesh (corresponding to an integer lattice) and will be analyzed in Sect. 3.1. In this case \(\bar{\rho }(.)\equiv 0\). Another possibility is to trade the freedom in the type of points generated by the algorithm for a stricter condition when accepting new points. In this case \(\bar{\rho }(.)\) corresponds to a forcing function (see Sect. 3.2).

For establishing the results, we will need the following two assumptions.

Assumption 3.1

The set \(\varOmega \subset \mathbb {R}^n\) is compact.

Assumption 3.2

The function F is lower bounded in \(\varOmega \subset \mathbb {R}^n\).

3.1 Using integer lattices

The type of positive spanning sets that can be used by the algorithm depends on the level of smoothness of the objective function. If the function is continuously differentiable, a finite set of directions with appropriate integrality requirements suffices [2, 20].

Assumption 3.3

The set \(\mathcal {D}=D\) of positive spanning sets is finite and the elements of D are of the form \(G\bar{z}_j\), \(j=1,\ldots ,|D|\), where \(G \in \mathbb {R}^{n \times n}\) is a nonsingular matrix and each \(\bar{z}_j\) is a vector in \(\mathbb {Z}^n\).
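A simple instance satisfying this assumption, and the one adopted in the numerical experiments of Sect. 4, is the set of positive and negative coordinate directions:

$$\begin{aligned} D \; = \; [\, I \;\; -I \,] \; = \; \{e_1,\ldots ,e_n,-e_1,\ldots ,-e_n\}, \qquad G \; = \; I, \quad \bar{z}_j \in \{\pm e_1,\ldots ,\pm e_n\}\subset \mathbb {Z}^n. \end{aligned}$$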

In the presence of nonsmooth functions, the integrality requirements should be respected but, additionally, the union of the sets of directions (after normalization) considered through the iterations needs to be asymptotically dense in the unit sphere [3].

Assumption 3.4

Let D represent a finite set of positive spanning sets satisfying Assumption 3.3.

The set \(\mathcal {D}\) is so that the elements \(d_k \in D_k \in \mathcal {D}\) satisfy the following conditions:

  1. \(d_k\) is a nonnegative integer combination of the columns of D.

  2. The distance between \(x_k\) and the point \(x_k+\alpha _k d_k\) tends to zero if and only if \(\alpha _k\) does:

    $$\begin{aligned} \lim _{k\in K} \alpha _k \Vert d_k \Vert \; = \; 0 \; \Longleftrightarrow \;\lim _{k\in K}\alpha _k \; = \; 0, \end{aligned}$$

    for any infinite subsequence K.

  3. The limits of all convergent subsequences of \(\bar{D}_k = \{ d_k/\Vert d_k\Vert : \; d_k\in D_k\}\) are positive spanning sets for \(\mathbb {R}^n\).

The third requirement above is included as part of the Mesh Adaptive Direct Search (MADS) original presentation [3], for coherency with the smooth case, but it is not used in the convergence analysis for nonsmooth objective functions.

Integrality requirements also impose conditions on the update of the step size parameter.

Assumption 3.5

Let \(\tau > 1\) be a rational number and \(m^{max} \ge 0\) and \(m^{min} \le -1\) integers. If the iteration is successful, then the step size parameter is maintained or increased by considering \(\alpha _{new}=\tau ^{m^{+}} \alpha \), with \(m^{+}\in \{0,\ldots ,m^{max}\}\). If the iteration is unsuccessful, then the step size parameter is decreased by setting \(\alpha _{new}=\tau ^{m^{-}}\alpha \), with \( m^{-} \in \{m^{min},\ldots ,-1\}\).

Notice that the step size update strategy proposed in Algorithm 2.1 complies with the one of Assumption 3.5 by setting \(\beta _1 = \tau ^{m^{min}}\), \(\beta _2 = \tau ^{-1}\), and \(\gamma = \tau ^{m^{max}}\).
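For instance, the update used in the numerical experiments of Sect. 4 (step size kept constant at successful iterations and halved at unsuccessful ones) corresponds to taking

$$\begin{aligned} \tau =2, \quad m^{max}=0, \quad m^{min}=-1, \qquad \text {so that}\quad \gamma =\tau ^{m^{max}}=1 \quad \text {and}\quad \beta _1=\beta _2=\tau ^{-1}=1/2. \end{aligned}$$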

Additionally, the points generated by the search step need to lie in the implicit mesh considered at each iteration by the algorithm (trivially all the poll points generated by the algorithm will also lie in this implicit mesh).

Assumption 3.6

At iteration k, the search step in Algorithm 2.1 only evaluates points in

$$\begin{aligned} M_k \; = \; \bigcup _{x\in E_k}\{x+\alpha _k D z:z\in \mathbb {N}_0^{|D|}\}, \end{aligned}$$

where \(E_k\) represents the set of all points evaluated by the algorithm previously to iteration k.

We are now in a position to establish the first result, regarding the sequence of step size parameters generated by MultiGLODS. Since both successful and merging iterations correspond to adding new feasible points to the list, the arguments considered are quite similar to the ones used to establish the same type of result for DMS.

Theorem 1

Let Assumption 3.1 hold. Algorithm 2.1, under Assumption 3.3 or 3.4 combined with Assumptions 3.5 and 3.6, and with \(\bar{\rho }(\cdot ) \equiv 0\), generates a sequence of iterates satisfying

$$\begin{aligned} \liminf _{k \rightarrow + \infty } \alpha _k \; = \; 0. \end{aligned}$$

Proof

Let us assume that there is \(\alpha _{*}\) such that \(\alpha _k> \alpha _{*} > 0, \forall k\). Assumption 3.3 or 3.4, combined with Assumptions 3.5 and 3.6, ensures that all points generated by Algorithm 2.1 lie in an integer lattice (see [2, 3]). The intersection of a compact set with an integer lattice is finite. Since \(\varOmega \) is compact, there is only a finite number of different points that could be generated by the algorithm. Successful or merging iterations correspond to at least one new feasible point added to the list. Once a point is added to this list it will always remain in it (possibly changing its status to inactive). Thus, the number of successful and merging iterations must be finite, and consequently there is an infinite number of unsuccessful iterations and a finite number of points in the list. The step size at unsuccessful iterations is reduced by at least \(\beta _2 \in ] 0,1 [\), which contradicts the existence of a lower bound for the step size parameter. \(\square \)

3.2 Imposing sufficient decrease

When the globalization strategy is based on imposing a sufficient decrease condition, there is more flexibility in the update of the step size parameter and in the type of directions that could be considered by the algorithm. In fact, the only requirement is now expressed in Assumption 3.7, and it is trivially satisfied since we are considering bounded sets of directions.

Assumption 3.7

The distance between \(x_k\) and the point \(x_k+\alpha _k d_k\) tends to zero if and only if \(\alpha _k\) does:

$$\begin{aligned} \lim _{k\in K} \alpha _k \Vert d_k \Vert \; = \; 0 \; \Longleftrightarrow \;\lim _{k\in K}\alpha _k \; = \; 0, \end{aligned}$$

for all \(d_k \in D_k\) and for any infinite subsequence K.

A result similar to the one of Theorem 1 can now be derived. Differently from DMS, we start by establishing that, after some iterations, any new active point added to the list has to be compared with some points already stored in it.

Theorem 2

Let Assumptions 3.1 and 3.2 hold. Algorithm 2.1, when \(\bar{\rho }(\cdot )\) is a forcing function and Assumption 3.7 holds, generates a sequence of iterates satisfying

$$\begin{aligned} \liminf _{k \rightarrow + \infty } \alpha _k \; = \; 0. \end{aligned}$$

Proof

Assume that there is \(\alpha _{*}\) such that \(\alpha _k> \alpha _{*} > 0, \forall k\). Let us start by showing that there is an infinite number of successful iterations.

Suppose not. Active points are added to the list only at successful iterations. Thus, the number of active points in the list must be finite. At each merging iteration at least one of the active points in the list changes its status to inactive. Thus, the number of merging iterations is also finite.

Consequently, the number of unsuccessful iterations (where no points are added to the list) needs to be infinite. At each unsuccessful iteration the step size parameter of the corresponding active poll center is reduced by at least \(\beta _2 \in ] 0,1 [\), which contradicts the existence of the lower bound \(\alpha _{*} > 0\) for the step size.

The previous arguments allow us to conclude that there is an infinite number of successful iterations. Let x represent a new feasible active point added to the list \(L_k\) at iteration k. Then, either \(\displaystyle \min _{y\in L_k}(\Vert x-y\Vert -r_y)>0\) or there must exist an active point \(y \in L_k\) such that \(\Vert x-y\Vert \le r_y\) and \(F(y)\in Dom(x)+\bar{\rho }(\alpha _y)\).

Let us assume that for each successful iteration k there is always a new active point, \(x_{k+1}\in \varOmega \), to be added to \(L_k\), such that \( \displaystyle \min _{y\in L_k}(\Vert x_{k+1}-y\Vert -r_y)>0\). Thus,

$$\begin{aligned} \forall y \in L_k, \Vert x_{k+1}-y\Vert>r_y \ge d_{min}\alpha _y> d_{min}\alpha _{*}>0. \end{aligned}$$

Once a point is added to the point list it will always remain in it (possibly becoming inactive). Thus, at each successful iteration the measure of

$$\begin{aligned} \varOmega {\setminus }\bigcup _{k\in S} B(x_{k+1};d_{min}\alpha _{*}) \end{aligned}$$

decreases by a strictly positive quantity. Here S represents the set of indices corresponding to successful iterations and \(B(x_{k+1};d_{min}\alpha _{*})\) the closed ball of radius \(d_{min}\alpha _{*}\) centered at \(x_{k+1}\). Since \(\varOmega \) is compact, there must exist \(k^{*}\in \mathbb {N}\) such that, for each successful iteration \(k \ge k^{*}\) and for each new feasible active point x added to \(L_k\), there is an active point \(y \in L_k\), which changes its status to inactive, with \( \Vert x-y\Vert \le r_y\) and \(F(y)\in Dom(x)+\bar{\rho }(\alpha _y)\). Points are only added to the list at successful and merging iterations. For each point x added to \(L_k\) at a merging iteration, there must also exist an active point \(y \in L_k\), which changes its status to inactive, such that \( \Vert x-y\Vert \le r_y\) and \(F(y)\in Dom(x)+\bar{\rho }(\alpha _y)\). Thus, each time a point is added to the list, a hypercube of side no smaller than \(\rho _*=\bar{\rho }(\alpha _*)>0\) (recall that \(\bar{\rho }\) is nondecreasing and \(\alpha _y>\alpha _*\)) is added to the dominance region of an active point already in the list, which changes its status to inactive. After a finite number of iterations, a hypercube of side no smaller than \(\rho _*\) would have been added to the dominance region defined by all the points in the list. Since there is an infinite number of successful iterations, this contradicts Assumption 3.2. \(\square \)

3.3 Refining subsequences, refining directions and generalized directional derivatives

Convergence results for MultiGLODS are derived for limit points of the so-called refining subsequences.

Definition 2

A subsequence \(\{x_k\}_{k\in K}\) of iterates corresponding to unsuccessful poll steps is said to be a refining subsequence if \(\{\alpha _k\}_{k\in K}\) converges to zero.

Having established the existence of a subsequence of step size parameters converging to zero, the updating rules of the step size parameter allow us to ensure the existence of at least one convergent refining subsequence (see, for example,  [8]).

Theorem 3

Let the conditions required for establishing Theorems 1 or 2 hold. Algorithm 2.1 generates at least one refining subsequence \(\{x_k\}_{k\in K}\), converging to \(x_{*} \in \varOmega \).

The behavior of MultiGLODS will be analyzed at limit points of convergent refining subsequences, along refining directions. The latter concept was introduced in [3], in the context of MADS.

Definition 3

Let \(x_*\) be the limit point of a convergent refining subsequence \(\{x_k\}_{k\in K}\). If the limit \(\lim _{k\in K'} d_k/\Vert d_k\Vert \) exists, where \(K'\subseteq K\) and \(d_k\in D_k\), and if \(x_k+\alpha _k d_k \in \varOmega \), for sufficiently large \(k \in K'\), then this limit is said to be a refining direction for \(x_*\).

Given the nonsmoothness present in the objective function, we will need to use a generalized definition of Pareto stationarity. This definition makes use of the Clarke-Jahn [18] generalized directional derivative, computed for directions belonging to the tangent cone to the feasible region or to its interior.

Definition 4

A vector \(d\in \mathbb {R}^n\) is said to be a Clarke tangent vector to the set \(\varOmega \subset \mathbb {R}^n\) at the point x in the closure of \(\varOmega \) if for every sequence \(\{y_k\}\) of elements of \(\varOmega \) that converges to x and for every sequence of positive real numbers \(\{t_k\}\) converging to zero, there exists a sequence of vectors \(\{w_k\}\) converging to d such that \(y_k+t_kw_k\in \varOmega \).

The Clarke tangent cone to \(\varOmega \) at x (\(T_\varOmega ^{Cl}(x)\)) is defined as the set of all Clarke tangent vectors to \(\varOmega \) at x and it is a generalization of the tangent cone commonly used in Nonlinear Programming (see, e.g., [24, Definition 12.2 and Fig. 12.8]).

The interior of this cone defines the hypertangent cone (\(T_\varOmega ^H(x)\)), which corresponds to the set of hypertangent vectors to \(\varOmega \) at x.

Definition 5

A vector \(d\in \mathbb {R}^n\) is said to be a hypertangent vector to the set \(\varOmega \subset \mathbb {R}^n\) at the point x in \(\varOmega \) if there exists a scalar \(\epsilon >0\) such that

$$\begin{aligned} y+tw\in \varOmega ,\quad \forall y\in \varOmega \cap B(x;\epsilon ),\quad w\in B(d;\epsilon ),\quad \text {and}\quad 0<t<\epsilon . \end{aligned}$$

The Clarke tangent cone can also be regarded as the closure of the hypertangent cone.

The Clarke-Jahn generalized directional derivative is well defined for locally Lipschitz continuous functions. The function F is said to be Lipschitz continuous near x if each \(f_i\), \(i=1,\ldots ,m\), is Lipschitz continuous in a neighborhood of x.

Under these conditions, the Clarke-Jahn generalized directional derivative can be defined for a component \(f_j\) of F, in a direction d belonging to the hypertangent cone to \(\varOmega \) at x, as

$$\begin{aligned} f_j^\circ (x;d) \; = \; \limsup _{\begin{array}{c}x' \rightarrow x, x'\in \varOmega \\ t\downarrow 0,x'+td\in \varOmega \end{array}}\frac{f_j(x'+td)-f_j(x')}{t}, \quad j=1,\ldots ,m. \end{aligned}$$
(4)

The extension of this derivative to directions v in the tangent cone to \(\varOmega \) at x is computed by taking a limit, i.e., \(\displaystyle f_j^\circ (x;v) = \lim \nolimits _{d \in T_\varOmega ^H(x), d \rightarrow v} f_j^\circ (x;d)\), for \(j=1,\ldots ,m\) (see Proposition 3.9 in [3]).

We are now in a position to define the type of stationarity results that we intend to obtain for MultiGLODS.

Definition 6

Let F be Lipschitz continuous near a point \(x_*\in \varOmega \). We say that \(x_*\) is a Pareto-Clarke critical point of F in \(\varOmega \) if, for all directions \(d\in T_\varOmega ^{Cl}(x_*)\), there exists a \(j\in \{1,\ldots ,m\}\) such that \(f_j^\circ (x_*;d) \ge 0\).

When each component of the objective function is strictly differentiable, the previous definition can be restated using the gradient vectors.

Definition 7

Let F be strictly differentiable at a point \(x_*\in \varOmega \). We say that \(x_*\) is a Pareto-Clarke-KKT critical point of F in \(\varOmega \) if, for all directions \(d\in T_\varOmega ^{Cl}(x_*)\), there exists a \(j\in \{1,\ldots ,m\}\) such that \(\nabla f_j(x_*)^\top d \ge 0\).
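When \(x_*\) lies in the interior of \(\varOmega \) (so that \(T_\varOmega ^{Cl}(x_*)=\mathbb {R}^n\)) and gradients happen to be available, Definition 7 can be probed numerically by sampling directions on the unit sphere. The sketch below is only an illustration of the definition under these assumptions, not part of the algorithm, which never uses derivatives.

```python
import numpy as np

def looks_pareto_clarke_kkt(grads, n_dirs=10000, rng=np.random.default_rng(0)):
    """Sample directions d on the unit sphere and check that, for each of
    them, some objective j satisfies grad f_j(x_*)^T d >= 0 (Definition 7,
    unconstrained case). grads is an (m, n) array of gradients at x_*."""
    grads = np.asarray(grads, dtype=float)
    n = grads.shape[1]
    D = rng.standard_normal((n_dirs, n))
    D /= np.linalg.norm(D, axis=1, keepdims=True)
    # for every sampled d, max_j grad f_j(x_*)^T d must be nonnegative
    return bool(np.all(np.max(D @ grads.T, axis=1) >= -1e-12))
```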

3.4 Convergence results

Let us start by stating a first stationarity result, not for the whole set of directions belonging to the Clarke tangent cone to the feasible region, but only for refining directions in the hypertangent cone. Crucial to establishing this result is the fact that the comparison radius always allows the comparison between the poll center and the poll points. The proof follows the classical reasoning of directional direct search and, in particular, the one of DMS.

Theorem 4

Consider a refining subsequence \(\{x_k\}_{k\in K}\) converging to \(x_* \in \varOmega \) and let \(d\in T_\varOmega ^H(x_*)\) be a refining direction for \(x_*\). Assume that F is Lipschitz continuous near \(x_*\). Then there exists a \(j\in \{1,\ldots ,m\}\) such that \(f_j^\circ (x_*;d) \ge 0\).

Proof

Let \(\{x_k\}_{k\in K}\) be a refining subsequence converging to \(x_* \in \varOmega \) and

$$d=\lim _{k\in K''}d_k/\Vert d_k\Vert \in T_\varOmega ^H(x_*)$$

a refining direction for \(x_*\), with \(d_k\in D_k\) and \(x_k+\alpha _k d_k \in \varOmega , \; \forall k\in K'' \subseteq K\).

Since F is Lipschitz continuous near \(x_*\), the Clarke-Jahn generalized directional derivative is well defined for each \(f_j(x_*), j=1,\ldots ,m\) and we have:

$$\begin{aligned} f_j^{\circ }(x_*;d)&= \limsup _{\begin{array}{c}x\rightarrow x_*, x\in \varOmega \\ t \downarrow 0, x+td \in \varOmega \end{array}} \frac{f_j(x+t d)-f_j(x)}{t}\\&\ge \limsup _{k\in K''}\frac{f_j(x_k+\alpha _k \Vert d_k\Vert ( d_k / \Vert d_k \Vert ) )-f_j(x_k)}{\alpha _k \Vert d_k\Vert } + r_k\\&= \limsup _{k\in K''}\frac{f_j(x_k+\alpha _k d_k)-f_j(x_k)+\bar{\rho }(\alpha _k)}{\alpha _k \Vert d_k\Vert }- \frac{\bar{\rho }(\alpha _k)}{\alpha _k \Vert d_k\Vert } + r_k. \end{aligned}$$

The first inequality follows from \(\{x_k\}_{k\in K''}\) being a feasible refining subsequence and the fact that \(x_k+\alpha _k d_k\) is feasible for \(k\in K''\). The term \(r_k\) is bounded above by \(\nu || d - d_k/\Vert d_k\Vert \Vert \), where \(\nu \) is the Lipschitz constant of F near \(x_*\). Note, also, that the limit \(\lim _{k\in K''} \bar{\rho }(\alpha _k) / ( \alpha _k \Vert d_k\Vert ) \) is 0 for both globalization strategies (Sects. 3.1 and 3.2). In the case of using integer lattices (Sect. 3.1), one uses \(\bar{\rho }(\cdot ) \equiv 0\). When imposing sufficient decrease (Sect. 3.2), this limit follows from the properties of the forcing function and the existence of \(d_{min}\), a strictly positive lower bound for the norm of the poll directions.

Since \(x_k+\alpha _k d_k\) corresponds to a point evaluated at the unsuccessful iteration k, it was necessarily compared with the active poll center \(x_k\). Thus \(F(x_k)\notin Dom(x_k+\alpha _k d_k)+\bar{\rho }(\alpha _{k})\), meaning that there must be \(j(k)\in \{1,\ldots ,m\}\) such that \( f_{j(k)}(x_k+\alpha _k d_k)-f_{j(k)}(x_k)+\bar{\rho }(\alpha _{k})>0\). Considering that the number of components of the objective function is finite, by passing to a subsequence indexed by \(K'''\subseteq K''\), there exists \(j=j(d)\) such that

$$\begin{aligned} f_{j(d)}^{\circ }(x_*;d)\ge \limsup _{k\in K'''}\frac{f_{j(d)}(x_k+\alpha _k d_k)-f_{j(d)}(x_k)+\bar{\rho }(\alpha _k)}{\alpha _k \Vert d_k\Vert }\ge 0. \end{aligned}$$

\(\square \)

Assuming strict differentiability of the objective function we can state a similar result but considering the gradient vectors of each component of F.

Corollary 3.1

Consider a refining subsequence \(\{x_k\}_{k\in K}\) converging to \(x_* \in \varOmega \) and let \(d\in T_\varOmega ^H(x_*)\) be a refining direction for \(x_*\). Assume that F is strictly differentiable at \(x_*\). Then there exists a \(j\in \{1,\ldots ,m\}\) such that \(\nabla f_j(x_*)^\top d \ge 0\).

Proof

The result follows immediately from Theorem 4 by noticing that strict differentiability implies that \(f_j^\circ (x_*;d)=\nabla f_j(x_*)^\top d\) (see [18]). \(\square \)

The previous results can now be extended to the whole Clarke tangent cone by assuming density of the set of refining directions associated with \(x_*\). For completeness, we include the proof, which relies on the same arguments used to establish a similar result for DMS.

Theorem 5

Consider a refining subsequence \(\{x_k\}_{k\in K}\) converging to \(x_* \in \varOmega \). Assume that \(T^{Cl}_\varOmega (x_*) \not = \emptyset \) and F is Lipschitz continuous near \(x_*\). If the set of refining directions for \(x_*\) is dense in \(T^{Cl}_\varOmega (x_*)\), then \(x_*\) is a Pareto-Clarke critical point.

In addition, if F is strictly differentiable at \(x_*\), then this point is a Pareto-Clarke-KKT critical point.

Proof

Given a direction \(v \in T^{Cl}_\varOmega (x_*)\), for each \(j\in \{1,\ldots ,m\}\) the Clarke-Jahn generalized directional derivative can be obtained as

$$ f_j^\circ (x_*;v) \; = \; \lim _{ \begin{array}{c} d \rightarrow v \\ d \in T_\varOmega ^H(x_*) \end{array}} f_j^\circ (x_*;d).$$

Since the set of refining directions for \(x_*\) is dense in \(T^{Cl}_\varOmega (x_*)\), we have \(v \; = \; \lim _{r \in R} d_r \), with \(d_r \) a refining direction for \(x_*\) belonging to \(T^{H}_\varOmega (x_*)\). Considering the result of Theorem 4 and since the number of components of the objective function is finite, by passing to a subsequence \(R'\subseteq R\) we have \(v \; = \; \lim _{r \in R'} d_r \), with \(d_r \in T^{H}_\varOmega (x_*)\) and \(f_{j(v)}^\circ (x_*;d_r)\ge 0, \forall r\in R'\). The first statement of the theorem results from taking limits of this sequence of generalized derivatives. The second statement follows trivially. \(\square \)

4 Numerical experiments

In this section we illustrate the numerical behavior of MultiGLODS. In particular, we aim to demonstrate its ability to approximate local and global Pareto fronts of a given multiobjective derivative-free optimization problem. We have considered a test set of 14 bound constrained analytical problems collected from the multiobjective optimization literature:

$$\begin{aligned} \begin{aligned} \min \quad&F(x)\equiv (f_1(x),\ldots ,f_m(x))\\ \text{ s.t. } \quad&l\le x\le u \end{aligned} \end{aligned}$$
(5)

and a constrained real application problem related to styrene production [1, 5].

Table 1 reports the dimensions (n), the number of components of the objective function (m), and the variable bounds for the analytical problems.

As part of our test set, we have problems belonging to the ZDT [27] and the DTLZ [16] collections, and also two additional problems described in [14] (Sects. 4.1 and 5.1.2), named Deb213 and Deb218, respectively. Additionally, the technique described in Sect. 4 of [14] was used to generate two new biobjective problems, both presenting local and global Pareto fronts. The goal of testing the first two collections is to assess the quality of MultiGLODS as a general multiobjective derivative-free optimization solver. The problems proposed in [14] are useful to test its ability to identify local and global Pareto fronts of a given problem.

Table 1 Analytical problems considered in the numerical experiments

A Matlab numerical implementation of MultiGLODS is available at:

This implementation was compared with version 0.3 of Direct MultiSearch (DMS) [11]. DMS is a well-established algorithm, suited for derivative-free multiobjective optimization, which has proved to be competitive with state-of-the-art solvers, including NSGA-II [15], BIMADS [4], and AMOSA [6]. Nowadays it is still used as a benchmark for new multiobjective derivative-free algorithms [21].

In order to assess the real impact of the clever multistart strategy, both algorithms were run with parameters similar to the defaults of DMS. The exception is the use of a cache, which was disabled for both algorithms. The idea is to assess the value of each algorithmic structure by itself, without any further improvements. Initialization considered a number of points equal to the problem dimension, evenly spaced along a line joining the problem bounds. Inspired by the defaults of GLODS, MultiGLODS additionally considered the center of the feasible region. As globalization strategy, both algorithms used integer lattices. Coordinate directions were used as positive spanning sets and complete polling was performed.

Regarding MultiGLODS, each time three consecutive unsuccessful iterations occurred, the search step was performed, using Sobol sequences to generate a number of feasible points equal to the problem dimension. As poll center, MultiGLODS selected the active point, not yet identified as a local Pareto point, presenting the largest step size. A point is identified as a local Pareto point when the corresponding step size parameter is below \(10^{-3}\).

The step size was initialized as \(n\times \max _{i\in \{1,\ldots ,n\}}(u_i-l_i)\) in MultiGLODS and was set equal to 1 in DMS. Successful iterations kept the step size constant, and unsuccessful ones halved it.

Both algorithms would stop when a maximum of 20,000 function evaluations was reached or when the step sizes for all points were below \(10^{-3}\) (in the case of MultiGLODS, only active points were considered).

4.1 Additional multimodal multiobjective problems

Following [14], let us consider the biobjective optimization problem defined as:

$$\begin{aligned} \begin{aligned} \min \quad&F(x_1,x_2)\equiv \left( x_1,\frac{g(x_2)}{x_1}\right) \\ \text{ s.t. } \quad&(x_1,x_2)\in \varOmega \subset \mathbb {R}^2, \end{aligned} \end{aligned}$$
(6)

where \(g(x_2)>0\).

Theorem 1 in [14] states that this problem has local or global Pareto optimal solutions \((x_1,x_2^*)\), where \(x_2^*\) corresponds to a local or global minimum of \(g(x_2)\), respectively, and \(x_1\) can take any feasible value.

Using the proposed technique, we have generated problems CAM1 and CAM2, considering the function g defined as below,

$$\begin{aligned} g(x_2)=2-e^{-\left( \frac{x_2-0.2}{0.004}\right) ^2}-c_1 \,e^{-\left( \frac{x_2-0.6}{0.4}\right) ^2}-c_2\,e^{-\left( \frac{x_2-0.9}{0.002} \right) ^2}. \end{aligned}$$

For CAM1, \(c_1=1.9\) and \(c_2=0\), while in CAM2, \(c_1=0.8\) and \(c_2=1.2\). In both problems \(x_1\in [0.1,1]\) and \(x_2\in [0,1]\). Figure 4 provides a graphical representation of function g for CAM1 and CAM2, respectively.
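For reference, the two problems can be written down directly from (6) and the expression of g above; a minimal sketch (Python, illustrative only) is:

```python
import numpy as np

def g(x2, c1, c2):
    """Function g used in the definition of problems CAM1 and CAM2."""
    return (2.0
            - np.exp(-((x2 - 0.2) / 0.004) ** 2)
            - c1 * np.exp(-((x2 - 0.6) / 0.4) ** 2)
            - c2 * np.exp(-((x2 - 0.9) / 0.002) ** 2))

def cam(x, c1, c2):
    """Biobjective function of (6): F(x1, x2) = (x1, g(x2)/x1),
    with x1 in [0.1, 1] and x2 in [0, 1]."""
    x1, x2 = x
    return np.array([x1, g(x2, c1, c2) / x1])

cam1 = lambda x: cam(x, 1.9, 0.0)   # CAM1: c1 = 1.9, c2 = 0
cam2 = lambda x: cam(x, 0.8, 1.2)   # CAM2: c1 = 0.8, c2 = 1.2
```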

Fig. 4 Plot of function g, used in the definition of problems CAM1 (left) and CAM2 (right). (Color figure online)

Comparing with the function g considered in [14], we see that for CAM1 the global minimum is no longer located in a narrow valley. Problem CAM2 presents three local minima, which correspond to two local Pareto fronts and one global Pareto front.

4.2 Results on the ZDT and DTLZ collections

Starting with the ZDT collection, Fig. 5 represents the approximations to the Pareto front generated by DMS and MultiGLODS.

Fig. 5 Approximations to the Pareto front generated by DMS and MultiGLODS for the ZDT collection. For each problem, the support of the true global Pareto front is represented in yellow. (Color figure online)

Fig. 6 Approximations to the Pareto front generated by DMS and MultiGLODS for the DTLZ collection. (Color figure online)

As can be observed, the results obtained with each solver are very similar, supporting the quality of the final Pareto fronts generated by MultiGLODS. For problem ZDT6, MultiGLODS presents some points far from the global Pareto front. This is a direct consequence of the stopping criteria considered. If a larger number of function evaluations had been allowed, these points would eventually change their status to inactive, no longer being part of the final approximation to the Pareto front generated by MultiGLODS. In the case of problem ZDT4, MultiGLODS is able to generate an approximation not only to the global Pareto front but also to some of the \(21^9\) local Pareto fronts reported in [27].

Figure 6 reports the results obtained with both solvers for the DTLZ collection.

The behavior of MultiGLODS for problems DTLZ1 and DTLZ3 is quite peculiar. In fact, these problems present several local Pareto fronts, all parallel to the global one [16]. MultiGLODS is able to identify one of these local Pareto fronts and also to move to the global one, as we can see in Fig. 7, where the plots are zoomed in.

Fig. 7 Partial representation of the approximations to the Pareto front generated by DMS and MultiGLODS for problems DTLZ1 and DTLZ3. (Color figure online)

Problem DTLZ7 presents four disconnected Pareto-optimal regions. MultiGLODS was able to provide a quite good representation of each one of these areas. In problems DTLZ2 and DTLZ4, similarly to problem ZDT6, the results are a consequence of the stopping criteria, which consider a maximum number of function evaluations.

4.3 Results on the multimodal problems

Having established the quality of the Pareto fronts generated by MultiGLODS, by comparison with the results obtained with DMS on the ZDT and DTLZ collections, we would now like to test its capability to identify local and global Pareto fronts of a given problem.

We have already had a glimpse of this capability with the results reported for problems ZDT4, DTLZ1, and DTLZ3. The last two problems present 3 objectives and are of dimension 7 and 12, respectively, for which a computational budget of 20,000 function evaluations could be somewhat limiting. So we decided to use the two-dimensional biobjective problems reported in [14] (Sects. 4.1 and 5.1.2) and the two additional problems described in Sect. 4.1. Figure 8 presents the final approximations to the Pareto fronts obtained with DMS and MultiGLODS.

Fig. 8 Approximations to the Pareto front generated by DMS and MultiGLODS for multimodal, two-dimensional, biobjective problems. (Color figure online)

In all problems, MultiGLODS was able to identify the global Pareto front. The identification of all local Pareto fronts was successful in the majority of the situations. The exception occurs in problem CAM1, where the narrowness of the local minimum of function g prevented MultiGLODS from successfully identifying the corresponding local Pareto front.

4.4 A chemical engineering problem of styrene production

In [1] the authors describe the simulation of a styrene production process and the corresponding optimization problem. Basically, the styrene production process comprises four phases: reactants preparation, catalytic reactions, styrene recovery, and benzene recovery, which have been implemented in a black-box simulator using the Sequential Modular Simulation (SMS) paradigm. There are three objectives to be maximized: the net present value of the styrene production process (\(f_1\)), the purity of the produced styrene (\(f_2\)), and the purity of the produced benzene (\(f_3\)). Eight variables define parameters of the simulation model. Variables are subject to bounds and to nine other constraints related to industrial and environmental regulations.

Two different approaches have been proposed for the problem. In [1], the authors consider a single objective optimization problem by maximizing \(f_1\), defining upper bounds for \(f_2\) and \(f_3\), and treating these last two objectives as constraints. Using the single objective derivative-free optimization algorithm MADS, with a search step defined by a variable neighborhood search, the authors obtained a single point as solution of the problem. In the process, a budget of 10,000 function evaluations was considered and surrogates were used to guide the search in the variable space.

In [5] a multiobjective derivative-free optimization approach was taken. MultiMADS was applied to the three-objective optimization problem, generating an approximation to the Pareto front of the problem comprising 22 points, when considering a budget of 30,000 function evaluations. None of the 22 points dominates the point generated by MADS, nor does the single objective solution dominate any of the 22 points generated by MultiMADS.

We kept the settings used throughout the whole numerical section, in particular the budget of 20,000 function evaluations, and ran both DMS and MultiGLODS on the styrene production problem. DMS ended without using the maximum number of function evaluations allowed, generating an approximation to the Pareto front of the problem with 3 points. In the case of MultiGLODS, the 20,000 function evaluations were required and it ended with 613 active points. Figure 9 represents the approximations to the Pareto front obtained by the four solvers. The plot corresponds to the symmetric of the objective function values, since it is a maximization problem.

Fig. 9 Approximations to the Pareto front of the styrene production problem generated by DMS, MultiGLODS, MultiMADS and MADS, in the last case solving a single objective version of the problem. (Color figure online)

The final active points generated by MultiGLODS dominate all the solutions generated both by MultiMADS and DMS. Results obtained with DMS indicate the existence of a local Pareto front of the problem, which was also identified by MultiGLODS. Regarding the point produced by MADS, as a result of the single objective optimization, it is not dominated by the final active points generated by MultiGLODS. In fact, it presents the best value for \(f_1\), which is natural since the single objective optimization was performed on this objective. However, there are also active points generated by MultiGLODS which are not dominated by the point generated by MADS. The best values obtained for \(f_2\) and \(f_3\) correspond to points generated by MultiGLODS.

5 Conclusions

In this paper we have proposed, analyzed, and numerically tested a new algorithm for derivative-free multiobjective optimization, suited for problems presenting several (local and global) Pareto fronts.

The new directional direct search method generalizes GLODS [10] to multiobjective optimization and confers a global behavior to DMS [11]. Multistart is used to initialize new searches, generally not conducted until the end, since they are merged when they start to be close to each other. A comparison radius, directly related to the step size parameter, is the keystone of this merging procedure. Points sufficiently close to each other are compared and only nondominated points remain active. At the end of the optimization process, the set of all active points defines the approximations to the Pareto fronts of the problem (local and global).

Under the common assumptions of directional direct search, convergence results were derived. Numerical experiments evidence the quality of the final solutions generated by the new algorithm and its capability of identifying approximations to global and local Pareto fronts of a given problem.