
1 Introduction

In many problems in engineering and finance, several objectives have to be optimized concurrently [3, 7, 8, 11, 12, 14, 17, 22, 27, 28, 30, 37]. One main challenge of such multi-objective optimization problems (MOPs) is that their solution set, the so-called Pareto set, and its image, the Pareto front, typically form a (\(k-1\))-dimensional object, where k is the number of objectives involved in the problem. This is in contrast to classical scalar optimization problems (SOPs), where one expects the optimum to be attained at a single solution. Modern heuristics such as Multiobjective Evolutionary Algorithms (MOEAs, e.g., [2, 9, 10, 23]) are able to provide an approximation of the entire Pareto set/front of a given MOP in a single run of the algorithm due to their set-oriented approach. These methods are very robust, e.g., with respect to initial conditions, and impose only minimal assumptions on the model (e.g., no derivative information). Evolutionary algorithms as well as other related heuristics are hence very popular for the numerical treatment of MOPs as well as of other optimization problems. One drawback that most of them share, however, is that they need quite a few function evaluations in order to obtain accurate approximations of the Pareto sets/fronts. As a remedy, many researchers have in the past hybridized the (global) evolutionary algorithms with local search techniques, mainly coming from Mathematical Programming, that utilize derivative information of the objectives and the constraint functions. Such methods are termed hybrid evolutionary algorithms or memetic algorithms. While such methods yield satisfactory results in almost all cases, they are still relatively expensive, since derivative information is required for every point that is designated for local search.

In this chapter, we review several recently developed tools that allow a local search to be realized within a population based optimization algorithm at low computational cost. The basic idea of the Gradient Subspace Approximation (GSA, [33]) is to utilize existing neighborhood information to estimate the most greedy direction within the search space that is spanned by the samples. One advantage of this approach is that it can easily be extended to the context of constrained problems. The focus of this chapter is on bi-objective optimization problems (BOPs, i.e., MOPs with \(k=2\) objective functions). To this end, we will consider the descent direction for BOPs proposed in [25] and present recent adaptations for constrained problems ([38]). Next, we will show how GSA can be used to build a low-cost local search engine that can be employed within a population based algorithm such as a MOEA. In order to demonstrate the efficiency of the method, we will show some numerical results for a hybrid evolutionary algorithm that couples the GSA based local search engine with the well-known evolutionary algorithm NSGA-II ([10]), which is state-of-the-art for MOPs with two and three objectives.

The remainder of this chapter is organized as follows: in Sect. 2, we review the Gradient Subspace Approximation for the approximation of the most greedy search direction out of given neighborhood information. In Sect. 3, we review a particular descent direction for bi-objective optimization problems for unconstrained, equality constrained, and inequality constrained problems. We further combine these two concepts in order to build a low-cost local searcher within a chosen set-oriented optimization method (such as an evolutionary algorithm). In Sect. 4 we present some numerical examples for a hybrid evolutionary algorithm that is based on the famous NSGA-II. Finally, we draw our conclusions in Sect. 5.

2 Gradient Subspace Approximation

In this section, we review the Gradient Subspace Approximation, which aims to compute a descent direction for a given scalar optimization problem at a given point \(x_0\) using existing neighborhood information. For more details the reader is referred to [33].

2.1 Background and Related Work

In this section, we consider continuous scalar optimization problems (SOPs) of the following form:

$$\begin{aligned} \begin{aligned} \min _x&\; f(x)\\ \hbox {s.t.}&\; g_i(x) \le 0,\; i=1,\ldots ,p\\&\; h_j(x) = 0,\; j=1,\ldots ,m.\\ \end{aligned} \end{aligned}$$
(1)

Hereby, \(f:\mathbbm {R}^n\rightarrow \mathbbm {R}\) is called the objective function, and the functions \(g_i:\mathbbm {R}^n\rightarrow \mathbbm {R}\) and \(h_j:\mathbbm {R}^n\rightarrow \mathbbm {R}\) are called the inequality and equality constraints, respectively. We assume that all functions f, \(g_i\) and \(h_j\) are differentiable.

A point x is called feasible if it satisfies all constraints. A point \(x^*\) is called a solution to (1) if it is feasible and if there exists no other feasible point y that has a lower objective value.

The object of interest in this section is the gradient of a function at a given point \(x_0\). Formally, the gradient of the objective at \(x_0\) is defined by

$$\begin{aligned} \nabla f(x) = \left( \frac{\partial f}{\partial x_1}(x),\ldots , \frac{\partial f}{\partial x_n}(x)\right) ^T\in \mathbbm {R}^n. \end{aligned}$$
(2)

A vector \(\nu \in \mathbbm {R}^n\) is called a descent direction for f at \(x_0\) if

$$\begin{aligned} \langle \nabla f(x_0),\nu \rangle < 0; \end{aligned}$$

in that case, it holds for all sufficiently small step sizes \(t>0\) that \(f(x_0+t\nu )<f(x_0)\).

When the gradient of a function is not available in analytic form, there are several ways to obtain either \(\nabla f(x_0)\) at a given point \(x_0\) or approximations of this vector. The most prominent and widely used technique is to use finite differences (e.g., [29]). The method presented in this section, GSA, also uses a finite difference approach. The difference, however, is that GSA can gather the sampling points from all directions, whereas the classical finite difference method utilizes samples in coordinate directions. GSA is hence better suited for the use within population based algorithms, since in this case the neighboring samples are already given (and typically not aligned in coordinate directions). In [6], a very similar method is proposed to approximate the Jacobians of the objective map of a given unconstrained multi-objective optimization problem. This work, however, does not discuss how to address constrained problems. The Hill Climber with Sidestep [26] and the Directed Search Method [35, 36] both use neighborhood samples in order to determine promising search directions for the local search within hybrid evolutionary algorithms. The difference to GSA is that these works do not directly aim to approximate the gradient. Automatic Differentiation (AD, [18]) can be used to evaluate the exact gradient at a given point \(x_0\) if the function is specified by a computer program. One drawback of AD is that it cannot be applied if this computer program is provided in the form of binary code.

Further, there are several methods that replace the original objective by simpler models. One example is the response surface methodology (RSM), where the objective function f is replaced by low-order polynomials \(\tilde{f}\) (mainly of degree one or two) whose gradients are approximated using least squares techniques [24]. If a first-order model is chosen, the match of the gradients \(\nabla \tilde{f}(x_{0})\) and \(\nabla f(x_{0})\) is typically quite good for a nonlinear function f if the chosen point \(x_0\) is sufficiently far away from the optimum. For second-order models, the match is in general much better; however, this accuracy comes at an additional cost, since \(n^2\) parameters have to be fitted at every point \(x_0\). Further works that can utilize scattered samples can be found in [13, 20]. In [20], a least squares regression is performed, while in [13] statistical expectation is used. In both works, the authors restrict themselves to unconstrained problems.

2.2 The Basic Idea

The task is to compute, ideally at no additional cost, a good approximation of the normalized negative gradient

$$\begin{aligned} n(x_0) := -\frac{\nabla f(x_0)}{\Vert \nabla f(x_0)\Vert _2}, \end{aligned}$$
(3)

evaluated at a given point \(x_0\), within a particular subspace of \(\mathbb {R}^n\). For this, we make use of the fact that the gradient is the direction of steepest ascent; hence \(n(x_0)\) can be seen as the solution of the following optimization problem:

$$\begin{aligned} \begin{aligned} \min _{\nu \in \mathbbm {R}^n}&\; \langle \nabla f(x_0),\nu \rangle \\ \hbox {s.t.}&\; \Vert \nu \Vert _2^2 = 1. \end{aligned} \end{aligned}$$
(4)

To avoid computing the gradient directly, we can make use of neighboring information as follows: assume that we are given points \(x_1,\ldots , x_r\) in the vicinity of \(x_0\) (e.g., these samples may be contained in the population or archive of a set based optimization method such as an evolutionary algorithm) as well as their function values \(f(x_i)\). In that case, we can use this information to approximate the directional derivatives in the directions

$$\begin{aligned} \nu _i := \frac{x_i-x_0}{\Vert x_i-x_0\Vert _2},\qquad i=1,\ldots , r. \end{aligned}$$
(5)

Since it holds

$$\begin{aligned} f_{\nu _i}'(x_0) = \langle \nabla f(x_0),\nu _i\rangle = \frac{f(x_i)-f(x_0)}{\Vert x_i-x_0\Vert _2} + O(\Vert x_i-x_0\Vert _2), \end{aligned}$$
(6)

where O denotes the Landau symbol, and since we are given the points \(x_0\) and \(x_i\) together with their respective objective values \(f(x_0)\) and \(f(x_i)\), we can use the approximations

$$\begin{aligned} f_{\nu _i}'(x_0) \approx \frac{f(x_i)-f(x_0)}{\Vert x_i-x_0\Vert _2},\qquad i=1,\ldots , r. \end{aligned}$$
(7)

Given the samples \(x_1,\ldots , x_r\) and \(\nu _i\) as above we define the subspace S as

$$\begin{aligned} S = span\{\nu _1,\ldots , \nu _r \} \end{aligned}$$
(8)

and are interested in a best approximation of \(n(x_0)\) within S. Since every \(\nu \in S\) can be written as

$$\begin{aligned} \nu = \sum _{i=1}^r \lambda _i \nu _i \end{aligned}$$
(9)

for some \(\lambda = (\lambda _1,\ldots ,\lambda _r)\in \mathbb {R}^r,\) and

$$\begin{aligned} \langle \nabla f(x_0),\nu \rangle = \sum _{i=1}^r \lambda _i \langle \nabla f(x_0),\nu _i\rangle \end{aligned}$$
(10)

we can state problem (4), where we restrict the search to S, as follows:

$$\begin{aligned} \begin{aligned} \min _{\lambda \in \mathbbm {R}^r}&\; \sum _{i=1}^r \lambda _i \langle \nabla f(x_0),\nu _i \rangle \\ \hbox {s.t.}&\; \left\| \sum _{i=1}^r \lambda _i \nu _i\right\| _2^2 = 1. \end{aligned} \end{aligned}$$
(11)

Hence, approximating the directional derivatives as in (7) from neighboring samples allows us to avoid computing the gradient directly. One advantage of using problem (11) is that constraint information can directly be incorporated.

In the following, we will analyze the best fit approximations of \(n(x_0)\) within the subspace S both for unconstrained and constrained SOPs. For this, we will first consider the ideal scenario where we assume that we are given all directional derivatives, and later on we will discuss the gradient-free realizations.

2.3 Gradient Subspace Approximation

We will in the following discuss how to approximate the most greedy search direction out of the given data, separately for unconstrained, equality constrained, and inequality constrained problems.

2.3.1 Unconstrained Problems

We are given the problem

$$\begin{aligned} \min _{x\in \mathbbm {R}^n} f(x), \end{aligned}$$
(12)

where \(f:\mathbbm {R}^n\rightarrow \mathbbm {R}\). Further, we are given a point \(x_0\in \mathbbm {R}^n\) and samples \(x_1,\ldots , x_r\in \mathbbm {R}^n\) in the vicinity of \(x_0\), together with their objective values \(f(x_i)\). For the purpose of analysis, we first also assume that we are given the directional derivatives

$$\begin{aligned} \langle \nabla f(x_0), \nu _i \rangle , \qquad i=1,\ldots , r. \end{aligned}$$
(13)

Define the matrix V by

$$\begin{aligned} V = (\nu _1,\ldots ,\nu _r)\in \mathbbm {R}^{n\times r}. \end{aligned}$$
(14)

Then, the most greedy search direction within the subspace

$$\begin{aligned} S = span\{\nu _1,\ldots , \nu _r \} \end{aligned}$$
(15)

is given by the solution of the following problem

$$\begin{aligned} \begin{aligned} \min _{\lambda \in \mathbbm {R}^r}&\; \sum _{i=1}^r \lambda _i \langle \nabla f(x_0),\nu _i\rangle \\ \hbox {s.t.}&\; \lambda ^TV^TV\lambda - 1 = 0. \end{aligned} \end{aligned}$$
(16)

The following result shows that the solution of (16) can be computed in closed form.

Proposition 1

Let \(\nu _1,\ldots ,\nu _r\in \mathbbm {R}^n\), \(r\le n\), be linearly independent and

$$\begin{aligned} \tilde{\lambda }^* := -(V^TV)^{-1} V^T\nabla f(x_0). \end{aligned}$$
(17)

Then

$$\begin{aligned} \lambda ^* := \frac{\tilde{\lambda }^*}{\Vert V\tilde{\lambda }^*\Vert _2} \end{aligned}$$
(18)

is the unique solution of (16) and

$$\begin{aligned} \nu ^* = \frac{-1}{\Vert V\tilde{\lambda }^*\Vert _2} V(V^TV)^{-1}V^T\nabla f(x_0) \end{aligned}$$
(19)

is the most greedy search direction in S.

Proof

The Karush-Kuhn-Tucker (KKT) system of (16) reads as

$$\begin{aligned} \nabla _\lambda L(\lambda ,\mu ) = V^T \nabla f(x_0) + 2\mu V^T V\lambda = 0, \end{aligned}$$
(20)
$$\begin{aligned} h(\lambda ) = \lambda ^TV^TV\lambda -1 = 0. \end{aligned}$$
(21)

Apparently, Equation (21) is only used for normalization. If we omit this equation and the factor \(2\mu \) in (20) we can rewrite (20) as the following normal equation system

$$\begin{aligned} V^TV \lambda = -V^T \nabla f(x_0). \end{aligned}$$
(22)

To solve the entire KKT system we have to choose \(2\mu = \Vert V\tilde{\lambda }^*\Vert _2\). Finally, the claim follows since the Hessian of the Lagrangian

$$\begin{aligned} \nabla ^2_{\lambda \lambda } L(\lambda ,\mu ) = V^TV \end{aligned}$$
(23)

is positive definite since the directions \(\nu _i\) are linearly independent.    \(\square \)

Next, we discuss how to approximate the most greedy solution \(\nu ^*\) without explicitly computing or approximating the gradient. Since

$$\begin{aligned} V^T \nabla f(x_0) = \left( \begin{array}{c} \langle \nabla f(x_0),\nu _1\rangle \\ \vdots \\ \langle \nabla f(x_0),\nu _r\rangle \end{array}\right) , \end{aligned}$$
(24)

we can do the approximation as follows: let \(d = (d_1,\ldots , d_r)\in \mathbb {R}^r\), where

$$\begin{aligned} d_i:= \frac{f(x_i)-f(x_0)}{\Vert x_i-x_0\Vert _2},\qquad i=1,\ldots ,r, \end{aligned}$$
(25)

and \(\tilde{\lambda }\) be the vector that solves the system of linear equations

$$\begin{aligned} V^T V\tilde{\lambda } = -d, \end{aligned}$$
(26)

then the most greedy search direction can be approximated as

$$\begin{aligned} \tilde{\nu }^* = \frac{-1}{\Vert V\tilde{\lambda }\Vert _2} V(V^T V)^{-1}d. \end{aligned}$$
(27)
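As a concrete illustration, the following NumPy sketch implements Eqs. (25)-(27); the function name and interface are our own and not part of [33].

```python
# A minimal sketch of the gradient-free direction (25)-(27), assuming only
# that the samples and their objective values are available.
import numpy as np

def gsa_direction(f, x0, samples):
    """Approximate the most greedy direction in span{nu_1,...,nu_r}."""
    X = np.asarray(samples, dtype=float)
    D = X - x0                                          # rows: x_i - x_0
    norms = np.linalg.norm(D, axis=1)
    V = (D / norms[:, None]).T                          # V = (nu_1,...,nu_r), n x r
    d = (np.array([f(x) for x in X]) - f(x0)) / norms   # divided differences (25)
    lam = np.linalg.solve(V.T @ V, -d)                  # normal equations (26)
    v = V @ lam
    return v / np.linalg.norm(v)                        # normalized as in (27)

# Quick check on f(x) = sum x_i^2 at x0 = (1,...,1); this reproduces the
# first setting of Example 1 below:
f = lambda x: float(x @ x)
x0 = np.ones(6)
rows = np.array([[2., 0, 1, 0, 0, 0], [0, 1, 0, 3, 0, 0], [0, 0, 0, 0, 3, 1]])
nu = gsa_direction(f, x0, [x0 + 0.1 * v for v in rows])
print((2 * x0) @ nu)   # inner product with grad f(x0), approx. -4.4714
```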

Remark 1

(a) To compute \(\nu ^*\) one has to solve system (22). It is hence advisable to avoid choosing directions \(\nu _i\) that point in nearly the same direction. The linear equation system is best conditioned if the directions are chosen orthogonal to each other. In this case, we obtain

$$\begin{aligned} \nu ^* = \frac{-1}{\Vert \tilde{\lambda }^*\Vert _2} VV^T\nabla f(x_0), \end{aligned}$$
(28)

i.e., the orthogonal projection of \(\nabla f(x_0)\) onto S. That is, \(\nu ^*\) is the best approximation of \(n(x_0)\) in S.

(b) In the special case that coordinate directions are chosen, i.e., if we choose

$$\begin{aligned} x_i = x_0 + t_i e_{j_i},\ i=1,\ldots ,r, \end{aligned}$$
(29)

for the samples, where \(e_j\) denotes the j-th unit vector, we obtain for the \(j_i\)-th entry of \(\tilde{\nu }^*\) (without normalization)

$$\begin{aligned} \tilde{\nu }^*_{j_i} = -\frac{f(x_0+t_i e_{j_i}) - f(x_0)}{|t_i|}. \end{aligned}$$
(30)

That is, if we for instance choose \(x_i = x_0 + t_i e_i\), \(i=1,\ldots ,n\) (i.e., all coordinate directions), the search direction \(\tilde{\nu }^*\) coincides, up to normalization, with the negative of the forward difference approximation of the gradient.

(c) The idea of GSA is to utilize existing data whenever possible. However, it may happen that for a given point \(x_0\) the existing data is not sufficient (e.g., not enough individuals of the current population are close enough to \(x_0\)). A possible remedy is to sample further points in order to compute a search direction. In that case it makes sense to choose the points so that the resulting directions \(\nu _i\) are orthogonal to each other as well as to all existing directions. See [1] for a possible realization.

Example 1

We consider the objective \(f:\mathbbm {R}^6\rightarrow \mathbbm {R}\), where

$$\begin{aligned} f(x) = \sum _{i=1}^6 x_i^2. \end{aligned}$$
(31)

Let \(x_{0} = \left[ 1,1,1,1,1,1\right] ^{T}\), then \(\nabla f(x_{0}) = \left[ 2,2,2,2,2,2\right] ^{T}\) and

$$\begin{aligned} g=\frac{-1}{\sqrt{24}} \left[ 2,2,2,2,2,2\right] ^{T} \end{aligned}$$
(32)

for which \(\langle \nabla f(x_{0}),g \rangle = -4.8990.\)

First, we choose three orthogonal search directions that form the matrix \(V^{T}\) as follows:

$$\begin{aligned} V^{T} = \left( \begin{array}{cccccc} 2 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 & 1 \end{array} \right) . \end{aligned}$$
(33)

Doing so, we obtain \(\langle \nabla f(x_{0}), \frac{v_{1}}{\Vert v_{1} \Vert } \rangle = 2.6833, \;\langle \nabla f(x_{0}), \frac{v_{2}}{\Vert v_{2} \Vert } \rangle = 2.5298\) and \(\langle \nabla f(x_{0}), \frac{v_{3}}{\Vert v_{3} \Vert } \rangle = 2.5298.\) If we solve problem (16) we get

$$\begin{aligned} v^{*} = [-0.5367,-0.1789,-0.2683,-0.5367,-0.5367,-0.1789]^{T}, \end{aligned}$$
(34)

for which \(\langle \nabla f(x_{0}),v^{*} \rangle = -4.4721.\)

Next, we use the samples

$$\begin{aligned} x_{i} = x_{0} + 0.1v_{i}, \;\; \hbox {for} \;\;i=1,2,3, \end{aligned}$$
(35)

and via formula (27) we obtain

$$\begin{aligned} \tilde{v}^{*} = [-0.5237,-0.1813,-0.2618,-0.5438,-0.5438,-0.1813]^{T} \end{aligned}$$

which leads to \(\langle \nabla f(x_{0}),\tilde{v}^{*} \rangle = -4.4714.\)

Choosing instead the non-orthogonal search directions

$$\begin{aligned} V^{T} = \left( \begin{array}{cccccc} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 1 & 2 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 2 \end{array} \right) , \end{aligned}$$
(36)

we obtain \(\langle \nabla f(x_{0}), \frac{v_{1}}{\Vert v_{1} \Vert } \rangle = 2.8284 , \;\langle \nabla f(x_{0}), \frac{v_{2}}{\Vert v_{2} \Vert } \rangle =3.3333\) and \(\langle \nabla f(x_{0}), \frac{v_{3}}{\Vert v_{3} \Vert } \rangle = 3.7947.\)

Solving (16) leads to

$$\begin{aligned} v^{*} = [-0.3769,-0.4744,-0.5686,-0.3054,-0.2080,-0.4159]^{T} \end{aligned}$$
(37)

with \(\langle \nabla f(x_{0}),v^{*} \rangle = -4.6985\). For the discretized problem we obtain

$$\begin{aligned} \tilde{v}^{*} = [-0.3595,-0.4674,-0.5770,-0.3171,-0.2092,-0.4184]^{T} \end{aligned}$$
(38)

with \(\langle \nabla f(x_{0}),\tilde{v}^{*} \rangle = -4.6972.\)

2.3.2 Equality Constrained Problems

Next, we assume that the SOP contains some equality constraints, i.e., that we are given the following problem

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbbm {R}^n}&\; f(x)\\ \hbox {s.t.}&\; h_i(x) = 0,\;i=1,\ldots ,p, \end{aligned} \end{aligned}$$
(39)

where we assume that each \(h_i:\mathbbm {R}^n\rightarrow \mathbbm {R}\) is differentiable.

Analogously to the unconstrained case discussed above, the most greedy search direction at \(x_0\) in the entire space \(\mathbb {R}^n\) is given by the solution of

$$\begin{aligned} \begin{aligned} \min _{\nu \in \mathbbm {R}^n}&\; \langle \nabla f(x_0),\nu \rangle \\ \hbox {s.t.}&\; \Vert \nu \Vert _2^2 = 1\\&\; \langle \nabla h_i(x),\nu \rangle = 0,\;i=1,\ldots ,p, \end{aligned} \end{aligned}$$
(40)

and the most greedy direction at \(x_0\) within the subspace S is given by the solution of

$$\begin{aligned} \begin{aligned} \min _{\lambda \in \mathbbm {R}^r}&\;\sum _{i=1}^r \lambda _i \langle \nabla f(x_0),\nu _i\rangle \\ \hbox {s.t.}&\; \lambda ^TV^TV\lambda -1 = 0\\&\; \sum _{i=1}^r \lambda _i \langle \nabla h_j(x_0),\nu _i\rangle = 0,\; j=1,\ldots ,p. \end{aligned} \end{aligned}$$
(41)

Define the matrix H by

$$\begin{aligned} H = \left( \begin{array}{c}\nabla h_1(x_0)^T\\ \vdots \\ \nabla h_p(x_0)^T\end{array} \right) \in \mathbbm {R}^{p\times n}. \end{aligned}$$
(42)

As for the unconstrained case, we can also express the most greedy solution for an equality constrained problem analytically.

Proposition 2

Let \(\nu _1,\ldots ,\nu _r\in \mathbbm {R}^n\), \(p\le r\le n\), be linearly independent, let \(rank(H) = p\), and let \(\tilde{\lambda }^*\) be given by the first block of the solution of

$$\begin{aligned} \left( \begin{array}{cc} V^TV & (HV)^T\\ HV & 0\end{array}\right) \left( \begin{array}{c}\tilde{\lambda }^*\\ \tilde{\mu }\end{array}\right) = \left( \begin{array}{c} -V^T\nabla f(x_0)\\ 0\end{array}\right) . \end{aligned}$$
(43)

then

$$\begin{aligned} \lambda ^* := \frac{\tilde{\lambda }^*}{\Vert V\tilde{\lambda }^*\Vert _2} \end{aligned}$$
(44)

is the unique solution of (41) and thus

$$\begin{aligned} \nu ^* = V\lambda ^* = \frac{1}{\Vert V\tilde{\lambda }^*\Vert _2}\, V\tilde{\lambda }^* \end{aligned}$$
(45)

is the most greedy search direction in \(span\{\nu _1,\ldots ,\nu _r\}\).

Proof

The KKT system of (41) is given by

$$\begin{aligned} V^T \nabla f(x_0) + 2\mu _0 V^TV\lambda + (HV)^T\mu = 0 \end{aligned}$$
(46)
$$\begin{aligned} HV\lambda = 0 \end{aligned}$$
(47)
$$\begin{aligned} \lambda ^TV^TV\lambda - 1 = 0, \end{aligned}$$
(48)

and via applying the same “normalization trick” as above we can transform the KKT equations into

$$\begin{aligned} \left( \begin{array}{cc} V^TV & (HV)^T\\ HV & 0\end{array}\right) \left( \begin{array}{c}\tilde{\lambda }\\ \tilde{\mu }\end{array}\right) = \left( \begin{array}{c} -V^T\nabla f(x_0)\\ 0\end{array}\right) . \end{aligned}$$
(49)

To show that the matrix is regular, let \(y\in \mathbbm {R}^r\) and \(z\in \mathbbm {R}^p\) such that

$$\begin{aligned} \left( \begin{array}{cc} V^TV & (HV)^T\\ HV & 0\end{array}\right) \left( \begin{array}{c}y\\ z\end{array}\right) = 0. \end{aligned}$$
(50)

It follows that \(HVy=0\) and hence that

$$\begin{aligned} y^TV^TVy = y^T\left( V^TVy + (HV)^Tz\right) = 0. \end{aligned}$$
(51)

Thus, it is \(y=0\) since \(V^TV\) is positive definite. Further, by (50) it follows that \(V^TH^Tz=0\). Since \(V^T\in \mathbbm {R}^{r\times n}\) has rank \(r\ge p\), it follows that \(V^TH^T\) has rank p. This implies that also \(z=0\), and thus, that the matrix in (43) is regular.

The rest follows by the above discussion, setting \(2\mu _0=\Vert V\tilde{\lambda }^*\Vert _2\), and since the Hessian of the Lagrangian \(\nabla _{\lambda \lambda }^2 L(\lambda ,\mu ) = V^TV\) is positive definite.    \(\square \)

The key for a gradient-free approximation of the search direction is the matrix HV. Since

$$\begin{aligned} (HV)_{ij} = \nabla h_i(x_0)^T \nu _j, \end{aligned}$$

we compute an approximation \(M = (m_{ij})\) of HV via

$$\begin{aligned} m_{ij} := \frac{h_i(x_j)-h_i(x_0)}{\Vert x_j-x_0\Vert _2}, \quad i=1,\ldots ,p,\; j=1,\ldots ,r. \end{aligned}$$
(52)

Doing so, we can now solve the system

$$\begin{aligned} \left( \begin{array}{cc} V^TV & M^T\\ M & 0\end{array}\right) \left( \begin{array}{c}\tilde{\lambda }\\ \tilde{\mu }\end{array}\right) = \left( \begin{array}{c} -d\\ 0\end{array}\right) , \end{aligned}$$
(53)

which leads to \(\lambda ^*\). To obtain \(\nu ^*\) we proceed as for the unconstrained case, using the approximation \(V^T \nabla f(x_0)\approx d\).
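A corresponding NumPy sketch for the equality constrained case (our own illustration; hs is assumed to be a list of the constraint functions) reads:

```python
# A sketch of the gradient-free equality constrained direction:
# assemble M as in Eq. (52) and solve the saddle-point system (53).
import numpy as np

def gsa_direction_eq(f, hs, x0, samples):
    """hs is a list of equality constraint functions h_1,...,h_p."""
    X = np.asarray(samples, dtype=float)
    D = X - x0
    norms = np.linalg.norm(D, axis=1)
    V = (D / norms[:, None]).T                          # n x r
    d = (np.array([f(x) for x in X]) - f(x0)) / norms
    M = np.array([[h(x) - h(x0) for x in X] for h in hs]) / norms  # Eq. (52)
    r, p = V.shape[1], M.shape[0]
    K = np.block([[V.T @ V, M.T],
                  [M, np.zeros((p, p))]])               # system matrix of (53)
    sol = np.linalg.solve(K, np.concatenate([-d, np.zeros(p)]))
    v = V @ sol[:r]                                     # first block: lambda
    return v / np.linalg.norm(v)                        # normalize as before
```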

Example 2

We consider the SOP from Example 1 and impose the constraint

$$\begin{aligned} h(x) = x_1+x_2+x_3 = 0. \end{aligned}$$
(54)

For \(x_{0} = \left[ -1,0,1,1,1,1\right] ^{T}\) we have \(\nabla f(x_{0}) = \left[ 2,2,2,2,2,2\right] ^{T}\), \(g=\frac{-1}{\sqrt{24}} \left[ 2,2,2,2,2,2\right] ^{T}\), \(\nabla h(x_{0}) = \left[ 1,1,1,0,0,0\right] ^{T}\) and \(\langle \nabla f(x_{0}),g \rangle = -4.8990.\)

First, we choose again the three orthogonal search directions

$$\begin{aligned} V^{T} = \left( \begin{array}{cccccc} 2 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 & 1 \end{array} \right) ; \end{aligned}$$
(55)

and, we obtain \(\langle \nabla f(x_{0}), \frac{v_{1}}{\Vert v_{1} \Vert } \rangle = 2.6833, \;\langle \nabla f(x_{0}), \frac{v_{2}}{\Vert v_{2} \Vert } \rangle = 2.5298\), \(\langle \nabla f(x_{0}), \frac{v_{3}}{\Vert v_{3} \Vert } \rangle = 2.5298\), and

$$\begin{aligned} v^{*} = [0.1210,-0.1815,0.0605,-0.5444,-0.7662,-0.2554]^{T} \end{aligned}$$
(56)

with \(\langle \nabla f(x_{0}),v^{*} \rangle = -3.1322.\)

Using the above setting and the samples

$$\begin{aligned} x_{i} = x_{0} + 0.1v_{i}, \;\; \hbox {for} \;\;i=1,2,3 \end{aligned}$$
(57)

we obtain the search direction

$$\begin{aligned} \tilde{v}^{*} = [0.1293,-0.1939,0.0646,-0.5817,-0.7368,-0.2456]^{T} \end{aligned}$$

which leads to \(\langle \nabla f(x_{0}),\tilde{v}^{*} \rangle = -3.1281.\)

Next, we consider the non-orthogonal search directions

$$\begin{aligned} V^{T} = \left( \begin{array}{cccccc} 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 2 & 1 & 2 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 2 \end{array} \right) \end{aligned}$$
(58)

leading to \(\langle \nabla f(x_{0}), \frac{v_{1}}{\Vert v_{1} \Vert } \rangle = 2.8284 , \;\langle \nabla f(x_{0}), \frac{v_{2}}{\Vert v_{2} \Vert } \rangle =3.3333\) and \(\langle \nabla f(x_{0}), \frac{v_{3}}{\Vert v_{3} \Vert } \rangle = 3.7947.\)

When solving (41) we obtain

$$\begin{aligned} v^{*} = [0.4050,0.2244,-0.6294,-0.3963,-0.2156,-0.4312]^{T} \end{aligned}$$

with \(\langle \nabla f(x_{0}),v^{*} \rangle = -2.0863\) for the idealized problem, and for the discretized problem we obtain

$$\begin{aligned} \tilde{v}^{*} = [0.4490,0.1617,-0.6107,-0.4742,-0.1868,-0.3736]^{T} \end{aligned}$$

with \(\langle \nabla f(x_{0}),\tilde{v}^{*} \rangle = -2.0691.\)

2.3.3 Inequality Constrained Problems

Finally, we assume we are given an inequality constrained SOP of the form

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbbm {R}^n}&\; f(x)\\ \hbox {s.t.}&\; g_i(x) \le 0,\;i=1,\ldots ,m, \end{aligned} \end{aligned}$$
(59)

where, for simplicity, we assume that all inequalities are active at the given point \(x_0\). The most greedy search direction at \(x_0\) is given by the solution of

$$\begin{aligned} \begin{aligned} \min _{\nu \in \mathbbm {R}^n}&\; \langle \nabla f(x_0),\nu \rangle \\ \hbox {s.t.}&\; \Vert \nu \Vert _2^2 = 1\\&\; \langle \nabla g_i(x),\nu \rangle \le 0,\;i=1,\ldots ,m, \end{aligned} \end{aligned}$$
(60)

and the related subspace optimization problem reads as

$$\begin{aligned} \begin{aligned} \min _{\lambda \in \mathbbm {R}^r}&\;\sum _{i=1}^r \lambda _i \langle \nabla f(x_0),\nu _i\rangle \\ \hbox {s.t.}&\; \lambda ^TV^TV\lambda -1 = 0\\&\; \sum _{i=1}^r \lambda _i \langle \nabla g_j(x_0),\nu _i\rangle \le 0,\; j=1,\ldots ,m. \end{aligned} \end{aligned}$$
(61)

One way to find a solution to (61) is to use gradient projection, which is advantageous in particular if m is small and \(r\gg m\). In the following, we first discuss the special case \(m=1\) (i.e., one active inequality constraint) and later on discuss the general case.

The classical gradient projection approach is to take the solution \(\nu ^*\) of the underlying unconstrained problem (16) and to project it to the space \(\nabla g(x_0)^\perp \) that is orthogonal to \(\nabla g(x_0)\) (see Fig. 1): given a QR decomposition of \(\nabla g(x_0)\), i.e.,

$$\begin{aligned} \nabla g(x_0) = QR = (q_1,\ldots ,q_n) R, \end{aligned}$$
(62)

then the vectors \(q_2,\ldots ,q_n\) form an orthonormal basis (ONB) of \(\nabla g(x_0)^\perp \). Using \(Q_g = (q_2,\ldots ,q_n)\), the projection is hence given by

$$\begin{aligned} \nu _{new} = Q_g Q_g^T\nu ^*. \end{aligned}$$
(63)

It is of course not advisable to follow this approach directly, since \(\nabla g(x_0)\) is not given, nor do we want to approximate it. Instead, we propose to proceed as follows: let

$$\begin{aligned} \begin{aligned} M := \nabla g(x_0)^T V = (\langle \nabla g(x_0),\nu _1\rangle ,\ldots , \langle \nabla g(x_0),\nu _r\rangle ) \in \mathbbm {R}^{1\times r}. \end{aligned} \end{aligned}$$
(64)

Note that if w is a kernel vector of M then Vw is perpendicular to \(\nabla g(x_0)\) and vice versa. Thus, we can compute the matrix

$$\begin{aligned} K = (k_1,\ldots ,k_{r-1}) \in \mathbbm {R}^{r\times (r-1)} \end{aligned}$$
(65)

whose column vectors form an ONB of the kernel of M. If the search directions \(\nu _i\) are orthogonal, then the vectors \(Vk_1,\ldots , Vk_{r-1}\) are also orthogonal to each other. The latter are the column vectors of \(VK\in \mathbbm {R}^{n\times (r-1)}\) (if the \(\nu _i\)’s are not orthogonal to each other, VK has to be orthogonalized via a further QR decomposition). Doing so, the projection of \(\nu ^*\) onto this subspace is given by

$$\begin{aligned} \tilde{\nu }_{new} = VK(VK)^T\nu ^* = VKK^TV^T\nu ^*. \end{aligned}$$
(66)
Fig. 1. Handling inequality constraints using gradient projection.

For a general number m of inequality constraints we can extend the method as follows: let

$$\begin{aligned} M = GV = (\langle \nabla g_i(x_0),\nu _j\rangle )_{i=1,\ldots ,m\atop j=1,\ldots ,r}, \end{aligned}$$
(67)

and perform the following steps (a code sketch is given after the list):

(1) compute an orthonormal basis \(K\in \mathbbm {R}^{r\times (r-m)}\) of the kernel of M,

(2) compute \(VK = QR = (q_1,\ldots ,q_{r-m},\ldots ,q_n)R\) and set \(\tilde{Q} := (q_1,\ldots ,q_{r-m})\in \mathbbm {R}^{n\times (r-m)}\),

(3) set \(\tilde{\nu }_{new} = \tilde{Q}\tilde{Q}^T \nu ^*\).
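The following NumPy sketch illustrates steps (1)-(3); the kernel basis is computed via an SVD here, which is one of several possibilities, and all names are ours. M may also be the sample-based approximation from Eq. (52).

```python
# A sketch of steps (1)-(3): project nu_star onto the directions in span(V)
# that are first-order neutral w.r.t. the m active constraints (M w = 0).
import numpy as np

def project_active(V, M, nu_star, tol=1e-12):
    # (1) orthonormal basis K of ker(M), here via an SVD of M
    _, s, Wt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    K = Wt[rank:].T                      # r x (r - m) for full-rank M
    # (2) orthonormalize the columns of VK via a QR decomposition
    Q_t, _ = np.linalg.qr(V @ K)         # n x (r - m), orthonormal columns
    # (3) orthogonal projection of nu_star
    return Q_t @ (Q_t.T @ nu_star)
```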

Example 3

We consider again the SOP from Example 1, but this time we impose the inequality constraint

$$\begin{aligned} g(x) = 1-x_1 \le 0. \end{aligned}$$
(68)

For \(x_{0} = \left[ 1,1,1,1,1,1\right] ^{T}\) we have \(\nabla f(x_{0}) = \left[ 2,2,2,2,2,2\right] ^{T}\), \(g=\frac{-1}{\sqrt{24}} \left[ 2,2,2,2,2,2\right] ^{T}\), and the gradient of the constraint is \(\nabla g(x_{0}) = \left[ -1,0,0,0,0,0\right] ^{T}\). Thus \(\langle \nabla f(x_{0}),g \rangle = -4.8990.\)

First, we again choose the three orthogonal search directions

$$\begin{aligned} V^{T} = \left( \begin{array}{cccccc} 2 & 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 0 & 3 & 1 \end{array} \right) , \end{aligned}$$
(69)

and obtain \(\langle \nabla f(x_{0}),\frac{v_{1}}{\Vert v_{1} \Vert } \rangle = 2.6833, \;\langle \nabla f(x_{0}), \frac{v_{2}}{\Vert v_{2} \Vert } \rangle = 2.5298\) and \(\langle \nabla f(x_{0}), \frac{v_{3}}{\Vert v_{3} \Vert } \rangle = 2.5298.\)

For the search direction we obtain

$$\begin{aligned} v^{*} = [-0.5367,-0.1789,-0.2683,-0.5367,-0.5367,-0.1789]^{T} \end{aligned}$$

with \(\langle \nabla f(x_{0}),v^{*} \rangle = -4.4721.\) Applying the gradient projection approach to \(v^{*}\), we obtain the projected vector

$$\begin{aligned} v_{new} = \left[ 0,-0.1789,-0.2683,-0.5367,-0.5367,-0.1789\right] ^{T} \end{aligned}$$

with \(\langle \nabla f(x_{0}),v_{new} \rangle = -3.3988.\) Next, we obtain

$$\begin{aligned} \tilde{v}_{new} = \left[ 0,-0.1789,0,-0.5367,-0.5367,-0.1789\right] ^{T} \end{aligned}$$

via Eq. (66) with \(\langle \nabla f(x_{0}),\tilde{v}_{new} \rangle = -2.8622.\)

Next we use the samples

$$\begin{aligned} x_{i} = x_{0} + 0.1v_{i}, \;\; \hbox {for} \;\;i=1,2,3 \end{aligned}$$
(70)

which leads to the search direction

$$\tilde{v}^{*} = [-0.5237,-0.1813,-0.2618,-0.5438,-0.5438,-0.1813]^{T}$$

with \(\langle \nabla f(x_{0}),\tilde{v}^{*} \rangle = -4.4714.\) Applying the gradient projection approach to \(\tilde{v}^{*}\), we obtain \(\tilde{v}_{new} = \left[ 0,-0.1813,0,-0.5438,-0.5438,-0.1813\right] ^{T} \) and \(\langle \nabla f(x_{0}),\tilde{v}_{new} \rangle = -2.9004.\)

3 Bi-objective Optimization

In this section, we review a descent direction for bi-objective optimization problems and show how GSA can be used to approximate these directions in a gradient-free manner. For details the reader is referred to [38].

3.1 Background and Related Work

In many applications one is faced with the problem that several objectives have to be optimized concurrently, leading to a multi-objective optimization problem (MOP, e.g., [3, 8, 12, 14, 17, 27, 28, 30, 37]).

A continuous MOP can be expressed mathematically as

$$\begin{aligned} \begin{aligned} \min _x&\; (f_1(x),\ldots , f_k(x))^T\\ \hbox {s.t.}&\; g_i(x) \le 0,\; i=1,\ldots ,p\\&\; h_j(x) = 0,\; j=1,\ldots ,m,\\ \end{aligned} \end{aligned}$$
(71)

where the \(f_i\), \(i=1,\ldots , k\), are the objectives to be minimized, and the \(g_i\)’s and \(h_j\)’s are the inequality and equality constraints, respectively. Denote by Q the feasible set. We assume that all objectives and all constraint functions are differentiable. To define optimality of a MOP, the concept of Pareto dominance is used: let \(v,w\in \mathbb {R}^k\); we say that the vector v is less than the vector w (\(v<_pw\)) if \(v_i< w_i\) for all \(i\in \{1,\ldots ,k\}\); the relation \(\le _p\) is defined analogously. A vector \(y\in Q\) is dominated by a vector \(x\in Q\) (\(x\prec y\)) with respect to (71) if \(F(x)\le _p F(y)\) and \(F(x)\ne F(y)\); else y is called non-dominated by x. A point \(x^* \in Q\) is Pareto optimal for (71) if there is no \(y\in Q\) which dominates \(x^*\). The set of all Pareto optimal points is called the Pareto set, and its image is the Pareto front. Under certain (mild) smoothness assumptions, both the Pareto set and the Pareto front form (\(k-1\))-dimensional objects ([21]). We will in this section focus on bi-objective problems (BOPs), i.e., on MOPs where \(k=2\).

A Multi-objective Descent Direction (MODD) \(\nu \) for a given MOP at a point \(x_0\) is a direction in which a sufficiently small movement yields dominating solutions, i.e.,

$$\begin{aligned} x_0 + t\nu \prec x_0\qquad \forall t\in (0,\bar{t}) \end{aligned}$$
(72)

for a certain \(\bar{t}>0\). In [25], a descent direction has been proposed for unconstrained BOPs.

Proposition 3

( [25]). Let \(x\in \mathbb {R}^{n}\) and \(f_{1},f_{2}:\mathbb {R}^{n}\rightarrow \mathbb {R}\) define an unconstrained BOP. If \(\nabla f_{i}(x)\ne 0\) for \(i \in \lbrace 1,2\rbrace \) then

$$\begin{aligned} \nu _{L}:= -\left[ \frac{\nabla f_{1}(x)}{\Vert \nabla f_{1}(x)\Vert }+\frac{\nabla f_{2}(x)}{\Vert \nabla f_{2}(x)\Vert }\right] , \end{aligned}$$
(73)

is a descent direction of F at x.

In the sequel we will show how to adapt this direction to the context of constrained BOPs, and how to obtain a gradient-free realization by utilizing GSA.

It is worth mentioning that there exist several proposals to compute MODDs in general; they make use of first or second-order information of the objective functions. Among these proposals is the steepest descent direction [15], which requires the solution of a quadratic programming problem involving the Jacobian of F. This method is valid for both convex and non-convex Pareto fronts. In [32], the authors introduce a formulation that uses the solution of a stochastic differential equation related to the Karush-Kuhn-Tucker conditions in order to generate the solution set. This technique requires all involved functions to be continuously differentiable, and it only works for unconstrained MOPs. One particular work to mention is [5], where the idea of the descent cone appears. The descent cone is determined by the intersection of the negative half-spaces generated by the objective function gradients. In [4] the authors notice that computing a MODD yields again a multi-objective optimization problem, since the negative gradient of every objective function is in conflict with the gradients of the remaining objectives. In the same work, the authors propose a way to compute the entire set of MODDs and to finally pick one of them at random. Another proposal [19], called the Pareto Descent Method, computes a set of possible Pareto descent directions by solving a linear programming problem with the information from the descent cones. This method applies only to unconstrained and inequality constrained MOPs. There also exist approaches based on the Newton method, such as [14], where at each iteration of a line search one minimization subproblem has to be solved to obtain the direction to follow. In this method, the Hessians of all objective functions need to be available. Along these lines, [12, 16, 34] propose extensions of the multi-objective Newton method to equality constrained problems. In this chapter, we decided to work with expression (73) since it is the simplest and cheapest proposal: no linear or quadratic programming solvers are necessary for its computation. This feature also makes it suitable for the application presented in Sect. 4, where this direction is introduced into a population-based heuristic.

3.2 A Descent Direction for Constrained BOPs

3.3 Equality Constrained BOPs

Here we consider equality constrained BOPs of the form

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbb {R}^{n}}&\; F(x):=[f_{1}(x),f_{2}(x)]^{T}\\ \hbox {s.t.}&\; h_{j}(x) = 0,\; j=1,\ldots ,m. \end{aligned} \end{aligned}$$
(74)

We will in the following discuss separately the cases where \(x_0\) is feasible and infeasible.

Feasible Case

We first assume that \(x_0\) is feasible, i.e., that \(h_j(x_0) = 0\) for all \(j=1,\ldots , m\). One way to obtain a MODD for the given BOP is to project the MODD of the related unconstrained BOP onto the tangent space \(T_{h^{-1}(0)}(x_0)\) of the feasible set \(h^{-1}(0)\) at \(x_0\): let \(\nu \) be the MODD for the unconstrained BOP, i.e.,

$$\begin{aligned} \nu := -\left[ \frac{\nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert }+\frac{\nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert }\right] , \end{aligned}$$
(75)

and let

$$\begin{aligned} H^{T} = (\nabla h_1(x_0),\ldots ,\nabla h_m(x_0)) = QR = \left( q_{1},\ldots ,q_{m},q_{m+1},\ldots , q_{n}\right) R \end{aligned}$$
(76)

be a QR-factorization of \(H^T\). Then, the vectors \(q_1,\ldots ,q_m\) form an orthonormal basis of the image of \(H^T\). Further, since the image of \(H^T\) is the orthogonal complement of the kernel of H, we have for \(i=m+1,\ldots , n\):

$$\begin{aligned} Hq_i = 0\qquad \Leftrightarrow \qquad (\nabla h_j(x_0)^T q_i = 0,\quad j=1,\ldots , m). \end{aligned}$$
(77)

That is, the column vectors of

$$\begin{aligned} \tilde{Q}:= \left( q_{m+1},\ldots , q_{n}\right) \end{aligned}$$
(78)

form an orthonormal basis of the tangent space \(T_{h^{-1}(0)}(x_0)\) of \(h^{-1}(0)\) at \(x_0\). The orthogonal projection of \(\nu \) onto \(T_{h^{-1}(0)}(x_0)\) is hence given by (see also Algorithm 1)

$$\begin{aligned} \nu _{p}:= \tilde{Q} \tilde{Q}^{T} \nu . \end{aligned}$$
(79)
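As a complement to this construction, the following NumPy sketch computes \(\nu _p\) from Eqs. (75)-(79), assuming that the objective gradients and the constraint Jacobian H are available (the gradient-free realization is deferred to Sect. 3.5); the function name is ours.

```python
# A sketch of Eqs. (75)-(79): project nu_L onto the tangent space of h^{-1}(0).
import numpy as np

def projected_descent_direction(grad_f1, grad_f2, H):
    """nu_p = Q~ Q~^T nu_L, with Q~ spanning the tangent space at x0."""
    nu_L = -(grad_f1 / np.linalg.norm(grad_f1)
             + grad_f2 / np.linalg.norm(grad_f2))     # Eq. (75)
    Q, _ = np.linalg.qr(H.T, mode='complete')         # Eq. (76), full n x n Q
    m = H.shape[0]
    Q_t = Q[:, m:]                                    # Eq. (78): q_{m+1},...,q_n
    return Q_t @ (Q_t.T @ nu_L)                       # Eq. (79)
```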

The following result establishes criteria under which \(\nu _{p}\) is a MODD.

[Algorithm 1]

Proposition 4

Let a BOP of the form (74) be given and \(x_0\) with \(\nabla f_{i}(x_0) \ne 0\) for \(i\in \{1,2\}\) and \(h_j(x_0) = 0\), \(j=1,\ldots , m\). Further, let \(\nu _{p}\) be given as in Eq. (79), so that \(\nabla h_j(x_0)^T\nu _{p} = 0\) for \(j\in \{1, \ldots ,m\}\). Then the following holds:

(a) If \(\nabla f_{1}(x_0)^{T} \tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)>0\), then \(\nu _{p}\) is a MODD of F at \(x_0\).

(b) If \(\nabla f_{1}(x_0)^{T} \tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)=0\) and \(\tilde{Q}^{T}\nabla f_{i}(x_0)\ne 0\) for an index \(i\in \lbrace 1,2 \rbrace \), then \(\nu _{p}\) is a MODD of F at \(x_0\).

(c) If \(\nabla f_{1}(x_0)^{T} \tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)<0\), then \(\nu _{p}\) is not a descent direction of F at \(x_0\).

Proof

Note that for the first objective we obtain

$$\begin{aligned} \nabla f_{1}(x_0)^{T} \nu _{p} = \nabla f_{1}(x_0)^{T} \tilde{Q}\tilde{Q}^{T} \nu _{L} = - \left[ \frac{\nabla f_{1}(x_0)^{T}\tilde{Q}\tilde{Q}^{T} \nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } + \frac{\nabla f_{1}(x_0)^{T}\tilde{Q}\tilde{Q}^{T} \nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \right] , \end{aligned}$$
(80)

and for the second objective

$$\begin{aligned} \nabla f_{2}(x_0)^{T} \nu _{p} = \nabla f_{2}(x_0)^{T} \tilde{Q}\tilde{Q}^{T} \nu _{L} = - \left[ \frac{\nabla f_{2}(x_0)^{T}\tilde{Q}\tilde{Q}^{T} \nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } + \frac{\nabla f_{2}(x_0)^{T}\tilde{Q}\tilde{Q}^{T} \nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \right] . \end{aligned}$$
(81)

Further, since \(0<\Vert \nabla f_{1}(x_0) \Vert \) and \(0< \Vert \nabla f_{2}(x_0)\Vert \), and since \(\tilde{Q} \tilde{Q}^{T}\) is symmetric, we obtain

$$\begin{aligned} \nabla f_{1}(x_0)^{T}\tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0) = \nabla f_{2}(x_0)^{T}\tilde{Q} \tilde{Q}^{T} \nabla f_{1}(x_0) \end{aligned}$$
(82)

and

$$\begin{aligned} \nabla f_{i}(x_0)^{T}\tilde{Q}\tilde{Q}^{T} \nabla f_{i}(x_0) = \left\langle \tilde{Q}^{T} \nabla f_{i}(x_0), \tilde{Q}^{T} \nabla f_{i}(x_0) \right\rangle = \Vert \tilde{Q}^{T} \nabla f_{i}(x_0) \Vert ^{2} \ge 0 \;\; \hbox {for}\;\; i \in \{1,2\}. \end{aligned}$$
(83)

Then, three cases arise:

  • Case 1. If \(\nabla f_{1}(x_0)^{T}\tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)>0\) holds, then by (80), (81) and  (83) we obtain

    $$\begin{aligned} \nabla f_{1}(x_0)^{T}\nu _{p}<0\;\; \hbox {and} \;\; \nabla f_{2}(x_0)^{T}\nu _{p}<0; \end{aligned}$$

    which means that \(\nu _{p}\) is a descent direction of F at \(x_0.\)

  • Case 2. If \(\nabla f_{1}(x_0)^{T}\tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)=0,\) then

    (i) If \(\Vert \tilde{Q}^{T} \nabla f_{i}(x_0)\Vert >0\) for \(i\in \{1,2\}\), then by (80), (81) and (83) \(\nabla f_{i}(x_0)^{T}\nu _{p}<0\) for \(i\in \{1,2\},\) i.e., \(\nu _{p}\) is a descent direction for F at \(x_0.\)

    (ii) If \(\Vert \tilde{Q}^{T} \nabla f_{1}(x_0)\Vert >0\) and \(\Vert \tilde{Q}^{T} \nabla f_{2}(x_0)\Vert =0,\) then \(\nabla f_{1}(x_0)^{T}\nu _{p}<0\) and \(\nabla f_{2}(x_0)^{T} \nu _{p}=0,\) i.e., \(\nu _{p}\) is a descent direction for F at \(x_0.\)

    (iii) If \(\Vert \tilde{Q}^{T} \nabla f_{1}(x_0)\Vert =0\) and \(\Vert \tilde{Q}^{T} \nabla f_{2}(x_0)\Vert >0\), then \(\nabla f_{1}(x_0)^{T}\nu _{p}=0\) and \(\nabla f_{2}(x_0)^{T} \nu _{p}<0,\) i.e., \(\nu _{p}\) is a descent direction for F at \(x_0.\)

    Therefore, if \(\nabla f_{1}(x_0)^{T}\tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)=0\) and \(\tilde{Q}^{T}\nabla f_{i}(x_0)\ne 0\) for an index \(i\in \lbrace 1,2 \rbrace \), then \(\nu _{p}\) is a descent direction of F at \(x_0\).

  • Case 3. If \(\nabla f_{1}(x_0)^{T}\tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)<0\) assume, for the sake of contradiction, that \(\nu _{p}\) is a descent direction for F at \(x_0\).

    Then, if \(\nabla f_{1}(x_0)^{T} \nu _{p}<0\) we have by Eq. (80) the following:

    $$\begin{aligned}\cos {\theta } < \frac{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } \Big \Vert }{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \Big \Vert }, \end{aligned}$$
    (84)

    where \(\theta \) denotes the angle between \(-\tilde{Q}^{T}\nabla f_{1}(x_0)\) and \(\tilde{Q}^{T}\nabla f_{2}(x_0)\).

    Analogously, if \(\nabla f_{2}(x_0)^{T}\nu _{p}<0\) then by Eq. (81) we have

    $$\begin{aligned}\cos {\theta } < \frac{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \Big \Vert }{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } \Big \Vert }. \end{aligned}$$

    When considering \(-1< \cos {\theta } <1,\)

    $$\begin{aligned} \frac{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } \Big \Vert }{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \Big \Vert }<1 \;\; \hbox {and} \;\; \frac{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \Big \Vert }{\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } \Big \Vert } <1 \end{aligned}$$

    leads to

    $$\begin{aligned}\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } \Big \Vert<\Big \Vert \tilde{Q}^{T}\frac{\nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \Big \Vert \;\; \hbox {and} \;\; \Big \Vert \tilde{Q}^{T}\frac{\nabla f_{2}(x_0)}{\Vert \nabla f_{2}(x_0)\Vert } \Big \Vert < \Big \Vert \tilde{Q}^{T}\frac{\nabla f_{1}(x_0)}{\Vert \nabla f_{1}(x_0)\Vert } \Big \Vert , \end{aligned}$$

    which is not possible. Thus, we conclude that if \(\nabla f_{1}(x_0)^{T}\tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)<0,\) then \(\nu _{p}\) is not a descent direction of F at \(x_0\), and we are done.    \(\square \)

Example 4

Consider the following BOP:

Minimize

$$\begin{aligned} f_{1}(x_{1},x_{2})&= x_{1}^{2}+\left( x_{2}+3\right) ^{2}\\ f_{2}(x_{1},x_{2})&= (x_{1}+3)^2+x_{2}^2 \nonumber \end{aligned}$$
(85)

subject to

$$\begin{aligned} h(x_{1}, x_{2}) = x_{1}^2+x_{2}^2-1=0. \end{aligned}$$

For this example, consider the computation of the direction \(\nu _{p}\) starting from a feasible initial point. Figure 2 illustrates the case in which \(\nu _{p}\) lies in the descent cone; hence \(\nu _{p}\) is a descent direction.

Fig. 2. Considering a feasible starting point when computing \(\nu _{p}\) using Eq. (73); here \(\nu _{p}\) is a MODD of BOP (85). This figure illustrates the case presented in Example 4.

Infeasible Case

Next, we consider the case that the initial point \(x_0\) is infeasible, i.e., that \(h_{j}(x_0)\ne 0\) for at least one \(j\in \{1, \ldots , m\}\). Further, we assume that all equalities are linear, i.e., we are given the problem

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbb {R}^{n}}&\; F(x):=[f_{1}(x),f_{2}(x)]^{T}\\ \hbox {s.t.}&\; h(x)= Ax-b = 0, \end{aligned} \end{aligned}$$
(86)

where \(A\in \mathbb {R}^{m\times n}\). In this case we can consider the Newton method applied to the residual \(r:\; \mathbb {R}^{n+k+m}\rightarrow \mathbb {R}^{n+m+1}\), which is given by

$$\begin{aligned} r(x,\alpha , \nu )=\left( \begin{array}{c} J(x)^{T}\alpha + A^{T}\nu \\ Ax-b \\ \bar{e}^{T}\alpha -1 \end{array}\right) , \end{aligned}$$
(87)

where J denotes the Jacobian of F and \(\bar{e}=\left[ 1,\ldots , 1\right] ^{T}\in \mathbb {R}^k\). The first order Taylor approximation of r near an estimate \(y=(x,\alpha ,\nu )\in \mathbb {R}^{n+k+m}\) is given by

$$\begin{aligned} r(y+z)\approx r(y) + Dr(y)z, \end{aligned}$$
(88)

where Dr(y) is the Jacobian of r at y. The Newton step \(\Delta y\) for the Newton method applied to r solves the following linear system of equations:

$$\begin{aligned} Dr(y)\Delta y = -r(y). \end{aligned}$$
(89)

Denote

$$\begin{aligned} W_{\alpha }:= \sum _{j=1}^{2}\alpha _{j}\nabla ^{2}f_{j}(x), \end{aligned}$$
(90)

then

$$\begin{aligned} Dr(x,\alpha ,\nu )=\left( \begin{array}{ccc} W_\alpha & J(x)^{T} & A^{T} \\ A & 0 & 0 \\ 0 & \bar{e}^{T} & 0 \end{array} \right) , \end{aligned}$$
(91)

and the Newton step is given by the vector

$$\begin{aligned} \Delta y= \left( \Delta x, \Delta \alpha , \Delta \nu \right) \end{aligned}$$

that solves

$$\begin{aligned} \left( \begin{array}{ccc} W_{\alpha } & J(x)^{T} & A^{T} \\ A & 0 & 0 \\ 0 & \bar{e}^{T} & 0 \end{array} \right) \left( \begin{array}{c} \Delta x \\ \Delta \alpha \\ \Delta \nu \end{array} \right) = -\left( \begin{array}{c} J(x)^{T} \alpha + A^{T}\nu \\ Ax-b \\ \bar{e}^{T}\alpha -1 \end{array} \right) . \end{aligned}$$
(92)

If we set \(\nu ^{+}:= \nu + \Delta \nu ,\) we can rewrite the above system as

$$\begin{aligned} \left( \begin{array}{ccc} W_{\alpha } & J(x)^{T} & A^{T} \\ A & 0 & 0 \\ 0 & \bar{e}^{T} & 0 \end{array} \right) \left( \begin{array}{c} \Delta x \\ \Delta \alpha \\ \nu ^{+} \end{array}\right) = - \left( \begin{array}{c} J(x)^{T} \alpha \\ Ax-b \\ \bar{e}^{T}\alpha -1 \end{array} \right) . \end{aligned}$$
(93)
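For illustration, the following NumPy sketch assembles Dr and r from Eqs. (87) and (91) and solves the Newton system (89); all names are ours, and since the block system need not be square for \(k=2\) objectives, we solve it in the least-squares sense, which is our own choice.

```python
# A sketch of one Newton step for the residual r of Eq. (87).
# hess_f1/hess_f2 return the Hessians of f_1, f_2; jac_F the k x n Jacobian.
import numpy as np

def newton_step(x, alpha, nu, jac_F, hess_f1, hess_f2, A, b):
    n, m, k = len(x), A.shape[0], 2
    W = alpha[0] * hess_f1(x) + alpha[1] * hess_f2(x)      # Eq. (90)
    J = jac_F(x)
    e = np.ones(k)
    Dr = np.block([
        [W,                 J.T,              A.T],
        [A,                 np.zeros((m, k)), np.zeros((m, m))],
        [np.zeros((1, n)),  e[None, :],       np.zeros((1, m))],
    ])                                                     # Eq. (91)
    r = np.concatenate([J.T @ alpha + A.T @ nu, A @ x - b, [e @ alpha - 1.0]])
    dy, *_ = np.linalg.lstsq(Dr, -r, rcond=None)           # Eq. (89)
    return dy[:n], dy[n:n + k], dy[n + k:]                 # (dx, dalpha, dnu)
```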

The following discussion shows that the norm of r decreases for sufficiently small step sizes in direction \(\Delta y\): it is

$$\begin{aligned} \frac{d}{dt} \Vert r(y+t \Delta y) \Vert ^{2} \big \vert _{t=0} = 2\, r(y)^{T}Dr(y)\Delta y = -2\, r(y)^{T}r(y). \end{aligned}$$
(94)

Removing the square, i.e., dividing by \(2\Vert r(y)\Vert \), leads to

$$\begin{aligned} \frac{d}{dt} \Vert r(y+t \Delta y) \Vert \big \vert _{t=0} = -\frac{r(y)^{T}r(y)}{\Vert r(y)\Vert } = - \Vert r(y) \Vert , \end{aligned}$$
(95)

which is negative at y with \(r(y)\ne 0.\)

In the following we summarize this result.

Proposition 5

Let a BOP be of the form (86) and suppose \(x_0\) is given such that \(h_{j}(x_0)\ne 0\) for at least one \(j\in \{1,\ldots , m\}.\) The Newton step on the residual r as defined in (87) is given by the vector that solves equation system (93), and \(\Vert r \Vert \) decreases for sufficiently small steps in direction of the Newton step.

Proof

It follows by the above discussion.    \(\square \)

Example 5

Consider

$$\begin{aligned} F: \mathbb {R}^{5} \rightarrow \mathbb {R}^{2} \end{aligned}$$

subject to one linear equality constraint:

$$\begin{aligned} \begin{aligned} f_{j}(x)&= \sum \nolimits _{i=1, i \ne j}^{5}\left( x_{i} -a_{i}^{j}\right) ^2 + \left( x_{j}- a_{j}^{j}\right) ^4, \quad j = 1,2, \\ \hbox {s.t.}&\;\; \frac{1}{2} x_{1} = x_{2}. \end{aligned} \end{aligned}$$
(96)

Here, \(a^{1} = \left[ 1, \ldots , 1\right] ^{T} \in \mathbb {R}^{5}\) and \( a^{2} = -a^{1}.\) We apply Newton's method from the initial infeasible point \(p_{0} = a^{1}\), for which \(h(p_{0}) = -0.5\). Figure 3 shows the obtained solutions of the individual Newton steps in (a) the variable and (b) the objective space. In the fourth step we obtain the final solution

$$\begin{aligned} p_{4} = \left[ 0.2668, 0.1334,0.3750,0.3750,0.3750\right] ^{T} \end{aligned}$$

with \(h(p_{4}) = -2.7756e{-}17,\) which can be considered to be feasible.

Fig. 3. Newton steps starting from \(p_{0}\), an infeasible point for Problem (96). This figure illustrates Example 5.

3.4 Inequality Constrained BOPs

Next we consider inequality constrained BOPs of the form

$$\begin{aligned} \begin{aligned} \min _{x\in \mathbb {R}^{n}}&\; F(x):=[f_{1}(x),f_{2}(x)]^{T}\\ \hbox {s.t.}&\; g_{j}(x)\le 0,\; j = 1, \ldots , p. \end{aligned} \end{aligned}$$
(97)

Let \(x_0\) be given and denote by

$$\begin{aligned} I(x_0):= \lbrace i_{1}, i_{2}, \ldots , i_{s}\rbrace \end{aligned}$$
(98)

the index set of the active inequalities at \(x_0\), i.e., \(g_{i}(x_0) = 0\) for \(i\in I(x_0)\). Assume that \(I(x_0)\) is not empty, i.e., that \(s\ge 1.\) Denote by

$$\begin{aligned} G :=\left( \begin{array}{c} \nabla g_{i_{1}}(x_0)^{T} \\ \vdots \\ \nabla g_{i_{s}}(x_0)^{T} \end{array} \right) \in \mathbb {R}^{s \times n} \end{aligned}$$
(99)

the matrix formed by the gradients of the active inequality constraints.

Similarly to the feasible case for equality constrained BOPs, we can in this case also generate descent directions via projection as follows: suppose that \(rank(G)=s\) (i.e., maximal), then we can compute the factorization

$$\begin{aligned} G^{T}= QR = \left( q_{1},\ldots ,q_{s},q_{s+1},\ldots , q_{n}\right) R, \end{aligned}$$
(100)

where \(Q \in \mathbb {R}^{ n \times n}\) is orthogonal and \(R \in \mathbb {R}^{ n \times s}\) is upper triangular. Doing so, the last \(n-s\) column vectors of Q form an orthonormal basis of the tangent space at \(x_0\) of the set defined by the active constraints. The orthogonal projection of \(\nu _L\) onto this tangent space is hence given by

$$\begin{aligned} \nu _p:= \tilde{Q} \tilde{Q}^{T} \nu _L, \end{aligned}$$
(101)

where \(\nu _L\) is as in (73) and

$$\begin{aligned} \tilde{Q}:= \left( q_{s+1},\ldots , q_{n}\right) . \end{aligned}$$
(102)

Similarly to the equality constrained case, \(\nu _p\) is a descent direction under certain conditions (see also Algorithm 2).

Proposition 6

For a BOP of the form (97), let \(x_0\in \mathbb {R}^{n}\) be such that \(g_{i}(x_0)=0\) for every \(i \in I(x_0)\) and \(\nabla f_{i}(x_0) \ne 0\) for \(i\in \{1,2\}.\) Let \(\nu _p\) be given as in Eq. (101), so that \(\nabla g_{i}(x_0)^{T}\nu _p = 0\) for every \(i\in I(x_0)\). Then

(a) If \(\nabla f_{1}(x_0)^{T} \tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)>0\), then \(\nu _p\) is a MODD of F at \(x_0\).

(b) If \(\nabla f_{1}(x_0)^{T} \tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)=0\) and \(\tilde{Q}^{T}\nabla f_{i}(x_0)\ne 0\) for an index \(i\in \lbrace 1,2 \rbrace \), then \(\nu _p\) is a MODD of F at \(x_0\).

(c) If \(\nabla f_{1}(x_0)^{T} \tilde{Q} \tilde{Q}^{T} \nabla f_{2}(x_0)<0\), then \(\nu _p\) is not a descent direction of F at \(x_0\).

Proof

Note that by construction \(\nabla g_{i}(x_0)^{T} \nu _p =0\) for all active \(i\in I(x_0);\) thus, the proof is analogous to that of Proposition 4.    \(\square \)

[Algorithm 2]

Example 6

Consider the following BOP proposed in [8]:

Minimize

$$\begin{aligned} f_{1}(x)&= x_{1} \\ f_{2}(x)&= g(x)\cdot \left( 1 -\sqrt{\frac{x_{1}}{g(x)}}\right) , \quad \hbox {with}\;\; g(x) = 1+x_{2}, \nonumber \end{aligned}$$
(103)

subject to

$$\begin{aligned} \cos \theta \left( f_{2}(x)-e\right) -\sin \theta \, f_{1}(x) \ge a \left| \sin \left( b\pi \left( \sin \theta \left( f_{2}(x) -e\right) + \cos \theta \, f_{1}(x)\right) ^{c}\right) \right| ^{d}, \end{aligned}$$
(104)

where \(0 \le x_{i} \le 1\) for \(i\in \lbrace 1,2 \rbrace \), and \(a=0.1\), \(b=10\), \(c=2\), \(d=0.5\), \(e=1\), \(\theta =-0.2\pi .\) Fig. 4 shows an example of the proposed MODD. Note that \(\nu _{p}\) lies in the descent cone and fulfills the above criterion; thus \(\nu _{p}\) is a descent direction.

Fig. 4. Considering a starting point with an active constraint when computing \(\nu _{p}\) using Eq. (73); the projected direction may or may not be a descent direction. This figure illustrates the behavior in decision variable space for Example 6.

3.5 A Gradient-Free Approximation of a MODD for CBOPs

In the following we use the GSA method to compute the above discussed descent directions for (unconstrained or constrained) bi-objective optimization problems.

We assume that we are given a candidate solution \(x_{0}\) that is designated for local search, and that we are given sample points \(x_{1},x_{2},\ldots ,x_{r}\) in the vicinity of \(x_{0}\). We further assume that their objective function values are already known, which is indeed the case if the \(x_i\)'s are chosen from a given population within a MOEA. Recall from GSA that we can use

$$\begin{aligned} \nu _{i}:=\frac{x_{i}-x_{0}}{\Vert x_{i}-x_{0} \Vert _{2}}, \;\; d_{i}:= \frac{f(x_{i})-f(x_{0})}{\Vert x_{i}-x_{0} \Vert _{2}}, \quad i \in \{1,\ldots ,r\}. \end{aligned}$$
(106)

Thus, (16) turns into

$$\begin{aligned} \hbox {min} \;\; \lambda ^{T} d\\ \hbox {s.t.} \;\; \lambda ^{T}V^{T}V \lambda -1 =0, \nonumber \end{aligned}$$
(107)

which is the problem we have to solve. Then, consider the matrix

$$\begin{aligned} \tilde{V}:= (\nu _{1}, \ldots , \nu _{r}), \end{aligned}$$
(108)

to finally obtain

$$\begin{aligned} \tilde{\nu ^{*}}= -\frac{1}{\Vert \tilde{V} \tilde{\lambda ^{*}}\Vert } \tilde{V}(\tilde{V}^{T}\tilde{V})^{-1}d. \end{aligned}$$
(109)

Then, \(\tilde{\nu }^{*}\) is the most greedy direction for a single objective. It is worth noticing that this derivation can be carried out for every objective function of a MOP.

Unconstrained Case

For the unconstrained case, we can apply the previous idea to approximate Eq. (73), hence obtaining a gradient-free descent direction. Recall that

$$\begin{aligned} \nu _{L}:= -\left[ \frac{\nabla f_{1}(x)}{\Vert \nabla f_{1}(x)\Vert }+\frac{\nabla f_{2}(x)}{\Vert \nabla f_{2}(x)\Vert }\right] , \end{aligned}$$
(110)

then we can approximate the two summands via

$$\begin{aligned} \tilde{\nu }_{j}^{*}= -\frac{1}{\Vert \tilde{V} \tilde{\lambda }_{j}^{*}\Vert } \tilde{V}(\tilde{V}^{T}\tilde{V})^{-1}d^{j}, \end{aligned}$$
(111)

for \(j\in \lbrace 1,2 \rbrace \), where \(d^{j}\) denotes the vector of divided differences from Eq. (106) for objective \(f_{j}\). Since \(\tilde{\nu }_{j}^{*}\) approximates the normalized negative gradient \(-\nabla f_{j}(x)/\Vert \nabla f_{j}(x)\Vert \), we can approximate \(\nu _{L}\) as

$$\begin{aligned} \tilde{\nu }_{L}:= \frac{\tilde{\nu }_{1}^{*}}{\Vert \tilde{\nu }_{1}^{*}\Vert }+\frac{\tilde{\nu }_{2}^{*}}{\Vert \tilde{\nu }_{2}^{*} \Vert }. \end{aligned}$$
(112)
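A compact NumPy sketch of Eqs. (111) and (112) (our own illustration, with illustrative names) reads:

```python
# A sketch of Eqs. (111)-(112): approximate nu_L from samples only.
import numpy as np

def gradient_free_nu_L(f1, f2, x0, samples):
    X = np.asarray(samples, dtype=float)
    D = X - x0
    norms = np.linalg.norm(D, axis=1)
    V = (D / norms[:, None]).T
    VtV = V.T @ V
    terms = []
    for f in (f1, f2):
        d = (np.array([f(x) for x in X]) - f(x0)) / norms  # d^j, cf. (106)
        v = -V @ np.linalg.solve(VtV, d)   # Eq. (111): approximates -grad f_j
        terms.append(v / np.linalg.norm(v))
    return terms[0] + terms[1]             # Eq. (112)
```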

Equality Constrained Case

For this scenario assume that \(x_{0}\) is a feasible solution, that we are given sample points \(x_{1},x_{2},\ldots ,x_{r}\) in the neighborhood of \(x_{0}\), and that their objective and constraint function values are already known. From the constrained case of GSA recall that the matrix \(M:=(m_{ji}) \in \mathbb {R}^{m\times r}\) is given by

$$\begin{aligned} m_{j,i}:=\frac{h_{j}(x_{i})-h_{j}(x_{0})}{\Vert x_{i}-x_{0} \Vert _{2}}, \; i\in \{1,\ldots ,r\}, \;j\in \{1,\ldots ,m\}. \end{aligned}$$
(113)

Via Eqs. (108) and (113) we can compute an approximation \(\tilde{H}^{T}\) of the transposed constraint Jacobian \(H^{T}\), given by

$$\begin{aligned} \tilde{H}^{T} = \tilde{V}(\tilde{V}^{T}\tilde{V})^{-1}M^{T} \end{aligned}$$
(114)

which will be used to compute the projection. Considering a feasible solution \(x_{0}\), we proceed analogously to the unconstrained case and first compute \(\tilde{\nu }_{L}\) as in Eq. (112). Next, in order to compute \(\tilde{\nu }_{p}\), we compute a QR decomposition of \(\tilde{H}^{T}\) and define

$$\begin{aligned} \tilde{Q}:= \left( \tilde{q}_{m+1},\ldots , \tilde{q}_{n}\right) , \end{aligned}$$

where \(\tilde{q}_{i}\), \(i\in \{m+1, \ldots , n\}\), are the last \(n-m\) column vectors of the orthogonal matrix Q obtained by the QR-decomposition of \(\tilde{H}^{T}.\) Then

$$\begin{aligned} \tilde{\nu }_{p}:= \tilde{Q} \tilde{Q}^{T} \tilde{\nu }_{L}, \end{aligned}$$
(115)

is the orthogonal projection of \(\tilde{\nu }_{L}\) onto the (approximated) set of feasible directions. Assuming the notation for \(\tilde{V}\) and \(d^{j}\) as in Eqs. (108) and (106), the following criterion governs the application of our gradient-free proposal:

For a BOP of the form (74) with \(\nabla f_{i}(x) \ne 0\) for \(i \in \{1,2\}\) and \(h_{j}(x)=0\) for \(j\in \{1,\ldots ,m\}\), compute \(\tilde{\nu }_{p}\) as in Eq. (115) and

$$\begin{aligned} C_{g}:= d^{1T}(\tilde{V}^{T}\tilde{V})^{-1}\tilde{V}^{T}\tilde{Q}\tilde{Q}^{T} \tilde{V}(\tilde{V}^{T}\tilde{V})^{-1} d^{2}. \end{aligned}$$
(116)

Then we proceed as follows (a computational sketch is given after the list):

1. If \(C_{g}>0\), then perform a line search along \(\tilde{\nu }_{p}\).

2. If \(C_{g}=0\) and \(d^{iT}(\tilde{V}^{T}\tilde{V})^{-1}\tilde{V}^{T} \tilde{Q}\tilde{Q}^{T}\tilde{V} (\tilde{V}^{T}\tilde{V})^{-1}d^{i}\ne 0\) for some index \(i\in \lbrace 1,2 \rbrace \), then perform a line search along \(\tilde{\nu }_{p}\).

3. If \(C_{g}<0\), then the line search is not applied.
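To make the criterion concrete, the following sketch chains Eqs. (113)-(116). It is our illustration under the same assumptions as the earlier snippets; all identifiers are ours, and the exact comparison with zero would in practice be replaced by a small tolerance.

```python
import numpy as np

def eq_constrained_direction(x0, f0, xs, fs, h0, hs):
    """Gradient-free criterion for the equality constrained case.

    x0 : (n,) feasible point           f0 : (2,) objective values at x0
    xs : (r, n) neighboring samples    fs : (r, 2) objective values there
    h0 : (m,) constraint values at x0  hs : (r, m) values at the samples
    Returns nu_p when a line search is admitted, otherwise None.
    """
    diffs = xs - x0
    norms = np.linalg.norm(diffs, axis=1)
    V = (diffs / norms[:, None]).T              # Eq. (108)
    G = np.linalg.inv(V.T @ V)                  # (V^T V)^{-1}
    M = ((hs - h0) / norms[:, None]).T          # Eq. (113), shape (m, r)
    Ht = V @ G @ M.T                            # Eq. (114), approximates H^T
    Q, _ = np.linalg.qr(Ht, mode='complete')    # full (n, n) orthogonal factor
    Qt = Q[:, Ht.shape[1]:]                     # last n - m columns
    D = (fs - f0) / norms[:, None]              # columns are d^1 and d^2
    a1 = Qt.T @ (V @ G @ D[:, 0])               # Qt^T V (V^T V)^{-1} d^1
    a2 = Qt.T @ (V @ G @ D[:, 1])
    Cg = float(a1 @ a2)                         # Eq. (116)
    if Cg < 0 or (Cg == 0 and a1 @ a1 == 0 and a2 @ a2 == 0):
        return None                             # criterion rejects the line search
    nu1, nu2 = -V @ G @ D[:, 0], -V @ G @ D[:, 1]
    nu_L = nu1 / np.linalg.norm(nu1) + nu2 / np.linalg.norm(nu2)   # Eq. (112)
    return Qt @ (Qt.T @ nu_L)                   # Eq. (115)
```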

Note that the above criteria allow us to decide, at running time, whether the available neighborhood information is likely sufficient to approximate a MODD. Once the decision to compute such a direction has been made, we obtain the new iterate \(x_{i}\) as follows:

$$\begin{aligned} x_{i}:= x_{0}+ t \tilde{\nu }_{p}, \end{aligned}$$
(117)

where t is a suitable step length. In this work, we compute t by a backtracking procedure based on Armijo's condition [29]; a simplified sketch is given below. Algorithm 3 describes the standalone gradient-free algorithm for equality constrained MOPs.
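The backtracking itself may be sketched as follows. This is a deliberate simplification: the sufficient-decrease test below, with our illustrative parameter c, stands in for the exact Armijo condition of [29].

```python
def backtracking_step(f, x0, nu, t0=1.0, c=1e-4, t_min=1e-8):
    """Halve the step length t until both objectives show a sufficient
    decrease at x0 + t * nu (simplified Armijo-style acceptance).

    f returns the pair (f1, f2); nu is the descent direction nu_p.
    """
    f0 = f(x0)
    t = t0
    while t >= t_min:
        x1 = x0 + t * nu
        f1 = f(x1)
        if all(f1[j] <= f0[j] - c * t for j in (0, 1)):
            return x1, t          # accepted step, cf. Eq. (117)
        t *= 0.5                  # backtrack
    return x0, 0.0                # no admissible step found
```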

[Algorithm 3]

Inequality Constrained Case

In the case that inequality constraints are present, the construction is carried out over \(I(x)\), the set of inequality constraints that are active at x: the matrix M from Eq. (113) is built from the active constraints \(g_{i}\), \(i\in I(x)\), instead of the \(h_{j}\). We then obtain the projected direction \(\tilde{\nu }_{L}^{p}\) as follows:

$$\begin{aligned} \tilde{\nu }_{L}^{p}:= \tilde{Q} \tilde{Q}^{T} \tilde{\nu }_{L}. \end{aligned}$$
(118)

Assuming the notation for \(\tilde{V}\) and \(d_{i}^{j}\) as in Eqs. (108) and (106), the following result states the criteria for the application of the gradient-free proposal:

Consider a BOP of the form (97) with \(\nabla f_{i}(x) \ne 0\) for \(i \in \{1,2\}\) and \(g_{i}(x)=0\) for every \(i\in I(x)\). Compute \(\tilde{\nu }_{L}^{p}\) as in Eq. (118) and

$$\begin{aligned} C_{g}:= d^{1T}(\tilde{V}^{T}\tilde{V})^{-1}\tilde{V}^{T}\tilde{Q}\tilde{Q}^{T} \tilde{V}(\tilde{V}^{T}\tilde{V})^{-1} d^{2}. \end{aligned}$$
(119)

Then we proceed as follows:

1. If \(C_{g}>0\), then perform a line search along \(\tilde{\nu }_{L}^{p}\).

2. If \(C_{g}=0\) and \(d^{iT}(\tilde{V}^{T}\tilde{V})^{-1}\tilde{V}^{T} \tilde{Q}\tilde{Q}^{T}\tilde{V} (\tilde{V}^{T}\tilde{V})^{-1}d^{i}\ne 0\) for some index \(i\in \lbrace 1,2 \rbrace \), then perform a line search along \(\tilde{\nu }_{L}^{p}\).

3. If \(C_{g}<0\), then the line search is not applied.

Once the decision to compute such a direction has been made, we obtain the new iterate \(x_{i}\) as follows:

$$\begin{aligned} x_{i}:= x_{0}+ t \tilde{\nu }_{L}^{p}, \end{aligned}$$
(120)

where t is again a suitable step length computed via a backtracking procedure based on Armijo's condition. The standalone algorithm for inequality constrained MOPs is described in Algorithm 4; a short sketch of the numerical detection of the active set follows below.
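Numerically, the active set has to be detected with a tolerance. The snippet below is a minimal sketch under our own choice of tolerance (the text does not prescribe one); the rows of M in Eq. (113) are then built from the detected \(g_{i}\), and the remaining computation coincides with the equality constrained sketch.

```python
def active_set(g_vals, eps=1e-6):
    """Return I(x) = { i : |g_i(x)| <= eps } for constraints g_i(x) <= 0;
    the tolerance eps is our assumption."""
    return [i for i, gi in enumerate(g_vals) if abs(gi) <= eps]
```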

[Algorithm 4]

In this section we have presented criteria to decide when the available neighborhood information suffices to approximate a MODD at a given solution. The potential of these criteria is displayed in the next section, where they are used in combination with population-based heuristics.

4 Application: Use of GFDD Within NSGA-II

In order to apply the developed ideas, we show in the following some examples of the integration of the above concepts to perform a multi-objective local search within the execution of the well-known algorithm NSGA-II [10], which serves as demonstrator. NSGA-II is a state-of-the-art algorithm for bi- and three-objective optimization problems that makes use of an archiving strategy based on the crowding distance. This strategy plays an important role in the hybridization, as it helps us to decide which individual is a suitable starting point for the local search.

Fig. 5.

Points \(p_{1}\) and \(p_{2}\) are considered to be neighbors of \(p_{0}.\) The green triangles outside the circle are population elements which are not used for the GSA computation.

[Algorithm 5]

Algorithm 5 shows a pseudo code of a hybrid of GFDD and NSGA-II; it also demonstrates that such a coupling can be realized relatively easily with, in principle, any other MOEA. The first part of the algorithm (lines 1 to 12) coincides with the evolutionary process of NSGA-II. Then the interleaving of our proposal starts: in lines 13 and 14 we select the individual \(x_{s}\) with the largest crowding distance value. If \(x_{s}\) is feasible, we decide based on the criteria presented above whether the local search is suitable; in the case of inequality constraints we consider only the set of active constraints. Once the proposed low-cost MODD has been computed successfully, we apply a regular line search along it, with a suitable step size obtained by a traditional backtracking procedure. A sketch of how the neighboring samples for GSA can be gathered from the current population (compare Fig. 5) is given below.
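As Fig. 5 suggests, the samples for GSA can be taken directly from the current population: members within a radius of the starting point serve as the neighbors \(x_{1},\ldots ,x_{r}\). A minimal sketch of this gathering step (the radius rule and all names are our assumptions):

```python
import numpy as np

def gsa_neighbors(population, x_s, radius, r_max):
    """Collect up to r_max population members within the given radius of
    x_s (excluding x_s itself). Since their objective values are already
    known, GSA incurs no additional function evaluations."""
    dists = np.linalg.norm(population - x_s, axis=1)
    order = np.argsort(dists)                       # nearest first
    close = [i for i in order if 0.0 < dists[i] <= radius][:r_max]
    return population[close]
```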

Fig. 6.

Snapshot of one iteration of the population-based algorithm. The population is represented by circles, while the initial and final solutions involved in the local search procedure are marked by light and dark diamonds, respectively. The test problem is CZDT2 from [31].

Fig. 7.

Snapshot of one iteration of the population-based algorithm. The population is represented by circles, while the initial and final solutions involved in the local search procedure are marked by light and dark diamonds, respectively. The image of the constraint function is also shown. The test problem is CZDT3 from [31].

Fig. 8.

Snapshot of one iteration of the population-based algorithm. The population is represented by circles, while the initial and final solutions involved in the local search procedure are marked by light and dark diamonds, respectively. The test problem is CF2 from [39].

Fig. 9.

Snapshot of one iteration of the population-based algorithm. The population is represented by circles, while the initial and final solutions involved in the local search procedure are marked by light and dark diamonds, respectively. The test problem is CF4 from [39].

At each generation, we apply the local search to only one selected individual, mainly because we do not want to introduce large changes into the entire population of the evolutionary algorithm. If we applied the local search to many individuals, the risk of diversity loss or premature convergence would increase; moreover, the computational cost (in terms of function evaluations) would grow due to the step size computation. When selecting the starting point for the local search, the straightforward choice is the individual with the largest crowding distance value, in order to assure the existence of close neighbors. These neighbors are used to approximate the gradient information required by the proposed operator (see Fig. 5). By doing this, there is a chance to generate a new individual such that (i) it is not far from \(x_{s}\), but (ii) it may be deleted by the crowding process itself. There is hence a compromise between choosing a candidate that has enough neighbors to approximate the gradient-free MODD and the risk of losing the new candidate due to crowding. A better idea could be to choose an individual \(x_{s}\) with an average crowding value and at least r close neighbors, as sketched below.
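Such an alternative selection rule could be sketched as follows (our interpretation of the heuristic above; crowding holds the crowding distance values maintained by NSGA-II, with boundary individuals carrying an infinite value):

```python
import numpy as np

def select_start_point(population, crowding, radius, r_min):
    """Pick the individual whose crowding distance is closest to the
    population average among those with at least r_min neighbors within
    the radius; returns its index, or None if nobody qualifies."""
    finite = np.isfinite(crowding)          # exclude boundary individuals
    avg = crowding[finite].mean()
    best, best_gap = None, np.inf
    for i, x in enumerate(population):
        n_close = int(np.sum(np.linalg.norm(population - x, axis=1) <= radius)) - 1
        if finite[i] and n_close >= r_min and abs(crowding[i] - avg) < best_gap:
            best, best_gap = i, abs(crowding[i] - avg)
    return best
```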

Numerical results that support the advantages of this proposal can be found in [38]. In the following we present some examples of the behavior of a population-based algorithm when the proposed low-cost local search is applied. Figures 6 and 7 illustrate NSGA-II applied to the CZDT2 and CZDT3 benchmark functions from [31], which are bi-objective optimization problems with equality constraints. The figures show one generation of the MOEA in which the local search is applied to generate a new individual. Figures 8 and 9 illustrate NSGA-II applied to the CF2 and CF4 benchmark functions from [39], which are bi-objective optimization problems with inequality constraints; again, the figures show one generation of the MOEA in which the local search is applied to generate a new individual.

5 Conclusions

In this chapter we have reviewed some tools that allow one to realize a low-cost local search engine for constrained bi-objective optimization problems when used within set-based optimization strategies such as evolutionary algorithms. The basic idea of the Gradient Subspace Approximation is to compute the most greedy search direction at a given point out of given neighborhood information. We have further presented a way to compute descent directions for bi-objective optimization problems that applies to unconstrained as well as to equality and inequality constrained problems, and we have shown how to utilize GSA in order to obtain a “gradient-free” approximation of these search directions. Finally, we have demonstrated the possible benefit of the resulting low-cost local searcher within a hybrid evolutionary algorithm that couples the proposed search technique with the widely used NSGA-II. We stress, however, that the gradient-free local search engine can in principle be integrated into any other evolutionary algorithm or any other set-oriented search heuristic.