1 Introduction

In many applications one is faced with the problem that several objectives have to be optimized concurrently, resulting in a multi-objective optimization problem (MOP). One important characteristic of MOPs is that their solution sets, the so-called Pareto sets, and their images, the Pareto fronts, do not consist of a single point as for ’classical’ scalar optimization problems. Instead, these sets typically form (\(k-1\))-dimensional objects, where k is the number of objectives involved in the given MOP (Hillermeier 2001).

For the treatment of MOPs, specialized evolutionary algorithms, so-called multi-objective evolutionary algorithms (EMOAs), have caught the interest of many researchers (see, e.g., Coello et al. 2007; Deb 2001; Rozenberg et al. 2012; Zhang and Li 2007 and references therein). Reasons for this include that EMOAs are applicable to a wide range of problems and that they are of global nature, hence in principle not depending on the initial candidate set (i.e., the initial population). Further, due to their set-based approach, they allow computing a finite-size representation of the entire Pareto set in a single run of the algorithm.

On the other hand, it is widely accepted that EMOAs tend to converge slowly, resulting in a relatively high number of function evaluations needed to obtain a suitable representation of the set of interest. As a possible remedy, researchers have proposed memetic or hybrid strategies (e.g., Bosman 2012; Ishibuchi et al. 2003; Jaszkiewicz et al. 2012; Knowles and Corne 2005; Lara et al. 2010; Moscato 1989; Shukla 2007; Vasile and Zuiani 2011). Algorithms of that type hybridize EMOAs with local search strategies coming, e.g., from mathematical programming in order to obtain fast and reliable global search procedures.

The quality of the approximation of the Pareto front may be measured by the scalar-valued hypervolume indicator (HV) (Zitzler 1999). The idea to integrate this indicator into the selection process of EMOAs was proposed in Knowles (2002) and Knowles and Corne (2003) and later generalized for arbitrary indicators (Zitzler and Künzli 2004). In general, indicator-based EMOAs (also known as IBEAs) aim at generating a set of solutions maximizing a specific performance indicator. The SMS-EMOA (Beume et al. 2006) is known as a powerful state-of-the-art EMOA internally maximizing the hypervolume indicator, with particular merits in the case of more than two objectives (Wagner et al. 2007). This performance measure has a number of appealing properties (see, e.g., Bringmann and Friedrich 2010a, b; Knowles and Corne 2002; Zitzler et al. 2003) like strict monotonicity and Pareto compliance. The distributions obtained using the HV are biased toward the knee regions of the Pareto front (Auger et al. 2009). However, the computational complexity of its evaluation increases exponentially with the number of objectives. Since the hypervolume indicator (as well as any other performance indicator) induces an auxiliary scalar optimization problem which is defined on the set of archives, its gradient can be computed (Emmerich and Deutz 2014), which gives rise to set based mathematical programming techniques as done in Hernández et al. (2014).

In this work, we present a hybrid of a particular LS mechanism tailored for hypervolume approximations with the SMS-EMOA. The basis for the LS mechanism, the Hypervolume Directed Search (HVDS), is a certain region division which is possible for MOPs with differentiable objectives. We will argue that a different local search strategy is most promising depending on whether the candidate solution x is (i) ’far away’ from the Pareto set, (ii) ’near’ to it, or (iii) ’in between’. In case x is ’far away’ we propose to use a steepest descent like method in order to approach the rough location of the Pareto set as fast as possible, while for candidate solutions in the other two regions a particular steering is advisable. For this, we will use and adapt the recently proposed Directed Search (DS) method (Schütze et al. 2016), as this method is capable of steering the search from a given point into any direction d given in objective space. Thus, DS is well suited for the problem at hand, as the hypervolume indicator is defined in objective space. We present HVDS both as a standalone algorithm and as a local search engine within SMS-EMOA, leading to SMS-EMOA-HVDS, which is the first hybrid EMOA for hypervolume approximations of the Pareto front. Numerical experiments on several benchmark MOPs with two and three objectives show the benefit of the new approach. A preliminary study of this work, restricted to bi-objective problems, can be found in Hernández et al. (2013).

The idea to utilize steering features within hybrid evolutionary algorithms is not new. Brown and Smith (2005) use the multi-objective gradient (Fliege and Fux 2000) together with the so-called negative gradient simplex to obtain well-spread solutions along the Pareto front. Bosman (2012) uses a line search procedure as local search engine within an evolutionary algorithm such that the according movement in objective space (i.e., \(F(x_{i+1}) - F(x_i)\), where \(x_i\) is the current and \(x_{i+1}\) the new iterate) is equal to a prescribed direction in objective space. Finally, the Directed Search method (Schütze et al. 2016) utilizes a similar idea to the one of Bosman (2012) but uses it to steer the search both toward and along the Pareto front. The main differences between these works and the current study are (i) the region division that is proposed here in order to perform a different local search in each of these regions, and (ii) that the method is tailored to the context of HV approximations, which requires separate considerations.

The remainder of this paper is organized as follows: in Sect. 2, we briefly state the background required for the understanding of the sequel. In Sect. 3, we present the region division and the local searcher HVDS that is tailored for hypervolume maximization. In Sect. 4, we will present SMS-EMOA-HVDS which is a hybrid of SMS-EMOA and HVDS. In Sect. 5, we will show some numerical results and comparisons to demonstrate the strength of the novel hybrid. Finally, we conclude and show possible paths for future research in Sect. 6.

2 Background

In the following we consider continuous multi-objective optimization problems (MOPs) that can be expressed as

$$\begin{aligned} \min _{x\in Q} F(x), \end{aligned}$$
(1)

where \(Q\subset {\mathbb {R}}^n\) is the domain and \(F:Q\rightarrow {\mathbb {R}}^k\) is defined as the vector of the objective functions

$$\begin{aligned} F(x) = (f_1(x),\ldots ,f_k(x))^T, \end{aligned}$$
(2)

and where each objective \(f_i:Q\rightarrow {\mathbb {R}}\) is assumed to be sufficiently smooth. In this work, we will concentrate on unconstrained problems, i.e., problems of the form (1) where \(Q={\mathbb {R}}^n\).

The optimality of a MOP is defined by the concept of dominance.

Definition 1

  (a) Let \(v,w\in {\mathbb {R}}^k\). Then the vector v is less than w (\(v<_p w\)), if \(v_i< w_i\) for all \(i\in \{1,\ldots ,k\}\). The relation \(\le _p\) is defined analogously.

  (b) A vector \(y\in Q\) is dominated by a vector \(x\in Q\) (\(x\prec y\)) with respect to (1) if \(F(x)\le _p F(y)\) and \(F(x)\ne F(y)\); else y is called non-dominated by x.

  (c) A point \(x\in Q\) is called (Pareto) optimal or a Pareto point if there is no \(y\in Q\) which dominates x.

  (d) The set \(P_Q\) of all Pareto optimal solutions is called the Pareto set and its image \(F(P_Q)\) the Pareto front.

The Directed Search (DS) method is a point-wise iterative search procedure that is capable of steering the search from a given point into a desired direction \(d\in {\mathbb {R}}^k\) in objective space (Schütze et al. 2016). To be more precise, given a point \(x_0\in {\mathbb {R}}^n\), a search direction \(\nu \in {\mathbb {R}}^n\) is sought such that

$$\begin{aligned} \lim _{t\searrow 0} \frac{f_i(x_0+t\nu )-f_i(x_0)}{t} = d_i,\quad i=1,\ldots ,k. \end{aligned}$$
(3)

Such a direction vector \(\nu \) solves the following system of linear equations:

$$\begin{aligned} J(x_0) \nu = d. \end{aligned}$$
(4)

Hereby, J(x) denotes the Jacobian of F at x, i.e.,

$$\begin{aligned} J(x) = \left( \begin{array}{c} \nabla f_1(x)^T\\ \vdots \\ \nabla f_k(x)^T \end{array}\right) \in {\mathbb {R}}^{k\times n}, \end{aligned}$$
(5)

where \(\nabla f_i(x)\) denotes the gradient of the i-th objective \(f_i\) at x. Since typically \(k\ll n\), we can assume that the system in Eq. (4) is (highly) underdetermined. Among the solutions of Eq. (4), the one with the least 2-norm can be viewed as the greedy direction for the given context. This solution is given by

$$\begin{aligned} \nu _+ := J(x)^+ d, \end{aligned}$$
(6)

where \(J(x)^+\) denotes the pseudo inverse of J(x). If the rank of \(J:= J(x)\) is k (i.e., maximal) the pseudo inverse is given by \(J^+ = J^T(JJ^T)^{-1}\). Since there is no restriction on d the search can be steered in any direction, e.g., toward and along the Pareto set.
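
For illustration, the following is a minimal numerical sketch of Eqs. (4)–(6), assuming a Jacobian of full row rank; the function name ds_direction and the example data are ours and not part of the original method description.

```python
import numpy as np

def ds_direction(J, d):
    """Greedy Directed Search direction nu_+ = J(x)^+ d from Eq. (6).

    J : (k, n) Jacobian of F at x, d : (k,) desired direction in objective
    space. For full row rank k, np.linalg.pinv(J) equals J^T (J J^T)^{-1}.
    """
    return np.linalg.pinv(J) @ d

# Example: a 2-objective problem in 3 variables, steered toward d = (-1, -1)
J = np.array([[1.0, 0.5, 0.0],
              [0.2, 1.0, 0.3]])
d = np.array([-1.0, -1.0])
nu = ds_direction(J, d)
print(J @ nu)  # recovers d, i.e., the first-order movement in objective space
```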

A commonly accepted measure for assessing the quality of an approximation is the so-called hypervolume indicator (Zitzler 1999).

Definition 2

Let \(v^{(1)}, v^{(2)}, \ldots , v^{(\mu )} \in {\mathbb {R}}^k\) be a non-dominated set and \(R\in {\mathbb {R}}^k\) such that \(v^{(i)} \prec R\) for all \(i=1,\ldots ,\mu \). The value

$$\begin{aligned} H(v^{(1)},\ldots ,v^{(\mu )}; R) = \varLambda _k\left( \,\bigcup _{i=1}^\mu [v^{(i)}, R] \,\right) \end{aligned}$$
(7)

is termed the dominated hypervolume or value of the hypervolume indicator with respect to reference point \(R=(r_1,\ldots ,r_k)^T\), where \(\varLambda _k(\cdot )\) denotes the Lebesgue measure in \({\mathbb {R}}^k\).

This measure has a number of appealing properties; however, determining its value becomes increasingly expensive as the number of objectives grows (Beume 2009). In the case of two objectives (\(k=2\)) and a lexicographically ordered non-dominated set \(v^{(1)}, v^{(2)}, \ldots , v^{(\mu )}\), the calculation of (7) reduces to

$$\begin{aligned} H(v^{(1)},\ldots ,v^{(\mu )}; R) = \left[ r_1-v^{(1)}_1\right] \cdot \left[ r_2-v^{(1)}_2\right] + \sum _{i=2}^\mu \left[ r_1-v^{(i)}_1\right] \cdot \left[ v^{(i-1)}_2-v^{(i)}_2\right] . \end{aligned}$$
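
As a sketch, this sum formula can be implemented in a few lines; the helper name hv_2d and the toy values are ours, and the input set is assumed to be mutually non-dominated with \(v^{(i)} <_p R\) for all points.

```python
import numpy as np

def hv_2d(points, R):
    """Dominated hypervolume for k = 2 via the sum formula above.

    points : (mu, 2) array of mutually non-dominated objective vectors,
    R : (2,) reference point dominated by every point.
    """
    P = points[np.argsort(points[:, 0])]  # lexicographic order: f1 increasing
    hv = (R[0] - P[0, 0]) * (R[1] - P[0, 1])
    for i in range(1, len(P)):
        hv += (R[0] - P[i, 0]) * (P[i - 1, 1] - P[i, 1])
    return hv

# Three non-dominated points; the slabs of the union of boxes sum to 12
print(hv_2d(np.array([[1.0, 4.0], [2.0, 2.0], [3.0, 1.0]]), np.array([5.0, 5.0])))
```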

3 HVDS: a local search algorithm for hypervolume maximization

In this section we present HVDS, a point-wise iterative local search method for hypervolume maximization. The underlying idea is to divide the search space into three different regions since points in each region demand a different treatment. We will propose such a region division and the according local search in each region.

3.1 Motivation

Assume first that a point \(x\in Q\) assigned for local search is ‘far away’ from the Pareto set which is quite likely in early stages of the search process. It is thus desired for the new iterate \(x_{new}\) to approach the set of interest (as a whole) as fast as possible while the particular location of \(x_{new}\) with respect to the Pareto set is rather secondary. On the contrary, the following discussion shows that a steering of the search (e.g., via DS) might be even hindering the overall performance: in Brown and Smith (2005) it has been observed that the objectives’ gradients of a given MOP typically point toward similar directions if x is ‘far away’ from the Pareto set (if existing). Consider for simplicity the extreme case, namely that all gradients point into the same direction, i.e., let

$$\begin{aligned} \nabla f_i(x) = \lambda _i \nabla f_1(x), \quad i=1,\ldots , k, \end{aligned}$$
(8)

where \(\lambda _i>0\), \(i=1,\ldots ,k\). If \(x_{new}\) is obtained via line search it can be written as

$$\begin{aligned} x_{new}=x+t \nu , \end{aligned}$$
(9)

where \(t\in {\mathbb {R}}_+\) is the step size and \(\nu \in {\mathbb {R}}^n\) the chosen search direction. Note that \(\nu \) defines the movement in decision space, and that the related movement in objective space is given by \(J(x)\nu \) (this can be seen from Eq. (3)). Thus, in case (8) holds, we have

$$\begin{aligned} J(x)\nu = \left( \begin{array}{c} \nabla f_1(x)^T\nu \\ \vdots \\ \nabla f_k(x)^T\nu \end{array}\right) = \nabla f_1(x)^T\nu \cdot \lambda , \end{aligned}$$
(10)

where \(\lambda =(\lambda _1,\ldots ,\lambda _k)^T\in {\mathbb {R}}^k\). Note that the vector in (10) is a multiple of \(\lambda \), i.e., we obtain a one-dimensional movement in objective space regardless of the choice of \(\nu \), which is n-dimensional. Though (8), and thus (10), holds exactly only in the limit where the distance of x to the Pareto set becomes infinite, the range of directions \(d=J(x)\nu \) into which the search can be steered with a satisfactory step size is expected to be small for points x that are sufficiently far away from the Pareto set. We refer to Schütze et al. (2016) for a more thorough discussion. Thus, rather than using DS (or any other steering mechanism), it seems wise to perform a greedy search toward the Pareto set.

Second, we consider the other extreme situation, namely that x is already nearly optimal which happens in later stages of the search process. In that case no further significant improvements can be expected when searching along descent directions. Instead, it seems to be wise to perform the search along the Pareto front.

Finally, a point does of course not have to be ‘far away’ from nor ‘near’ to the Pareto set. In that case we suggest to search along descent directions that (locally) improve the hypervolume value.

To summarize, we suggest to divide the decision space Q into three regions. These together with their suggested local search are as follows:

  • Region I x is ‘far away’ from the Pareto set (denoted by ‘\(x\in I\)’). In that case, a greedy search toward the rough location of the Pareto front is desired.

  • Region II x is ‘in between’, i.e., neither ‘far away’ nor ‘near’ to the Pareto front. In that case, a descent direction has to be selected such that a movement in that direction maximizes the hypervolume.

  • Region III x is ‘near’ to the Pareto set. In that case, a search along the Pareto front will be performed.

In the following we propose a particular way to realize the region division and will further on discuss the local search in each region.

3.2 Region division

Assume we are given a MOP whose objectives are differentiable. Let

$$\begin{aligned} R_Q:=\{x\in Q \; :\; \nabla f_i(x)\ne 0, \;i=1,\ldots ,k\} \end{aligned}$$
(11)

and define \(\eta :R_Q\rightarrow {\mathbb {R}}\) via

$$\begin{aligned} \eta (x)=\left\| \sum _{i=1}^k \tilde{\alpha }_i \frac{\nabla f_i(x)}{||\nabla f_i(x)||_{2}} \right\| _2^2, \end{aligned}$$
(12)

where \(\tilde{\alpha }\) is a solution of the following optimization problem

$$\begin{aligned} \min _{\alpha \in {\mathbb {R}}^k} \left\{ \left\| \sum _{i=1}^k \alpha _i \frac{\nabla f_i(x)}{||\nabla f_i(x)||_{2}} \right\| _2^2 : \; \alpha _i\ge 0,\;i=1,\ldots ,k,\; \sum _{i=1}^k \alpha _i = 1\right\} . \end{aligned}$$
(13)

The following result is the basis for the region division.

Proposition 1

Let all objectives \(f_i\), \(i=1,\ldots ,k\), be continuously differentiable, let \(\eta \) be as in (12), and let \(x\in R_Q\). Then

  (a) \(\eta (x)\in [0,1]\).

  (b) \(\eta (x)=1\) \(\Leftrightarrow \) \(\nabla f_i(x)=\lambda _i \nabla f_1(x)\) with \(\lambda _i>0\) for \(i=1,\ldots ,k\).

  (c) \(\eta (x)=0\) \(\Leftrightarrow \) \(x\in R_Q\) is a Karush–Kuhn–Tucker (KKT) point.

  (d) \(\eta \) is a continuous mapping.

Proof

(a)

    Clearly, \(\eta (x)\ge 0\). Further, it is

    $$\begin{aligned} \eta (x) =\left\| \sum _{i=1}^k \tilde{\alpha }_i \frac{\nabla f_i(x)}{||\nabla f_i(x)||_{2}} \right\| _2^2 \le \sum _{i=1}^k \tilde{\alpha }_i \left\| \frac{\nabla f_i(x)}{||\nabla f_i(x)||_{2}} \right\| _2^2 = \sum _{i=1}^k \tilde{\alpha }_i = 1. \end{aligned}$$
    (14)
(b)

“\(\Rightarrow \)”: by strict convexity of the function in (13) it follows directly that

    $$\begin{aligned} \frac{\nabla f_1(x)}{\Vert \nabla f_1(x)\Vert _2} = \ldots = \frac{\nabla f_k(x)}{\Vert \nabla f_k(x)\Vert _2} \end{aligned}$$
    (15)

    which is equivalent to (8). “\(\Leftarrow \)”: let \(\alpha \) be a convex weight, then

    $$\begin{aligned} \sum _{i=1}^k \alpha _i \frac{\nabla f_i(x)}{||\nabla f_i(x)||_2} = \sum _{i=1}^k \alpha _i \frac{\lambda _i \nabla f_1(x)}{||\lambda _i \nabla f_1(x)||_2} = \sum _{i=1}^k \alpha _i \frac{\nabla f_1(x)}{||\nabla f_1(x)||_2} = \frac{\nabla f_1(x)}{||\nabla f_1(x)||_2} \end{aligned}$$
    (16)

    by which it follows that \(\eta (x)=1\).

(c)

“\(\Rightarrow \)”: let \(\eta (x)=0\), then there exists a convex weight \(\alpha \) s.t.

    $$\begin{aligned} \sum _{i=1}^k \alpha _i \frac{\nabla f_i(x)}{||\nabla f_i(x)||_2} = \sum _{i=1}^k \frac{\alpha _i}{||\nabla f_i(x)||_2} \nabla f_i(x) = 0. \end{aligned}$$
    (17)

    Thus, choosing \(\tilde{w}_i:= \alpha _i/\Vert \nabla f_i(x)\Vert _2\), \(i=1,\ldots ,k\), the vector \(w:=\tilde{w}/\Vert \tilde{w}\Vert _1\) is a convex weight with \(\sum _{i=1}^k w_i \nabla f_i(x) = 0\), i.e., x is a KKT point. “\(\Leftarrow \)”: now let \(x\in R_Q\) be a KKT point. That is, there exists a convex weight \(\alpha \) s.t.

$$\begin{aligned} \sum _{i=1}^k \alpha _i \nabla f_i(x) = \sum _{i=1}^k \alpha _i\Vert \nabla f_i(x)\Vert _2 \frac{\nabla f_i(x)}{||\nabla f_i(x)||_2} = 0. \end{aligned}$$
    (18)

    Choosing \(\tilde{w}_i:= \alpha _i\Vert \nabla f_i(x)\Vert _2\), \(i=1,\ldots ,k\), then \(w:= \tilde{w}/\Vert \tilde{w}\Vert _1\) is a convex weight such that \(\sum _{i=1}^k w_i \nabla f_i(x)/\Vert \nabla f_i(x)\Vert _2 = 0\), i.e., \(\eta (x)=0\).

(d)

    follows by construction of \(\eta \).\(\square \)

Thus, by the above result and using the observation made in Brown and Smith (2005) we can conclude that (i) x is far away from the Pareto set iff \(\eta (x)\) is close to 1, (ii) x is near to the Pareto set iff \(\eta (x)\) is close to 0, and (iii) else x is in between. Hence, by determining two values \(a,b\in (0,1)\) with \(a<b\) we can make the region division via

$$\begin{aligned} \begin{aligned} x\in I&\Leftrightarrow \; \eta (x)\ge b\\ x\in II&\Leftrightarrow \; \eta (x)\in (a,b)\\ x\in III&\Leftrightarrow \; \eta (x)\le a \end{aligned} \end{aligned}$$
(19)

As an example, consider the three-objective problem

$$\begin{aligned} \begin{aligned} f_1,f_2,f_3&: {\mathbb {R}}^3 \rightarrow {\mathbb {R}}\\ f_i(x)&=\sum _{j=1}^{3}(x_j - a_{j}^{i})^{2} , \end{aligned} \end{aligned}$$
(20)

where \(a^1=(1,0,0)^T\), \(a^2=(0,1,0)^T\) and \(a^3=(0,0,1)^T\). Figure 1a shows the value of \(\eta \) (in objective space for a better visualization) and Fig. 1b a possible division of the image space into the three regions. For the region division we have used \(a=0.1\) and \(b=0.5\).
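
A hedged sketch of how \(\eta \) and the division (19) can be evaluated numerically is given below; the helper names eta and region are ours, the quadratic subproblem (13) is solved here with SciPy's SLSQP (any QP solver works equally well), and the default thresholds are those of the example above.

```python
import numpy as np
from scipy.optimize import minimize

def eta(gradients):
    """eta(x) from Eqs. (12)-(13): squared norm of the minimum-norm convex
    combination of the normalized objective gradients (rows of `gradients`)."""
    G = gradients / np.linalg.norm(gradients, axis=1, keepdims=True)
    k = G.shape[0]
    res = minimize(lambda a: np.linalg.norm(a @ G) ** 2, np.full(k, 1.0 / k),
                   method="SLSQP", bounds=[(0.0, 1.0)] * k,
                   constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
    return res.fun

def region(gradients, a=0.1, b=0.5):
    """Region division (19): 'I' (far away), 'II' (in between), 'III' (near)."""
    e = eta(gradients)
    return "I" if e >= b else ("II" if e > a else "III")

# MOP (20): f_i(x) = ||x - a^i||^2, hence nabla f_i(x) = 2 (x - a^i)
A = np.eye(3)                     # rows a^1, a^2, a^3
x = np.array([5.0, 5.0, 5.0])     # far from the Pareto set
print(region(2.0 * (x - A)))      # 'I', since eta(x) is close to 1
```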

Fig. 1 Values of \(\eta \) and possible region division for MOP (20). a Values of \(\eta \). b Region division

Remark 1

For the important special case \(k=2\) (i.e., for bi-objective problems) one can simplify the region division by using alternatively the angle between the two gradients:

$$\begin{aligned} \cos \alpha (x) =\frac{\langle \nabla f_1(x), \nabla f_2(x)\rangle }{||\nabla f_1(x)||_2 ||\nabla f_2(x)||_2}\; \in \; [-1,1]. \end{aligned}$$
(21)

Using (21) one can proceed analogously to (19) and set

$$\begin{aligned} \begin{aligned} x\in I&\Leftrightarrow \; \cos \alpha (x)\ge \tilde{b}\\ x\in II&\Leftrightarrow \; \cos \alpha (x)\in (\tilde{a},\tilde{b})\\ x\in III&\Leftrightarrow \; \cos \alpha (x)\le \tilde{a} \end{aligned} \end{aligned}$$
(22)

for \(\tilde{a},\tilde{b}\in (-1,1)\) with \(\tilde{a}<\tilde{b}\). In the previous study (Hernández et al. 2013) the values \(\tilde{a}=-0.8\) and \(\tilde{b}=0.8\) were proposed.
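
For completeness, a short sketch of the bi-objective test (22); the function name is ours, the default thresholds are the ones from Hernández et al. (2013).

```python
import numpy as np

def region_biobjective(g1, g2, a_t=-0.8, b_t=0.8):
    """Region test for k = 2 via Eqs. (21)-(22) using the gradient angle."""
    c = g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2))
    return "I" if c >= b_t else ("II" if c > a_t else "III")

print(region_biobjective(np.array([1.0, 0.2]), np.array([0.9, 0.3])))  # 'I'
```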

3.3 Performing the local search in each region

In the following we will propose a search strategy for each of the three regions. For this, we will first consider the extreme case that the given archive/population consists of only one element, and will later on consider the general case. The reason for this is that the local search for a point x within a general archive can be reduced to the case of one-element archives.

3.3.1 One element archives

Assume we are given an archive \(A=\{x\}\), i.e., a point \(x\in Q\) which is assigned for local search and a reference point \(R=(r_1,\ldots ,r_k)^T\) for the hypervolume indicator.

Local search in Region I As the above discussion shows, the explicit steering of the search is a delicate problem. Thus, it seems wise to utilize established multi-objective descent methods in order to approach the Pareto set/front as fast as possible. Such methods can be found, e.g., in Fliege et al. (2009), Fliege and Fux (2000), and Schäffler et al. (2002). A descent direction \(\nu \) at a point x satisfies \(\langle \nabla f_i(x),\nu \rangle < 0\), \(i=1,\ldots ,k\).

For the treatment of bi-objective problems we have used the descent direction proposed in Lara et al. (2010)

$$\begin{aligned} \nu =-\frac{1}{2}\left( \frac{\nabla f_1(x)}{||\nabla f_1(x)||_2}+\frac{\nabla f_2(x)}{||\nabla f_2(x)||_2}\right) \end{aligned}$$
(23)

as it does not require the solution of an additional optimization problem; it is coupled with an Armijo-like step size control as used in Lara et al. (2010). Note, however, that the descent direction in (23) is restricted to bi-objective problems and has no analogue for MOPs with more than two objectives. For the treatment of such problems we have used the descent method proposed in Schäffler et al. (2002), which requires solving an additional optimization problem to compute the search direction. More precisely, we choose the vector \(-q(x)\) which is defined in the following.

Theorem 1

(Schäffler et al. 2002) Let problem (1) be given and the map \(q:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) be defined by

$$\begin{aligned} q(x) = \sum _{i=1}^k\hat{\alpha }_i\nabla f_i(x), \end{aligned}$$
(24)

where \(\hat{\alpha }\) is a solution of

$$\begin{aligned} \min _{\alpha \in {\mathbb {R}}^k} \left\{ \left\| \sum _{i=1}^k\alpha _i\nabla f_i(x)\right\| _2^2; \alpha _i\ge 0,i=1,\ldots ,k, \sum _{i=1}^k\alpha _i = 1\right\} . \end{aligned}$$
(25)

Then the following statements hold.

  1. (a)

    Either \(q(x)=0\) or \(-q(x)\) is a descent direction.

  2. (b)

    For each \(\hat{x}\in {\mathbb {R}}^n\), there exists a neighborhood \(N(\hat{x})\) and a constant \(L_{\hat{x}}\in {\mathbb {R}}_0^+\) such that

    $$\begin{aligned} \Vert q(x)-q(y)\Vert _2 \le L_{\hat{x}} \Vert x-y\Vert _2,\quad \forall \; x,y\in N(\hat{x}). \end{aligned}$$
    (26)

Note that if \(q(x)=0\), then x is a KKT point. Thus, the computation of \(-q(x)\) includes a test for first-order optimality.
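
The map q leads to the same type of quadratic subproblem as (13), only with unnormalized gradients. The following sketch (function name and the tolerance 1e-9 are our choices) returns the descent direction \(-q(x)\), or None at a KKT point:

```python
import numpy as np
from scipy.optimize import minimize

def schaeffler_direction(gradients, tol=1e-9):
    """-q(x) from Eqs. (24)-(25): negative of the minimum-norm convex
    combination of the (unnormalized) gradients, rows of `gradients`.
    Returns None if q(x) = 0, i.e., x satisfies the KKT conditions."""
    k = gradients.shape[0]
    res = minimize(lambda a: np.linalg.norm(a @ gradients) ** 2,
                   np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints={"type": "eq", "fun": lambda a: a.sum() - 1.0})
    q = res.x @ gradients
    return None if np.linalg.norm(q) < tol else -q
```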

Local search in Region II If x is in Region II, the task is to find a descent direction \(d_{II}<_p 0\) (in objective space) such that a movement in that direction maximizes the hypervolume. Since the new iterate \(x_{new}\) is computed via DS, we can (in an idealized setting) write its image \(y_{new}= F(x_{new})\) as

$$\begin{aligned} y_{new} = F(x)+t d_{II}, \end{aligned}$$
(27)

where \(t\in {\mathbb {R}}\) is a given (fixed) step size and \(d_{II}\) has to be chosen such that it solves the k-dimensional problem

$$\begin{aligned} \begin{aligned} \max _{d\in {\mathbb {R}}^{k}}&\; vol(d) = \prod _{i=1}^k(r_i-f_i(x)-t d_i)\\ \text{ s.t. }&\; ||d||_{2}^{2}=1, \end{aligned} \end{aligned}$$
(28)

which is related to the maximization of the hypervolume (compare to Fig. 2). If one replaces the 2-norm by the infinity norm in the constraint of problem (28) (which drops the assumption that the movement is done with an equal step in objective space) a straightforward computation shows that

$$\begin{aligned} d_{II,\infty } = F(x)-R \end{aligned}$$
(29)

solves the problem (where we assume that \(F(x)<_p R\)). For our implementation we have used the direction in Eq. (29) since it is easier to calculate and since we observed no difference in the performance of the algorithm compared to using the solution of problem (28). Note, however, that the solution of (28) may come with additional CPU time but not with additional function evaluations.
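
A hedged sketch of one Region II step under these choices is given below; F and J are user-supplied callables, and the fixed step size t is purely illustrative (the actual implementation controls the step via backtracking).

```python
import numpy as np

def region2_step(x, F, J, R, t=0.1):
    """One idealized Region II move: steer via DS along d = F(x) - R (Eq. (29))."""
    d = F(x) - R                        # descent direction, assuming F(x) <_p R
    nu = np.linalg.pinv(J(x)) @ d       # Directed Search direction, Eq. (6)
    return x + t * nu / np.linalg.norm(nu)
```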

Fig. 2 Local search in Region II

Local search in Region III If x is nearly optimal, a search in a descent direction will not lead to significant improvements any more. Instead, it seems wise to continue the search along the Pareto set. Here, we suggest to perform a search via DS along the linearized Pareto front (see Fig. 3). The particular steps of the search are described in the following for KKT points; the steps for nearly optimal solutions are the same.

Fig. 3 Movement along the linearized Pareto front in order to improve the hypervolume in Region III

(1) Compute the KKT weight \(\alpha \) associated to x. This can be done by solving (25).

(2) It is known that \(\alpha \) is orthogonal to the linearized Pareto front at F(x) (Hillermeier 2001). Thus, an orthonormal basis of the tangent space can be obtained as follows: compute a QU-factorization of \(\alpha \),

$$\begin{aligned} \alpha = QU = (q_1,q_2,\ldots ,q_k)U. \end{aligned}$$
(30)

Since \(Q\in {\mathbb {R}}^{k\times k}\) is orthogonal, the vectors \(q_2,\ldots ,q_k\) span the desired tangent space.

(3) Now we can maximize the hypervolume in objective space similar to (28). The differences here are that (i) we restrict the search to directions \(d\in span(q_2,\ldots ,q_k)\) and (ii) the optimization will not only yield the optimal direction but also the optimal step size (i.e., \(y_{new}\), whose corresponding decision vector \(x_{new}\) can be retrieved via DS). The (\(k-1\))-dimensional unconstrained optimization problem is given by

$$\begin{aligned} \max _{\lambda \in {\mathbb {R}}^{k-1}} \; vol(\lambda ) = \prod _{i=1}^k (r_i-f_i(x)-(Q_2\lambda )_i), \end{aligned}$$
(31)

where

$$\begin{aligned} Q_2=(q_2,\ldots ,q_k)\in {\mathbb {R}}^{k\times (k-1)}. \end{aligned}$$
(32)

The search direction is thus given by

$$\begin{aligned} d_{III} = Q_2 \lambda ^*, \end{aligned}$$
(33)

where \(\lambda ^*\) is a solution of (31).

(4) Using DS we compute \(x_{new}\) such that \(y_{new} \approx F(x) + Q_2\lambda ^*\). If the hypervolume value is not increased by \(x_{new}\), we perform the Armijo-like backtracking approach as in Lara et al. (2010) using the hypervolume value as objective function.
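
Steps (1)–(3) can be sketched as follows (function name ours): the QR (QU) factorization of \(\alpha \) yields the tangent basis, and a derivative-free local maximization of (31) started at \(\lambda =0\) yields \(\lambda ^*\), which is all the method needs; note that for \(k=2\) the closed form (35) below can be used instead.

```python
import numpy as np
from scipy.optimize import minimize

def region3_direction(alpha, Fx, R):
    """d_III = Q_2 lambda* (Eq. (33)) for a KKT weight alpha at x.

    The QR factorization of alpha (Eq. (30)) gives the orthonormal tangent
    basis q_2, ..., q_k; lambda* locally maximizes vol in Eq. (31).
    Fx = F(x), R = reference point, both (k,) arrays.
    """
    Q, _ = np.linalg.qr(alpha.reshape(-1, 1), mode="complete")
    Q2 = Q[:, 1:]                                   # tangent basis, (k, k-1)
    neg_vol = lambda lam: -np.prod(R - Fx - Q2 @ lam)
    res = minimize(neg_vol, np.zeros(len(alpha) - 1), method="Nelder-Mead")
    return Q2 @ res.x
```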

For the special case \(k=2\) the solution of (31) can be expressed analytically if all the elements of the KKT weight are positive. Let \(\alpha = (\alpha _1,\alpha _2)^T\) be the KKT weight associated to x, then the tangent vector of the linearized Pareto front at F(x) is e.g. given by \(d=(-\alpha _2,\alpha _1)^T\) and problem (31) can be stated as

$$\begin{aligned} \max _{\lambda \in {\mathbb {R}}} vol_2(\lambda ) = (r_1 - f_1(x)-\lambda d_1)\times (r_2 - f_2(x)-\lambda d_2) . \end{aligned}$$
(34)

Proposition 2

Let \(k=2\) and \(\alpha >_p 0\), then the global maximizer of Problem (34) is given by

$$\begin{aligned} \lambda ^{*}=\frac{d_1r_2+d_2r_1-d_1f_2(x)-d_2f_1(x)}{2d_1d_2}. \end{aligned}$$
(35)

Proof

If \(\alpha >_p 0\), then it follows that also \(d_1, d_2 \ne 0\). The first derivative of \(vol_2\) is given by

$$\begin{aligned} vol_2'(\lambda ) = 2\lambda d_1d_2+d_2f_1(x)-r_1d_2+d_1f_2(x)-d_1r_2. \end{aligned}$$
(36)

Setting this to zero leads to

$$\begin{aligned} \lambda ^{*}=\frac{d_1r_2+d_2r_1-d_1f_2(x)-d_2f_1(x)}{2d_1d_2}. \end{aligned}$$
(37)

Further, the second derivative at \(\lambda ^{*}\) is given by

$$\begin{aligned} vol_2''(\lambda ^{*})=2d_1d_2<0. \end{aligned}$$
(38)

The negativity holds since \(\alpha >_{p}0\) and by construction of d, and the claim follows. \(\square \)
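
A quick numerical sanity check of (35) on hypothetical values (all numbers below are illustrative, not from the experiments):

```python
import numpy as np

Fx, R, alpha = np.array([1.0, 2.0]), np.array([4.0, 5.0]), np.array([0.3, 0.7])
d = np.array([-alpha[1], alpha[0]])          # tangent d = (-alpha_2, alpha_1)^T
lam_star = (d[0]*R[1] + d[1]*R[0] - d[0]*Fx[1] - d[1]*Fx[0]) / (2*d[0]*d[1])
vol2 = lambda lam: (R[0] - Fx[0] - lam*d[0]) * (R[1] - Fx[1] - lam*d[1])
grid = np.linspace(lam_star - 5.0, lam_star + 5.0, 10001)
assert abs(grid[np.argmax(vol2(grid))] - lam_star) < 1e-2  # (35) is the maximizer
print(lam_star, vol2(lam_star))
```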

Remark 2

We stress that, due to the linearization of the Pareto front at F(x), it can happen that the new iterate \(x_{new}\) is located in Region II, leading to a kind of oscillating behavior in the sequence of iterates. We have, however, not observed any instabilities in our computations. Further, if x is already near to the optimal solution, such oscillations typically do not occur.

Algorithm 1 shows the pseudo code of HVDS as a standalone algorithm for one-element archives, which puts together the above discussion. Figure 4 shows some exemplary iterations of HVDS in all three regions of MOP (20).

[Algorithm 1: pseudo code of the standalone HVDS for one-element archives]
Fig. 4 Exemplary iterations of the HVDS in each region of MOP (20). a Region I. b Region II. c Region III

3.3.2 General archives

In the following we describe the adaptation of HVDS to archives of arbitrary size.

Local search in Region I If a point \(x\in A\) chosen for local search is, according to the region division, ’far away’ from the Pareto front, an improvement toward the solution set is desired regardless of the location of the other elements of A. Hence, we recommend proceeding as in the one-element archive case, i.e., choosing the descent direction from Eq. (23) in the case of \(k=2\) or proceeding as in Schäffler et al. (2002) for \(k>2\).

Fig. 5 Local search in Region II for multiple archive entries

Local search in Region II If a point \(x\in A\) is located in Region II, it is necessary to consider its neighboring solutions in A, since the chosen movement in objective space should ideally increase the contribution of the newly found solution \(x_{new}\) computed from x without decreasing the contribution of the other elements in A. To perform this movement we have to identify the neighboring entries of x in A in order to set a new reference point. For \(k=2\) the adaptation is straightforward since, for non-extreme points, the neighborhood of a point x is given by exactly two points (where the archive entries are sorted according to one objective). For the extreme points we used the updated reference point given by the SMS-EMOA. Figure 5 shows, for the bi-objective case, the setting of the reference point \(R_{F(x)}\) for a non-extreme point.

For \(k>2\), the problem becomes more complicated since a neighborhood is no longer well defined (for \(k=2\) it suffices to sort the points with respect to one objective to determine the neighbors, but for \(k>2\) this is not the case). To overcome this problem we propose to choose, among all non-dominated points in the population, those points that minimize the difference to F(x) in each objective. Having these points, we are in the position to construct a new reference point \(R_F\) and use Eq. (29) for obtaining the improvement direction. Algorithm 2 shows the pseudo code to obtain the reference point; a possible realization is sketched below.
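
Since Algorithm 2 is given only in pseudo code, the following is one possible reading of it (the function name and the fallback to the global R are our assumptions): coordinate-wise, the i-th entry of the local reference point is the smallest objective value among the other archive images that is still worse than \(f_i(x)\); for \(k=2\) and non-extreme points this reproduces the neighbor-based reference point \(R_{F(x)}\) of Fig. 5.

```python
import numpy as np

def local_reference_point(Fx, F_others, R_global):
    """Hedged sketch of Algorithm 2: build R_F(x) from neighboring images.

    Fx : (k,) image of x, F_others : (m, k) images of the remaining
    non-dominated archive members, R_global : (k,) global reference point.
    """
    R_loc = np.asarray(R_global, dtype=float).copy()
    for i in range(len(Fx)):
        worse = F_others[F_others[:, i] > Fx[i], i]   # values behind f_i(x)
        if worse.size > 0:
            R_loc[i] = min(worse.min(), R_global[i])  # closest such value
    return R_loc
```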

[Algorithm 2: pseudo code of the reference point construction]
[Algorithm 3: pseudo code of SMS-EMOA-HVDS]

Local search in Region III For Region III we again use the adapted reference point as stated for Region II. To be more precise, we propose to use the reference point \(R_{F(x_i)}\) for intermediate points (i.e., \(i\in \{2,\ldots ,l-1\}\)) and the original point R for the extreme archive entries. With the generated reference point we then follow the same process as for the one-element archive.

4 Integrating HVDS into SMS-EMOA

Here we make a first attempt to integrate HVDS into an EMOA. As base algorithm we have chosen SMS-EMOA, since it is a state-of-the-art evolutionary algorithm that aims for optimal hypervolume approximations of the Pareto front. HVDS is integrated into SMS-EMOA as an additional generational operator that is applied with a certain probability \(p_{HVDS}\) (for an empirical evaluation of the values of \(p_{HVDS}\), a, and b we refer to the next section). To be more precise, the new hybrid algorithm SMS-EMOA-HVDS starts by initializing a population P of \(\mu \) elements chosen at random from the domain Q. In each generation, exactly one offspring x is produced by the genetic operators (crossover and mutation). After producing this new element, the algorithm selects a value \(p\in [0,1]\) uniformly at random. If p is less than or equal to \(p_{HVDS}\), the local search procedure is applied to x; otherwise the new candidate solution is directly added to P. The probability \(p_{HVDS}\) thus balances the emphasis between global and local search during the whole run of the algorithm. The aim of applying HVDS is to push the entire population toward the solution set during the run. The image of the produced element can be located anywhere in objective space; however, thanks to the region division, we can decide which LS strategy should be chosen in order to improve the hypervolume. Each time the HVDS strategy is called, it executes only one iteration step in our current implementation. Further, we use the dynamic reference point described above to obtain R. Algorithm 3 shows the pseudocode of the resulting algorithm SMS-EMOA-HVDS.
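
A compact sketch of one generation of the hybrid; the four callables are placeholders for the components described above (variation, one HVDS iteration, and SMS-EMOA's hypervolume-based \((\mu +1)\)-selection) and are not part of the original pseudocode.

```python
import random

def sms_emoa_hvds_generation(P, make_offspring, hvds_step, select, p_hvds=0.005):
    """One (mu + 1) generation of SMS-EMOA-HVDS (hedged sketch)."""
    x = make_offspring(P)              # crossover + mutation, one offspring
    if random.random() <= p_hvds:      # apply local search with prob. p_HVDS
        x = hvds_step(x, P)            # one HVDS iteration (region-dependent)
    return select(P + [x])             # drop the worst HV contributor
```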

Figs. 6–19 Boxplots of the obtained HV values on the test problems (a \(\bigstar \) marks statistical significance, see Sect. 5): Fig. 6 DTLZ1 (2D); Fig. 7 DTLZ2 (2D) (\(\bigstar \)); Fig. 8 DTLZ3 (2D); Fig. 9 P1-HVDS (2D); Fig. 10 P3-HVDS (2D); Fig. 11 ZDT1 (2D) (\(\bigstar \)); Fig. 12 ZDT2 (2D) (\(\bigstar \)); Fig. 13 ZDT3 (2D); Fig. 14 ZDT6 (2D) (\(\bigstar \)); Fig. 15 DTLZ1 (3D) (\(\bigstar \)); Fig. 16 DTLZ2 (3D) (\(\bigstar \)); Fig. 17 DTLZ3 (3D) (\(\bigstar \)); Fig. 18 DTLZ7 (3D) (\(\bigstar \)); Fig. 19 Convex3 (3D) (\(\bigstar \))

5 Experiments

Systematic experiments were conducted in order to investigate and validate the behavior of the proposed algorithm SMS-EMOA-HVDS. For this purpose, well-established two-objective (2D) and three-objective (3D) test functions from the ZDT (shifted and thus unconstrained versions, see Lara et al. 2010; Shukla 2007) and DTLZ test suites were chosen, supplemented by specific functions (see Table 1 for a description of the problems). The domains Q for all problems have been chosen such that the Pareto sets have an empty intersection with the boundary of Q. All details regarding the use of SMS-EMOA and its hybrid variant can be found in Table 2.

The algorithm was applied 20 times with population sizes of 100 (2D) and 300 (3D), respectively. The value of the hypervolume indicator (HV) of the final result is used as basis of comparison to respective runs of the original SMS-EMOA. Based on extensive systematic investigations, the settings \(a=0.2, b=0.4\) (2D) resp. \(a=0.1\) (3D) together with \(p_{HVDS}=0.005\) turned out to be most stable over all settings, while in general the algorithm behavior is not overly sensitive to the respective settings of a and b. A dynamic reference point was used for Region II (see Algorithm 2).

Fig. 20 Comparison using boxplots for P3-HVDS (2D) during 3000 function evaluations

Fig. 21 Comparison using boxplots for DTLZ2 (3D) during 3000 function evaluations

Table 1 Test function problems
Table 2 Configuration values for SMS-EMOA-HVDS and SMS-EMOA, where DI means “distribution index” and p is the probability to apply the operator
Table 3 HV results of SMS-EMOA with and without HVDS as operator (using the same number of function evaluations)

Table 3 provides detailed algorithm settings along with the experimental results and thereby allows for the assessment of the effect of the HVDS operator.

For all test functions, the superior algorithm regarding the average HV is highlighted in italics, showing that the sophisticated local search strategy indeed has a positive effect on algorithm performance. The corresponding ordering regarding median performance only changes slightly. Moreover, on all considered problems the percentage of the covered optimal hypervolume is very high, keeping in mind that the number of function evaluations has not been specifically tuned to gain optimal hypervolume. The optimal HV values for all problems have been computed by numerically solving the following \((n\cdot \mu )\)-dimensional scalar optimization problem

$$\begin{aligned} \max _{x^{(1)},\ldots ,x^{(\mu )}\in {\mathbb {R}}^{n}} H\left( F(x^{(1)}),\ldots ,F(x^{(\mu )}); R\right) , \end{aligned}$$
(39)

where \(\mu \) denotes the population size.

Robustness and statistical significance of the results can be assessed by observing boxplots of the obtained HV in the final generation over all runs in Figs. 6–19. The respective boxes range from the lower to the upper quartile, while the median is highlighted by a horizontal line inside the box. Points are plotted individually as outliers if they lie more than 1.5 times the interquartile range above or below the box boundaries. The whiskers terminate at the most extreme points that are not outliers. Results with (LS) and without local search (WLS) are compared, while the horizontal line visualizes the median HV of the original SMS-EMOA runs.

Figures marked with (\(\bigstar \)) reflect a statistically significant result based on the Wilcoxon rank test using a significance level \(\alpha =0.05\). It becomes obvious that in 3D the SMS-EMOA supplemented with the HVDS strategy outperforms the original version for almost all test functions. Exceptions are DTLZ3, for which, however, much more stable results were generated, and the disconnected problem DTLZ7. In 2D, the results are comparable for all test functions, and superiority of the local search variant is given for DTLZ3, ZDT1, ZDT2 and ZDT6. However, due to the sophisticated strategy in Region III, which searches along the Pareto front, the algorithm in general faces special challenges for disconnected fronts such as ZDT3 and DTLZ7.

Moreover, observing the algorithms’ behavior along the whole run, SMS-EMOA-HVDS often shows faster convergence in earlier phases of the optimization; see the exemplary Figs. 20 and 21.

6 Conclusions and future work

In this paper, we have proposed a new hybrid evolutionary algorithm for the effective hypervolume approximation of the Pareto fronts of differentiable multi-objective optimization problems. The basis of the local search mechanism is a novel division of the decision space into three regions, as we have argued that in each region a different local search strategy seems most promising. For the local search we have adapted the Directed Search method, which is able to steer the search into any given direction in objective space and is thus well suited for the problem at hand, as the hypervolume is defined in objective space. We have presented the resulting iterative method, HVDS, as a standalone algorithm as well as a local searcher within a hybrid variant of SMS-EMOA. Experimental results showed that especially in 3D the integration of HVDS into SMS-EMOA has a substantial positive effect on algorithm performance, while in 2D the results are at least comparable to the original SMS-EMOA but often superior as well. With respect to the considered test functions, which cover a wide range of problem characteristics, robust parameter settings for SMS-EMOA-HVDS could be provided. Therefore, we see promising perspectives for even higher dimensions, and further research will focus on generalizing the introduced strategy in this direction.

Further paths of future research include the adaptation of the method to constrained problems, which needs separate considerations. Furthermore, it is desirable to obtain a gradient-free realization of HVDS, including the region division, in order to increase the applicability of the method to a broader class of problems. Finally, it would be interesting to investigate if and to which extent the chosen approach can be applied to other performance indicators.