
1 Introduction

Traditionally, single-objective evolutionary optimization researchers focused on the convergence abilities of their algorithms. Since its advent, however, Evolutionary Multiobjective Optimization (EMO) has focused on both convergence and diversity. EMO switched the interest from a single solution to a set of competing solutions representing the trade-off among conflicting objectives, the Pareto front. Most early EMO algorithms relied on the concept of Pareto domination [1] for convergence, a practice that continues to this day. The major difference between these algorithms was in the techniques they used to maintain diversity among solutions, techniques mainly borrowed from the single-objective evolutionary computation literature [2].

SPEA2 [3] and NSGA-II [4] are prime examples of successful algorithms that dominated the field for years. They were, however, unable to maintain diversity in more than two objectives [5]. Although SPEA2 can be considered the first serious attempt at three objectives, its high complexity and inability to scale further put it in the same category as NSGA-II.

Maintaining diversity in three or more objectives remained an obstacle until Zhang and Li proposed MOEA/D in 2007 [6]. Instead of the widely used crowding-distance operator adopted by NSGA-II, MOEA/D used a decomposition-based approach in which a multiobjective optimization problem is divided into a number of single-objective optimization subproblems. Along with a population-based algorithm, MOEA/D was able to solve problems with up to four objectives. In 2014, Deb and Jain proposed NSGA-III [7]. Their algorithm used a predefined, evenly distributed set of reference directions to guide the search procedure. With a carefully designed normalization mechanism, their algorithm was shown to successfully maintain diversity in up to fifteen objectives.
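For readers who want a concrete picture, the evenly distributed reference directions that NSGA-III relies on are conventionally generated by the Das-Dennis simplex-lattice construction. A minimal sketch (the function name and the gap parameter p are illustrative, not from the paper):

```python
from itertools import combinations

def das_dennis(M, p):
    """All compositions of p into M parts, scaled onto the unit simplex.
    Yields C(p + M - 1, M - 1) evenly spread reference directions."""
    dirs = []
    for bars in combinations(range(p + M - 1), M - 1):
        part, prev = [], -1
        for b in bars:                    # stars-and-bars decoding
            part.append(b - prev - 1)
            prev = b
        part.append(p + M - 2 - prev)     # remaining stars after last bar
        dirs.append([v / p for v in part])
    return dirs
```

This is Das and Dennis's original single-layer construction; for many objectives NSGA-III typically uses a two-layer variant, omitted here for brevity.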

All these efforts remained, for the most part, isolated from theoretical/mathematical optimization and even from single-objective evolutionary optimization algorithms. Ishibuchi’s IM-MOGLS [8] was the first study to incorporate mathematical optimization as a local search in the context of EMO. His idea was simply to combine the several objectives using a weighted-sum approach and start a local search from each individual in the combined parent-offspring population. Subsequent researchers followed the same path [9]. Others, among them Bosman [10] and Sindhya et al. [11], explored the achievement scalarization function (ASF) as an alternative means of combining several objectives into one in EMO.

Dutta et al. took another important step in bridging the gap between evolutionary and theoretical/mathematical optimization when they proposed their Karush-Kuhn-Tucker Proximity Measure (KKTPM) for single-objective optimization [12]. They theoretically proved the correctness of their new metric and clearly showed how effective it can be in the context of evolutionary algorithms. The only drawback was that exactly computing KKTPM for a single solution is very expensive, as the process involves solving an entire optimization sub-problem. Recently, their efforts were carried even further by Deb and Abouhawwash [13], who extended the metric to the realm of multiobjective optimization. In a later study they proposed an approximate yet computationally fast method of calculating KKTPM that avoids the internal optimization task [14].

All these advancements motivate us to explore further use of KKTPM in the design of efficient EMO methods. In this study we propose an integrated approach that combines NSGA-III, local search and KKTPM. The local search is performed using two distinct operators: one designed to enhance overall population diversity, while the other puts additional emphasis on convergence. Since we are now able to quickly compute KKTPM to gauge the level of convergence of solutions, the second local search operator is guided by KKTPM throughout the entire optimization process.

A bipartite precursor of this study [15, 16] showed that using KKTPM was very fruitful in terms of speeding up convergence; however, we noticed a visible deterioration in diversity. On the other hand, if we completely abandoned KKTPM we would miss out on a significant convergence speed-up. Hence, our proposed approach is intended to improve both diversity and convergence while keeping a balance between them.

It is worth noting that NSGA-III follows the usual trend of continuously emphasising non-domination throughout the entire optimization process. This continuous emphasis drives the algorithm to seek convergence first and diversity second, in this specific order. DC-NSGA-III frequently breaks this emphasis during optimization. The reason behind this change is that the convergence-first principle NSGA-III follows has some drawbacks, which we summarize in the following points.

  1. Losing diversity in early stages of the optimization process might be incurable in later generations, causing the population either to be trapped in local Pareto fronts or to lose part(s) of the true Pareto front. The latter effect is usually observed in problems having non-continuous or disconnected Pareto fronts.

  2. Most of the association and normalization effort spent by NSGA-III in early generations is not very meaningful, because the reference directions keep changing with even the slightest change in any of the extreme points reached so far.

In order to address all the aforementioned issues, we propose DC-NSGA-III. The next section describes our proposed algorithm in detail. Section 3 discusses the theoretical and practical reasons behind employing each of our local search techniques in its place. Section 4 provides extensive simulation results on a set of unconstrained and constrained, bi-objective and many-objective optimization problems. Finally, we conclude this study in Sect. 5.

2 DC-NSGA-III

In this study we keep the key characteristics of NSGA-III intact. Specifically, our algorithm uses a predefined set of reference directions to maintain diversity. These directions are evenly distributed over the normalized hyperplane that connects the normalized extreme points of the current population. Extreme points are extracted at each iteration from the combined parent-offspring population using an ASF-based comparison along the M available extreme directions (\(M=\,\)number of objectives). Association is performed every generation over the combined population, where each individual (solution/point) gets attached to its closest reference direction (in terms of perpendicular distance). Throughout this study we use the notations s.d and d.surroundings to refer to the direction to which an individual s is attached and to the set of individuals attached to a direction d, respectively.
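The association step just described can be sketched as follows. This is a minimal NumPy illustration of ours (function and variable names are illustrative, not taken from the paper's implementation), assuming the objective vectors are already normalized:

```python
import numpy as np

def associate(F_norm, dirs):
    """Attach every (normalized) objective vector to its closest reference
    direction in terms of perpendicular distance.  Returns s.d for each
    solution s, plus d.surroundings for each direction d."""
    d_unit = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = F_norm @ d_unit.T                          # projection lengths
    # squared perpendicular distance: |f|^2 - (f . d_unit)^2
    perp_sq = np.sum(F_norm**2, axis=1, keepdims=True) - proj**2
    s_d = np.argmin(perp_sq, axis=1)
    surroundings = {d: np.flatnonzero(s_d == d) for d in range(len(dirs))}
    return s_d, surroundings
```

Working with squared perpendicular distances avoids a square root per point-direction pair without changing the argmin.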

That said, our proposed approach breaks the usual NSGA-III pattern of continuously emphasising non-dominance. Every \(\alpha \) generations our approach uses a different niching algorithm, alternating among three different phases. Phase-1 is outlined in Algorithm 1, while phase-2 and phase-3 are both outlined in Algorithm 2.

Fig. 1. Phases 1, 2 and 3. Part (a) shows how the algorithm continues in phase-1 until all extreme points stagnate. Part (b) shows both phase-2, where a local search operation is started from the closest point to an empty reference direction, and phase-3, where the algorithm tries to pull in a poorly converged (large-KKTPM) point.

Fig. 2. Details showing how the phases alternate among each other.

2.1 Phase-1

Figure 1 shows a graphical representation of the three proposed phases. In the first phase, our goal is to find the extreme points of the Pareto-optimal front as accurately as possible. Thus, initially, the algorithm seeks better extreme points using a biased weighted-sum local search starting from the already existing extreme points. We call this step phase-1. Any problem has a number of extreme points equal to the number of objectives M. If a better extreme point than the existing one is found, the old one no longer serves as an extreme point, and the new point is used instead for later normalizations. Since the number of iterations required to find accurate extreme points of the Pareto front is unknown beforehand, phase-1 starts at the beginning and continues for an undefined number of generations; it may even continue until the final generation if the extreme points keep improving.

Stagnation triggers the transition from phase-1 to phase-2. A stagnant extreme point is one that no longer improves under the phase-1 local search. If the algorithm finds itself about to repeat an extreme-point local search from the same starting point, the last local search must have been unsuccessful, and consequently any further local search from the same starting point using the same optimizer would be useless. Once all extreme points reach stagnation, the algorithm shifts its focus towards maintaining better diversity and moves on to phase-2.
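The stagnation bookkeeping can be as simple as remembering the last starting point used for each extreme direction. The following sketch is our own illustrative code (exact-coordinate comparison is a simplifying assumption); it marks a direction stagnant the moment a restart from the same point is attempted:

```python
class ExtremeTracker:
    """Tracks phase-1 stagnation: an extreme direction is flagged stagnant
    when its local search would restart from the very point it last
    started from, i.e. the previous search brought no improvement."""
    def __init__(self, M):
        self.last_start = [None] * M
        self.stagnant = [False] * M

    def register_start(self, i, x):
        x = tuple(x)
        if self.last_start[i] == x:
            self.stagnant[i] = True      # repeated start -> no improvement
        else:
            self.stagnant[i] = False     # a new start: progress was made
        self.last_start[i] = x

    def all_stagnant(self):              # trigger for phase-1 -> phase-2
        return all(self.stagnant)
```

Resetting the flag on a new starting point also covers the alternation rule described later: a better extreme point found by the genetic operators un-stagnates its direction.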

2.2 Phase-2

In phase-2, our approach re-normalizes the current population using the extreme points attained so far. Association is then performed using the new reference directions. As opposed to the traditional front-by-front accommodation scheme adopted in NSGA-II and NSGA-III, the algorithm selects the closest individual to each reference direction. Since each reference direction represents a niche in the objective space, the algorithm thereby tries to fill all the gaps in the current front. Consequently, regardless of an individual’s rank, if it is the only representative of its niche, it will be selected at the expense of some possibly better-ranked individuals that are already outperformed in their own niche (line 4).
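A sketch of this niche-first selection in NumPy (our illustrative code, assuming normalized objectives; it returns both the per-direction winners and the directions left without any representative):

```python
import numpy as np

def phase2_select(F_norm, dirs):
    """For every reference direction keep only its closest individual by
    perpendicular distance, regardless of non-domination rank, and report
    the directions that ended up with no associated individual at all."""
    d_unit = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
    proj = F_norm @ d_unit.T
    perp_sq = np.sum(F_norm**2, axis=1, keepdims=True) - proj**2
    owner = np.argmin(perp_sq, axis=1)       # direction each point joins
    selected, empty = [], []
    for d in range(len(dirs)):
        members = np.flatnonzero(owner == d)
        if members.size == 0:
            empty.append(d)                  # covered later by local search
        else:
            selected.append(int(members[np.argmin(perp_sq[members, d])]))
    return selected, empty
```

The `empty` list feeds the ASF-based covering step described next.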

The procedure explained above does not guarantee generating enough individuals to completely fill the next population, and even if it did, some of these individuals might already be dominated. Hence, the proposed algorithm makes an initial attempt to cover empty reference directions (i.e., those having no associations so far) using an ASF-based local search procedure. Since an empty reference direction has no associated points, the local search is started from the point in the first front closest to the empty direction. Figure 1(b) shows how point A is used to start a local search along direction D.

2.3 Phase-3

After covering all reference directions, the algorithm moves to the final phase. Phase-3 focuses entirely on achieving better convergence for badly converged individuals. However, in order to detect such individuals in a population, a deterministic and accurate scheme of preferability is needed. The usual domination check or a niching-based check cannot provide information about the extent of convergence of a solution in a multi-objective algorithm. In many cases, especially towards the end of optimization, most individuals fall in the same non-dominated front, although some of these non-dominated solutions can be far away from the true Pareto-optimal front, and niching-based metrics can only provide information about their inter-spaced diversity. This is where a metric like KKTPM, with its ability to provide a point-wise convergence property, comes to the rescue. KKTPM can differentiate better- from worse-converged individuals even if they lie on the same front. Such a metric was unavailable before, and in this study we exploit its theoretical underpinning to isolate badly converged non-dominated solutions for special treatment. It is worth noting that for black-box optimization, using numerical gradients might significantly increase the total number of function evaluations; we leave this for a future study to investigate. Phase-3 uses the individuals having the worst KKTPM values in the entire combined feasible population as starting points for the ASF-based local search. Figure 1(b) shows how point B, having the highest KKTPM, is enhanced through an ASF-based local search along the direction to which it is attached.

All function evaluations (FEs) from the three phases contribute to the total FE count of the algorithm. In order to keep this number low, the total number of local search operations performed per cycle in both phases 2 and 3 is bounded by the parameter \(\beta \), which is arbitrarily set to 2 in all our simulations.


In phases 2 and 3, and for the sake of saving even more function evaluations, the maximum number of allowed function evaluations is kept proportional to the KKTPM value of the starting point. Consequently, weak points are given extra emphasis during local search without consuming more function evaluations. Finally, the new population is completed by adding the best-ranked individuals among those overlooked during all the previous steps.
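Putting the last two points together, a sketch of how the phase-3 starting points and their evaluation budgets might be chosen (illustrative code of ours; `base_fe` and the linear scaling rule are our assumptions, not values from the paper):

```python
import numpy as np

def phase3_starts(kktpm, feasible, beta=2, base_fe=50):
    """Pick the beta worst-converged feasible individuals (largest KKTPM)
    as local-search starting points, and give each a function-evaluation
    budget proportional to its KKTPM value."""
    idx = np.flatnonzero(feasible)
    worst = idx[np.argsort(kktpm[idx])[::-1][:beta]]
    top = kktpm[worst].max()
    budgets = {int(i): int(np.ceil(base_fe * kktpm[i] / top)) for i in worst}
    return [int(i) for i in worst], budgets
```

Larger KKTPM means worse convergence, so the worst point receives the full budget and better-converged starters proportionally fewer evaluations.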

2.4 Alternating Phases

It is important to note that although the extreme points may have reached stagnation by the end of phase-1, this does not mean they are the true extremes of the Pareto front. That is why the three phases of DC-NSGA-III are always applied in an alternating manner. During the evolutionary process, while the algorithm is in either phase-2 or phase-3, if a better extreme point is found using the traditional genetic operators, the algorithm immediately switches back to phase-1 and waits for another state of stagnation. Figure 2 graphically explains what triggers the transition from each phase to the others.

The key idea behind this is to always find the extremes as accurately as possible. This makes the normalization of the objectives better and helps match the reference points with population objective vectors as closely as possible, so that a good distribution of points is obtained at the end. Phase-2 proactively finds points for empty reference lines, thereby emphasizing the distribution of points. Phase-3 focuses specifically on the convergence of poorly converged solutions. The reason for the alternating use of these phases is the unreliability of obtaining the true desired Pareto-optimal solution in a single application of the local search. A local search using a classical point-based optimization method can be useful for a problem, but a single application is not always sufficient. It is, after all, a classical single-objective point-by-point procedure, prone to being trapped in a local optimum and unable to solve the multi-objective optimization problem through decomposition, especially in a single run. EMO algorithms, on the other hand, are powerful and reliable, but lack the ability to focus on, or even identify, the points/sections that need more attention. Hence, we propose this approach of seamless information exchange and alternation among the three phases, making use of recent advancements in both algorithmic and theoretical multi-objective optimization.

3 Local Search Formulations

In this study we employ Matlab®’s fmincon() optimization routine. First, we formulate a single-objective optimization problem out of the original multiobjective problem (scalarization); then fmincon() is applied to reach a solution. As mentioned before, our approach includes two types of local search: one for searching extreme points (phase-1) and another for locally searching solutions along a specific reference direction (phases 2 and 3). Although both types of local search use the same optimizer, they differ in the way the problem is formulated.

$$\begin{aligned} \begin{aligned}&\underset{\mathbf {x}}{\text {Minimize}} \quad \mathrm{BWS}_i(\mathbf {x})=\epsilon \tilde{f}_i(\mathbf {x}) + \sum _{j = 1, \ j \ne i}^{M} w_j \tilde{f}_j(\mathbf {x}), \\&\text {subject to}\quad {\epsilon } < \min _{j=1, j\ne i}^{M} w_j \end{aligned} \end{aligned}$$
(1)
$$\begin{aligned} \begin{aligned}&\underset{\mathbf {x}}{\text {Minimize}} \quad \mathrm{ASF}(\mathbf {x},\mathbf {z}^r,\mathbf {w})=\max _{i=1}^{M}\bigg (\frac{\tilde{f}_i(\mathbf {x})-z^r_i}{w_i}\bigg ), \\&\text {subject to} \quad g_j(\mathbf {x}) \le 0, \quad j = 1,2,\ldots ,J. \end{aligned} \end{aligned}$$
(2)

The extreme-point local search is formulated in a Biased Weighted Sum (BWS) form. Equation 1 shows how the minimization problem aimed at finding extreme point i is formulated. Note that \(\tilde{f}_k(\mathbf {x})\) is the normalized value of objective k. For all our bi-objective and many-objective simulations we use \(\epsilon = 0.01\) and \(\epsilon = 0.1\), respectively.

The first term in Eq. 1 is included to avoid weak extreme points. Thanks to this term, weak extreme points attain larger scalarized values than their truly extreme counterparts.
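As an illustration, the BWS sub-problem of Eq. 1 can be handed to a gradient-based solver much like fmincon(). The sketch below is our own, using SciPy's `minimize` as a stand-in, with equal weights \(w_j = 1\) and no normalization, both simplifying assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def bws_extreme(f, i, x0, eps=0.01, bounds=None):
    """Phase-1 style extreme-point search for objective i: minimize
    eps*f_i(x) + sum_{j != i} f_j(x)  (Eq. 1 with all w_j = 1)."""
    def scalar(x):
        F = np.asarray(f(x))
        return eps * F[i] + np.sum(np.delete(F, i))
    return minimize(scalar, x0, bounds=bounds)

# A toy bi-objective problem: f1 = x^2, f2 = (x - 1)^2 on [0, 1].
res = bws_extreme(lambda x: [x[0]**2, (x[0] - 1)**2],
                  i=0, x0=np.array([0.5]), bounds=[(0.0, 1.0)])
# The f1-extreme sits at x = 1/(1 + eps), slightly inside x = 1: the
# eps-term is what nudges the search away from weakly dominated points.
```

With eps = 0.01 the constraint of Eq. 1 (\(\epsilon <\) the smallest weight) is satisfied for the unit weights used here.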


For the directional local search, an Achievement Scalarization Function (ASF) is used, as shown in Eq. 2. Given a specific reference direction, the ASF converts a multi-objective optimization problem into a single-objective optimization problem whose optimal solution is the intersection point between the given direction and the original Pareto front.
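A corresponding sketch of ours for the ASF sub-problem of Eq. 2, again with SciPy's `minimize` standing in for fmincon(); the unconstrained case and the choice of Nelder-Mead (the max makes the ASF non-smooth) are our assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def asf_search(f, z, w, x0):
    """Directional local search: minimize max_i (f_i(x) - z_i) / w_i,
    whose optimum lies where the direction w from z pierces the front."""
    asf = lambda x: np.max((np.asarray(f(x)) - z) / w)
    return minimize(asf, x0, method="Nelder-Mead",
                    options={"xatol": 1e-9, "fatol": 1e-9})

# Toy problem: f1 = x1^2 + x2^2, f2 = (x1 - 1)^2 + x2^2; the Pareto set
# is x2 = 0, x1 in [0, 1].  Searching along the 45-degree direction from
# the ideal point lands in the middle of the front, at (0.5, 0).
res = asf_search(lambda x: [x[0]**2 + x[1]**2, (x[0] - 1)**2 + x[1]**2],
                 z=np.array([0.0, 0.0]), w=np.array([1.0, 1.0]),
                 x0=np.array([0.2, 0.3]))
```

Changing `w` steers the same routine towards any niche of the front, which is exactly how phases 2 and 3 reuse one optimizer for many directions.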

The reason for using two different formulations instead of one is that each has its merits and weaknesses. A weighted-sum approach is straightforward, easy to understand and implement, and – as will be shown later – easy to optimize using fmincon(). But it is theoretically unable to attain any point lying on a non-convex section of the front. Consequently, it cannot be used as a general local search procedure for finding Pareto points.

However, since extreme points by their very nature can never be located on non-convex sections of the front, the weighted sum can theoretically reach any extreme point. Combined with its simplicity and ease of use, this makes BWS the perfect choice for searching extreme points. ASF, on the other hand, can in general reach any Pareto point if the appropriate reference direction is provided, which makes it the perfect choice as a general local search mechanism for a reference-direction-based algorithm like NSGA-III. But when it comes to extreme points, ASF faces some serious problems, which we summarize in the following points:

  1. Unless the starting point is placed in a certain position with respect to the extreme point (inferior to the extreme point in all objectives), an ASF-based local search will never be able to reach it.

  2. We observe that the flatter or steeper the provided direction, the more difficult the resulting problem is for fmincon() to solve.

For all these reasons, we use a BWS-based local search in phase-1 to seek extreme points, and an ASF-based local search in phases 2 and 3 to cover gaps and enhance internal diversity. It is worth noting that, for the reasons discussed above, ASF-based local search fails on even some of the easiest problems (e.g. ZDT2). On the contrary, BWS-based local search can reach the true extreme points of easy problems in a single attempt. For harder problems, however, one local search per extreme point is not sufficient; in these situations, our proposed approach, with its inherent information exchange and alternation of phases, shows its aptitude.
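The claim that a weighted sum cannot reach non-convex sections is easy to verify numerically. The snippet below (our illustration) samples a concave front of the ZDT2 shape, \(f_2 = 1 - f_1^2\), and shows that every choice of weights is minimized at one of the two extremes, never in the interior:

```python
import numpy as np

# Sample the concave front f2 = 1 - f1^2 on a fine grid.
f1 = np.linspace(0.0, 1.0, 1001)
f2 = 1.0 - f1**2

# For any convex combination of weights, w1*f1 + (1 - w1)*f2 is concave
# in f1, so its minimum over the front always sits at an endpoint -- an
# interior Pareto point of a non-convex front can never win.
minimizers = [int(np.argmin(w1 * f1 + (1.0 - w1) * f2))
              for w1 in np.linspace(0.05, 0.95, 19)]
```

Sweeping the weights only toggles which endpoint wins, which is precisely why BWS is reserved for extreme points while ASF handles interior niches.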

4 Results

Our simulations involve a selected set of bi-objective and many-objective problems, with and without constraints, chosen to exhibit a wide range of difficulties. The parameters used in our simulations are summarized in Table 1. Both GD and IGD are reported, to compare convergence and overall performance, respectively. All the results presented in this study are for the median run out of 31 independent runs, each started with a different random initial population.

Table 1. Parameters - From left to right: problem name, population size, total number of solution evaluations per run (SE), local search frequency (\(\alpha \)), maximum local search operations per generation (\(\beta \)), maximum limit of solution evaluations per each local search operation

4.1 Two Objectives

For the bi-objective simulations we use TNK, SRN, OSY, ZDT4 and ZDT6 [17]. An appropriate maximum number of solution evaluations is set for each problem based on its difficulty, as shown in Table 1. Note that both DC-NSGA-III and NSGA-III are allocated the same total number of SEs, as given in the third column of Table 1. These numbers were decided by running DC-NSGA-III for enough generations to reach a reasonable performance (IGD). For the easy problems (TNK and SRN) the difference in performance is minimal (figures removed due to limited space), because these problems are too easy to require any special consideration. We emphasize, however, that even with all the additional local search operations there is no degradation in performance on these easy problems. The vertical lines indicate the number of solution evaluations at which each algorithm reaches stagnation.

Fig. 3. Variation of GD and IGD with solution evaluations for OSY.

Fig. 4. Variation of GD and IGD with solution evaluations for ZDT4.

Fig. 5. Variation of GD and IGD with solution evaluations for ZDT6.

It is clear from the figures that the more difficult the problem, the more evident the merit of our approach. DC-NSGA-III outperforms NSGA-III, using the same number of function evaluations, in both convergence and diversity, as shown in Figs. 3, 4 and 5 for OSY, ZDT4 and ZDT6, respectively. ZDT4 poses a real challenge to the convergence ability of any algorithm due to the presence of multiple local Pareto-optimal fronts. ZDT6 and OSY test the ability of an algorithm to maintain a diverse set of solutions in the face of biased density or unfavorable feasibility in the search space. DC-NSGA-III clearly excels at overcoming these two types of challenges compared to NSGA-III within the limited number of total solution evaluations fixed in this study (mentioned in Table 1). The final front-to-front comparisons for the three problems are shown in Figs. 6 and 7. These SEs are too few for NSGA-III (without any local search or other checks and balances) to bring its population close to the true Pareto-optimal front.

Fig. 6. Comparison of the obtained fronts for ZDT4 and ZDT6 between DC-NSGA-III and NSGA-III for the median of 31 runs (final generation).

Fig. 7. Comparison of the obtained fronts for OSY between DC-NSGA-III and NSGA-III for the median of 31 runs.

Fig. 8. Alternation among the three phases in a random OSY run.

Figure 8 shows a typical alternation sequence recorded over a single optimization run of OSY. Although the algorithm keeps alternating among the three phases, the figure clearly shows that phase-1 is dominant at the beginning of the run and less frequent later, when all extreme points tend to stagnate and higher priority is given to convergence towards the end of the run.

Thus, the results on bi-objective problems reveal that our proposed DC-NSGA-III approach performs on par with a vanilla NSGA-III on simpler problems, but exhibits much better performance on more difficult ones.

Fig. 9. Variation of GD and IGD with solution evaluations for 5-objective DTLZ1.

Fig. 10. Variation of GD and IGD with solution evaluations for 5-objective DTLZ2.

4.2 Many Objectives

On the many-objective side, we consider DTLZ1 and DTLZ2 [18], both with five objectives. The results show a visible improvement on five-objective DTLZ2 achieved by DC-NSGA-III over NSGA-III in terms of both convergence and diversity measures. For five-objective DTLZ1, however, NSGA-III's performance is slightly better, because the BWS-based local search optimizer faces a difficult problem with a large number of weakly dominated false extreme points along each axis. As the number of objectives grows, the effect of the \(\epsilon \)-objective gradually vanishes.

5 Conclusions

In this study we have proposed a novel integrated EMO approach combining NSGA-III with the recently proposed KKTPM measure. The proposed DC-NSGA-III uses two types of local search operators. Through an alternating three-phase approach, the proposed algorithm has been shown to enhance both diversity and convergence on a set of bi-objective and many-objective optimization problems. This confirms the usefulness of integrating optimization algorithms and techniques to form more powerful optimization procedures, a research direction that has started to receive attention among EMO researchers. This work can be further extended through a parametric study of the \(\epsilon \) parameter used in the BWS local search operator. More problems involving more objectives can also be included, and another possible extension is to adapt this algorithmic design to single-objective optimization problems as well. Despite the need for these additional studies, the initial results indicate a definite merit of our proposed hybrid EMO algorithm, which embraces point-based local search methods and theoretical results of optimization in a way that benefits the overall algorithm.