
1 Introduction

Hyper-Heuristics are high-level methods for constructing or selecting heuristics. Whereas heuristics search the solution space of a problem for optimal solutions, a Hyper-Heuristic is designed to find an optimal heuristic for a specific problem [9]. In this paper, we use Hyper-Heuristics as selectors of crossover operators in Multi-Objective Evolutionary Algorithms (MOEAs). Different problem properties call for different MOEAs [19]. To generalize MOEAs further, we design Hyper-Heuristics and add them to MOEAs to make them reliable on a wider variety of problems.

A well-known example of a problem that could benefit from a Hyper-Heuristic is job scheduling [19]. Although it can be defined as one problem class because the solution spaces are alike, the properties and complexity of its instances differ and usually depend on the decision maker's preferences. They can even be time-dependent, so the problem properties change during the optimization process. In such cases, it is nearly impossible to know in advance which Meta-Heuristic would work best, or even well enough. A Hyper-Heuristic can improve the situation by selecting the algorithm during the optimization process [10].

Hyper-Heuristics have been applied to MOEAs in the past. In [24], Venske et al. propose a Hyper-Heuristic selection of crossover operators in MOEA/D for many-objective problems and show that the proposed approach can improve the robustness and quality of the algorithm. Similar to Pan et al. [15] or Ono et al. [14], they use a small pool of two to three crossover operators, chosen so that there is at least one good option for a specific problem class. Further Hyper-Heuristic approaches are proposed by Pang et al. [16], who use offline learning to tune MOEA/D for specific problem properties, and by Hong et al. [12], who use a Hyper-Heuristic to generate polynomial mutation operators via training. An offline learner always needs training data sets that are at least similar to the actual problem. Both works showed that otherwise the risk rises that the algorithm is outperformed by MOEAs that do not need training. Therefore, using trained offline learners again requires more knowledge about the problem.

In this paper, we use an online learning Hyper-Heuristic to enhance the usability of the algorithm on unknown problems. We mainly focus on the selection mechanism and aim to expand the selection pool with more operators to cover several problem classes. We propose different variations of a learning mechanism applied to NSGA-II [5] to investigate their influence. We perform experiments on 20 test problems and compare our approach with NSGA-II using solely Simulated Binary Crossover (SBX) [6] or Uniform Crossover (UX) [21]. The results indicate that our approach outperforms the basic NSGA-II on most problems, although their properties are diverse.

This paper is organized as follows: Sect. 2 describes the general structure of our Hyper-Heuristics. Sect. 3 proposes our Hyper-Heuristics with the two basic selection mechanisms. We analyse these algorithms in Sect. 4 and conclude that modifications would be beneficial. Those modifications are presented in Sect. 5 and analysed in Sect. 6. We summarize our results and answer the question of generalizing MOEAs with Hyper-Heuristics in Sect. 7.

2 Background

Hyper-Heuristics are generally defined as algorithms that optimize the selection of a heuristic or that construct a new heuristic to solve an optimization problem. They are therefore part of a two-level framework: the high level is the Hyper-Heuristic exploring the heuristic space H, and the low level is a heuristic exploring the solution space S. The objective of a Hyper-Heuristic is to search H for the optimal heuristic configuration \(h^*\), which generates the optimal solution \(s^*\) [17]. Considering that the solutions of an optimization problem are evaluated by a function f, there has to be a mapping M with \(M: f(s) \rightarrow F(h)\), where F is the objective function of the Hyper-Heuristic. This leads to the formal definition:

$$\begin{aligned} F(h^* \mid h^* \rightarrow s^*,\, h^* \in H) \leftarrow f(s^*), \quad f(s^*) = \min \{f(s) \mid s \in S\} \end{aligned}$$
(1)

This general definition is applicable to multiple classes of Hyper-Heuristics. Burke et al. [2] classify Hyper-Heuristics by dividing the feedback processing into online, offline and no learning, and the heuristic search space into selection and generation. In this paper, we use exclusively online learning selection Hyper-Heuristics, applied to a MOEA to select the crossover operator. This use case is further investigated by Drake et al. [9], who mention that in MOEAs, Hyper-Heuristics can select crossover operators so that they produce offspring optimized for the current problem. While this “nature of how heuristics are grouped, chosen and applied” [9] differs between Hyper-Heuristics, the basic structure remains the same. Figure 1 illustrates the main idea by applying this structure to a basic evolutionary algorithm (EA).

For each generation, the Hyper-Heuristic selects one or more crossover operators to produce the next generation. Thus, the heuristic space H is a set of crossover operators, the selection pool. The evaluation f of the solutions is done by the EA. To use this information for the learning process, we use a reward function that depends on the evaluation results; this function corresponds to the mapping M. The Hyper-Heuristic stores a cumulative score, which is updated by the reward function in every generation and utilized to select a subset of operators from the selection pool for the current generation. This selection mechanism corresponds to the Hyper-Heuristic's objective function F. Therefore, Hyper-Heuristics are mostly made up of three exchangeable parts: selection pool, reward function and selection mechanism. In this paper, we focus on the selection mechanism and propose four different variants to evaluate its influence.
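The interplay of these three parts can be sketched as a plain generational loop. The following is a minimal illustrative sketch, not our implementation: all names are ours, the environmental selection is a simple truncation instead of NSGA-II, and the reward is a bare survival rate.

```python
import random

def hyper_heuristic_ea(operators, evaluate, init_pop, generations=50):
    # High level: one cumulative score per crossover operator.
    scores = {op.__name__: 1.0 for op in operators}
    population = init_pop()
    for _ in range(generations):
        # Selection mechanism (F): pick an operator proportional to its score.
        op = random.choices(operators,
                            weights=[scores[o.__name__] for o in operators])[0]
        offspring = [op(random.choice(population), random.choice(population))
                     for _ in population]
        # Low level (f): environmental selection by truncation on fitness.
        survivors = sorted(population + offspring, key=evaluate)[:len(population)]
        # Mapping M: reward the operator for its surviving offspring.
        survived = sum(1 for s in survivors if s in offspring)
        scores[op.__name__] += survived / len(offspring)
        population = survivors
    return population, scores
```

A full MOEA replaces the truncation with non-dominated sorting and the single pick with one of the mechanisms described below.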

Fig. 1.

Extended evolutionary algorithm using a Hyper-Heuristic instead of single crossover operation.

3 Selection Mechanisms: Single Selection and Distribution

In this section, we present two of the four developed Hyper-Heuristics. Both use the same reward function and the same selection pool but differ in their selection mechanism: distribution and single selection. First, we introduce the reward function, then we present the crossover operators in the selection pool, and afterwards we describe the selection mechanisms.

For simplification, we use the expression products of an operator for the subset of offspring that were produced by a specific crossover operator. Furthermore, the set \(\mathcal {X}\) describes the current generation and the set \(\mathcal {Y}\) the offspring. \(\mathcal {E}\) is the set of crossover operators, and \(e \in \mathcal {E}\) is a specific operator. The products of e are therefore \(\mathcal {Y}_e\) and become \(\mathcal {X}_e\) after the environmental selection.

3.1 Reward Function

The reward function uses the survival rate of the offspring during the evolutionary cycle t. We consider the ratio between the latest offspring before the environmental selection, \(\mathcal {Y}_{e, t-1}\), and after it, \(\mathcal {X}_{e, t}\), as well as the portion of products per operator in the current generation \(\mathcal {X}_t\). The calculation is given in Eq. 2.

$$\begin{aligned} r_e = \frac{| \mathcal {X}_{e, t} |}{| \mathcal {Y}_{e, t-1} |} + \frac{| \mathcal {X}_{e,t} |}{| \mathcal {X}_t |} \end{aligned}$$
(2)

Therefore, we rely on the survival of the fittest as given by NSGA-II, and no further evaluation is required for the learning process.
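Equation 2 can be computed directly from per-operator counts. The sketch below assumes the counts are tracked in plain dictionaries; the function name and signature are ours.

```python
def reward(offspring_counts, survivor_counts, generation_size):
    """Per-operator reward from Eq. (2):
    r_e = |X_{e,t}| / |Y_{e,t-1}| + |X_{e,t}| / |X_t|."""
    rewards = {}
    for e, produced in offspring_counts.items():
        survived = survivor_counts.get(e, 0)
        survival_rate = survived / produced if produced else 0.0
        share_of_generation = survived / generation_size
        rewards[e] = survival_rate + share_of_generation
    return rewards
```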

3.2 Selection Pool

The second component of the Hyper-Heuristic is the selection pool \(\mathcal {E}\), populated with seven crossover operators. Simulated Binary Crossover (SBX), introduced by Deb and Agrawal in [6], and Uniform Crossover (UX), introduced by Syswerda in [21], are widely used and commonly known. We add Rotation-Based Simulated Binary Crossover (RSBX), a rotationally invariant derivative of SBX by Pan et al. [15]. We adapt two other known evolutionary operators, Differential Evolution [20] and the Covariance Matrix Adaptation Evolution Strategy [11], to implement a Differential Evolution Crossover (DEX) and a Covariance Matrix Adaptation Crossover (CMAX). Additionally, we derive from the Simplex Crossover (SPX), introduced by Tsutsui et al. in [23], a simplified form using linear combinations of three parents, which we call LCX3. Lastly, we modify the Laplace Crossover (LX), presented by Deep et al. in [8], to get a new variation of a distribution-based crossover. With these operators, we cover a variety of self-adaptive behaviour as described by Beyer in [1], which is often provided by distribution-based crossovers (SBX, RSBX, LX, LCX3), and different centric production patterns, parent-centric (SBX, RSBX, LX, UX, DEX) and mean-centric (CMAX, LCX3), as described by Deb et al. in [4].

3.3 Selection Mechanism

The last component is the selection mechanism. We choose two different mechanisms to compare them and examine the impact of this part of the algorithm. Both variants update the score of each operator in a first step. The scoring function described in Algorithm 1 utilizes the reward calculation given in Eq. 2 to measure the quality of the latest products of each crossover operator. Afterwards, the operators are ranked by reward, and the score is updated using a cubic function of the rank. The score is cumulative over the generations. We use a cubic function to ensure that the best performing operators receive a high score and badly performing operators get a decreased score. The score thereby drives the online learning process.

Algorithm 1
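A possible reading of this scoring step is sketched below. The exact cubic of Algorithm 1 is not reproduced here; centring the rank so that bad ranks yield negative cubes, and clamping the score to stay positive, are our assumptions.

```python
def update_scores(scores, rewards):
    # Rank operators by reward, worst first; ties keep dictionary order.
    ranked = sorted(rewards, key=rewards.get)
    n = len(ranked)
    for rank, e in enumerate(ranked):
        centred = rank - (n - 1) / 2          # negative for badly ranked operators
        scores[e] = max(scores[e] + centred ** 3, 1.0)  # cumulative, kept positive
    return scores
```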

After the update of the scores, the actual mechanism starts. We implement both a selection of a single operator per generation and a distribution of the whole generation over all operators. The selection is described in Algorithm 2 and is named HHX-S. It uses the well-known roulette wheel algorithm to select the crossover operator depending on the current scores.

Algorithm 2
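A roulette wheel pick over the cumulative scores, as used by HHX-S in Algorithm 2, can be sketched as follows (an illustrative version; the `rng` parameter is ours):

```python
import random

def roulette_select(scores, rng=random):
    """Return one operator key with probability proportional to its score."""
    total = sum(scores.values())
    pick = rng.uniform(0.0, total)
    acc = 0.0
    for op, score in scores.items():
        acc += score
        if pick <= acc:
            return op
    return op  # numerical safety: fall back to the last operator
```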

The distribution is described in Algorithm 3 and is named HHX-D. In this case, the score is used to calculate the portion of the generation each operator receives. Which individual is paired with which operator is decided randomly. The child generation produced by each operator has the same size as the part of the parent generation it receives.

Algorithm 3
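The proportional split of HHX-D in Algorithm 3 can be sketched as follows; rounding each operator's share and giving the remainder to the last operator is our simplification:

```python
import random

def distribute(population, scores, rng=random):
    """Randomly partition the parents among operators, proportional to score."""
    shuffled = list(population)
    rng.shuffle(shuffled)                      # random individual-to-operator pairing
    total = sum(scores.values())
    ops = list(scores)
    shares, start = {}, 0
    for i, op in enumerate(ops):
        end = len(shuffled) if i == len(ops) - 1 \
            else start + round(len(shuffled) * scores[op] / total)
        shares[op] = shuffled[start:end]
        start = end
    return shares
```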

We compare both variants by implementing them in NSGA-II and solving 20 benchmark problems in the next section.

4 Evaluation and Experiments

In the experiments, we use 20 different benchmark problems. We compare both Hyper-Heuristics with NSGA-II [5] using different single crossover operators. In prior experiments, we found that UX works best on most problems. Therefore, we use it and the classic version with SBX [6] for the analysis. We record the quality in terms of the IGD [3] metric and additionally the selection behaviour of the Hyper-Heuristics to evaluate their learning behaviour.

We selected DTLZ1-7 [7], RM1-4 [18], which are derived from ZDT1, ZDT2, ZDT6 [25] and DTLZ2, and WFG1-9 [13]. With these, we cover multiple variants of non-separability, modality and rotation with two and three objectives. We multiplied the number of decision variables by four to increase the complexity and emphasize the performance differences. We use the PlatEMO framework [22], which contains all the basic and many additional algorithms, benchmark problems and quality metrics.

In the evaluation, we aim to investigate the influence of the selection mechanism. We start with a comparison of the resulting IGD values, examine their development over time on example problems, and then relate this to the selection behaviour of both Hyper-Heuristics. By analysing this behaviour, we can identify the advantages and disadvantages of each selection mechanism.

4.1 IGD Results of NSGA-II Using HHX-D, HHX-S, UX and SBX

In Table 1, our Hyper-Heuristics HHX-S and HHX-D, implemented in NSGA-II, are compared to NSGA-II with SBX and UX. We use the rank sum test and highlight the algorithms that are considered the best performing per benchmark. The cells show the median IGD over 31 runs. Furthermore, the rank sum test is used in a one-to-all comparison, which in our case compares the original NSGA-II to the other algorithms. The results are marked to show whether an algorithm performed significantly better than the original NSGA-II (\(+\)), significantly worse (−) or approximately equal (\(\approx \)).

Table 1. Inverted Generational Distance (IGD) of NSGA-II with HHX-D, HHX-S, UX and SBX as crossover operators on DTLZ, RM and WFG with increased number of dimensions.

The data in Table 1 shows that the original NSGA-II is always dominated by at least one other algorithm. Prior experiments already showed that NSGA-II with UX outperforms most other variations on most problems. A problem feature that is hard for UX to handle is rotation, as in problems RM1-4. As RM3 is also multi-modal, most rotationally invariant crossover operators still have difficulties on this problem, and UX performs similarly well due to its advantages on problems of this kind. Both Hyper-Heuristic variations can compete with UX but often lose in a direct comparison. Nevertheless, they outperform the original NSGA-II on most problems. On rotated problems, they also outperform UX, which hints that they picked rotationally invariant operators in those cases.

Assuming that UX is the best performing operator in the pool, an online learner needs to test all operators to learn this. This extra effort, which we call the Learning Offset, makes it very difficult for our Hyper-Heuristics to outperform the best operator in the pool. However, in a setting where the problem and its features are unknown, the user would not know either which crossover operator works best. Therefore, they would benefit from using a Hyper-Heuristic that performs above average. In Fig. 2, the IGD measurements of different variations of NSGA-II on two problems with different properties are visualized. On RM2, it is clear that UX and SBX cannot achieve good results, whereas the rotationally invariant operators perform similarly well. In this case, both Hyper-Heuristics had many suitable options in the pool, so the learning offset is minimal and both outperform the other variations. On WFG5, most operators had difficulties producing sufficient offspring. In this case, it is not only hard to select a good operator, but also to learn this quickly. The learning offset is therefore much bigger, especially for HHX-S, as its learning is intrinsically slower than that of HHX-D. This is also visible in the graph, as HHX-S cannot achieve a sufficient result quality. HHX-D, on the other hand, outperformed the best operators on this problem.

Fig. 2.

IGD trends of the median runs of NSGA-II with single crossover operators, HHX-S and HHX-D on RM2 on the left and WFG5 on the right.

Fig. 3.

Cumulative number of products of the crossover operators selected by HHX-D on the left and HHX-S on the right on WFG5.

4.2 Selection Behaviour of HHX-D and HHX-S

The question arises how a selection of different operators can outperform the best single operator in the selection pool. To answer this, we evaluate the behaviour of the Hyper-Heuristics. In Fig. 3, this behaviour is visualized by the cumulative number of products of the different crossover operators on WFG5. A difference in the selection is visible from the beginning. Regarding the qualities of the single usages of the crossover operators in Fig. 2, one would assume that both Hyper-Heuristics select primarily LX and later DEX or UX. HHX-D first prioritizes DEX, which changes to LX after about 60 generations; additionally, UX and SBX receive a large portion of the population in each generation. HHX-S, on the other hand, mostly selects CMAX and changes to LX after about 50 generations. From these observations, we assume that there is not one best operator for a problem, but one for the current state of the population, for example depending on whether it is far from or close to the Pareto front. Thus, a combination of different operators can perform better than any single one. This answers the question of how Hyper-Heuristics can outperform the best operator in their selection pool. However, another problem with the current Hyper-Heuristics arises, which we call the Learning Bias. Especially HHX-S can suffer from it when it scores one operator too high too soon, so that a correction or a change in preference becomes too slow. HHX-D can adapt faster to a new situation, but the larger the score differences, the slower the learning process.

5 Advanced Selection Mechanisms: Evolving and Alternating

From the first approach to online learning Hyper-Heuristics, we identified two new obstacles: the Learning Offset and the Learning Bias. We modify our Hyper-Heuristics to mitigate these obstacles and improve the algorithms further. The learning offset is a big problem for HHX-S because of its poor explorative behaviour. Nevertheless, it performs very well on problems where it can decide early on a good operator. Thus, the goal for a new Hyper-Heuristic is an improved exploration phase while keeping the exploitation of HHX-S. The first new Hyper-Heuristic uses an evolving approach and is therefore called HHX-E. It starts with HHX-D to benefit from its good exploration, then evolves to HHX-S for better exploitation, and finally fully exploits the current best crossover operator. Assuming that the maximum number of function evaluations is known due to the limited resources of the user, we can set the intervals in relation to it. Based on preliminary tests, we assign the distribution to the first half of the budget and use only the last tenth for the exploitation of the current best crossover operator.
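The resulting three-stage schedule can be sketched as a simple function of the consumed evaluation budget (the 50% and 90% boundaries follow the split chosen above; the function itself is illustrative):

```python
def hhxe_phase(evaluations, max_evaluations):
    """Return the HHX-E stage for the current point in the evaluation budget."""
    progress = evaluations / max_evaluations
    if progress < 0.5:
        return "distribution"   # HHX-D: explore the operator pool
    if progress < 0.9:
        return "selection"      # HHX-S: exploit the learned scores
    return "exploitation"       # use only the current best operator
```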

While this approach may reduce the Learning Offset, it does not defeat the Learning Bias, given our earlier observation that there is no single best operator for a whole problem but rather a best combination. To fight the bias, we introduce a fourth online learning Hyper-Heuristic, named HHX-A. It uses HHX-D and HHX-S alternately and resets the scores after each iteration. Again, we use the maximum number of function evaluations to set the duration of each iteration. Based on preliminary tests, we use 10 iterations, each with 70% distribution and 30% selection.
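The alternating schedule with score resets can likewise be sketched over the evaluation budget (10 iterations and the 70/30 split follow the setting above; the signature is ours):

```python
def hhxa_schedule(evaluations, max_evaluations, iterations=10, dist_share=0.7):
    """Return (mechanism, reset) for HHX-A at the current budget position."""
    slice_len = max_evaluations / iterations
    position = (evaluations % slice_len) / slice_len
    reset = position == 0                  # scores are reset at each slice start
    mechanism = "distribution" if position < dist_share else "selection"
    return mechanism, reset
```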

6 Comparison of All Presented Algorithms

To examine whether the named obstacles are mitigated, we again use pairwise comparisons of IGD results on the benchmark problems. This time, we compare HHX-A, HHX-E, HHX-D and UX. We exclude HHX-S, as it is mostly outperformed by HHX-D, and we keep the comparison with UX to determine whether the new Hyper-Heuristics can compete with it on its best problems.

Table 2. Inverted Generational Distance (IGD) of NSGA-II with HHX-A, HHX-E, HHX-D and UX as crossover operators on DTLZ, RM and WFG with increased number of dimensions.

The results are shown in Table 2. HHX-A-NSGA-II performed better than UX-NSGA-II on nine problems and worse on nine problems. Compared with HHX-D-NSGA-II, an improvement is noticeable on DTLZ3 and DTLZ4 as well as on RM1 and RM2, where HHX-A-NSGA-II was the best performing of the four algorithms. HHX-E-NSGA-II, on the other hand, differs only a little from HHX-D-NSGA-II, so we can say it performs nearly equally well. In summary, HHX-A-NSGA-II is the best performing of the four Hyper-Heuristic variants. According to the IGD results, HHX-A performs nearly as well as the UX operator, and UX has weaknesses on several problems on which HHX-A finds a better option. To give an outlook on the possibilities of Hyper-Heuristics, we also examine the learning behaviour of all four. In Fig. 4, an exemplary development of the score distributions on WFG5 is visualized.

Fig. 4.

Distribution of the Scores for each Crossover Operator during solving of WFG5 with NSGA-II using Hyper-Heuristics.

According to the graphs in Fig. 2, the use of DEX, LX and UX led to the best results. It is remarkable that HHX-D reaches a fixed distribution within the first third. HHX-S, on the other hand, changes its score distribution dramatically after about 40 generations. At that point, it weighted LCX3 and RSBX higher than the UX operator, although both did not perform well in single usage. The development of the scores of HHX-E is similar to HHX-D: the most considerable changes happen in the first third. After the 50th generation, the mechanism changes to selection, and the scores behave similarly to HHX-S. Since the distribution mechanism weighted the SBX operator highly and the selection mechanism is very sensitive to biases, SBX is still the most rewarded operator in the 90th generation and is solely selected in the final exploitation stage of HHX-E. HHX-A resets the scores every 10th generation, so the different phases of the problem are more visible than in the other graphs. In the first iteration, DEX, LX and SBX achieve the highest rewards. In the second iteration, the focus lies more on LX, UX and CMAX. In the following iterations, a tendency to choose UX, with some spikes on SBX and LCX3, is recognizable. This matches the trends illustrated in Fig. 2.

Considering the IGD measurements, HHX-A, HHX-D and HHX-E all performed similarly well on WFG5, and all of them have a distribution trend that matches the IGD trend in Fig. 2. Considering their functionality, HHX-E embraces the learning bias, which is an advantage on this problem because most operators perform well on it. It also resolves the learning offset, as it finds a good distribution as quickly as HHX-D. HHX-A fulfils the expectation of being unbiased, as it chooses the best option in each phase. Its learning offset is also minimized, as the HHX-D part gathers the necessary information very quickly. Given that there are still problems where HHX-A does not perform as well as expected, the iterations might be too short. On some problems, the distribution phase might need more time to gather information, so a better balance of the number and length of the iterations could further improve the performance. Nevertheless, HHX-A-NSGA-II is a very well performing algorithm that can be used on a variety of problems without knowing their properties.

7 Conclusion

In this paper, we proposed four different Hyper-Heuristics as selectors of crossover operators in NSGA-II. We used two different selection mechanisms: the selection of one operator per generation and the distribution of the generation over all operators. With the first, we built the HHX-S algorithm, and with the latter, the HHX-D algorithm. In a first experimental evaluation, we concluded that the distribution has good explorative behaviour and the selection has good exploitative behaviour. We identified two problems, the Learning Bias and the Learning Offset, which can both be minimized by combining the two algorithms. Therefore, we used an evolving approach with three stages (distribution, selection, exploitation), named HHX-E, and an alternating approach with score resets every 10th generation, named HHX-A. In a second experimental evaluation, we concluded that HHX-A is the most successful of the four presented algorithms. It learns quickly due to its distribution part, yet remains unbiased across phases, so that new obstacles are faced without outdated information.

We consider HHX-A a successful Hyper-Heuristic, as it selects well performing crossover operators in every situation. As the learning offset is not fully resolved, it remains difficult to outperform the best crossover operator in the pool. Nevertheless, this method spares the user the decision about the operator and makes it easier to work with problems without any known properties.

In future work, we will look more into the other parts of the Hyper-Heuristic. The crossover operators in the pool could be further examined to select a better variety with fewer operators: the smaller the number of operators, the faster the Hyper-Heuristic can learn. Thus, this could be another way to minimize the learning offset.