1 Introduction

New problem domains and variants of existing problems have progressively entered the literature, drawing researchers' attention to the development of effective solution strategies. The performance of the developed algorithms tends to vary across problems, and even across instances of a single problem. Algorithm selection strategies have been studied to alleviate this situation; their primary goal is to determine the best algorithm for the target problem instances. While useful, this approach misses the improvement opportunities that arise from varying algorithm performance in the course of the solution process. Selection hyper-heuristics take selection a step further by managing a number of given low-level search strategies during the search, thereby exploiting their strengths and weaknesses (Burke et al. 2003a).

In the literature, the number of studies concerning hyper-heuristics is growing rapidly. Burke et al. (2010a) classified hyper-heuristics based on the type of feedback mechanism provided and the nature of the heuristic search space. Three categories of feedback mechanisms are considered: hyper-heuristics with online learning, with offline learning, and with no learning. In online learning, the learning process occurs during the search. The choice function (Cowling et al. 2001), reinforcement learning (Nareyek 2003; Özcan et al. 2010), and learning automata (Mısır et al. 2009) are examples of online learning. In contrast, offline learning refers to learning before the search starts. In particular, case-based reasoning (Burke et al. 2006b) and learning classifier systems (Marín-Blázquez and Schulenburg 2007; Ross et al. 2002) work in an offline manner. Some hyper-heuristic components without any learning device are also available; an example is the simple random heuristic selection mechanism (Cowling et al. 2001). With respect to the nature of the given heuristics, hyper-heuristics are categorised as selection hyper-heuristics and generation hyper-heuristics. The aforementioned studies are concerned with selection hyper-heuristics operating on constructive or perturbative heuristics. This type of hyper-heuristic also incorporates various meta-heuristic components such as tabu search (Burke et al. 2003b), genetic algorithms (Han and Kendall 2003), simulated annealing (Dowsland et al. 2007; Burke et al. 2012), and ant colony optimisation (Burke et al. 2005). On the other hand, there exist hyper-heuristics that aim to generate the low-level heuristics themselves. Burke et al. (2006a, 2007), Fukunaga (2008), Bader-El-Den et al. (2009), and Burke et al. (2010b) utilised genetic programming to generate low-level heuristics tailored to the problem instances.

According to the initial definition, hyper-heuristics have been designed to raise the level of generality (Burke et al. 2003a). Placing a domain barrier that prevents any problem-dependent data from passing to or from the hyper-heuristic is the foremost rule for reaching generality. This basic principle leads hyper-heuristics to focus on managing low-level search strategies instead of directly solving a problem instance. Therefore, it is required to design a hyper-heuristic that can govern different heuristic sets while profiting maximally from their capabilities. HyFlex, proposed by Ochoa et al. (2012), is a software framework that enables hyper-heuristic developers to test their approaches across a range of problems. In its current version, six problem domains are available: max SAT, bin packing, permutation flowshop scheduling, personnel scheduling, travelling salesman, and vehicle routing. For each problem domain, a number of perturbative heuristics have been implemented from four main heuristic types, namely mutational heuristics, crossover operators, ruin-recreate heuristics, and hill climbers. Detailed descriptions of these problems, the characteristics and origins of the instances, as well as the definitions of the heuristics are given in Curtois et al. (2010), Hyde et al. (2010b, a), and Vázquez-Rodríguez et al. (2010). In addition, HyFlex provides an opportunity to change the effect of the mutational heuristics and hill climbers. Moreover, it is possible to keep some solutions in memory for further use.

In this study, a selection hyper-heuristic is implemented on HyFlex and an experimental performance analysis is conducted on the available problems. The proposed hyper-heuristic has been developed as a general adaptive strategy for the six problem domains and the given heuristic sets. It consists of a dynamic heuristic selection mechanism and a move acceptance strategy that evolve with the changing characteristics of the search environment. Experimental results confirm that the developed hyper-heuristic provides a significant performance improvement compared to the other hyper-heuristics tested on the problem instances. It should be mentioned that the proposed hyper-heuristic is the forerunner of the hyper-heuristic that won the first international cross-domain heuristic search challenge (CHeSC 2011).

In the remainder of this paper, the generality requirements for hyper-heuristics are succinctly discussed. The underlying components of the proposed hyper-heuristic are presented in Sect. 3. Next, the experimental results are discussed in Sect. 4. In the last section, the paper is summarised and concluded with a discussion on future research.

2 Generality requirements

A generic selection hyper-heuristic should be capable of managing a diverse range of heuristic sets utilised for solving distinct problems. Although the capability of solving as many problems as possible is the most frequently mentioned goal, the main concern should be the management of different heuristic sets; this objective implicitly embraces the aim of solving various problems. The characteristics of the available low-level heuristics may require distinct management strategies, because each heuristic has its own advantages and disadvantages. These features should be interpreted based on the dynamic performance of the heuristics and on experimental limits such as the given execution time, and the heuristic set as a whole should be used in synergy. For this purpose, a heuristic selection mechanism should incorporate analysis components that facilitate the adaptivity of the selection process (Table 1).

Table 1 The distribution of heuristic types for the given problem domains (heuristic indexes are shown in parentheses)

2.1 Heuristic set features

An analysis tool or a learning component should be designed based on a set of characteristics that determine the behaviour of the heuristics. The first characteristic is the heuristic set size. A heuristic set with many heuristics has a higher probability of finding satisfactory solutions for a problem instance. Conversely, such a set can be hard to manage due to the large number of options to select from. The influence of the set size critically depends on the quality of the heuristics: if the heuristics have similar performance, the set size makes little difference. Since this is uncommon, effective learning strategies are required. The second feature is the speed of the heuristics. This characteristic may affect the number of decision steps available to the selection process; hence, it should be interpreted in combination with the improvement capabilities of the heuristics. Heuristic specialisation is another factor. A heuristic dedicated to satisfying a constraint or improving an objective falls into this category, as do heuristics generating only improving or equal quality solutions, such as hill climbers, or heuristics with any other strict behaviour. This feature list can obviously be extended. The main purpose should be to use the most relevant and effective features (Alpaydin 2010, ch. 6) in a collaborative way to predict the future performance of the heuristics.

2.2 Parameter and rule settings

Parameter-free strategies are interesting from a generality perspective. Even though such methodologies are called parameter-free, their behaviour depends on some predetermined values or rules. For instance, simple random is a parameter-free heuristic selection mechanism that gives an equal selection chance to each heuristic; it is parameter-free only in the sense that its parameters are fixed from the beginning. The improving or equal move acceptance mechanism is also considered a parameter-free approach.

However, this method is based on a predetermined rule by which only better or equal quality solutions are accepted. In other words, the move acceptance method is parameter-free, yet not rule-free. Such examples show that providing a totally independent mechanism seems impossible. Instead, the proposed algorithms should be able to control their parameters or rules according to the search space and the environmental settings, with the aim of decreasing the user's effect on the algorithm's performance.

3 Hyper-heuristic

A traditional selection hyper-heuristic requires a selection mechanism to determine the best heuristic to apply at each decision step. In addition, it needs a move acceptance strategy to check whether the constructed/visited solution is accepted with regard to its quality and the current state of the search. These consecutive operations are performed until the stopping condition is met. In this study, a new heuristic selection mechanism and a move acceptance strategy including additional components are proposed (Fig. 1). In the following paragraphs, these sub-mechanisms are explained in detail.

Fig. 1 A selection hyper-heuristic framework
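The select-apply-accept loop described above can be sketched as follows. This is a minimal illustration rather than HyFlex code: the names `hyper_heuristic`, `evaluate`, and `accept` are hypothetical, and simple random selection with improving-or-equal acceptance stands in for the components developed later in the paper.

```python
import random
import time

def hyper_heuristic(solution, heuristics, evaluate, accept, time_limit=10.0):
    """Generic selection hyper-heuristic loop: select, apply, accept/reject
    until the stopping condition (here, a time limit) is met."""
    best = current = solution
    deadline = time.time() + time_limit
    while time.time() < deadline:
        h = random.choice(heuristics)       # selection mechanism (simple random here)
        candidate = h(current)              # apply the chosen low-level heuristic
        if accept(evaluate(candidate), evaluate(current)):
            current = candidate             # move acceptance decision
        if evaluate(current) < evaluate(best):
            best = current                  # track the best solution found
    return best
```

Any of the selection mechanisms and acceptance strategies discussed in this paper can be plugged in by replacing the `random.choice` call and the `accept` predicate.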

3.1 Heuristic selection

3.1.1 Adaptive dynamic heuristic set

Mısır et al. (2010) and Bilgin et al. (2012) studied a dynamic heuristic set strategy for the heuristic selection process. The motivation behind this approach is to determine elite heuristic subsets during specific iteration intervals. Similar approaches aiming to eliminate heuristics were studied in Burke et al. (2003b), Chakhlevitch and Cowling (2005), Kendall and Hussin (2005a), and Kendall and Hussin (2005b). The dynamic heuristic set strategy eliminates heuristics that are expected to perform worse than the rest and keeps the best ones with respect to some performance criteria.

$$\begin{aligned} p_i &= w_1\, b \Big[ \big(C_{p,\mathrm{best}}(i)+1\big)^2 \big(t_{\mathrm{remain}}/t_{p,\mathrm{spent}}(i)\big) \Big] + w_2 \big(f_{p,\mathrm{imp}}(i)/t_{p,\mathrm{spent}}(i)\big) \\ &\quad - w_3 \big(f_{p,\mathrm{wrs}}(i)/t_{p,\mathrm{spent}}(i)\big) + w_4 \big(f_{\mathrm{imp}}(i)/t_{\mathrm{spent}}(i)\big) - w_5 \big(f_{\mathrm{wrs}}(i)/t_{\mathrm{spent}}(i)\big) \\ b &= \begin{cases} 1, & \text{if } \sum_{i=0}^{n} C_{p,\mathrm{best}}(i) > 0 \\ 0, & \text{otherwise} \end{cases} \end{aligned}$$
(1)

These criteria reflect the performance changes of the heuristics during the search. Performance changes are determined based on the number of new best solutions found, the total fitness improvement and disimprovement per unit execution time. These elements are used according to their importance for the search process. For instance, finding a new best solution is more crucial than improvement without a new best solution. That is, a heuristic that reaches a new best solution is considered a higher quality heuristic. This information was gathered during a phase composed of a predetermined number of iterations. In the new selection strategy, i.e., adaptive dynamic heuristic set (ADHS), an updated version of this performance metric is proposed.

A weighted sum of different performance elements is used to determine the quality of the different search strategies. Equation (1) presents the details of the performance measurement for each heuristic; the definitions of the variables are given in Table 2. The weights are set such that \(w_1 \gg w_2 \gg w_3 \gg w_4 \gg w_5\) to provide a strict priority between the performance elements. Thus, for instance, if the first performance element of a heuristic has the highest value, the remaining performance elements have no effect on the overall \(p_i\) of the corresponding heuristic.

Table 2 The performance metric variables for the adaptive dynamic heuristic set strategy

In the performance criterion, the first performance element is related to a heuristic's capability of finding a new best solution. In the aforementioned studies, this element was simply a counter of the new best solutions found by a heuristic. However, this may cause difficulties, especially if certain heuristics find new best solutions only after a long time. For that reason, it is preferable to favour a heuristic that may find more new best solutions within the given execution time; otherwise, a relatively slow heuristic could stay in the heuristic set regardless of its speed simply because it finds new best solutions. In addition, it is important to use \(C_{p,\mathrm{best}}(i)+1\) rather than \(C_{p,\mathrm{best}}(i)\) to prevent the same issue with slow heuristics. The second element is used to select improving heuristics and the third element to choose heuristics that deteriorate solutions less. The last two elements have a similar aim, but their values are independent of the phases: they are calculated using the values collected up to that moment.
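The performance measurement of Eq. (1) can be sketched as below. The counter names mirror the variables of Table 2 but are hypothetical identifiers, and the concrete weight values are illustrative, chosen only to realise the strict priority \(w_1 \gg w_2 \gg w_3 \gg w_4 \gg w_5\).

```python
def heuristic_performance(stats, t_remain, b, w=(1e8, 1e6, 1e4, 1e2, 1.0)):
    """Compute the weighted performance p_i of Eq. (1) for one heuristic.

    stats holds the per-heuristic counters of Table 2 (phase and total
    improvement/worsening, time spent, new-best count); b is the global
    flag that is 1 only if some heuristic found a new best this phase."""
    w1, w2, w3, w4, w5 = w
    return (w1 * b * (stats['C_best'] + 1) ** 2 * (t_remain / stats['t_spent_phase'])
            + w2 * stats['f_imp_phase'] / stats['t_spent_phase']
            - w3 * stats['f_wrs_phase'] / stats['t_spent_phase']
            + w4 * stats['f_imp_total'] / stats['t_spent_total']
            - w5 * stats['f_wrs_total'] / stats['t_spent_total'])
```

Because the weights differ by orders of magnitude, a heuristic that found any new best solution dominates one that merely improved solutions, matching the strict priority described in the text.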

The tabu duration denotes the number of phases during which a heuristic stays out of the elite heuristic set. To decrease user dependency, the tabu duration and the number of iterations per phase are calculated based on the number of heuristics available in the elite heuristic subset. The tabu duration is set to \(d = \sqrt{2n}\), where \(n\) denotes the number of heuristics, and the phase length (\(pl\)) is defined as the product of the tabu duration and a constant (\(ph_{\mathrm{factor}}=500\)). These values are recalculated at the end of each phase. Since the calculated \(p_i\) values combine different quantities and are noisy, the performance values are converted into a quality index (\(QI \in [1,n]\)). The heuristic with the lowest \(p_i\) gets 1 and the others get one unit more according to their order. The average (avg) of these \(QI\) values is calculated to determine the heuristics to be excluded. Tabu heuristics are also included in this calculation with \(QI=1\).

$$\mathrm{avg} = \Big\lfloor \Big(\sum_{i=1}^{n} QI_i\Big) \Big/ n \Big\rfloor$$
(2)
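The conversion of the noisy \(p_i\) values into quality-index ranks and the exclusion decision of Eq. (2) might look as follows. This is a sketch: the rule that a heuristic is excluded when its \(QI\) falls below the floored average is one plausible reading, since the text does not state the comparison direction explicitly, and all function names are hypothetical.

```python
def exclude_heuristics(perf, tabu):
    """Rank heuristics by p_i into quality indices and pick exclusions.

    perf: {heuristic: p_i} for non-tabu heuristics; tabu: heuristics
    currently tabu, which take part in the average with QI = 1."""
    n = len(perf) + len(tabu)
    qi = {h: 1 for h in tabu}
    # lowest p_i gets QI = 1; each following heuristic gets one unit more
    for rank, h in enumerate(sorted(perf, key=perf.get), start=1):
        qi[h] = rank
    avg = sum(qi.values()) // n               # floored average of Eq. (2)
    excluded = [h for h in perf if qi[h] < avg]
    return qi, avg, excluded
```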

Tabu duration adaptation The tabu duration of a specific heuristic is increased if it is successively excluded. In other words, the tabu duration is determined per heuristic, using \(d\) as the base value. This is required since the proposed exclusion procedure brings a tabu heuristic back whenever its tabu status expires.

Phase length adaptation The number of iterations per phase is determined by \(pl = d \times ph_{\mathrm{factor}}\). However, this value is updated based on the speed of the non-tabu heuristics for fairness purposes. At the end of each phase, the time spent per move by each non-tabu heuristic is calculated, and the total over all of them (\(t_{\mathrm{subset}}\)) is used to determine the next phase length. The following equations give the simple formula for calculating the new phase length. A predefined constant is assigned to \(ph_{\mathrm{requested}}\), which denotes the number of phases requested during the whole search process; in this study, \(ph_{\mathrm{requested}} = 100\). The duration of a phase (\(ph_{\mathrm{duration}}\)) is calculated by dividing the total execution time by \(ph_{\mathrm{requested}}\). The resulting value is divided by \(t_{\mathrm{subset}}\) to obtain the new phase length \(pl^{\prime}\). If the calculated value is smaller than \(pl\), it is used as the new phase length. This process is repeated at the end of each phase.

$$\begin{aligned} ph_{\mathrm{duration}} &= t_{\mathrm{total}}/ph_{\mathrm{requested}} \\ pl^{\prime} &= ph_{\mathrm{duration}}/t_{\mathrm{subset}} \end{aligned}$$
(3)
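The phase length adaptation of Eq. (3), together with the base values \(d = \sqrt{2n}\) and \(pl = d \times ph_{\mathrm{factor}}\), can be written down directly. The sketch below is illustrative; rounding \(d\) to the nearest integer is an assumption not stated in the text.

```python
import math

def next_phase_length(n, t_total, t_subset, ph_factor=500, ph_requested=100):
    """Return the phase length for the next phase, per Eq. (3).

    n: heuristics in the elite subset; t_total: total execution time;
    t_subset: summed time-per-move of the non-tabu heuristics."""
    d = round(math.sqrt(2 * n))             # base tabu duration (rounded)
    pl = d * ph_factor                      # nominal phase length
    ph_duration = t_total / ph_requested    # wall-time budget per phase
    pl_new = ph_duration / t_subset         # iterations affordable per phase
    return min(pl, int(pl_new))             # shrink only if subset is slow
```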

The constant values as well as the adaptation rules of the ADHS are set based on the idea of giving every heuristic in the set a chance to show its performance.

3.1.2 Learning automata

A learning automaton is a finite-state machine that aims at learning the optimal action out of a set of actions (\(A=\{a_{1},\ldots ,a_{n}\}\)) through interaction with an environment in a trial and error manner (Thathachar and Sastry 2004). During a learning process, the environmental response (\(\beta (t) \in [0,1]\)) to the selected action is used to update a probability vector \(p\) consisting of the selection probabilities of the actions. The update operation is carried out using an update scheme (\(U\)). The update scheme of the learning automata is based on Eqs. (4) and (5). Equation (4) refers to the update operation for the applied action and Eq. (5) is used to update the probabilities of the remaining actions.

$$\begin{aligned} p_i(t+1) &= p_i(t) + \lambda_{1}\,\beta(t)\big(1 - p_i(t)\big) - \lambda_{2}\big(1 - \beta(t)\big)\,p_i(t) \\ &\qquad \text{if } a_i \text{ is the action taken at time step } t \end{aligned}$$
(4)
$$\begin{aligned} p_j(t+1) &= p_j(t) - \lambda_{1}\,\beta(t)\,p_j(t) + \lambda_{2}\big(1 - \beta(t)\big)\big[(r - 1)^{-1} - p_j(t)\big] \\ &\qquad \text{if } a_j \ne a_i \end{aligned}$$
(5)

A learning automaton with a linear update scheme, linear reward-inaction, was employed as a heuristic selection mechanism in Mısır et al. (2009). There, the actions corresponded to the low-level heuristics, and heuristics that found new best solutions were rewarded. The evolving heuristic probabilities were used as in roulette-wheel selection. In the present study, a linear reward-punishment scheme is used to update the heuristic probabilities with respect to four events: finding a new best solution, improving the current solution, worsening the current solution, and finding a solution of equal quality. The corresponding learning rates were set, without extensive tuning, in decreasing order as \(\{\lambda_{1} = 0.1, \lambda_{1} = 0.01, \lambda_{2} = 0.002, \lambda_{2} = 0.001\}\). The first two learning rates reward heuristics for finding new best solutions and improving solutions; the last two punish heuristics for finding worse solutions and solutions of the same quality. Unlike the aforementioned application of the learning automaton, it is used here only to keep track of performance changes during the different phases of the dynamic heuristic set process. At the beginning of each phase, the probabilities are reset. In addition, different learning rates are used for different heuristics. The underlying idea is to project the speed of the heuristics onto the probability updates, because it would be unfair to apply the same reward/punishment to a very slow and a very fast heuristic (Bowling and Veloso 2001). For instance, HyFlex contains local search heuristics that only return improving or equal solutions; these are generally more time consuming than the other heuristic types. The performance differences among heuristics should therefore be related to their speed. The learning rates of the heuristics are determined using Eq. (6). \(C_{\mathrm{max}}\) refers to the number of moves performed per unit time by the fastest heuristic, and \(C_{i,\mathrm{move}}\) denotes the same value for heuristic \(i\). The resulting value (\(\lambda_{\mathrm{mult}}\)) is used as a multiplier for the initial learning rate. If this multiplier would raise the learning rate beyond its predetermined limit, the learning rate is set to a predetermined cap value.

$$\lambda_{\mathrm{mult}} = C_{\mathrm{max}}/C_{i,\mathrm{move}}$$
(6)
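The linear reward-punishment update of Eqs. (4)-(5) is straightforward to implement. The sketch below operates on a plain probability list and leaves the speed-based scaling of Eq. (6) to the caller; `la_update` is a hypothetical name.

```python
def la_update(p, i, beta, lam1, lam2):
    """Apply Eqs. (4)-(5): reward/punish action i with feedback beta
    and redistribute probability mass over the other r-1 actions."""
    r = len(p)
    q = p[:]
    # Eq. (4): the action taken at time step t
    q[i] = p[i] + lam1 * beta * (1 - p[i]) - lam2 * (1 - beta) * p[i]
    # Eq. (5): all remaining actions
    for j in range(r):
        if j != i:
            q[j] = (p[j] - lam1 * beta * p[j]
                    + lam2 * (1 - beta) * (1 / (r - 1) - p[j]))
    return q
```

Note that the two equations are constructed so that the probabilities still sum to one after each update, whether the action is rewarded (\(\beta = 1\)) or punished (\(\beta = 0\)).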

3.1.3 Relay hybridisation

In Özcan et al. (2008), the heuristics are divided into two types, mutational heuristics and hill climbers, and used within four different frameworks. One of these frameworks, \(F_C\), selects a mutational heuristic first and then applies a pre-determined hill climber to the solution visited by the selected mutational heuristic. The experimental results showed that \(F_C\) is a very effective strategy. In this study, we propose a relay hybridisation approach to determine pairs of heuristics that can be applied consecutively, as in \(F_C\), to find superior results. For that purpose, a list of the best heuristics to apply just after a specific heuristic is maintained. This list is updated in a first-in-first-out manner: a new heuristic is added to the end of the list and, if the list is full, the first one is removed to keep the list size fixed. After applying a relatively poorly performing heuristic according to the provided performance metric, a heuristic is randomly chosen from its list and applied to the solution generated by that heuristic. Since the list can include the same heuristic more than once, the random selection effectively chooses among weighted heuristics. For instance, in a list of size 10, a specific heuristic may be present five times while the rest of the list is occupied by other heuristics; in such a list, the probability of selecting the five-times occurring heuristic is \(50~\%\). That is, even though the selection mechanism is random, the end product is a weighted selection mechanism similar to roulette wheel.

Algorithm 1: Relay hybridisation

This strategy is applied based on the ratio \(C_{\mathrm{phase}}/pl\), where \(C_{\mathrm{phase}}\) denotes the number of iterations passed within the current phase. A random value \(p \in [0,1]\) is generated and the relay hybridisation is applied if \(p \le C_{\mathrm{phase}}/pl\). The pseudocode of the method is depicted in Algorithm 1.
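The list handling behind the relay hybridisation can be illustrated as below. This is a simplified sketch of the idea, not the paper's Algorithm 1: `relay_step` and `record_success` are hypothetical names, and the choice between the pair list and the elite subset is reduced to "pair list if non-empty".

```python
import random

def relay_step(heuristic, pair_lists, subset, c_phase, pl):
    """With probability C_phase/pl, pick a follow-up heuristic for
    `heuristic`; duplicates in its pair list act as roulette weights."""
    if random.random() > c_phase / pl:
        return None                          # no relay this iteration
    pairs = pair_lists.get(heuristic, [])
    if pairs:
        return random.choice(pairs)          # weighted by multiplicity
    return random.choice(subset)             # fall back to the elite subset

def record_success(heuristic, follower, pair_lists, list_size=10):
    """The pair found a new best solution: append follower FIFO-style."""
    lst = pair_lists.setdefault(heuristic, [])
    lst.append(follower)
    if len(lst) > list_size:
        lst.pop(0)                           # drop the oldest entry
```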

Figure 2 shows an example of the relay hybridisation. In this example, the heuristic pairs \(LLH_2 \rightsquigarrow LLH_5\) and \(LLH_2 \rightsquigarrow LLH_3\) have previously generated new best solutions. Whenever \(LLH_2\) is called and the relay hybridisation is to be used, a consecutive heuristic is chosen randomly either from the heuristics currently available in the heuristic subset or from the pair list of \(LLH_2\), and applied to the solution at hand. If the resulting solution is a new best solution, the selected heuristic is added to the end of the list.

Fig. 2 Relay hybridisation

For some problem domains and heuristic sets, it may not be possible to find effective heuristic pairs. Therefore, a tabu strategy similar to the one used for the heuristic exclusion process is employed. If no new best solution is found by the relay hybridisation after a phase, this feature is disabled for one phase. If this happens in consecutive phases, the tabu duration is increased by 1; it stays the same once it reaches its upper bound. If the relay hybridisation finds a new best solution again, the tabu duration is reset to its initial value, 1.

Heuristic adaptation HyFlex provides an opportunity to modify some of the heuristics in an informed manner. It is possible to increase or decrease the perturbation effect of a mutational heuristic, and it is also possible to change the search depth of the local search heuristics. In the proposed approach, the intensity of the mutational heuristics and the search depth of the hill climbers are updated only for the relay hybridisation. If a hill climber finds a new best solution more than five times, its search depth is set to 1.0. Otherwise, its value is set to \((0.1+(ph_{\mathrm{passed}}/pl)\times 0.4)\), where \(ph_{\mathrm{passed}}\) is the number of iterations passed during the current phase. The second setting is also used for the mutation operators.
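The depth-of-search rule above can be written down directly; `adapt_depth` is a hypothetical name, and the inputs follow the quantities named in the text.

```python
def adapt_depth(new_best_count, ph_passed, pl):
    """Depth-of-search used during relay hybridisation: a hill climber
    with more than five new best solutions runs at full depth; otherwise
    the depth grows linearly from 0.1 towards 0.5 within the phase."""
    if new_best_count > 5:
        return 1.0
    return 0.1 + (ph_passed / pl) * 0.4
```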

3.2 Move acceptance

Move acceptance mechanisms strongly affect the performance of hyper-heuristics (Özcan et al. 2008), as they determine the way the search space is traversed. One of the main concerns of move acceptance mechanisms is the acceptability of worsening solutions. Accepting a worsening solution is a widely used strategy to avoid getting stuck around a solution. Mısır et al. (2009) proposed a move acceptance strategy, iteration limited threshold accepting (ILTA), which accepts worsening solutions in a controlled manner. ILTA immediately accepts improving or equal solutions. If the hyper-heuristic cannot find a new best solution during a pre-determined number of iterations, a worsening solution is accepted; this decision is based on the value of the best solution found and a constant determining a range. An adaptive version of ILTA (AILTA) was proposed by Mısır et al. (2010). In this move acceptance, a second, higher range value is determined: if the hyper-heuristic cannot find a new best solution using the given range value, the range is increased to enable accepting much worse solutions. This is required to escape local optima.

 

Algorithm 2: AILLA move acceptance

In the proposed hyper-heuristic, the underlying ideas behind list-based threshold accepting (Lee et al. 2002) and late acceptance (Burke and Bykov 2008) were employed to decrease the parameter dependency of AILTA. The new move acceptance strategy, adaptive iteration limited list-based threshold accepting (AILLA), accepts worsening solutions using the fitness values of previously visited best solutions. The best fitness encountered so far is used as the first threshold value. If it does not suffice to find a new best solution, a higher fitness from the list (\(\mathrm{best}_{\mathrm{list}}\) of size \(l\)) is used to decide on the acceptability of worsening solutions. The pseudocode of AILLA is presented in Algorithm 2. Other versions of AILLA were studied in Mısır et al. (2011b, a). In addition, the iteration limit (\(k\)) is updated whenever a new best solution is found. Equation (7) shows the details of this update process, where \(\mathrm{iter}_{\mathrm{elapsed}}\) is the number of iterations elapsed and \(t_{\mathrm{exec}}\) refers to the total execution time.

$$\begin{aligned} k &= \begin{cases} \big((l-1)\,k + \mathrm{iter}_{\mathrm{elapsed}}\big)/l, & cw = 0 \\ \big((l-1)\,k + \sum_{i=0}^{cw} k \times 0.5^{i} \times tf\big)/l, & \text{otherwise} \end{cases} \\ tf &= (t_{\mathrm{exec}} - t_{\mathrm{elapsed}})/t_{\mathrm{exec}} \\ cw &= \big\lfloor \mathrm{iter}_{\mathrm{elapsed}}/k \big\rfloor \end{aligned}$$
(7)
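Assuming minimisation, the core acceptance logic of AILLA can be approximated as follows. This is a sketch, not a transcription of Algorithm 2: the escalation of `level` and the bookkeeping of `best_list` are left to the caller, and the exact rules may differ in detail.

```python
def ailla_accept(candidate, current, best_list, iters_without_best, k, level):
    """AILLA-style acceptance for a minimisation problem.

    best_list: fitnesses of previously visited best solutions, best
    first; level: index into best_list, raised while the search is
    stuck; k: iteration limit before worsening moves become eligible."""
    if candidate <= current:
        return True                          # improving or equal: accept
    if iters_without_best < k:
        return False                         # too early for worsening moves
    threshold = best_list[min(level, len(best_list) - 1)]
    return candidate <= threshold            # controlled worsening move
```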

 

4 Experiments

Experiments were carried out on the six problem domains within HyFlex, namely max SAT (Hyde et al. 2010a), bin packing (Hyde et al. 2010b), permutation flow shop (Vázquez-Rodríguez et al. 2010), personnel scheduling (Curtois et al. 2010), travelling salesman, and vehicle routing (Walker et al. 2012). HyFlex provides 12 instances for each of the first four problems and ten instances for the last two (Hyde and Ochoa 2011). Each instance was tested over ten runs, each executed for 10 min on a Pentium Core 2 Duo 3 GHz PC with 3.23 GB memory. In addition to the developed hyper-heuristic, nine further hyper-heuristics were tested: combinations of simple random (SR) selection with AILLA, simulated annealing (SA) (Bilgin et al. 2010), late acceptance (LATE) (Demeester et al. 2010), great deluge (GD) (Kendall and Mohamad 2004), and improving or equal (IE) move acceptance mechanisms. Moreover, these acceptance mechanisms combined with the adaptive dynamic heuristic set (ADHS) strategy are used as part of the hyper-heuristic pool.

 

4.1 Computational results

Table 3 presents the scores of the tested hyper-heuristics across the six problem domains based on the scoring system used for CHeSC’2011. The detailed experimental results for each problem domain are presented in Tables 4, 5, 6, 7, 8, and 9. Among the tested hyper-heuristics, ADHS–AILLA obtains the highest score, 508 (out of 680). ADHS–LATE and ADHS–SA follow it with scores of 458 and 451, respectively. ADHS–AILLA performs best for the bin packing, permutation flowshop scheduling, travelling salesman, and vehicle routing problems. It comes second after ADHS–SA for the personnel scheduling domain. Its worst performance is on the max SAT problem, where it ranks fifth.

Table 3 Comparison of all the tested hyper-heuristics based on CHeSC’2011 scoring system (higher values are better)
Table 4 The results of the hyper-heuristics on the bin packing problem
Table 5 The results of the hyper-heuristics on the max SAT problem
Table 6 The results of the hyper-heuristics on the permutation flowshop scheduling problem
Table 7 The results of the hyper-heuristics on the personnel scheduling problem
Table 8 The results of the hyper-heuristics on the traveling salesman problem
Table 9 The results of the hyper-heuristics on the vehicle routing problem

For the hyper-heuristics with SR, AILLA provides the best results based on their overall performance. This indicates that the proposed move acceptance strategy is effective while dealing with different heuristic sets for different problem domains. Furthermore, its adaptive nature increases its robustness.

The performance difference between the hyper-heuristics with ADHS and SR demonstrates the effectiveness and contribution of the selection mechanism. The reason behind this result is the availability of heuristics that differ in speed and improvement capability. ADHS intelligently determines elite heuristic subsets during the run and explores effective heuristic pairs. The following subsections discuss the hyper-heuristics’ performance for each problem domain separately.

4.1.1 Bin packing

For the bin packing problem, experiments showed that ADHS–AILLA performs significantly better than the other tested hyper-heuristics based on the Wilcoxon test at a 95 % confidence level. Its CHeSC’2011 score is 108, the only score exceeding 100 across all the tested problem domains and approaches. There is a clear performance difference between the selection mechanisms ADHS and SR: for all tested move acceptance mechanisms, ADHS outperforms SR. Among the hyper-heuristics with SR, AILLA again outperforms the rest.

4.1.2 Max SAT

SR–SA is the best performing approach for the max SAT problem. Nevertheless, ADHS–SA and ADHS–LATE perform statistically similarly to it. ADHS performs better than SR with all the acceptance mechanisms except SA; this is the only problem domain where combining ADHS with SA causes a performance decrease relative to SR.

4.1.3 Permutation flowshop scheduling

The experimental results on the permutation flowshop scheduling instances indicate that ADHS–AILLA achieves better quality results than the other competing algorithms. However, there is no significant performance difference between ADHS–AILLA and ADHS–LATE, ADHS–IE, or ADHS–SA. Among the SR versions of these hyper-heuristics, SR–AILLA is significantly better than SR–IE and SR–SA, even though their ADHS counterparts perform similarly. It can therefore be concluded that the effect of the selection mechanism on performance is relatively high for these problem instances.

4.1.4 Personnel scheduling

ADHS–SA is the winning algorithm for the personnel scheduling domain; its performance is statistically similar to that of ADHS–AILLA. The performance difference between ADHS and SR is again very large. In particular, SR–IE obtains a score of 8, whereas ADHS–IE obtains 80.5, an improvement achieved simply by changing the selection mechanism.

4.1.5 Travelling salesman

The travelling salesman problem is another domain where ADHS–AILLA performs best. However, no significant performance difference can be noticed compared with ADHS–SA, ADHS–IE, and ADHS–LATE. As for the personnel scheduling problem, there are large margins between the scores of the hyper-heuristics with ADHS and SR for this problem domain too. For instance, the score of SA increases from \(0\) to \(73\) and the score of IE from \(2\) to \(69\) when ADHS is used instead of SR. In addition, the SA and IE move acceptance mechanisms perform better than LATE and GD with ADHS, while they perform significantly worse than LATE and GD with SR.

4.1.6 Vehicle routing

For the vehicle routing problem, ADHS–AILLA comes first with a score of 82. ADHS–SA and ADHS–LATE follow it with no significant performance difference. With ADHS, the scores of the tested hyper-heuristics are two to three times higher than those obtained using SR.

4.2 Heuristic selection

Figure 3 demonstrates the effect of the proposed selection mechanism, ADHS, on the tested heuristic sets. For the bin packing problem, there is a large gap between ADHS and SR in the number of heuristic calls during the run: ADHS chooses effective and fast heuristics more frequently. However, in some parts of the search space, it prefers slower heuristics due to the changing search requirements. The selection process is still faster for max SAT, but the difference is relatively smaller. For the flowshop scheduling problem, ADHS starts with fast heuristics, but then prefers slower options for performance improvement. The number of heuristic calls by ADHS is also smaller than that of SR for the personnel scheduling and travelling salesman problems. This slowdown in the number of heuristic calls is partially caused by the relay hybridisation. For the vehicle routing problem, the number of heuristic calls by ADHS is again larger.

Fig. 3
figure 3

Comparison of the performance and speed between the tested selection mechanisms (each chart pair shows the results of a randomly selected run on one instance of the bin packing, max SAT, flowshop scheduling, and personnel scheduling problems, from top to bottom)

Figure 4 illustrates the number of calls for each heuristic in each problem domain. It can be concluded that the behaviour of the heuristic selection changes at certain time points. The heuristics available for solving the bin packing problem instances show similar performance or behaviour throughout the whole search, which can be deduced from the steadily increasing number of heuristic calls; only one heuristic started to be called more frequently during the second half of the search time. A similar pattern of behaviour can be seen for the max SAT, travelling salesman, and vehicle routing heuristics. For the permutation flowshop scheduling problem, the number of calls for a group of heuristics follows different trends during different time intervals. For the personnel scheduling problem, abrupt changes in the number of calls for each heuristic can be seen throughout the running period. This is mainly related to the speed of the heuristics, which allows only a limited number of iterations within the given execution time. In addition, it should be noted that the variations of the heuristic selection and exclusion processes depend on the environmental responses from the corresponding search spaces during learning. For instance, the graph for max SAT shows that there are no abrupt changes in the heuristic preferences in general. This behaviour is related to a characteristic of the search space: it is easy to immediately find good quality solutions, due to the shape of the search space induced by the solution space and the corresponding low-level heuristics. After this quick initial search phase, it becomes harder to find improving solutions, and the learning process regarding the heuristic selection operations slows down.

Fig. 4
figure 4

The number of calls for each heuristic over the given problem domains by ADHS–AILLA (The data belongs to a run on an instance for each problem domain)

 

4.3 Relay hybridisation

Figure 5 demonstrates the effective heuristic pairs yielding new best solutions during the run on the tested problem domains. It shows the first heuristics as squares and the second heuristics as circles; applying these two heuristics consecutively within one iteration is the hybridisation considered. The number of new best solutions found varies across problem domains as well as across time intervals during a run. For the bin packing problem, relay hybridisation is very effective at finding intensifying heuristic pairs for most of the execution time. The only two hill climbers (\(LLH_4, LLH_6\)) act as the most effective second heuristics. Two mutational heuristics (\(LLH_0, LLH_3\)), one ruin-recreate heuristic (\(LLH_2\)), and the crossover operator (\(LLH_7\)), used as first heuristics, help to diversify the solution for \(LLH_6\).
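The relay step described above can be sketched as follows. This is a hypothetical illustration only: the heuristic pools, function names, and greedy acceptance of improving candidates are simplifying assumptions, not the paper's exact procedure.

```python
import random

def relay_step(solution, evaluate, first_pool, second_pool, best_cost, pair_stats):
    """One relay-hybridisation step (sketch): a diversifying first heuristic
    (e.g. mutational or ruin-recreate) is applied, immediately followed by an
    intensifying second heuristic (e.g. a hill climber). If the consecutive
    application yields a new best solution, the pair is recorded so that
    effective pairs can be preferred later. All details are illustrative."""
    h1 = random.choice(first_pool)   # first heuristic: diversification
    h2 = random.choice(second_pool)  # second heuristic: intensification
    candidate = h2(h1(solution))
    cost = evaluate(candidate)
    if cost < best_cost:
        key = (h1.__name__, h2.__name__)
        pair_stats[key] = pair_stats.get(key, 0) + 1  # count new-best pairs
        return candidate, cost
    return solution, best_cost
```

Accumulating `pair_stats` over a run would reproduce the kind of pair-frequency data visualised in Fig. 5.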

Fig. 5
figure 5

Heuristic combinations resulting from relay hybridisation on different problem domains (squares represent the first heuristics and circles refer to the second heuristics in which the heuristics were consecutively applied). Each graph was drawn based on data belonging to a run on an instance for each problem domain

For the max SAT problem, the effect of hybridisation is very limited. The hill climbers (\(LLH_7, LLH_8\)) were used as the second heuristics, and all the mutational heuristics (\(LLH_0, LLH_1, LLH_2, LLH_3, LLH_4, LLH_5\)) available in the corresponding heuristic set were utilised as the first heuristics.

The hill climbers (\(LLH_7, LLH_8, LLH_9, LLH_{10}\)) for the permutation flowshop scheduling problem were effectively used as second heuristics. Among them, \(LLH_7\) and \(LLH_8\) were the most preferred options. The ruin-recreate heuristics (\(LLH_5, LLH_6\)) were used as the first heuristics, generally together with two mutational heuristics (\(LLH_0, LLH_1\)).

For the personnel scheduling problem, the number of iterations was not as high as in the other problem domains due to the running time limit. Nevertheless, the relay hybridisation delivered some effective heuristic pairs, composed of one mutational heuristic (\(LLH_{11}\)) and two ruin-recreate heuristics (\(LLH_6, LLH_7\)) as the first heuristics, and two hill climbers (\(LLH_3, LLH_4\)) as the second heuristics.

For the travelling salesman problem, the effect of hybridisation is visible only within certain time intervals. Two hill climbers (\(LLH_7, LLH_8\)) are the most frequently used second heuristics. Another hill climber (\(LLH_6\)) and two crossover operators (\(LLH_9, LLH_{12}\)) rarely yielded new best solutions when applied as second heuristics. The mutational heuristics (\(LLH_0, LLH_1, LLH_3, LLH_4\)) and the only ruin-recreate heuristic (\(LLH_5\)) performed effectively as first heuristics.

The last problem domain, vehicle routing, took advantage of the hybridisation mostly during the first quarter of the search. Two hill climbers (\(LLH_4, LLH_8\)) were effective second heuristics, while another hill climber (\(LLH_9\)) and a mutation operator (\(LLH_1\)) were seldom used in that role. A diverse group of heuristics performed as first heuristics: two mutational heuristics (\(LLH_0, LLH_1\)), the ruin-recreate heuristics (\(LLH_2, LLH_3\)), one crossover operator (\(LLH_5\)), and one hill climber (\(LLH_9\)).

The resulting heuristic pairs exhibit a behaviour somewhat similar to memetic algorithms and iterated local search. However, for some problem domains, the effect of these pairs on finding new best solutions is limited. Employing a dynamic decision mechanism to determine whether to apply single heuristics or heuristic pairs may accelerate the search process and provide further improvements.

4.4 Varying performance of the heuristics

Figure 6 illustrates the varying \(QI\) values of each heuristic during the search process on a bin packing instance. The fluctuating \(QI\) values suggest that the performance of each heuristic differs across regions of the search space with respect to the utilised performance metric. Among these heuristics, a ruin-recreate heuristic (\(LLH_2\)) and a mutational heuristic (\(LLH_3\)) remain in the heuristic set most of the time. On the other hand, two mutational heuristics (\(LLH_0, LLH_5\)) are mostly excluded despite their very effective performance during certain phases; during the last quarter of the execution time, these heuristics are always excluded. One hill climber (\(LLH_4\)) is excluded from the heuristic set from time to time, but during the last quarter it was no longer excluded. A ruin-recreate heuristic (\(LLH_1\)) is frequently excluded, but still provided better performance in some regions of the search space. The last two heuristics, a hill climber (\(LLH_6\)) and a crossover (\(LLH_7\)), are mostly part of the heuristic set. However, the \(QI\) fluctuations of \(LLH_6\) are smaller than those of the other heuristics except \(LLH_1\). These results show that the performance variation of different heuristics should be regularly monitored and analysed.

Fig. 6
figure 6

\(QI\)-avg changes of the bin packing heuristics during a run for solving a bin packing problem instance (Quality index (\(QI\)) values showing the performance of the heuristics are depicted as solid lines. Average (avg) values of all the \(QI\) values through the search are indicated as dotted lines. The heuristics with a lower \(QI\) than “avg” are excluded)
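The exclusion rule stated in the caption of Fig. 6, where a heuristic whose quality index falls below the average over all heuristics is dropped from the active set until the next update, can be sketched as follows (the \(QI\) values are illustrative; the actual \(QI\) computation is defined elsewhere in the paper):

```python
def update_heuristic_set(qi):
    """Rebuild the active heuristic set from the current quality indices
    (sketch of the exclusion rule in Fig. 6): heuristics whose QI is below
    the average QI over all heuristics are excluded for the next phase.
    `qi` maps heuristic names to their current QI values (illustrative)."""
    avg = sum(qi.values()) / len(qi)
    return {h for h, value in qi.items() if value >= avg}
```

Because the \(QI\) values fluctuate during the run, repeating this update at every phase yields exactly the intermittent exclusion and readmission pattern visible in Fig. 6.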

4.5 Set size

As mentioned in the computational results, ADHS significantly elevates the performance of the tested move acceptance strategies. One of the major reasons behind this improvement is the method determining elite heuristic subsets for different search regions. For the bin packing problem, the number of heuristics used is generally four or five out of eight. For the max SAT problem, five or six heuristics stay out of the 11-heuristic set. The number of heuristics used for solving the permutation flowshop scheduling instances is mostly eight out of 15. The heuristic set for the personnel scheduling problem domain could not be processed for elimination due to the low speed of the heuristics and the maximum execution time. For the travelling salesman problem, ADHS mostly keeps seven out of 13 heuristics in the set; this value is six out of ten for the vehicle routing problem.

5 Conclusion

In this study, the design and implementation of a new selection hyper-heuristic on a hyper-heuristic software framework, i.e., HyFlex, was discussed. The developed approach consists of an adaptive dynamic heuristic set (ADHS) strategy determining the best heuristic subsets over a number of iterations. To enhance the effectiveness of ADHS, a learning automaton-based heuristic selection strategy and a pairwise heuristic hybridisation mechanism were employed. In addition, a new threshold-based move acceptance strategy, adaptive iteration limited list-based threshold accepting (AILLA), was incorporated. A set of experiments was carried out with nine additional hyper-heuristics over six problem domains with their specific heuristic sets. The experimental results indicated that the designed hyper-heuristic is an effective strategy for solving the given problem instances using their corresponding heuristic sets. Although the hyper-heuristic performs well on average, there is still room for improvement. For the max SAT problem, SR–SA generates superior results compared to the proposed hyper-heuristic. Also, ADHS–SA performs slightly better than ADHS–AILLA for the personnel scheduling problem. Furthermore, the successor of the proposed hyper-heuristic won the first international cross-domain heuristic search challenge (CHeSC 2011).

In future research, the proposed hyper-heuristic will be improved to cope with possible adaptation issues for different heuristic sets. For this purpose, a feedback mechanism will be set up between the selection process and the move acceptance part. In addition, a two-phase heuristic selection mechanism, consisting of selecting a heuristic subset and then choosing a heuristic from the selected subset in connection with a feedback mechanism, will be utilised. Furthermore, the adaptive characteristic of the move acceptance strategy will be strengthened. Finally, a number of decision mechanisms to manage these sub-mechanisms in collaboration will be developed.