Keywords

1 Introduction

Multiple optima are still a challenge in many optimization problems. With standard approaches, one always balances exploitation of locality against exploration of the search space. In genetic algorithms [3] this is done by choosing the right evolutionary parameters and genetic operators. However, these values are fixed over the course of the optimization and do not take into account the current state of the population or where it is located in the search space. Genetic algorithms (GAs) base the guidance of their search on a single measure; the fitness. This single-value metric defines which individuals reproduce, and, over several generations, in which areas of the search space the population concentrates. In this study we want to open the discussion on using the genetic distribution of the population to guide the search process. To do this, we propose the use of genetic distances between individuals to modify the tournament composition in the selection stage of genetic algorithms. By measuring the correlation of the distances between individuals and their fitness, we believe that we can introduce mechanics which will enhance the algorithms performance.

This approach is related to niching methods [7, 8] but does not rely on fixed distances and is adaptive to the state of the convergence process. It is an example of how to use additional information to guide the search process. In our method, depending on an individual’s place in the population, only the chance of competing in a given tournament is modified. This results in a softer modification of the standard approach.

The rest of this article is structured as follows. In Sect. 2 the proposed method is presented. Section 3 describes the test cases and the evolutionary settings employed in this study. The results of the experiments are presented in Sects. 4 and 5 concludes this study and introduces further ideas and future areas of research.

2 Methodology

There are different ways to influence a GA’s convergence behavior through additional information. In this study, we influence the convergence of the algorithm by modifying which opponents compete in a tournament. Note that this is an alternative approach to fitness sharing [12], fitness penalties [5] or other measures which directly change the selection probability of an individual. Instead, depending on the state of the genetic algorithm, we alter a tournament’s opponents which can be genetically more or less diverse. In case of a very homogeneous tournament, meaning the individuals are similar, there is a high chance that the best individual of the local cluster will be selected. In more heterogeneous tournaments, as the individuals will not always have to compete against the best individual in the cluster, individuals in more distant regions of the search space have a higher chance of selection. Instead of modifying directly which individuals are selected, we basically give certain individuals a higher chance of participating in a tournament.

Fig. 1.
figure 1

The correlation of average distances to the other individuals and the individual’s fitness. The dark blue spot around (20, 20) is the optimum. When the individuals gather in one optimum, the negative correlation of fitness to the average distance to other individuals tends to be stronger than in the case where some individuals are located in a local optima. (Color figure online)

Figure 1 shows an example for the two different cases of a negative correlation between the individual’s average distance to each other and their fitness. In the left panel, the individuals gather in one optimum and the most central individualFootnote 1 of the population exhibits the highest fitness. On the right hand side, the correlation is weaker, as the best individual is no longer in the spatial center of the population. The optimal tournament will look different in both cases.

Fig. 2.
figure 2

Two probability transformations based on the rank distance to the initial tournament candidate. The x-axis shows the rank distance, the y-axis shows the probability of this rank’s selection into the tournament.

In each tournament, the initial candidate is chosen at random. This is done uniformly and does not differ from standard tournament selection. However, once the initial candidate is chosen, the other constituents of the tournament are chosen based on their rank distance. To build the rank-distances between the individuals, we initially measure the Canberra-Distance [6] of their genomes. The Canberra-Distance is used such that the different dimensions have similar importance. It is defined by:

$$\begin{aligned} d(u,v) = \sum _{i=1}^n \frac{|u_i-v_i|}{|u_i|+|v_i|}, \end{aligned}$$
(1)

where u and v are the vectors to compare, in this case the genomes. These distances are then turned into ranks r from 1 to N where N is the population size. Depending of the state of the algorithm a probability transformation is chosen to either favor individuals which are close to the initial candidate or solutions that are genetically further apart. This transformation is then applied to the rank distances to the other individuals and for each individual of the population a probability is assigned with which that individual is chosen for the current spot in the tournament. The probability of individual j to end up in a particular tournament \(p_j\) is then:

$$\begin{aligned} p_j = {\left\{ \begin{array}{ll} g_1(r_j) \quad \text {if}\quad \rho \ge \alpha \\ g_2(r_j) \quad \text {if}\quad \rho < \alpha \end{array}\right. } \end{aligned}$$
(2)

where \(\rho \) is this generation’s correlation between the average distance between individuals to their fitness score, and alpha is a threshold. The selection into the tournament is then done akin to a roulette wheel selection. This process is repeated for all opponents in the tournament. Once all opponents have been found, the tournament is performed and the tournament winner is determined and marked for reproduction and survival.

Figure 2 shows two probability transformations to select further individuals into the tournament. The x-axis shows the rank distance to the initial tournament candidate, and the y-axis the probability of the corresponding individual to be selected into the tournament. The transformations have been determined by Monte Carlo search over the Schwefel 1.2 problem (\(f_2(x)\)), shown in Sect. 3. This problem has been chosen as over the course of the runs, it has a wide range of values of \(\rho \).

3 Experiments

To test the new concept, we apply it to most of the benchmark problems proposed in [14]. The functions are listed in Table 1. Functions \(f_1\) to \(f_7\) are unimodal, and the functions \(f_8\) to \(f_{14}\) multimodal. The parameter n defines the dimensionality of the problem and S the range. \(f_\text {min}\) shows the minimum value of the optimization problem.

The functions are well-known benchmarks from the literature. \(f_1(x)\) is a simple sphere model, \(f_2(x)\), \(f_3(x)\), \(f_4(x)\) are Schwefel’s problems 2.22, 1.2, and 2.21 [13]. \(f_5(x)\) is the generalized Rosenbrock’s function [11]. \(f_6(x)\) is a step function and \(f_7(x)\) is a quartic function with noise. A generalized version of Schwefel’s problem 2.26 is given in \(f_8(x)\). \(f_9(x)\) is the generalized Rastrigin’s function [10], and \(f_{10}(x)\) Ackley’s function [1]. The benchmark function \(f_{11}(x)\) is the generalized Griewank function [4], and \(f_{12}(x)\) Shekel’s Foxholes function [9]. \(f_{13}(x)\), \(f_{14}\), and \(f_{15}(x)\) are Kowalik’s [14], Six-Hump Camel-Back [9], and Branin [2] functions.

All experiments are conducted with the same evolutionary parameters listed in Table 2. 300 runs are performed for both the standard tournament selection as well as for the novel distance-based tournament. The fitness is calculated by the mean-squared error (MSE). As noted in the previous section, problem \(f_2(x)\) has been used to find a piecewise-linear probability transformation by Monte Carlo search. The parameters of the piecewise function are listed in Table 3. We apply two different probability transformations \(g_1(r)\) and \(g_2(r)\) depending on \(\rho \). The ranks are split into three different segments, and the probabilities of selection into the tournament are interpolated based on the rank of the individuals.

Table 1. Benchmark functions
Table 2. Settings
Table 3. Parameters of the probability transformation
Table 4. Percentage of baseline MSE

4 Results

Table 4 shows the percentage of the mean-squared error in the distance-based tournament compared to the baseline runs for different quantiles and the mean. For example for problem \(f_{1}(x)\) the minimum MSE at the end of the run was 64.4% of the baseline result. It is clear to see that our method dominates the standard implementation in most of the problems. Except for \(f_{14}(x)\) and \(f_{15}(x)\) the median of the MSE is smaller in the distance-based approach. Especially in the unimodal case, the new structure of the tournament performs well. Remember that in these cases, the correlation between the average distance to the other individuals and the fitness is most likely high. This means, the probability transformation \(p_1(r)\) is applied predominantly. The method also outperforms the standard approach in the multimodal functions. However, the difference is not as large. Further performance improvements might be achieved by performing a more thorough search for the parameters of \(g_1\) and \(g_2\). The results are further presented in Fig. 3. The figure shows box plots of the results of all the benchmark problems. The reason for why the method does not work as well for functions \(f_{13}(x)\) to \(f_{15}(x)\) is not entirely clear. However, they all have a lower dimensionality in common. The loss of information from mapping low dimension genomes to the fitness might not be as large as in problems with higher dimensionality, dismantling the advantage of the distance-based method.

Fig. 3.
figure 3

The boxplots comparing distance-based tournament to the standard tournament selection.

5 Conclusion

We have presented a new distance-based tournament for genetic algorithms. By influencing the chance of selection into the tournament, chances of being selected for survival and reproduction are influenced. While this short article is only a proof-of-concept, we think that it shows the potential of influencing the tournament composition through additional information gained from analyzing the population’s distribution. The method does improve the performance of the GAs in most cases of a standard benchmark test set. Both the mean and the median of the MSE can be improved by the new method which incorporates the Canberra distances between genomes to construct tournaments in the selection mechanism. In future work we will try to further exploit the information that is in the distribution of the population to improve the search performance of GAs. Further, the technique can also be applied to genetic programing, when the genetic distances are replaced with distances of output vectors. Preliminary experiments show that the technique might be even more promising in this area.