
1 Introduction

Evolutionary computing is a research area inspired by natural evolution [3]. The main feature of natural evolution is the survival of the fittest. In evolutionary algorithms, the initial population is generated randomly and the fitness of every individual is calculated. The best ones survive and reproduce, and so evolution progresses [3]. For simplicity, the population size \( NP \) in evolutionary algorithms is kept constant, although this is not the case in nature, where the number of individuals in a population varies between generations. Adapting the population size is still a challenging task; as a first step, we investigate how the population size affects the algorithms.

We briefly discuss the related work in Sect. 2. The original differential evolution algorithm and its improved self-adaptive version are described in Sect. 3. A new algorithm is proposed in Sect. 4. Section 5 presents our experiments and results. Section 6 concludes the paper.

2 Related Work

In the paper [6], the authors investigated how different population sizes affect the differential evolution algorithm. The paper considered the effect of the population sizes 2D, 4D, 6D, 8D and 10D (D is the dimension of the problem) with two different mutation strategies on problems chosen from the CEC 2005 Special Session on Real-Parameter Optimization. They found that a smaller population size with a greedy strategy converges fast, but premature convergence and stagnation are more pronounced. A large population, with a strategy having good exploration capacity, does not prematurely converge or stagnate, but it can converge very slowly.

The same authors in another paper [5] proposed a differential evolution algorithm with an ensemble of parallel populations having different population sizes, in which the more suitable population size adaptively takes most of the function evaluations. Although this paper uses multiple populations, it is related to our work in that it explores how the population size affects convergence. They found that the multi-population differential evolution algorithm obtained better quality solutions more efficiently than the conventional differential evolution algorithm.

A review article on the study of how the population size affects differential evolution [9] emphasizes that the inappropriate choice of the population size may seriously impede the performance of each differential evolution algorithm.

All those papers consider adapting the population size in the conventional differential evolution algorithm, with the other parameters (crossover and mutation rates) kept constant. In contrast, we investigate the effect of changing the population size while the other parameters are adapted as well.

3 Background

3.1 Differential Evolution (DE)

In the DE algorithm [2, 4, 7, 8, 11, 12], candidate solutions (population members) are represented by real-valued vectors with \( D \) components (genes) [3]

$$\begin{aligned} \varvec{x}_i^{(G)} = (x_{i,1}^{(G)}, x_{i,2}^{(G)}, \ldots , x_{i,D}^{(G)}), \end{aligned}$$
(1)

where \(i = 1, \ldots , NP \) and \( NP \) is the population size. \( G \) denotes the generation. The offspring are created through mutation and crossover. The initial population is generated randomly between the lower and upper bounds defined by the problem. An evolutionary cycle starts with the random selection of three mutually distinct vectors \(\varvec{x}_{r1}, \varvec{x}_{r2}, \varvec{x}_{r3}\) from the current population. A mutation vector is then obtained by adding a perturbation vector to the first of those random vectors

$$\begin{aligned} \varvec{v}_i^{(G+1)} = \varvec{x}_{r1}^{(G)} + \varvec{p}^{(G)}. \end{aligned}$$
(2)

The perturbation vector \(\varvec{p}\) is the difference between two other randomly selected vectors multiplied by the scaling factor \( F \)

$$\begin{aligned} \varvec{p}^{(G)} = F(\varvec{x}_{r2}^{(G)} - \varvec{x}_{r3}^{(G)}). \end{aligned}$$
(3)

The scaling factor is a positive real number, \( F \in \left( 0, \infty \right) \). In most cases, it takes values in \( \left( 0, 2\right] \). The second step in the reproduction is usually a binomial crossover, which has one parameter, the crossover probability \( CR \in [0, 1]\). It creates a trial vector by combining components of the mutation vector and the corresponding parent vector as

$$\begin{aligned} u_{i,j}^{(G+1)} = {\left\{ \begin{array}{ll} v_{i,j}^{(G+1)} &{} \text {if } rand_j(0,1) \le CR \text { or } j = j_{rand}, \\ x_{i,j}^{(G)} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(4)

\( CR \) determines the probability that the trial vector takes a component from the mutation vector, and \(j_{rand}\) is a randomly chosen integer in the range \(1, \ldots , D\), which ensures that at least one component of the trial vector differs from the parent vector. The evolutionary cycle ends with the selection of the offspring that will proceed to the next generation: the fitness value of the trial vector is compared to that of the corresponding parent vector, and the fitter of the two is carried over

$$\begin{aligned} \varvec{x}_i^{(G+1)} = {\left\{ \begin{array}{ll} \varvec{u}_i^{(G+1)} &{} \text {if } f(\varvec{u}_i^{(G+1)}) \le f(\varvec{x}_i^{(G)}), \\ \varvec{x}_i^{(G)} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(5)
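The DE/rand/1/bin generation described above (mutation, binomial crossover, and greedy selection) can be sketched in Python. This is a minimal illustration, not the authors' implementation; the function name `de_generation` and the sphere objective in the usage example are our own.

```python
import numpy as np

def de_generation(pop, fitness, f, F=0.5, CR=0.9, rng=None):
    """One generation of DE/rand/1/bin, following Eqs. (2)-(5)."""
    rng = rng or np.random.default_rng()
    NP, D = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(NP):
        # pick three mutually distinct indices, all different from i
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])   # mutation, Eqs. (2)-(3)
        j_rand = rng.integers(D)
        mask = rng.random(D) <= CR              # binomial crossover, Eq. (4)
        mask[j_rand] = True                     # at least one component from v
        u = np.where(mask, v, pop[i])           # trial vector
        fu = f(u)
        if fu <= fitness[i]:                    # greedy selection, Eq. (5)
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit
```

Note the greedy selection: the population fitness can never worsen from one generation to the next, which is the property that also makes DE prone to the stagnation discussed in Sect. 4.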

3.2 Self-adaptive Differential Evolution (jDE)

DE has three parameters, namely \( F \), \( CR \), and \( NP \), and their tuning can improve the performance of the algorithm greatly. In the original DE, these parameters are specified before the evolutionary cycle and remain fixed during each generation of the algorithm. That is in contrast with the dynamic nature of evolutionary computing itself. Furthermore, different parameter values can be optimal at different stages of the evolutionary process. A better approach is to use self-adapting parameters. In an improved algorithm, which the authors named jDE [1], all population members are extended by the control parameters \( F \) and \( CR \). Adaptive changes of the control parameters should give better individuals, in the sense that they will have better fitness values. In jDE, the new control parameters are calculated as

$$\begin{aligned} F_{i,G+1} = {\left\{ \begin{array}{ll} F_l + rand_1 \cdot F_u &{} \text {if } rand_2 < \tau _1, \\ F_{i,G} &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$
(6)

and

$$\begin{aligned} CR_{i,G+1} = {\left\{ \begin{array}{ll} rand_3 &{} \text {if } rand_4 < \tau _2, \\ CR_{i,G} &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(7)

Here, \(\tau _1\) and \(\tau _2\) are small probabilities that determine when the control parameters are changed, and \(rand_j\), \(j \in \{1,2,3,4\}\), are uniform random numbers in the range [0, 1]. \( F _{i,G+1}\) and \( CR _{i,G+1}\) are computed before the mutation, so they affect the mutation, crossover and selection operations when creating the offspring.
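The per-individual self-adaptation rule of Eqs. (6) and (7) is simple to state in code. The following is a sketch under the parameter values reported later in Sect. 5 (\(F_l = 0.1\), \(F_u = 0.9\), \(\tau _1 = \tau _2 = 0.1\)); the function name `jde_update` is our own illustration.

```python
import numpy as np

def jde_update(F_i, CR_i, F_l=0.1, F_u=0.9, tau1=0.1, tau2=0.1, rng=None):
    """Self-adaptive update of one individual's F and CR, Eqs. (6)-(7)."""
    rng = rng or np.random.default_rng()
    r1, r2, r3, r4 = rng.random(4)
    # With probability tau1, resample F uniformly from [F_l, F_l + F_u);
    # otherwise keep the individual's current value.
    F_new = F_l + r1 * F_u if r2 < tau1 else F_i
    # With probability tau2, resample CR uniformly from [0, 1).
    CR_new = r3 if r4 < tau2 else CR_i
    return F_new, CR_new
```

Because the resampled values are used immediately for the current variation step, good parameter settings propagate with the individuals they produced.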

4 Self-adaptive Differential Evolution with Restarts (rjDE)

Algorithm 1. The proposed rjDE algorithm.

Our main work is based on the newly created algorithm, called rjDE, presented in Algorithm 1. The algorithm is designed for tackling the CEC 2019 problems proposed in the technical report [10] for the 100-Digit Challenge. The proposed rjDE algorithm is derived from the jDE algorithm, so it uses the same technique for control parameter adaptation at each evolutionary step.

The jDE algorithm can suffer from the same problem as the DE algorithm: both converge fast and can become trapped in local optima. When the basic DE converges to a local optimum, its population diversity decreases. To counteract that, we added Line 15 to Algorithm 1, which checks whether the fitness value of the i-th individual is close to that of the best individual. If this condition is true, we increase a counter. When some individuals have obtained fitness values similar to the best one, we can assume that population diversity has also decreased, and a reinitialization of the population takes place. A restart in rjDE, i.e., population reinitialization, occurs when the fitness values of \(\upalpha \cdot NP \) individuals differ from the best fitness value by less than a very small value \(\varepsilon \).
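The restart condition described above can be sketched as a small predicate. This is our own illustration of the check, not the paper's code; the default values match those used in Sect. 5 (\(\upalpha = 0.5\), \(\varepsilon = 10^{-16}\)).

```python
import numpy as np

def should_restart(fitness, alpha=0.5, eps=1e-16):
    """True when at least alpha * NP individuals lie within eps of the best
    fitness value, i.e., when population diversity is assumed to be lost."""
    best = fitness.min()
    near_best = np.sum(np.abs(fitness - best) < eps)
    return near_best >= alpha * len(fitness)
```

When the predicate fires, rjDE regenerates the population randomly within the problem bounds while keeping track of the best solution found so far.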

Table 1. Fifty runs of DE for each function sorted by the number of correct digits.

5 Experiments and Results

We experimented on CEC benchmark functions [10] and followed their rules for computing scores. In [10] there is no limit on the maximum number of function evaluations (\( MaxFEs \)), but in this work we set \( MaxFEs = 10^7\).
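The scores below are numbers of correct digits, following the 100-Digit Challenge rules in [10]. As an illustration only (the exact tie-breaking in [10] may differ), counting correct digits can be read as: digit k is correct when the found value is within \(10^{-k}\) of the known optimum value, which for these benchmark functions is 1.0; the helper name `correct_digits` is our own.

```python
def correct_digits(value, optimum=1.0, max_digits=10):
    """Count correct digits of `value` against `optimum`: digit k counts as
    correct when |value - optimum| < 10**(-k), capped at max_digits."""
    err = abs(value - optimum)
    digits = 0
    while digits < max_digits and err < 10.0 ** (-(digits + 1)):
        digits += 1
    return digits
```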

We analyzed three different algorithms: DE, jDE, and rjDE on population sizes \( NP = 50, 100, 200, 400, 800, 1600\).

We show results for population size 100 first, and, later, compare results for all population sizes, on all 10 functions, including all 3 algorithms.

5.1 Results for \( NP = 100\)

Table 1 shows the results for all benchmark functions using the original DE algorithm. The other parameters used in this algorithm were \(F = 0.5\) and \(CR = 0.9\). It can be seen that DE obtained 10 correct digits for functions F1 and F2, while for function F10 it obtained no correct digits. Functions F4, F7 and F8 also have scores almost equal to zero, which shows that the original DE algorithm is not suitable for solving those functions with a population of size 100.

The jDE algorithm solved functions F1, F2 and F4 to 10 correct digits. For all other functions, scores were greater than one correct digit. Those results are shown in Table 2. The parameters used in this algorithm were \(F_l = 0.1\), \(F_u = 0.9\), \(\tau _1 = \tau _2 = 0.1\); the initial control parameters were \(F = 0.5\) and \(CR = 0.9\). It is evident that jDE is better at solving these single-objective optimization problems than the simple DE.

Table 2. Fifty runs of jDE for each function sorted by the number of correct digits.
Table 3. Fifty runs of rjDE for each function sorted by the number of correct digits.

Table 3 shows the score for each function for the rjDE algorithm. The highest score, 10, was obtained for 7 out of 10 functions. The parameters used here were the same as for the jDE algorithm, along with two additional parameters, \(\varepsilon = 10^{-16}\) and \(\upalpha = 0.5\). Functions F3, F8 and F9 seem to be the difficult ones for rjDE.

Total scores for DE, jDE and rjDE are 35.56, 61.68 and 78.12, respectively.

5.2 Results for Different Population Sizes

In Table 4 we present the performance of the DE algorithm on all 10 benchmark functions and all 6 population sizes. The performance of DE neither increases nor decreases monotonically with the population size, but it has by far the best score for \( NP = 400\). Looking at individual functions, the best performance was on function F1, for which 10 correct digits were obtained for every population size. For function F2, DE obtained 10 correct digits for population sizes \( NP = 100, 200, 400, 800\); too small and too large population sizes clearly have a bad impact on this function, with too small an \( NP \) being worse than too large. For function F6, the DE algorithm reached 10 correct digits for \( NP = 800, 1600\), and all other scores were greater than 5. Functions F3, F5 and F9 all have scores equal to or greater than 1, while for the remaining functions, zero correct digits were obtained 2 or 3 times. The best total score, 42.68, was obtained for \( NP = 400\).

Table 4. Scores for each function and different population sizes using DE.

The performance of the jDE algorithm is shown in Table 5. This algorithm clearly performs better than the previous one. The best score (10 correct digits) was obtained 17 times out of 60 possibilities. For function F1, jDE obtained 10 correct digits for every population size, for F2 and F5 four times, and for F4 three times. Zero correct digits were obtained only 4 times. The algorithm performed worst on function F7. The best total score, 66.48, was obtained again for population size 400.

Table 5. Scores for each function and different population sizes using jDE.

Table 6 presents the results for the rjDE algorithm. This algorithm reached 10 correct digits 37 times out of 70, i.e., in 52.85% of all cases. The worst result, zero correct digits, occurred only once, for function F10 and population size 1600. The best total score, 78.6, was obtained for a population size of 50.

Table 6. Scores for each function and different population sizes using rjDE.

It is evident that the population size affects the performance of all differential evolution algorithms. For jDE and rjDE, the total score increases with the population size until it reaches a maximum for some \( NP \), and then decreases for larger population sizes. For the DE algorithm, we can notice a small deviation from that observation. The maximum is obtained at different population sizes for different algorithms: for DE, the best population size is 400, followed by 800; for jDE, the best population sizes are 400 and 200; and for rjDE, 50 and 100. Evidently, some algorithms perform better with larger populations, while others give better results for smaller population sizes. The mean numbers of restarts over all runs, for each benchmark function and each population size, are shown in Table 7. Restarts were most frequent for the small population sizes \( NP = 25, 50\), while for the population sizes \( NP = 400, 800, 1600\) there were no restarts at all.

Table 7. Mean number of restarts for 50 runs and different population sizes.

We followed the rules that were suggested for the CEC 2019 competition.

6 Conclusion

We analyzed three algorithms, DE, jDE and rjDE, on the CEC 2019 benchmark functions with different population sizes, in order to see how the population size affects their performance. We followed the rules of the CEC 2019 competition. Our analysis shows that the population size affects the performance of those algorithms in such a way that the total score increases with the population size until it reaches a maximum; increasing the population size further decreases the total score. The self-adaptive differential evolution with reinitialization achieved the best results on the selected benchmark functions. In future work, we plan to run the algorithms with a greater maximum number of function evaluations. We also plan to investigate linear population reduction methods such as L-SHADE [13].