
1 Differential Evolution

Differential evolution (DE), introduced by Storn and Price in [8], is a population-based evolutionary algorithm for problems with a real-valued cost function. DE is one of the most efficient optimization techniques for solving a global optimization problem. A global optimization problem is defined on the search space \(\varOmega \), which is limited by its boundary constraints, \(\varOmega =\prod _{j=1}^{D}[a_j,b_j]\), \(a_j < b_j\). The objective function f is defined for all \(\varvec{x} \in \varOmega \), and the point \(\varvec{x}^*\) satisfying \(f(\varvec{x}^*) \le f(\varvec{x}), \forall \varvec{x} \in \varOmega \), is the solution of the global optimization problem.

The main problem of DE consists in controlling the convergence. When the convergence is too fast, only a local solution is found. When the convergence is too slow, the fixed budget of function evaluations is exhausted before the global solution is found. The speed of the convergence depends on the control parameters of DE: the mutation scheme, the crossover type, and the values of F and CR. A comprehensive summary of advanced results in DE research is available in [2, 6], where several kinds of mutation and crossover are listed and some adaptive or self-adaptive DE variants are described.

The performance of several DE strategies was compared in [3]. The performance depends on the speed of convergence (a higher speed causes the population P to get stuck in the area of a local minimum) and on the quality of the found solution (higher quality means smaller function-error values of the solution found by the search process). The mutation variants with the best performance are rand/1 (1), best/2 (2), and rand-to-best/1 (3)

$$\begin{aligned} \varvec{u} = \varvec{x}_{r_{1}} + F \cdot (\varvec{x}_{r_{2}} - \varvec{x}_{r_{3}}) \end{aligned}$$
(1)
$$\begin{aligned} \varvec{u} = \varvec{x}_{best} + F \cdot (\varvec{x}_{r_{1}} - \varvec{x}_{r_{2}}) + F \cdot (\varvec{x}_{r_{3}} - \varvec{x}_{r_{4}}) \end{aligned}$$
(2)
$$\begin{aligned} \varvec{u} = \varvec{x}_{r_{1}} + F \cdot (\varvec{x}_{best} - \varvec{x}_{r_{1}}) + F \cdot (\varvec{x}_{r_{2}} - \varvec{x}_{r_{3}}) \end{aligned}$$
(3)

where \(\varvec{x}_{r_{1}}, \varvec{x}_{r_{2}}, \varvec{x}_{r_{3}}, \varvec{x}_{r_{4}}\) are mutually different points of P with indices \(r_1 \ne r_2 \ne r_3 \ne r_4 \ne i\) and \(\varvec{x}_\text {best}\) is the best point of P.
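The three mutation variants (1)–(3) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation; the population P (an N-by-D array), the scale factor F, and the mutually distinct indices best and r1–r4 are assumed to be prepared by the caller.

```python
import numpy as np

# Illustrative sketch of the mutation variants (1)-(3); P is the
# population as an (N, D) array, F the scale factor, and best, r1..r4
# mutually distinct row indices prepared by the caller.
def rand_1(P, F, r1, r2, r3):
    """DE/rand/1, Eq. (1)."""
    return P[r1] + F * (P[r2] - P[r3])

def best_2(P, F, best, r1, r2, r3, r4):
    """DE/best/2, Eq. (2)."""
    return P[best] + F * (P[r1] - P[r2]) + F * (P[r3] - P[r4])

def rand_to_best_1(P, F, best, r1, r2, r3):
    """DE/rand-to-best/1, Eq. (3)."""
    return P[r1] + F * (P[best] - P[r1]) + F * (P[r2] - P[r3])
```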

Another study [14] focuses on the diversity of the population and on avoiding a premature convergence of the algorithm. The mutation strategies recommended for increasing the population diversity are DE/rand/1 (1), DE/current-to-rand/1 (4), and DE/either-or (5). DE/best/ strategies are not recommended because they decrease the population diversity. The K parameter of the current-to-rand/1 and either-or mutations is set up in two ways. An experimental study of various DE strategies [11] shows that a higher reliability is achieved by DE/current-to-rand/1 (4) if \(K = rand(0,1)\), and good results also occur for DE/either-or (5) when \(K = 0.5 \cdot (F + 1)\).

$$\begin{aligned} \varvec{u} = \varvec{x}_i + K \cdot (\varvec{x}_{r_{1}} - \varvec{x}_i) + F \cdot (\varvec{x}_{r_{2}} - \varvec{x}_{r_{3}}) \end{aligned}$$
(4)
$$\begin{aligned} \varvec{u} = \left\{ \begin{array}{ll} \varvec{x}_{r_{1}} + F \cdot (\varvec{x}_{r_{2}} - \varvec{x}_{r_{3}}) &{}\text {if} \ F > rand(0, 1)\\ \varvec{x}_{r_{1}} + K \cdot (\varvec{x}_{r_{2}} + \varvec{x}_{r_{3}} - 2 \cdot \varvec{x}_{r_{1}}) &{}\text {otherwise}\\ \end{array} \right. \end{aligned}$$
(5)
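The mutations (4) and (5) can be sketched analogously. Again an illustrative sketch rather than the authors' code; the caller is assumed to supply K according to [11], i.e. \(K = rand(0,1)\) for current-to-rand/1 and \(K = 0.5 \cdot (F + 1)\) for either-or.

```python
import numpy as np

# Illustrative sketch of mutations (4) and (5); P is the population,
# i the index of the current point, r1..r3 mutually distinct indices.
def current_to_rand_1(P, i, F, K, r1, r2, r3):
    """DE/current-to-rand/1, Eq. (4); [11] recommends K = rand(0, 1)."""
    return P[i] + K * (P[r1] - P[i]) + F * (P[r2] - P[r3])

def either_or(P, F, K, r1, r2, r3, rng=np.random):
    """DE/either-or, Eq. (5); [11] recommends K = 0.5 * (F + 1)."""
    if F > rng.uniform(0, 1):
        return P[r1] + F * (P[r2] - P[r3])          # differential branch
    return P[r1] + K * (P[r2] + P[r3] - 2 * P[r1])  # recombination branch
```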

It was also shown that no DE strategy in the experimental comparison is able to outperform all the others in each optimization problem, which corresponds with the No-Free-Lunch theorem [13].

2 Proposed DE Variant with Selection of Mutant Vectors

The main motivation of this paper is to control the speed of the convergence of the DE algorithm, namely to prefer exploration in the first stage of the search process and exploitation in the later stage. We suppose that such a strategy prevents the search from getting stuck in a local minimum and supports an appropriate convergence speed of the algorithm. We generate several (\(k_{max}\)) mutant vectors and select only one of them for the crossover and trial point generation. The selection is based on the Euclidean distance from a chosen point in the population. The choice of the point and the strategy of selection depend on the stage of the search process. A pseudo-code of the newly proposed DEMD algorithm is shown in Algorithm 1. We are not able to distinguish the stages strictly a priori, which holds for any optimization problem in general, so we propose a stochastic approach based on the ratio FES/maxFES, described in (6).

$$\begin{aligned} \textit{dist}= \left\{ \begin{array}{ll} \sqrt{\sum \nolimits _{j=1}^D{(u_{k,j}-x_{i,j})^2}}, \quad k=1,2,3, &{}\text {if} \ \textit{FES}/\textit{maxFES} < rand(0,1)\\ \sqrt{\sum \nolimits _{j=1}^D{(u_{k,j}-x_{\text {best},j})^2}}, \quad k=1,2,3, &{}\text {otherwise}, \end{array} \right. \end{aligned}$$
(6)

where FES is the current count of function evaluations and maxFES is the maximum number of FES allowed for the run. The binomial crossover (7) is applied to the selected mutant point

$$\begin{aligned} y_{i,j} = \left\{ \begin{array}{ll} u_{i,j},\ \ &{} \text {if} \ rand_j(0, 1) \le \textit{CR} \ or \ j = rand_j(1, D)\\ x_{i,j},\ \ &{} \text {otherwise}.\\ \end{array} \right. \end{aligned}$$
(7)
[Algorithm 1: pseudo-code of the proposed DEMD algorithm]
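The distance-based selection (6) and the binomial crossover (7) can be sketched as follows. This is a minimal NumPy sketch under the paper's assumptions; the generation of the \(k_{max}\) mutant points and the surrounding DE loop are omitted, and the function names are ours.

```python
import numpy as np

def select_mutant(mutants, x_i, x_best, FES, maxFES, first_stage_min=True,
                  rng=np.random):
    """Select one of the k_max mutant points according to Eq. (6)."""
    if FES / maxFES < rng.uniform(0, 1):   # explorative (first) stage
        d = np.linalg.norm(mutants - x_i, axis=1)
        # "min" variant picks the closest mutant, "max" the farthest
        k = np.argmin(d) if first_stage_min else np.argmax(d)
    else:                                  # exploitative (second) stage
        d = np.linalg.norm(mutants - x_best, axis=1)
        k = np.argmin(d)                   # closest to the best point
    return mutants[k]

def binomial_crossover(x, u, CR, rng=np.random):
    """Binomial crossover of Eq. (7); at least one component comes from u."""
    D = len(x)
    mask = rng.uniform(size=D) <= CR
    mask[rng.randint(D)] = True            # the mandatory j_rand component
    return np.where(mask, u, x)
```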

The selection of an appropriate mutant point in the statement at line 9 is specified only vaguely. The point with the minimal distance from the best point is selected in the second (exploitation) stage. In the first stage, the selection of the mutant point with the maximal distance from the current point \(\varvec{x}_i\) prefers exploration but may deteriorate convergence. That is why the strategy selecting the mutant point with the minimal distance from the current point \(\varvec{x}_i\) was also considered.

The selection of the mutation strategies for DEMD is based on the results in [3] and [14]. Two different triplets of DE mutation strategies were chosen for the newly proposed DEMD algorithm, hereafter labelled DEMD1 and DEMD2. DEMD1 uses the three mutations defined by (1), (2), and (3), while DEMD2 uses the triplet of the (1), (4), and (5) mutations. Moreover, each DEMD version can use a different strategy of mutant vector selection in the explorative stage. When the mutant point with the minimal distance from the current point is used in the first stage, the label of the version has the “min” suffix; if the mutant point with the maximal distance from the current point is selected, the suffix is “max”.

Compared to CoDE [12], where three trial points are also created and one is selected according to the least value of the cost function, the newly proposed selection is not based on function values but on the distance from points of the population. The computational demands of this approach are smaller, especially in the case when the evaluation of the function value is computationally expensive.

Table 1. Medians of function-error values for DEMD variants, \(D=10,\,30\).

The control parameters F and CR in DEMD are adapted during the search process for each point \(\varvec{x}_i\). The parameter CR\(_i\) is initialized randomly from the uniform interval (0, 1) and mutated during the search, with the probability of 0.1, to a new value from (0, 1), similarly to [1]. The parameter F is adapted as follows. In the first stage, N equidistant values \(F_i = i/N, \ i=1,2, \ldots , N\) are assigned to the points of the population in random order and modified using \(F_i \leftarrow F_i + 0.1\,*\,rand(0,1)\) in each generation. A similar method is applied in [10]. In the exploitation phase, the value of \(F_i\) for each current point is computed as a random number from the uniform interval (0, 1) in each generation.
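The adaptation rules above can be sketched as follows (an illustrative sketch; [1] and [10] describe the original mechanisms, and the function names are ours):

```python
import numpy as np

def adapt_CR(CR_i, rng=np.random):
    """With probability 0.1, replace CR_i by a new value from (0, 1), as in [1]."""
    if rng.uniform(0, 1) < 0.1:
        return rng.uniform(0, 1)
    return CR_i

def init_F(N, rng=np.random):
    """N equidistant values F_i = i/N assigned to the population in random order."""
    F = np.arange(1, N + 1) / N
    rng.shuffle(F)
    return F

def perturb_F(F, rng=np.random):
    """First-stage update F_i <- F_i + 0.1 * rand(0, 1), applied each generation."""
    return F + 0.1 * rng.uniform(size=len(F))
```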

Table 2. Medians of function-error values for DEMD variants, \(D=50,\,100\).

The mutation can cause a mutant point \(\varvec{u}\) to move out of the domain \(\varOmega \). In such a case, the values \(u_j \not \in [a_j, \, b_j]\) are reflected back into \(\varOmega \) using the transformation \(u_j \leftarrow 2 \times a_j - u_j\) or \(u_j \leftarrow 2 \times b_j - u_j\) for the violated component.
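This reflection rule can be sketched as below. The sketch applies the paper's transformation once per violated component; a violation larger than the range of the domain could still leave a component outside the bounds.

```python
import numpy as np

def reflect_into_domain(u, a, b):
    """Turn components u_j outside [a_j, b_j] back into the domain:
    u_j <- 2*a_j - u_j below the lower bound, u_j <- 2*b_j - u_j above it."""
    u = np.array(u, dtype=float)
    low, high = u < a, u > b
    u[low] = 2 * a[low] - u[low]
    u[high] = 2 * b[high] - u[high]
    return u
```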

3 Experiments

The adaptive DE variants jDE [1], EPSDE [5], SaDE [7], SHADE [9], CoDE [12], and JADE [15] are used in the comparison with the newly proposed DEMD variants. These adaptive DE algorithms are considered, according to [2], the state-of-the-art DE variants. Another adaptive algorithm in this comparison is the recently published IDE [10], which has shown very good performance. Three standard DE algorithms using the DE/rand/1/bin, DE/best/2/bin, and DE/rand-to-best/1/bin strategies are also included in the experimental comparison, and their labels in the results are derived from the strategies. These algorithms are compared with three newly proposed DEMD variants: DEMD1min, DEMD1max, and DEMD2min. The DEMD2max variant is not included in the comparison due to its poor performance in preliminary experiments.

Table 3. Mean ranks from Friedman-rank test results for all the algorithms in comparison.
Table 4. Medians of function-error values and results of Kruskal-Wallis tests, \(D=10, \ 30\).

The test suite of 15 problems was proposed for a special session on Real-Parameter Numerical Optimization, a part of the Congress on Evolutionary Computation 2015 [4]. This session was intended as a competition of optimization algorithms in which new variants of algorithms are introduced. All algorithms are implemented and executed in Matlab 2010a on a standard PC. The experimental setting follows the requirements given in the report [4], where the 15 minimization problems are also defined.

Our tests were carried out at four levels of dimension, \(D = 10,\,30,\,50,\,100\), with 51 independent runs for each test function. The function-error value is computed as the difference between the function value of the current point and the known function value in the global minimum point. The run of the algorithm stops if the prescribed number of function evaluations \(maxFES = D \times 10^4\) is reached or if the minimum function-error in the population is less than \(1 \times 10^{-8}\). Such an error value is considered sufficiently small for an acceptable approximation of the correct solution. Function-error values less than \(1 \times 10^{-8}\) are treated as zero in further processing. The population size of the state-of-the-art DE variants is set to \(N=100\), for IDE to \(N=50, 100, 200, 200\), and for the remaining algorithms to \(N=30\). The smaller size of P was determined based on previous experiments with standard DE. The control parameters of the standard DE are set to \(F=0.8\) and CR \(=0.8\). The control parameter of DEMD is set to \(k_{max}=3\). The remaining control parameters of the algorithms were set to the values recommended by the authors in the original papers.

4 Results

The medians of the function-error values for the newly proposed DEMD variants are presented in Tables 1 and 2. In the last row, the count of the best median values per algorithm is given. The high efficiency of the DEMD1min variant is obvious (it wins in \(\approx \) 53% of the cases), and its efficiency increases with increasing dimension D.

Table 5. Medians of function-error values and results of Kruskal-Wallis tests, \(D=50, \ 100\).

The overall performance of all 13 algorithms was compared using the Friedman test on the medians of function-error values. The null hypothesis of equal performance of the algorithms was rejected; the achieved p value was \(p < 5 \times 10^{-7}\). The mean ranks of the algorithms are presented in Table 3. Note that an algorithm winning uniquely in all the problems would have the mean rank 1, and an algorithm being the unique loser in all the problems would have the mean rank 13. In the last column of Table 3, the average mean rank over all dimensions is given. It is obvious that the newly proposed DEMD1min has the least average mean rank.

Based on this comparison, the three algorithms with the least mean rank in each dimension are selected and compared in more detail for each problem by the Kruskal-Wallis test (Tables 4 and 5). In these tables, the medians of function-error values and the p values of the Kruskal-Wallis tests are presented. When the p value is less than the significance level 0.05, the null hypothesis of equal performance of the three algorithms is rejected. Counts of wins (including shared wins), unique second positions, and unique last positions are summarized at the bottom of the tables. The proposed DEMD1min variant wins in 35% and reaches the second position in 17% of the test problems, while it appears in the last position in only 23% of the test problems.

5 Conclusion

Three new variants of the DE algorithm with mutation selection based on mutant vector distance (DEMD) are proposed and applied to the CEC 2015 test problems. The performance of the DEMD variants is compared with the state-of-the-art adaptive DE variants. The results of this comparison show that the best performance is achieved by the DEMD1min variant. The DEMD1min variant significantly outperforms the majority of the state-of-the-art adaptive DE variants and therefore can be considered at least comparable to the state of the art. This simple idea of mutation selection will be studied in more detail in further research.