1 Introduction

Differential evolution (DE) is a population-based algorithm first proposed by Storn and Price [1]. The three main operations incorporated in DE to simulate the natural evolution process are mutation, crossover and selection. Although DE has proved competitive compared with other evolutionary algorithms on different types of optimization problems [2,3,4,5,6], many theoretical analyses and experimental studies [7, 8] have shown that its performance still depends strongly on the setting of the control parameters, namely the scaling factor (F), the crossover rate (\(\textit{CR}\)) and the population size (\(\textit{NP}\)), as well as on its key mutation/crossover strategies.

It is well known that the performance of DE can be improved if its parameters or evolutionary strategies are well designed. Therefore, many researchers have introduced new mutation/crossover strategies or combined multiple different ones [9,10,11,12,13,14] to improve the performance of DE, while many others have proposed different schemes to control the parameter settings [15,16,17,18,19]. Interestingly, early studies mainly focused on the scaling factor and the crossover rate, while the population size was not widely studied in the scope of DE. In fact, the population size also has an important impact on the exploration and exploitation abilities of DE and thus on its performance [20]. Generally speaking, a large population size can extend the explored space and increase the chance of discovering potentially better solutions, but it inevitably incurs an undesirably high computational cost. On the contrary, a deficient population size may cause premature convergence or stagnation.

In this paper, we only present a brief review of some recent strategies for adjusting the population size and some related improved mechanisms in DE. In some early studies, researchers suggested relating the value of \(\textit{NP}\) to the problem dimensionality (D) and recommended setting \(\textit{NP}\) between 5D and 10D. Following this hint, \(\textit{NP}=5D\) or \(\textit{NP}=10D\) has been widely applied in many DE algorithms [21,22,23,24]. In recent years, a few DE algorithms with variable population size have been proposed. Some researchers argued that DE should rather explore the search space during the initial stage and shift its attention toward exploitation as time proceeds, and accordingly introduced different reduction strategies to adjust the population size. For example, Brest and Maucec [25] proposed an improved DE algorithm that is initialized with a large population size (10D); after each \(25\%\) of the allowed function calls, \(\textit{NP}\) is reduced by half, down to 1.25D in the last quarter of the run. Tanabe and Fukunaga [26] proposed a similar strategy in which the population is reduced linearly after every generation. Ali et al. [27] proposed a new DE variant in which the population size is dynamically reduced using a linear reduction method. Awad et al. [28] proposed a DE algorithm that uses a niching-based reduction scheme to adapt the population size. The reduction strategies for \(\textit{NP}\) in DE are discussed in depth in a recent paper [29]. Unlike the above reduction strategies, Tirronen and Neri [30] introduced a population size adaptation mechanism based on fitness diversity. Penunuri et al. [31] suggested that the population size should depend on the complexity of the objective function.
In a few recently introduced algorithms, such as [32, 33], the researchers suggested increasing or decreasing the population size depending on whether the algorithm has improved the best solution within the last few generations. Piotrowski [20] reviewed the opinions regarding the DE population size setting and verified the impact of the population size on the performance of DE algorithms. In addition, several improved mechanisms have been proposed to enhance the performance of DE algorithms. For example, Guo et al. [34] introduced an efficient successful-parent-selecting framework, which essentially provides an alternative for the selection of parents during mutation and crossover. Fan and Yan [35] proposed a self-adaptive mechanism to enhance DE, in which the mutation strategies are automatically adjusted along with the population evolution and the control parameters evolve by zoning. Qiu et al. [36] integrated a bottom-boosting scheme and a partial-regeneration strategy to help DE handle various types of problems.

Although a number of works on adjusting the population size have been proposed, there is still much room for improving the performance of DE. Motivated by the results above, we propose an improved strategy in which the population size is adjusted in a fluctuating manner according to a pregiven rule during the optimization routine. To show the effectiveness of the proposed fluctuant population adjustment strategy, we equip six different DE algorithms with the flexible \(\textit{NP}\) and compare them with their original counterparts from the source papers.

The remainder of this paper is organized as follows. Section 2 briefly introduces the basic operators of the original DE algorithm. Section 3 provides a detailed description of the proposed population adjustment strategy and its overall procedure. Section 4 presents the comparison results between the DE algorithms with the flexible \(\textit{NP}\) and their original counterparts from the source papers. Section 5 draws the conclusions.

2 Classical DE algorithm

The classical DE algorithm is briefly presented in this section. DE is a population-based algorithm consisting of \(\textit{NP}\) individuals \(\varvec{x}_{i}\). Each individual \(\varvec{x}_{i}\) consists of D variables \(x_{i,j}\) restricted to the search interval \([L_{j},U_{j}], j=1,2,\ldots ,D\). Usually, the initial individuals are randomly generated within the search interval. In each generation, DE applies mutation, crossover, and selection operations to each individual in order to search for the individual with the best fitness value.

2.1 Mutation operation

DE employs a mutation operation to produce a mutant vector \(\varvec{v}_{i}=\big [v_{i,1},v_{i,2},\ldots ,v_{i,D}\big ]\) for each target vector \(\varvec{x}_{i}\). The following are the five most frequently used mutation operators implemented in various DE algorithms.

  1. DE/rand/1

     $$\begin{aligned} \varvec{v}_{i}=\varvec{x}_{r_{1}}+F\cdot (\varvec{x}_{r_{2}}-\varvec{x}_{r_{3}}). \end{aligned}$$
     (1)

  2. DE/best/1

     $$\begin{aligned} \varvec{v}_{i}=\varvec{x}_{best}+F\cdot (\varvec{x}_{r_{1}}-\varvec{x}_{r_{2}}). \end{aligned}$$
     (2)

  3. DE/current-to-best/1

     $$\begin{aligned} \varvec{v}_{i}=\varvec{x}_{i}+F\cdot (\varvec{x}_{best}-\varvec{x}_{r_{1}})+F\cdot (\varvec{x}_{r_{2}}-\varvec{x}_{r_{3}}). \end{aligned}$$
     (3)

  4. DE/best/2

     $$\begin{aligned} \varvec{v}_{i}=\varvec{x}_{best}+F\cdot (\varvec{x}_{r_{1}}-\varvec{x}_{r_{2}})+F\cdot (\varvec{x}_{r_{3}}-\varvec{x}_{r_{4}}). \end{aligned}$$
     (4)

  5. DE/rand/2

     $$\begin{aligned} \varvec{v}_{i}=\varvec{x}_{r_{1}}+F\cdot (\varvec{x}_{r_{2}}-\varvec{x}_{r_{3}})+F\cdot (\varvec{x}_{r_{4}}-\varvec{x}_{r_{5}}). \end{aligned}$$
     (5)

In the above equations, F is the control parameter called the scaling factor; \(r_{1}, r_{2}, r_{3}, r_{4}\) and \(r_{5}\) are integers randomly selected from the set \(\{1,2,\ldots ,\textit{NP}\}\) such that they are mutually distinct and all different from i. The vector \(\varvec{x}_{best}=(x_{best,1},x_{best,2},\ldots ,x_{best,D})\) is the best individual in the current population.
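As an illustration, the five operators above can be collected into a single helper. This is a non-authoritative sketch: the function name `mutate`, its arguments, and the NumPy array representation of the population are our own choices, not part of the original DE papers.

```python
import numpy as np

def mutate(pop, i, F, strategy="rand/1", best_idx=0, rng=None):
    """Produce a mutant vector for target index i (Eqs. (1)-(5) above).

    pop is an (NP, D) array.  Five distinct indices, none equal to i, are
    drawn up-front for simplicity; each operator uses only the ones it
    needs (this requires NP >= 6).
    """
    rng = rng or np.random.default_rng()
    NP = len(pop)
    candidates = [k for k in range(NP) if k != i]
    r1, r2, r3, r4, r5 = rng.choice(candidates, size=5, replace=False)
    best = pop[best_idx]
    if strategy == "rand/1":
        return pop[r1] + F * (pop[r2] - pop[r3])
    if strategy == "best/1":
        return best + F * (pop[r1] - pop[r2])
    if strategy == "current-to-best/1":
        return pop[i] + F * (best - pop[r1]) + F * (pop[r2] - pop[r3])
    if strategy == "best/2":
        return best + F * (pop[r1] - pop[r2]) + F * (pop[r3] - pop[r4])
    if strategy == "rand/2":
        return pop[r1] + F * (pop[r2] - pop[r3]) + F * (pop[r4] - pop[r5])
    raise ValueError(f"unknown strategy: {strategy}")
```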

2.2 Crossover operation

The crossover operation generates a trial vector \(\varvec{u}_{i}\) by probabilistically picking components from either the target vector \(\varvec{x}_{i}\) or its mutant vector \(\varvec{v}_{i}\). The widely used binomial crossover can be described as

$$\begin{aligned} u_{i,j} = \left\{ \begin{array}{ll} v_{i,j}, & {\text {if}}~\, \big ({\text {rand}}[0,1]\le \textit{CR}~\,{\text {or}}\,~j=j_{rand}\big )\\ x_{i,j}, & {\text {otherwise}}. \end{array} \right. \end{aligned}$$
(6)

The value of crossover rate \(\textit{CR}\) should be set within the interval (0, 1); \(\text {rand}[0,1]\) represents a uniformly distributed random number within the interval [0, 1], and \(j_{rand}\) is an integer randomly chosen from the set \(\{1, 2, \ldots , D\}\), which is used to ensure that the trial vector has at least one component different from the target vector.
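A minimal sketch of Eq. (6) follows; the function name `binomial_crossover` is our own, not from the source literature.

```python
import numpy as np

def binomial_crossover(x, v, CR, rng=None):
    """Binomial crossover of Eq. (6): take the mutant component v[j] with
    probability CR, and force position j_rand so the trial vector differs
    from the target x in at least one component."""
    rng = rng or np.random.default_rng()
    D = len(x)
    j_rand = rng.integers(D)        # guaranteed inheritance position
    mask = rng.random(D) <= CR
    mask[j_rand] = True
    return np.where(mask, v, x)
```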

2.3 Selection operation

The selection operation determines whether the trial vector or the target vector survives into the next generation based on their fitness values \(f(\cdot )\); the better of the two is kept. For minimization problems, the selection operator can be expressed as follows:

$$\begin{aligned} \varvec{x}_{i} = \left\{ \begin{array}{ll} \varvec{u}_{i}, & {\text {if}}\,~ f(\varvec{u}_{i})\le f(\varvec{x}_{i})\\ \varvec{x}_{i}, & {\text {otherwise}}. \end{array} \right. \end{aligned}$$
(7)

DE continues the search until the stopping criteria are met (frequently defined as a maximum number of function evaluations).
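Putting the three operations together, a minimal DE/rand/1/bin loop can be sketched as below. This is an illustrative sketch, not any specific published variant: the function name `de_minimize`, the boundary clipping, and the default parameter values are our own assumptions.

```python
import numpy as np

def de_minimize(f, bounds, NP=20, F=0.7, CR=0.5, max_evals=6000, seed=0):
    """Minimal DE/rand/1/bin: mutation (Eq. (1)), binomial crossover
    (Eq. (6)) and greedy selection (Eq. (7)), stopping once the allowed
    number of function evaluations is spent."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    D = len(lo)
    pop = rng.uniform(lo, hi, size=(NP, D))
    fit = np.array([f(x) for x in pop])
    evals = NP
    while evals < max_evals:
        for i in range(NP):
            r1, r2, r3 = rng.choice(
                [k for k in range(NP) if k != i], 3, replace=False)
            v = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), lo, hi)
            j_rand = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[j_rand] = True
            u = np.where(mask, v, pop[i])
            fu = f(u)
            evals += 1
            if fu <= fit[i]:            # greedy selection of Eq. (7)
                pop[i], fit[i] = u, fu
            if evals >= max_evals:
                break
    b = fit.argmin()
    return pop[b], fit[b]
```

On a smooth low-dimensional function such as the sphere, this loop converges to the optimum within a few thousand evaluations.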

3 Description of fluctuant population strategy

As mentioned earlier, the selection of parameter settings in DE plays an important role in the effectiveness of the algorithm. However, most existing DE variants adopt a fixed population size that remains unchanged during the whole optimization process. To devise a high-efficiency strategy for adjusting the population size of DE, it is necessary to understand how the population size influences the performance of DE. From the computation rules of the mutation/crossover strategies in DE, it is easy to see that a larger population size usually means higher population diversity. In other words, DE with a large population size has a more powerful exploration ability, but the diversity should decrease during the optimization process to allow ultimate convergence. On the other hand, DE with a deficient population size easily falls into premature convergence or stagnation.

Based on the observations above, \(\textit{NP}\) should be set to a large value in the early stage and gradually decreased during the run; meanwhile, \(\textit{NP}\) should also have a degree of rebound to reduce the probability of falling into premature convergence or stagnation. Therefore, a periodic fluctuant control strategy for the parameter \(\textit{NP}\) could be a good choice. As a result, we propose a new kind of fluctuant population (abbreviated as FP in the following) strategy. First, one parameter of the FP strategy is calculated as follows:

$$\begin{aligned} \textit{MP}_{t}=\frac{(\textit{MP}-1)\times (T-t)}{T}+1, \end{aligned}$$
(8)

where the parameter \(\textit{MP}\) is the ratio of the initial population size (denoted by \(\overline{\textit{NP}}\)) at the beginning of the optimization process to the problem dimensionality D, that is, \(\textit{MP}=\overline{\textit{NP}}/D\). The parameter T represents the allowed maximum number of function evaluations, and t is the index of the current function evaluation. \(\textit{MP}_{t}\) denotes the current value of \(\textit{MP}\) at the t-th function evaluation. It is easy to see that \(\textit{MP}_{t}\) linearly decreases from \(\textit{MP}\) to 1 during the run. The specific value of \(\textit{NP}\) at the t-th function evaluation (denoted by \(\textit{NP}_{t}\)) can be calculated by the following formula,

$$\begin{aligned} \textit{NP}_{t}&=\text {round}\left( \left( \textit{MP}_{t}\times D-\underline{\textit{NP}}\right) \times 0.5\right.\\&\quad \left.\times \left( \cos \left( \frac{t\times \pi }{\textit{MP}_{t}\times D^{2}}\right) +1\right) \right) +\underline{\textit{NP}}, \end{aligned}$$
(9)

where “round” means that the result is rounded down to the nearest integer, and the parameter \(\underline{\textit{NP}}\) represents the minimum value of \(\textit{NP}\); it must be set larger than the number of distinct randomly selected indices required by the chosen mutation operation, otherwise the difference vectors could not be constructed. Formula (9) is the proposed FP strategy. In order to show the changing value of \(\textit{NP}\) more intuitively during the whole run, we calculate every specific value \(\textit{NP}_{t}\) at each t-th function evaluation when \(\textit{MP}=5\), \(D=30\), \(\underline{\textit{NP}}=15\) and \(T=300{,}000\), and plot all the values in Fig. 1.

Fig. 1

The changing value of \(\textit{NP}\) during the optimization process
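The schedule defined by formulas (8)–(9) can be reproduced with a short script. This is a sketch under the parameter values quoted above; the function name `np_schedule` is our own, and “round” is implemented as rounding down, as stated in the text.

```python
import math

def np_schedule(t, D=30, MP=5, NP_min=15, T=300_000):
    """NP_t from formulas (8)-(9): a linearly shrinking envelope
    modulated by a cosine, floored at NP_min."""
    MP_t = (MP - 1) * (T - t) / T + 1                       # formula (8)
    return int(math.floor((MP_t * D - NP_min) * 0.5
                          * (math.cos(t * math.pi / (MP_t * D * D)) + 1))
               ) + NP_min                                   # formula (9)
```

At \(t=0\) the schedule starts at \(\textit{MP}\times D = 150\) individuals, and every later value stays between \(\underline{\textit{NP}}\) and that initial size.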

From formulas (8)–(9) and Fig. 1, we can see that the value of \(\textit{NP}\) is related to the problem dimensionality (D) and gradually decreases at the macro level, owing to the computation rule of \(\textit{MP}_{t}\) in formula (8) and the term \(\left( \textit{MP}_{t}\times D-\underline{\textit{NP}}\right) \) in formula (9). Furthermore, the specific value of \(\textit{NP}_{t}\) fluctuates periodically at the micro level, on account of the periodicity of the cosine function in formula (9). Since decreasing and increasing phases appear alternately, one question to be resolved is which individuals should be deleted, and how new individuals should be added to the population. To handle this, we rearrange the whole population according to the fitness value of every individual and save the locations and fitness values into corresponding arrays. In fact, during the optimization process, the individuals can be divided into two groups automatically: the current population (Group A) and the spare group (Group B). Compared with the value of \(\textit{NP}_{t-1}\) in the last generation, if the new value \(\textit{NP}_{t}<\textit{NP}_{t-1}\), the worst \(\textit{NP}_{t-1}-\textit{NP}_{t}\) individuals are removed from Group A and added to Group B; if \(\textit{NP}_{t}>\textit{NP}_{t-1}\), the best \(\textit{NP}_{t}-\textit{NP}_{t-1}\) individuals selected from Group B are added back to Group A.
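The Group A/Group B bookkeeping described above can be sketched as follows. This is an illustrative implementation under our own naming (`resize_population` and its signature are not from the original text, which specifies only the rule):

```python
import numpy as np

def resize_population(group_a, fit_a, group_b, fit_b, new_size):
    """Shrink Group A by moving its worst members into the spare Group B,
    or grow it by recalling the best spare members back from Group B."""
    order = np.argsort(fit_a)            # keep A sorted best-first
    group_a, fit_a = group_a[order], fit_a[order]
    cur = len(group_a)
    if new_size < cur:                   # NP_t < NP_{t-1}: shrink
        k = cur - new_size
        group_b = np.vstack([group_b, group_a[-k:]]) if len(group_b) \
            else group_a[-k:]
        fit_b = np.concatenate([fit_b, fit_a[-k:]])
        group_a, fit_a = group_a[:-k], fit_a[:-k]
    elif new_size > cur:                 # NP_t > NP_{t-1}: grow
        k = min(new_size - cur, len(group_b))
        best = np.argsort(fit_b)[:k]
        group_a = np.vstack([group_a, group_b[best]])
        fit_a = np.concatenate([fit_a, fit_b[best]])
        keep = np.setdiff1d(np.arange(len(group_b)), best)
        group_b, fit_b = group_b[keep], fit_b[keep]
    return group_a, fit_a, group_b, fit_b
```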

After the descriptions of the proposed fluctuant population strategy, we incorporate it into the DE algorithm and summarize the pseudocode of DE/best/1 algorithm with FP strategy into Algorithm 1.

Algorithm 1: Pseudocode of the DE/best/1 algorithm with the FP strategy

Compared with the original DE/best/1 algorithm, it is easy to see from Algorithm 1 that the variant with the FP strategy only adds one parameter \(\underline{\textit{NP}}\), one calculation of \(\textit{NP}_{t}\) via formulas (8)–(9), and one operation of rearranging the current population. On the whole, the adopted FP strategy is easy to implement and does not add much computation.

4 Comparison and result analysis

In this section, extensive experiments are carried out to evaluate the performance of the FP strategy. Section 4.1 describes the experimental platform and the DE variants used for comparison. In Sect. 4.2, six DE variants with and without the FP strategy are compared; the DE variants with the FP strategy are labeled FP-DE for short.

4.1 Experimental platform and compared algorithms

Numerical simulations are executed on the 30 artificial test functions of the CEC 2014 benchmark set [37] (Table 1). Based on their characteristics, the 30 test functions can be divided into four classes: unimodal functions \((f_{1}-f_{3})\), multimodal functions \((f_{4}-f_{16})\), hybrid functions \((f_{17}-f_{22})\) and composition functions \((f_{23}-f_{30})\).

Table 1 Summary of the IEEE CEC 2014 benchmark functions

To verify the effectiveness of the FP strategy, six different DE algorithms are selected: DE/best/1 with \(F=0.7, \textit{CR}=0.5\), SADE [38], JADE [39], MGBDE [40], SHADE [41], and SinDE [42]. These six algorithms are used to verify whether the FP strategy can improve the performance of DE algorithms by comparing each of them with and without the FP strategy.

4.2 Evaluation on FP strategy in DE algorithms

For all the aforementioned algorithms, except for the population size \(\textit{NP}\), which is set to D for each of them, the other control parameters are kept the same as in the corresponding literature. In the FP strategy, the parameters \(\textit{MP}\) and \(\underline{\textit{NP}}\) are set to 5 and 15, respectively. We adopt the fitness error value \(\big (f\big (\varvec{x}_{best}\big )-f\big (\varvec{x}^{*}\big )\big )\) to evaluate the performance of all the DE algorithms, where \(\varvec{x}_{best}\) is the best solution obtained and \(\varvec{x}^{*}\) is the known optimum solution. In addition, every algorithm is run 50 independent times on every function and the mean results are used in the comparison; the maximum allowed number of function evaluations is set to \(10{,}000\times D\) for all the test functions. The mean error (denoted by Mean) of the 50 runs and the corresponding standard deviation (denoted by SD) are reported in the numerical results tables.

To draw a statistical conclusion from the experimental results, the Wilcoxon rank-sum test is conducted at the 0.05 significance level to assess the significance of the performance difference between FP-DE and each competitor. We mark the three cases of statistical significance with “+”, “=” and “−”, indicating that FP-DE is significantly better than, similar to, or significantly worse than the competitor, respectively; the counts of the three cases \((+/=/-)\) over all functions are listed at the bottom of the numerical results tables.

Table 2 Comparison results between DE/best/1, SADE, JADE, MGBDE and their respective variant when \(D=30\)
Table 3 Comparison results between SHADE, SinDE and their respective variant when \(D=30\) and \(D=50\)
Table 4 Comparison results between DE/best/1, SADE, JADE, MGBDE and their respective variant when \(D=50\)
Table 5 Comparison results between DE/best/1, SADE, JADE, MGBDE and their respective variant when \(D=100\)
Table 6 Comparison results between SHADE, SinDE and their respective variant when \(D=100\)
Table 7 Summary of the comparison results between DE algorithm with or without FP strategy

The detailed numerical results of these comparisons are listed in Tables 2, 3, 4, 5, and 6, and the statistical results based on the Wilcoxon rank-sum test are summarized in Table 7. According to the results reported in Table 7, we can draw three conclusions. First, the overall performance (297 “+”, 197 “=”, and only 46 “−”) shows that the FP strategy can indeed improve the performance of the original DE variants. Second, regarding problems of different dimensions, over all functions solved by the six DE variants when \(D=30\), the FP-DEs perform significantly better than or similarly to the original DEs in 109 and 62 cases, respectively, and are worse on only 9 functions; the FP-DEs keep a similarly clear advantage on the 50-D and 100-D problems, with exact scores of 92 “+”, 71 “=”, 17 “−” when \(D=50\), and 96 “+”, 64 “=”, 20 “−” when \(D=100\). These results suggest that the FP strategy is dimension-independent. Third, for every kind of problem (i.e., unimodal, multimodal, hybrid and composition), the FP strategy enhances the performance of all the DE variants adopted in the comparative experiment except MGBDE on the unimodal functions, which implies that the FP strategy is largely problem-independent.

To depict the effect of the FP strategy on the convergence characteristics of the DE variants, convergence curves of the mean error values of some selected algorithms and benchmark functions are shown in Figs. 2 and 3, which indicate that the FP-DEs have better or similar convergence performance. Moreover, according to the convergence curves, most of the FP-DEs obtain better solutions than their original counterparts, which reveals that the FP-DEs can reach higher precision than the DE algorithms with a fixed population size.

Fig. 2

Convergence graphs (mean curves) for DE algorithms with or without FP strategy on functions \(f_{1}, f_{4}, f_{7}, f_{9}, f_{13}\) and \(f_{15}\) with \(D=50\) over 50 independent runs

Fig. 3

Convergence graphs (mean curves) for DE algorithms with or without FP strategy on functions \(f_{17}, f_{18}, f_{20}, f_{21}, f_{27}\) and \(f_{30}\) with \(D=50\) over 50 independent runs

To test the robustness of the two key control parameters (\(\textit{MP}\) and \(\underline{\textit{NP}}\)) of the FP strategy, we obtained results on all the test functions with different parameter settings (listed in Table 8) and different problem dimensions. To save space while still providing visible results, we only apply FP-JADE with different values of \(\textit{MP}\) and \(\underline{\textit{NP}}\) to 12 benchmark functions with \(D=50\) to depict the influence of these two parameters; the convergence graphs are plotted in Figs. 4 and 5.

Table 8 Parameter setting of different FP-DEs
Fig. 4

Convergence graphs (mean curves) for the FP-JADE with different value of parameter combination on functions \(f_{1}, f_{4}, f_{7}, f_{9}, f_{13}\) and \(f_{15}\) with \(D=50\) over 50 independent runs

Fig. 5

Convergence graphs (mean curves) for the FP-JADE with different value of parameter combination on functions \(f_{17}, f_{18}, f_{20}, f_{21}, f_{27}\) and \(f_{30}\) with \(D=50\) over 50 independent runs

Based on the obtained results and the convergence graphs plotted in Figs. 4 and 5, we can draw three conclusions about the effect of the parameter combination. First, the parameters \(\textit{MP}\) and \(\underline{\textit{NP}}\) do influence the performance of the FP strategy, but their effects are relatively small. Second, DE with the FP strategy under any of the tested parameter combinations is generally better than DE without it, which strongly suggests that the FP strategy does improve the effectiveness of the DE algorithm. Last, the parameter \(\underline{\textit{NP}}\) seems to have a greater impact than \(\textit{MP}\), possibly because \(\underline{\textit{NP}}\) more directly affects the accuracy of the solution.

5 Conclusions

In this paper, a fluctuant population (FP) strategy for DE was presented. The proposed FP strategy incorporates a monotonically decreasing function used to coordinate the exploration and exploitation abilities of DE, a cosine function applied to control the population diversity, and a rearranging and auto-grouping technique adopted to remove or add individuals. We compared six selected DE variants with and without the FP strategy to demonstrate its effectiveness.

The experimental studies were carried out on the 30 global numerical optimization problems used in the CEC 2014 special session on real-parameter optimization. A comparative study between the DE variants with a fixed population size and their modifications with the proposed fluctuant population was presented; the results clearly indicated that the FP-DEs significantly improve the performance of their original counterparts. An overall performance comparison was also carried out, and the experimental results illustrated that the FP-DEs achieve better performance on the 30-, 50- and 100-dimensional problems.