1 Introduction

Evolutionary algorithms (EAs) are inspired by biological evolution. Proposed by Storn and Price (1997) and originally used to solve the Chebyshev polynomial problem, differential evolution (DE) algorithms are among the most popular metaheuristic EAs. As a population-based stochastic search technique, a DE algorithm is simpler to execute, requires fewer parameters, and retains more of a population’s diversity than other metaheuristic EAs. Furthermore, compared with other metaheuristic EAs, DE is more reliable and converges more quickly for optimization problems (Hu et al. 2013; Salman et al. 2007). DE has been applied in various areas (Storn and Price 1997), such as energy planning (Rajesh et al. 2016), scheduling (Tang et al. 2014), and supportive decision-making (Xue et al. 2009). In addition, it has been used to solve the traveling salesman problem (TSP) (Mi et al. 2010; Sauer and Coelho 2008) and other optimization problems (Iacca et al. 2012).

In classic DE, each new candidate solution is a combination of the parent individual and several other individuals in the population. A candidate that has better fitness than its parent replaces the parent. The search mechanism of a DE algorithm includes the mutation, crossover, and selection operations present in evolution. A DE algorithm has only three crucial parameters: the scaling factor (F) of the difference vector, the crossover rate (CR), and the population size (NP) (Jia et al. 2011; Lee and Chiang 2011). DE does have some weaknesses, the major ones being unstable convergence, slow convergence in the late stages of evolution, and a tendency to become stuck at a local optimal solution. Nonetheless, the use of suitable parameter values obviates these weaknesses such that a DE algorithm converges faster and yields better solutions than it does with poorly selected parameters. In addition, the properties of the parameters are highly correlated with the problems, that is, the objective functions, to be solved.

In the past few decades, researchers have improved DE algorithms in three crucial ways: first, by modifying the procedure of implementing DE; next, by applying classic DE in combination with other search strategies, such as the strategy for selecting the vector to be perturbed, the number of difference vectors considered for perturbation, and the type of crossover used (exponential or binomial) (Storn and Price 1997; Price et al. 2005), or even with strategy adaptation (Qin et al. 2009); and finally, by proposing new parameter-tuning mechanisms (Das et al. 2005; Das and Suganthan 2011; Lee and Chiang 2011; Trivedi et al. 2015). First, with regard to the modified procedure for DE, Chuan-Kang and Chih-Hui (2009) proposed Poisson differential evolution, which uses more than one difference vector based on a Poisson distribution and significantly improved on DE in terms of solution quality and convergence speed. Second, a few researchers published studies focusing on DE in combination with other search strategies. Zhang and Sanderson (2007) implemented a strategy called “DE/current-to-p-best” that improved both the rate and reliability of convergence and employed scaling factors drawn two-thirds of the time from a Gaussian distribution and one-third of the time from a uniform distribution. Wang et al. (2016) proposed a self-adaptive parameter dynamics DE algorithm and modified the algorithm’s framework by implementing a new mutation strategy, DE/current-to-lbest/1; this modified framework maintains a balance between exploitation and exploration capabilities during DE. Zhang and Sanderson (2007) showed that the CR in DE obeys a normal distribution. Lee and Chiang (2011) proposed the use of a perturbation operator in classic DE to improve the DE algorithm’s exploration and exploitation abilities, enabling individuals to more easily escape a local optimum.
Lastly, numerous studies have verified that the values of both the mutation and crossover factors critically influence the performance of a DE algorithm (Chen and Chiang 2015). Mezura-Montes et al. (2006) empirically identified the DE algorithms with the highest performance for various global optimization problems; however, they noted that parameter tuning is time-consuming. Brest et al. (2006) presented an improved DE algorithm with dynamic parameter values that randomly obtains values of the scaling factor and CR within the range of each parameter. Likewise, Dexuan and Liqun (2012) concluded that establishing suitable parameter values is crucial for efficiently solving complex problems; therefore, they proposed a method in which the scaling factor and the CR are modified according to a linearly increasing strategy and obey a uniform distribution. Moreover, Li and Yin (2016) introduced self-adaptive parameter setting by using uniform random numbers that follow the Gaussian distribution, which enhanced diversity on the basis of two newly proposed parameters. Chen and Chiang (2015) created a series of DE algorithms with adaptive parameters and summarized six characteristics of a favorable adaptive scheme for controlling the parameters of DE algorithms: (1) it is adaptable to different problems, stages, and individuals; (2) it revises parameter values based on iterations or other factors; (3) it considers population diversity to prevent premature convergence; (4) it considers reusing successful parameter values; (5) it applies a small perturbation to individuals with favorable fitness; and (6) it relies on a random distribution, in which case both the mean/location and deviation/range are important.

As noted, adjusting or controlling parameters manually is time-consuming. Therefore, the purpose of the present paper is to propose an enhanced DE algorithm with self-adaptive adjustable parameters based on individual fitness performance (SADE-FP). The adjustment mechanism of SADE-FP relies on determining the mutation factor based on the fitness of individuals and generating crossover factors from the Gaussian distribution, as proposed by Lee and Chiang (2011). In addition, we compare the performance of SADE-FP with PsDE (Lee et al. 2011) and classic DE (Storn and Price 1997) for the solution of seven benchmark problems; the results of this comparison verify that the proposed SADE-FP performs well and is efficient. The rest of this paper is organized as follows. Section 2 reviews classic DE and recent adaptive DE variants. Section 3 describes the proposed SADE-FP algorithm. Section 4 presents the experimental investigation and a discussion of the results and their implications. Finally, our conclusions and suggestions for future research are presented in Sect. 5.

2 Related work

The classic DE algorithm is a population-based metaheuristic algorithm proposed by Storn and Price (1997). In this algorithm, the dimension vector of an individual in a population represents a combination of decision variables as a candidate solution of the objective function. An individual in the population of each generation is represented by a D-dimensional vector, expressed as \(X_{i,G} = (x_{i,1,G}, x_{i,2,G}, \ldots, x_{i,D,G})\). A population consists of NP individuals: \(i = 1, 2, \ldots, NP\). In classic DE, NP does not change during the evolution. G denotes one generation, and the maximum number of generations (\(G_{{\max }}\)) is the stopping criterion for the algorithm’s search procedure.
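As an illustration, the population representation described above can be initialized as follows; this is a minimal sketch, and the function and parameter names are our own, not part of the original algorithm description:

```python
import numpy as np

def init_population(np_size, dim, lower, upper, seed=None):
    """Randomly initialize NP individuals, each a D-dimensional vector
    drawn uniformly within the search bounds (Step 1 of classic DE)."""
    rng = np.random.default_rng(seed)
    return rng.uniform(lower, upper, size=(np_size, dim))

# A population of NP = 50 individuals in a 30-dimensional search space:
pop = init_population(np_size=50, dim=30, lower=-100.0, upper=100.0, seed=1)
print(pop.shape)  # → (50, 30)
```

Each row of `pop` is one individual \(X_{i,G}\); the population array is then evolved in place over \(G_{\max}\) generations.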

Since the classic DE algorithm was proposed, many researchers have attempted to increase its effectiveness; specifically, to address its weaknesses, such as unstable convergence, slow convergence in late evolution stages, and a tendency to become stuck in a local optimal solution (Jia et al. 2011; Lee and Chiang 2011). DE algorithms have been improved in three respects (Lee and Chiang 2011): by applying classic DE in concert with other strategies, by improving the control parameters (Brest et al. 2006; Liu and Lampinen 2005), and by hybridizing classic DE with other algorithms (Yildiz 2013; Ponsich and Coello 2013).

With regard to improving the control parameters, there are two ways in which parameters can be set: parameter tuning and parameter control (Eiben et al. 1999). In parameter tuning, a common approach, favorable parameter values are identified before running the algorithm, and the algorithm is then implemented using these values, which are not changed during the implementation. Parameter control, also commonly employed, is similar to parameter tuning but allows the parameter values to change during the implementation (Brest et al. 2007). Because its parameters must be adjusted manually before it can be used, the classic DE algorithm employs parameter tuning as its strategy for parameter identification. Storn and Price (1997) considered the scaling factor to be most effective in the range \(F = [0.5, 1.0]\); they noted that if parameter F is smaller than 0.4 or larger than 1, the algorithm performs more poorly. The crossover rate should be set within the range \(\textit{CR}= [0.9, 1.0]\) to ensure high convergence speed. To increase diversity and prevent premature convergence, Zaharie (2007) found that when a binomial crossover strategy is used, F should be greater than 0.18 (under the conditions \(\textit{NP} = 50\) and \(D = 30\)). However, in recent research (De Falco et al. 2014; Jiang et al. 2013), the use of parameter control has been shown to improve the performance of DE. Parameter control can be divided into adaptive and self-adaptive parameter control. In self-adaptive parameter control, the processes of searching for optimal parameter values and optimal solutions are combined. By contrast, adaptive parameter control separates the optimal solution search process from the ideal parameter value search process; unlike in self-adaptive parameter control, the mechanism for updating the ideal parameter values is not part of the optimization cycle (Aleti and Moser 2016). Brest et al. (2006) proposed a self-adaptive parameter mechanism that generates F and CR randomly. When F is relatively small, each individual searches its neighboring area, and thus convergence occurs quickly. Omran et al. (2005) made F self-adaptive by randomly selecting the scaling factor from the current individuals and then multiplying the selected scaling factor by a number generated from the Gaussian distribution. This self-adaptive method, which resembles DE’s own mutation operation, generates a nonfixed scaling factor and improves solution quality compared with classic DE. Furthermore, Omran et al. (2005) commented that each individual can select from the target vector (\(X_{i,G} \)) and mutation vector (\(V_{i,G} \)) equally when the CR follows the normal distribution N(0.5, 0.15). According to these studies (Brest et al. 2006; Omran et al. 2005), each individual can have its own scaling factor and CR. Nonfixed, self-adaptive parameter setting avoids the time-consuming process of manual parameter tuning and allows a high-quality solution to be obtained easily and with rapid convergence.
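The random parameter-generation mechanism of Brest et al. (2006) can be sketched roughly as follows. The control constants used here (\(\tau_1 = \tau_2 = 0.1\), F in [0.1, 1.0]) follow commonly cited jDE settings and should be treated as illustrative assumptions rather than the exact values of any particular study:

```python
import random

def jde_update(F, CR, tau1=0.1, tau2=0.1, F_l=0.1, F_u=0.9):
    """jDE-style self-adaptive update of one individual's own F and CR
    (after Brest et al. 2006): with probability tau1 (resp. tau2) the
    parameter is re-drawn at random; otherwise it is inherited unchanged."""
    if random.random() < tau1:
        F = F_l + random.random() * F_u   # new F in [0.1, 1.0)
    if random.random() < tau2:
        CR = random.random()              # new CR in [0, 1)
    return F, CR

# Each individual carries (F, CR) and updates them before mutation:
F, CR = jde_update(0.5, 0.9)
```

Because each individual evolves its own (F, CR) pair, successful parameter values propagate through the population together with the individuals that used them.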

In the present study, we propose a method that relies on self-adaptive adjustment of parameters. We combine self-adaptive adjustable parameters with other strategies to avoid the time-consuming process of parameter setting, thus increasing the speed at which a stable solution is found. Specifically, our approach adjusts the scaling factor according to the procedure described by Brest et al. (2006) so that each individual has its own parameter set. Individuals in the search space can become more diverse and extend their search areas when F is large. When F is small, individuals focus on searching their neighboring areas and prioritize convergence. On the basis of this setting, we propose an ideal scaling factor that is small for individuals with favorable solutions: individuals with poor fitness should explore more, whereas individuals with favorable fitness should exploit more. For the crossover rate, following Omran et al. (2005) and Lee and Chiang (2011), we employ a normal distribution: each individual selects from the target vector (\(X_{i,G}\)) and mutation vector (\(V_{i,G}\)) equally when we use a normal distribution with an average of 0.5. We also use the perturbation strategy (Lee and Chiang 2011) before mutation to ensure that the algorithm does not become stuck at a local optimum. Owing to the use of self-adaptive adjustable parameters and a perturbation strategy, the proposed algorithm is efficient and effective at solving problems.

3 Proposed SADE-FP algorithm

We propose the self-adaptive adjustable parameters in DE based on fitness performance and the perturbation strategy (SADE-FP) algorithm by referring to the DE/rand/1/binomial crossover variant designed by Chiang et al. (2013). Figure 1 shows an example of the perturbation strategy. In addition to proposing a self-adaptive mechanism for adjusting the scaling factor F, we propose a new model of the ideal scaling factor value, as shown in Fig. 2. According to individual fitness performance, we force the scaling factor to obey a cosine distribution. When the current fitness of an individual is poor, its scaling factor is large in the next iteration; by contrast, when the current fitness of an individual is good, its scaling factor is small in the next iteration. The special notation used to describe SADE-FP is as follows:

d: number of dimensions of an individual \((d = 1, 2, 3, {\ldots }, D)\)

\(F_i \left( G \right) \): ideal scaling factor (F) of the \(i{\mathrm{th}}\) individual in the \(G{\mathrm{th}}\) generation

\(F_s \): ideal value of the scaling factor

\(F_b \): lower bound of the ideal scaling factor function

\(A_i \left( G \right) \): normalized current fitness of the \(i{\mathrm{th}}\) individual in the \(G{\mathrm{th}}\) generation, \(A_i \left( G \right) \in [0, 1]\)

\(X_b \): individual in the population with the highest fitness value

\(X_\mathrm{w} \): individual in the population with the lowest fitness value

The SADE-FP algorithm applies the perturbation strategy (Lee and Chiang 2011) to the current best individual in the population (\(X_b \)) before performing the classic DE search procedure: two dimensions of \(X_b \) are selected randomly and their values are exchanged to refresh a partially converged solution. For example, as shown in Fig. 1, \(X_b = (15.2, 33.5, {\ldots }, 8.4, 9.54)\) with D dimensions; the two dimensions randomly selected for perturbation are the \(2{\mathrm{nd}}\) and (\(D-1){\mathrm{th}}\) dimensions, the values of which are switched. The new current best individual in the population is then \(X_b = (15.2, 8.4, {\ldots }, 33.5, 9.54)\). The objective of the perturbation in SADE-FP is to improve the local solution performance of the current best individual.
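A minimal sketch of this perturbation step is given below; the function name and the use of Python's `random` module are our own choices:

```python
import random

def perturb_best(x_best):
    """Perturbation strategy (after Lee and Chiang 2011): randomly pick two
    distinct dimensions of the current best individual and swap their values."""
    x = list(x_best)              # copy so the caller's vector is untouched
    i, j = random.sample(range(len(x)), 2)
    x[i], x[j] = x[j], x[i]
    return x

# Swapping two coordinates leaves the set of values unchanged but moves the
# best individual to a different point in the search space:
print(perturb_best([15.2, 33.5, 7.1, 8.4, 9.54]))
```

Because only two coordinates are exchanged, the perturbed \(X_b\) stays close to the original in structure while offering a chance to escape a local optimum.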

Fig. 1

Conceptual example of the perturbation strategy

Furthermore, in line with Chiang et al. (2013) and Lin and Cheng (2015), we determine that the ideal scaling factor value is smaller for favorable solutions and larger for poorer solutions. An individual with poor fitness should explore more; conversely, an individual with favorable fitness should exploit more. Therefore, \(F_i \left( G \right) \) is obtained based on the current fitness (\(A_i\)) of \(X_{i,G} \). Here, \(A_i \left( G \right) \) is the normalized value of the \(i{\mathrm{th}}\) individual’s fitness in the \(G{\mathrm{th}}\) generation, as given by Eq. (1). \(f\left( {X_{i,G} } \right) \) is the nonnormalized fitness of individual i in generation G; \(f\left( {X_\mathrm{w} } \right) \) is the fitness of the worst individual in the population, whereas \(f\left( {X_b } \right) \) is the fitness of the best individual. Subsequently, the ideal scaling factor of the individual (\(F_i \left( G \right) \)) is obtained by substituting \(A_i \left( G \right) \) into Eq. (2):

$$\begin{aligned}&A_i(G) = \frac{f(X_\mathrm{w})-f(X_{i,G})}{f(X_\mathrm{w})-f(X_{b})} \end{aligned}$$
(1)
$$\begin{aligned}&F_i \left( G \right) = F_s \times \frac{1+\cos \left( \pi A_i \left( G \right) \right) }{2}+F_{b} . \end{aligned}$$
(2)
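Equations (1) and (2) translate directly into code; the values of \(F_s\) and \(F_b\) below are illustrative placeholders, not the authors' settings:

```python
import math

def ideal_scaling_factor(f_i, f_worst, f_best, F_s=0.5, F_b=0.1):
    """Eqs. (1)-(2): normalize an individual's fitness to A_i in [0, 1]
    (A_i = 1 for the best individual of a minimization problem), then map it
    through the cosine curve so that favorable fitness yields a small F."""
    A_i = (f_worst - f_i) / (f_worst - f_best)                 # Eq. (1)
    return F_s * (1.0 + math.cos(math.pi * A_i)) / 2.0 + F_b   # Eq. (2)

# The best individual (f_i == f_best) receives the smallest factor, F_b:
print(ideal_scaling_factor(f_i=1.0, f_worst=9.0, f_best=1.0))  # → 0.1
```

At the other extreme, the worst individual (\(A_i = 0\)) receives the largest factor, \(F_s + F_b\), so poorly fitted individuals explore more widely.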

The ideal scaling factor function is a nonlinear cosine-based function with the boundaries [\(F_b \), \(F_s + F_b \)], as given by Eq. (2) and illustrated in Fig. 2. An individual with poor fitness has a large scaling factor, whereas an individual with favorable fitness has a small scaling factor. Thus, the ideal scaling factor function implements a fitness performance-based search strategy. Given that each individual’s new scaling factor F is obtained before mutation is performed, it influences the mutation and selection operations of the new vector.

Fig. 2

Distribution of scaling factor (F) values

Table 1 Benchmark test functions employed

The procedure of SADE-FP is as follows:

  • Step 1 Initialization: randomly set the parameter values within the restrictions and initialize the objective function (\(X_{i,G} )\).

  • Step 2 Redefine \(X_b \) and \(X_\mathrm{w} \).

  • Step 3 Perform perturbation (Fig. 1).

  • Step 4 Calculate each individual’s fitness and new scaling factor F [Eqs. (1) and (2)].

  • Step 5 Perform the mutation operation: randomly choose three individuals to generate the mutant vector (\(V_{i,G+1} )\):

    $$\begin{aligned} V_{i,G+1} = X_{r1,G} + F\cdot \left( {X_{r2,G} -X_{r3,G} } \right) , \quad r1\ne r2\ne r3, \end{aligned}$$
    (3)

    where \(r1, r2,r3\in \left\{ {1,2,\ldots NP} \right\} \) are random integers, each of which differs in accordance with index i. The scaling factor F is a real and constant factor \(\in \left[ {0,2} \right] \) that controls the scale of difference between \(X_{r2,G} \hbox { and }X_{r3,G} \).

  • Step 6 Perform the crossover operation. Mix the target individual with the mutant vector to yield the trial solution (\(U_{i,G+1} )\). CR is set such that it obeys the Gaussian distribution (0.5, 0.1) [Eq. (4)] and is generated for every individual.

    $$\begin{aligned} U_{i,j,G+1} =\left\{ {{\begin{array}{ll} {V_{i,j,G+1}}, &{} \quad \mathrm{if}\,\mathrm{rand}_{j} \le { CR} \\ {\,\,X_{i,j,G}},&{} \quad \mathrm{otherwise} \\ \end{array} }} \right. , \end{aligned}$$
    (4)

    where j denotes the dimension of individual i in the \(G{\mathrm{th}}\) iteration. The larger the CR, the larger the movement of the individual in one generation, where the movement of the individual is \(\left( {U_{i,G+1} -X_{i,G} } \right) \).

  • Step 7 Compare the fitness of the target individual (\(X_{i,G} )\) and the trial solution (\(U_{i,G+1} )\) and retain the better of the two as the new individual in the next iteration (\(X_{i,G+1} )\):

    $$\begin{aligned} X_{i,G+1} =\left\{ {{\begin{array}{ll} {U_{i,G+1}} ,&{}{if}\,f\left( {U_{i,G+1} } \right) \,\le \,f\left( {X_{i,G} } \right) \\ X_{i,G} ,&{} {otherwise} \\ \end{array} }} \right. . \end{aligned}$$
    (5)
  • Step 8 When the iteration criterion (T) is met, stop; else, return to Step 2.
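The eight steps above can be sketched as a single loop. This is an illustrative reimplementation under our own assumptions (parameter defaults, boundary handling, and tie-breaking are simplified), not the authors' exact code:

```python
import math
import random

def sade_fp(objective, dim, bounds, np_size=50, max_gen=1000,
            F_s=0.5, F_b=0.1, cr_mu=0.5, cr_sigma=0.1, seed=None):
    """Sketch of the SADE-FP procedure (Steps 1-8) for minimization."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step 1: random initialization within the search bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(np_size)]
    fit = [objective(x) for x in pop]
    for _ in range(max_gen):
        # Step 2: redefine the best (X_b) and worst (X_w) individuals
        b = min(range(np_size), key=lambda i: fit[i])
        w = max(range(np_size), key=lambda i: fit[i])
        # Step 3: perturbation - swap two random dimensions of the best
        i1, i2 = rng.sample(range(dim), 2)
        pop[b][i1], pop[b][i2] = pop[b][i2], pop[b][i1]
        fit[b] = objective(pop[b])
        for i in range(np_size):
            # Step 4: fitness-based scaling factor, Eqs. (1)-(2)
            denom = (fit[w] - fit[b]) or 1e-12   # guard a degenerate population
            A = (fit[w] - fit[i]) / denom
            F = F_s * (1.0 + math.cos(math.pi * A)) / 2.0 + F_b
            # Step 5: DE/rand/1 mutation, Eq. (3)
            r1, r2, r3 = rng.sample([k for k in range(np_size) if k != i], 3)
            v = [pop[r1][d] + F * (pop[r2][d] - pop[r3][d]) for d in range(dim)]
            # Step 6: binomial crossover with CR ~ N(0.5, 0.1), Eq. (4)
            CR = rng.gauss(cr_mu, cr_sigma)
            u = [v[d] if rng.random() <= CR else pop[i][d] for d in range(dim)]
            # Step 7: greedy selection, Eq. (5)
            fu = objective(u)
            if fu <= fit[i]:
                pop[i], fit[i] = u, fu
    # Step 8: the max_gen stopping criterion is the for-loop bound
    return min(fit)

# Minimizing a 5-D Sphere function as a quick smoke test:
best = sade_fp(lambda x: sum(t * t for t in x), dim=5, bounds=(-5.0, 5.0),
               np_size=20, max_gen=300, seed=3)
print(best)
```

Note that, as in the description above, the perturbed best individual is kept even if the swap worsens its fitness; the greedy selection of Step 7 then repairs any loss in subsequent generations.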

4 Experimental study and discussion

4.1 Experimental environment and benchmark test

To investigate the performance of various DE algorithms, 21 benchmark functions were used. The corresponding bounds of each dimension (searching space), optimal values of each dimension (\(x^*\)), and minimal fitness value of each function [\(f(x^{*})\)] are listed in Table 1 (Fan and Yan 2015; Lee et al. 2011). We verified the performance and accuracy of the proposed SADE-FP algorithm by using the six-hump camel-back function (\(f_{7}\)). The function operates in only two dimensions and has six local minima. A series of experiments was performed for the other six test functions, each of which had dimension sizes of 30 and 60 (Lee et al. 2011). All test functions were minimization problems, of which three were unimodal functions (\(f_{1}\)\(f_{3}\)) and four were multimodal functions (\(f_{4}\)\(f_{7}\)). The unimodal functions were used to verify the solution quality and convergence of the algorithm, whereas the multimodal functions, which usually have many local optima, were used to demonstrate that the algorithm does not become stuck at the local minima of the search space.
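For reference, three of the benchmark functions named above can be written out from their standard textbook definitions (variable names are ours):

```python
import math

def sphere(x):
    """f1, Sphere function: unimodal, global minimum f(0, ..., 0) = 0."""
    return sum(t * t for t in x)

def rastrigin(x):
    """f4, Rastrigin function: highly multimodal, global minimum f(0, ..., 0) = 0."""
    return 10 * len(x) + sum(t * t - 10 * math.cos(2 * math.pi * t) for t in x)

def ackley(x):
    """f5, Ackley's function: multimodal, global minimum f(0, ..., 0) = 0."""
    n = len(x)
    return (-20 * math.exp(-0.2 * math.sqrt(sum(t * t for t in x) / n))
            - math.exp(sum(math.cos(2 * math.pi * t) for t in x) / n)
            + 20 + math.e)

print(sphere([0.0] * 30), rastrigin([0.0] * 30))  # → 0.0 0.0
```

The cosine term in the Rastrigin function is what produces the grid of local minima that makes \(f_{4}\) the hardest of these test cases for DE variants.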

Two variants of the DE algorithm were used for comparison with the proposed SADE-FP algorithm: classic DE (Storn and Price 1997) and PsDE (Lee et al. 2011). The parameter values used for each of these algorithms are listed in Table 2.

To increase the efficiency of the perturbation strategy, we set the population size to 50. Each algorithm was executed over 30 independent runs and 1000 iterations. The desired accuracy was achieved by performing dimensions \(\times \) 1000 fitness evaluations (FEs), which corresponds to 30,000 function evaluations for 30 dimensions and 60,000 function evaluations for 60 dimensions. We performed the experiments on a 64-bit Windows 7 computer with an i5-2400 CPU and 8 GB of RAM.

Table 2 Parameter values used for the three DE algorithms
Table 3 Means and SDs of the best fitness values obtained using the three DE algorithms over 30 independent runs with 1000 iterations and 50 individuals in the population
Table 4 Success rates (SRs) of the three DE algorithms over 30 independent runs with 1000 iterations and 50 individuals in the population

4.2 Results and discussion

We obtained the fitness values of the approximate optima of the seven benchmark functions by using the three DE algorithms. Table 3 presents the averages and standard deviations (SDs) of the best fitness values of the three DE algorithms over 30 independent runs and 1000 iterations. In accordance with Suganthan et al. (2005), the best solution obtained for each test function was designated its approximate optimal solution when the deviation between that solution and the true optimum was smaller than the termination error of \(10^{-8}\).

The results of the experiments are summarized in Table 3. The classic DE algorithm failed to solve five of the seven test functions, and PsDE failed to solve the Rastrigin function (\(f_{4}\)). However, the proposed SADE-FP algorithm solved all seven test functions. The Rastrigin function (\(f_{4}\)) is a nonlinear multimodal function and more complicated than the other functions; only SADE-FP solved this function by obtaining the best solution below the termination error (\(10^{-8})\).

We also determined the success rate (Chen and Chiang 2015) of the three DEs for each benchmark function (Table 4) as the ratio of the number of successful runs to the total number of runs. According to Suganthan et al. (2005), an algorithm can be considered successful at solving a function in a run if the deviation of the optimal fitness value is not greater than the termination error of \(10^{-8}\):

$$\begin{aligned} \mathrm{SR} = \left( \mathrm{number\,\,of}\,\,f\left( {X_{i,G_{\max } } } \right) <10^{-8} \right) /{ NP},\qquad i=1,2,\ldots , \,{ NP} \end{aligned}$$
(6)
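Equation (6) amounts to counting, at the final generation, the individuals whose fitness deviation is below the termination error; a minimal sketch:

```python
def success_rate(final_fitness, tol=1e-8):
    """Eq. (6): fraction of final-generation individuals whose fitness
    deviation from the optimum is below the termination error."""
    return sum(f < tol for f in final_fitness) / len(final_fitness)

# Hypothetical final-generation deviations for four individuals:
print(success_rate([1e-10, 3e-9, 0.5, 2e-7]))  # → 0.5
```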
Fig. 3

Performance of the three DE algorithms in solving \(f_{1}\)–\(f_{6}\) with \(D = 30\): a \(f_{1}\): Sphere function, b \(f_{2}\): Schwefel’s problem 2.22, c \(f_{3}\): Step function, d \(f_{4}\): Rastrigin function, e \(f_{5}\): Ackley’s function, and f \(f_{6}\): Griewank function

Fig. 4

Performance of the three DE algorithms in solving \(f_{1}\)–\(f_{6}\) with \(D = 60\): a \(f_{1}\): Sphere function, b \(f_{2}\): Schwefel’s problem 2.22, c \(f_{3}\): Step function, d \(f_{4}\): Rastrigin function, e \(f_{5}\): Ackley’s function, and f \(f_{6}\): Griewank function

Fig. 5

F–fitness and CR–fitness plots for the Sphere function (\(f_{1}\)) with \(D = 60\) and 50,000 iterations: a F–fitness plot of PsDE, b CR–fitness plot of PsDE, c F–fitness plot of SADE-FP, and d CR–fitness plot of SADE-FP

Fig. 6

F–fitness and CR–fitness plots for the Rastrigin function (\(f_{4}\)) with \(D = 60\) over 50,000 iterations: a F–fitness plot of PsDE, b CR–fitness plot of PsDE, c F–fitness plot of SADE-FP, and d CR–fitness plot of SADE-FP

The performance of the three DE algorithms was further compared by visualizing their evolutionary processes using convergence plots. Some of the DE algorithms did not converge even after more than 1000 iterations. Therefore, we set the number of iterations for test functions \(f_{1}\)–\(f_{6}\) individually to clearly observe convergence. The numbers of generations were set as follows: (\(f_1 \), 20,000), (\(f_2 \), 6000), (\(f_3 \), 1000), (\(f_4 \), 6000), (\(f_5 \), 3000), and (\(f_6 \), 1000).

We analyzed the convergence graphs of each function for two dimension sizes: 30 and 60. Figure 3 displays the performance trends of the six test functions for 30 dimensions over 30 independent runs in the evolutionary process. Figure 3a and b plots the performance trends for the Sphere function (\(f_{1}\)) and Schwefel’s problem 2.22 (\(f_{2}\)); in terms of the number of iterations to convergence (from fewest to most), the order was SADE-FP, PsDE, and classic DE. The best fitness values obtained by all the algorithms decreased dramatically and reached the approximate optimal solution in fewer than 1000 iterations. In addition, the convergence speeds of all algorithms in the case of the Step function (\(f_{3}\)) were higher (approximately 400 iterations) than in the cases of \(f_{1}\) and \(f_{2}\), as illustrated in Fig. 3c. In the experiments using the three unimodal functions (\(f_{1}\)–\(f_{3}\)), SADE-FP converged considerably faster than the other two algorithms, and all algorithms obtained approximately optimal solutions. In the experiments using the multimodal functions, we made the following observations. As a classic complex multimodal function based on the cosine function, the Rastrigin function (\(f_{4}\)) generates numerous local optima, and searches tend to become stuck at one of them. As shown in Fig. 3d, PsDE and SADE-FP evolved toward the optima and obtained an approximate solution; however, SADE-FP converged faster than PsDE. In addition, Fig. 3e and f reveals that both PsDE and SADE-FP efficiently obtained the approximate solutions of Ackley’s function (\(f_{5}\)) and the Griewank function (\(f_{6}\)), whereas classic DE could not obtain the approximate solution of either. Therefore, SADE-FP converges faster and obtains better solutions than the other two DE algorithms.

Figure 4 displays the convergence plots for the six test functions when 60 dimensions and 30 independent runs were employed in the evolutionary process. The differences between the three DE algorithms are amplified compared with Fig. 3. In solving \(f_{3}\)–\(f_{6}\), classic DE was more likely to become stuck at a local optimum than the other DE algorithms, and it also failed to find the global optimum in its solutions of \(f_{1}\), \(f_{2}\), \(f_{4}\), and \(f_{5}\). PsDE failed to solve only \(f_{6}\). SADE-FP obtained favorable solutions and converged quickly for most test functions, even the high-dimensional problems.

Our overall purpose was to learn more about parameter values; therefore, we examined the relationship of F and CR values with fitness in the evolution process, as illustrated in Fig. 5. Brest et al. (2006) defined a parameter as successful if using it in the mutation or crossover operator generates an offspring that is superior to its parent; they then used F–fitness and CR–fitness plots to determine the extent to which fitness improves and how dynamic parameter values contribute to that improvement during evolution. In the present study, we took \(f_{1}\) and \(f_{4}\) with a high number of dimensions (\(D = 60\)), solved them using the DE algorithms with dynamic parameter values, and then drew F–fitness and CR–fitness plots, as presented in Figs. 5 and 6, respectively. From the two figures, we obtained the F and CR values of successful individuals.

First, we noticed that for all algorithms, the ranges of F and CR values of successful individuals in \(f_{1}\) differed from those in \(f_{4}\). For example, the scope of successful F values determined using SADE-FP was (0.2, 0.8) in the case of \(f_{1}\) and (0.1, 0.8) in the case of \(f_{4}\) (Figs. 5c, 6c); that is, for the multimodal function, the scope of successful F values was larger among individuals with lower fitness values. Second, we discovered that a small F helped the DE algorithm avoid becoming stuck in a local optimum during the solution of \(f_{4}\), as demonstrated by the plots for PsDE and SADE-FP. Next, we found that the CR values of successful individuals were lower when the fitness values were smaller, which may be why PsDE requires more iterations than SADE-FP to find the approximately optimal solution. Finally, the F plot for SADE-FP corresponds to setting the F value based on the cosine function. As Figs. 5c and 6c illustrate, the plot of the F values resembles the sine function, the opposite of the shape of the cosine function, because a lower fitness value represents higher algorithm performance.

In this study, we compared the proposed algorithm with other algorithms by using the maximum FE stopping criterion. Fourteen functions and feasible ranges were referenced from CEC2005 (Suganthan et al. 2005) and are listed in Table 5. The minimum solution \(f(x^{*})\) of each function was referenced from Fan and Yan (2015). In this experiment, 14 test functions (\(f_8 \)\(f_{21} )\) with 30 dimensions were employed for comparing SADE-FP with four related algorithms—namely, DE/best/1, DE/rand/1, MDE_pBX (Islam et al. 2012), and DE-EPA (Hsieh et al. 2013)—as summarized in Table 6. The means and SDs obtained using DE/best/1, DE/rand/1, MDE_pBX, and DE-EPA in Table 6 were retrieved from Hsieh et al. (2013). The original parameter settings of these algorithms were used. The initial population size of SADE-FP was set to 100, and each algorithm was executed 25 times independently according to Hsieh et al. (2013). The maximum FE criterion was set to 300,000 (Suganthan et al. 2005).

The means and SDs of the errors of the five DE algorithms for solving the \(D = 30\) problems are listed in Table 6. The best results among the five approaches are shown in bold. The proposed algorithm obtained significantly better results in solving functions 10, 12, 13, and 16 than the other algorithms. The proposed method also yielded the same best result as MDE_pBX and DE-EPA for function 8.

Table 5 Shifting of benchmark test functions
Table 6 Experimental results of \(D\, = 30\) test functions with 300,000 maximum FEs

4.3 Comparison between proposed SADE-FP and classic DE

In this section, we compare classic DE and SADE-FP through statistical testing for advanced analysis. We employ the widely used Wilcoxon signed-rank nonparametric test (Arasomwan and Adewumi 2014). To investigate additional details of the search processes of these two DE algorithms, we took the mean of the best solution at each cut point over replications as the sample data for \(f_{1}\) to \(f_{6}\) (Derrac et al. 2014). The experimental parameters were as follows: 30 dimensions, 40 individuals, 2000 iterations, and 10 replicate executions. A cut point was set every 100 iterations to record the global best result over the 2000 iterations for statistical analysis of the convergence curve, yielding 21 cut points, and 10 replicate test runs were conducted to verify the reliability of the experimental data. The cut point data obtained using SADE-FP and classic DE are listed in Table 7.
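The per-cut-point comparison underlying the signed-rank analysis can be sketched as a simple counting procedure. The data below are hypothetical, for illustration only; the full significance test applies the Wilcoxon signed-rank statistic to the paired differences (e.g., via `scipy.stats.wilcoxon`):

```python
def signed_rank_counts(sade_means, de_means, tol=1e-12):
    """Count, over the cut points, how often SADE-FP's mean best fitness is
    lower than, higher than, or equal to classic DE's: the three situations
    SADE-FP < DE, SADE-FP > DE, and SADE-FP = DE."""
    less = sum(s < d - tol for s, d in zip(sade_means, de_means))
    greater = sum(s > d + tol for s, d in zip(sade_means, de_means))
    equal = len(sade_means) - less - greater
    return less, greater, equal

# Three hypothetical cut-point means per algorithm:
print(signed_rank_counts([1e-3, 1e-5, 0.0], [2e-3, 1e-5, 4e-2]))  # → (2, 0, 1)
```

A tolerance is used for the equality case because the cut-point means are floating-point averages over replications.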

Table 8 presents the signed rank score on the mean at each cut point’s fitness value obtained using the two DE algorithms. Three situations can exist regarding the performance of SADE-FP and DE: SADE-FP < DE, SADE-FP > DE, and SADE-FP = DE. As given for \(f_{1}\), \(f_{4}\), and \(f_{5}\) in Table 8, the number of instances when SADE-FP < DE was 21, indicating that the results obtained using SADE-FP were better than those obtained using classic DE over all 21 cut points. Moreover, the number of instances when SADE-FP < DE was considerably lower than that when SADE-FP = DE for \(f_{3}\), and the number of instances when SADE-FP < DE was equal to that when SADE-FP = DE for \(f_{6}\).

Statistical analysis was then performed by applying the Wilcoxon signed-rank nonparametric test, and the results are summarized in Table 9. The values in bold represent rejection of the null hypothesis (SADE-FP fitness > classic DE fitness). Because the P value (0.5) for \(f_{3}\) exceeded the significance level (0.05), the difference SADE-FP < DE was nonsignificant. The P values for \(f_{1}\), \(f_{2}\), \(f_{4}\), \(f_{5}\), and \(f_{6}\) were lower than the significance level (0.05), indicating rejection of the null hypothesis and implying that the fitness values obtained using SADE-FP were lower than those obtained using classic DE.
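As an illustrative sketch (not the authors' statistical software), a one-sided Wilcoxon signed-rank test of this kind can be implemented in pure Python using the normal approximation to the signed-rank statistic, which is adequate for the 21 paired cut points used here:

```python
import math

def wilcoxon_one_sided(x, y):
    """One-sided Wilcoxon signed-rank test with the alternative x < y,
    using the normal approximation. Returns the approximate P value;
    zero differences are discarded, as in the standard procedure."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # left-tail probability

# Made-up paired samples in which x is uniformly lower than y:
x = [i + 0.5 for i in range(21)]
y = [i + 1.5 for i in range(21)]
assert wilcoxon_one_sided(x, y) < 0.05  # null hypothesis rejected
```

A small sum of positive ranks (`w_plus`) indicates that most paired differences favor `x`, which drives the left-tail P value below the significance level.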

Overall, based on the number of instances of SADE-FP = DE in Table 8, the fitness obtained using SADE-FP was lower than that obtained using classic DE for five test functions and equal to that obtained using classic DE for \(f_{3}\).

Table 7 Twenty-one cut points obtained using SADE-FP and classic DE on six test functions
Table 8 Signed rank score on mean of fitness values of 100 iterations (until 2000 iterations) obtained using the proposed SADE-FP and classic DE for six test problems

4.4 Example: production scheduling problem

In this study, we consider production scheduling as an industrial application example. In a typical production scheduling problem, most studies discuss factors such as production sequence, production routing, machine availability, and machine downtime. The common objective of production scheduling is to minimize tardiness in production. In our example, there are four orders (Table 10) and five machines (Table 11). Each order has a start time, a due date, and a set of required machines. If an order is completed after its due date, a penalty may be incurred and customer satisfaction is reduced. The start time indicates the earliest time at which the production job can begin. Each machine has its own maintenance due date and required maintenance period; a machine must be maintained, or it cannot be used after its maintenance due date. The scheduling problem of minimizing tardiness can be formulated as follows:

$$\begin{aligned}&Z=\min \sum \limits _{i=1}^n T_i \end{aligned}$$
(7)
$$\begin{aligned}&\hbox {s.t.} \qquad \qquad T_i \ge C_i -d_i ,\,i=1,2,\ldots ,n, \end{aligned}$$
(8)
$$\begin{aligned}&T_i \ge 0,\,i=1,2,\ldots ,n, \end{aligned}$$
(9)
$$\begin{aligned}&\hbox {the capacity and availability constraints of the machines,}\nonumber \\ \end{aligned}$$
(10)

where constraints (8) and (9) define the tardiness of each job.
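Under the minimization in (7), the per-job tardiness reduces to \(T_i = \max (0, C_i - d_i)\), so the objective can be evaluated directly. A minimal sketch follows; the completion times and due dates below are hypothetical, not the order data of Tables 10 and 11:

```python
def total_tardiness(completion_times, due_dates):
    """Objective (7): sum of per-job tardiness T_i = max(0, C_i - d_i)."""
    return sum(max(0, c - d) for c, d in zip(completion_times, due_dates))

# Hypothetical completion times and due dates for four orders:
C = [8, 12, 15, 20]
d = [10, 11, 15, 18]
print(total_tardiness(C, d))  # 0 + 1 + 0 + 2 = 3
```

In a DE encoding of the scheduling problem, each candidate solution would map to a feasible schedule whose completion times feed this objective, with the machine capacity and availability constraints of (10) enforced during schedule construction.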

In this experiment, which was run 30 times, the maximum FE criterion was set to 300,000 by referring to Suganthan et al. (2005). For this problem, the optimal solution was known: the shortest total delay time is 2. We compared the delay times obtained for this problem using SADE-FP, classic DE/rand/1, DE/rand/2, DE/best/1, and DE/current-to-best/1. The average delay obtained using classic DE/rand/1 was 6.63, that using DE/rand/2 was 2.8, and that using SADE-FP was 2.5. The averages and SDs obtained using the five DE algorithms are listed in Table 12. Classic DE found the optimal solution in 11 of 30 runs, whereas SADE-FP obtained the optimal solution in 15 of 30 runs. Furthermore, SADE-FP converged faster than the other DE algorithms, as illustrated in Fig. 7. Based on the experimental results, SADE-FP performed better than the other DE algorithms on this industrial problem.

Table 9 Wilcoxon signed-rank test of mean fitness values obtained using SADE-FP and classic DE (significance level = 0.05)
Table 10 Order information
Table 11 Machine information
Table 12 Average and standard deviation of delay times obtained using five DE algorithms in the production scheduling problem
Fig. 7

Convergence rate for the production scheduling problem

5 Conclusion

The disadvantages of classic DE include unstable convergence, slow convergence in the late stages of evolution, and a tendency to become stuck in a local optimum (Jia et al. 2011; Lee and Chiang 2011). Although suitable parameter values can mitigate these weaknesses, manually selecting them is time-consuming. Therefore, numerous studies have focused on self-adaptive mechanisms to obtain favorable solutions and rapid convergence. We proposed SADE-FP, a DE algorithm whose control parameters are self-adapted on the basis of fitness performance and combined with a perturbation strategy. Our results demonstrate that in solving unimodal and multimodal functions, SADE-FP obtains better solutions and converges faster than four other DE algorithms, without becoming stuck at a local optimum.

Furthermore, the experimental results reveal that SADE-FP satisfies five of the six characteristics of a favorable scheme for adaptive control of DE parameters summarized by Chen and Chiang (2015); the only exception is the mechanism of a random distribution. Therefore, applying a random distribution in the parameter adaptation mechanism would be a worthwhile direction for future research.