1 Introduction

Among contemporary heuristic techniques, evolutionary algorithms (EAs), inspired by biological evolution, have become increasingly important in recent decades. They form the kernel of the field known as evolutionary computation (EC) in artificial intelligence and have succeeded in solving a wide variety of problems. Nowadays, using EAs to solve optimization problems has become common practice because they perform efficiently and effectively on complex problems that cannot be easily solved by conventional techniques.

The differential evolution (DE) algorithm, proposed by Storn and Price (1995), is a popular evolutionary approach and a promising global optimizer in the continuous search domain. It is a population-based stochastic search technique employing mutation, crossover, and selection operations to move a population towards promising solutions. DE is a simple yet powerful search technique showing remarkable performance on a wide variety of problems. It has been successfully utilized in diverse fields such as scheduling and planning problems (Cheng et al. 2014; Damak et al. 2009; Wang and Yeh 2014), data clustering (Dong et al. 2014; Xiang et al. 2015), pattern recognition (Das and Konar 2009; Du et al. 2007; Maulik and Saha 2009), and power dispatch (Chiou 2009; Varadarajan and Swarup 2008).

The performance of DE depends heavily on two parts. One is the trial vector generation strategy, i.e., the mutation and crossover operations; the other is the control parameter setting, i.e., the scaling factor (F), the crossover rate (CR), and the population size (NP). Many encouraging trial vector generation strategies have been reported in DE research, and different strategies can be suitable for different kinds of problems (Qin et al. 2009). However, the performance of a given trial vector generation strategy is still significantly influenced by its control parameter setting. Although there are many important guidelines for choosing an appropriate control parameter setting (Gämperle et al. 2002; Mezura-Montes et al. 2006; Price et al. 2005), the interaction between control parameter settings and DE performance remains complicated and not completely understood.

Therefore, to solve a practical problem with DE, whose characteristics are usually unknown, it is commonly necessary to perform a trial-and-error procedure to choose an appropriate strategy and fine-tune its corresponding parameter setting. Such a trial-and-error search is a time-consuming and error-prone task requiring high computational costs. Moreover, the population of target vectors may move through different kinds of regions during evolution, so using diverse trial vector generation strategies coupled with various control parameter settings at different stages of evolution can be more effective than using a single one (Mallipeddi et al. 2011; Qin et al. 2009; Yong et al. 2011). It is therefore preferable for DE to adaptively determine an appropriate strategy and its parameter setting during evolution, and there has been increasing interest in designing new DE variants with adaptive ability in the recent decade.

In this paper, we propose a novel adaptive DE variant called the time-frame adaptive differential evolution (TFADE) algorithm to avoid the time-consuming trial-and-error search for a suitable strategy and its parameter values. Both the strategy and its parameter values can be adaptively determined to generate promising solutions and dynamically adjusted to cope with premature convergence or stagnation during evolution, according to successful experience over a period of preceding generations called the time frame. The rest of this paper is organized as follows. Section 2 reviews related work on conventional DE and state-of-the-art adaptive DE variants. Section 3 introduces the details of the proposed time-frame adaptive differential evolution algorithm. Numerical experiments, results, and discussions are reported in Sect. 4. Section 5 concludes this paper.

2 Related work

2.1 The differential evolution (DE) algorithm

An initialized population of NP D-dimensional parameter vectors \({{\mathbf x}}_{i,G} =(x_{i1,G}, x_{i2,G}, \ldots , x_{iD,G})\), where \(i = 1, 2, \ldots\), NP and G denotes the generation, called target vectors, encodes candidate solutions towards the global optimum. The target vectors in the population are randomly generated from a uniform distribution between the minimum bounds \({{\mathbf x}}_\mathrm{min} =\{x_{1 \mathrm{min}}, x_{2\mathrm{min}}, \ldots , x_{D \mathrm{min}} \}\) and maximum bounds \({{\mathbf x}}_\mathrm{max} =\{x_{1 \mathrm{max}}, x_{2\mathrm{max}},\ldots , x_{D \mathrm{max}} \}\) of the search space. After initialization, the population undergoes a series of evolutionary operations, which are iterations of mutation, crossover, and selection. The mutation and crossover operations are used to generate new vectors called trial vectors; together they are referred to as the trial vector generation strategy. The selection operation then determines whether each new trial vector survives into the next generation. The common evolutionary process of DE is described as follows.
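
As a minimal illustration of this initialization step, the following Python/NumPy sketch generates a population uniformly within the bounds; the function name, the example bounds, and the use of NumPy are our own choices, not part of the original formulation:

```python
import numpy as np

def initialize_population(NP, x_min, x_max, rng=np.random.default_rng()):
    """Generate NP target vectors uniformly at random within [x_min, x_max]."""
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    D = x_min.size
    # Each row is one D-dimensional target vector x_{i,0}.
    return x_min + rng.random((NP, D)) * (x_max - x_min)

# Example: 50 target vectors in a 10-dimensional search space bounded by [-100, 100]^D.
population = initialize_population(50, [-100.0] * 10, [100.0] * 10)
```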

1. Mutation operation

For each target vector \({{\mathbf x}}_{i,G}\), DE employs the following mutation operation to produce its corresponding mutant vector \({{\mathbf v}}_{i,G}\) (for the case of DE/rand/1):

$$\begin{aligned} {{\mathbf v}}_{i,G} ={{\mathbf x}}_{r1,G} +F({{\mathbf x}}_{r2,G} -{{\mathbf x}}_{r3,G} ), \end{aligned}$$
(1)

where the indices r1, r2 and r3 are mutually exclusive integers randomly selected from the set \(\{1,2,\ldots , NP\}\backslash \{i\}\). F is a mutation factor, called the scaling factor, that scales the difference vector \(({{\mathbf x}}_{r2,G} - {{\mathbf x}}_{r3,G})\); it is a positive real number, \(F\in (0,1+)\), that controls the rate at which the population evolves. It is generally the mutation operation that separates one DE variant from another. The mutation operation given in (1), which uses a randomly selected base vector \({{\mathbf x}}_{r1,G}\) and one weighted difference vector \(F({{\mathbf x}}_{r2,G} -{{\mathbf x}}_{r3,G} )\), is denoted DE/rand/1; this naming convention is how the different DE variants are distinguished. The other commonly used conventional DE variants (Das et al. 2009; Price et al. 2005; Price 1999) are listed as follows:

$$\begin{aligned} \text{ DE/rand/2 }: {{\mathbf v}}_{i,G} ={{\mathbf x}}_{r1,G} +F({{\mathbf x}}_{r2,G} -{{\mathbf x}}_{r3,G}) +F({{\mathbf x}}_{r4,G} -{{\mathbf x}}_{r5,G}) \end{aligned}$$
(2)
$$\begin{aligned} \text{ DE/best/1 }: {{\mathbf v}}_{i,G} ={{\mathbf x}}_{\mathrm{best},G} +F({{\mathbf x}}_{r1,G} -{{\mathbf x}}_{r2,G}) \end{aligned}$$
(3)
$$\begin{aligned} \text{ DE/best/2 }: {{\mathbf v}}_{i,G} ={{\mathbf x}}_{\mathrm{best},G} +F({{\mathbf x}}_{r1,G} -{{\mathbf x}}_{r2,G}) +F({{\mathbf x}}_{r3,G} -{{\mathbf x}}_{r4,G}) \end{aligned}$$
(4)
$$\begin{aligned} \text{ DE/rand-to-best/1 }: {{\mathbf v}}_{i,G} ={{\mathbf x}}_{i,G} +F({{\mathbf x}}_{\mathrm{best},G} -{{\mathbf x}}_{i,G}) +F({{\mathbf x}}_{r1,G} -{{\mathbf x}}_{r2,G}) \end{aligned}$$
(5)
$$\begin{aligned} \text{ DE/rand-to-best/2 }: {{\mathbf v}}_{i,G} ={{\mathbf x}}_{i,G} +F({{\mathbf x}}_{\mathrm{best},G} -{{\mathbf x}}_{i,G}) +F({{\mathbf x}}_{r1,G} -{{\mathbf x}}_{r2,G}) +F({{\mathbf x}}_{r3,G} -{{\mathbf x}}_{r4,G}) \end{aligned}$$
(6)
$$\begin{aligned} \text{ DE/current-to-rand/1 }: {{\mathbf v}}_{i,G} ={{\mathbf x}}_{i,G} +K({{\mathbf x}}_{r1,G} -{{\mathbf x}}_{i,G}) +F({{\mathbf x}}_{r2,G} -{{\mathbf x}}_{r3,G}) \end{aligned}$$
(7)
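
For concreteness, the DE/rand/1 mutation of Eq. (1) might be sketched as follows; the helper name and the way the mutually exclusive indices are drawn are implementation choices of ours, not prescribed by the original description:

```python
import numpy as np

def mutate_rand_1(population, i, F, rng=np.random.default_rng()):
    """DE/rand/1 mutation (Eq. 1): v_i = x_r1 + F * (x_r2 - x_r3)."""
    NP = population.shape[0]
    # r1, r2, r3 are mutually exclusive indices drawn from {1, ..., NP} \ {i}.
    candidates = [idx for idx in range(NP) if idx != i]
    r1, r2, r3 = rng.choice(candidates, size=3, replace=False)
    return population[r1] + F * (population[r2] - population[r3])
```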

2. Crossover operation

Once the mutation operation is finished, the crossover operation is applied to each pair of the target vector \({{\mathbf x}}_{i,G}\) and its corresponding mutant vector \({{\mathbf v}}_{i,G}\) to generate a new vector \({{\mathbf u}}_{i,G} =(u_{i1,G}, u_{i2,G}, \ldots , u_{iD,G})\) called the trial vector. The binomial crossover operation is the basic version, defined as follows:

$$\begin{aligned} u_{ij,G} =\left\{ {{\begin{array}{l@{\quad }l} {v_{ij,G} ,} &{} {\text{ if } (\mathrm{rand}_j \le CR) \text{ or } (j=j_\mathrm{rand} )}\\ {x_{ij,G} ,} &{} {\text{ otherwise }} \\ \end{array} }} \right. \end{aligned}$$
(8)

where \(j = 1, 2, \ldots , D\) and rand\(_{j}\) is a uniform random number generated within the range [0, 1]. \(CR\in [0,1]\) is the crossover probability, called the crossover rate, a user-defined value that controls the fraction of parameter values copied from the mutant vector. \(j_\mathrm{rand} \in \{1,2,\ldots , D\}\) is a randomly chosen index that ensures the trial vector \({{\mathbf u}}_{i,G}\) does not duplicate the target vector \({{\mathbf x}}_{i,G}\).
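
A minimal sketch of the binomial crossover of Eq. (8), under the same NumPy-based assumptions and naming choices as the sketches above:

```python
import numpy as np

def binomial_crossover(target, mutant, CR, rng=np.random.default_rng()):
    """Binomial crossover (Eq. 8): take the mutant component when rand_j <= CR
    or j == j_rand, otherwise keep the target component."""
    D = target.size
    j_rand = rng.integers(D)          # guarantees u_i differs from x_i in at least one component
    cross = rng.random(D) <= CR       # per-component crossover decisions
    cross[j_rand] = True
    return np.where(cross, mutant, target)
```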

3. Selection operation

After the mutation and crossover operations, the selection operation selects the better of the target vector \({{\mathbf x}}_{i,G}\) and the trial vector \({{\mathbf u}}_{i,G}\) to survive into the next generation, according to the following greedy selection scheme based on the fitness measurement (for a minimization problem):

$$\begin{aligned} {{\mathbf x}}_{i,G+1} =\left\{ {{\begin{array}{l@{\quad }l} {{{\mathbf u}}_{i,G} ,} &{} {\text{ if } f({{\mathbf u}}_{i,G} )\le f({{\mathbf x}}_{i,G} )}\\ {{{\mathbf x}}_{i,G} ,} &{} {\text{ otherwise }} \\ \end{array} }} \right. \end{aligned}$$
(9)
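
The greedy selection of Eq. (9) can be sketched as follows; the sphere function in the usage example is a stand-in fitness measurement of our choosing, used for illustration only:

```python
import numpy as np

def greedy_selection(target, trial, f):
    """Greedy selection (Eq. 9) for minimization: the trial vector survives only
    if it is at least as good as the target vector."""
    return trial if f(trial) <= f(target) else target

# Usage example with the sphere function as the fitness measurement.
sphere = lambda x: float(np.sum(x ** 2))
survivor = greedy_selection(np.array([1.0, 2.0]), np.array([0.5, 0.5]), sphere)  # -> [0.5, 0.5]
```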

2.2 DE with adaptive scheme

The performance of DE is significantly influenced by its control parameter settings, but the interaction between the two is still complicated and not completely understood. Many adaptive DE variants have therefore been proposed to dynamically adjust control parameter values without requiring knowledge of the relationship among strategies, control parameters, and the characteristics of given problems.

Abbass (2002) adapted the crossover rate CR by encoding it into each individual so that it evolves simultaneously with the other parameters for multi-objective optimization problems; the scaling factor F is generated for each variable from a Gaussian distribution N(0, 1). Teo (2006) then proposed a DE variant with a dynamic population sizing strategy called DESAP, based on Abbass' self-adaptive Pareto DE. He implemented two versions of DESAP using absolute and relative encoding methodologies for dynamically self-adapting the population size parameter. Das et al. (2005) linearly reduced the scaling factor F from a maximum to a minimum value with increasing generation count. They also employed a uniform distribution between 0.5 and 1.5, with a mean value of 1, to obtain a new hybrid DE variant. Brest et al. (2006) encoded the control parameters F and CR into the individuals and adjusted them by introducing two new parameters \(\tau _{1} \) and \(\tau _{2} \). In their algorithm (called jDE), a set of F values is assigned to the individuals in the population. Then, a random number rand is uniformly generated in the range [0, 1]. If \(\hbox {rand} < \tau _{1}\), F is re-initialized to a new random value in the range [0.1, 1.0]; otherwise it is kept unchanged. CR is adapted in the same manner but re-initialized within the range [0, 1].
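
As a rough sketch of the jDE update rule described above: the lower bound 0.1 and width 0.9 for F, and the illustrative defaults \(\tau _{1} = \tau _{2} = 0.1\), are assumptions based on the commonly cited jDE settings rather than details given in this paper:

```python
import numpy as np

def jde_update(F_i, CR_i, tau1=0.1, tau2=0.1, rng=np.random.default_rng()):
    """Sketch of jDE-style self-adaptation: with probability tau1, re-initialize F
    within [0.1, 1.0]; with probability tau2, re-initialize CR within [0, 1];
    otherwise keep the values inherited by the individual."""
    if rng.random() < tau1:
        F_i = 0.1 + 0.9 * rng.random()
    if rng.random() < tau2:
        CR_i = rng.random()
    return F_i, CR_i
```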

Later, since adapting not only the parameter values but also the trial vector generation strategies may bring further performance gains for DE, some outstanding contemporary adaptive DE schemes focusing on both adaptations have been proposed. Qin et al. (2009) proposed a self-adaptive DE algorithm named SaDE, in which both the trial vector generation strategies and their associated control parameter values are gradually self-adapted by learning from previous experience in generating promising solutions. As a result, suitable strategies with their control parameter settings can be determined adaptively to match different phases of the search process. The performance of SaDE was extensively evaluated on 26 bound-constrained numerical optimization problems and compared with several conventional DEs and state-of-the-art adaptive DE variants. Zhang and Sanderson (2009) proposed an adaptive differential evolution called JADE to improve the performance of DE by implementing an efficient mutation scheme, “DE/current-to-pbest”, with an optional external archive. “DE/current-to-pbest” is a generalization of the conventional “DE/current-to-best”, and the optional archive saves historical information that has the potential to provide promising search directions for the evolution progress. The simulation results show that JADE can not only diversify the population but also improve convergence performance. Liu and Lampinen (2005) introduced a fuzzy adaptive differential evolution (FADE) that uses fuzzy logic controllers, whose inputs incorporate relative function values and individuals of successive generations, to adapt the parameters of the mutation and crossover operations. The experimental results show that the FADE algorithm outperforms several conventional DEs on higher dimensional problems. Mallipeddi et al. (2011) proposed an ensemble of trial vector generation strategies and control parameters called EPSDE. It contains a pool of distinct trial vector generation strategies and a pool of control parameter values that compete to produce offspring during the evolution process. Yong et al. (2011) combined several effective trial vector generation strategies with several control parameter settings in a method called composite DE (CoDE). It uses three trial vector generation strategies and three control parameter settings, randomly combining them to generate promising solutions.

In recent years, an increasing number of researchers have devoted their efforts to enhancing the efficiency and effectiveness of DE. Zou et al. (2013) introduced a modified differential evolution algorithm (MDE), in which Gaussian and uniform distributions are employed to adjust the scaling factor F and crossover rate CR, and two mutation strategies are adopted to produce better solutions. To guarantee the quality of the population, MDE uses an external archive, and some high-quality solutions in the external archive can be selected as candidate solutions. Lu et al. (2014) proposed a framework called DE with surrogate-assisted self-adaptation (DESSA) to deal with computationally expensive problems. DESSA generates multiple trial vectors using different trial vector generation strategies and parameter settings, and then employs a surrogate model to identify the potentially best trial vector to undergo real fitness evaluation. The surrogate model acts as a strategy and parameter setting selector, aiming to identify the most suitable strategy and parameter setting for each target vector. Yeh et al. (2014) proposed a novel grey adaptive DE algorithm (GADE) that updates the control parameters in an adaptive manner with the help of grey relational analysis. In this algorithm, each target vector has its own values of the scaling factor F and crossover rate CR, depending on the corresponding grey relational grade. Since the relational grade of an individual varies over the generations, those two control parameters are adaptively time-varying. Based on this work coupled with grey evolutionary analysis (GEA), they also proposed a GEA-based DE algorithm (GEA-DE) that enables the control parameters F and CR to adapt during the evolution process by introducing a GEA-based parameter automation approach (Yeh et al. 2015). The simulation results show that GEA-DE with the best mutation strategy (GEA/best) outperforms DE/rand/1, DE/best/1, jDE, and SaDE on most of the test functions in terms of solution accuracy and convergence speed.

3 The time-frame adaptive differential evolution (TFADE) algorithm

To solve a given problem with DE, using diverse trial vector generation strategies coupled with various control parameter settings at different stages of evolution can be more effective than using a single one (Mallipeddi et al. 2011; Qin et al. 2009; Yong et al. 2011). Moreover, the performance of DE depends heavily on the selection of the trial vector generation strategy and its control parameter setting, and a suitable choice for a given problem depends on the characteristics of that problem. It is common to perform a trial-and-error search to choose a trial vector generation strategy and fine-tune its control parameter values, which is a time-consuming and error-prone task. In the last decade, DE researchers have reported many empirical guidelines for selecting suitable trial vector generation strategies and control parameter settings for various problems. However, when solving a practical problem, the characteristics of the problem are usually ambiguous, which easily leads to poor choices. An inappropriate choice of strategy and control parameter setting may lead to premature convergence or stagnation during evolution, as has been extensively surveyed (Gämperle et al. 2002; Lampinen and Zelinka 2000; Price et al. 2005; Zaharie 2003). To reduce the risk of premature convergence during evolution, a stagnation breaking scheme embedded in DE should be able to improve its performance.

Towards this direction, a DE variant with time-frame strategy adaptation, called the time-frame adaptive differential evolution (TFADE) algorithm, is proposed. TFADE embeds several specific trial vector generation strategies together with a restricted yet diverse set of control parameter values. It gradually adapts the selection of suitable strategies to produce promising solutions according to successful experience over a period of preceding generations. In addition, a stagnation breaking scheme based on extreme control parameter settings is introduced to deal with premature convergence or stagnation during evolution. The idea of TFADE is described as follows.

3.1 The time-frame strategy candidate pool

A time-frame strategy candidate pool containing several diverse and effective strategies is maintained. The characteristics of many different trial vector generation strategies have been extensively investigated (Mallipeddi et al. 2011; Qin et al. 2009; Yong et al. 2011). Putting too many strategies into the pool might have an unfavorable and ineffective influence, reducing the chance of generating promising offspring. A good candidate pool should be restricted but exhibit effective and diverse characteristics. In TFADE, several trial vector generation strategies with effective and diverse characteristics are selected into the time-frame strategy candidate pool, as follows.

Table 1 Three trial vector generation strategies in the time-frame strategy candidate pool
Table 2 Guidelines for choosing control parameter values

“DE/rand/1/bin” is the most commonly used trial vector generation strategy in the DE literature. It is robust, has no bias toward any search direction, and explores the search space in a random manner. Its extension, “DE/rand/2/bin”, can lead to a better perturbation, improving diversity and generating a wider variety of trial vectors because two difference vectors are added to the base vector (Gämperle et al. 2002; Storn and Price 1997). Both are robust and diverse and can provide broad exploration capability for any problem; in particular, they are suitable for solving multimodal problems (Qin et al. 2009). For these reasons, both strategies are selected into the time-frame strategy candidate pool.

“DE/current-to-rand/1” is a rotation-invariant strategy, which makes it special among the candidates and more effective for rotated problems. It commonly uses rotation-invariant arithmetic crossover rather than binomial crossover to generate trial vectors (Das et al. 2009; Price 1999). Owing to its effectiveness and robustness, especially on rotated problems, it is selected into the pool.

The strategies “DE/best/1/bin”, “DE/best/2/bin”, “DE/rand-to-best/1/bin”, and “DE/rand-to-best/2/bin” employ the information of the best individual found so far. They are suitable for solving unimodal problems but less effective on highly multimodal problems, where they easily get stuck at local optima. For these reasons, we do not put these strategies into the pool.

All strategies selected into the time-frame strategy candidate pool are shown in Table 1. Owing to its popularity in DE research, the binomial crossover operation is applied to the strategies in the pool, except for “DE/current-to-rand/1”.

Fig. 1 The process of TFADE

Fig. 2 A time-frame container of size TF

Table 3 Algorithmic description of TFADE
Table 4 Experimental results of TFADE with conventional DEs on D10 functions
Table 5 Experimental results of TFADE with conventional DEs on D30 functions

3.2 The parameter candidate pool

Two kinds of parameter candidate pools are maintained in TFADE: the time-frame parameter candidate pool and the extreme parameter candidate pool. Several empirical guidelines for choosing suitable parameter values for various problems have been investigated, and a detailed examination of the most important ones guides the selection of a good combination of parameters for the pools.

Storn and Price (1997) suggested that a reasonable value for the population size NP should be between 5D and 10D. A good initial choice of F is 0.5, and the effective range of F values is suggested to be between 0.4 and 1. If the population converges prematurely, either F or NP can be increased. A first reasonable attempt at choosing CR is 0.1; however, because a larger CR value can speed up convergence, a value of 0.9 for CR may be a good initial choice if the problem is nearly unimodal or fast convergence is desired. Gämperle et al. (2002) examined different parameter settings for DE on the Sphere, Rosenbrock, and Rastrigin functions. Their experimental results showed that the searching capability and convergence speed are very sensitive to the choice of NP, F and CR. They recommended that the population size NP should be between 3D and 8D, the scaling factor F equal to 0.6, and the crossover rate CR between 0.3 and 0.9. Ronkkonen et al. (2005) suggested that the population size NP should be between 2D and 4D, that F values between 0.4 and 0.95 with \(F=0.9\) as an initial choice work well, and that CR values should lie in [0, 0.2] for separable functions and in [0.9, 1] for multimodal and non-separable functions. Runarsson and Xin (2000) showed that although \(F\in \) [0.3, 1] has been found to be effective (Mezura-Montes and Coello Coello 2003), the use of a large population in conjunction with a large F will cause the algorithm to converge too slowly to produce a good result by the end of its run.

Fig. 3 The median convergence characteristics of TFADE with conventional DEs on D10 test functions: (a) the unimodal function \(f_{1}\), (b) the basic multimodal function \(f_{6}\), (c) the expanded multimodal function \(f_{13}\), (d) the hybrid composition function \(f_{15}\)

Fig. 4 The median convergence characteristics of TFADE with conventional DEs on D30 test functions: (a) the unimodal function \(f_{1}\), (b) the basic multimodal function \(f_{6}\), (c) the expanded multimodal function \(f_{13}\), (d) the hybrid composition function \(f_{15}\)

Montgomery and Chen (2010) presented an analysis of DE explaining why low and high values of CR are effective while values around 0.5 are not. At its extremes, CR leads to vastly different search behaviors. Low values of CR result in a search that is not only aligned with a small number of search space axes but also gradual, slow and robust. High values of CR result in a search where fewer of the generated solutions may be improving, but the improvements can be larger. Coello Coello (2002) showed that a key parameter affecting DE's performance is the crossover rate CR; CR = 0.9 has been found to work well across a large range of problem domains and is often used as a default value in many DE implementations. Yong et al. (2011) showed that three trial vector generation strategies, “rand/1/bin”, “rand/2/bin”, and “current-to-rand/1”, and three control parameter settings, [\(F =1.0\), CR = 0.1], [\(F =1.0\), CR = 0.9], and [\(F =0.8\), CR = 0.2], are frequently used in many DE variants, and their properties have been well studied. Mallipeddi et al. (2011) suggested that separable unimodal problems require lower CR values while parameter-linked multimodal problems require higher CR values. These guidelines are summarized in Table 2.

Based on these guidelines, the parameter values should be restricted, to enhance speed and effectiveness, while remaining diverse enough to handle various problems. Therefore, m values of F and CR are chosen in equal steps within the ranges [0.3, 1] and [0.1, 0.9], respectively, which nearly cover the above-mentioned guidelines, and are placed into the time-frame parameter candidate pool.

However, once premature convergence or stagnation occurs during evolution, some extraordinary parameter values need to be employed instead to break up the situation. The guidelines show that F is closely related to convergence speed: a higher F value can help avoid premature convergence or stagnation, while a lower F may increase convergence speed. CR is usually more sensitive to the specific problem: separable unimodal problems need lower CR values while multimodal ones require higher CR values. Therefore, an extreme parameter candidate pool is constructed in which only the extreme values \(F=\left\{ 1 \right\} \) and CR = \(\left\{ 0.1, 0.9 \right\} \) are included, to increase the chance of escaping stagnation. Thus, in the stagnation breaking scheme of TFADE, once premature convergence occurs during evolution, the extreme parameter candidate pool is enabled instead of the time-frame parameter candidate pool to deal with the difficulty. As for the parameter NP, it does not need to be fine-tuned, because just a few typical values can be tried according to the complexity of a given problem (Qin et al. 2009); therefore, it is left as a user-defined parameter.
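
A minimal sketch of the two parameter candidate pools follows; the equal-step construction and the value m = 5 (which matches the setting reported in Sect. 4) reflect our reading of the description above and are not a prescribed implementation:

```python
import numpy as np

m = 5  # number of F and CR values kept in the time-frame parameter candidate pool

# Time-frame parameter candidate pool: m values in equal steps over the
# suggested ranges F in [0.3, 1] and CR in [0.1, 0.9].
time_frame_pool = {
    "F":  np.linspace(0.3, 1.0, m),   # 0.3, 0.475, 0.65, 0.825, 1.0
    "CR": np.linspace(0.1, 0.9, m),   # 0.1, 0.3, 0.5, 0.7, 0.9
}

# Extreme parameter candidate pool, enabled only by the stagnation breaking scheme.
extreme_pool = {"F": [1.0], "CR": [0.1, 0.9]}
```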

3.3 The TFADE algorithm

The process of TFADE is shown in Fig. 1. The first important scheme is the time-frame container, which maintains the crucial information to be monitored during evolution, such as the strategy/parameter candidate pools, the selection probabilities of strategies, and the best fitness of individuals (discussed later), over a given period of preceding generations called the time frame. For each target vector in the current population, one trial vector generation strategy is selected from the time-frame strategy candidate pool in the container, according to the selection probability of the strategy, which is calculated from its success in generating promising solutions within the time frame. The selection probability of a strategy is defined as the normalized success rate with which a trial vector generated by that strategy successfully enters the next generation within the time frame. During evolution, the time frame moves forward along the time horizon of generations, and the selection probability of each strategy and the best fitness of individuals are gradually updated and stored in the container as different generations are experienced. A time-frame container of size TF evolving from generation G to G+t is illustrated in Fig. 2.
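
One possible realization of the time-frame container is a fixed-length sliding window over the last TF generations, sketched below; the record layout (per-generation strategy selections, successes, and best fitness) is our own assumption, since the paper prescribes only what is monitored, not the data structure:

```python
from collections import deque

class TimeFrameContainer:
    """Keeps per-generation records (strategy selections, successes, best fitness)
    for the most recent TF generations; older records are dropped automatically."""

    def __init__(self, TF):
        self.TF = TF
        self.records = deque(maxlen=TF)   # the earliest record is evicted on overflow

    def push(self, selections, successes, best_fitness):
        # selections[i][s] = 1 if strategy s generated the trial vector for target i
        # successes[i][s]  = 1 if that trial vector entered the next generation
        self.records.append(
            {"selections": selections, "successes": successes, "best": best_fitness}
        )

    def clear(self):
        # Used by the stagnation breaking scheme to discard the current records.
        self.records.clear()
```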

Table 6 Experimental results of TFADE with state-of-the-art DEs on D10 functions
Table 7 Experimental results of TFADE with state-of-the-art DEs on D30 functions
Fig. 5 The median convergence characteristics of TFADE with state-of-the-art adaptive DEs on D10 test functions: (a) the unimodal function \(f_{1}\), (b) the basic multimodal function \(f_{6}\), (c) the expanded multimodal function \(f_{13}\), (d) the hybrid composition function \(f_{15}\)

Fig. 6 The median convergence characteristics of TFADE with state-of-the-art adaptive DEs on D30 test functions: (a) the unimodal function \(f_{1}\), (b) the basic multimodal function \(f_{6}\), (c) the expanded multimodal function \(f_{13}\), (d) the hybrid composition function \(f_{15}\)

More specifically, for the selection probability of a strategy, assume a time frame of size TF and S strategies maintained in the time-frame strategy candidate pool. The selection probability of each strategy for a target vector \({{\mathbf x}}_{i}\), where \(i = 1, 2, \ldots\), NP, in the initialized population is 1/S, i.e., all strategies have the same probability of being selected at the beginning.

During evolution, at the current generation G after TF generations have elapsed, the success rate of a strategy s, i.e., the rate at which trial vectors \({{\mathbf u}}_{i}\) generated by that strategy enter the next generation, can be calculated by

$$\begin{aligned} \mathrm{SucRate}_{i,s,G} =\frac{\sum \nolimits _{g=G-\mathrm{TF}}^G {\mathrm{Success}_{i,s,g} } }{\sum \nolimits _{g=G-\mathrm{TF}}^G {\mathrm{Selection}_{i,s,g} } }+\varepsilon , \end{aligned}$$
(10)

where Selection\(_{i,s,g}=\left\{ 0, 1 \right\} \) indicates whether the strategy s is selected to generate \({{\mathbf u}}_{i}\) at generation g, and Success\(_{i,s,g}=\left\{ 0, 1 \right\} \) indicates whether the resulting trial vector successfully entered the next generation at generation g. \(g=(G-\mathrm{TF}), (G-\mathrm{TF}+1), \ldots , G\) denotes the generations within the time frame. Note that a small value \(\varepsilon \) is added to ensure a nonzero success rate, so that all strategies in the time-frame strategy candidate pool always have a chance of being selected.

Therefore, the selection probability of the strategy s at generation G for a target vector \({{\mathbf x}}_{i}\) is the normalized success rate of the strategy s:

$$\begin{aligned} \mathrm{SelProb}_{i,s,G} =\frac{\mathrm{SucRate}_{i,s,G} }{\sum \nolimits _{s=1}^S {\mathrm{SucRate}_{i,s,G} } }, \end{aligned}$$
(11)

where \(s=1,2, \ldots , S\). At this stage, the roulette wheel selection method is employed to select a promising trial vector generation strategy for the target vector \({{\mathbf x}}_{i}\).
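
Equations (10) and (11), together with the roulette wheel selection, might be implemented as sketched below; this reuses the record layout assumed in the container sketch above, and the guard against division by zero as well as the concrete value of \(\varepsilon \) are our additions:

```python
import numpy as np

def select_strategy(records, i, S, eps=0.01, rng=np.random.default_rng()):
    """Compute SucRate (Eq. 10) and SelProb (Eq. 11) for target vector i over the
    records kept in the time frame, then pick a strategy by roulette wheel."""
    selections = np.zeros(S)
    successes = np.zeros(S)
    for rec in records:                       # generations g = G-TF, ..., G
        selections += rec["selections"][i]
        successes += rec["successes"][i]
    # Eq. (10): success rate per strategy, with eps keeping every rate nonzero.
    suc_rate = successes / np.maximum(selections, 1) + eps
    # Eq. (11): normalized success rates become the selection probabilities.
    sel_prob = suc_rate / suc_rate.sum()
    # Roulette wheel selection of one strategy index.
    return rng.choice(S, p=sel_prob)
```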

After a trial vector generation strategy is selected, the values of its control parameters, F and CR, are randomly chosen from the time-frame parameter candidate pool. The selected strategy and control parameter values are then applied to the corresponding target vector to generate a trial vector.

All these dynamic data are maintained in the time-frame container and gradually adapted during evolution. Once the records exceed TF generations, the earliest record is removed, keeping the most up-to-date TF generations of records in the container to be monitored.

The above procedure constitutes the typical evolution process of TFADE. However, if premature convergence or stagnation occurs throughout the time frame of size TF during evolution, the typical evolution process in pursuit of promising solutions may have become dysfunctional at this stage. To escape this situation, the stagnation breaking scheme immediately takes control of the strategy/parameter selection. This mechanism discards the current selection records in the container at once, stops monitoring and recording for TF generations, and randomly selects strategies and parameters from the time-frame strategy candidate pool and the extreme parameter candidate pool, respectively, for TF generations to break out of the difficulty. After TF generations, the typical evolution process automatically switches back on to adapt promising solutions again in the current population. The algorithmic description of TFADE is given in Table 3.
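
The switch between the typical evolution process and the stagnation breaking scheme might look as follows; this sketch reuses the helpers and pools from the earlier sketches, and the stagnation test (no improvement of the recorded best fitness over the whole TF window) is our interpretation of the monitored information, not a criterion stated explicitly above:

```python
import numpy as np

def stagnation_detected(container):
    """Assumed criterion: the best fitness recorded in the time-frame container has
    not improved over the whole TF-generation window."""
    best = [rec["best"] for rec in container.records]
    # Best fitness is non-increasing under greedy selection, so "no improvement"
    # means the latest value is no better than the oldest one in the window.
    return len(best) == container.TF and min(best) >= best[0]

def choose_strategy_and_params(container, i, S, breaking, time_frame_pool,
                               extreme_pool, rng=np.random.default_rng()):
    """If the stagnation breaking scheme is active, pick the strategy and the extreme
    parameters purely at random; otherwise use the adaptive selection probabilities."""
    if breaking:
        s = rng.integers(S)
        F = rng.choice(extreme_pool["F"])
        CR = rng.choice(extreme_pool["CR"])
    else:
        s = select_strategy(container.records, i, S, rng=rng)
        F = rng.choice(time_frame_pool["F"])
        CR = rng.choice(time_frame_pool["CR"])
    return s, F, CR
```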

Table 8 Experimental results of TFADE with non-DE approaches on D10 functions

4 The experimental study

Three kinds of experiments were conducted in the experimental study. First, TFADE was compared with 4 commonly used conventional DEs coupled with widely suggested parameter settings. Next, TFADE was compared with 3 outstanding state-of-the-art adaptive DEs proposed recently, i.e., SaDE (Qin et al. 2009), EPSDE (Mallipeddi et al. 2011), and CoDE (Yong et al. 2011). Last, 2 novel non-DE algorithms, i.e., CLPSO (Liang et al. 2006) and CMA-ES (Hansen and Ostermeier 2001), were also compared. To evaluate the performance of TFADE against these approaches, a test suite of benchmark functions (Suganthan et al. 2005) reported in the 2005 IEEE Congress on Evolutionary Computation (CEC2005) was adopted. The test suite comprises 25 benchmark functions that can be divided into 4 groups:

1. 5 unimodal functions: \(f_1 -f_5 \),
2. 7 basic multimodal functions: \(f_{6} -f_{12} \),
3. 2 expanded multimodal functions: \(f_{13} -f_{14} \),
4. 11 hybrid composition functions: \(f_{15} -f_{25} \).

Both the 10-dimensional (D10) and 30-dimensional (D30) versions of the benchmark functions were considered. 30 independent runs were conducted for each algorithm on each function. The population size NP was set to 50. The maximum number of function evaluations, used as the termination criterion, was set to 10,000 times the function dimension, i.e., 100,000 for the D10 functions and 300,000 for D30. The mean and standard deviation of the function error \(f(x_\mathrm{best} )-f(x^*)\) were calculated to measure the performance of each algorithm on a function f, where \(x_\mathrm{best}\) is the best solution of the function found by an algorithm in an independent run and \(x^{*}\) is the global optimum of the function. The number of preceding generations for the time frame in TFADE was set to 10, and \(m=5\) was chosen as the number of F and CR values placed into the time-frame parameter candidate pool.

Table 9 Experimental results of TFADE with non-DE approaches on D30 functions

To assess the statistical significance of the difference between two algorithms, a t test with a significance level of \(\alpha =0.05\) was performed on each pair of experimental results between TFADE and a compared algorithm on each function. The indicators "\(+\)", "\(=\)", and "\(-\)" denote that the outcome of the corresponding algorithm is statistically better than, similar to, and worse than that of TFADE on a function, respectively. To show the overall degree of superiority of TFADE against a competitor on the test suite of all 25 functions, we define a quality measurement called the advantage percentage of TFADE, denoted by %adv, as given in Eq. (12).

$$\begin{aligned} \% \mathrm{adv}=\frac{\text{ num. } \text{ of } \text{ worse-performance } \text{ func. } (\text{``}-\text{''})-\text{ num. } \text{ of } \text{ better-performance } \text{ func. } (\text{``}+\text{''})}{\text{ total } \text{ func. } (25)-\text{ num. } \text{ of } \text{ similar-performance } \text{ func. } (\text{``}=\text{''})} \end{aligned}$$
(12)

All these performance records are listed in the last 4 rows of the experimental result tables for comparison.
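
For illustration, with hypothetical counts that are not taken from the result tables: if a competitor were statistically worse than TFADE on 20 functions, better on 1, and similar on 4, then

$$\begin{aligned} \% \mathrm{adv}=\frac{20-1}{25-4}=\frac{19}{21}\approx 90\,\% . \end{aligned}$$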

Table 10 The advantage percentages of TFADE

4.1 Comparison with conventional DEs

Five test instances of four conventional DEs with fixed control parameter settings, “Rand/1/bin (\(F=0.9\), CR = 0.1)”, “Rand/1/bin (\(F=0.9\), CR = 0.9)”, “Rand-to-best/1/bin (\(F=0.9\), CR = 0.1)”, “Current-to-rand/1 (\(F=0.9\), CR = 0.1)” and “Rand/2/bin (\(F=0.9\), CR = 0.1)”, were compared with TFADE. The experimental results on the D10 and D30 functions are presented in Tables 4 and 5, respectively. Figures 3 and 4 show the convergence characteristics of each test instance for 4 functions in terms of the median of the best fitness over 30 independent runs on the D10 and D30 functions, respectively. The 4 functions, one chosen from each group, are (a) the unimodal function \(f_{1}\), (b) the basic multimodal function \(f_{6}\), (c) the expanded multimodal function \(f_{13}\), and (d) the hybrid composition function \(f_{15}\).

The experimental results show that TFADE is clearly outstanding among these test instances on most D10 and D30 functions, especially on the groups of unimodal functions (\(f_1 -f_5 )\) and basic multimodal functions (\(f_{6} -f_{12} )\): none of the competitors is significantly better than TFADE on the D10 and D30 test functions in these two groups, except “Current-to-rand/1 (\(F=0.9\), CR = 0.1)” on 2 D10 functions. The advantage percentages of TFADE over each competitor on the test suite are 87, 83, 83, 77, and 83 % for the D10 functions and 90, 95, 90, 100, and 90 % for D30, respectively. Overall, the performance of TFADE is significantly better than that of these competitors; in particular, it markedly outperforms them on both the unimodal and basic multimodal function groups and on the high-dimensional (30-dimensional) functions.

4.2 Comparison with state-of-the-art DEs

TFADE was also compared with three state-of-the-art adaptive DE variants proposed recently, i.e., SaDE, EPSDE, and CoDE. For these approaches, the control parameter settings from their original papers were employed. The experimental results on the D10 and D30 functions are presented in Tables 6 and 7, respectively. Figures 5 and 6 also illustrate the convergence characteristics of each approach for the 4 aforementioned functions in terms of the median of the best fitness over 30 independent runs on the D10 and D30 functions, respectively.

Clearly, the experimental results show that TFADE also outperforms the state-of-the-art adaptive DE variants on all D10 and D30 functions in the groups of unimodal functions (\(f_1 -f_5 )\) and basic multimodal functions (\(f_{6} -f_{12} )\). None of them is significantly better than TFADE on the D10 and D30 test functions, except EPSDE on 2 D30 functions and CoDE on 2 D30 functions. The advantage percentages of TFADE over each approach on the test suite are 78, 95, and 80 % for the D10 functions and 79, 65, and 71 % for D30, respectively. In summary, the performance of TFADE is better than that of SaDE, EPSDE, and CoDE, especially on the groups of unimodal and basic multimodal functions.

4.3 Comparison with non-DE approaches

Two novel non-DE algorithms, CLPSO and CMA-ES, were compared with TFADE in this experiment. CLPSO, proposed by Liang et al. (2006), is an outstanding PSO variant, and CMA-ES, proposed by Hansen and Ostermeier (2001), is a very efficient and well-known evolution strategy. Both algorithms were run with the control parameter settings used in their original papers. The experimental results on the D10 and D30 functions are shown in Tables 8 and 9, respectively.

The experimental results show that TFADE performs better than CLPSO and CMA-ES on 21 and 22 out of 25 D10 functions, and on 20 and 12 out of 25 D30 functions, respectively. The advantage percentages of TFADE over the two are 84 and 88 % for the D10 functions and 83 and 57 % for D30, respectively. In summary, TFADE has a clear advantage over CLPSO and CMA-ES, except that its advantage over CMA-ES on the high-dimensional (30-dimensional) functions is only slight.

5 Conclusion

The performance of DE is largely determined by the trial vector generation strategy and the control parameter setting, and the effectiveness of a strategy is significantly influenced by its control parameter values. In recent years, DE researchers have reported many empirical guidelines for selecting suitable trial vector generation strategies and control parameter settings for problems with various features. A comprehensive exploitation of this experience is a good way to develop an advanced DE variant.

Towards this direction, in this paper, an improved DE variant with time-frame strategy adaptation called the time-frame adaptive differential evolution (TFADE) algorithm is proposed. It employs diverse trial vector generation strategies with various control parameter values that exhibit distinct characteristics. Both the strategies and the control parameter values can be adaptively determined to generate promising solutions and dynamically adjusted to cope with premature convergence or stagnation when it occurs during evolution, according to successful experience over a period of preceding generations called the time frame.

Although TFADE is simple and easy to implement, its performance is outstanding. In the experimental study, the performance of TFADE was compared with that of 5 test instances of 4 commonly used conventional DEs, i.e., “Rand/1/bin (\(F=0.9\), CR = 0.1)”, “Rand/1/bin (\(F=0.9\), CR = 0.9)”, “Rand-to-best/1/bin (\(F=0.9\), CR = 0.1)”, “Current-to-rand/1 (\(F=0.9\), CR = 0.1)” and “Rand/2/bin (\(F=0.9\), CR = 0.1)”, 3 outstanding state-of-the-art adaptive DEs, i.e., SaDE, EPSDE, and CoDE, and 2 novel non-DE approaches, i.e., CLPSO and CMA-ES, evaluated on a test suite of 25 benchmark functions reported in CEC2005. The experimental results show that its comprehensive performance is significantly superior to that of these competitors. The overall degree of superiority of TFADE against these competitors, i.e., the advantage percentage (%adv), also shows that TFADE has an excellent advantage over them. The advantage percentages of TFADE on the experimental results are summarized in Table 10.

Several directions are considered for further research. From the literature review, we conclude that putting too many strategies and parameter values into the strategy/parameter candidate pools can have an unfavorable and ineffective influence, which could reduce the performance of DE. Therefore, the strategies and parameter settings involved in the time-frame container should be restricted while preserving diversity. It is worth studying whether the performance of TFADE can be significantly improved if the number of strategies and parameter values placed into the container is dynamically adapted during evolution: which strategies and parameter values, and how many of them, should be combined to be more effective? Moreover, the practical application of TFADE is also an important issue. In recent years, many distinctive topics have been reported, such as industrial applications on wireless sensor networks and RFID (Zhang et al. 2014a, b, 2015b), web-based mobile applications (Zhang 2012; Zhang and Liang 2013; Zhang et al. 2015a; Zhang and Zhang 2011), image de-noising (Zhang et al. 2012b), and the Internet of Things (Zhang et al. 2012a). The practical application of TFADE to these subjects is well worth investigating in the future.