1 Introduction

The differential evolution (DE) (Storn and Price 1995, 1996) algorithm was proposed by Price and Storn in 1995 and has been a very competitive form of evolutionary computing afterward. As a simple powerful search technique, DE is always employed for solving complex continuous nonlinear functions. With a random population by initializing solutions, the DE algorithm employs mutation and crossover operators to generate new candidate solutions, and utilizes a simple selection operator to determine whether the offspring should replace their parents in the next generation. Compared with most other evolutionary algorithms (EAs), DE is simpler and much easier to be implemented. Moreover, the gross performance of DE in terms of accuracy, fast convergence speed and robustness makes it as an attractive algorithm to be applied on various real-world optimization problems. In addition, the number of control parameters in DE is very few (mutation control parameter, also called scaling factor , crossover control parameter, also called crossover rate and population size in classical DE). Therefore, the DE algorithm has gained much attention with successful applications in data mining (Zhu et al. 2012; Tvrdík and Křivý 2015), scheduling (Mokhtari and Salmasnia 2015), construction engineering (Ho-Huu et al. 2015), pattern recognition (Secmen and Tasgetiren 2013), signal processing (Sheniha 2013), chemical engineering (Sharma and Rangaiah 2013), power system (Zhang et al. 2015), image processing (Ali et al. 2014), and in other domains (Al-Dabbagh et al. 2014; Rakshit and Konar 2015; Das and Prasad 2015).

Nevertheless, the performance (Liu and Lampinen 2002) of the DE algorithm is sensitive to the mutation strategy, which exists many different DE trial vector generation strategies and respective control parameters such as the population size (NP), crossover rate (Cr) and the scale factor (F). In various search phases of the evolution process, these strategies often possess different searching capabilities. And, the best settings of the control parameters vary for different optimization problems, and for different requirements on consumption time and accuracy when the optimization problem is same. Therefore, it is necessary to find the most appropriate strategy and its corresponding parameters. To achieve this, however, a process of trial-and-error search is needed to be performed, which definitely will suffer a high computation cost. Computational costs may be induced by the fact that the population of DE is evolved through different regions in the search space, within which different strategies and respective different parameter settings (Qin et al. 2009). Recently, various offspring generation strategies and parameter adaptation mechanisms have been developed to enhance the reliability and robustness of DE. For example, Brest and Mernik (2008) presented the jDE algorithm, which owned self-adapting parameters and by encoding the parameters into each individual and adapting them by means of evolution, and obtained the mutation strategy as DE/rand/1. Qin and Suganthan (2005) presented a self-adaptive variant of DE (SaDE), where trial vector generation strategies were gradually self-adapted by learning from their prior experiences in generating promising solutions. Wang et al. (2011) proposed a systematic framework for combining different trial vector generation strategies, called composite DE (CoDE), in which three well-studied offspring generation strategies are coupled with three-parameter settings randomly to generate trial vectors in the following way. Mallipeddi et al. (2011) proposed an ensemble of mutation strategies and parameter values for DE (EPSDE). Mutation strategies from mutation strategies pool can involve corresponding parameters from control parameters’ values pool to produce their own offspring. Then, the optimized offspring can be obtained by further competing. Zhang and Sanderson (2009) presented JADE algorithm, which implemented a new mutation strategy DE/target-to-pbest/1 and updated control parameters in an adaptive manner. Based on JADE, Brown et al. (2015) proposed a new DE with small population, namely \({\upmu }\hbox {JADE}\). In \({\upmu }\hbox {JADE}\), a new mutation strategy DE/current-by-rand-to-pbest/1 is introduced. Kundu et al. (2014) presented a modified semi-adaptive DE, namely MSeDE. In MSeDE, a new mutation scheme DE/current-to-constr_best/1 and a new crossover scheme p-BCX are used. Yu et al. (2014) presents an adaptive DE algorithm, namely ADE. In ADE, a new mutation strategy DE/lbest/1 and a two-level adaptive parameter control scheme are used. The new strategy is a variant of DE/best/1 strategy, which multiple locally best individuals instead of one globally and the two-level adaptive parameter control scheme includes population-level and individual-level parameter control. Mohamed (2015) presented an improved DE algorithm, namely IDE. In IDE, a new triangular mutation strategy based on the convex combination vector of the triplet, which is defined by three randomly chosen vectors and the difference vector between the best and the worst individuals among the three randomly selected vectors is introduced. Liu et al. (2014) proposed a random-based differential evolution with neighborhood mutation, namely NRDE. In NRDE, two mutation schemes are used. The mutation schemes are random-based mutation scheme and neighborhood mutation scheme, the best vector of which will be replace the target vector. Han et al. (2013) presented a differential evolution with local information, namely DELI. In DELI, considering both global information and local information, a new mutation operation is applied to generate a mutated individual. In order to get an appropriate combination of strategies and control parameters for different problems, many other adaption techniques have been developed (Tang et al. 2015; Wang et al. 2014; Cai and Wang 2015; Mallipeddi and Lee 2015). Although different partial adaptation schemes have been proposed to overcome the trial-and-error procedure, many strategies are hard to maintain the balance between the global exploration and the local exploitation. Here, we demonstrate the superior performance of the proposed mutation strategy with various DE algorithms based on niche on maintaining he balance between the global exploration and the local exploitation.

Fig. 1
figure 1

Basic flowchart of DE

In the paper, the clearing niche method (Petrowski 1996) is integrated with mutation strategies to enhance population diversity, improve the search ability, and accelerate the convergence rate of DE algorithms. The niche techniques (Petrowski 1996; Mahfoud 1995; Yin and Germay 1993; Petrowski and Genet 1999; Sareni and Krahenbuhl 1998) are regarded as effective methods to maintain the balance between both population diversity and the search domain. They aim at gathering the individuals on several peaks of fitness function in the population according to genetic likeness, and then permit DE to investigate those peaks in parallel. The individual with a high fitness in the niche is keeping its fitness, while the others in the niche are changed to reduce their fitness values sharply. Hence, the individuals in the population may be dispersed into the whole search space. Thus, some diversity can be maintained effectively during the generations in the population. Several niche techniques have been proposed, such as crowding methods (Mahfoud 1995), clustering-based methods (Yin and Germay 1993), speciation tree methods (Petrowski and Genet 1999), fitness-sharing methods (Sareni and Krahenbuhl 1998), and clearing methods (Petrowski 1996). Also, some niche-based DE algorithms have been presented. Thomsen (2004) presented a DE algorithm based on crowding and fitness-sharing scheme to tackle multimodal optimization, namely CrowdingDE and SharingDE. Different from conventional DE, CrowdingDE modifies the selection operation, in which the offspring replaces the most similar individual among a (the crowding factor) subset of the population. In the selection operation of SharingDE, all parents and offsprings are added to the population enlarging the population, and all the individuals are rescaled using the sharing function, then the individuals are sorted with the new fitness and the worst half of the population will be removed. Li (2005) extended DE with the notion of speciation for solving multimodal optimization problems, namely SDE. SDE locates multiple global optima simultaneously through the adaptive formation of multiple species. Each species is evolved by its own DE process, which tries to successively improve itself. Epitropakis et al. (2012) proposed two new DE mutation strategies, namely DE/nrand/1 and DE/nrand/2. In DE/nrand/1 and DE/nrand/2, the local information from the current population is incorporated into the mutation schemes, when each individual is evolved by applying its nearest neighbor individual as a base vector. Based on DE/nrand/1, Epitropakis et al. (2013) proposed a new niching DE algorithm with dynamic archive to overcome the population size influence and produce good performance almost independently of its population size, namely dADE/nrand/1 algorithm, which involves in incorporation between a control parameter adaptation technique and an external dynamic archive along with a re-initialization mechanism. The control parameter adaptation technique, proposed in the context of JADE algorithm (Zhang and Sanderson 2009), is designed to efficiently adapt the control parameters of the algorithm. Meanwhile, the external dynamic archive along with a re-initialization mechanism aims to alleviate the problem, that is, have to tune the population size and allow the algorithm to maintain good performance regardless of the population size value. The control parameter adaptation technique incorporates a dynamic archive proposed in Zhai and Li (2011), aiming to record good solutions found along with a re-initialization procedure to continue searching for additional good solutions in unexplored regions of the search space. Biswas et al. (2014) presents a DE variant with parent centric mutation that makes use of normalized search neighborhood and integrates with the proximity-based crowding technique, namely PNPCDE. The PNPCDE does not make use of the problem-dependent niching parameters (like niche radius), which are hardly determined the values. The mutation operator helps to maintain the population diversity at an optimum level by using well-defined local neighborhoods. Zhang et al. (2015) presented a DE with dynamic niche radius strategy, namely DNRDE. In DNRDE, the niche radius is adjusted dynamically to make the algorithm navigate from global exploration to local exploitation by a new two-stage annealing schedule. At first stage, exploration dominates the search process, and the radius decrease dynamically. When the niche radius reaches to a cutoff value, it will be stable. At second stage, exploitation takes over to enhance the quality of the acquired optima.

In this paper, the clearing niche method is used in the mutation strategy. Compared with other niche techniques, such as speciation tree and sharing fitness methods, the clearing niche method may maintain the population diversity effectively with a lower population size, and is also simpler to implement. However, in contrast to the speciation tree method, the clearing niche method needs a problem-dependent parameter, namely, the niche radius. In such case, the DE algorithm will be sensitive to niche radius, since its performance will be altered by changing the setting of niche radius, this impact of radius changes on algorithms will be demonstrated by numerical experiment in this paper.

The paper is organized as follows. In Sect. 2, we introduce the basic differential evolution and several well performance DE variants. In Sect. 3, we describe the proposed mutation strategy in details, and we introduce the change mutation strategies in four state-of-the-art DE variants. In Sect. 4, the paper lists the functions, and their corresponding simulated diagrams. In Sect. 5, numerical experiments are presented. In Sect. 6, the concluding remarks are contained.

2 Differential evolution algorithms

In this section, we describe the basic differential evolution and several variants of DE.

2.1 Brief description of basic differential evolution

There are several strategies of DE that are proposed in the literature (Zaharie 2009), one of which DE/rand/1/bin is used widely. Accordingly, we choose DE/rand/1/bin as example to introduce DE. Just like other Evolution Algorithms (EAs), DE uses mutation and crossover to generate new individuals. DE has four basic operations involving in initialization, mutation, crossover, and selection. The whole flow chart of DE is shown in Fig. 1. In DE-literature, a parent vector from the current generation is called target vector, a mutant vector obtained through the differential mutation operation is known as donor vector, and finally an offspring formed by recombining the donor with the target vector is called trial vector. The details of DE operations are described as follows.

(1) Coding. DE is a global optimization algorithm, and individuals in population are encoded using real number.

(2) Individual. NP denotes size of the population in DE. The ith individual at Gth generation is denoted by \(X_{i,G} =[{x_{1,i,G} ,x_{2,i,G}, x_{3,i,G}, \ldots , x_{D,i,G}}]\), where D is dimension.

(3) Initializing population. The initial population (at \(G=0)\) should cover the entire search space as much as possible by uniformly randomizing individuals within the search space constrained by the prescribed minimum and maximum bounds: \(x_{j,\mathrm{min}} \in X_\mathrm{min} =\{{x_{1,\mathrm{min}} ,x_\mathrm{2,min} ,\ldots ,x_{D,\mathrm{min}}}\}\) and \(x_{j,\mathrm{max}}\in X_{\mathrm{max}} =\{{x_{1,\mathrm{max}}, x_{2,\mathrm{max}}, \ldots ,x_{D,\mathrm{max}}}\}\). Hence the jth component of the ith individual should be initialized as \(x_{j,i,0} =x_{j,\mathrm{min}} +{rand}_{i,j}(0,1)\times (x_{j,\mathrm{max}}-x_{j,\mathrm{min}})\).

(4) Mutation. Differential idea is embodied into mutation operation. According to the strategy DE/rand/1/bin, the general process of mutation is expressed by Eq. (1):

$$\begin{aligned} V_{i,G} =X_{r_1 ,G}+F\times \left( {X_{r_{2} ,G}-X_{r_{3} ,G}}\right) \end{aligned}$$
(1)

where i is the ith individual vector of current generation. \(X_{r_1 ,G}\), \(X_{r_{2}, G}\) and \(X_{r_{3}, G}\) are other three individuals, which are from current generation, where \(i\ne r_1 \ne r_{2}\ne r_{3}\). \(V_{k,G}\) is donor vector. \(F\in [0,1]\) is mutation control parameter.

There are usually following differential evolution strategies when crossover operation is bin:

  • DE/rand/1/bin: \(V_{i,G}=X_{r_1 ,G} +F\times (X_{r_{2} ,G}-X_{r_{3}, G})\)

  • DE/best/1/bin: \(V_{i,G} =X_{\mathrm{best},G} +F\times (X_{r_1 ,G}-X_{r_{2}, G})\)

  • DE/target-to-best/1/bin: \(V_{i,G} =X_{i,G} +F\times (X_{\mathrm{best},G} -X_{i,G})+F\times (X_{r_1 ,G} -X_{r_{2} ,G})\)

  • DE/best/2/bin: \(V_{i,G} =X_{\mathrm{best},G} +F\times (X_{r_1,G} -X_{r_{2}, G})+F\times (X_{r_{3} ,G} -X_{r_{4} ,G})\)

  • DE/rand/2/bin: \(V_{i,G} =X_{r_1 ,G} +F\times (X_{r_{2} ,G} -X_{r_{3},G})+F\times (X_{r_{4} ,G} -X_{r_{5} ,G})\)

where \(X_{\mathrm{best},G}\) means the best individual in the Gth generation. \(X_{i,G}\) means the ith individual in the Gth generation. \(X_{r_1, G}\), \(X_{r_{2}, G} \), \(X_{r_{3} ,G} \), \(X_{r_4 ,G} \), and \(X_{r_{5}, G}\) are other five individuals, which are from current generation, where \(i\ne r_1 \ne r_{2} \ne r_{3} \ne r_4 \ne r_{5} \).

(5) Crossover. Crossover probability is \(Cr\in [0,1]\). Crossover means to swap the dimensions between the donor vector and the target vector controlled by crossover parameter Cr. Binomial crossover and exponential crossover are two different crossover strategies. Mahfoud (1995) analyzed the influence of crossover on the behavior of DE. The numerical experiments illustrate that main difference between them is the different distributions of the number of mutated components. Theoretical analysis shows that the behavior of exponential crossover variants is more sensitive to the problem dimension than binomial crossover variants’. Usually the binomial crossover is accepted, which is described as Eq. (2):

$$\begin{aligned} u_{j,i,G} =\left\{ {{\begin{array}{ll} v_{j,i,G}, &{}\quad \hbox {if}\left( rand_{i,j}(0,1)\le Cr\, or\, j=j_{rand}\right) \\ x_{j,i,G}, &{}\quad \hbox {otherwise}\\ \end{array}}}\right. \end{aligned}$$
(2)

where j is dimension, i is individual, G is generation. \(v_{j,i,G} \) is coming from donor vector \(V_{k,G} \), \(x_{j,i,G}\) is coming from target vector \(X_{k,G}. j_{rand} \in [1,2,\ldots , D]\) is a randomly chosen index.

(6) Selection. The offspring or trial vector \(X_{i,G+1}\) can be obtained through comparing the fitness value of trial vector \(U_{i,G}\) and target vector \(X_{i,G}\) according to Eqs. (3) and (4).

$$\begin{aligned} X_{i,G+1}= & {} U_{i,G}\, \quad if\, f\left( {U_{i,G} } \right) \le f\left( {X_{i,G}}\right) \end{aligned}$$
(3)
$$\begin{aligned} X_{i,G+1}= & {} X_{i,G}\, \quad if\, f\left( {U_{i,G} } \right) >f\left( {X_{i,G} } \right) \end{aligned}$$
(4)

2.2 Some variants of DE

  1. (1)

    The jDE algorithm

Brest and Mernik (2008) proposed the jDE algorithm, in which the control parameters F and Cr are encoded into the individual \(X_{i,G} =\langle \vec {x}_{i,G}, F_{i,G}, Cr_{i,G}\rangle \) and adjusted by two new arguments \(\tau _{1}\) and \(\tau _{2}\). They are calculated independently, as shown in Eqs. (5), (6)

$$\begin{aligned}&F_{i,G+1}=\left\{ {{\begin{array}{ll} F_{l,G} +rand_{1}\times F_{u,G},&{}\quad \hbox {if } rand_{2}<\tau _{1}\\ F_{i,G}, &{} \quad \hbox {otherwise} \\ \end{array}}}\right. \end{aligned}$$
(5)
$$\begin{aligned}&Cr_{i,G+1}=\left\{ {{\begin{array}{ll} rand_{3}, &{}\quad \hbox {if } rand_{4}<\tau _{2}\\ Cr_{i,G}, &{}\quad \hbox {otherwise} \\ \end{array}}}\right. \end{aligned}$$
(6)

where \(rand_k\), \(k\in \{1,2,3,4\}\) are uniformly distributed random values belonging to the range [0, 1]; \(\tau _{1}\) and \(\tau _{2}\) are constant values that represent the probabilities of parameters being adjusted; \(F_{u,G}\) and \(F_{l,G}\) are also constant values, which denote the upper and lower bounds of the parameters, respectively. The newly generated \(F_{i,G+1}\) and \(Cr_{i,G+1}\) are procured before the mutation is implemented. Thus, the scheme influences the mutation, crossover and selection operations of the new vector.

  1. (2)

    The JADE algorithm

Zhang and Sanderson (2009) proposed the JADE algorithm, in which a novel mutation strategy and an optional external archive are utilized to provide information of progress direction. This DE/target-to-best strategy uses multiple best solutions to balance the greediness of the mutation and the diversity of the population, which is generated using Eq. (7):

(7)

where \(X_{best,G}^{p}\) is the individual that is randomly selected from the top 100p % of the current population, with \(p\in (0, 1]\). \(X_{i,G}\), \(X_{\mathrm{best},G}^{p}\) and \(X_{r1,G}\) are chosen from the current population P. \(\tilde{X}_{r2,G}\) is randomly selected from the union, \(P\cup A\), while A, an archive, is employed to store the recently explored inferior solutions. The archive operation is made very simplely to avoid significant computation overhead. The archive is initiated to be empty. Then, after each generation, the parent solutions that fail in the selection process are added to the archive. If the archive size exceeds a certain threshold, say NP, then some solutions are randomly removed from the archive to keep the archive size at NP. \(F_{i,G}\) denotes the scaling factor and \(Cr_{i,G}\) denotes the crossover rate associated with the ith individual and they are updated dynamically in each generation, as shown in Eqs. (8), (9).

$$\begin{aligned}&F_{i,G}=randc_{i}\left( u_F ,0.1\right) \end{aligned}$$
(8)
$$\begin{aligned}&Cr_{i,G}=randn_{i}\left( u_{Cr}, 0.1\right) \end{aligned}$$
(9)

\(F_{i,G} \) and \(Cr_{i,G}\) are generated according to a Normal distribution and a Cauchy distribution with associated mean values \(u_{F}\) and \(u_{Cr}\). The proposed two location parameters are initialized to be 0.5 and then updated at the end of each generation according to Eqs. (10), (11).

$$\begin{aligned}&u_F=\left( 1-c\right) \times u_F +c\times \mathrm{mean}_{L}\left( S_{F}\right) \end{aligned}$$
(10)
$$\begin{aligned}&u_{Cr}=\left( 1-c\right) \times u_{Cr}+c\times \mathrm{mean}_{A}\left( S_{Cr}\right) \end{aligned}$$
(11)

where c is a positive constant in the range(0,1); \(S_F\) and \(S_{Cr} \) denote the set of all successful mutation/crossover rates; \(\mathrm{mean}_{A}(\cdot )\) indicates the usual arithmetic mean and \(\mathrm{mean}_{L}(\cdot )\) returns the Lehmer mean shown as Eq. (12).

$$\begin{aligned} \mathrm{mean}_{L}\left( S_{F}\right) =\frac{\sum \nolimits _{i=1}^{\left| S_{F}\right| } F_{i}^{2}}{\sum \nolimits _{i=1}^{\left| S_{F}\right| }F_i} \end{aligned}$$
(12)

How to get archive A, \(S_{F}\) and \(S_{Cr}\).

In the selection process, A, \(S_F \) and \(S_{Cr}\) will be got.

$$\begin{aligned}&\hbox {If }\quad f(X_{i,G})\le f(U_{i,G})\quad X_{i,G+1}=X_{i,G}\\&\hbox {Else }\quad X_{i,G+1}=U_{i,G}; X_{i,G}\rightarrow A;Cr_{i}\rightarrow S_{Cr} ,F_i \rightarrow S_{F} \end{aligned}$$
  1. (3)

    The SaDE algorithm

Qin and Suganthan (2005) presented the SaDE, where one trial vector generation strategy was chosen from the candidate pool(“DE/rand/1”, “DE/rand/2”,“DE/target-to-best/2” and “DE/target-to-rand/1”), according to the probability learned from its success rate in generating promising solutions within a certain number of previous generations, called the learning period(LP). More specifically, these probabilities are initially equal and then gradually self-adapted upon Eq. (13)

$$\begin{aligned} P_{k,G}=\frac{S_{k,G}}{\sum \nolimits _{k=1}^K S_{k,G}} \end{aligned}$$
(13)

where \(P_{k,G}\), \(k=1,2,\ldots ,K\), denotes the probability of applying the kth strategy. Here K is the total number of strategies contained in the pool. \(S_{k,G}\) is the success rate of the trial vector, which is generated by the kth strategy and successfully enters the next generation according to Eq. (14):

$$\begin{aligned} S_{k,G} =\frac{\sum \nolimits _{g=G-LP}^{G-1} ns_{k,g}}{\sum \nolimits _{g=G-LP}^{G-1} ns_{k,g}+\sum \nolimits _{g=G-LP}^{G-1} nf_{k,g}}+\varepsilon \end{aligned}$$
(14)

where \(ns_{k,g}\) and \(nf_{k,g}\) record the number of trial vectors generated by the kth strategy that are retained or discarded in the selection operation in the last LP generations. The small constant value \(\varepsilon =0.01\) is used to avoid the possible null success rates. At each generation, for each solution in the current population, the parameters \(F_{i,k}\) and \(Cr_{i,k}\) are independently calculated upon Eqs. (15), (16).

$$\begin{aligned}&F_{i,k}=randn_i \left( {0.5,0.3}\right) \end{aligned}$$
(15)
$$\begin{aligned}&Cr_{i,k}=randn_i \left( {Crm_k ,0.1}\right) \end{aligned}$$
(16)

where coefficients are respectively generated for each individual by sampling their values from a normal distribution. Nevertheless, the mean value of Cr (\(Crm_{k}\)) is gradually self-adapted on the basis of a success rule.

  1. (4)

    The EPSDE algorithm

Mallipeddi et al. (2011) provided the EPSDE involving a pool of mutation strategies along with various combinations of parameters, which are employed for competing in order to produce successful offspring population. More concretely, the pool of strategies is formed with three schemes with diverse characteristics. The pool of Cr values is taken in the range [0.1, 0.9] in steps of 0.1, and the pool of F values is assigned in the range [0.4, 0.9] in steps of 0.1 as follows:

A pool of mutation strategies includes DE/best/2, DE/rand/1 and DE/target-to-rand/1;

  • A pool of F values includes \(F=[0.4,0.5,0.6,0.7,0.8,0.9]\)

  • A pool of Cr values includes \(Cr=[0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9].\)

In EPSDE, each individual of the initial population is randomly adopted with a mutation strategy and associated parameter values taken from the respective pools. When the trial vector performs better than the target vector, the combination of the mutation strategy and parameter values can be survived in the next generation. And the combination is also stored. Afterward, the target vector should be randomly reinitialized with a new mutation strategy from the respective pools or from the successful combinations stored with equal probability when the trial vector performs poorer.

3 Differential evolution algorithm based on niche

In this section, we propose some new mutation strategies which are based on clearing niche mechanism. It will be explained in the following sections.

3.1 The niche operation based on clearing mechanism

The main idea of the niche algorithm based on clearing mechanism is that the population is divided into a number of niches. And each subpopulation contains a dominant individual: the one that has the best fitness. If the dissimilarity between an individual and the dominant one in a given subpopulation is less than a threshold \({\updelta }\): the clearing radius, this individual, then, belongs to this subpopulation. The basic clearing algorithm preserves the fitness of the dominant individual, but resets the fitness of all the other individuals of the same subpopulation. Consequently, the clearing procedure makes the whole resource of a niche belong to a single individual: the winner. The winner takes all rather than sharing resources with the other individuals of the same niche, and other individuals as losers will take a punishment mechanism, this mechanism differs from the fitness-sharing methods. It is also possible to generalize the clearing algorithm by accepting several winners in each niche. The capacity of a niche is defined as the maximum number of winners that this niche can accept. We assume that the population size is NP, the individuals which perform better in each niche are winners, in contrast, perform worse are loser. Finally, process of flow is shown in Fig. 2.

3.2 Mutation strategies based on clearing niche

We all know that the mutation strategies play an important role in the search capability and convergence rate of DE. However, many mutation strategies are hard to maintain a balance among a good population diversity, a good global exploration ability, a good local exploitation ability, and a fast convergence rate. Clearly, from the equation of the basic mutation strategy DE/rand/1/bin, it can be seen that three vectors are randomly chosen, one of which is the base vector. As a result, DE/rand/1/bin is able to maintain the population diversity and the global search ability, but it cannot guarantee the local search ability and the convergence rate. From the equation of the basic mutation strategy DE/best/1/bin, it can be observed that the base vector is the globally best vector and the other two vectors are randomly chosen. In this way, all the vectors are guided by the best vector. Such greedy strategy is helpful for the local search ability and the convergence rate. However, the greedy strategy may lose its population diversity and global search ability. In order to maintain a balance among the population diversity, the global exploration ability, the local exploitation ability, and the convergence rate, we present some new mutation strategies, which combine with the clearing niche mechanism.

In the proposed strategies, the population is divided into some niches according to the sort of the fitness values. Each niche contains several winners whose fitness values are better than others. And all the winners will make up a new subpopulation. Note that the number of niches may be change and the number of winners in each niche will be kept unchanged during the evolution of algorithms. As aforementioned, the mutation strategy DE/best/1 obtains a best vector as the base vector. In the new mutation strategies, the base vector is selected from the group which the subpopulation instead of the entire population. Therefore, all the vectors are guided by the subpopulation which is made of several locally best vectors rather than the randomly vector or the single globally best vector. At the same time, the difference vectors involved in the mutation strategies are selected from the entire population. In such way can maintain a balance among the population diversity, the global exploration ability, the local exploitation ability, and the convergence rate.

Differing from the existing clearing niche mechanisms, we adopt a new idea of dealing with the distance. Here, it calculates the distances from the best individual and other individuals in current status respectively by using Euclidean distance. Moreover, a different normalization method is proposed to calculate the relative distance based on maximum and minimum distances for each niche. In addition, different niches vary maximum distance and minimum distance. Consequently, the clearing radius is the same whereas the ranges of niches are different. Hence, the number of each niche is not only related to individual density but also related to the maximum distance and minimum distance.

Fig. 2
figure 2

Flowchart of the clearing niche mechanism

Fig. 3
figure 3

The range of niches. Note: aj are the process which a population is divided into several niches based on the clearing niche mechanism. Blue “*” denotes the individuals that have not been divided into niches; Green “*” denotes the individuals that have been divided into niches; Red “+*” denotes the best individual in each niche; Red circle denotes the current niche; Blue circle denotes the existing niches (color figure online)

For example, if the population size is set as 50, the dimension is set as 2, the clearing radius is set as 0.3. And the fitness is the minimum value of sum of squares. We can get the two-dimensional drawing of population. Then, we can see the range of each niche in Fig. 3. The figure can be found that the range of each niche is different.

When we combine the clearing niche mechanism with several classical mutation strategies, we set a changed solution. In the changed solution, the target individual will have priority, and the best individual will be considered next. At last, if the mutation strategy do not contain the target or best individual, the rand individual will be changed which does not multiply the scaling factor F. After modifying, the usual differential evolution strategies DE/rand/1, DE/best/1, DE/target-to-best/1, DE/best/2, and DE/rand/2 can be seen in the following.

DE/rand/1/bin:

$$\begin{aligned} V_{i,G} =X_{r1,G} +F\times \left( {X_{r2,G} -X_{r3,G} } \right) \end{aligned}$$
(17)

DE/best/1/bin:

$$\begin{aligned} V_{i,G} =X_{\mathrm{best},G} +F\times \left( {X_{r2,G} -X_{r3,G} } \right) \end{aligned}$$
(18)

DE/rand/1 based on clearing niche mechanism, we name it DE/clear niche/1. And DE/best/1 based on clearing niche mechanism, we also name it DE/ clear niche /1. As illustrated in Fig. 4, in DE/clear niche/1, a mutation vector is generated in the following manner:

$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r2,G} -X_{r3,G} } \right) \end{aligned}$$
(19)

where \(X_{c,G}\) is randomly chosen individual from subpopulation which preserves the dominant individual by clearing niche mechanism. \(X_{best,G}\) is the best individual in the current generation. \(X_{r1,G} \quad X_{r2,G} \), and \(X_{r3,G}\) are randomly chosen three individuals from the current generation, where \(r1\ne r2\ne r3\ne i\). G is the current generation. \(V_{i,G} \) is the ith donor vector. F is the scaling factor.

DE/target-to-rand/1/bin:

$$\begin{aligned} V_{i,G} =X_{i,G} {+}F\times \left( {X_{r3,G} {-}X_{i,G} } \right) {+}F\times \left( {X_{r1,G} -X_{r2,G} } \right) \nonumber \\ \end{aligned}$$
(20)

DE/target-to-best/1 based on clearing niche mechanism, we name it DE/ clear niche -to-best/1. In DE/ clear niche-to-best/1, a mutation vector is generated in the following manner:

$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r3,G} -X_{c,G} +X_{r1,G} -X_{r2,G} } \right) \nonumber \\ \end{aligned}$$
(21)

where \(X_{c,G} \) is randomly chosen individual from subpopulation which preserves the dominant individual by clearing niche mechanism. \(X_{r1,G} \quad X_{r2,G} \), and \(X_{r3,G} \) are randomly chosen three individuals from the current generation, where \(r1\ne r2\ne r3\ne i\). G is the current generation. \(V_{i,G} \) is the ith donor vector. F is the scaling factor.

DE/best/2/bin:

$$\begin{aligned}&V_{i,G}\nonumber \\&=X_{best,G} {+}F\times \left( {X_{r1,G} -X_{r2,G}} \right) {+}F\times \left( X_{r3,G} {-}X_{r4,G}\right) \nonumber \\ \end{aligned}$$
(22)

DE/rand/2/bin:

$$\begin{aligned} V_{i,G} =X_{r1,G} +F\times \left( {X_{r2,G} -X_{r3,G} } \right) +F\times \left( {X_{r4,G} -X_{r5,G} } \right) \end{aligned}$$
(23)

DE/best/2 based on clearing niche mechanism, we name it DE/ clear niche /2. And DE/rand/2 based on clearing niche mechanism, we also name it DE/ clear niche /2. In DE/ clear niche /2, a mutation vector is generated in the following manner:

$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r1,G} -X_{r2,G} +X_{r3,G} -X_{r4,G} } \right) \nonumber \\ \end{aligned}$$
(24)

where \(X_{c,G} \) is randomly chosen individual from subpopulation which preserves the dominant individual by clearing niche mechanism. \(X_{best,G} \) is the best individual in the current generation. \(X_{r1,G} , X_{r2,G} , X_{r3,G} , X_{r4,G}\), and \(X_{r5,G}\) are randomly chosen five individuals from the current generation, where \(r1\ne r2\ne r3\ne r4\ne r5\ne i\). G is the current generation. \(V_{i,G} \) is the ith donor vector. F is the scaling factor.

Fig. 4
figure 4

Illustrations of the new and basic mutation strategies in two-dimensional parametric space. Green “*” denotes the individuals in the population; Red “*” denotes the best individual in each niche; Red\(\square \)” denotes the mutation vector, and they are based on different mutation strategies, such as DE/rand/1, DE/best/1, and DE/clear niche/1. For example, we know the generated manner of the mutation vector based on DE/clear niche/1, it likes that” \(\hbox {V}_{\mathrm{i,G}} =\hbox {X}_{\mathrm{c,G}} +\hbox {F}\times \left( \hbox {X}_{\mathrm{r2}},\hbox {G}-\hbox {X}_{\mathrm{r3}},\hbox {G}\right) \)”. The blue lines mean the vector \(\hbox {X}_{\mathrm{c,G}}\) and the vector \(\hbox {F}\times \left( \hbox {X}_{\mathrm{r2,G}}-\hbox {X}_{\mathrm{r3,G}} \right) \), adding two vectors, the red line is got which represents the mutation vector (color figure online)

We can see the flowchart of the new DE combined with clearing niche mechanism in Fig. 5 when the mutation strategy is DE/rand/1.

3.3 DE variants based on clearing niche

We can see the flowchart of DE with the new mutation strategy based on clearing niche mechanism in Fig. 5. And when the new mutation strategy based on clearing niche mechanism is applied to variants of DE, the corresponding mutation strategy of DE will change. It can be seen in the following section.

  1. (1)

    The jDE algorithm

In jDE algorithm, the mutation strategy is DE/rand/1. We can see change of the mutation strategy as following.

The initial mutation strategy:

$$\begin{aligned} V_{i,G} =X_{r1,G} +F\times \left( {X_{r2,G} -X_{r3,G} } \right) \end{aligned}$$
(17)

The mutation strategy based on clearing niche mechanism:

$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r1,G} -X_{r2,G} } \right) \end{aligned}$$
(19)
  1. (2)

    The JADE algorithm

In JADE algorithm, the mutation strategy is DE/target-to-pbest/1. We can see change of the mutation strategy as following.

The initial mutation strategy:

$$\begin{aligned} V_{i,G} =X_{i,G} +F\times \left( {X_{\mathrm{best},G}^p -X_{i,G} +X_{r1,G} -\tilde{X} _{r2,G} } \right) \end{aligned}$$
(25)

The mutation strategy based on clearing niche mechanism:

$$\begin{aligned} V_{i,G} =X_{c,G}+F\times \left( {X_{\mathrm{best},G}^p -X_{c,G} +X_{r1,G} -\tilde{X} _{r2,G}}\right) \end{aligned}$$
(26)
  1. (3)

    The SaDE algorithm

In SaDE algorithm, the mutation strategies are DE/rand/1, DE/rand/2, DE/target-to-best/2, DE/target-to-rand/1. We can see change of the mutation strategy as following.

The initial mutation strategy:

$$\begin{aligned} V_{i,G}= & {} X_{r1,G} +F\times \left( {X_{r2,G} -X_{r3,G} } \right) \end{aligned}$$
(17)
$$\begin{aligned} V_{i,G}= & {} X_{r1,G} +F\times \left( {X_{r2,G} -X_{r3,G} +X_{r4,G} +X_{r5,G} } \right) \nonumber \\ \end{aligned}$$
(23)
$$\begin{aligned} V_{i,G}= & {} X_{i,G} +F\times \left( X_{best,G} -X_{i,G}\right. \nonumber \\&\left. +X_{r1,G}-X_{r2,G} +X_{r3,G} -X_{r4,G}\right) \end{aligned}$$
(27)
$$\begin{aligned} V_{i,G}= & {} X_{i,G} +F\times \left( {X_{r3,G} -X_{i,G} +X_{r1,G} -X_{r2,G}}\right) \nonumber \\ \end{aligned}$$
(20)

The mutation strategy based on clearing niche mechanism:

$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r1,G} -X_{r2,G} } \right) \end{aligned}$$
(19)
$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r2,G} -X_{r3,G} +X_{r4,G} +X_{r5,G} } \right) \quad \end{aligned}$$
(24)
$$\begin{aligned} V_{i,G}= & {} X_{i,G} +F\times \left( X_{c,G} -X_{i,G} +X_{r1,G}\right. \nonumber \\&\left. -X_{r2,G} +X_{r3,G} -X_{r4,G}\right) \end{aligned}$$
(28)
$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r3,G} -X_{c,G} +X_{r1,G} -X_{r2,G} } \right) \quad \end{aligned}$$
(21)
  1. (4)

    The EPSDE algorithm

In EPSDE algorithm, the mutation strategies are DE/rand/1, DE/best/2, DE/target-to-rand/1. We can see changes of the mutation strategy as following.

The initial mutation strategy:

$$\begin{aligned} V_{i,G} =X_{r1,G} +F\times \left( {X_{r2,G} -X_{r3,G} } \right) \end{aligned}$$
(17)
$$\begin{aligned} V_{i,G} =X_{best,G} +F\times \left( {X_{r1,G} -X_{r2,G} +X_{r3,G} -X_{r4,G} } \right) \nonumber \\ \end{aligned}$$
(22)
$$\begin{aligned} V_{i,G} =X_{i,G} +F\times \left( {X_{r3,G} -X_{i,G} +X_{r1,G} -X_{r2,G}}\right) \nonumber \\ \end{aligned}$$
(27)

The mutation strategy based on clearing niche mechanism:

$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r1,G} -X_{r2,G}}\right) \end{aligned}$$
(19)
$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r1,G} -X_{r2,G} +X_{r3,G}-X_{r4,G}}\right) \nonumber \\ \end{aligned}$$
(24)
$$\begin{aligned} V_{i,G} =X_{c,G} +F\times \left( {X_{r3,G} -X_{c,G} +X_{r1,G} -X_{r2,G}}\right) \nonumber \\ \end{aligned}$$
(28)
Fig. 5
figure 5

Flowchart of DE combined with clearing niche mechanism

4 Test functions

4.1 Minimization optimization benchmark functions

This section lists the global minimization benchmark functions which are used to evaluate the performance of DE variants. In the section, 17 global minimization benchmark functions are chosen as test functions. The 17 test functions \((f_1 -f_{17})\) are dimension-wise scalable (Liang et al. 2005).

(1) Shifted Sphere function, defined as

$$\begin{aligned} f_{1}(x)=\sum \limits _{i=1}^D z_i^2 ,\quad z=x-o \end{aligned}$$
(29)

With \(o=(o_1 ,o_2 ,\ldots ,o_{D})\) is the shifted global optimum, global optimum is \(x^{*}=o\) and \(f(x^{*})=0\) for \(-100\le x_i \le 100\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(2) Shifted Schwefel’s Problem 1.2, defined as

$$\begin{aligned} f_2 (x)=\sum \limits _{i=1}^{D}\left( \sum \limits _{j=1}^i z_{j}\right) ^{2},\quad z=x-o \end{aligned}$$
(30)

With \(o=\left( o_{1}, o_{2},\ldots ,o_{D}\right) \) is the shifted global optimum, global optimum is \(x^{*}=o\) and \(f\left( {x^{*}}\right) =0\) for \(-100\le x_i \le 100\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(3) Shifted Rotated High Conditioned Elliptic function, defined as

$$\begin{aligned} f_3(x )=\sum \limits _{i=1}^D \left( {10^{6}} \right) ^{\frac{i-1}{D-1}}z_i^2, \quad z=\left( {x-o}\right) *M \end{aligned}$$
(31)

With \(o=\left( {o_1 ,o_2 ,\ldots , o_D}\right) \) is the shifted global optimum, M is a orthogonal rotation matrix, global optimum is \(x^{*}=o\) and \(f\left( x^{*}\right) =0\) for \(-100\le x_i \le 100\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(4) Shifted Schwefel’s Problem 1.2 with Noise in Fitness, defined as

$$\begin{aligned} f_4 (x){=}\left( {\sum \limits _{i=1}^D\left( {\sum \limits _{j=1}^i z_j } \right) ^{2}}\right) *\left( {1+0.4*\left| {N\left( 0,1\right) }\right| } \right) , z=x-o \end{aligned}$$
(32)

With \(o=\left( {o_1 ,o_2 ,\ldots ,o_D}\right) \) is the shifted global optimum, global optimum is \(x^{*}=o\) and \(f\left( {x^{*}} \right) =0\) for \(-100\le x_i \le 100\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(5) Shifted Rosenbrock’s Function, defined as

$$\begin{aligned} f_5 (x)=\sum \limits _{i=1}^{D-1}\left( {100*\left( {z_i^2 -z_{i+1}}\right) ^{2}+\left( {z_i -1}\right) ^{2}}\right) , z=x-o \end{aligned}$$
(33)
Fig. 6
figure 6

Three-dimensional graph of 17 benchmark functions

With \(o=\left( {o_1 ,o_2 ,\ldots ,o_D}\right) \) is the shifted global optimum, global optimum is \(x^{*}=o\) and \(f\left( {x^{*}} \right) =0\) for \(-100\le x_i \le 100\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(6) Shifted Rotated Griewank’s Function without Bounds, defined as

$$\begin{aligned} f_6 (x)=\sum \limits _{i=1}^D \frac{z_i^2 }{4000}-\prod \limits _{i=1}^D \cos \left( {\frac{z_i }{\sqrt{i}}} \right) +1, \quad z=\left( {x-o} \right) *M \end{aligned}$$
(34)

With \(o=\left( {o_1 ,o_2 ,\ldots ,o_D } \right) \) is the shifted global optimum, M is a orthogonal rotation matrix, global optimum is \(x^{*}=o\) and \(f\left( x^{*}\right) =0\) for \(0\le x_{i} \le 600\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(7) Shifted Rotated Ackley’s Function with Global Optimum on Bounds, defined as

$$\begin{aligned} f_7 (x)= & {} -20*\hbox {exp}\left( {-0.2*\sqrt{\frac{1}{D}\sum \nolimits _{i=1}^D z_i^2 }} \right) \nonumber \\&-\hbox {exp}\left( {\frac{1}{D}\sum \limits _{i=1}^D \cos \left( {2*\pi *z_i}\right) } \right) +20+e, \nonumber \\ z= & {} \left( x-o\right) *M \end{aligned}$$
(35)

With \(o=\left( {o_1 ,o_2 ,\ldots ,o_D}\right) \) is the shifted global optimum, M is a orthogonal rotation matrix, global optimum is \(x^{*}=o\) and \(f\left( x^{*}\right) =0\) for \(-32\le x_i\le 32\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(8) Shifted Rastrigin’s Function, defined as

$$\begin{aligned} f_8 (x)=\sum \limits _{i=1}^D \left( {z_i^2 -10*\cos \left( {2*\pi *z_i}\right) +10}\right) , \quad z=x-o \end{aligned}$$
(36)

With \(o=\left( {o_1 ,o_2 ,\ldots ,o_D}\right) \) is the shifted global optimum, global optimum is \(x^{*}=o\) and \(f\left( {x^{*}} \right) =0\) for \(-5\le x_i \le 5\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(9) Schwefel’s Problem2.13, defined as

$$\begin{aligned} f_9= & {} \sum \limits _{i=1}^D \left( {A_i -B_i (x)} \right) ^{2}\nonumber \\ A_i= & {} \sum \limits _{j=1}^D \left( {a_{ij} \sin \alpha _j +b_{ij} \cos \alpha _j}\right) , \\ B_i (x)= & {} \sum \limits _{j=1}^D \left( {a_{ij} \sin x_j +b_{ij} \cos x_j } \right) \nonumber \end{aligned}$$
(37)

With \({\upalpha }=\left[ {\alpha _1 ,\alpha _2 ,\ldots ,\alpha _{D}}\right] \), \(\alpha _j \in \left[ {-\pi ,\pi } \right] \); With global optimum \(x^{*}=\alpha \) and \(f\left( {x^{*}} \right) =0\) for \(-\pi \le x_i \le \pi \). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(10) Shifted Expanded Griewank’s plus Rosenbrock’s Function, defined as

$$\begin{aligned} f_{10}(x)= & {} f_7 \left( {f_2\left( {z_1 ,z_2}\right) }\right) +f_7 \left( {f_2 \left( {z_2 ,z_3}\right) }\right) +\cdots \nonumber \\&+f_7\left( {f_2 \left( {z_{D-1} ,z_D}\right) }\right) +f_7 \left( {f_2\left( {z_D ,z_1}\right) }\right) \nonumber \\ z= & {} x-o+1 \end{aligned}$$
(38)

With \(o=\left( {o_1 ,o_2 ,\ldots ,o_D } \right) \) is the shifted global optimum, global optimum is \(x^{*}=o\) and \(f\left( {x^{*}} \right) =0\) for \(-3\le x_i \le 1\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(11) Shifted Rotated Expanded Scaffer’s F6 Function, defined as

$$\begin{aligned}&F\left( {x,y}\right) =0.5+\frac{\left( {\sin ^{2}\left( {\sqrt{x^{2}+y^{2}}}\right) -0.5} \right) }{\left( {1+0.001\times \left( {x^{2}+y^{2}}\right) } \right) ^{2}} \end{aligned}$$
(39)
$$\begin{aligned}&f_{11}=F\left( {z_1 ,z_2 } \right) +F\left( {z_2 ,z_3 } \right) +\cdots +F\left( {z_{D-1} ,z_D } \right) \nonumber \\&~~~~~~~~~~+F\left( {z_D ,z_1 }\right) , \quad z=\left( {x-o} \right) *M \end{aligned}$$
(40)

With \(o=\left( {o_1 ,o_2 ,\ldots ,o_D } \right) \) is the shifted global optimum, global optimum is \(x^{*}=o\) and \(f\left( {x^{*}} \right) =0\) for \(-100\le x_i \le 100\). Three-dimensional graph corresponding to this function is shown in Fig. 6.

(12) Hybrid Composition Function, defined as

The functions \(f_{12}\) (CF1), \(f_{13} \) (CF7), \(f_{14}\) (CF8), \(f_{15} \) (CF9), \(f_{16} \) (CF10) and \(f_{17}\) (CF11) are composed by using 10 different functions respectively. Their global optimums are easy to find once the global basins are found. The details of constructing such functions are presented in Liang et al. (2005). And three-dimensional graph corresponding to these six functions is shown in Fig. 6.

Among the above 17 benchmark problems, functions \(f_1-f_4 \) are unimodal and functions \(f_5-f_9 \) are basic multimodal functions, \(f_{10} \) and \(f_{11} \) are expanded multimodal functions, and \(f_{12}-f_{17} \) are hybrid composition functions. The optimum value, position of the global optima, and initialization ranges for these 17 benchmark problems are provided in Table 1.

Table 1 Global optimum and initialization ranges for the benchmark functions

4.2 Multimodal optimization benchmark functions

This section lists the multimodal optimization functions which are used to evaluate the performance of DE variants. In the section, 8 multimodal optimization functions are chosen as test functions. The eight test functions (\(F_1 -F_8 )\) has diverse characteristics, and the functions used in the experiment are presented in Simionescu (2014), Surjanovic and Bingham (2015).

  1. (1)

    Beale function, defined as

$$\begin{aligned} F_1= & {} \left( {1.5-x_1 +x_1 x_2 } \right) ^{2}+\left( {2.25-x_1 +x_1 x_2^2 } \right) ^{2}\nonumber \\&+\left( {2.625-x_1 +x_1 x_2^3 } \right) ^{2} \end{aligned}$$
(41)

With global optimum is \(x^{*}=\left( {3,0.5} \right) \) and \(F\left( {x^{*}} \right) =0\) for \(-4.5\le x_i \le 4.5\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

  1. (2)

    Goldstein-price function, defined as

$$\begin{aligned} F_2= & {} \left[ {1+\left( {x_1 +x_2 +1}\right) ^{2}\left( {19-14x_1 +3x_1^2 -14x_2 +6x_1 x_2 +3x_2^2 }\right) } \right] \nonumber \\&\times \left[ {30+\left( {2x_1 -3x_2}\right) ^{2}\left( {18-32x_1 +12x_1^2 +48x_2 -36x_1 x_2 +27x_2^2}\right) } \right] \nonumber \\ \end{aligned}$$
(42)

With global optimum is \(x^{*}=\left( {0,-1}\right) \) and \(F\left( {x^{*}}\right) =3\) for \(-2\le x_i \le 2\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

  1. (3)

    Matyas function, defined as

$$\begin{aligned} F_3 =0.26\left( {x_1^2 +x_2^2 } \right) -0.48x_1 x_2 \end{aligned}$$
(43)

With global optimum is \(x^{*}=\left( {0,0} \right) \) and \(F\left( {x^{*}} \right) =0\) for \(-10\le x_i \le 10\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

  1. (4)

    Cross-in-tray function, defined as

$$\begin{aligned} F_4 =-0.0001\left( {\left| {\sin \left( {x_1 } \right) \sin \left( {x_2 } \right) {\hbox {exp}}\left( {\left| {100-\frac{\sqrt{x_1^2 +x_2^2 }}{\pi }} \right| } \right) } \right| +1} \right) ^{0.1} \end{aligned}$$
(44)

With global optimum is \(x^{*}=\left( {1.3491,-1.3491} \right) ,\left( {-1.3491,-1.3491} \right) ,\) \(\left( {-1.3491,1.3491} \right) ,\left( {1.3491,1.3491} \right) \) and \(F\left( {x^{*}} \right) =-2.06261\) for \(-10\le x_i \le 10\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

  1. (5)

    Eggholder function, defined as

$$\begin{aligned} F_5= & {} -\left( {x_2 +47} \right) sin \left( {\sqrt{\left| {x_2 +\frac{x_1 }{2}+47} \right| }} \right) -x_1 \nonumber \\&sin \left( {\sqrt{\left| {x_1 -\left( {x_2 +47} \right) } \right| }} \right) \end{aligned}$$
(45)

With global optimum is \(x^{*}=\left( {512,404.2319} \right) \)and \(F\left( {x^{*}} \right) =-959.6407\) for \(-512\le x_i \le 512\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

  1. (6)

    Simionescu function, defined as

$$\begin{aligned} F_6= & {} 0.1x_1 x_2\nonumber \\&\hbox {subjected to: } x_1^2 +x_2^2\nonumber \\&\le \left( {r_T +r_S \cos \left( {n\tan ^{-1}\left( {\frac{x_1 }{x_2 }} \right) } \right) } \right) ^{2} \end{aligned}$$
(46)

With global optimum is \(x^{*}=\left( {-0.85586,0.85586} \right) ,\left( {0.85586,0.85586} \right) ,\) \(\left( {0.85586,-0.85586} \right) ,\left( -0.85586,-0.85586 \right) \) and \(F\left( {x^{*}} \right) =-0.072625\) for \(-1.25\le x_i \le 1.25\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

  1. (7)

    Holder Table function, defined as

$$\begin{aligned} \hbox {F}_7 =-\left| {\sin \left( {\hbox {x}_1 } \right) \cos \left( {\hbox {x}_2 } \right) \hbox {exp}\left( {\left| {1-\frac{\sqrt{\hbox {x}_1^2 +\hbox {x}_2^2 }}{{\uppi }}} \right| } \right) } \right| \end{aligned}$$
(47)

With global optimum is \(x^{*}=\left( {8.05502,-9.66459} \right) , \left( {-8.05502,-9.66459} \right) \), \(\left( {-8.05502,9.66459} \right) \), \(\left( 8.05502,\right. \left. 9.66459 \right) \) and \(F\left( {x^{*}} \right) =-19.2085\) for \(-10\le x_i \le 10\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

  1. (8)

    Shubert function, defined as

$$\begin{aligned} F_8= & {} \left( {\sum \limits _{i=1}^5 i\cos \left( {\left( {i+1} \right) x_1 +i} \right) } \right) \nonumber \\&\times \left( {\sum \limits _{i=1}^5 i\cos \left( {\left( {i+1} \right) x_2 +i} \right) } \right) \end{aligned}$$
(48)
Fig. 7
figure 7

Three-dimensional graph of 8 multimodal functions

With global optimum is \(x^{*}=\left( {-7.0835,-1.4250} \right) ,\left( {-7.0835,4.8580} \right) ,\) \(\left( {-7.0835,-7.7082} \right) ,\left( -7.7082,5.4829 \right) ,\qquad \!\!\left( {-7.7082,-0.8003} \right) ,\quad \left( {-7.7082,-7.0835} \right) , \left( {5.4829,-1.4250} \right) ,\left( {5.4829,4.8580} \right) ,\left( {5.4829,-7.7082} \right) ,\left( {4.8580,-0.8003} \right) \), \(\left( {4.8580,5.4829} \right) ,\left( {4.8580,-7.0835} \right) ,\left( {-0.8003,-1.4250} \right) ,\left( {-0.8003,4.8580} \right) ,\) \(\left( -0.8003,-7.7082 \right) ,\qquad \!\!\left( {-1.4250,-0.8003} \right) ,\quad \left( {-1.4250,5.4829} \right) ,\left( {-1.4250,-7.0835} \right) \) and \(F\left( {x^{*}} \right) =-186.7309\) for \(-5.12\le x_i \le 5.12\left( {i=1,2} \right) \). Three-dimensional graph corresponding to this function is shown in Fig. 7.

Table 2 Comparison results between the basic DE algorithms and their corresponding DE variants

5 Experimental studies

In this section, comprehensive experiments are carried out to evaluate the effectiveness of new mutation strategies, including two basic DE algorithms and four advanced DE variants. It includes three studies, one of which is the accuracy study, one is the multimodal study, and another is a multiobjective problem study.

5.1 Accuracy study

In this section, 17 benchmark functions are selected as the test functions, which are from the CEC 2005 special session. The objective is to minimize values of the functions. Here, the experimental setup is shown first. Then, the comparisons between DE changed with new mutation strategies and the corresponding DE algorithms are proposed.

5.1.1 Experimental setup

To make a fair comparison, the parameters for all the DE algorithms are set as follows unless a change is mentioned.

  1. (1)

    Dimension of the functions (D): 30.

  2. (2)

    Population size (NP): 50.

  3. (3)

    Maximal number of fitness function evaluations: 10000.

  4. (4)

    Number of runs: 25.

  5. (5)

    Scaling Factor (F) and Crossover Control Parameter (Cr): For two basic DE algorithms, F and Cr are (0.5, 0.9) and (0.5, 0.3), respectively. For four advanced DE variants, F and Cr are set as mentioned in Brest and Mernik (2008), Qin and Suganthan (2005), Mallipeddi et al. (2011), Zhang and Sanderson (2009).

In the experiments, comparisons between two basic DE algorithms whose mutation strategies are set as DE/rand/1, DE/best/1 and their corresponding DE algorithms with new mutation strategies are made first. Then, the performance of several advanced DE variants with the corresponding DE variants with new mutation strategies are compared, including jDE (Brest and Mernik 2008), JADE (Zhang and Sanderson 2009), SaDE (Qin and Suganthan 2005), and EPSDE (Mallipeddi et al. 2011). All the parameters of these DE variants are set as their basic papers. Due to space limitation, only mean and standard deviation of the best error values obtained by algorithms are shown in this paper.

5.1.2 Comparison with basic DE algorithms

In this section, the DE algorithms with new mutation strategies (call DE-Niche for short) are compared with two basic DE algorithms whose mutation strategies are set as DE/rand/1 and DE/best/1 to test the effectiveness of new mutation strategies when whose F and Cr respectively are 0.5, 0.9 and 0.5, 0.3. As the changed rule which is described in Sect. 3.2, the new mutation strategies of DE/rand/1 and DE/best/1 are same. The results for the functions at 30D are shown in Table 2. Wilcoxon’s rank sum test at a 0.05 significance is performed between DE-Niche and two basic DE algorithms with different control parameters.

From Table 2, we can find that the DE-Niche significantly outperforms the corresponding DE algorithm with respect to the overall performance. Specifically, when \(F=0.5\) and \(Cr=0.9\), DE-Niche significantly improves the performance of DE/best/1 on 15 out of 17 functions and loses on only one functions. For \(F=0.5\) and \(Cr=0.3\), DE-Niche improves the performance of DE/rand/1 on 7 out of 17 functions, without losing on any functions. Table 3 presents the result of comparison of DE-Niche with two basic DE algorithms. It is clear that on average the proposed DE-Niche algorithm performs better than the two basic DE algorithms.

From the comparison between DE-Niche and two basic DE algorithms, it can be shown that the advantages of DE-Niche are obviously and the solution quality of DE-Niche is better than DE/rand/1 and DE/best/1. The improvements from DE-Niche may be contributed by the fact that the new mutation strategies are combined with the clearing niche method can balance the population diversity and exploration capability.

5.1.3 Comparison with advanced DE variants

In order to further evaluate the effectiveness of the new mutation strategies, the clearing niche method is incorporated into several advanced DE variants, namely, jDE (Brest and Mernik 2008), JADE (Zhang and Sanderson 2009), SaDE (Qin and Suganthan 2005), and EPSDE (Mallipeddi et al. 2011) including jDE-Niche, JADE-Niche, SaDE-Niche and EPSDE-Niche. The comparisons between DE variants with new mutation strategies and the corresponding DE variants are made on 17 functions at 30D. The results are shown in Table 4. Wilcoxon’s rank sum test at a 0.05 significance is performed between jDE- Niche, JADE- Niche, SaDE- Niche and EPSDE- Niche and their corresponding DE variants.

Table 3 Comparison of DE-Niche with two basic DE algorithms

From the results of functions in Table 4, it is clear that the new mutation strategies with clearing niche mechanism outperform most of the DE variants. For jDE, jDE-Niche is significantly better on 8 out of 17 functions and is outperformed by jDE on only one function. For JADE, JADE-Niche improves the performance of DE/rand/1 on 9 functions and loses on 2 functions. For SaDE, SaDE-Niche can significantly outperform SaDE on 10 functions, without being outperformed on any functions. For EPSDE, EPSDE-Niche is better on 6 functions and is worse on 3 functions. Table 5 presents the result of comparison of jDE, JADE, SaDE, and EPSDE algorithms with their corresponding DE variants. It is clear that on average the DE variants with new mutation strategies perform better than their corresponding DE algorithms.

From the comparison between jDE, JADE, SaDE, and EPSDE and their corresponding DE variants jDE-Niche, JADE-Niche, SaDE-Niche, and EPSDE-Niche, it can be shown that the solution quality of DE whose mutation strategy is based on the clearing niche mechanism is the best. Therefore, the new mutation strategies can improve the performance of most DE variants studied.

Table 4 Comparison results between the jDE, JADE, SaDE and EPSDE algorithms and their corresponding DE variants with new mutation strategies
Table 5 Comparison of the jDE, JADE, SaDE and EPSDE algorithms with their corresponding DE variants
Fig. 8
figure 8

DE-Niche with different clearing radius when \(F=0.5, Cr=0.9\)

Table 6 Results on DE-Niche with different clearing radius when F and Cr are set as 0.5 and 0.9

5.1.4 Discussion about the choices the clearing radius for DE algorithms with new mutation strategies

As mentioned in Sect. 3.1, the clearing radius threshold \(\delta \) is used to determine the size of the niches. If the value of \(\delta \) is large, most individuals would belong to one niche. On the contrary, if the value of \(\delta \) is small, the number of niches would be large. In order to determine the setup of \(\delta \) for DE algorithms with new mutation strategies, namely, DE-Niche, jDE-Niche, JADE-Niche, SaDE-Niche and EPSDE-Niche, the comparisons of DE variants with different \(\delta \) are made. The \(\delta \) value is set as 0.1, 0.2, 0.3 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9, respectively. Due to space limitation, only highly distinct functions are shown in this paper. In general, if the value for \(\delta \) obtains a poor performance we will give it up, if the value for \(\delta \) obtains a good performance we will give priority to it, if the value for \(\delta \) obtains a mild performance, at the same times, other values obtain both a good and poor performance for different functions, we will choose a mild value for \(\delta \), and if the performance of all values are close, we can also consider the convergence of functions.

In Fig. 8 and Table 6, the comparisons of DE-Niche with different \(\delta \) are shown when \(F=0.5,Cr=0.9\). It is clear that too big \( \delta \) will affect the performance of DE-Niche. For function \(f_1 \), \(f_2 \), \(f_4 \), \(f_6 \), \(f_8 \), \(f_{10}\), \(f_{13} \), \(f_{15} \) and \(f_{16} \), 0.8 and 0.9 for \(\delta \) are the worst values, while the best one is 0.8 for function \(f_5\) and \(f_{17} \) and the second is 0.9 for function \(f_{11} \). Similarly, for function \(f_1 \), \(f_2 \), \(f_4 \), \(f_5 \), \(f_{10} \), \(f_{12}\), \(f_{14} \), and \(f_{16} \), 0.7 for \(\delta \) are the worst value, while 0.7 is best for function \(f_{11} \) and the second best for function \(f_7 \), \(f_9 \), and \(f_{17} \). Likewise, 0.6 for \(\delta \) are the best value for function \(f_7 \), \(f_9 \), \(f_{10} \), and \(f_{16} \), the second and third best for function \(f_3 \), \(f_5 \), \(f_{12} \), and \(f_{16} \), while 0.6 for \(\delta \) are the second and third worst value for function \(f_8 \), \(f_{15} \), and \(f_{17} \). We can also analyze the results when the \(\delta \) is set as 0.1, 0.2, 0.3 0.4, and 0.5. Apparently, 0.2 for \(\delta \) are the best one for function \(f_{14} \), the second and third best value for function \(f_1 \), \(f_2 \), \(f_5 \), \(f_6 \), \(f_9 \), \(f_{10} \), \(f_{13} \), \(f_{15} \), and \(f_{17} \), while 0.2 for \(\delta \) are the third worst value for function \(f_{12} \). For function \(f_1 \), \(f_2 \), \(f_3 \), \(f_4 \), \(f_6 \), \(f_8\), \(f_{10} \), \(f_{12} \), and \(f_{16} \), 0.3 for \(\delta \) are the first, second or third best value, while 0.3 for \(\delta \) is the first, second or third worst value for function \(f_9 \), \(f_{11} \), and \(f_{17} \). In general, considering all functions, \(\delta =0.2\) can most enhance the performance of DE-Niche when \(F=0.5,Cr=0.9\). And the result can be seen in Table 12.

In Fig. 9 and Table 7, the comparisons of DE-Niche with different \(\delta \) are shown when \(F=0.5,Cr=0.3\). It is clear that too small \(\delta \) will affect the performance of DE-Niche and mild values just show medium performance. For function \(f_1 \), \(f_6 \), \(f_8 \), \(f_{10} \), \(f_{13} \), \(f_{15} \), and \(f_{16} \), DE-Niche obtains the similar performance for all values of \(\delta \). 0.9 for \(\delta \) obtains the best value for function \(f_2 \), \(f_4 \), and \(f_9 \), and the second best value for function \(f_{11} \), while the worst value for function for function \(f_{13} \). For function \(f_2 \), \(f_3 \), \(f_4 \), \(f_5 \), \(f_{11} \), and \(f_{17} \), 0.8 for \(\delta \) obtains the first or second best value, while for function \(f_9 \), \(f_{12} \), and \(f_{14} \), 0.8 for \(\delta \) performs worse. For function \(f_2 \), \(f_4 \), \(f_5 \), \(f_7 \), \(f_{12} \), and \(f_{17} \), 0.7 for \(\delta \) obtains the second or third best value. In the same way, we can analyze the other values of \(\delta \) and it can be found that the other values of \(\delta \) cannot obtain a better result than 0.9, 0.8, or 0.7. In general, considering all functions, \(\delta =0.7\) can most enhance the performance of DE-Niche when \(F=0.5,Cr=0.3\). And the result can be seen in Table 12.

In Fig. 10 and Table 8, the comparisons of jDE-Niche with different \(\delta \) are shown. The results indicate that a mild value of \(\delta \) for unimodal and basic multimodal functions is better, and a small value of \(\delta \) for expanded multimodal and hybrid composition functions is better. For function \(f_7 \), 0.9 for \(\delta \) is best value, 0.1 for \(\delta \) is worst value, and the other values for \(\delta \) is medium. However, when we consider from \(f_1 \) to \(f_9 \) in Table 8, it can be found 0.5 for \(\delta \) is the better choice for unimodal and basic multimodal functions. For function \(f_{11} \), the best value for \(\delta \) is 0.8, the worst value for \(\delta \) is 0.7, and the other values for \(\delta \) are close. For function \(f_{12} \), \(f_{14} \) and \(f_{15} \), 0.1 for \(\delta \) is the best value and the other values are close. For function \(f_{15} \), the

Fig. 9
figure 9

DE-Niche with different clearing radius when \(F=0.5,Cr=0.3\)

In Fig. 11 and Table 9, the comparisons of JADE-Niche with different \(\delta \) are shown. It is clear that too small or too large value for \(\delta \) shows an unstable performance on different functions. For function \(f_4 \), the worst and second worst value for \(\delta \) is 0.2 and 0.8, the other values are close, the convergence of 0.4 for \(\delta \) performs best. For function \(f_7 \), 0.8 and 0.9 for \(\delta \) are signally better than other values and other values for \(\delta \) are close, however, 0.8 and 0.9 for \(\delta \) are signally worse than other values for functions \(f_{12} \), \(f_{14} \) and \(f_{17} \). For function \(f_{11} \), 0.3 for \(\delta \) are signally better than other values and other values for \(\delta \) are close, however, 0.3 for \(\delta \) are signally worse than other values for functions \(f_7 \). For functions \(f_{14} \) and \(f_{17} \), all the values for \(\delta \) are close, but 0.4 for \(\delta \) performs better than other values. It is shown in Table 9, synthesizing each kind of situation, for JADE-Niche, the value of \(\delta \) is set as 0.4 from \(f_1 \) to \(f_{17} \). 0.4 for \(\delta \) can most enhance the performance of JADE-Niche. And the result is shown in Table 12.

In Fig. 12 and Table 10, the comparisons of SaDE-Niche with different \(\delta \) are shown. We can see that 0.5, 0.6, 0.8, and 0.9 show an unstable performance on different functions. For function \(f_4 \), the worst value for \(\delta \) is 0.5, the other values are close, and the convergence of 0.9 and 0.1 for \(\delta \) perform better. For function \(f_7 \), 0.6 and 0.3 for \(\delta \) are signally better than other values, the third best value for \(\delta \) is 0.2, and the worst and second worst values for \(\delta \) is 0.9 and 0.8 while 0.9 and 0.8 for \(\delta \) perform unstable for functions \(f_{11}, f_{12}\). For function \(f_{11} \), all the values for \(\delta \) are close, 0.9 for \(\delta \) performs best, 0.2 performs worst, while 0.8 performs third worst. For function \(f_{12}\), 0.8 for \(\delta \) performs signally best than other values. For function \(f_{15}\) we can find that 0.5 for \(\delta \) performs signally worse than other values, the other values perform similar, and the convergence of 0.2 for \(\delta \) perform better. For function \(f_{17} \), 0.2 for \(\delta \) performs signally best than other values. Considering both performance and convergence in Table 10, 0.2 for \(\delta \) performs better on functions \(f_1 \), \(f_2 \), \(f_7 \), \(f_9 \), \(f_{13} \), \(f_{14} \), \(f_{15} \), and \(f_{17} \). Therefore, for SaDE-Niche, the value of \(\delta \) is set as 0.2. And the result is shownin Table 12.

Table 7 Results on DE-Niche with different clearing radius when F and Cr are set as 0.5 and 0.3
Fig. 10
figure 10

jDE-Niche with different clearing radius

Table 8 Results on jDE-Niche with different clearing radius
Fig. 11
figure 11

JADE-Niche with different clearing radius

Table 9 Results on JADE-Niche with different clearing radius

In Fig. 13 and Table 11, the comparisons of EPSDE-Niche with different \(\delta \) are shown. It can be seen that most values for \(\delta \) perform unstably on different functions. 0.1 for \(\delta \) performs better on functions \(f_4 \) and \(f_7 \), while it performs worse on functions \(f_1 \), \(f_2 \), \(f_3 \), \(f_6 \), \(f_8 \), \(f_{10} \), \(f_{11} \) and \(f_{16} \). 0.2 for \(\delta \) performs better on functions \(f_8 \), \(f_{10} \), and \(f_{17} \), while it performs worse on functions \(f_2 \), \(f_9 \), \(f_{11} \), \(f_{14} \), \(f_{15} \), and \(f_{16} \). 0.3 for \(\delta \) performs better on functions \(f_1 \),\(f_2 \), \(f_4 \), \(f_5 \), \(f_7 \), \(f_{11} \), \(f_{13} \), \(f_{15} \),and \(f_{17} \) while it performs worse on functions \(f_6 \) and \(f_{12} \). 0.4 for \(\delta \) performs better on functions \(f_3 \), \(f_5 \), \(f_6 \), \(f_{12} \),\(f_{13} \), and \(f_{14} \), while it performs worse on functions \(f_2 \), \(f_7 \), \(f_8 \), \(f_9 \) and \(f_{15} \). 0.5 for \(\delta \) performs better on functions \(f_1 \), \(f_2 \), \(f_3 \), \(f_{10} , f_{12} ,\) and \(f_{16} \), while it performs worse on functions \(f_4 \), \(f_5 \), \(f_7 \), \(f_9 \) and \(f_{14} \). 0.6 for \(\delta \) performs better on functions \(f_1 \), \(f_9 \), \(f_{11} \), and \(f_{15} \), while it performs worse on functions \(f_4 \), \(f_5 \), \(f_7 \), \(f_{10} \), \(f_{12} \), \(f_{13} \), and \(f_{16} \). 0.7 for \(\delta \) performs better on functions \(f_4 \), \(f_5 \), \(f_6 \), \(f_8 \), \(f_{14} \) and \(f_{18} \), while it performs worse on functions \(f_3 \), \(f_{12} \), \(f_{13} \) and \(f_{17} \). 0.8 for \(\delta \) performs better on functions \(f_3 \), \(f_7 \), \(f_{12} \) and \(f_{16} \), while it performs worse on functions \(f_1 \), \(f_5 \), \(f_6 \), \(f_8 \), \(f_{10} \), \(f_{11} \), \(f_{13} \), \(f_{14} \), and \(f_{16} \). 0.9 for \(\delta \) performs better on functions \(f_2 \), \(f_6 \), \(f_8 \), \(f_9 \), \(f_{10} \), \(f_{11} \), \(f_{13} \), \(f_{14} \), and \(f_{16} \), while it performs worse on functions \(f_1 \), \(f_4 \), \(f_{15} \), and \(f_{17} \). Comprehensively considered with every factor, it can be found that the value for \(\delta \) should be set as 0.3 for EPSDE-Niche. And the result is shown in Table 12.

We all know that the bigger F or the smaller Cr is, the higher exploration capability of DE is. Regarding to the clearing niche mechanism, the smaller the value of clearing radius \(\delta \) is, the higher exploration capability of DE is. And the larger the value of clearing radius \(\delta \) is, the higher exploitation capability of DE is. The recommendation of cleaing radiuses is 0.2 when \(F=0.5,Cr=0.9\), while \(\delta \) is 0.7 when \(F=0.5,Cr=0.3\). It shows that when the values of F is equal, the smaller the value of Cr is, the bigger the value of clearing radius \(\delta \) is. Therefore, it maintains a balance between the exploitation capability and the exploration capability. Since jDE-Niche, JADE-Niche, SaDE-Niche and EPSEDE-Niche are self-adaptive algorithms, it is hard to analyze the relationship between F, Cr and clearing radius \(\delta \). Thus, we do not know how it maintains a balance between the exploitation capability and the exploration capability.

In addition, do a longitudinal analysis, it is also hard to distinguish which value of clearing radius \(\delta \) suits which function for different algorithms own different value of clearing radius \(\delta \) when the function is same. So the values of clearing radius \(\delta \) are not only based on the functions but also the algorithms.

Fig. 12
figure 12

SaDE-Niche with different clearing radius

Table 10 Results on SaDE-Niche with different clearing radius

5.2 Multimodal study

In this section, eight multimodal functions are selected as the test functions. The study focuses on finding all the global optimal solutions. Here, the experimental setup is shown first. Then, the comparisons between DE changed with new mutation strategies and the corresponding DE algorithms are made.

5.2.1 Experimental setup

To make a fair comparison, the parameters for all the DE algorithms are set as follows unless a change is mentioned.

  1. (1)

    Dimension of the functions (D): 2.

  2. (2)

    Population size (NP): 50.

  3. (3)

    Maximal number of fitness function evaluations: 10000.

  4. (4)

    Number of runs: 25.

  5. (5)

    Scaling Factor (F) and Crossover Control Parameter (Cr): For two basic DE algorithms, F and Cr are (0.5, 0.9) and (0.5, 0.3), respectively. For four advanced DE variants, F and Cr are set as mentioned in Brest and Mernik (2008), Qin and Suganthan (2005), Mallipeddi et al. (2011), Zhang and Sanderson (2009).

  6. (6)

    Clearing radius \((\delta )\): The clearing radiuses for the algorithms are shown in Table 12. And, for F1, F2, F3, and F5, the clearing radiuses for jDE-Niche is set as 0.5, while for F4, F6, F7, and F8, the clearing radiuses for jDE-Niche is set as 0.1.

  7. (7)

    Level of accuracy \((\upepsilon )\): For functions F1, F2, F3 and F7, the level of accuracy is set as {1.0E-03, 1.0E-04, 1.0E-05, 1.0E-06}. For function F4, the level of accuracy is set as {1.0E-01, 1.0E-02, 1.0E-03, 1.0E-04}. For functions F5, and F8, the level of accuracy is set as {1.0E-02, 1.0E-03, 1.0E-04, 1.0E-05}. For function F6, the level of accuracy is set as {5.0E-01, 3.0E-01, 1.0E-01, 1.0E-02}.

5.2.2 Evaluation criterions

The performance of all multimodal algorithms is measured by the following two criterions:

(1) Peak Ratio (PR): the average percentage of all known global optima found over multiple runs. The high peak ratio indicates the versatility of the algorithm against variety of the functional landscapes.

$$\begin{aligned} PR=\frac{\sum \nolimits _{i=1}^{NR} NPF_i }{NKP*NR} \end{aligned}$$
(49)

where \(NPF_i \) denotes the number of global optima found in the end of the ith run, NKP denotes the number of known global optima, and \(\hbox {NR}\) the number of runs.

(2) Success Rate (SR): the percentage of successful runs (a successful run is defined as a run where all known global optimal solutions are found) out of all runs. The high values of success rate indicate the fine search performed by the algorithms as well as the tendency to preserve the detected peaks against the drifting forces.

$$\begin{aligned} SR=\frac{NSR}{NR} \end{aligned}$$
(50)

where \(\hbox {NSR}\) denotes the number of successful runs.

5.2.3 Result and analysis

In this section, DE-Niche, jDE-Niche, JADE-Niche, SaDE-Niche, and EPSDE-Niche comparing with DE, jDE, JADE, SaDE, and EPSDE are evaluated when they are used for handling multimodal optimization problems. To verify the effectiveness of the new mutation strategies, we employ eight classic multimodal benchmark functions.

In Tables 13, 14, and 15, the success rate and peak ratio of 10 algorithms are reported respectively. The high values of success rate indicate the fine searches are performed by the algorithms, as well as the tendency is observed to preserve the detected peaks against the drifting forces. The high peak ratio indicates the versatility of the algorithm against variety of the functional landscapes. Here, we mark the results that exhibits better performance measured by SR or PR with bold-face in Tables 1314 and 15. Based on the experimental results, for the majority of the considered functions, the DE algorithms with the new mutation strategies based on clearing niche method show better or similar performances measured by both SR and PR, compared with the DE algorithms with common mutation strategies. Specifically, in the first three functions, all algorithms behave similarly, exhibiting best performance independently of the accuracy level, except the basic DE with DE/best/1 strategy performs worse on function \(F_1 \). This behavior changes in the next functions, where the DE algorithms with the new mutation strategies based on clearing niche method clearly outperform the DE algorithms with the common mutation strategies on both PR and SR measures for functions \(F_4 \), \(F_6 \), \(F_7 \), and \(F_8 \), but underperform for functions \(F_5 \). It is worth noting that the DE algorithms with the new mutation strategies based on clearing niche method exhibits significantly great performance gains for the majority of the functions aforementioned for almost all accuracy levels.

Fig. 13
figure 13

EPSDE-Niche with different clearing radius

Table 11 Results on EPSDE-Niche with different clearing radius
Table 12 Recommended clearing radiuses for the algorithms

In Table 16, it can be observed that DE-Niche achieves the best total rank of 57.5 in SR and 52 in PR when F and Cr are 0.5, 0.9. It achieves the best total rank of 58 in SR and 53 in PR when F and Cr are 0.5 and 0.3, respectively. jDE-Niche achieves the best total rank of 46.5 in SR and 43.5 in PR. SaDE-Niche achieves the best total rank of 46.5 in SR and 46 in PR. EPSDE-Niche achieves the best total rank of 46.5 in SR and 43.5 in PR. However, JADE-Niche achieves the best total rank of 47.5 in SR, while JADE achieves the best total rank of 47 in PR. Overall the DE algorithms with the new mutation strategies based on clearing niche method perform better than the DE algorithms with the common mutation strategies when it maintains both PR and SR measure.

Table 13 Peak ratios, success rates and rank of the basic DE algorithms and DE-Niche algorithm
Table 14 Peak ratios, success rates and rank of the jDE, JADE, and SaDE algorithms and their corresponding DE variants with new mutation strategies
Table 15 Peak ratios, success rates and rank of the EPSDE algorithms and EPSDE_Niche algorithms
Table 16 Total ranks for all algorithms

5.3 Study for EED problem

In this section, the present formulation treats economic environmental dispatch (EED) problem as a multi-objective mathematical programming problem is selected to test the performance of the new strategies. Here, the introduction of the multiobjective problem is given first. Then, the experimental setup is shown. Lastly, the experimental result and analysis are made.

5.3.1 Problem formulation

The EED problem attempts to optimize both cost and emission simultaneously, while satisfying both equality and inequality constraints. The following objectives and constraints are taken into account in the formulation of EED problem.

5.3.1.1 Objectives

(1) Cost

The fuel cost function of each fossil fuel-fired generator, considering the valve-point effect (Walters and Sheble 1993), is expressed as the sum of a quadratic and a sinusoidal function.

The total fuel cost in terms of real power output can be expressed as Eq. (51).

$$\begin{aligned} F=\sum \limits _{i=1}^N \left[ {a_i +b_i P_i +c_i P_i^2 +\left| {d_i \sin \left\{ {e_i \left( {P_i^{\mathrm{min}} -P_i } \right) } \right\} } \right| } \right] \nonumber \\ \end{aligned}$$
(51)

where \(P_i \) is the power output of ith unit, \(P_i^{\mathrm{min}} \) is the lower generation limits for ith unit. \(a_i \), \(b_i \), \(c_i \), \(d_i\), and \(e_i \) are the cost coefficients of ith unit. N is the number of generating units.

(2) Emission

The atmospheric pollutants such as sulfur oxides (SOx), nitrogen oxides (NOx) and carbon dioxide (CO2) caused by fossil fuel-fired generator can be modeled separately. However, for comparison purposes, the total emission of these pollutants which is the sum of a quadratic and an exponential function (Basu 2011) can be expressed as Eq. (52).

$$\begin{aligned} E=\sum \limits _{i=1}^N \left[ {\alpha _i +\beta _i P_i +\gamma _i P_i^2 +\eta _i exp\left( {\delta _i P_i}\right) }\right] \end{aligned}$$
(52)

where \(P_i\) is the power output of ith unit. \(\alpha _i\), \(\beta _i \), \(\gamma _i \), \(\eta _i \), and \(\delta _i \) are the emission coefficients of \(\hbox {ith}\) unit. N is the number of generating units.

5.3.1.2 Constraints

(1) Real power balance constraint

The total real power generation must balance the predicted power demand plus the real power losses in the transmission lines.

$$\begin{aligned} \sum \limits _{i=1}^N P_i -P_D -P_L =0 \end{aligned}$$
(53)

where \(P_D \) is the load demand. \(P_L \) is calculated by using B coefficients (the transmission loss coefficient) which can be expressed in the quadratic form as follows:

$$\begin{aligned} P_L =\sum \limits _{i=1}^N \sum \limits _{j=1}^N P_i B_{ij} P_j \end{aligned}$$
(54)

(2) Real power operating limits

$$\begin{aligned} P_i^{min} \le P_i \le P_i^{\mathrm{max}},\quad i\in N \end{aligned}$$

where \(P_i^{\mathrm{min}} \) and \(P_i^{\mathrm{max}} \) are the lower and upper generation limits for ith unit.

Table 17 Power output, cost, and emission of ten-unit system by the basic DE algorithms and DE-Niche algorithm

5.3.2 Experimental setup

To make a fair comparison, the parameters for all the DE algorithms are set as follows unless a change is mentioned.

  1. (1)

    Dimension of the functions (D): 10.

  2. (2)

    Population size (NP): 50.

  3. (3)

    Maximal number of fitness function evaluations: 1000.

  4. (4)

    Scaling Factor (F) and Crossover Control Parameter (Cr): For two basic DE algorithms, F and Cr are (0.5, 0.9) and (0.5, 0.3), respectively. For four advanced DE variants, F and Cr are set as mentioned in Brest and Mernik (2008), Qin and Suganthan (2005), Mallipeddi et al. (2011), Zhang and Sanderson (2009).

  5. (5)

    Clearing radius (\({\updelta })\): The clearing radiuses for the algorithms are shown in Table 12. And, the clearing radiuses for jDE-Niche is set as 0.5.

  6. (6)

    Load demand: \(P_D =2000\) MW.

5.3.3 Result and analysis

This system consists of ten generating units with non-smooth fuel cost and emission level functions. Unit data and loss coefficients are given in “Appendix”.

Solutions by using basic DE and DE-Niche on EED problem are shown in Table 17. During cost minimization, fuel cost is \(1.11497 \times 10^{5}{\$}\) and emission is 4612.18 lb. But cost increases to \(1.16412 \times 10^{5}{\$}\) and emission decreases to 3972.24 lb in case of emission minimization. Results obtained from DE/rand/1, DE/best/1, and DE-Niche, are summarized in Table 17. In case of EED by using DE-Niche when \(F=0.5\) and \(Cr=0.9\), fuel cost is \(1.12606 \times 10^{5}{\$}\) which is more than \(1.11497 \times 10^{5}{\$}\) and less than \(1.16412 \times 10^{5}{\$}\) , and emission is 4263.24 lb which is less than 4612.18 lb and more than 3972.24 lb. By using DE/rand/1, fuel cost is \(1.12622 \times 10^{5}{\$}\) which is more than \(1.12606 \times 10^{5}{\$}\) and emission is 4264.56 lb which is more than 4263.24 lb. By using DE/best/1, fuel cost is \(1.12621 \times 10^{5}{\$}\) which is more than \(1.12606 \times 10^{5}{\$}\) and emission is 4267.28 lb which is more than 4263.24 lb. It can be found that, the values of both fuel cost and emission by using DE-Niche are less than the values by using DE/rand/1 and DE/best/1. Likewise, it can be found that the conclusion is applicable to the results when \(F=0.5\) and \(Cr=0.3\).

Solutions by using jDE, JADE, SaDE, EPSDE, jDE-Niche, JADE-Niche, SaDE- Niche, and EPSDE-Niche on EED problem are shown in Table 18. In case of EED by using jDE, fuel cost is \(1.12612 \times 10^{5}{\$}\), and emission is 4263.96 lb. When by using jDE-Niche, fuel cost is \(1.12625 \times 10^{5}{\$}\) which is more than \(1.12612 \times 10^{5}{\$}\), but emission is 4262.36 lb which is less than 4263.96 lb. These two solutions do not dominate each other. Similarly, the solutions by using JADE and JADE-Niche do not dominate each other. At the same time, in case of EED by using SaDE, fuel cost is \(1.13763 \times 10^{5}{\$}\), and emission is 4291.64 lb. When by using SaDE-Niche, fuel cost is \(1.13543 \times 10^{5}{\$}\) which is less than \(1.13763 \times 10^{5}{\$}\), and emission is 4198.39 lb which is less than 4291.64 lb. It shows that, the values of both fuel cost and emission by using SaDE-Niche are less than the values by using SaDE. Similarly, fuel cost and emission by using EPSDE are both less than fuel cost and emission by using.

Table 18 Power output, cost, and emission of ten-unit system by jDE, JADE, SaDE, and EPSDE algorithms and their corresponding DE variants with new mutation strategies

From the comparison of solutions between jDE, JADE, SaDE, and EPSDE and their corresponding DE variants jDE-Niche, JADE-Niche, SaDE-Niche, and EPSDE-Niche, it shows that the solutions of DE whose mutation strategy is based on the clearing niche mechanism are better as a whole. Therefore, the new mutation strategies can improve the performance of most DE variants on solving EED.

6 Conclusion

In this work, the clearing niche mechanism has been incorporated into several mutation strategies, which are used on two basic DE algorithms, and four improved DE variants. Through evaluating and comparing the effectiveness of DE variants with new mutation strategies with basic DE algorithms, four improved DE variants, it can confirmed that new mutation strategies based on the clearing niche mechanism can improve the performance of most DE algorithms studied. In addition, through the discussion of adjusting the clearing radius threshold, the exploitation capability and the exploration capability are maintained.

In the future, the present work could be extended in multiple directions. First of all, an adaptive clearing radius can be used in the algorithms. Secondly, other strategies based on the clearing niche mechanism will guide the search of DE. Thirdly, more niche mechanisms will be introduced to study the mutation strategies. Lastly, the DE variants with new strategies will be used to solve real-world problems. For instance, it can be applied to petroleum reservoir identification, location problem, feature selection and rule extraction.