1 Introduction

Differential evolution (DE, see, Storn and Price 1995; Price et al. 2005; Chakraborty 2008) is a reliable and versatile function optimizer which displays a solid performance for diverse continuous optimization problems. DE is a very interesting population-based metaheuristic having mixed features. Due to its recombination and selection features DE can be seen as an evolutionary algorithm (EA). On the other hand, a DE structure tends to generate an individual with an above average performance which leads the exploration search, similar to swarm intelligence algorithms (SIA). In addition, DE, unlike EAs, generates offspring by perturbing the solutions with a scaled difference of two randomly selected population vectors, instead of recombining the solutions by means of a probabilistic criterion. Finally, DE employs a very peculiar survivor selection scheme, namely one to one spawning. This selection scheme allows replacement of an individual only if the offspring outperforms its corresponding parent.

Regardless of its classification, DE has proven to have a very good performance on various real-world problems. For example, in Joshi and Sanderson (1999) a DE application to the multisensor fusion problem is given. In Su and Lee (2003), and Chiou et al. (2004) DE applications to power electronics are presented. In Wang and Jang (2000) an application of DE to chemical engineering is proposed. In Storn (2005) a filter design is carried out by DE.

Reasons for success of the DE can be found in its simplicity and ease of implementation, while at the same time demonstrating reliability and high performance. In addition, the fact that only three parameters require tuning greatly contributes to the rapid diffusion of DE schemes among computer scientists and practitioners.

Although the DE undoubtedly has a great potential, setting of the control parameters is not a trivial task, since it has a heavy impact on the algorithmic performance. Thus, over the years, the DE community has intensively investigated the topic of parameter setting. Several studies have been reported, e.g., in Storn and Price (1997), Price and Storn (1997), Liu and Lampinen (2002), Rönkkönen et al. (2005), and Zielinski et al. (1857), and lead to contradictory conclusions. In other words, in accordance with the No Free Lunch Theorem, (Wolpert and Macready 1997), an efficient DE parameter setting is prone to problems, as the studies in Gämperle et al. (2002), Liu and Lampinen (2002), and Mallipeddi and Suganthan (2008) confirm.

To overcome the problem of the setting, some algorithms which employ adaptive and self-adaptive parameter settings have recently been proposed in literature. In Teo (2005, 2006), a variable population size is presented. The adaptive population size approach has been recently improved in two different implementations reported in Teng et al. (2009). Another scheme, which proposes a progressive population size reduction in DE, has been proposed in Brest and Maučec (2008). A variable population size DE, based on a fitness diversity adaptation is proposed in Tirronen and Neri (2009). An adaptive scheme for the DE scale factor is presented in Ali and Törn (2004). An automatic update of the scale factor has been proposed in Das et al. (2005). By following a similar line of thought, in Rönkkönen and Lampinen (2003), Omran et al. (2005), and Salman et al. (2007), a normal distribution is employed in order to perform a self-adaptation on the parameters F and CR. A Cauchy distribution in the self-adaptive scheme proposed in Soliman et al. (2007), Soliman and Bui (2008). An alternative kind of self-adaptation which employs the so called chaos mutation is proposed in Zhenyu et al. (2006). A controlled randomization of scale factor and crossover rate has been proposed in Brest et al. (2006).

In addition, since the DE algorithm suffers from many real world conditions, e.g., high dimensionality and noisy problems, some modifications on the standard DE scheme can significantly improve upon its performance. Modern DE-based algorithms can be divided into the two following categories:

  1. 1.

    DE integrating an extra component. This class includes those algorithms which use the DE as an evolutionary framework in which it is assisted by additional algorithmic components, e.g., local searchers or extra operators (see, Fan and Lampinen 2003; Liu and Lampinen 2005; Noman and Iba 2008; Tirronen et al. 2008; Caponio et al. 2009), and (Neri and Tirronen 2009). The algorithms belonging to this class can be clearly decomposed as a DE framework and additional components.

  2. 2.

    Modified structures of DE. This class includes those algorithms which make a substantial modification within the DE structure, in the search logic, the selection etc. Some examples are given in Qin et al. (2009) and Das et al. (2009).

A popular way to enhance the DE performance by structurally modifying the algorithmic functioning is through employment of structured populations. In other words, the population individuals are distributed over several sub-populations which evolve independently and interact by exchanging data and information details and contribute to a unique simultaneous evolution.

In Kwedlo and Bandurski (2006) a distributed DE scheme employing a ring topology (the cores are interconnected in a circle and the migrations occur following the ring) has been proposed for the training of a neural network. In Salomon et al. (2005), an example of DE parallelization is given for a medical imaging application. A few famous examples of distributed DE are presented in Zaharie (2002, 2003), Zaharie and Petcu (2003); in these papers the migration mechanism as well as the algorithmic parameters are adaptively coordinated according to criterion based on genotypical diversity. In paper, Zaharie (2004), a distributed DE for preserving diversity in the niches is proposed in order to solve multi-modal optimization problems. In Tasoulis et al. (2004), a distributed DE characterized by a ring topology and the migration of individuals with the best performance, to replace random individuals of the neighbor sub-population, has been proposed. An application of the algorithm in Tasoulis et al. (2004) for training of a neural network has been presented in Pavlidis et al. (2005). Following similar logic, paper (Kozlov and Samsonov 2006) proposes a distributed DE, where the computational cores are arranged according to a ring topology and, during migration, the best individual of a sub-population replaces the oldest member of the neighboring population. In De Falco et al. (2007a, b, c) a distributed DE has been designed for the image registration problem. In these papers, a computational core acts as a master by collecting the best individuals detected by the various sub-populations running in slave cores. The slave cores are connected in a grid and a migration is arranged among neighbor sub-populations. In Apolloni et al. (2008), a distributed DE which modifies the scheme proposed in Tasoulis et al. (2004) has been presented. In Apolloni et al. (2008), the migration is based on a probabilistic criterion depending on five parameters. It is worthwhile mentioning that some parallel implementations of sequential (without structured population) DE are also available in literature (see, Nipteni et al. 2006). An investigation of DE parallelization is given in Lampinen (1999).

This paper focuses on distributed differential evolutions and proposes a novel distributed algorithm. The proposed algorithm distributes its individuals within sub-populations arranged according to a ring topology. Each sub-population is characterized by its own scale factor. According to a simple probabilistic criterion, the migration of individual with the best performance and its associated scale factor occurs between neighbor sub-populations (following the ring topology). At each migration, the scale factor is also perturbed by means of a normal distribution. This paper is based on the idea that the scale factor is a determinant element within the DE search strategy. Thus, a successful search strategy can be inherited by the other sub-populations and propagated throughout the ring. A probabilistic perturbation enhances the exploration pressure of the algorithm.

The remaining of this paper is organized as follows. Section 2 describes the working principles of DE. Section 3 gives a short description of recently presented distributed versions of DE and introduces algorithms employed for comparison in the experimental section. Section 4 describes the proposed algorithm and discusses its algorithmic principle of functioning. Section 5 shows the experimental setup and numerical results of the present study. Section 6 gives the conclusions of this paper.

2 Sequential differential evolution

To clarify the notation used throughout this chapter we refer to the minimization problem of an objective function f(x), where x is a vector of n design variables in a decision space D.

According to its original definition given in Storn and Price (1995), the DE consists of the following steps. An initial sampling of S pop individuals is performed pseudo-randomly with a uniform distribution function within the decision space D. At each generation, for each individual x i of the S pop, three individuals x r , x s and x t are pseudo-randomly extracted from the population. According to the DE logic, a provisional offspring \(x^{\prime}_{\rm{off}}\) is generated by mutation as:

$$ x^{\prime}_{\rm{off}} = x_t + F(x_r - x_s) $$
(1)

where \(F\in \left[0,1+\right[ \) is a scale factor which controls the length of the exploration vector \((x_r - x_s)\) and, thus, determines how far from point x i the offspring should be generated. With \(F\in \left[0,1+\right[,\) it is meant here that the scale factor should be a positive value which cannot be much greater than 1 (see, Price et al. 2005). While there is no theoretical upper limit for F, effective values are rarely greater than 1.0. The mutation scheme shown in Eq. 1 is also known as DE/rand/1. Other variants of the mutation rule have been subsequently proposed in literature (see, Qin and Suganthan 2005):

  • DE/best/1: \({x^{\prime}_{\rm{off}}=x_{\rm{best}}+F\left({x_r-x_s}\right)}\)

  • DE/cur-to-best/1: \({x^{\prime}_{\rm{off}}=x_i+F\left({x_{\rm{best}}-x_i}\right)+F\left({x_s-x_t}\right)}\)

  • DE/best/2: \({x^{\prime}_{\rm{off}}=x_{\rm{best}}+F\left({x_s-x_t}\right)+F\left({x_u-x_v}\right)}\)

  • DE/rand/2: \(x^{\prime}_{\rm{off}}=x_t+F\left({x_r-x_s}\right)+F\left({x_u-x_v}\right)\)

  • DE/rand-to-best/2: \({x^{\prime}_{\rm{off}}}=x_t+F\left({x_{\rm{best}}-x_t}\right)+{F\left({x_r-x_s }\right)}+{F\left({x_u-x_v}\right)}\)

where x best is the solution with the best performance among the individuals of the population, x u and x v are two additional pseudo-randomly selected individuals. It is worthwhile to mention the rotation invariant mutation shown in Qin and Suganthan (2005):

  • DE/current-to-rand/1 \(x_{\rm{off}}=x_i+K\left(x_t - x_i\right)+F^{\prime}\left(x_r-x_s\right)\)

where K is the combination coefficient, which, as suggested in Price (1999), should be chosen with a uniform random distribution from [0, 1] and \(F^{\prime}=K \cdot F.\) For this special mutation, the mutated solution does not undergo the crossover operation described below.

Recently, in Price et al. (2005), a new mutation strategy has been defined. This strategy, namely DE/rand/1/either–or, consists of the following:

$$ x{^{\prime}_{\rm{off}}}=\left\{\begin{array}{lll} x_{t}+ F\left(x_{r}-x_{s}\right)&\hbox{if}& {\rm{rand}}\left({0,1}\right)< p_{F}\\ x_{t}+K\left(x_{r}+x_{s}-2x_{t}\right)&\hbox{otherwise}&\\ \end{array} \right. $$
(2)

where for a given value of F, the parameter K is set equal to 0.5 (F + 1).

When the provisional offspring has been generated by mutation, each gene of the individual \(x^{\prime}_{\rm{off}}\) is exchanged with the corresponding gene of x i with a uniform probability and the final offspring x off is generated:

$$ x_{{\rm{off}},j}=\left\{\begin{array}{lll} x_{i,j}&\hbox{if}&{\rm{\rm{rand}}}\left({0,1} \right) < CR \\ x_{{\rm{off}},j}^{\prime}&{\rm{otherwise}}&\\ \end{array} \right. $$
(3)

where rand(0, 1) is a random number between 0 and1; j is the index of the gene under examination. The crossover described in Eq. 3 is known as binary crossover (simply indicated with “bin”) and is the most common crossover scheme. It can be remarked that also the exponential crossover (see, Price et al. 2005), is used in some cases.

The resulting offspring x off is evaluated and, according to a one-to-one spawning strategy, it replaces x i if and only if \(f(x_{\rm{off}}) \leq f(x_i);\) otherwise no replacement occurs. It must be remarked that although the replacement indexes are saved one by one, during generation, actual replacements occur all at once at the end of the generation. For the sake of clarity, the pseudo-code highlighting working principles of the DE is shown in Fig. 1.

Fig. 1
figure 1

Pseudo-code of DE/rand/1/bin

3 Distributed differential evolution: recently developed algorithms

This section describes three distributed algorithms based on a DE structure recently proposed in literature. The algorithms described in this section are, according to our judgement, representative of the state-of-the-art structured DE algorithms and have been included in the benchmark for comparing the performance of the proposed approach. Although the notation can generate some confusion, i.e., all algorithms are distributed and can easily be parallelized, we decided to indicate these according to original terminology defined by their respective authors.

3.1 Parallel differential evolution

In Tasoulis et al. (2004), the problem of parallelization for DE schemes has been studied through an experimental analysis and an algorithm, namely parallel differential evolution (indicated here with PDE) has been proposed.

The original PDE implementation uses the parallel virtual machine (PVM), allowing multiple computers (called nodes) to be organized as a cluster and exchange arbitrary messages. The PDE algorithm is organized around one master node and m sub-populations running each on one node, and organized as a unidirectional ring, as illustrated in Fig. 2. It must be noted that although the logical topology is a ring which does not contain the master node, the actual topology is a star, where all communications (i.e., the migrations of individuals) are passing through the master.

Fig. 2
figure 2

Unidirectional ring topology in the parallel differential evolution algorithm

The S pop individuals constituting the populations are distributed over the m sub-populations composing the ring. Each sub-population is composed of \({\frac {S_{\rm{pop}}}{m}}\) individuals. Each sub-population runs a regular DE algorithm while the master node coordinates the migration of individuals between sub-populations. On each generation, the sub-population has a given probability ϕ to send a copy of its best individual to its next neighbor sub-population in the ring. When migration occurs, the migrating individual replaces a pseudo-randomly selected individual belonging to the target sub-population. Figure 3 describes the behavior of both the master node and the sub-populations in more detail.

The DE variant run by each sub-population is the same across all the sub-populations. In Tasoulis et al. (2004), six mutation strategies have been compared, namely DE/best/1, DE/rand/1, DE/cur-to-best/1, DE/best/2, DE/rand/2 described in Sect. 2, as well as the trigonometric operator described in Fan and Lampinen (2003). Each strategy is used with different values of the migration constant ϕ and compared over seven test functions the dimensions of which vary between 2 and 30. The results in Tasoulis et al. (2004) showed that DE/best/1 is the most efficient mutation strategy and quite stable across different values of ϕ for the low dimensional problems analyzed.

Fig. 3
figure 3

Pseudo-code of PDE. a At the master node, b At each sub-population

3.2 Island based distributed differential evolution

In Apolloni et al. (2008) a distributed DE algorithm, namely island based distributed differential evolution (IBDDE) has been proposed. The IBDDE algorithm is a modified version of PDE described in Subsect. 3.1. In IBDDE, the population, having size S pop, is structured in m sub-populations. Thus, each sub-population is composed of \({\frac {S_{\rm{pop}}}{m}}\) individuals. The migration policy is then defined as a five-tuple \({{\mathcal{M}}} = (\gamma, \rho, \phi_s, \phi_r, \tau).\) \(\gamma \in {{\mathbb{N}}}\) is the number of generations between two migrations, \(\rho \in {{\mathbb{N}}}\) is the number of individuals which migrate from a sub-population P during each migration, ϕ s is the selection function which, applied to a sub-population, returns the migrating individuals \(v^{\prime i}_g,\) ϕ r is the replacement function that selects individuals to be replaced by the immigrants in the receiving sub-population, and τ is the topological rule, which selects the target sub-population Q. The individuals to be migrated are pseudo-randomly (uniformly) chosen by the selection function ϕ s . Incoming individuals from other sub-populations replace pseudo-randomly chosen local individuals, only if the former are better, by the replacement function ϕ r .

Figure 4 describes the algorithm as pseudo-code.

Fig. 4
figure 4

Pseudo-code of IBDDE for the sub-population P

In Apolloni et al. (2008), the experiments have been run with a population size S pop equal to 20. The population is divided into two sub-populations of 10 individuals in one experiment, and into four sub-populations of 5 individuals in a second experiment. The migration parameters are set to \(\gamma=100, \rho=1,\) the functions ϕ s and ϕ r are defined to randomly select an individual, the topology τ is a unidirectional ring very similar to the logical topology used by PDE (see, Subsect. 3.1). The mutation strategy for DE is DE/rand/1, and the algorithm is tested on 25 different test functions in 30 and 50 dimensions, for a total of 50 test functions.

3.3 Distributed differential evolution

In De Falco et al. (2007a, b, c), in order to solve some image registration problems a distributed DE (indicated here with DDE) has been proposed. This algorithm differs from PDE and IBDDE by the topology it uses. Instead of a unidirectional ring, DDE uses a locally connected topology, where each node is connected to μ other nodes. Figure 5 represents such a topology, where the nodes are arranged in a mesh folded into a torus.

Fig. 5
figure 5

Torus topology in distributed differential evolution

In De Falco et al. (2007a, b, c), it has been proposed to set μ = 4, i.e., each node (such as the black disc in the Fig. 5) has exactly four nearest neighbors (represented by the four gray discs). In DDE, each node represents one processor running a DE algorithm with a DE/rand/1 mutation strategy on a sub-population. Every M I generations (the migration interval), each sub-population is allowed to exchange S I (the migration rate) individuals with its nearest neighbors. In the experimental setup, each node sends a copy of its best individual to its neighbors. Figure 6 describes the algorithm as pseudo-code.

DDE also makes use of a master node, the role of which is to collect the best solutions found in each sub-population and to present these results to the user.

Fig. 6
figure 6

Pseudo-code of the DDE algorithm at a sub-population

4 Distributed differential evolution with scale factor inheritance

The proposed algorithm enhances the PDE structure described in Subsect. 3.1 by means of the implementation, in a distributed logic, of the self-adaptive parameter control proposed in Brest et al. (2006). More specifically, an adaptive control of the scale factor “F” is proposed here. The novel mechanism proposed here in this article is named scale factor inheritance. The proposed algorithm, namely “F” adaptive control parallel differential evolution (FACPDE) consists of the following steps.

At the beginning of the optimization process, S pop individuals are pseudo-randomly sampled within the decision space D. These S pop are distributed over m sub-populations; each sub-population is composed of \({\frac {S_{\rm{pop}}}{m}}\) individuals. For each (generic hth) sub-population a scale factor F k, for \(k=1, \ldots, m,\) is assigned. Each scale factor is initially generated as pseudo-random by sampling a value from a uniform distribution between − 1 and 1. The sub-populations are then arranged according to ring topology, as with the PDE topology represented in Fig. 2.

At each generation, each sub-population performs a DE scheme. For each individual x i of the \({\frac {S_{\rm{pop}}}{m}},\) three individuals x r , x s and x t are pseudo-randomly extracted from the population. The provisional offspring \(x^{\prime}_{\rm{off}}\) is generated by mutation as:

$$ x^{\prime}_{\rm{off}}=x_t+F^k(x_r-x_s). $$
(4)

It is clear that a scale factor taking on a negative value means that the search direction is inverted.

When the provisional offspring has been generated by mutation, each gene of the individual \(x^{\prime}_{\rm{off}}\) is exchanged with the corresponding gene of x i with a uniform probability and the final offspring x off is generated:

$$ x_{{\rm {off}},j}=\left \{ {\begin{array}{lll} x_{i,j}&\hbox{if}&{\rm{rand}}\left({0,1}\right) < CR \\ x_{{\rm{off}},j}^{\prime}&{\rm{otherwise}}& \\ \end{array}} \right. $$
(5)

where, as for the sequential DE, rand(0, 1) is a random number between 0 and 1; j is the index of the gene under examination. The standard one-to-one spawning is then applied and, at the end of each generation, the scheduled replacements are performed.

For each sub-population, between two subsequent generations, a pseudo-random number rand(0, 1) is generated by means of a uniform distribution. Analogous to what was explained about the PDE in Subsect. 3.1, this pseudo-random number is then compared with a constant value ϕ, namely migration constant. If \(\hbox{rand}(0,1) < \phi,\) the individual with the best performance is selected to undergo migration. Thus, for the generic kth sub-population, the individual \(x_{\rm{best}}^k\) is duplicated and then replaces a pseudo-randomly selected individual of the neighbor (in the ring) sub-population. The scale factor inheritance mechanism occurs contextually with the migration. More specifically, when the migration occurs, performance of the individual \(x_{\rm{best}}^k\) is compared with that of the best individual belonging to the target sub-population, indicated here with \(x_{\rm{best}}^{k+1}.\) If the new immigrant has a better performance than the best individual of the target sub-population, i.e., if \(f\left(x_{\rm{best}}^k\right)< f\left(x_{\rm{best}}^{k+1}\right),\) the (k+1)th sub-population inherits the scale factor F k after a perturbation. More specifically, the scale factor F k+1 related to the (k+1)th sub-population is updated according to the following formula:

$$ F^{k+1}=F^{k}+\alpha {{\mathcal{N}}}\left(0,1\right) $$
(6)

where \({{\mathcal{N}}}\left(0,1\right)\) is a pseudo-random value sampled from a normal distribution characterized by a zero mean and variance equal to 1. The constant value α has the role of controlling the range of perturbation values \(\alpha {{\mathcal{N}}}\left(0,1\right).\) It must be observed that we did not impose any bounds for the variation of F. On the contrary, we decided to allow an unbounded variation of the control parameter and rely on the self-adaptation mechanism.

If the new immigrant does not outperform the best individual of the target sub-population, i.e., \(f\left(x_{\rm{best}}^k\right) \geq f\left(x_{\rm{best}}^{k+1}\right),\) the scale factor inheritance mechanism is not activated and, thus, only the migration of the individual \(x_{\rm{best}}^k\) occurs.

The described operations are repeated until the budget conditions are satisfied.

A graphical representation of FACPDE is given in Fig. 7.

Fig. 7
figure 7

Graphical representation of FACPDE

For the sake of clarity, the pseudo-code illustrating working principles of FACPDE is shown in Fig. 8.

Fig. 8
figure 8

Pseudo-code of the FACPDE algorithm

4.1 Scale factor inheritance: algorithmic philosophy

As shown in Sect. 2, DE is based on a very simple idea, i.e., a search by means of adding vectors and a one-to-one spawning for the survivor selection. Thus, DE is very simple to implement/code and contains a limited number of parameters to tune (only S pop, F, and CR).

From an algorithmic viewpoint, reasons for the success of DE have been highlighted in Feoktistov (2006): success of DE is due to an implicit self-adaptation contained within the algorithmic structure. More specifically, since, for each candidate solution, the search rule depends on other solutions belonging to the population (e.g., x t , x r , and x s ), the capability of detecting new promising offspring solutions depends on the current distribution of the solutions within the decision space. During early stages of the optimization process, solutions tend to be spread out within the decision space. For a given scale factor value, this implies that the mutation appears to generate new solutions by exploring the space by means of a large step size (if x r and x s are distant solutions, \(F\left(x_r-x_s\right)\) is a vector characterized by a large modulus). During the optimization process, the solutions of the population tend to concentrate on specific parts of the decision space. Therefore, step size in the mutation is progressively reduced and the search is performed in the neighborhood of the solutions. In other words, due to its structure, a DE scheme is highly explorative at the beginning of the evolution and subsequently becomes more exploitative during optimization.

Although this mechanism seems at first glance to be very efficient, it hides a limitation. If for some reason, the algorithm does not succeed in generating offspring solutions which outperform the corresponding parent, the search is repeated again with similar step size values and will likely fail by falling into an undesired stagnation condition (see, Lampinen and Zelinka 2000). Stagnation is the undesired effect which occurs when a population-based algorithm does not converge to a solution (even suboptimal) and the population diversity is still high. In the case of the DE, stagnation occurs when the algorithm does not manage to improve upon any solution of its population for a prolonged number of generations. In other words, the main drawback of the DE is that the scheme has, for each stage of the optimization process, a limited amount of exploratory moves. If these moves are not enough for generating new promising solutions the search can be heavily compromised.

Thus, in order to enhance the DE performance, alternative search moves should support the original scheme and promote a successful continuation of the optimization process. The use of multiple populations in distributed DE algorithms allows an observation of the decision space from various perspectives and, most importantly, decreases the risk of stagnation since each sub-population imposes a high exploitation pressure. In addition, the migration mechanism ensures that solutions with a high performance are included within the sub-populations during their evolution. This fact is equivalent to modifying the set of search moves. If the migration gives privilege to the best individuals, the new search moves promote the detection of new promising search directions and, thus, allow the DE search structure to be periodically “refurbished”. Thus, migration is supposed to mitigate the risk of DE (sub-)populations stagnating and to enhance the global algorithmic performance.

As a countermeasure against stagnation, several recent DE versions propose a randomization in the search logic which increases the amount of potential exploration moves. For example, in Brest et al. (2006), scale factor and crossover rates are periodically updated by generating new random values. This simple operation seems to have a very relevant effect on the algorithmic performance. Also the DE scheme proposed in Qin et al. (2009) enhances a previously proposed algorithm (presented in Qin and Suganthan 2005) by introducing a randomization of the control parameters. Thus, two operations are needed in order to obtain significant improvements in DE performance which seems to be the updating and randomization of control parameters.

In this paper we focus on the scale factor dynamics. Although DE schemes are characterized by the implicit self-adaptation described above, employment of a unique and constant scale factor value can be improper since the exploratory moves depend on distribution of the solution within the decision space. For example, for a highly multi-modal problem, a scale factor \(F \approx 1\) can generate very long moving vectors \(F\left(x_r-x_s\right)\) if the population is spread out within the decision space (see, Fig. 9a) and very short moving vectors if the population is concentrated in some areas of the decision space (see, Fig. 9b). This fact may lead to an excessively explorative behavior during some stages of the evolution and an excessively exploitative behavior during other stages. These two behaviors can cause, respectively, stagnation due to an incapability to detect a promising search direction and a premature convergence due to the excessive exploitation of a suboptimal basin of attraction. In other words, a proper choice of the scale factor depends not only on the optimization problem but also on the stage of the evolution.

Fig. 9
figure 9figure 9

Search mechanism in differential evolution performance trends in 500 dimensions. a Too explorative conditions, b Too exploitative conditions, c Ackley, d DropWave, e Michalewicz, f Rastrigin, g Rotated Michalewicz, h Rotated pathological, i Rotated rastrigin, j Schwfel, k Tirronen

On the basis of this consideration, the scale factor inheritance mechanism has been designed. The initial pseudo-random assignment of the scale factors (one per sub-population) allows each FACPDE sub-population to explore the decision space from complementary perspectives. Subsequently, each evolving sub-population can be seen as a separate searcher which cooperates and competes with the other searcher, analogous to memetic algorithms (see, Ong and Keane 2004; Tang et al. 2007a; Caponio et al. 2007), and (Neri et al. 2007). The sub-populations cooperate with each other by means of migration of the individual with the best performance which can be seen as a suggestion of a promising search direction. The sub-populations compete with each other by means of the scale factor migration; this fact can be seen like the imposition of the most successful search strategy to other sub-populations employing a weaker strategy. The ring topology assures propagation of this successful strategy to all sub-populations. On the other hand, this propagation is rather slow, so as to avoid too greedy an algorithmic behavior. In addition, randomization of the scale factor when the migration occurs guarantees a certain diversity of scale factors even in the event that a propagation throughout the entire ring occurs. Thus, we avoid that all the sub-population are characterized by the same scale factor.

In summary, the combined action of an exploration of the decision space from diverse and complementary perspectives, the cooperation mechanism with its suggestion of promising search directions, the competitive procedure of the most successful search strategies and their randomized updates should, together, compose a robust DE-based algorithm which efficiently balances its explorative and exploitative resources in order to efficiently and robustly solve complex optimization problems.

5 Experimental results

To prove the viability of the FACPDE and test its performance with respect to modern distributed DE-based algorithms, the following numerical experiments have been performed. The test problems listed in Table 1 have been considered in this study.

Table 1 Test problems

The rotated version of some of the test problems listed in Table 1 have been included into the benchmark set. These rotated problems have been generated through the multiplication of the vector of variables by a randomly generated orthogonal rotation matrix. In total, 24 test problems have been considered in this study with both n = 500 and n = 1,000. The FACPDE has been tested with these 24 test problems and its behavior and performance have been compared with those obtained by standard sequential DE/rand/1/bin (simply indicated here as DE), the improved DE algorithm employing random scale factor proposed in Das et al. (2005) and indicated as DERSF, PDE introduced in Tasoulis et al. (2004), IBDDE given in Apolloni et al. (2008) and DDE employed in De Falco et al. (2007a, b, c). It must be remarked that all the algorithms chosen for comparison are modern distributed DE algorithms recently proposed in literature, and which are representative of the state-of-the-art in the field. Each algorithm has been run for 500,000 fitness evaluations in the case of n = 500 and for 1,000,000 fitness evaluations when n = 1,000. As much as 50 independent runs have been performed for each algorithm involved in this article.

The algorithms considered in this study have been run with the following parameter setting.

  • DE has been run with F = 0.7 and CR = 0.3 in accordance with the suggestions given in Zielinski et al. (1857), Zielinski and Laur (2008). The population size has been set to \(S_{\rm{pop}} = 200\) for the 500-dimensional problems and to \(S_{\rm{pop}} = 400\) for the 1,000-dimensional ones.

  • DESRF has been run with CR = 0.3. Analogous to DE, the population size has been set to \(S_{\rm{pop}}=200\) for the 500-dimensional problems and to \(S_{\rm{pop}}=400\) for the 1,000-dimensional ones. In DERSF the scale factor is randomized, at each generation, according to the rule \(F=0.5\left(1+{\rm{rand}}\left(0,1\right)\right).\)

  • PDE has been run with populations of 200 or 400 individuals divided into 5 sub-populations of 40 or 80 individuals each, for the 500 and 1,000-dimensional problems, respectively. Despite (Tasoulis et al. 2004) showing better performance for the DE/best/1 mutation strategy in 30 and 50 dimensions, it has proven excessively exploitative and has led to premature convergence of the solutions when used on higher dimension problems. In order to perform a fair comparison, we have carried out an analysis on mutation strategies, leading to the choice of DE/rand/1 and setting the migration constant to ϕ = 0.2. These settings proved to be the best choices in terms of algorithmic performance and have, thus, been chosen for the experiments described below.

  • Similar to PDE, IBDDE has been run with populations of 200 or 400 individuals divided into 5 sub-populations of 40 or 80 individuals each, depending on the dimensionality of the test problems. The other parameters have been chosen according to the values in Apolloni et al. (2008): the sub-populations exchange one individual (ρ = 1) every 100 generations (γ = 100). ϕ s and ϕ r have been defined so as to pseudo-randomly select an individual by means of a uniform distribution, and τ has been set to a unidirectional ring.

  • For the 500-dimensional problems, the DDE has been run with a population of 200 individuals divided into 16 sub-populations of alternatively 12 or 13 individuals. In the case of the 1,000-dimensional problems, the population has been set to 400 individuals divided into 16 sub-populations of 25 individuals. Following the suggestions in De Falco et al. (2007b) the sub-populations have been organized into a 4 × 4 grid folded into a torus (μ = 4). Migration occurred in each sub-population with only its best individual \((S_I = 1)\) every \(M_I = 5\) generations.

  • Similar to PDE and IBDDE, FACPDE has been run with populations of 200 and 400 individuals divided into 5 sub-populations of 40 and 80 individuals each for the 500 and 1,000-dimensional cases, respectively. The migration constant ϕ has been set equal to 0.2 and the constant α in formula (6) has been set equal to 0.1.

It is worthwhile commenting on choice of the population sizes \(S_{\rm{pop}}=200\) and \(S_{\rm{pop}}=400.\) Although in Storn and Price (1997) it is suggested that the DE population size be set equal to about 10 times the dimensionality of the problem, this indication is not confirmed by a recent study in Neri and Tirronen (2008), where it is shown that a population size lower than the dimensionality of the problem can be optimal in many cases.

Table 2 shows the average of the final results detected by each algorithm  ±  the standard deviations, with the 500 dimension case. Table 3 shows the results for the 1,000-dimension case. The best results are highlighted in boldface.

Table 2 Average final fitness values ± standard deviations for 500-dimension problems
Table 3 Average final fitness values ± standard deviations for 1,000-dimension problems

Results in Tables 2 and 3 show that the sequential DE algorithms are outperformed by the distributed algorithms. This result confirms that a structured population can enhance performance of the DE (see, Alba and Tomassini 2002). In addition, FACPDE seems to have a very good performance with the test problems considered in this study since it detects those solutions with best performance for 23 and 21 test problems out of the 24 considered in 500 and 1,000 dimensions, respectively. It must be remarked that in the cases, where FACPDE does not detect the solutions with best performance, the algorithm still detects, in any case, competitive solutions; these solutions usually have, on average, the second best performance. In this sense, FACPDE seems to be a high-quality algorithm for various test problems.

To prove statistical significance of the results, the Wilcoxon Rank-Sum test has also been applied according to the description given in Wilcoxon (1945), where the confidence level has been fixed to 0.95. Tables 4 and 5 show results of the test. A “+” indicates the case in which FACPDE statistically outperforms, for the corresponding test problem, the algorithm mentioned in that column; a “=” indicates that a pairwise comparison leads to success of the Wilcoxon Rank-Sum test, i.e., the two algorithms have the same performance; a “−” indicates that FACPDE is outperformed.

In the case of 500-dimensional problems, the Wilcoxon test results in Table 4 shows that FACPDE, out of the 120 pairwise comparisons performed, loses out only in one case, obtains the same results in four cases, and wins in 91 cases. Thus, FACPDE comes out behind in only 0.83% of the comparisons and comes out ahead in 95.0% of the considered comparisons. In 1,000 dimensions, the Wilcoxon test results displayed in Table 5 show that FACPDE, out of the 120 pairwise comparisons performed, loses in one case, obtains the same results in seven cases, and wins in 112 cases. Thus, FACPDE loses only 0.83% of the comparisons and is shown to be superior in 93.3% of the considered comparisons. The statistical test carried out confirms that FACPDE has a very good performance in regard to the studied test problems. In addition, the comparison between PDE and FACPDE shows that the proposed scale factor inheritance mechanism seems to be very beneficial.

Table 4 Results of the Wilcoxon rank-sum test for 500-dimension problems
Table 5 Results of the Wilcoxon rank-sum test for 1,000-dimension problems

To strengthen the statistical significance of the results, the Holm procedure (Holm 1979) has been applied by following the description in Garcfa et al. (2008). The Holm procedure consists of the following. Considering the results in Tables 2 and 3, the six algorithms under analysis have been ranked on the basis of their average performance calculated over the 24 test problems. Thus, or in other words, a score R i for \(i = 1,\ldots,N_A\) (where N A is the number of algorithms under analysis, N A =  6 in our case) has been assigned. With the calculated R i values, the FACPDE has been taken as a reference algorithm. Indicating with R 0 the rank of FACPDE, and with R j for \(j = 1,\ldots,N_A-1\) the rank of one of the remaining five algorithms, the values z j have been calculated as

$$ z_j = {\frac{R_j - R_0}{\sqrt{\frac{N_A(N_A+1)}{6N_{\text{TP}}}}}} $$

where N TP is the number of test problems in consideration (N TP =  24 in our case). By means of the z j values, the corresponding cumulative normal distribution values p j have been calculated. These p j values are then compared with the corresponding \(\delta / (N_A - j)\) where δ is the level of confidence, set to 0.05 in our case. Tables 6 and 7 display z j values, p j values, and corresponding \(\delta/(N_A-j).\) The values of z j and p j are expressed in terms of \(z_{N_A-j}\) and \(p_{N_A-j}\) for \(j=1,\ldots, N_A - 1.\) Moreover, it is indicated whether the null-hypothesis (that the two algorithms have indistinguishable performances) is “Rejected” i.e., the FACPDE statistically outperforms the algorithm under consideration, or “Accepted” if the distribution of values can be considered the same (there is no outperformance).

Table 6 Results of the Holm procedure for 500-dimensional problems (FACPDE is the control algorithm)
Table 7 Results of the Holm procedure for 1,000-dimensional problems (FACPDE is the control algorithm)

The Holm procedure confirms that the FACPDE displays a significantly better performance with respect to the other algorithms in this study for both 500- and 1,000-dimensional cases.

To carry out a numerical comparison of the convergence speed performance, for each test problem, the average final fitness value returned by the best performing algorithm G has been considered. Subsequently, the average fitness value at the beginning of the optimization process J has also been computed. The threshold value \({\rm {THR}}=J-0.95(J-G)\) has then been calculated. The value THR represents 95% of the decay in the fitness value of the algorithm with the best performance. If an algorithm succeeds during a certain run to reach the value THR, the run is said to be successful. For each test problem, the average amount of fitness evaluations \(\bar{ne}\) required, for each algorithm, to reach THR has been computed. Subsequently, the Q-test (Q stands for Quality) described in Feoktistov (2006) has been applied. For each test problem and each algorithm, the Q measure is computed as:

$$ Q={\frac{\bar{ne}}{\rm {Rob}}} $$
(7)

where the robustness Rob is the percentage of successful runs. It is clear that, for each test problem, the smallest value equals the best performance in terms of convergence speed. The value “∞” means that Rob = 0, i.e., the algorithm never reached the THR.

Tables 8 and 9 show the Q values for 500-dimensional problems and 1,000-dimensional problems respectively, as well as the associated THR values. The best results are highlighted in boldface.

Table 8 Results of the Q-test for 500 dimensions problems
Table 9 Results of the Q-test for 1,000-dimension problems

The Q-test results, listed in Tables 8 and 9, show that in 500 and 1,000-dimensional cases, the proposed FACPDE algorithm has the best performance in terms of convergence speed in 22 and 21 cases out of the 24 considered, respectively. Most importantly, the FACPDE algorithm, throughout all considered test problems, is never characterized by an ∞ value of Q-measure. This fact shows that the proposed algorithm is always competitive with the other algorithms in the benchmark and is never relevantly outperformed. In summary, the algorithmic behavior of FACPDE, thanks to its scale factor inheritance mechanism, is extremely promising in terms of algorithmic robustness.

Figure 9 shows average performance trends of the five considered algorithms over a selection of the test problems listed in Table 1 in 500 dimensions.

Figure 9c–k show that in 500 dimensions, the serial DE algorithms improve only marginally. IBDDE has a slightly better performance than DE and DERSF (see, for example, Fig. 9i), but is still not competitive compared to PDE, DDE, and FACPDE for the high dimensional problems considered in this study. In Fig. 9d–k, DDE improves its solutions very quickly in the beginning, before ceasing to make any significant improvement. The transition occurs around 100,000 fitness evaluations in all cases except in Fig. 9h, where it occurs around 200,000 fitness evaluations. PDE’s improvement rate is slower than DDE’s, but contrary to the latter, it continuously improves its solutions and in all but one of the examples listed above, finds a better solution than DDE; Figure 9c is the exception, but PDE’s solution is still very close to DDE’s. Finally, FACPDE’s improvement rate at the onset is steeper than DDE’s in Fig. 9c, f, and j, similar in 9i and k, and softer in Fig. 9d, e, g, and h. In all cases, however, FACPDE does not show symptoms of premature convergence as DDE does. Moreover, FACPDE’s final solutions are, in all above examples except 9h, better than PDE’s.

Figure 10 shows average performance trends of the five considered algorithms over a selection of the test problems listed in Table 1 in 1,000 dimensions.

Fig. 10
figure 10figure 10

Performance trends in 1,000 dimensions. a Ackley, b Alpine, c DropWave, d Griewangk, e Michalewicz, f Pathological, g Rastrigin, h Rotated Ackley, i Rotated Michalewicz

Qualitative results represented in Fig. 10 confirm the findings in 500 dimensions. For large scale problems, the sequential DE seems to suffer from the curse of dimensionality. The IBDDE algorithm succeeds at improving upon the performance of sequential DE and DERSF. The other algorithms demonstrate a better behavior for the high dimensional problems. The DDE has good convergence speed performance during early stages of the evolution since it often manages to detect high quality solutions during the initial generations. However, the DDE tends to stop improving upon previous achievements quite quickly, see, e.g., Fig. 10c and i. It is interesting to observe that in 1,000 dimensions PDE is slower at detecting high-quality solutions with respect to the 500-dimensional case e.g., compare the behavior with Michalewicz function in Figs. 9e and 10e. Thus, more often than in the 500-dimensional case, the DDE outperforms PDE in high dimensions. The FACPDE algorithm, on the contrary, seems to be very efficient and outperforms, in most cases, both PDE and DDE. In addition, the gap between FACPDE and the second best algorithm performance seems to be systematically larger for the 1,000-dimensional case with respect to the results in 500 dimensions, see e.g., Fig. 10a, c, e, and h. In other words, the scale factor inheritance mechanism seems to be very beneficial in order to tackle large scale problems.

To better understand the working principle of the scale factor inheritance, Fig. 11 has been included. Figure 11 shows an example, based on a single run, of the trend that the absolute value of F takes during the FACPDE evolution for the five distributed sub-populations.

Fig. 11
figure 11

Example of the scale factor behavior over the five sub-populations

It can be noticed from Fig. 11 how the scale factor inheritance mechanism works. For example, at around 150,000 a sudden increase in the scale factor value occurs throughout all the sub-populations. This phenomenon is clearly visible in sub-populations 1, 2, and 3 and proves how promising scale factor values are propagated within the ring of sub-populations. The most important finding in Fig. 11 is related to the variation of the scale factor values. It can be observed that although the variation of F is, in principle, unbounded, there is no divergence in the trends. On the contrary, the scale factors never take a value greater than 2. In addition, the scale factor trends do not converge to a constant value. On the contrary, the trends continue to oscillate throughout the entire evolution. This fact can be seen as confirmation that there is no optimal scale factor for a given problem, but that a dynamic mechanism is required in order to satisfy the necessities of the evolution. Finally, since the evolution is biased by randomization, according to our interpretation, the scale factor control should also contain some randomization, as empirically shown in many papers on DE (e.g., Das et al. 2005, 2009; Brest et al. 2006; Qin et al. 2009).

5.1 Experimental results for a large scale optimization benchmark

The proposed FACPDE has been finally tested on the large scale benchmark settled for the 2008 IEEE Congress on Evolutionary Computation (IEEE CEC08) described in Tang et al. (2007b). Following the suggestions given in Tang et al. (2007b), each function has been considered in 100, 500, and 1,000 dimensions. For each problem, FACPDE has been has been run 25 times (25 independent runs) and the computational budget has been fixed equal to 5, 000 × n. Regarding the parameter setting, the FACPDE has been in 500 and 1,000 dimensions with the same setting mentioned above. Following the same rate \({\frac{n}{S_{\rm{pop}}}},\) in 100 dimensions, the population size \(S_{\rm{pop}}\) has been set equal to 40. The final results, expressed in terms of fitness difference between the detected value and the actual minimum ( ± related standard deviation) are given for the seven test problems contained in Tang et al. (2007b) for the three levels of dimensionality in Table 10.

Table 10 Experimental Results for the IEEE CEC08 benchmark

Although the results displayed in Table 10 are outperformed in several cases by the algorithms which took part in the CEC08 competition, FACPDE is still competitive in most cases. This fact is, in our opinion, very relevant since FACPDE, unlike most of those algorithms, does not employ local search components and did not undergo a parameter setting in order to have a high-quality performance for these specific test problems.

6 Conclusion

This paper proposes an adaptive mechanism for the scale factor in distributed differential evolution schemes. The proposed mechanism, namely, scale factor inheritance, consists of the perturbation, by means of a random number and migration, of promising scale factor values throughout sub-populations arranged according to a ring topology. This mechanism is integrated within a distributed differential evolution which employs migration of those individuals demonstrating the best performance.

The resulting algorithm has been tested on a broad and various set of optimization problems and then compared with two sequential differential evolution algorithms and three distributed differential evolution schemes, recently proposed in literature. Numerical results show that the proposed approach is very promising and that the resulting algorithm displays excellent performance in terms of detected final solutions, convergence speed, and robustness for all test problems and comparisons considered in this study.

On the basis of the obtained results and an analysis of the algorithmic working principles, some additional conclusions have been drawn. In distributed differential evolution a cooperative/competitive adaptation of the scale factor is beneficial to algorithmic performance and, more generally, the employment of multiple scale factors, updating of which can greatly improve performance of the distributed algorithm. In addition, a constant scale factor value is inadequate since it restricts the amount of search moves in differential evolution (both sequential and distributed). In other words, for a given problem there is no optimal scale factor since the optimal setting varies during the various stages of evolution. An optimal setting dynamically varies with distribution of the solutions within the decision space during evolution. Although an understanding of a proper control dynamic of the scale factor variation is not yet complete, according to our interpretation, it does not follow a progressive increase or decrease, but takes on an oscillatory behavior. Finally, since the evolution is affected by random events, mainly in the selection of individuals composing the moving vectors within the mutation operation, a certain randomization of the scale factor seems to be beneficial for a successful enhancement of the differential evolution performance.