1 Introduction

Compact Evolutionary Algorithms (cEAs) have been developed in order to address optimization problems characterized by limited memory resources. This situation is typical for robotics and control problems where a full-power computing device may be unavailable due to cost and/or space limitations. For example, in industrial applications, in order to guarantee a quick reaction of the actuators, the optimization algorithm should run directly on a control card instead of a personal computer.

A cEA is an Evolutionary Algorithm (EA) belonging to the class of Estimation of Distribution Algorithms (EDAs), see [1]. Algorithms belonging to this class do not store and process an entire population and all the individuals therein, but make use of a statistical representation of the population in order to perform the optimization process. In this way, a much smaller number of parameters must be stored in memory. Thus, a run of these algorithms requires much less capacious memory devices compared to their corresponding standard EAs.

The first cEA was the compact Genetic Algorithm (cGA) introduced in [2]. The cGA simulates the behavior of a standard binary-encoded Genetic Algorithm (GA). In [2] it is shown that the cGA performs (for low-dimensional problems) almost as well as a GA. As expected, the main advantage of a cGA with respect to a standard GA is the memory saving. An analysis of the convergence properties of the cGA by means of Markov chains is given in [3]. In [4] (see also [5]) the extended compact Genetic Algorithm (ecGA) has been proposed. The ecGA is based on the idea that the choice of a good probability distribution is equivalent to linkage learning. The measure of a good distribution is based on Minimum Description Length (MDL) models: simpler distributions are better than complex ones. The probability distribution used in the ecGA belongs to a class of probability models known as Marginal Product Models (MPMs). A theoretical analysis of the ecGA behavior is presented in [6]. A hybrid version of the ecGA integrating the Nelder-Mead algorithm is proposed in [7]. A study on the scalability of the ecGA is given in [8]. The cGA and its variants have been intensively used in hardware implementations, see [9–11]. A cGA application to neural network training is given in [12].

Paper [13] analyzes analogies and differences between cGAs and the (1 + 1)-ES and extends a mathematical model of the ES [14] to the cGA, obtaining useful information on its performance. Moreover, [13] introduces the concept of elitism and proposes two new variants, with strong and weak elitism respectively, that significantly outperform both the original cGA and the (1 + 1)-ES. A real-encoded cGA (rcGA) has been introduced in [15]. Some examples of rcGA applications to control engineering are given in [16] and [17]. A simple real-encoded version of the ecGA has been proposed in [18] and [19].

The main drawback of these algorithms is that, since cEAs simulate the population by means of a statistical representation, they are subject to premature convergence, especially when the dimensionality grows. As experimentally shown in [20], an rcGA suffers from premature convergence already for dimensionality values higher than ten. More specifically, the interdependencies among the decision variables (non-separability of the landscape, epistasis) cause the premature convergence of compact algorithms, since an rcGA generates a candidate solution by independently sampling a parameter over each dimension. In highly multivariate problems, there is a high number of interactions among variables and thus an rcGA is likely to fail at detecting solutions with a high performance. In order to improve the performance of cEAs on complex problems, a compact algorithm employing the Differential Evolution (DE) search logic has been proposed in [20]. This algorithm, namely compact Differential Evolution (cDE), exploits the features of DE schemes to generate new candidate solutions and makes use of the knowledge of DE structures to obtain a high performance despite the limited memory usage. In other words, two algorithmic features make cDE superior to the algorithms belonging to the cGA family for a broad set of problems. First, unlike other compact evolutionary algorithms, in cDE the survivor selection scheme of DE can be straightforwardly encoded. The one-to-one spawning typical of DE is equivalent to a tournament selection of size two. Since compact algorithms require the employment of this kind of selection scheme, the compact encoding of a GA heavily jeopardizes the original algorithmic nature. On the contrary, the compact encoding of a DE-based algorithm does not jeopardize the algorithmic logic of the original population-based DE algorithm, see [20].
Second, due to its nature, cDE uses an implicit randomization of the offspring generation which corrects and improves the DE search logic. As shown in [21], DE is characterized by a limited set of search moves, and the employment of a proper degree of randomization can significantly improve the algorithmic performance.

An improvement of the cDE scheme has been obtained by coupling it, in a memetic fashion, with a local search algorithm, see [22]. The resulting algorithm, namely Memetic compact Differential Evolution (McDE), has been applied to the control of a Cartesian robot manipulator. Although the employment of the cDE structure mitigates the risk of premature convergence, compact algorithms require much less memory and hardware than modern population-based algorithms, see e.g. [23], and are likely to be outperformed by them in terms of final solutions (especially for problems characterized by more than ten dimensions). This fact leads to the obvious conclusion that when memory requirements forbid the employment of population-based algorithms, compact algorithms can be a cheap and efficient solution. On the other hand, when there is no hardware limitation, population-based algorithms are preferable.

This paper proposes a compromise solution between the necessity of having a respectable performance and the memory limitations. In other words, this paper proposes an unexplored algorithmic concept in evolutionary intelligence, i.e. the employment of compact optimization algorithms as units of a distributed system. The proposed algorithm is named Composed compact Differential Evolution (CcDE). This algorithm employs multiple cDE cores arranged by means of a ring topology. In the context of networks, the term ring topology classically denotes a structure of nodes where each node has two neighbours and the entire set of nodes forms a single continuous pathway for signals through each node. In the context of distributed optimization algorithms, the ring topology is widely used as a structure for computational units in order to build up the communication in the search, see e.g. [24] and [25]. In the proposed CcDE, each core communicates with the others by exchanging pieces of information, in an adaptive fashion, about the most appropriate parameter setting during the various stages of the evolution. In other words, a migration of the elite individual, as well as the scale factor inheritance mechanism (see [26]), have been implemented. In addition, a perturbation mechanism on the virtual population attempts to inhibit premature convergence and thus promotes the exploration of unexplored areas of the decision space. The proposed CcDE can be seen as a small population-based algorithm where each individual is a cDE with its elite and virtual population. In terms of memory usage, this kind of algorithm is comparable to light population-based algorithms which work with a small population size.
Thus, CcDE aims to remain useful for applications with memory limitations while employing multiple cDE cores to explore complex decision spaces without a premature diversity loss, consequently displaying a good performance on a broad and diverse set of test problems.

The remainder of this article is organized in the following way. Section 2 describes the algorithmic components characterizing the proposed algorithm. Section 3 shows the numerical results and highlights the performance of the CcDE with respect to compact and population-based algorithms. Section 4 gives the conclusion of this work.

2 Composed compact differential evolution: algorithmic description

In order to clarify the notation used throughout this article we refer to the minimization problem of an objective function f(x), where x is a vector of n design variables in a decision space D. Without loss of generality, let us assume that design variables are normalized so that each search interval is \(\left[-1,1\right]\).

At the beginning of the optimization process, \(N_c\) matrices of size \(n \times 2\) are generated. The parameter \(N_c\) stands for the number of compact units. Each matrix \(PV_m\) (\(m=1,2,\ldots,N_c\)) is called Probability Vector, see [15]. More specifically, \(PV_m\) is an \(n \times 2\) matrix:

$$ PV_m^t=\left[\mu^t, \sigma^t\right] $$
(1)

where μ and σ are, respectively, vectors containing, for each design variable, the mean and standard deviation values of a Gaussian Probability Distribution Function (PDF) truncated within the interval \(\left[-1,1\right]\). The height of the PDF has been normalized in order to keep its area equal to 1. The superscript t indicates the generation (i.e. the number of performed comparisons).

In the initialization phase, for each design variable \(i\), \(\mu^1\left[i\right]=0\) and \(\sigma^1\left[i\right]=\lambda\), where λ is a large positive constant (e.g. λ = 10). This initialization of the σ values is done in order to simulate a uniform distribution. Then, one individual is sampled from each \(PV_m\). These individuals, indicated as \(x_e^m\), are the elite individuals related to the mth probability vector \(PV_m\). In other words, \(N_c\) compact units for cDE, see [20], are generated. For each unit m, a scale factor \(F_m\) is uniformly sampled from the interval \(\left[0,2\right]\).
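As a concrete illustration, the initialization step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `init_unit` and `units` and the use of plain Python lists are ours, and the sampling of the elite from the PV (which uses the inverse-CDF mechanism described next) is omitted.

```python
import random

# Hypothetical sketch of CcDE initialization.
# n: problem dimensionality; N_C: number of compact units; LAMBDA: large sigma
# used to simulate a uniform distribution at the start.
n, N_C, LAMBDA = 10, 4, 10.0

def init_unit(n):
    """One compact unit: probability vector (mu, sigma) and scale factor F."""
    mu = [0.0] * n                 # mu^1[i] = 0 for every design variable
    sigma = [LAMBDA] * n           # sigma^1[i] = lambda
    F = random.uniform(0.0, 2.0)   # scale factor sampled from [0, 2]
    return {"mu": mu, "sigma": sigma, "F": F}

units = [init_unit(n) for _ in range(N_C)]
```

After this step each unit would also sample its elite individual from its own PV.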

For simplicity of notation, let us refer to the generic mth compact unit as \(PV\) and \(x_e\), since the same operations are repeated within each unit. The search mechanism, within each unit and at each generation, is organized in the following way. Three individuals \(x_r\), \(x_s\), and \(x_t\) are pseudo-randomly sampled from the PV. More specifically, the sampling of a design variable \(x_r\left[i\right]\) associated to a generic candidate solution \(x_r\) from the PV consists of the following steps. As mentioned above, each design variable indexed by i is associated with a truncated Gaussian PDF characterized by a mean value \(\mu\left[i\right]\) and a standard deviation \(\sigma\left[i\right]\). The formula of the PDF is:

$$ PDF\left(\mu\left[i\right],\sigma\left[i\right]\right)=\frac{e^{-\frac{\left(x-\mu\left[i\right]\right)^2}{2\sigma\left[i\right]^2}}\sqrt{\frac{2}{\pi}}}{\sigma\left[i\right]\left(\hbox{erf}\left(\frac{\mu\left[i\right]+1}{\sqrt{2}\,\sigma\left[i\right]}\right)-\hbox{erf}\left(\frac{\mu\left[i\right]-1}{\sqrt{2}\,\sigma\left[i\right]}\right)\right)} $$
(2)

where erf is the error function, see [27].

From the PDF, the corresponding Cumulative Distribution Function (CDF) is constructed by means of Chebyshev polynomials according to the procedure described in [28]. It must be observed that the codomain of the CDF is \(\left[0,1\right]\). In order to sample the design variable \(x_r\left[i\right]\) from the PV, a random number rand(0,1) is sampled from a uniform distribution. The inverse of the CDF is then calculated in correspondence of rand(0,1). This latter value is \(x_r\left[i\right]\). A detailed description of the sampling mechanism is given in [20].
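The inverse-CDF sampling can be illustrated numerically. The original work inverts a Chebyshev-polynomial approximation of the CDF [28]; the sketch below instead inverts the exact erf-based CDF of Eq. (2) by bisection, which is mathematically equivalent but is not the authors' procedure, and the function names are ours.

```python
import math
import random

def trunc_gauss_cdf(x, mu, sigma):
    """CDF at x of a Gaussian truncated to [-1, 1] (Eq. 2 integrated)."""
    s = math.sqrt(2) * sigma
    num = math.erf((x - mu) / s) - math.erf((-1 - mu) / s)
    den = math.erf((1 - mu) / s) - math.erf((-1 - mu) / s)
    return num / den

def sample(mu, sigma):
    """Sample x[i] by inverting the CDF at a uniform random number."""
    u = random.random()
    lo, hi = -1.0, 1.0
    for _ in range(60):            # bisection: the CDF is monotonic on [-1, 1]
        mid = 0.5 * (lo + hi)
        if trunc_gauss_cdf(mid, mu, sigma) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

By construction the sampled value always lies inside the normalized search interval [−1, 1].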

A provisional offspring \(x^{\prime}_{off}\) is then generated by mutation, according to a DE logic, as:

$$ x^{\prime}_{off} = x_t + F(x_r - x_s) $$
(3)

where \(F\in\left[0,2\right]\) is a scale factor which controls the length of the exploration vector \(\left(x_r - x_s\right)\) and thus determines how far from the point \(x_t\) the offspring should be generated. This scale factor is the value \(F_m\) (corresponding to the mth unit) mentioned above. The mutation scheme shown in formula (3) is also known as DE/rand/1. It is important to remark that other variants of the mutation rule have been proposed in the literature for standard DE, see [21].
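The DE/rand/1 mutation of formula (3) acts component-wise; a minimal sketch on plain Python lists (the helper name `mutate` is ours):

```python
def mutate(x_t, x_r, x_s, F):
    """Provisional offspring x'_off = x_t + F * (x_r - x_s), element-wise."""
    return [t + F * (r - s) for t, r, s in zip(x_t, x_r, x_s)]
```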

When the provisional offspring has been generated by mutation, the exponential crossover is applied. A design variable of the provisional offspring \(x^{\prime}_{off}\left(j\right)\) is randomly selected and copied into the jth design variable of a copy of the elite solution \(x_e\). This guarantees that parent and offspring have different genotypes. Subsequently, a set of random numbers between 0 and 1 is generated. As long as \(rand\left(0,1\right)\leq Cr\), where the crossover rate Cr is a predetermined parameter, the design variables of the provisional offspring (mutant) are copied into the corresponding positions of the elite copy. The first time that \(rand\left(0,1\right)> Cr\), the copy process is interrupted and all the remaining design variables of the offspring are taken from the parent \(x_e\). For the sake of clarity, the pseudo-code of the exponential crossover is shown in Fig. 1.

Fig. 1
figure 1

Exponential crossover pseudo-code

In other words, this crossover operator generates an offspring which coincides with the elite \(x_e\) except for a section of the chromosome which is copied from the mutant vector \(x^{\prime}_{off}\).
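Assuming the standard DE exponential crossover, the operator can be sketched as follows; the ring-wise index handling and the function name are our assumptions, while the authors' own pseudo-code is given in Fig. 1.

```python
import random

def exponential_crossover(x_e, x_mut, Cr):
    """Copy a contiguous (wrap-around) block of the mutant into a copy of
    the elite: starting from a random index j, genes are copied from the
    mutant while rand(0,1) <= Cr, then the copy process stops."""
    n = len(x_e)
    off = list(x_e)                 # offspring starts as a copy of the elite
    j = random.randrange(n)
    off[j] = x_mut[j]               # one gene always comes from the mutant
    k = (j + 1) % n
    while k != j and random.random() <= Cr:
        off[k] = x_mut[k]
        k = (k + 1) % n
    return off
```

With Cr = 0 the offspring differs from the elite in exactly one gene; with Cr close to 1 almost the whole mutant is inherited.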

When the offspring is generated, its fitness value is computed and compared with that of the elite individual. The comparison allows the definition of winner and loser solutions. The winner solution biases the virtual population by affecting the PV values. The update rule for μ values is given by:

$$ \mu^{t+1}=\mu^t + \frac{1}{N_p} \left(winner-loser\right), $$
(4)

where \(N_p\) is the virtual population size. The update rule for the σ values is given by:

$$ \left(\sigma^{t+1}\right)^2=\left(\sigma^{t}\right)^2+\left(\mu^{t}\right)^2-\left(\mu^{t+1}\right)^2+\frac{1} {N_p}\left(winner^2-loser^2\right). $$
(5)

Details for constructing formulas (4) and (5) are given in [15].

The formulas displayed in Eqs. (4) and (5) rule the convergence of the virtual population. More specifically, the mean value of the PDF representing the population is moved towards the winner solution while the standard deviation tends to progressively narrow around the most promising solution, thus resulting in a σ value tending toward zero. The latter condition is here indicated as convergence, see [20].
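The update rules of Eqs. (4) and (5) for a single design variable can be sketched as follows; the function name is ours, and the guard against a slightly negative variance caused by floating-point round-off is our addition.

```python
import math

def update_pv(mu, sigma, winner, loser, N_p):
    """One-variable PV update: Eq. (4) for mu, Eq. (5) for sigma."""
    mu_new = mu + (winner - loser) / N_p
    var_new = sigma**2 + mu**2 - mu_new**2 + (winner**2 - loser**2) / N_p
    return mu_new, math.sqrt(max(var_new, 0.0))   # guard against round-off
```

Repeated application moves μ towards the winner while σ progressively shrinks, which is exactly the convergence behaviour described above.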

In this paper, extra rules for modifying the PV values are employed. More specifically, with a probability \(M_p\), the PV is perturbed. The mean value μ is perturbed according to the following formula:

$$ \mu^{t+1}=\mu^{t+1} + 2 \tau \cdot rand\left(0,1\right) - \tau $$
(6)

where τ is a weight representing the maximum amplitude of the perturbation. Similar to typical DE schemes, a toroidal mechanism (see [29]) ensures that μ remains bounded within \(\left[-1,1\right]\) (for example, 1 + 0.1 wraps around to −0.9). The perturbation rule for the sigma is given by:

$$ \left(\sigma^{t+1}\right)^2=\left(\sigma^{t+1}\right)^2\, + \,\tau \cdot rand\left(0,1\right). $$
(7)

It is worth commenting on the perturbation mechanism reported in formulas (6) and (7). The proposed compact DE scheme employs a fairly exploitative structure due to its nature and the employment of the exponential crossover. Thus, each compact unit is likely to prematurely converge towards a suboptimal solution. In order to mitigate this effect, the perturbation mechanism inhibits the algorithmic convergence and forces the algorithm to search elsewhere in the decision space, possibly detecting new promising solutions. It can be observed that the perturbation is equivalent to the replacement of part of the population in a population-based DE.
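A sketch of the perturbation of Eqs. (6) and (7) for one (μ, σ) pair, including the toroidal wrap of μ into [−1, 1]; the function name `perturb_pv` is hypothetical.

```python
import random

def perturb_pv(mu, sigma, tau):
    """Perturb one (mu, sigma) pair as in Eqs. (6) and (7)."""
    mu_p = mu + 2 * tau * random.random() - tau   # uniform in [mu - tau, mu + tau]
    mu_p = ((mu_p + 1.0) % 2.0) - 1.0             # toroidal wrap into [-1, 1]
    var_p = sigma**2 + tau * random.random()      # widen the distribution
    return mu_p, var_p ** 0.5
```

Note that σ can only grow under Eq. (7), which re-opens the search after the virtual population has narrowed.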

While the \(N_c\) compact units evolve independently, two mechanisms promote the communication amongst the units. In order to understand both these mechanisms, let us consider the compact units to be arranged according to a ring topology. In other words, each mth unit has two neighbour units: the (m − 1)th and the (m + 1)th units. For the sake of clarity, we remark that the neighbours of the \(N_c\)th unit are the \(\left(N_c-1\right)\)th and 1st units, respectively. The first mechanism is the unidirectional migration (see [24]) of the elite individual. More specifically, at each step (comparison between offspring and elite) of each compact unit, with a probability \(M_e\), the elite solution \(x_e^m\) is duplicated and replaces the solution \(x_e^{m+1}\) if the sender outperforms the receiver. In other words, \(x_e^m\) overwrites \(x_e^{m+1}\) if \(f\left(x_e^m\right)<f\left(x_e^{m+1}\right)\) (minimization problem).

The second mechanism is the scale factor inheritance proposed in [26], which occurs contextually with the migration. When the migration occurs, the \(\left(m+1\right)\)th unit inherits the scale factor \(F_m\) after a perturbation. More specifically, the scale factor \(F_{m+1}\) related to the \(\left(m+1\right)\)th unit is updated according to the following formula:

$$ F^{m+1}=F^{m}+\alpha {\cal N}\left(0,1\right) $$
(8)

where \({\cal N}\left(0,1\right)\) is a pseudo-random value sampled from a normal distribution characterized by a zero mean and variance equal to 1. The constant value α has the role of controlling the range of perturbation values \(\alpha {\cal N}\left(0,1\right)\). It must be observed that we did not impose any bounds for the variation of F. On the contrary, we decided to allow an unbounded variation of the control parameter and rely on the self-adaptation mechanism.
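The elite migration and scale factor inheritance (Eq. 8) at the mth unit can be sketched as follows; the dictionary-based unit representation and the function name are our assumptions, not the authors' data structure (their pseudo-code is in Fig. 2).

```python
import random

def migrate(units, fitness, m, alpha):
    """Unidirectional elite migration from unit m to its ring neighbour m+1
    (minimization), with scale factor inheritance perturbed as in Eq. 8."""
    nxt = (m + 1) % len(units)                         # ring topology
    if fitness(units[m]["elite"]) < fitness(units[nxt]["elite"]):
        units[nxt]["elite"] = list(units[m]["elite"])  # overwrite receiver's elite
        # inherited scale factor, perturbed by alpha * N(0, 1); unbounded on purpose
        units[nxt]["F"] = units[m]["F"] + alpha * random.gauss(0.0, 1.0)
    return units
```

In the full algorithm this step would fire with probability \(M_e\) at each comparison.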

For the sake of clarity the pseudo-code of the elite migration and scale factor inheritance at the generic mth unit is given in Fig. 2.

Fig. 2
figure 2

Pseudo-code of elite migration and scale factor inheritance at the mth unit

These steps are repeated until a budget condition is satisfied.

A graphical representation of the proposed CcDE is given in Fig. 3. Each compact unit is schematically represented as a truncated Gaussian distribution function. The arrows indicate the elite migration and scale factor inheritance mechanisms. For the sake of clarity, the pseudo-code summarizing the working principles of CcDE is shown in Fig. 4.

Fig. 3
figure 3

Graphical representation of CcDE

Fig. 4
figure 4

CcDE pseudo-code

2.1 Algorithmic philosophy

This section summarizes the considerations regarding the functioning of the CcDE. As highlighted above, CcDE is composed of multiple compact units (cDEs) interconnected by means of the mechanisms of elite migration and scale factor inheritance. In order to understand the working principle of the proposed algorithm, it is necessary to make some considerations about each cDE unit. As highlighted in [30], the success of standard population-based DE is due to an implicit self-adaptation contained within the algorithmic structure. Since, for each candidate solution, the search rule depends on other solutions belonging to the population (e.g. \(x_t\), \(x_r\), and \(x_s\)), the capability of detecting new promising offspring solutions depends on the current distribution of the solutions within the decision space. During the early stages of the optimization process, the solutions tend to be spread out within the decision space. For a given scale factor value, this implies that the mutation explores the space by means of a large step size (if \(x_r\) and \(x_s\) are distant solutions, \(F^m\left(x_r-x_s\right)\) is a vector characterized by a large modulus). During the optimization process, the solutions of the population tend to concentrate on specific parts of the decision space. Therefore, the step size in the mutation is progressively reduced and the search is performed in the neighbourhood of the solutions. In other words, due to its structure, a DE scheme is highly explorative at the beginning of the evolution and subsequently becomes more exploitative during fine tuning.

However, as highlighted in [21], the DE mechanism hides an important limitation. If, for some reason, the algorithm does not succeed in generating offspring solutions which outperform the corresponding parent, the search is repeated again with similar step size values and will likely fail by falling into an undesired stagnation condition (see [31]). In other words, DE offers a wide margin of improvement. For this reason, several modified DE schemes have been proposed in order to enhance the performance of DE on various problems, e.g. [23, 32], and [33]. In almost all these improved versions of DE, a certain degree of randomization is introduced within the search logic, see [21]. Although cDE schemes have been introduced in order to deal with hardware characterized by a limited memory, due to the presence of a statistical model of the population a certain degree of randomization is implicitly introduced every time the solutions (e.g. \(x_t\), \(x_r\), and \(x_s\)) are sampled from the truncated Gaussian distributions. This aspect of the algorithm has, apparently, a positive effect on the algorithmic performance since, as shown in [20], despite its limited memory employment, cDE is likely to outperform its population-based version in various problems of relatively low dimensionality. On the other hand, due to its inner structure, cDE has more exploitative properties with respect to its corresponding population-based version. In other words, since compact algorithms do not handle an actual population of solutions but only its statistical representation, which tends to converge around the most promising solutions, see [20], each compact unit of CcDE can be seen as a local search algorithm which attempts to improve its elite solution. More specifically, the working principle of each compact unit can be seen as a local search which is periodically “disturbed” (by the perturbation mechanism).
The moving operators of mutation and crossover are supposed to detect promising search directions and quickly exploit them. This fact corresponds to the convergence of the virtual population towards the elite. This convergence is likely to be premature. The perturbation mechanism then inhibits the algorithmic convergence and forces the algorithm to search elsewhere in the decision space, possibly detecting new promising solutions. In conclusion, each compact unit of the CcDE can be seen as a multi-start local search algorithm which performs a highly exploitative mini-search between each pair of PV perturbations. However, this mini-search occurs while the memory of the previously achieved enhancements is kept.

The multiple local searches are then coordinated by the entire ring structure. Similar to distributed systems, see e.g. [24] and [25], the migration of the elite has the role of propagating the most promising genotypes throughout the entire ring. The migrated promising solutions can then suggest search directions and modify the convergence process of the virtual populations. In addition, the injection of the new elite into a compact unit whose virtual population is centred on a different area of the search space promotes the search for unexplored genotypes, since the solutions generating the provisional offspring are likely to be different from the elite solution. This mechanism is thus supposed to improve upon the elite and support the search for the global optimum.

The scale factor inheritance mechanism has a slightly different role since it may be seen as the propagation of the search rule instead of the promising genotype. As highlighted in [26], the employment of a unique and constant scale factor value can be improper since the exploratory moves depend on the distribution of the solutions within the decision space. For example, for a highly multi-modal problem, a scale factor F ≈ 1 can generate very long moving vectors \(F\left(x_r-x_s\right)\) if the population is spread out within the decision space and very short moving vectors if the population is concentrated in some areas of the decision space. This fact may lead to an excessively explorative behaviour during some stages of the evolution and an excessively exploitative behaviour during other stages. These two behaviours can cause, respectively, stagnation due to an incapability to detect a promising search direction, and premature convergence due to the excessive exploitation of a suboptimal basin of attraction. In other words, a proper choice of the scale factor depends not only on the optimization problem but also on the stage of the evolution. In this sense, the scale factor inheritance relies on the fact that the most promising values are propagated throughout the ring topology. However, the perturbation applied to the scale factor during propagation guarantees a certain degree of diversity in the search logic over the compact units, and thus ensures that exploration continues despite the detection of strong genotypes.

Thus, CcDE can be seen as an algorithm composed of multiple local search units coordinated by means of a self-adaptation which promotes highly performing genotypes and efficient search rules. The two self-adaptation mechanisms supervise the multiple local searches by recombining the available pieces of information in order to solve global optimization problems. In other words, the local knowledge of the fitness landscape is gained by means of each compact unit, while the global knowledge is pursued by the ring structure with elite migration and scale factor inheritance.

Finally, it is worth commenting on the memory requirements of CcDE. As explained in [20], cDE requires memory equivalent to four solutions, since one memory slot is taken by the elite, two by the PV, and one is used for offspring generation. In CcDE, each compact unit requires three memory slots: one for the elite and two for the PV. In addition, one memory slot, common to all the compact units, is used for offspring generation. The global memory employment of CcDE is then given by \(3\cdot N_c+1\) slots. As will be shown in Sect. 3, a high value of \(N_c\) is not necessary to solve complex problems. For this reason, the memory employment of CcDE can be considered comparable to that of a simple population-based algorithm which requires neither a large population size nor extra memory structures such as an archive or a learning-period database.
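The memory bookkeeping above amounts to simple arithmetic (the helper name is ours):

```python
def ccde_memory_slots(N_c):
    """Memory footprint of CcDE in solution-sized slots: 3 per compact unit
    (one elite + the two PV columns) plus 1 shared offspring slot."""
    return 3 * N_c + 1
```

For a single unit (N_c = 1) this gives 4 slots, consistent with the cDE figure from [20]; the setting N_c = 10 used in Sect. 3 gives 31 slots.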

3 Numerical results

Table 1 lists the test problems that have been considered in this study. All the algorithms in this paper have been run on these test problems. Test problems \(f_{19}-f_{26}\) are characterized by a unique dimensionality value (indicated in the list). Thus, 26 test problems in total have been used in this study. For each algorithm, 30 independent runs have been performed. The budget of each single run has been fixed equal to \(5,000 \cdot n\) fitness evaluations. The proposed CcDE has been compared with modern compact and population-based algorithms. For all the experiments reported in this paper, the CcDE has been run with \(N_c=10\), \(N_p=20\), \(Cr=0.1\), \(M_p=0.0001\), \(\tau=0.1\), \(\alpha=0.1\), and \(M_e=0.0001\). It must be remarked that for compact algorithms the so-called “generation” is actually a single comparison/fitness evaluation. Thus, the probabilities indicated above refer to each fitness evaluation. This explains why, although the numbers appear to be low, the events occur fairly often during each run. For example, in the case of \(n=10\), 50,000 fitness evaluations are performed, so during a single run the perturbation occurs on average 50 times over the entire ring structure, which is apparently enough to detect solutions with a high performance.

Table 1 Test problems

3.1 Comparison with modern compact algorithms

The following compact algorithms have been considered for comparison against CcDE.

  • compact Genetic Algorithm (cGA) with persistent elitism proposed in [13].

  • real compact Genetic Algorithm (rcGA) with persistent elitism proposed in [15].

  • (1 + 1) Covariance Matrix Adaptation Evolution Strategy ((1 + 1)-CMA-ES) with Cholesky update proposed in [37]. All the parameters have been set as proposed in [37] and the initial σ has been set equal to 1.

  • compact Differential Evolution (cDE) proposed in [20] with persistent elitism, DE/rand/1 mutation and binomial crossover. The cDE has been run with \(F\,=\,0.5\) and \(Cr\,=\,0.7\).

  • Memetic compact Differential Evolution (McDE) proposed in [22] with persistent elitism, DE/rand/1 mutation and binomial crossover. The McDE has been run with \(F\,=\,0.7,\,Cr\,=\,0.7\), probability of local search activation \(p_{ls}\,=\,0.005\), reduction factor \(\beta\,=\,0.8\), and initial hyper-cube dimension δ equal to 10% of the search space.

The virtual population size \(N_p\) for cGA, rcGA, cDE, and McDE has been set equal to 300, as suggested in [20] and [22].

Table 2 shows the average of the final results detected by each algorithm ± the corresponding standard deviation values calculated over the performed 30 runs. The best results are highlighted in bold face. In order to strengthen the statistical significance of the results, the Wilcoxon Rank-Sum test has also been applied according to the description given in [38], where the confidence level has been fixed to 0.95. Table 3 shows the results of the Wilcoxon test for each version of CcDE against the other algorithms considered in this study. A “+” indicates the case in which CcDE statistically outperforms, for the corresponding test problem, the algorithm indicated in the top of the column; a “=” indicates that no significant difference between the performances can be detected with the Wilcoxon test; a “−” indicates that CcDE is outperformed.

Table 2 Average final fitness values ± standard deviations for compact algorithms
Table 3 Wilcoxon Rank-Sum test for compact algorithms

Numerical results show that the proposed CcDE outperforms the other algorithms considered in this study. While the comparison with cGA, rcGA, cDE, and McDE displays a clear superiority of CcDE for the considered problems, the last column in Table 3 shows that CcDE outperforms, on average, \(\left(1+1\right)\)-CMA-ES (7 lost and 13 won comparisons). Numerical results thus show that the proposed compact structure appears promising when compared to other algorithms employing a similar logic. With reference to the other compact algorithms considered in this study, CcDE, instead of performing the search along one direction, explores the decision space from multiple search directions. In most cases, the redundant action of the various compact units appears to guarantee that the search towards the optimum is not stopped by premature convergence but, on the contrary, is continued and new promising genotypes are detected. This effect can be clearly seen for non-separable functions, for example rotated functions such as \(f_9\).

Figure 5 shows average performance trends of the five considered compact algorithms over a selection of the test problems considered in this study.

Fig. 5
figure 5

Performance trends of CcDE and compact algorithms

3.2 Comparison with modern population-based algorithms

The following population-based algorithms have been considered for comparison against CcDE.

  • Estimation of Distribution Algorithm with MultiVariate Gaussian model (\(\hbox{EDA}_{mvg}\)) proposed in [39]. \(\hbox{EDA}_{mvg}\) has been run with learning rate \(\alpha\,=\,0.2\), population size \(N_p\,=\,50\), selection ratio \(\tau\,=\,0.3\), and maximum amplification value \(Q\,=\,1.5\).

  • Real Coded Memetic Algorithm (RCMA) proposed in [40]. RCMA has been run with population size \(N_p=50\), crossover parameter \(\alpha=0.5\), mutation probability 0.125, maximum generation number \(T=10{,}000\), \(\beta=5\), maximum number of individuals taking part in the negative assortative mating \(N_{nam}=3\), and roulette wheel selection. The other fixed parameters have been set as suggested in [40].

  • Differential Evolution with adaptive hill-climb Simplex Crossover (DEahcSPX) proposed in [41]. DEahcSPX has been run with \(N_p\,=\,50\), scale factor \(F\,=\,0.9\) and crossover rate \(Cr\,=\,0.9\) as suggested in the paper, the number of points involved in the hill-climb \(n_p\,=\,3\), factor \(\epsilon\,=\,1\).

Numerical results in terms of average final values and the Wilcoxon test are shown in Tables 4 and 5, respectively. They show that the proposed CcDE is very promising and competitive with modern and complex population-based algorithms. More specifically, CcDE clearly outperforms, for the selected problems, \(\hbox{EDA}_{mvg}\) and DEahcSPX. The RCMA is competitive with CcDE, especially for low dimensional problems and for uni-modal and separable functions, but is on average outperformed by the proposed algorithm. According to our interpretation, the RCMA succeeds in outperforming the proposed CcDE in some cases thanks to its employment of local search. It can be clearly seen that when the function is uni-modal (as, for example, f1), employing the local search during the early generations allows a quick solution of the problem. This consideration can be extended to those fitness landscapes which, albeit multi-modal, are characterized by a strong globally optimal basin of attraction, for example f14. A further improvement to our proposed scheme would be the integration of local search units within the ring topology. Such local search units would enhance robustness and yield high performance also in the case of uni-modal and separable functions. Figure 6 shows average performance trends of the four considered algorithms over a selection of the test problems considered in this study.

Table 4 Average final fitness values ± standard deviations for population-based algorithms
Table 5 Wilcoxon Rank-Sum test for population-based algorithms
Fig. 6 Performance trends of CcDE and population-based algorithms

3.3 Algorithmic functioning

In order to better explain the working principle of the proposed CcDE, the following experiment has been performed. For a selected function, f18, CcDE has been run with \(N_c\,=\,3\) and all the other parameters as reported above. Under these conditions, the evolution of the three corresponding elite solutions has been studied. Figure 7 reports the result of our study. As explained above, the decision space is normalized into the domain \([0,\,1]^2\). The three elite solutions start from the origin (0, 0) and evolve along the trajectories marked with a black solid line, a black dashed line, and a grey dashed line. Eventually, the three evolutions end up at the point (0.3191, 0.5288). It can be observed that the three elite solutions follow different pathways while exploring the decision space and that these pathways, due to the effect of migration, cross at some points. The three compact units eventually tend to focus their search on the same promising subregion of the decision space and detect, at the end of the search, the same promising solution.
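The elite migration responsible for the crossing pathways described above can be illustrated schematically. The following sketch is an assumption-laden simplification, not the authors' exact rules: the one-way ring neighbourhood, the greedy acceptance test, and the perturbed scale factor inheritance are plausible choices consistent with the description in the text.

```python
import random

def ring_migration(units, fitness, rng=random):
    """Illustrative elite migration over a ring of compact units.

    Each unit is a dict with keys 'elite' (candidate solution) and
    'F' (scale factor). If the next neighbour's elite is better, it
    replaces the local elite, and the neighbour's scale factor is
    inherited with a small perturbation (a simplified form of the
    scale factor inheritance mentioned in the text)."""
    n = len(units)
    for i, unit in enumerate(units):
        neighbour = units[(i + 1) % n]  # next unit on the ring
        if fitness(neighbour['elite']) < fitness(unit['elite']):
            unit['elite'] = list(neighbour['elite'])
            # inherit the neighbour's scale factor, slightly perturbed
            unit['F'] = max(0.1, min(1.0,
                        neighbour['F'] + rng.uniform(-0.05, 0.05)))
    return units
```

Repeated migration rounds of this kind make the units progressively concentrate on the same promising subregion, which is the behaviour visible in the three converging trajectories of the experiment.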

Fig. 7 Example of algorithmic functioning for a CcDE with 3 compact units. The three lines, black solid, black dashed, and grey dashed, represent the evolution of the elites within each compact unit. The three evolutions start from (0, 0), explore different areas, and eventually reach the same solution at (0.3191, 0.5288)

4 Conclusion

This paper has proposed a novel algorithmic structure for solving global optimization problems. The proposed CcDE is composed of a set of compact units. Each unit is a compact optimizer based on DE logic. The various units have the role of locally exploring the decision space from different perspectives. The multiple local searches are coordinated and supervised by a self-adaptive logic which promotes the migration, and thus the propagation, of the most promising genotypes, as well as the propagation of the search rules displaying the highest performance. The latter effect is carried out by adapting the scale factor inheritance mechanism for distributed systems to the compact structure. In this sense, the proposed CcDE can be seen as a parallel algorithm where virtual populations play the role of actual populations and perform the search simultaneously. On the other hand, CcDE can also be seen as a population-based algorithm where each element is a search strategy. The search strategies interact and evolve towards the detection, for each specific phase of the evolution, of a suitable search logic and finally towards the detection of the global optimum. The proposed algorithm, despite its simplicity, displays good performance on a set of various test problems compared to compact algorithms and modern population-based meta-heuristics. The success of CcDE is partly due to the intuition that compact optimizers behave more like local search than global search. For this reason, the study of algorithmic structures employing compact units and a supervising rule is, in our opinion, a promising research topic in evolutionary optimization. Future works will investigate structures composed of diverse compact units (e.g. employing different mutation strategies) and more advanced supervising models.