Keywords

1 Introduction

Compact Optimization [1] is a branch of Computational Intelligence Optimization devoted to the study of optimization algorithms characterized by limited memory requirements. From an algorithmic point of view, compact algorithms belong to the family of the Estimation of Distribution Algorithms (EDAs) [2], i.e. algorithms that instead of evolving a population of solutions (as is typically done in population-based optimization algorithms, such as Evolutionary Algorithms and Swarm Intelligence algorithms), build and update a probabilistic model of the distribution of solutions within the search space. Depending on the specific probabilistic model (Gaussian, binomial, etc.), different EDAs can be implemented. In this regard, the specificity of compact algorithms is that they employ a separate distribution for each variable of the problem, and update it as long as the evolutionary process proceeds. Therefore, differently from population-based algorithms where at least n D-dimensional arrays need to be stored in memory (being n the population size and D the problem dimension), compact algorithms need to store only a much more compact “Probability Vector” (\(\mathbf{PV}\)) that describes the parameters of the probabilistic model. For instance, binary-encoded compact algorithms use as PV a single D-dimensional array \(\mathbf p = [p_1, p_2, \dots , p_D]\). Each \(p_i \in [0,1]\), \(i = 1, 2, \dots , D\), represents the probability that the ith variable has value 1 (i.e., the relative frequency in a corresponding “virtual population” of \(N_p\) individuals, with \(N_p\) a parameter of the algorithm). Similarly, real-valued compact algorithms based on Gaussian distributions use as \(\mathbf{PV}\) two D-dimensional arrays: an array of means \(\varvec{\mu } = [\mu _1, \mu _2, \dots , \mu _n]\) and an array of variances \(\varvec{\sigma }= [\sigma _1, \sigma _2, \dots , \sigma _n]\), describing a (normalized) Gaussian distribution of each variable in the search space.

In the past two decades, the Compact Optimization concept has been declined in a number of compact algorithms, sparkling from the seminal works by Harik et al. [3] and Corno et al. [4], who devised a similar algorithm dubbing it respectively “compact Genetic Algorithm” (cGA) and “Selfish Gene” (SG). The family of compact algorithms was then extended to include improved versions of cGA [5, 6], real-valued cGA (rcGA) [7], compact Differential Evolution (cDE) [8] and many of its variants [9,10,11,12,13,14,15,16,17,18], compact Particle Swarm Optimization (cPSO) [19], compact Bacterial Optimization (cBFO) [20], and, more recently, compact Teaching-Learning Based Optimization (cTLBO) [21, 22], compact Artificial Bee Colony (cABC) [23, 24], and compact Flower Pollination Algorithm (cFPA) [25].

Due to their limited usage of memory, compact algorithms are particularly suited for embedded devices, such as Wireless Sensor Networks motes, wearable devices, embedded controllers for robots and industrial plants, etc. Unsurprisingly, the literature abounds with successful examples of compact algorithms applications based on this kind of devices: for instance, cDE has been applied especially in control applications, such as real-time hardware-in-the-loop optimization of a control system for a permanent-magnet tubular linear synchronous motor [8], or real-time trajectory optimization of robotic arms [18, 26] and Cartesian robots [17]. In [6], cGA was applied to micro-controller design, while cPSO was used for optimizing a power plant controller in [19]. In [27], cABC was used for topology control in Wireless Sensor Networks. The more recent cTLBO was instead applied to train Artificial Neural Networks in [22].

In this paper, we aim to push forward this research area by tackling the two main drawbacks of compact algorithms, i.e. their tendency to premature convergence (as they do not keep an actual population, they do not maintain explicitly diversity), and a poorer performance on non-separable problems (which is due to the fact that they process each variable separately). To overcome these limitations, we study the effect of a special case of restart named Re-Sampled Inheritance (RI) [28, 29], which simply generates a random solution and then recombines it - by using an exponential crossover operator similarly to Differential Evolution - with the best solution detected so far by the compact algorithm. Previous works [29, 30] have shown that the RI mechanism is a simple yet effective way to improve the performance of an optimization algorithm, as it allows to escape from local optima while preserving some of the information from the current best, thus guaranteeing a kind of inheritance and avoiding an excessively disruptive restart (compared to other restart mechanisms). Our hypothesis is then that on the one hand the re-sampling should allow compact algorithms to escape from local optima, on the other hand the inheritance mechanism -since it processes blocks of variables- should enable an overall performance improvement especially (but not only) on non-separable problems. Moreover, as the RI mechanism only needs to sample a random solution and recombine it with the current best, it does not require any additional memory with respect to a compact algorithm, therefore allowing to keep a low memory consumption.

To assess the effect of RI on different compact algorithms, we apply it separately to four (real-valued) compact algorithms taken from the literature, namely cDE (specifically, its ‘light” version [11]), rcGA, cPSO, and cBFO, and perform extensive tests on the CEC 2014 benchmark [31] in 10, 50 and 100 dimensions.

The rest of this paper is organized as follows: Sect. 2 presents the background concepts on Compact Optimization. Section 3 describes the general algorithmic scheme which combines compact algorithms with Re-Sampled Inheritance. The numerical results are then presented and discussed in Sect. 4. Finally, Sect. 5 concludes this work.

2 Background

In the rest of this paper, we focus on real-valued compact algorithms as they have been shown to perform better than their binary-encoded counterparts [7]. In case of real values, the general structure of a compact algorithm is quite straightforward and can be described as follows. First of all, for each i-th variable a Gaussian Probability Distribution Function (PDF) is considered, truncated within the interval \(\left[ -1,1\right] \), with mean \(\mu _i\) and standard deviation \(\sigma _i\) taken from the Probability Vector \(\mathbf{PV}=[\varvec{\mu },\varvec{\sigma }]\). The height of the PDF is normalized in order to keep its area equal to 1.

At the beginning of the optimization process, for each design variable i, \(\mu _i=0\) (unless Re-Sampled Inheritance is used, see next section) and \(\sigma _i=\lambda \), where \(\lambda \) is a large positive constant (e.g. \(\lambda =10\)), such that it simulates a uniform distribution (thus exploring the search space). Subsequently, a starting individual, \(\mathbf {elite}\), is generated by sampling each i-th variable from the corresponding PDF. For more details about the sampling mechanism, see [7].

Then, the iterative process starts. At each step, depending on the specific compact algorithm, a candidate solution \(\mathbf {x}\) is generated by sampling one or more individuals from the current PV. E.g., in rcGA [7], \(\mathbf {x}\) is obtained by generating a single individual and recombining it with \(\mathbf {elite}\) with binomial crossover. In cDE [8], \(\mathbf {x}\) is obtained by generating a mutated individual (for instance sampling three individuals from \(\mathbf{PV}\) and applying the rand/1 DE mutation), and then recombining it with the current \(\mathbf {elite}\) by using either binomial or exponential crossover. Other compact algorithm paradigms, such as cPSO and cBFO, use the same mechanism for sampling new individuals, but apply different algorithmic operators inspired from the corresponding biological metaphor to generate a new candidate solution \(\mathbf {x}\). In all cases, the fitnesses of \(\mathbf {elite}\) and \(\mathbf {x}\) are compared and, according to the chosen elitism scheme (persistent or non-persistent, see [1]), \(\mathbf {elite}\) is replaced by \(\mathbf {x}\). Furthermore, the fitness comparison is used to update the \(\mathbf{PV}\), i.e. it changes its \(\mu \) and \(\sigma \) values by “moving” the Gaussian PDF towards the better solution and “shrinking” the PDF around it. Details for this update mechanism are given in [7]. This iterative process is executed until a certain stop condition is met. The pseudo-code of a general real-valued compact algorithm is given in Algorithm 1.

figure a

3 Compact Optimization Algorithms with Re-Sampled Inheritance

The general scheme of a compact algorithm with Re-Sampled Inheritance is shown in Algorithm 2. The only difference w.r.t. the original compact algorithm shown in Algorithm 1 is the RI component, which enables the restart mechanism with inheritance, and is activated at the end of each execution of the compact algorithm (that is continued for a given % of the total budget).

The Re-Sampled Inheritance (see [28, 29] for more details) first randomly generates a solution \(\mathbf {x}\) from a uniform distribution within the given search space. Then, it recombines \(\mathbf {x}\) with the current best solution \(\mathbf {x_{best}}\) by applying the exponential crossover used in Differential Evolution. More specifically, a random initial index is selected in [0, D), and the corresponding variable is copied from \(\mathbf {x_{best}}\) into \(\mathbf {x}\). Then, as long as a (uniform) random number rand(0, 1) is less than or equal to Cr, the design variables from \(\mathbf {x_{best}}\) are copied into the corresponding positions of \(\mathbf {x}\), starting from the initial index. Cr, the crossover rate, is a parameter affecting the number of variables inherited from \(\mathbf {x_{best}}\), and is set as in [11], i.e. \(Cr=1/\root D\alpha \of {2}\), where \(D\alpha \) is the expected number of variables that are copied from \(\mathbf {x_{best}}\). As soon as \(rand(0,1) > Cr\), the copy process is interrupted. The copy is handled as in a cyclic buffer, i.e. when the D-th variable is reached during the copy process the next to be copied is the first one. When the copy stops, the fitness of \(\mathbf {x}\) is compared with that of \(\mathbf {x_{best}}\). If the newly generated solution \(\mathbf {x}\) outperforms the current best solution, the latter gets updated (i.e. \(\mathbf {x_{best}=\mathbf {x}}\)). The compact algorithm is then restarted after setting its mean value \(\varvec{\mu }\) (that is used in the Probability Vector \(\mathbf {PV}\)) equal to the new restarted point, i.e. \(\varvec{\mu }=\mathbf {x}\). This way, the new initial distribution is centered in a new point which, despite being randomly sampled, still contains some inheritance from the current best solution. At the end of each compact optimization routine, see Algorithm 1, an elite solution is returned and compared for replacement against the current best solution, as shown in Algorithm 2.

figure b

The rationale behind the RI mechanism is then to restart the algorithm from a partially random solution, i.e. a solution that is randomly generated but still inherits part of the variables from the current best. This way the restart is not entirely disruptive, but preserves at least a block of (an expected number of) \(D\alpha \) variables. This partial inheritance allows the algorithm to keep some information from one restart and the next one, but also to escape from local optima. From this point of view, the RI mechanism shares some resemblance with the Iterated Local Search (ILS) methods [32], that try to apply a small perturbation to the best-so-far solution during restart (in fact, as small as possible to not disrupt it much, but as large as needed to allow the local search to converge to a different local optimum). However, ILS has been especially designed for (and applied to) combinatorial optimization, while here we focus on continuous optimization.

4 Numerical Results

In the following we present the numerical results obtained on the CEC 2014 benchmark [31]. This benchmark is composed of 30 functions, with different properties in terms of separability, ill-conditioning, and landscape structure. In particular, it is worth noting that except for \(f_8\) (Shifted Rastrigin’s Function) and \(f_{10}\) (Shifted Schwefel’s Function), all CEC 2014 benchmark functions are non-separable. Therefore this benchmark is particularly suited for testing the performance of optimization algorithms on non-separable problems.

We considered the following four real-valued compact algorithms, with the parametrization proposed in their original papers:

  • cDE “light” [11] with exponential crossover and parameters: \(N_p= 300\), \(F = 0.5\), and \(\alpha _m = 0.25\);

  • rcGA [7], with persistent elitism and parameters: \(N_p = 300\);

  • cPSO [19], with parameters: \(N_p = 300\), \(\phi _1 = 0.2\), \(\phi _2 = 0.07\), \(\phi _3 = 3.74\), \(\gamma _1 = 1\), and \(\gamma _2 = 1\);

  • cBFO [20], with parameters: \(N_p = 300\), \(C_i = 0.1\), and \(N_s = 4\).

As for the corresponding versions with RI (dubbed, respectively as RIcDE, RIrcGA, RIcPSO and RIcBFO), the same parametrization was kept for the compact optimazion process while the RI component was parametrized with \(\alpha =0.05\) (such that only \(5\%\) of the variables are inherited, on average, from the current best). A number of fitness function calls equal to \(25\%\) of the total computational budget was assigned to execute the compact algorithm after each restart. It should be noted that these are the only two parameters of the RI mechanism and they were empirically set after having observed their effect in preliminary experiments.

Furthermore, in order to assess the effect of the RI mechanism w.r.t. a simple random restart without any form of inheritance, we included in our experimental setup also four variants of the same compact algorithms where the restart was applied, with the same period of the RI variants (\(25\%\) of the total computational budget), by simply applying a uniform re-sampling of a new solution \(\mathbf {x}\) within the search space, and restarting the compact algorithm by setting \(\varvec{\mu }=\mathbf {x}\). We dub these compact algorithms with random restart, respectively, as RecDE, RercGA, RecPSO and RecBFO.

Finally, to provide a baseline for all the compact algorithms with/without restart tested in this paper, we evaluated the performance of a simple Random Walk (RW) algorithm where at each step a new solution is generated by applying a uniform re-sampling within the search space. From our numerical results (see Table 5) it can be seen that its performance is -as expected- considerably worse than any of the compact algorithms considered in the experiments, thus highlighting that the “compact” logic is more than a mere random sampling and performs significantly better w.r.t. pure uniform random searches.

To assess the scalability of all the algorithms, we performed experiments in 10, 50 and 100 dimensions. Thus, the total experimental setup consists of 13 algorithms (4 compact algorithms, 4 RI variants, 4 variants with random restart, and RW) and \(30 \times 3 = 90\) optimization problems (i.e. 30 functions each tested in three different dimensionalities). On each benchmark function, each algorithm was executed for 30 independent runs, to collect statistics on the fitness values obtained in each run at the end of the allotted computational budget. Each run was executed for a total budget of \(5000 \times D\) function evaluations, being D the problem dimension.

In the following, for the sake of brevity we will show only a compact representation of the main experimental resultsFootnote 1. For that, we will use the sequentially rejective Holm-Bonferroni procedure [33, 34]. This procedure consists of the following: considering \(N_{TP}\) test problems (in our case, 90) and \(N_A\) optimization algorithms, the performance obtained by each algorithm on each problem is computed. This is measured as average of the best fitness values obtained by the algorithm on that problem over multiple (in our case, 30) independent runs, at the end of the computational budget (in our case, \(5000 \times D\) function evaluations). Then, for each problem a score \(R_i\) is assigned to each algorithm, being \(N_A\) the score of the algorithm displaying the best performance (i.e., assuming minimization, the minimum average of the fitness values) on that problem, \(N_A-1\) the score of the second best, and so on. The algorithm displaying the worst performance scores 1. These scores are then averaged, for each algorithm, over the whole set of \(N_{TP}\) test problems. The algorithms are sorted on the basis of these average scores. Indicating with \(R_0\) the rank of an algorithm taken as reference, and with \(R_j\) for \(j = 1,\dots ,N_A-1\) the rank of the remaining algorithms, the values \(z_j\) are calculated as:

$$\begin{aligned} z_j = \frac{R_j - R_0}{\sqrt{\frac{N_A(N_A+1)}{6N_{TP}}}}. \end{aligned}$$
(1)

By means of the \(z_j\) values, the corresponding cumulative normal distribution values \(p_j\) are derived. These \(p_j\) values are then compared to the corresponding \(\delta /j\) where \(\delta \) is the confidence interval, set to 0.05: if \(p_j > \delta /j\), the null-hypothesis (that the algorithm taken as reference has the same performance as the j-th algorithm) is accepted, otherwise is rejected as well as all the subsequent tests.

Table 1. Holm-Bonferroni procedure (reference: RIcDE, Rank = 2.63e+00)
Table 2. Holm-Bonferroni procedure (reference: RIrcGA, Rank = 2.53e+00)
Table 3. Holm-Bonferroni procedure (reference: RIcPSO, Rank = 1.99e+00)
Table 4. Holm-Bonferroni procedure (reference: RIcBFO, Rank = 1.84e+00)

Let us first consider, for each compact algorithm, how the corresponding algorithms with RI and random restart perform w.r.t. the original compact algorithm without restart. Tables 1, 2, 3 and 4 show, respectively, the results of the Holm-Bonferroni procedure (in this case with \(N_A=3\)) on cDE, rcGA, cPSO and cBFO based algorithms. The tables display the ranks, \(z_j\) values, \(p_j\) values, and corresponding \(\delta /j\) obtained by this procedure. In each case we considered as reference algorithm the corresponding algorithm with RI, whose rank is shown in parenthesis in each table caption. Moreover, we indicate in each table whether the null-hypothesis (that the algorithm taken as reference has the same performance as each other algorithm in the corresponding table row) is accepted or not.

From these Holm-Bonferroni procedures, we can observe that, except for the case of cPSO (where, quite surprisingly, cPSO shows the same performance as the corresponding algorithms with RI and random restart) in all other cases the algorithms with RI score a better rank than their corresponding compact algorithms. It is also interesting to note that, while in the case of cDE RIcDE performs better also than RecDE, on the other compact algorithms it results that the RI variant is statistically equivalent to the variant with random restart (note that the null-hypothesis is accepted in those cases). This equivalence between RI and random restart in the case of rcGA, cPSO and cBFO might be due to parametrization used for RI (number of restarts and number of variables inherited from the current best), as well as the different algorithmic logics used by these algorithms compared to cDE. In general though, these observations demonstrate that the use of restarts, and, especially in the case of cDE, the use of RI is beneficial in terms of optimization performance.

Finally, we provide an overall comparison of all the 12 compact optimization algorithms, in addition to the Random Walk algorithm. The resulting Holm-Bonferroni procedure is reported in Table 5, where RIcDE is considered as reference algorithm (as it shows the highest rank) and \(N_A=13\). In this case, except for RecDE, all the hypotheses are sequentially rejected, meaning that when all the algorithms are considered together, RIcDE is statistically equivalent to RecDE (although it shows a slightly higher rank), but it shows a statistically better performance (on average, on the entire set of tested problems) than all other algorithms under study. As expected, the Random Walk algorithm performs worse than all other papers. Moreover, the rank shows that each compact algorithm with RI (or random restart) performs better (on average) than the corresponding compact algorithm, confirming the fact that the RI component is beneficial to all the compact algorithms considered in our experimentation.

Table 5. Holm-Bonferroni procedure (reference: RIcDE, Rank = 1.09e+01)

5 Conclusions

In this paper we have presented an algorithmic scheme for solving continuous optimization problems on devices characterized by limited memory. The proposed scheme is based on a combination of a compact algorithm with a restart mechanism based on Re-Sampled Inheritance (RI). We tested this scheme on four different compact algorithms presented in the literature (namely: cDE, rcGA, cPSO, and cBFO) and performed numerical experiments on a broad range of benchmark functions in several dimensionalities. Our experiments show that the use of RI consistently enhances the performances of compact algorithms, still keeping a limited usage of memory. In addition to that, we noted that among the tested algorithms the best performance was obtained by cDE with Re-Sampled Inheritance.

In future works, we will further investigate the effect of the parametrization on the proposed compact algorithms with Re-Sampled Inheritance, focusing in particular on the influence of the number of restarts, as well as the number of variables inherited from the best individual at each restart. We will also investigate alternative inheritance mechanisms, for instance based on binomial crossover or exponential crossover on shuffled variables.