
1 Introduction

Since its introduction [9] and subsequent modifications [4, 18], the Particle Swarm Optimization (PSO) algorithm has attracted many researchers by its simplicity of implementation and ease of parallelization [13]. PSO currently has several standard variants [4], multiple parameter settings considered to be optimal [7], and successful specialized approaches [3]. PSO has also been tried with various topologies [8, 17], as well as unification [16] and adaptation schemes.

This paper brings various population-based approaches together and puts them into a generalized swarm-based optimization framework (GPSO). The motivation for such an approach comes from the social sciences, where diversity is seen as a source of synergy [10], and our adaptive approach (GAPSO) seeks the emergence of such behavior within a heterogeneous swarm.

The remainder of this paper is arranged as follows. Section 2 introduces PSO and its adaptive modifications, and discusses the Differential Evolution (DE) algorithm and its hybridization with PSO. In Sect. 3 a general overview of the system’s construction is provided. Section 4 describes the adaptation scheme and further system implementation details. Section 5 is devoted to the presentation of the experimental setup, in particular the benchmark sets and the parametrization of the methods used in the experiments. Experimental results are presented in Sect. 6. The last section concludes the paper.

2 Particle Swarm Optimization: Modification and Hybridization Approaches

This section reviews the optimization algorithms used as basic building blocks within our generalized approach: PSO and DE. The initial paragraphs introduce the basic forms of the PSO and DE algorithms, while the following ones summarize the research on hybridizing those approaches and creating adaptive swarm optimizers. Please bear in mind that in all methods we consider the optimization problem to be a minimization problem.

Particle Swarm Optimization. PSO is an iterative global optimization metaheuristic utilizing the ideas of swarm intelligence [9, 18]. The underlying idea of the PSO algorithm consists in maintaining a swarm of particles moving in the search space. For each particle, a set of neighboring particles is defined, which communicate their positions and function values to that particle. Furthermore, each particle maintains its current position x and velocity v, and remembers its historically best (in terms of solution quality) visited location. In each iteration t, the ith particle updates its position and velocity according to Eqs. (1) and (2).

Position update. The position is updated according to the following equation:

$$\begin{aligned} \mathbf {x}^i_{t+1} = \mathbf {x}^i_t + \mathbf {v}^i_{t+1}. \end{aligned}$$
(1)

Velocity update. In a basic implementation of PSO (as defined in [4, 18]) the velocity \(\mathbf {v}^i_{t}\) of particle i is updated according to the following rule:

$$\begin{aligned} \mathbf {v}^i_{t+1} = \omega \cdot \mathbf {v}^i_t + c_1 \cdot (\mathbf {p}^i_{best} - \mathbf {x}^i_t)+ c_2 \cdot (\mathbf {neighbors}^i_{best} - \mathbf {x}^i_t) \end{aligned}$$
(2)

where \(\omega \) is an inertia coefficient, \(c_1\) is a local attraction factor (cognitive coefficient), \(\mathbf {p}^i_{best}\) represents the best position (in terms of optimization) found so far by particle i, \(c_2\) is a neighborhood attraction factor (social coefficient), \(\mathbf {neighbors}^i_{best}\) represents the best position (in terms of optimization) found so far by the particles belonging to the neighborhood of the ith particle (usually referred to as \(\mathbf {g}_{best}\) or \(\mathbf {l}_{best}\)).
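As an illustration, the update rules of Eqs. (1) and (2) can be sketched in Python as follows. This is a minimal sketch, not the paper's implementation: the coefficient defaults are the popular constriction-derived literature values (the paper's own settings live in its Table 1), and drawing each attraction factor per dimension from \(U(0, c)\) is one common convention.

```python
import random

def pso_step(x, v, p_best, l_best,
             omega=0.7298, c1=1.49618, c2=1.49618, rng=None):
    # One velocity-and-position update for a single particle.
    # x, v, p_best, l_best are equal-length lists of floats.
    rng = rng or random.Random()
    # Eq. (2): inertia + cognitive pull + social pull.
    v_new = [omega * vi
             + rng.uniform(0, c1) * (p - xi)
             + rng.uniform(0, c2) * (l - xi)
             for xi, vi, p, l in zip(x, v, p_best, l_best)]
    # Eq. (1): move the particle by its new velocity.
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new
```

A particle sitting exactly on both attractors with zero velocity stays put, which is the stagnation case the restart mechanism of Sect. 5 guards against.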

Differential Evolution. DE is an iterative global optimization algorithm introduced in [19]. DE’s population moves in the search space of the objective function by testing a new location for each specimen, created by crossing over: (a) a selected solution, and (b) a solution \(\mathbf {y}^{(i)}_t\) created by adding a scaled difference vector between two random specimens to a third solution \(\mathbf {x}^{(i)}\). One of the most successful DE configurations is DE/rand/1/bin, where in each iteration t, each specimen \(\mathbf {x}^{(i)}_t\) in the population is selected and mutated by a difference vector between random specimens \(\mathbf {x}^{(i_1)}_t\) and \(\mathbf {x}^{(i_2)}_t\), scaled by \(F \in \mathbb {R}\):

$$\begin{aligned} \mathbf {y}^{(i)}_t = \mathbf {x}^{(i)}_t + F \times (\mathbf {x}^{(i_2)}_t - \mathbf {x}^{(i_1)}_t) \end{aligned}$$
(3)

Subsequently, \(\mathbf {y}^{(i)}_t\) is crossed over with \(\mathbf {x}^{best}_t\) by binomial recombination:

$$\begin{aligned} \mathbf {u}^{i}_{t} = Bin_p(\mathbf {x}^{best}_t,\mathbf {y}^{(i)}_t) \end{aligned}$$
(4)

Finally, the new location \(\mathbf {u}^{i}_{t}\) replaces the original \(\mathbf {x}^{i}_t\) iff it provides a better solution in terms of the objective function f:

$$\begin{aligned} \mathbf {x}^{i}_{t+1} = \left\{ \begin{array}{rl} \mathbf {u}^{i}_t &{} \text{ if } f(\mathbf {u}^{i}_t)<f(\mathbf {x}^{i}_t) \\ \mathbf {x}^{i}_t &{} \text{ otherwise } \end{array} \right. \end{aligned}$$
(5)
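Putting Eqs. (3)–(5) together, one generation of this DE variant can be sketched as below. This is a minimal illustration under stated assumptions: the crossover partner is the population's best specimen as in Eq. (4), the forced-coordinate rule `j_rand` is the standard binomial-crossover convention, and the `F`, `CR` defaults are common literature values rather than the paper's settings.

```python
import random

def de_step(pop, fitness, f_obj, F=0.5, CR=0.9, rng=None):
    # One generation: mutate each specimen by a scaled difference of two
    # random specimens (Eq. 3), cross over binomially with the best
    # specimen (Eq. 4), and replace greedily (Eq. 5).
    rng = rng or random.Random()
    best = min(range(len(pop)), key=lambda i: fitness[i])
    new_pop, new_fit = [], []
    for i, x in enumerate(pop):
        i1, i2 = rng.sample([j for j in range(len(pop)) if j != i], 2)
        y = [xi + F * (a - b) for xi, a, b in zip(x, pop[i1], pop[i2])]
        j_rand = rng.randrange(len(x))  # guarantee one mutated coordinate
        u = [y[j] if (rng.random() < CR or j == j_rand) else pop[best][j]
             for j in range(len(x))]
        fu = f_obj(u)
        if fu < fitness[i]:   # greedy replacement keeps fitness monotone
            new_pop.append(u); new_fit.append(fu)
        else:
            new_pop.append(x); new_fit.append(fitness[i])
    return new_pop, new_fit
```

Because replacement is greedy, the best fitness in the population never worsens from one generation to the next.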

Adaptive PSO Approaches. While a basic version of the PSO algorithm has many promising features (i.e. good quality of results, ease of implementation and parallelization, known parameter values ensuring theoretical convergence), it still needs to have its parameters tuned in order to balance its exploration vs. exploitation behavior [24]. In order to overcome those limitations a two-stage algorithm has been proposed [24]. That algorithm switches from an exploration stage into an exploitation stage after the first one seems to be “burned out” and stops bringing much improvement to the quality of the proposed solution. Another adaptive approach proposed for PSO [23] identifies four phases of the algorithm: exploration, exploitation, convergence, and jumping out. The algorithm applies fuzzy logic in order to assign the algorithm to one of those four phases and adapts its inertia (\(\omega \)), cognitive (\(c_1\)) and social (\(c_2\)) coefficients accordingly. Finally, a heterogeneous self-adapting PSO has been proposed [14], but its pool of available behaviors has been limited to swarm-based approaches only.

PSO and DE Hybridization. While DE usually outperforms PSO on general benchmark tests, there are some quality functions for which PSO is the better choice, making it worthwhile to create a hybrid approach [1, 20]. Initial approaches to hybridizing PSO and DE consisted of utilizing the DE mutation vector as an alternative way of modifying random particles’ coordinates, instead of applying the standard PSO velocity update [5, 21]. Another approach [22] consists of maintaining both algorithms in parallel and introducing an information sharing scheme between them, with an additional random search procedure. PSO and DE can also be combined in a sequential way [6, 11]. In such an approach, first the standard PSO velocity update is performed, and subsequently various types of DE trials are performed on the particle’s \(p_{best}\) location in order to improve it further.

3 Generalized Particle Swarm Optimization

This article follows the approach set for a social simulation experiment [15], by generalizing the PSO velocity update formula (Eq. (2)) into the following form (with I being an indicator function and \(N_k(i\text {th})\) being the kth neighborhood of the ith particle):

$$\begin{aligned} \begin{aligned} \mathbf {v}^i_{t+1} = \,&\omega \cdot \mathbf {v}^i_t + c_1\cdot (\mathbf {p}^i_{best} - \mathbf {x}^i_t) \\&+ \sum ^{|\mathcal {N}|}_{k=1} \sum \limits _{j=1, j \ne i}^{|particles|} I (j\text {th} \in N_k(i\text {th})) c'_{j, k} \cdot (\mathbf {p}^j_{best} - \mathbf {x}^i_t) \\&+ \sum ^{|\mathcal {N}|}_{k=1} \sum \limits _{j=1, j \ne i}^{|particles|} I (j\text {th} \in N_k(i\text {th})) c''_{j, k} \cdot (\mathbf {x}^j_{t} - \mathbf {x}^i_t) \end{aligned} \end{aligned}$$
(6)

In that way the social component is extended to incorporate data from multiple neighbors and neighborhoods. The other part of the generalization is not imposing an identical neighborhood structure over all particles, but letting each particle decide on the form of its neighborhood. That way we take advantage of the agent-like behavior of swarm algorithms, where each individual makes its own decisions on the basis of simple rules and knowledge exchange (the other particles do not need to know the behavior of a given particle, only its positions and sampled function values).

The proposed approach would be infeasible if one needed to set all \(c'_{j, k}\)’s and \(c''_{j, k}\)’s to individual values. Therefore, we rely on existing particle templates, where either all those coefficients take the same value or most of them are equal to zero. Our approach views \(c_{j, k}'\) and \(c_{j, k}''\) as functions. In most cases the second index of the c coefficients will be omitted, as only a single neighborhood is considered.
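A direct, unoptimized rendering of Eq. (6) may clarify the roles of the indicator and coefficient functions. The argument names below are hypothetical; in an actual particle template most `c_prime`/`c_dprime` calls would return zero or a shared (possibly random) value.

```python
def gpso_velocity(i, x, v, p_best, neighborhoods, c1, c_prime, c_dprime, omega):
    # Direct rendering of Eq. (6). `neighborhoods` is a list of indicator
    # functions I_k(j, i) -> 0/1; `c_prime(j, k)` and `c_dprime(j, k)`
    # return the attraction coefficients toward p_best[j] and x[j].
    d = len(x[i])
    # Inertia and cognitive terms.
    v_new = [omega * v[i][m] + c1 * (p_best[i][m] - x[i][m]) for m in range(d)]
    # Double sum over neighborhoods k and particles j != i.
    for k, indicator in enumerate(neighborhoods):
        for j in range(len(x)):
            if j == i or not indicator(j, i):
                continue
            for m in range(d):
                v_new[m] += c_prime(j, k) * (p_best[j][m] - x[i][m])
                v_new[m] += c_dprime(j, k) * (x[j][m] - x[i][m])
    return v_new
```

With a single complete neighborhood, a constant `c_prime` restricted to the best neighbor, and `c_dprime` identically zero, this collapses back to the standard update of Eq. (2).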

In order to test the proposed generalized approach we have implemented five distinct types of particles, coming from the following algorithms: Standard PSO (SPSO), Fully-Informed PSO (FIPSO), Charged PSO (CPSO), Unified PSO (UPSO), and Differential Evolution (DE). The remainder of this section presents how each approach fits within the proposed GPSO framework.

Standard Particle Swarm Optimization. An SPSO particle acts according to the rules of PSO described in Sect. 2 with a local (ring) neighborhood topology (with \(size \in \mathbb {Z}_{+}\) being its parameter). Therefore, the I function defining the neighborhood takes the following form:

$$\begin{aligned} I_{SPSO} (j\text {th} \in N(i\text {th})) = {\left\{ \begin{array}{ll} 1 &{} |i-j| \le size\\ 1 &{} |i-j| \ge |particles| - size\\ 0 &{} \sim \\ \end{array}\right. } \end{aligned}$$
(7)

The particle changes its direction using the \(l_{best}\) location. Therefore, all values of \(c'_j\)’s and \(c''_j\)’s are equal to 0, except the one corresponding to the particle with the best \(p_{best}\) value in the neighborhood.

$$\begin{aligned} c_j' = {\left\{ \begin{array}{ll} 0 &{} f(\mathbf {p}^j_{best})>f(\mathbf {l}_{best}) \\ X \sim U(0,{c_2}) &{} f(\mathbf {p}^j_{best})=f(\mathbf {l}_{best})\\ \end{array}\right. } \end{aligned}$$
(8)
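The ring indicator of Eq. (7) reduces to a simple index test; a sketch (the framework itself excludes the `j == i` case via the sums in Eq. (6)):

```python
def i_spso(j, i, n_particles, size):
    # Ring-topology indicator of Eq. (7): particle j is a neighbor of
    # particle i when their indices are within `size` steps on the ring,
    # counting wrap-around at the ends of the index range.
    return 1 if (abs(i - j) <= size
                 or abs(i - j) >= n_particles - size) else 0
```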

Fully-Informed Particle Swarm Optimization. A FIPSO particle [12] steers its velocity toward the location designated by all of its neighbors. All the best solutions found so far by the individual particles are considered, with weights \(\mathcal {W}\) corresponding to the relative quality of those solutions. FIPSO particles utilize a complete neighborhood; therefore, the indicator function \(I_{FIPSO}\) is always equal to 1. The FIPSO particle is parametrized with a single value of an attraction coefficient c. The individual \(c'_j\)’s (and \(c_1\)) follow the uniform distribution:

$$\begin{aligned} c'_j \sim U \left[ 0,\dfrac{c \cdot \mathcal {W}(f(\mathbf {p}^{j}_{best}))}{|particles|}\right] \end{aligned}$$
(9)

Charged Particle Swarm Optimization. The CPSO particle has been created for dynamic optimization problems [3] and is inspired by the model of an atom. CPSO recognizes two particle types: neutral and charged. The neutral particles behave like SPSO particles. Charged particles have an additional component in the velocity update equation. The ith charged particle has an additional parameter q controlling its repulsion from other charged particles:

$$\begin{aligned} c''_{j,2} = -\dfrac{q^2}{||\mathbf {x}^i_t-\mathbf {x}^j_t||_2} \end{aligned}$$
(10)

Charged particles repulse each other, so individual sub-swarms are formed (as imposed by the neighborhood), which might explore areas corresponding to different local optima.

Unified Particle Swarm Optimization. The UPSO particle is a fusion of the local SPSO and the global SPSO [16]. The velocity update formula includes both the \(l_{best}\) and \(g_{best}\) solutions. In order to express that unification of the global and local variants of SPSO, the I indicator function takes the following form:

$$\begin{aligned} I_{UPSO} (j\text {th} \in N_k(i\text {th})) = {\left\{ \begin{array}{ll} I_{SPSO} &{} k = 1\\ 1 &{} \mathbf {p}_{best}^{j} \text { is } \mathbf {g}_{best} \wedge k = 2\\ 0 &{} \sim \\ \end{array}\right. } \end{aligned}$$
(11)

Thus, there are two co-existing topologies of the neighborhood, which justifies the choice of the general formula for the GPSO (cf. Eq. (6)).

Differential Evolution within the GPSO Framework. While Differential Evolution (DE) [19] is not considered to be a swarm intelligence algorithm, its behavior might also be fitted within the proposed GPSO framework. The reason is that within DE (unlike other evolutionary approaches) we might track a single individual as it evolves, instead of it being replaced by its offspring.

The DE/best/1/bin and DE/rand/1/bin configurations are somewhat similar to PSO with the \(g_{best}\) and \(l_{best}\) approaches, respectively. The most important differences between DE and PSO behavior are that:

  • a DE individual always moves from the best found position (\(p_{best}\) in PSO terms), while a PSO particle maintains its current position regardless of its quality,

  • a DE individual draws its ’velocity’ (i.e. the difference vector) from a global distribution based on other individuals’ locations, while a PSO particle maintains its own velocity.

Therefore, the movement of DE individual i might be expressed in the following way:

$$\begin{aligned} \mathbf {x}_{test}^{(i, t+1)} = Bin(\omega \mathbf {v} + (\mathbf {p}_{best} - \mathbf {x}_{test}^{(i, t)}), \mathbf {g}_{best}) \end{aligned}$$
(12)

where \(\mathbf {v}\) follows a probability distribution based on random individuals’ locations \(\mathbf {p}_{best}^{rand1}\) and \(\mathbf {p}_{best}^{rand2}\), and Bin is a binomial cross-over operator.

4 Adaptation Scheme

Different particle types perform differently on various functions. Moreover, different phases exist during the optimization process. Some particle types perform better at the beginning, some perform better at the end of the optimization algorithm’s execution. Therefore, the optimal swarm composition within the GPSO framework should be designated in real time. The swarm composition is modified by switching the behaviors of particles. The principle of the adaptation scheme forming the Generalized Self-Adapting Particle Swarm Optimization (GAPSO) is presented below.

The main idea is to promote particle types that are performing better than others. Adaptation is based on a measure of success. The adaptation utilizes a roulette selection approach with probabilities proportional to the success measure.

Let us assume that we have P particle types. Each particle changes its behavior every \(N_a\) iterations. The behavior is chosen according to a list of probabilities (each corresponding to one of the P particle types). Each particle has the same vector of probabilities. At the beginning all probabilities are set to \(\frac{1}{P}\). Every \(N_a\) iterations the probability vector is adapted according to the following scheme.

The average value of successes for each particle type over the last \(N_a\) observations is determined. The value of success \(z_{t}^{s}\) in iteration t for particle s is given by the following equation:

$$\begin{aligned} z_{t}^{s} = max(0,\frac{f(\mathbf {p}^s_{best})-f(\mathbf {x}^s_t)}{f(\mathbf {p}^s_{best})}) \end{aligned}$$
(13)

Let \(swarm_p\) be the set of particles of type p from the whole swarm. The average success \(\hat{z}_p\) of a given \(swarm_p\) is obtained from \(S_p \cdot N_a\) values, where \(S_p\) is the size of \(swarm_p\). See the following equation:

$$\begin{aligned} \hat{z}_T^{p} = \frac{1}{S_p \cdot N_a} \sum _{t=T-N_a+1}^{T} \sum _{s \in swarm_p}^{} z_{t}^{s} \end{aligned}$$
(14)

This procedure produces P success values. Let us label them \(z_1, z_2, \ldots , z_P\), and let Z be their sum: \(Z = \sum _{p=1}^{P} {z_p}\). The required vector of probabilities is then \([\frac{z_1}{Z}, \frac{z_2}{Z}, \ldots , \frac{z_P}{Z}]\). A better average success induces a greater probability of assigning the given behavior to each particle. On top of the described approach an additional special rule is applied: at least one particle of each behavior has to exist. This rule prevents behaviors from being excluded from further consideration, as they might be needed in a later phase of the optimization process.
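The success measure of Eq. (13) and the normalization into the roulette probability vector can be sketched as follows. This is a minimal illustration assuming \(f(\mathbf {p}_{best}) > 0\) (the relative-improvement formula is not well defined otherwise); the per-type averaging over the \(N_a\) window and the one-particle-per-behavior safeguard are omitted, and the uniform fallback for an all-zero success vector is our assumption.

```python
def success(f_pbest, f_x):
    # Eq. (13): relative improvement of the current sample over p_best,
    # clipped at zero (no success when the sample is worse).
    return max(0.0, (f_pbest - f_x) / f_pbest)

def behavior_probabilities(avg_successes):
    # Normalize the P averaged success values (Eq. 14 output) into the
    # roulette-selection probability vector; fall back to uniform when
    # no type recorded any success.
    Z = sum(avg_successes)
    P = len(avg_successes)
    if Z == 0:
        return [1.0 / P] * P
    return [z / Z for z in avg_successes]
```

A type whose particles improve their \(p_{best}\) more often thus receives a proportionally larger share of the swarm at the next adaptation step.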

5 Experiment Setup

The GAPSO algorithm has been implemented in Java. The project consists of individual particle behaviors, an adaptation scheme, a restart mechanism, a hill-climbing local optimization procedure for “polishing” the achieved results, and a port to the test benchmark functions. Tests have been performed on the 24 noiseless 5D and 20D test functions from the BBOB 2017 benchmark.

Table 1. Individual algorithms parameters.
Table 2. Framework parameters.

Parameters. The general GAPSO framework setup has been tuned on a small number of initial experiments, while the parameters of the individual optimization agents have been chosen according to the literature. The parameter values are presented in Tables 1 and 2.

Restarts. In order to fully utilize the algorithms’ potential, within each of the tested methods a particle is restarted if for \(N_{rp}\) iterations at least one of these two conditions persisted: (a) the particle is its own best neighbor, (b) the particle has low velocity (the sum of squares of the velocities in each direction is smaller than 1). Additionally, the whole swarm is restarted (each particle belonging to it is restarted) if the value of the best found solution has not changed for \(N_{rs}\cdot D\) iterations, where D is the dimension of the function being optimized.
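The per-particle restart rule can be sketched as below; the exact bookkeeping (what counts as "persisted" over the \(N_{rp}\) window) is our assumption, as the text does not spell it out. Here we restart when, in each of the last \(N_{rp}\) iterations, at least one of the two conditions held.

```python
def should_restart_particle(best_neighbor_flags, velocities, n_rp):
    # best_neighbor_flags: per-iteration booleans, True when the particle
    # was its own best neighbor. velocities: per-iteration velocity
    # vectors. Restart when, for each of the last n_rp iterations, the
    # particle was its own best neighbor or had squared speed below 1.
    if len(best_neighbor_flags) < n_rp or len(velocities) < n_rp:
        return False  # not enough history yet
    recent = zip(best_neighbor_flags[-n_rp:], velocities[-n_rp:])
    return all(flag or sum(vi * vi for vi in v) < 1.0
               for flag, v in recent)
```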

Local Optimization. Finally (both in GAPSO and the individual approaches), before each swarm restart and after the last iteration of the population-based algorithms, a local hill-climbing algorithm is run for \(1000 \cdot D\) evaluations, initialized with the best found solution.

6 Results

Results of the experiments are presented in the figures generated within the BBOB 2017 test framework, showing ECDF plots of optimization targets achieved on a log scale of objective function evaluations.

The left subplot in Fig. 1 shows the efficiency of the 5 individual algorithms used in GAPSO, tested independently on the 5D functions. It can be observed that DE converges to the optimum faster than each of the PSO approaches. The advantage of DE is even more evident for the 20D functions (right subplot in Fig. 1).

Fig. 1.

Comparison of individual algorithms performance for all functions in 5 and 20 dimensions.

Fig. 2.

Comparison of the best (DE) and the worst (FIPSO) individual algorithms with GAPSO for functions with high conditioning and unimodal in 5D (left) and multi-modal functions with adequate global structure in 20D (right).

Subsequent charts (see Fig. 2) correspond to experiments carried out with selected algorithms on specific function groups. In particular cases, differences in the effectiveness of the algorithms can be observed. The left subplot in Fig. 2 shows the advantage of the DE algorithm in optimizing 5D unimodal functions with high conditioning, while another case, shown in the right subplot of Fig. 2, presents FIPSO as the algorithm performing best on 20D multi-modal functions with adequate global structure. It can be observed that the proposed GAPSO algorithm remains competitive with both “clean” approaches.

Fig. 3.

Average number of particle types in the swarm compared with the ECDF plot of individual algorithms’ performance for the 20D Rosenbrock function.

Fig. 4.

Average number of particle types in the swarm compared with the ECDF plot of individual algorithms’ performance for the 20D Schaffer function.

Figures 3 and 4 present a comparison of the average number of particles of each behavior type with the efficiency of homogeneous swarms, for two selected functions. For Rosenbrock’s function (Fig. 3) the DE swarm is significantly better than the other kinds of swarms, and the GAPSO adaptation method leads to a greater number of DE particles in the swarm. In the case when plain DE performance is worse than all the PSO-based approaches (see Fig. 4), the GAPSO swarm contains a significantly lower number of DE particles. This indicates that the proposed adaptation method controls the swarm composition according to the particular optimization function. It can also be observed that the performance of the various PSO approaches is similar, and there is no noticeable difference between the numbers of particles of the particular kinds.

Fig. 5.

GAPSO performance compared with the best (DE) and the worst (FIPSO) individual algorithms for all functions in 5D and 20D.

The last experiment presents the overall effectiveness of GAPSO on the whole set of 5D and 20D benchmark functions. Figure 5 presents the GAPSO results against the best (DE) and worst (FIPSO) performing algorithms. The results indicate that GAPSO has come out as the more effective approach, even though its adaptation has been performed during the optimization, and not beforehand.

Table 3. Aggregated results for 15 independent runs on the 24 noiseless test functions from the BBOB 2017 benchmark. The number of functions for which a given algorithm yielded the best results (in terms of the average number of function evaluations) is presented in the best columns. Numbers in brackets show how many of those results are statistically significantly better according to the rank-sum test when compared to all other algorithms in the table with \(p = 0.05\). Target reached is the number of trials that reached the final target: \(f_{opt}+10^{-8}\).

Due to space limitations, Table 3 provides only aggregated results. GAPSO obtained the best results (in terms of the number of function evaluations) for 10 (5D) and 8 (20D) functions (out of 24), with 7 of those results being statistically significantly better than the individual approaches. None of the other algorithms were statistically significantly better than GAPSO on any function. These results show that the proposed algorithm not only adapted to reach results as good as the best individual particle types, but also has the ability to outperform them.

Furthermore, GAPSO’s stability over different initial behavior probability vectors was examined. Seven types of vectors were considered: a uniform one (each behavior with the same probability), a randomly generated vector, and 5 vectors (one per behavior) with probability equal to 1 for one behavior and 0 for all others. The standard deviations obtained across all these variants on the benchmark functions were not significantly different from the standard deviations for each variant separately. For all of the above options, after only about 100 generations (10 adaptation procedures) the numbers of particles with particular behaviors were nearly the same. This shows that the proposed method’s ability to reach an equilibrium of (from the algorithm’s perspective) optimal behaviors is independent of the initial state of the behavior probability vector.

7 Conclusions and Future Work

The proposed generalized GPSO view of Particle Swarm Optimization made it possible to introduce various types of predefined behaviors and neighborhood topologies within a single optimization algorithm. Including an adaptation scheme in the GAPSO approach made it possible to improve the overall performance over both DE individuals and PSO particle types on the test set of 24 quality functions. While the proposed approach remains inferior to algorithms such as CMA-ES [2], the adaptation scheme correctly promoted behaviors (particles) performing well on a given type of function. It remains to be seen whether other types of basic behaviors could be successfully brought into the GAPSO framework and compete with state-of-the-art optimization algorithms.

Our future research activities shall concentrate on testing more types of particles and on a detailed analysis of their cooperation, by observing the interactions between different particle behaviors in each generation. It would be especially interesting to evaluate the performance of a quasi-Newton method brought into the GPSO framework, as it could utilize the already gathered samples of the quality (fitness) function. Furthermore, other adaptation and evaluation schemes can be considered and compared with the proposed method.