1 Introduction

The Particle Swarm Optimization (PSO) algorithm is a population-based meta-heuristic for binary and real-valued function optimization inspired by the swarm and social behavior of organisms in bird flocks and fish schools [4]. The optimization is performed by a swarm of candidate solutions, called particles, which move around the problem’s search space guided by mathematical rules that define their velocity and position at each time step. Each particle’s velocity vector is influenced by its best known position and by the best known positions of its neighbors. The neighborhood of each particle—and consequently the flow of information through the population—is defined a priori by the population topology.

The interconnection of the swarm is at the core of the algorithm: the particles communicate so that they acquire information about the regions explored by other particles. In fact, it has been claimed that the uniqueness of the PSO algorithm lies in the interactions of the particles [5]. As expected, the population topology deeply affects the balance between exploration and exploitation, as well as the convergence speed and accuracy of the algorithm.

The population can be structured on any possible topology, from sparse to dense (or even fully connected) graphs, with different degrees of connectivity and clustering. The classical and most widely used population structures are lbest (which connects each individual to a local neighbourhood) and gbest (in which each particle is connected to every other individual). These topologies are well studied, and the major conclusions are that gbest is fast but frequently trapped in local optima, while lbest is slower but converges more often to the neighborhood of the global optimum.

Since the first experiments on lbest and gbest structures, researchers have tried to design networks that combine the best traits of each structure [9]. Some studies also try to understand what makes a good structure. For instance, Kennedy and Mendes [5] investigate several types of topologies and recommend the use of a lattice with von Neumann neighbourhood (which results in a connectivity degree between those of lbest and gbest).

Recently, dynamic structures have been introduced in PSO to improve the algorithm's adaptability to different fitness landscapes and to overcome the rigidity of static structures (see, for instance, [7]). Fernandes et al. [1] take a different approach and propose a dynamic and partially connected von Neumann structure with Brownian motion. In this paper, we use the same model, but a strategy for the conservation of function evaluations [8] is introduced in order to make the most of the underlying structure and reduce the number of evaluations required for convergence. A formal description of the dynamic network is given here, opening the way for more sophisticated dynamics.

In the proposed topology, n particles are placed on a 2-dimensional grid of m nodes, where \( m > n \). At every time-step, each particle checks its von Neumann neighborhood and, as in the standard PSO, updates its velocity and position using the information given by its neighbours. However, while the connectivity degree of the von Neumann topology is k = 5, the degree of the proposed topology is variable in the range \( 1 \le k \le 5 \). Furthermore, the structure is dynamic: at each time-step, every particle updates its position on the grid (a different concept from the position of the particle on the fitness landscape) according to a pre-defined rule that selects the destination node. The movement rule, which is implemented locally and without any knowledge of the global state of the system, can be based on stigmergy [2] or Brownian motion.

As stated above, the connectivity degree k of each particle at each time-step is variable and lies in the range \( 1 \le k \le 5 \). Depending on the size of the grid, there may be particles with k = 1. These particles, having no neighbors other than themselves, do not learn from any local neighbourhood at that specific iteration and are therefore expected to continue along their previous trajectories in the fitness landscape. Taking these premises into account, the algorithm proposed in this study does not evaluate the position of a particle when k = 1. Despite the loss of information intrinsic to a conservation-of-evaluations policy, we hypothesize that the strategy is particularly suited to the proposed dynamic topology (in which particles are sometimes isolated from the flow of information) and that the number of function evaluations required for meeting the stop criterion can be significantly reduced. Furthermore, it is the structure of the population and the position of the particles at a specific time-step that determine the application of the conservation rule, not any extra parameter or pre-defined decision rule.

A classical PSO experimental setup is used for the tests, and the results demonstrate that the proposed algorithm consistently improves the convergence speed of the standard von Neumann structure without degrading the quality of the solutions. The experiments also demonstrate that the conservation strategy significantly reduces the number of function evaluations required for convergence without affecting the quality of the final solutions.

The remainder of the paper is organized as follows. Section 2 describes PSO and gives an overview of population structures for PSOs. Section 3 gives a formal description of the proposed structure. Section 4 describes the experiments and discusses the results. Finally, Sect. 5 concludes the paper and outlines future research.

2 Background Review

PSO is described by a simple set of equations that define the velocity and position of each particle. The position of the ith particle is given by \( \vec{X}_{i} = (x_{i,1}, x_{i,2}, \ldots, x_{i,D}) \), where D is the dimension of the search space. The velocity is given by \( \vec{V}_{i} = (v_{i,1}, v_{i,2}, \ldots, v_{i,D}) \). The particles are evaluated with a fitness function \( f(\vec{X}_{i}) \) and then their positions and velocities are updated by:

$$ v_{i,d} (t) = v_{i,d} (t - 1) + c_{1} r_{1} \left( {p_{i,d} - x_{i,d} (t - 1)} \right) + c_{2} r_{2} \left( {p_{g,d} - x_{i,d} (t - 1)} \right) $$
(1)
$$ x_{i,d} (t) = x_{i,d} (t - 1) + v_{i,d} (t) $$
(2)

where \( p_{i} \) is the best solution found so far by particle i and \( p_{g} \) is the best solution found so far by the neighborhood. Parameters \( r_{1} \) and \( r_{2} \) are random numbers uniformly distributed in the range [0, 1], and \( c_{1} \) and \( c_{2} \) are acceleration coefficients that tune the relative influence of each term of the formula. The second term is known as the cognitive part, since it relies on the particle's own experience. The last term is the social part, since it describes the influence of the community on the trajectory of the particle.

In order to prevent particles from stepping out of the limits of the search space, the positions \( x_{i,d} (t) \) are limited by constants that, in general, correspond to the domain of the problem: \( x_{i,d} (t) \in [ - Xmax,Xmax] \). Velocity may also be limited within a range in order to prevent the explosion of the velocity vector: \( v_{i,d} (t) \in [ - Vmax,Vmax] \).

To achieve a better balance between local and global search, Shi and Eberhart [12] added the inertia weight \( \omega \) as a multiplying factor in the first term of Eq. 1. This paper uses PSOs with inertia weight.
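For concreteness, the update defined by Eqs. (1) and (2), together with the inertia weight and the clamping described above, can be sketched as follows. This is a minimal NumPy sketch, not the authors' code; drawing \( r_{1} \) and \( r_{2} \) per dimension is a common implementation choice, and the function name and default values (taken from the setup in Sect. 4) are ours:

```python
import numpy as np

def update_particle(x, v, p_i, p_g, w=0.729, c1=1.494, c2=1.494,
                    x_max=100.0, v_max=100.0):
    """One application of Eqs. (1)-(2) with inertia weight w, for a particle
    with position x, velocity v, personal best p_i and neighbourhood best p_g
    (all NumPy arrays of shape (D,))."""
    r1 = np.random.rand(x.size)
    r2 = np.random.rand(x.size)
    v = w * v + c1 * r1 * (p_i - x) + c2 * r2 * (p_g - x)
    v = np.clip(v, -v_max, v_max)        # keep v_{i,d} in [-Vmax, Vmax]
    x = np.clip(x + v, -x_max, x_max)    # keep x_{i,d} in [-Xmax, Xmax]
    return x, v
```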

The neighbourhood of the particle defines the value of \( p_{g} \) and is a key factor in the performance of PSO. Most PSOs use one of two simple sociometric principles for defining the neighbourhood network. The first connects all the members of the swarm to one another and is called gbest, where g stands for global. The connectivity degree of gbest is k = n, where n is the number of particles. Since every particle is connected to every other and information spreads easily through the network, the gbest topology is known to converge fast but unreliably (it often converges to local optima).

The other standard configuration, called lbest (where l stands for local), creates a neighbourhood that comprises the particle itself and its k nearest neighbors. The most common lbest topology is the ring, in which the particles are arranged in a circular structure (resulting in a connectivity degree k = 3, including the particle itself). The lbest structure converges more slowly than gbest because information spreads more slowly through the network, but for the same reason it is less prone to premature convergence to local optima. Between the ring with k = 3 and gbest with k = n there are several types of structure, each with its advantages on certain types of fitness landscape. Choosing a proper structure depends on the target problem and also on the objectives or tolerance of the optimization process.

Kennedy and Mendes [5] published an exhaustive study on population structures for PSOs. They tested several types of structures, including lbest, gbest and the von Neumann configuration with radius 1 (also known as the L5 neighborhood), as well as populations arranged in randomly generated graphs. The authors conclude that when the configurations are ranked by performance, structures with k = 5 (like the L5) perform better, but when ranked by the number of iterations needed to meet the criteria, configurations with a higher degree of connectivity perform better. These results are consistent with the premise that low connectivity favors robustness, while higher connectivity favors convergence speed (at the expense of reliability). Amongst the large set of graphs tested in [5], the von Neumann configuration with radius 1 performed most consistently, and the authors recommend its use.
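The topologies discussed so far differ only in how each particle's set of informants is built. The following sketch (ours; particles are indexed 0 to n-1 and each particle belongs to its own neighbourhood) illustrates the three standard cases:

```python
def gbest(n):
    # fully connected: every particle informs every other (k = n)
    return [list(range(n)) for _ in range(n)]

def lbest_ring(n):
    # ring: each particle plus its two adjacent indices (k = 3)
    return [[(i - 1) % n, i, (i + 1) % n] for i in range(n)]

def von_neumann(side):
    # side x side toroidal lattice: self plus the four cardinal
    # neighbours (k = 5, the L5 neighbourhood)
    neigh = []
    for i in range(side * side):
        r, c = divmod(i, side)
        neigh.append([i,
                      ((r - 1) % side) * side + c,   # north
                      ((r + 1) % side) * side + c,   # south
                      r * side + (c - 1) % side,     # west
                      r * side + (c + 1) % side])    # east
    return neigh
```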

Alternative topologies that combine characteristics of the standard structures or introduce some kind of dynamics in the connections have also been proposed. Parsopoulos and Vrahatis [9] describe the unified PSO (UPSO), which combines the gbest and lbest configurations. Equation 1 is modified to include a term with \( p_{g} \) and a term with \( p_{i} \), and a parameter balances the weight of each term. The authors argue that the proposed scheme exploits the good properties of gbest and lbest.

Peram et al. [10] proposed the fitness-distance-ratio-based PSO (FDR-PSO), which defines the neighbourhood of a particle as its \( k \) closest particles in the population (measured by Euclidean distance). A selective scheme is also included: the particle selects nearby particles that have also visited a position of higher fitness. The algorithm is compared to a standard PSO, and the authors claim that FDR-PSO performs better on several test functions. However, FDR-PSO is compared only to a gbest configuration, which is known to converge frequently to local optima on the majority of the functions in the test set.

More recently, a comprehensive-learning PSO (CLPSO) was proposed [7]. Its learning strategy abandons the global best information and introduces a complex and dynamic scheme that uses the past best information of all other particles. CLPSO can significantly improve the performance of the original PSO on multimodal problems. Finally, Hsieh et al. [3] use a PSO with varying swarm size and solution-sharing that, as in [7], uses the past best information from every particle.

A different approach is given in [1]. The authors describe a structure based on a grid of m nodes (with \( m > n \)) on which the particles move and interact. In this structure, a particle may, at a given time-step, have no neighbours except itself. Isolated particles continue to follow their previous trajectories, based on their current information, until they find another particle in the neighbourhood. Therefore, we intend to investigate whether the loss of information caused by not evaluating these particles is overcome by the payoff in convergence speed.

Common ways of addressing the computational cost of evaluating solutions in hard real-world problems are function approximation [6], fitness inheritance [11] and conservation of evaluations [8]. Due to the underlying structure of the proposed algorithm, we have tested a conservation policy similar to the GREEN-PSO proposed by Majercik [8]. However, in our algorithm the decision on whether to evaluate is determined by the position of the particle in the grid (isolated particles are not evaluated), while in GREEN-PSO the decision is probabilistic and the likelihood of conserving an evaluation is controlled by a parameter.

The following section gives a formal description of the proposed network and presents the transition rules that define the model for dynamic population structures.

3 Partially Connected Structures

Let us consider a rectangular grid G of size \( q \times s \ge \mu \), where \( \mu \) is the population size of any population-based metaheuristic or model. Each node \( G_{uv} \) of the grid is a tuple \( \left\langle \eta_{uv}, \zeta_{uv} \right\rangle \), where \( \eta_{uv} \in \{ 1, \ldots, \mu \} \cup \{ \bullet \} \) and \( \zeta_{uv} \in (D \times {\mathbb{N}}) \cup \{ \bullet \} \) for some domain D. The value \( \eta_{uv} \) indicates the index of the individual that occupies position \( \left\langle u,v \right\rangle \) in the grid. If \( \eta_{uv} = \bullet \), the corresponding position is empty. However, that same position may still hold information, namely a mark (or clue) \( \zeta_{uv} \). If \( \zeta_{uv} = \bullet \), the position is empty and unmarked. Note that when \( q \times s = \mu \) the topology is a static 2-dimensional lattice, and when, in addition, q = s, the topology is the standard square grid graph.

In the case of a PSO, the marks are placed by particles that occupied that position in the past and they consist of information about those particles, like their fitness \( \zeta_{uv}^{f} \) or position in the fitness landscape, as well as a time stamp \( \zeta_{uv}^{t} \) that indicates the iteration in which the mark was placed. The marks have a lifespan of K iterations, after which they are deleted.
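The tuple \( \left\langle \eta_{uv}, \zeta_{uv} \right\rangle \) maps directly onto a small record type. A minimal sketch of the grid state, with None standing for the empty symbol \( \bullet \) (the names Mark, Node and make_grid are ours):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Mark:
    fitness: float     # zeta^f: fitness of the particle that left the mark
    iteration: int     # zeta^t: time stamp; the mark expires K iterations later

@dataclass
class Node:
    particle: Optional[int] = None   # eta: index of the occupant, or empty
    mark: Optional[Mark] = None      # zeta: mark left by a past occupant

def make_grid(q, s):
    # initially G_uv = (empty, unmarked) for all <u, v>
    return [[Node() for _ in range(s)] for _ in range(q)]
```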

Initially, \( G_{uv} = ( \bullet , \bullet ) \) for all \( \left\langle {u,v} \right\rangle \). Then, the particles are placed randomly on the grid (only one particle per node). Afterwards, all particles are subject to a movement phase (or grid position update), followed by a PSO phase. The process (position update and PSO phase) repeats until a stop criterion is met.

The PSO phase is the standard iteration of a PSO, comprising position and velocity updates. The only difference from a static structure is that here a particle may find empty nodes in its neighbourhood.
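A sketch of how \( p_{g} \) can be gathered in the PSO phase when the von Neumann neighbourhood may contain empty nodes (we assume particle objects exposing a best_fitness attribute, minimisation, and, purely for brevity, toroidal grid boundaries):

```python
def best_neighbour(grid, particles, u, v):
    """Best personal best among the occupied von Neumann cells around
    (u, v), the particle's own cell included. With k = 1 (isolation),
    the particle only sees its own memory."""
    q, s = len(grid), len(grid[0])
    best = None
    for du, dv in [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]:
        node = grid[(u + du) % q][(v + dv) % s]
        if node.particle is not None:
            p = particles[node.particle]
            if best is None or p.best_fitness < best.best_fitness:
                best = p
    return best   # serves as p_g in Eq. (1)
```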

In the position update phase, each individual moves to an adjacent empty node. Adjacency is defined by the Moore neighborhood of radius \( r \), so an individual i at \( \rho_{g}(i) = \left\langle u,v \right\rangle \) can move to an empty node \( \left\langle u',v' \right\rangle \) for which \( L_{\infty}\left( \left\langle u,v \right\rangle, \left\langle u',v' \right\rangle \right) \le r \). If no empty position is available, the individual stays in the same node. Otherwise, it picks a neighboring empty node according to the marks on those nodes; if there are no marks, the destination is chosen randomly amongst the free nodes.

Within this framework, there are two policies for the position update phase: stigmergic, whereby the individual looks for a mark that is similar to itself; and Brownian, whereby the individual selects an empty neighbor regardless of the marks. For the first option, let \( N\left\langle u,v \right\rangle = \left\{ \left\langle u^{(1)}, v^{(1)} \right\rangle, \ldots, \left\langle u^{(w)}, v^{(w)} \right\rangle \right\} \) be the collection of empty neighboring nodes and let i be the individual to move. The individual then attempts to move to the node whose mark is closest to its own corresponding trait (fitness or position in the fitness landscape, for instance), or to an adjacent empty cell picked at random if there are no marks in the neighborhood. Under the alternative Brownian policy, the individual moves to an adjacent empty position picked at random. In either case, the process is repeated for the whole population.
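Both policies can be expressed as one movement routine. In the sketch below, leaving a mark on the vacated node follows the description above, while mark expiry after K iterations is omitted for brevity; the helper names, and the choice of fitness as the matched trait, are ours:

```python
import random

def move_phase(grid, positions, fitnesses, t, r=1, stigmergic=False):
    """One grid-position update per particle. positions[i] = (u, v);
    fitnesses[i] is the trait matched against marks; t is the iteration."""
    q, s = len(grid), len(grid[0])
    for i, (u, v) in enumerate(positions):
        empty = [((u + du) % q, (v + dv) % s)
                 for du in range(-r, r + 1) for dv in range(-r, r + 1)
                 if (du, dv) != (0, 0)
                 and grid[(u + du) % q][(v + dv) % s].particle is None]
        if not empty:
            continue                                  # no free node: stay put
        marked = [c for c in empty if grid[c[0]][c[1]].mark is not None]
        if stigmergic and marked:
            # stigmergic: empty node whose mark is closest to the particle's trait
            dest = min(marked,
                       key=lambda c: abs(grid[c[0]][c[1]].mark.fitness - fitnesses[i]))
        else:
            dest = random.choice(empty)               # Brownian: random empty node
        grid[u][v].particle = None
        grid[u][v].mark = Mark(fitnesses[i], t)       # leave a mark behind
        grid[dest[0]][dest[1]].particle = i
        positions[i] = dest
```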

For this paper, the investigation is restricted to the Brownian structure. The algorithm is referred to in the remainder of the paper as PSO-B, followed by the grid size \( q \times s \). An extension of PSO-B is constructed by introducing a conservation of function evaluations (cfe) strategy: if at a given time-step a particle has no neighbors, the particle is updated but its position is not evaluated. This version of the algorithm is referred to as PSO-Bcfe; a sketch of the rule is given below.
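In code, the difference between PSO-B and PSO-Bcfe reduces to a single test on the particle's connectivity degree. A sketch building on the previous ones (the update and evaluate methods on the particle objects are assumed interfaces, not taken from [1] or [8]):

```python
def degree(grid, u, v):
    # connectivity k at cell (u, v): occupied von Neumann cells, self included
    q, s = len(grid), len(grid[0])
    return sum(grid[(u + du) % q][(v + dv) % s].particle is not None
               for du, dv in [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)])

def pso_phase(grid, particles, positions, f, cfe=False):
    for i, (u, v) in enumerate(positions):
        p_g = best_neighbour(grid, particles, u, v)   # from the earlier sketch
        particles[i].update(p_g)                      # Eqs. (1)-(2)
        if not (cfe and degree(grid, u, v) == 1):     # isolated under cfe: conserve
            particles[i].evaluate(f)                  # spend one evaluation
```

The following section describes the results attained by the PSOs with dynamic structure and Brownian movement, with and without conservation of function evaluations, and compares them to the standard topology.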

4 Experiments and Results

An experimental setup was constructed with eight unimodal and multimodal benchmark functions that are commonly used for investigating the performance of PSO. The functions are described in Table 1. The dimension of the search space is set to D = 30 (except for Schaffer, with D = 2). In order to obtain a square grid graph for the standard von Neumann topology, the population size n is set to 49 (which is within the typical range of PSO swarm sizes). The acceleration coefficients were set to 1.494 and the inertia weight to 0.729, as in [13]. Xmax is defined as usual by the domain's upper limit, and Vmax = Xmax. A total of 50 runs is conducted for each experiment. Asymmetrical initialization is used (the initialization range for each function is given in Table 1).

Table 1 Benchmarks for the experiments

Two experiments were conducted. First, the algorithms were run for a limited number of function evaluations (147,000 for f1 and f5; 49,000 for the remaining functions) and the fitness of the best solution found was averaged over 50 runs. In the second experiment, the algorithms were run for 980,000 evaluations (corresponding to 20,000 iterations of the standard PSO with n = 49) or until reaching a stop criterion. For each function and each algorithm, the number of evaluations required to meet the criterion was recorded and averaged over the 50 runs. A success measure is defined as the number of runs in which an algorithm attains the fitness value established as the stop criterion.

Tables 2 and 3 compare PSO-B with the standard PSO (with von Neumann topology): Table 2 gives the averaged best fitness found by the swarms, while Table 3 gives, for each algorithm and each function, the averaged number of iterations required to meet the criterion and the number of runs in which the criterion was met.

Table 2 Standard PSO with von Neumann topology and Brownian PSO with 10 × 10 grid: averaged best fitness

The best fitness values are similar in both configurations. In fact, the differences are not statistically significant except for function f1, for which PSO-B is significantly better than the standard PSO. (For the statistical tests comparing two algorithms, non-parametric Kolmogorov-Smirnov tests with a 0.05 level of significance were used.) As for convergence speed, PSO-B is faster on every test function, and the differences are significant on f1, f2, f3, f5, f6 and f8. In short, PSO-B and the standard PSO attain similar fitness values, but PSO-B is faster.

The main hypothesis of this paper is that a conservation of evaluations strategy further improves the convergence speed of the dynamic topology. Moreover, we also expect the performance of PSO-Bcfe to be less affected when the size of the grid is increased. Large grids result in high rates of isolated particles, deprived of social information, which reduces the convergence speed of the algorithm. By not evaluating these particles, the computational effort can be significantly reduced, hopefully without degrading the overall performance. In order to investigate these hypotheses, we have compared PSO-B and PSO-Bcfe while varying the size of the grid (Tables 4 and 5).

Table 3 Standard PSO with von Neumann topology and Brownian PSO (10 × 10): iterations required to meet the stop criterion and number of successful runs

Table 4 shows the average fitness values attained by PSO-B and PSO-Bcfe with different grid sizes. Table 5 displays the average number of function evaluations required to meet the stop criteria, as well as the number of successful runs. The performance according to the fitness values is very similar, with no significant differences between the algorithms on any function except f1 (on which PSO-Bcfe is significantly better). When considering the number of function evaluations (i.e., the convergence speed), PSO-Bcfe is significantly better than or statistically equivalent to PSO-B on every function.

Table 4 PSO-B and PSO-Bcfe: averaged best fitness for different grid sizes
Table 5 PSO-B and PSO-Bcfe: function evaluations required to meet the stop criteria and number of successful runs

The results confirm that PSO-Bcfe is able to improve the convergence speed of PSO-B without degrading the accuracy of the solutions. The loss of information that results from conserving evaluations is clearly overcome by the benefits of reducing the computational cost per iteration.

In the case of f1, PSO-Bcfe also significantly improves the quality of the solutions, particularly with larger grids. The proposed scheme seems to be particularly efficient on unimodal landscapes, but further tests are required to confirm this hypothesis and to understand which mechanisms make PSO-Bcfe so efficient at finding more precise solutions for the sphere function.

The differences in convergence speed are particularly noticeable when the grid is larger. While PSO-B's speed tends to decrease as the grid size increases, the behavior of PSO-Bcfe is much more stable, and on some functions it is even faster when the grid is expanded.

Figure 1 depicts these observations graphically. When the grid size grows from 8 × 8 to 20 × 20, PSO-B's convergence speed degrades consistently, except on function f5, where the behavior is more irregular. PSO-Bcfe, on the other hand, is sometimes faster with larger grids; when its convergence speed does decrease with size (on f8, for instance), it scales better than PSO-B.

Fig. 1 PSO-B and PSO-Bcfe: function evaluations required to meet the stop criteria when using grids of different sizes

5 Conclusions

This paper proposes a general scheme for structuring dynamic populations for the Particle Swarm Optimization (PSO) algorithm. The particles are placed on a grid with more nodes than the swarm has particles. The particles move on the grid according to simple rules, and the information network is defined by each particle's position on the grid and its neighborhood (von Neumann vicinity is considered here). If a particle is isolated (i.e., it has no neighbors except itself), it is updated but its position is not evaluated. This strategy may cause some loss of information, but the results show that the payoff in convergence speed outweighs it: convergence speed increases on the entire test set, while the accuracy of the algorithm (i.e., the averaged final fitness) is not degraded by the conservation of evaluations strategy.

The proposed algorithm is tested with a Brownian motion rule and compared to the standard static topology. The conservation of evaluations strategy results in a more stable performance when the grid size is varied: removing the strategy from the proposed dynamic structure results in a drop in convergence speed when the size of the grid increases relative to the swarm size.

The present study is restricted to dynamic structures based on particles with Brownian motion. Future research will be focused on dynamic structures with stigmergic behavior based on the fitness and position of the particles.