1 Introduction

Population-based optimization methods, such as evolutionary algorithms (EAs) and swarm intelligence (SI) algorithms, are widely used to solve optimization problems and have attracted continually increasing attention in recent years. EAs, such as genetic algorithms (GAs) (Reeves and Rowe 2003), evolutionary programming (EP) (Yao et al. 1999), evolution strategies (ES) (Schwefel 1995), and genetic programming (GP) (Koza 1992), are inspired by natural selection. SI algorithms, such as particle swarm optimization (PSO) (Kennedy and Eberhart 1995) and ant colony optimization (ACO) (Dorigo and Gambardella 1997), are inspired by the collective behavior observed in biological systems. SI algorithms are search methods based on the learning processes of a collection of individuals, called candidate solutions. Each individual in an SI algorithm performs a set of operations that may include random search, positive or negative feedback, and interactions with other individuals. These operations are typically very simple, but their combination can produce surprising results.

In recent years, many new SI algorithms, including the shuffled frog leaping algorithm (SFLA) (Eusuff and Lansey 2003), the group search optimizer (GSO) (He et al. 2006), the firefly algorithm (FA) (Yang 2009), the artificial bee colony (ABC) algorithm (Karaboga and Basturk 2007), the gravitational search algorithm (GSA) (Rashedi et al. 2009), and others, have been proposed to solve complex optimization problems. Each SI algorithm is motivated by a different biological process: SFLA by the leaping behavior of frogs, GSO by the food foraging behavior of animals, FA by the mutual attraction of fireflies, ABC by the foraging behavior of bees, and GSA by the law of gravity.

All of these SI algorithms nevertheless have certain features in common, including the sharing of information between individuals to improve fitness. This behavior makes SI algorithms applicable to many different types of optimization problems. SI algorithms have been applied to many optimization problems and have been shown to be effective for unimodal, multimodal, and deceptive optimization, dynamic optimization, constrained optimization, multi-objective optimization, and noisy optimization (Simon 2013). SI algorithms are therefore becoming increasingly popular optimization tools.

The goal of this paper is to show the similarities and differences between PSO and several newer SI algorithms, namely SFLA, GSO, FA, ABC, and GSA, both conceptually and experimentally. Because PSO and the newer SI algorithms share similar biological motivations, it is perhaps not surprising that the algorithms are equivalent under special conditions. In this paper, we provide general descriptions of these algorithms and provide conceptual and numerical comparisons.

There have been many papers that compare various SI algorithms, including comparisons of PSO and SFLA (Elbeltagi et al. 2005); PSO and GSA (Davarynejad et al. 2014); PSO and FA (Yang 2011; Parpinelli and Lopes 2011); PSO, BFO, and ABC (Parpinelli et al. 2012); cuckoo search, PSO, DE, and ABC (Civicioglu and Besdok 2013); and others (Zang et al. 2010). Those papers focus on performance differences on benchmarks or applications. In this paper we provide a more extensive comparison by studying similarities and differences between the algorithms, by studying more recent SI algorithms, and by using a larger and more recent set of benchmarks. The benchmarks in this paper are the continuous functions from the 2013 IEEE Congress on Evolutionary Computation (Liao and Stuetzle 2013) and a set of classical combinatorial knapsack problems (Abulkalamazad et al. 2014; Bhattacharjee and Sarmah 2014; Freville 2004).

There are many other SI algorithms that we could study, including the bat algorithm (BA) (Hasançebi and Carbas 2014), cuckoo search (CS) (Cobos et al. 2014), the fireworks algorithm (FWA) (Tan and Zhu 2010), and teaching-learning-based optimization (TLBO) (Rao et al. 2011). Because of space constraints we restrict our comparison to PSO, SFLA, GSO, FA, ABC, and GSA, and we defer the comparison of other SI algorithms to future work. The SI algorithms in this paper comprise a representative set, not a complete set.

The rest of this paper is organized as follows. Section 2 gives an overview of PSO and the five newer SI algorithms, and analyzes their similarities, differences, and unique characteristics. Section 3 presents performance comparisons of basic and advanced versions of PSO with basic and advanced versions of SFLA, GSO, FA, ABC, and GSA on continuous benchmark functions and combinatorial knapsack problems. Section 4 gives conclusions and directions for future research.

2 Comparisons between swarm intelligence algorithms

This section introduces the basic PSO algorithm and five newer SI algorithms, namely SFLA, GSO, FA, ABC, and GSA, and then conceptually analyzes their similarities (Sect. 2.1). It also compares their biological motivations and algorithmic details (Sect. 2.2).

2.1 Similarities between swarm intelligence algorithms

2.1.1 Particle swarm optimization

PSO, introduced by Kennedy and Eberhart (1995), was motivated by the swarming behavior of birds. PSO consists of a swarm of particles moving through the n-dimensional search space of an optimization problem. Every particle has a position vector \(x_{k}\) that encodes a candidate solution to the optimization problem, and a velocity vector \(v_{k}\) that characterizes its movement through the search space. Each particle remembers its own previously found best position \(P_{best}\), and the global best position \(G_{best}\) of the entire swarm in the current iteration. Information from good solutions spreads throughout the swarm, which encourages the particles to move to better areas of the search space. At each iteration, the particles’ velocities are updated based on their current velocities, their previous best positions, and the global best position of the current iteration:

$$\begin{aligned} v_k(s) \leftarrow{}&w\,v_k(s) + U(0,\phi_1)\big(P_{best}(s)-x_k(s)\big) \nonumber \\&+\, U(0,\phi_2)\big(G_{best}(s)-x_k(s)\big), \end{aligned}$$
(1)

where \(v_{k}(s)\) is the sth dimension of the velocity of the kth particle; w is the inertia weight, which determines the contribution of the current velocity to the new velocity; \(U(a,b)\) is a uniformly distributed random number between a and b; and the cognitive constant \(\phi_1\) and social constant \(\phi_2\) determine the importance of \(P_{best}(s)\) and \(G_{best}(s)\), respectively, in the velocity update. The position of the particle is updated as follows:

$$\begin{aligned} x_k(s) \leftarrow x_k(s)+v_k(s). \end{aligned}$$
(2)

An outline of the basic PSO algorithm is given in Algorithm I.

[Algorithm I: outline of the basic PSO algorithm]
[Algorithm II: outline of the shuffled frog leaping algorithm (SFLA)]
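For concreteness, the following minimal Python sketch implements one iteration of the velocity and position updates of Eqs. (1) and (2). The function name, the array layout, and the default parameter values (\(w=0.7\), \(\phi_1 = \phi_2 = 2.0\)) are illustrative assumptions, not values prescribed by this paper.

```python
import numpy as np

def pso_step(x, v, p_best, g_best, w=0.7, phi1=2.0, phi2=2.0, rng=None):
    """One iteration of the basic PSO update of Eqs. (1)-(2).

    x, v    : (N, n) arrays of particle positions and velocities
    p_best  : (N, n) array of each particle's best position found so far
    g_best  : (n,) array, best position found by the swarm
    """
    rng = rng if rng is not None else np.random.default_rng()
    N, n = x.shape
    # U(0, phi) is drawn independently for every particle and dimension
    u1 = rng.uniform(0.0, phi1, size=(N, n))
    u2 = rng.uniform(0.0, phi2, size=(N, n))
    v = w * v + u1 * (p_best - x) + u2 * (g_best - x)   # Eq. (1)
    x = x + v                                           # Eq. (2)
    return x, v
```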

2.1.2 Shuffled frog leaping algorithm

SFLA was introduced by Eusuff and Lansey (2003), and some variations and applications of SFLA are discussed in Wang and Fang (2011), Rahimi-Vahed and Mirzaei (2008), and Sarkheyli et al. (2015). SFLA is inspired by the foraging behavior of frogs, and consists of a swarm of frogs leaping in the n-dimensional search space of an optimization problem. In SFLA, every frog encodes a candidate solution, and the N frogs are divided into m sub-populations, also called memeplexes. Each sub-population is considered a separate culture, and each sub-population performs a local search algorithm. At the beginning of each generation, the population is shuffled so that each frog is randomly assigned to a new sub-population.

During local search, only the worst frog \(x_w \) in each sub-population is updated:

$$\begin{aligned} x_w(s) \leftarrow x_w(s)+r\cdot \big(x_b(s)-x_w(s)\big), \end{aligned}$$
(3)

where s is the index of the problem dimension, \(x_b \) is the best frog in the sub-population, and \(r\in \left[ {0,\;1} \right] \) is a uniformly distributed random number. If (3) does not improve \(x_w \), it is updated again as follows:

$$\begin{aligned} x_w(s) \leftarrow x_w(s)+r\cdot \big(x_g(s)-x_w(s)\big) \end{aligned}$$
(4)

for \(s\in [1,n]\), where \(x_g\) is the global best frog from all m sub-populations, and \(r\in [0,1]\) is a random number that is calculated separately for each s. If (4) does not improve \(x_w\), then \(x_w\) is replaced by a randomly generated frog. A description of SFLA is given in Algorithm II.
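To make the local search concrete, the following minimal Python sketch implements Eqs. (3) and (4) for a single memeplex, assuming minimization. The function name, the array layout, and the bounds used for the random replacement frog (here the CEC 2013 domain of Table 2) are illustrative assumptions.

```python
import numpy as np

def sfla_local_step(memeplex, fitness, x_global, bounds=(-100, 100), rng=None):
    """One SFLA local search step: update the worst frog via Eqs. (3)-(4).

    memeplex : (m, n) array of frogs in one sub-population
    fitness  : callable returning a cost to minimize
    x_global : best frog over all sub-populations
    """
    rng = rng if rng is not None else np.random.default_rng()
    costs = np.array([fitness(f) for f in memeplex])
    x_b = memeplex[np.argmin(costs)]           # best frog in the memeplex
    w = np.argmax(costs)                       # index of the worst frog
    x_w, n = memeplex[w], memeplex.shape[1]

    cand = x_w + rng.uniform() * (x_b - x_w)   # Eq. (3)
    if fitness(cand) >= costs[w]:
        r = rng.uniform(size=n)                # separate r for each dimension s
        cand = x_w + r * (x_global - x_w)      # Eq. (4)
        if fitness(cand) >= costs[w]:
            cand = rng.uniform(*bounds, n)     # random replacement frog
    memeplex[w] = cand
    return memeplex
```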

Suppose that the PSO logic of Algorithm I only updates the global worst solution \(x_w \) instead of each solution \(x_k \). Further suppose that the velocity \(v_k \) is randomly generated, and the inertia weight is \(w=1\). Algorithm I then becomes the modified PSO of Algorithm III.

Now suppose that in the SFLA logic of Algorithm II, the population is not divided into sub-populations; that is, \(m=1\). Further suppose the best sub-population solution \(x_b \) is set to the previous best solution of the individual, and the worst sub-population solution \(x_w \) is set to the current worst solution of the entire population. Finally, suppose that \(i_{\max } =1\), where \(i_{\max }\) is the number of local search iterations per sub-population. Then the SFLA logic of Algorithm II becomes the modified SFLA logic of Algorithm IV, which is equivalent to the modified PSO of Algorithm III. So SFLA with special tuning parameters can be viewed as a type of PSO algorithm, and it follows that these two algorithms perform identically under these conditions.

[Algorithm III: modified PSO that updates only the global worst solution]
[Algorithm IV: modified SFLA, equivalent to Algorithm III]

2.1.3 Group search optimization

GSO was introduced by He et al. (2006), and some variations and applications of GSO are discussed in Shen et al. (2009), Chen et al. (2012), Wang et al. (2012) and Zare et al. (2012). GSO is inspired by the food foraging behavior of land animals, and consists of a swarm of animals foraging in the n-dimensional search space of an optimization problem. Some animals, called producers, focus their efforts on searching for food. Other animals, called joiners or scroungers, focus their efforts on following other animals and exploiting their food-finding success. A third type of animal, called rangers, performs random walks in search of resources. In GSO, every animal encodes a candidate solution, and the producer is typically the best solution in the population. Each generation, the producer scans three points in its immediate surroundings, looking for a better cost function value than at its current location in the search space. This corresponds to local search. The producer is denoted as \(x_p \), and the three scanned points \(x_z \), \(x_r \), and \(x_l \) are given by:

$$\begin{aligned} x_z(s)&=x_p(s)+r_1\, l_{\max}\, D\big(\phi_p(s)\big) \nonumber \\ x_r(s)&=x_p(s)+r_1\, l_{\max}\, D\big(\phi_p(s)+r_2\,\theta_{\max}/2\big) \\ x_l(s)&=x_p(s)+r_1\, l_{\max}\, D\big(\phi_p(s)-r_2\,\theta_{\max}/2\big) \nonumber \end{aligned}$$
(5)

for \(s\in [1,n]\), where \(r_1\) is a zero-mean, unity-variance, normally distributed random number; \(r_2 \in [0,1]\) is a uniformly distributed random number; \(\phi_p\) is the heading angle of \(x_p\); \(l_{\max}\) is a tuning parameter that defines how far the producer can see; \(\theta_{\max}\) is a tuning parameter that defines how far the producer can turn its head; and \(D(\cdot)\) is a polar-to-Cartesian coordinate transformation function (He et al. 2006).

Scroungers generally move toward the producer. However, they do not move directly toward it; instead, they move in a zig-zag pattern, which allows them to search for lower cost function values as they move. The movement of a scrounger \(x_i \) is modeled as

$$\begin{aligned} x_i(s) \leftarrow x_i(s)+r\cdot \big(x_p(s)-x_i(s)\big), \end{aligned}$$
(6)

where \(r\in \left[ {0,\;1} \right] \) is a uniformly distributed random number.

Rangers randomly travel through the search space looking for areas with low cost function values. The movement of a ranger \(x_i \) is modeled as

$$\begin{aligned} \phi_i(s)&\leftarrow \phi_i(s)+\rho\,\alpha_{\max} \nonumber \\ x_i(s)&\leftarrow x_i(s)+\alpha_{\max}\, l_{\max}\, r_1\, D\big(\phi_i(s)\big), \end{aligned}$$
(7)

where \(\alpha_{\max}\) is a tuning parameter that defines how far a ranger can turn its head; \(\rho \in [-1,1]\) is a uniformly distributed random number; and \(l_{\max}\) is a tuning parameter related to the maximum distance that a ranger can travel in one generation, and is the same as \(l_{\max}\) in (5).

[Algorithm V: outline of the group search optimizer (GSO)]

A description of GSO is given in Algorithm V. Note that Algorithm V specifies that one solution is the producer, about 80% of the solutions are scroungers, and about 20% of the solutions are rangers.
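The sketch below illustrates one simplified GSO generation in Python, assuming minimization: the best solution acts as producer, about 80% of the remaining solutions scrounge via Eq. (6), and the rest perform a ranger-style random walk. The producer's three-point scan of Eq. (5), which requires the polar-to-Cartesian transformation \(D(\cdot)\), is omitted for brevity, and the ranger step uses a Gaussian step in place of \(D(\phi_i)\); these simplifications and all names are illustrative assumptions, not the full algorithm of He et al. (2006).

```python
import numpy as np

def gso_step(pop, fitness, l_max=1.0, a_max=np.pi / 8, rng=None):
    """One simplified GSO generation: scroungers follow Eq. (6); rangers
    take a random walk in the spirit of Eq. (7). Producer scanning omitted."""
    rng = rng if rng is not None else np.random.default_rng()
    N, n = pop.shape
    costs = np.array([fitness(x) for x in pop])
    order = np.argsort(costs)
    x_p = pop[order[0]].copy()                 # producer: the best solution
    n_scrounge = int(round(0.8 * (N - 1)))     # ~80% scroungers, ~20% rangers
    for i in order[1:1 + n_scrounge]:          # scroungers, Eq. (6)
        r = rng.uniform(size=n)
        pop[i] = pop[i] + r * (x_p - pop[i])
    for i in order[1 + n_scrounge:]:           # rangers, cf. Eq. (7)
        r1 = rng.standard_normal(n)            # zero-mean, unit-variance
        pop[i] = pop[i] + a_max * l_max * r1
    return pop
```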

Suppose that in the PSO logic of Algorithm I, the current global best solution \(G_{best}\) is first updated by adding the new velocity \(r_1 \cdot v_{best}\) to its previous position, where \(v_{best}\) is the velocity of \(G_{best}\). The other solutions are then divided into two sub-populations. For the first, comprising about 80% of the solutions, the constant \(\phi_1\) that determines the significance of \(P_{best}(s)\) is set to 0, and the solutions are updated independently of their previous velocities; that is, the inertia weight is \(w=0\). For the second, comprising about 20% of the solutions, both the constant \(\phi_1\), which determines the significance of \(P_{best}(s)\), and the constant \(\phi_2\), which determines the significance of \(G_{best}(s)\), are set to 0. In this case, Algorithm I becomes the modified PSO logic of Algorithm VI.

Now suppose that in the GSO logic of Algorithm V, the three child solutions \(x_z\), \(x_r\), and \(x_l\) all use the search strategy of \(x_z\), directly use the heading angle \(\phi_p\) instead of the coordinate transformation function \(D(\cdot)\), and use the tuning parameter \(l_{\max}=1\) and the parameter \(\rho=0\). The GSO logic of Algorithm V then becomes the modified GSO logic of Algorithm VII, which is equivalent to the modified PSO logic of Algorithm VI, where the heading angle \(\phi\) is treated as the velocity v and the quantity \(\alpha_{\max} r_1\) is treated as the inertia weight w. So GSO can be viewed as a variation of PSO under special conditions, and it follows that these two algorithms perform identically under these conditions.

[Algorithm VI: modified PSO with producer, scrounger, and ranger sub-populations]
[Algorithm VII: modified GSO, equivalent to Algorithm VI]

2.1.4 Firefly algorithm

FA was introduced by Yang (2009), and some variations and applications of FA are discussed in Fister et al. (2013). FA is inspired by the mutual attraction of fireflies, which is based on perceived brightness, and perceived brightness decreases exponentially with distance. A firefly is attracted only to fireflies that are brighter than itself. In FA, every firefly encodes a candidate solution, and the population consists of N fireflies. Each firefly \(x_i\) compares its brightness with that of every other firefly \(x_j\), one at a time. If \(x_j\) is brighter than \(x_i\), then \(x_i\) moves through the search space in a direction that includes both a random component and a component directed toward \(x_j\). The random component is the quantity \(\alpha v_k\), which is usually relatively small due to the small value of \(\alpha\). The directed component is the quantity \(\beta_0 \mathrm{e}^{-\gamma_i r_{ij}^2}(x_j - x_i)\), whose magnitude is an exponential function of the distance \(r_{ij}\) between \(x_j\) and \(x_i\). Typical tuning parameters of FA are as follows:

$$\begin{aligned} \gamma_i&=\frac{\gamma_0}{\max_j \left\| x_i -x_j \right\|_2},\quad \text{where } \gamma_0 =0.8 \nonumber \\ \alpha&=0.01 \\ \beta_0&=1 \nonumber \end{aligned}$$
(8)

A description of FA is given in Algorithm VIII, where l and u are the lower and upper bounds of the search space, respectively. One thing that we notice from Algorithm VIII is that the best candidate solution in the population is never updated. We might be able to improve the algorithm’s performance if we periodically update the best candidate solution.

[Algorithm VIII: outline of the firefly algorithm (FA)]
[Algorithm IX: modified PSO in which the best solution is never updated]
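A minimal Python sketch of the FA update described above follows, assuming minimization (lower cost means brighter). The parameter values follow Eq. (8), and the nested-loop structure mirrors Algorithm VIII, in which the brightest firefly has no brighter neighbor and is therefore never moved. Function and variable names are illustrative assumptions.

```python
import numpy as np

def fa_step(pop, fitness, alpha=0.01, beta0=1.0, gamma0=0.8, rng=None):
    """One generation of the firefly algorithm with Eq. (8) parameters."""
    rng = rng if rng is not None else np.random.default_rng()
    N, n = pop.shape
    cost = np.array([fitness(x) for x in pop])
    for i in range(N):
        # gamma_i is scaled by firefly i's largest distance to any firefly
        gamma_i = gamma0 / np.linalg.norm(pop - pop[i], axis=1).max()
        for j in range(N):
            if cost[j] < cost[i]:              # x_j is brighter than x_i
                r_ij = np.linalg.norm(pop[j] - pop[i])
                step = beta0 * np.exp(-gamma_i * r_ij ** 2) * (pop[j] - pop[i])
                pop[i] = pop[i] + step + alpha * rng.standard_normal(n)
                cost[i] = fitness(pop[i])
    return pop
```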

Suppose that in the PSO logic of Algorithm I, the best candidate solution in the population is never updated, and each particle’s velocity is independent of its previous velocity; that is, the inertia weight \(w=0\). Further suppose that half of the solutions are determined solely by the current global best solution \(G_{best} \), and the other half of the solutions are determined solely by their previous best solution \(P_{best} \). In this case, the PSO logic of Algorithm I becomes the modified PSO logic of Algorithm IX.

Now suppose that in the FA of Algorithm VIII, the lower bound l and the upper bound u of the search space are replaced with the current global best solution \(G_{best}\) and the previous individual best solution \(P_{best}\). Further suppose that the parameter \(\gamma \rightarrow \infty\), which means that fireflies are not attracted to each other, so that each firefly's motion reduces to random flight and random search. In this case, the FA of Algorithm VIII is equivalent to the modified PSO of Algorithm IX. So FA can be viewed as a variation of PSO under special conditions, and it follows that these two algorithms perform identically under these special conditions.

2.1.5 Gravitational search algorithm

GSA was introduced by Rashedi et al. (2009), and some variations and applications of GSA are discussed in Dowlatshahi et al. (2014), Jiang et al. (2014) and Rashedi et al. (2011). GSA is inspired by the law of gravity; each particle attracts every other particle due to gravity. The gravitational force between particles is proportional to the product of their masses, and inversely proportional to the square of the distance between them. In GSA, each particle has four characteristics: position, inertial mass, active gravitational mass, and passive gravitational mass. The position of each particle encodes a candidate solution of the optimization problem, and its gravitational and inertial masses are determined by the fitness function. All particles attract each other due to gravity, which causes a global movement toward the particles with heavier masses (better fitness values). Particles thus cooperate with each other using a direct form of communication through gravity. Heavy particles (that is, more fit particles) move more slowly than lighter ones; in this way, the search is regulated by adjusting the gravitational and inertial masses. A description of GSA is given in Algorithm X, where g is the gravitational constant; M is the normalized fitness; R and F are the distance and force between particles, respectively; a and v are the acceleration and velocity, respectively; r and \(r_i \) are uniformly distributed random numbers; t and \(t_{\max } \) are the generation number and the generation limit, respectively; and \(\varepsilon \) is a small positive constant.

[Algorithm X: outline of the gravitational search algorithm (GSA)]

In the GSA of Algorithm X, the gradual reduction of the gravitational constant g reduces the exploration component of the algorithm as time progresses. The fitness values are normalized so that the worst particle has gravitational mass \(m_i =0\) and the best particle has gravitational mass \(m_i =1\). These masses are normalized to \(\left\{ {M_i } \right\} \) values that sum to 1. For each pair of particles, the attractive force is calculated with a magnitude that is proportional to their masses and inversely proportional to the distances between them. A random combination of the forces results in the acceleration vector of each particle. The acceleration is used to update the velocity and position of each particle. A more compact representation of Algorithm X, in which distance, force, acceleration, and position are combined, is shown in Algorithm XI.

[Algorithm XI: compact representation of GSA]
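The following Python sketch mirrors the compact GSA of Algorithm XI, assuming minimization. The decay schedule for the gravitational constant, \(g = g_0 e^{-\beta t/t_{\max}}\) with \(g_0 = 100\) and \(\beta = 20\), is a common choice in the GSA literature rather than a value prescribed here, and all names are illustrative assumptions.

```python
import numpy as np

def gsa_step(pop, vel, fitness, t, t_max,
             g0=100.0, beta=20.0, eps=1e-12, rng=None):
    """One generation of a compact GSA update (cf. Algorithm XI)."""
    rng = rng if rng is not None else np.random.default_rng()
    N, n = pop.shape
    cost = np.array([fitness(x) for x in pop])
    # Normalize fitness: worst particle gets mass 0, best gets mass 1,
    # then scale the masses so that they sum to 1
    m = (cost.max() - cost) / (cost.max() - cost.min() + eps)
    M = m / (m.sum() + eps)
    g = g0 * np.exp(-beta * t / t_max)      # gravitational constant decays
    acc = np.zeros_like(pop)
    for i in range(N):
        for j in range(N):
            if j != i:
                R = np.linalg.norm(pop[i] - pop[j])
                # randomly weighted pairwise attraction toward heavier masses
                acc[i] += rng.uniform() * g * M[j] * (pop[j] - pop[i]) / (R + eps)
    vel = rng.uniform(size=(N, 1)) * vel + acc   # random inertia, then accelerate
    pop = pop + vel
    return pop, vel
```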

Suppose that in the PSO logic of Algorithm I the fitness values of all solutions are normalized, and the constant \(\phi _1 \) which determines the significance of \(P_{best} \) is set to 0. Further suppose that the update term of the current global best position includes the coefficient \({M_{best} }/ {( {\left\| {G_{best} ( s)-x_k ( s)} \right\| _2 +\varepsilon })}\), where \(M_{best} \) is the normalized fitness of \(G_{best} \), and \(\varepsilon \) is a small positive constant to prevent division by zero. In this case, the PSO logic of Algorithm I becomes the modified PSO logic of Algorithm XII.

Now suppose that the GSA of Algorithm XI calculates acceleration based only on the current global best solution \(G_{best} \), instead of using all of the solutions in the entire population. The GSA of Algorithm XI then becomes the modified GSA of Algorithm XIII. We find that if in Algorithm XII the inertia weight w is set to a random number, and if in Algorithm XIII the gravitational constant g is set to 1, then the modified GSA is equivalent to the modified PSO. So GSA can be viewed as a variation of PSO under special conditions, and it follows that these two algorithms perform identically under these special conditions.

[Algorithm XII: modified PSO with normalized fitness values]
[Algorithm XIII: modified GSA, equivalent to Algorithm XII]

2.1.6 Artificial bee colony algorithm

ABC was introduced by Karaboga (2005) and some variations and applications of ABC are discussed in Dervis et al. (2014) and Dervis and Bahriye (2007). ABC is inspired by the behavior of bees as they search for an optimal food source. The location of a food source is analogous to a location in the search space of an optimization problem, and the amount of nectar at a location is analogous to the fitness of a candidate solution. In ABC, there are three different types of bees: forager bees, onlooker bees, and scout bees. First, forager bees travel back and forth between a food source and their hive. Each forager is associated with a specific location, and remembers that location as it travels to and from the hive. When a forager delivers its nectar to the hive, it returns to its food source, but it also engages in local exploration as it searches the nearby vicinity for a better source. Second, onlooker bees are not associated with any particular food source, but they observe the amount of nectar that is returned by the foragers (that is, the fitness of each forager’s location in the search space), and use that information to decide where to search for nectar. The onlookers’ search locations are decided probabilistically based on their observations of the foragers. Third, scout bees are explorers, and are also not associated with any particular food source. If a scout sees that a forager has stagnated, and is not progressively increasing the amount of nectar that it returns to the hive, the scout randomly searches for a new nectar source in the search space. Stagnation is indicated when a forager fails to increase the amount of nectar it brings to the hive after a certain number of trips.

A description of ABC is given in Algorithm XIV, where \(P_f \) is the forager population size and \(P_0 \) is the onlooker population size; note that the total population size is \(N=P_0 +P_f \). L is a positive integer that denotes the stagnation limit; \(r\in [-1,1]\) is a uniformly distributed random number; and \(T(\cdot)\) is a forager trial counter that keeps track of the number of consecutive unsuccessful modifications of each forager.

[Algorithm XIV: outline of the artificial bee colony (ABC) algorithm]
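The following Python sketch outlines one ABC cycle in the structure of Algorithm XIV: a forager phase, an onlooker phase with roulette-wheel selection, and a scout phase driven by the trial counters \(T(\cdot)\). Minimization is assumed, and the search bounds and function names are illustrative assumptions.

```python
import numpy as np

def abc_step(foods, trials, fitness, limit, bounds=(-100, 100), rng=None):
    """One ABC cycle: forager, onlooker, and scout phases (cf. Algorithm XIV).

    foods  : (N, n) array, one food source per forager
    trials : (N,) int array of consecutive unsuccessful modifications
    limit  : stagnation limit L
    """
    rng = rng if rng is not None else np.random.default_rng()
    N, n = foods.shape
    cost = np.array([fitness(x) for x in foods])

    def try_neighbor(i):
        k = rng.choice([j for j in range(N) if j != i])   # random partner
        s = rng.integers(n)                # one randomly chosen dimension
        cand = foods[i].copy()
        cand[s] += rng.uniform(-1, 1) * (foods[i][s] - foods[k][s])
        c = fitness(cand)
        if c < cost[i]:                    # greedy selection
            foods[i], cost[i], trials[i] = cand, c, 0
        else:
            trials[i] += 1

    for i in range(N):                     # forager phase
        try_neighbor(i)
    prob = cost.max() - cost + 1e-12       # lower cost -> higher probability
    prob = prob / prob.sum()
    for _ in range(N):                     # onlooker phase (roulette wheel)
        try_neighbor(rng.choice(N, p=prob))
    for i in range(N):                     # scout phase: reset stagnant sources
        if trials[i] > limit:
            foods[i] = rng.uniform(*bounds, n)
            trials[i] = 0
    return foods, trials
```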

Suppose that in the PSO logic of Algorithm I, the solutions are divided into two sub-populations. For the first sub-population, the constant \(\phi_1\) that determines the significance of \(P_{best}(s)\) is set to 0, and the solutions are updated independently of their previous velocities; that is, the inertia weight is \(w=0\). For the second sub-population, the constant \(\phi_2\) that determines the significance of \(G_{best}(s)\) is set to 0, the inertia weight is \(w=0\), and the current solution \(x_k\) is replaced by the current global best solution \(G_{best}\). In addition, a single randomly selected dimension of each candidate solution is updated in each iteration, instead of all dimensions. In this case, Algorithm I becomes the modified PSO logic of Algorithm XV.

Now suppose that in the ABC method of Algorithm XIV we do not use the forager trial counters; that is, scout bees are not employed. The ABC method of Algorithm XIV then becomes the modified ABC method of Algorithm XVI. We find that if, in the first sub-population of Algorithm XV, the current global best solution \(G_{best}\) is set to a random number, and if, in the second sub-population, the current global best solution \(G_{best}\) is chosen using roulette-wheel selection and the particle's previous best solution \(P_{best}\) is set to a random number, then the modified ABC method is equivalent to the modified PSO algorithm. So ABC can be viewed as a variation of PSO under special conditions, and it follows that these two algorithms perform identically under these special conditions.

[Algorithm XV: modified PSO with two sub-populations]
[Algorithm XVI: modified ABC, equivalent to Algorithm XV]

2.1.7 Summary of swarm intelligence algorithm comparisons

We have seen in the above sections that several new SI algorithms, including SFLA, GSO, FA, ABC and GSA, are conceptually similar to PSO. Under special conditions, these algorithms are equivalent to PSO variations. All of these algorithms have certain features in common and can be viewed as variations on the same themes, including inertia, influence by society, and influence by neighbors. Since they have so many similarities, it is easy to see why they have similar performance on many real-world optimization problems. We note that there are many other new SI algorithms, including glowworm swarm optimization (Krishnanand and Ghose 2009), the grey wolf optimizer (GWO) (Mirjalili and Lewis 2014), and others, but the study of their similarities and differences is deferred to future research.

2.2 Conceptual differences between swarm intelligence algorithms

The identical functionality of different SI algorithms discussed above occurs only under special conditions. Each SI algorithm still has its own particular features and operations that can give it flexibility that other SI algorithms may not have. In this subsection, we point out some differences between various SI algorithms.

2.2.1 Biological motivation and its effect on future research

Differences between SI algorithms result from their unique biological motivations. PSO is based on the swarming behavior of birds, SFLA is based on the leaping behavior of frogs, GSO is based on the food foraging behavior of land-based animals, FA is based on the attraction of fireflies to one another, ABC is based on the foraging behavior of bees, and GSA is based on the law of gravity. It is therefore useful to retain the distinction between these SI algorithms because they are based on different phenomena, and those distinct phenomena can be used to introduce helpful algorithmic variations for improved performance.

Retaining the swarm foundation of PSO stimulates research toward the incorporation of social behavior from animals, which can enrich and extend the study of PSO. Some of these behaviors include avoiding predators, seeking food, and seeking to travel more quickly. Retaining SFLA as a separate algorithm stimulates research toward the incorporation of frog behavior, including complex shuffling behavior and new leaping rules. Retaining GSO as a separate algorithm stimulates research toward the incorporation of foraging behavior from land-based animals, including additional food scanning mechanisms, scrounging strategies, and random search strategies. Retaining FA as a separate algorithm stimulates research toward the incorporation of firefly behavior, including the dispersion of individuals, consideration of light intensity, and other atmospheric considerations. Retaining ABC as a separate algorithm stimulates research toward the incorporation of bee behavior, including the profitability of a food source, the distance and direction from the nest, and other foraging models. Retaining GSA as a separate algorithm stimulates research toward the incorporation of additional characteristics related to gravity, including active gravitational force, passive gravitational force, and inertia.

Table 1 Summary of the differences among the six SI algorithms that we study

2.2.2 Algorithmic differences

Differences in the performance levels of various SI algorithms arise because of differences in the details of these algorithms. Although we have shown that SI algorithms are equivalent under certain conditions, they can operate quite differently as typically implemented. For example, the basic PSO algorithm creates children by updating all solutions based on the current global best solution and each particle's previous best position. SFLA creates children by updating the worst sub-population solutions based on the current global best solution and the best sub-population solution. GSO creates children by updating each solution based on different search strategies. ABC creates children by updating each solution based on randomly weighted differences between the current solution and other randomly selected solutions. GSA creates children by updating each solution based on the same search strategy. FA creates children by updating all solutions except the best one, which is never updated.

Unifying various SI algorithms, as done in this paper, is instructive, but recognizing their individual characteristics and distinctions enables interesting mathematical and theoretical studies, and can provide valuable tools for practical problems. The no-free-lunch theorem (Wolpert and Macready 1997) says that if an algorithm achieves superior results on certain problems, it must pay for that performance with inferior results on other problems. So the existence of various SI algorithms provides an attractive choice of alternate optimization methods. The differences between the algorithms provide the possibility of application to a variety of problems, and for a variety of contributions to the SI literature.

Table 1 summarizes some characteristics of the six algorithms that we consider. The row labeled “Search Domain of Original Formulation” indicates whether the algorithm was originally developed for discrete or continuous search domains. This characteristic is not always obvious. For example, SFLA was originally applied to discrete search spaces, but the algorithm is more naturally suited for continuous search spaces, and it is usually applied that way. It should not be assumed that the various SI algorithms suit only the type of search domain in the table. In fact, they have all been modified to apply to both discrete and continuous search domains, and may even perform better on a search domain type that differs from the original one.

The “Convergence” row in Table 1 indicates whether the algorithms have fast or slow convergence, in general. This characteristic is relatively clear-cut; the original versions of SFLA and FA have fast convergence, and the original versions of PSO, GSA, ABC and GSO have slow convergence. However, many variants of PSO, GSA, ABC, and GSO exhibit greatly improved convergence abilities. Note that convergence ability is quantified by empirical evidence, but there is little theoretical evidence to support the convergence characterizations in Table 1.

The “Best Application of Original Formulation” row in Table 1 indicates the type of problem for which the algorithm was initially developed. SFLA was initially applied to combinatorial optimization problems and obtained good performance, but modified versions of SFLA have been applied to all types of search domains. The original versions of GSO and FA show good performance on multimodal optimization problems, and the original versions of PSO, ABC and GSA show good performance on most types of optimization problems.

3 Experimental results

This section investigates the performance of the six SI algorithms considered in this paper, including SFLA, GSO, FA, ABC, GSA, and PSO, along with advanced versions. The advanced versions include modified SFLA (MSFLA) (Emad et al. 2007), self-adaptive group search optimizer with elitist strategy (SEGSO) (Zheng et al. 2014), variable step size firefly algorithm (VSSFA) (Yu et al. 2015), modified artificial bee colony (MABC) (Bahriye and Dervis 2012), GSA with chaotic local search approach (CGSA2) (Gao et al. 2014), PSO with linearly varying inertia weight (LPSO) (Shi and Eberhart 1998), and PSO with constriction factor (CPSO) (Clerc and Kennedy 2002). We select these advanced versions because the literature shows that they generally provide better performance than the basic algorithms. Section 3.1 compares performance on the continuous benchmark functions from the 2013 IEEE Congress on Evolutionary Computation, and Sect. 3.2 compares performance on a set of combinatorial knapsack problems.

3.1 Continuous functions

This subsection compares the performance of SFLA, GSO, FA, ABC, GSA, PSO, and their advanced versions on a set of continuous benchmark functions from the 2013 IEEE Congress on Evolutionary Computation (Liao and Stuetzle 2013). These functions are briefly summarized in Table 2, and the parameters used in this subsection and the corresponding references are shown in Table 3. Note that we did not optimize the parameter values of the algorithms, but instead we used the parameter values that are given in the references in Table 3. Each algorithm has a population size of 200, and each function is evaluated in 50 dimensions with a function evaluation limit of 500,000. All algorithms terminate after the maximum number of function evaluations, or if the objective function value falls below \(10^{-8}\). All results in this section are computed from 25 independent simulations. Results are shown in Tables 4 and 5.

Table 2 2013 CEC benchmark functions, where the search domain of all functions is \(-100\le x_i \le 100\)
Table 3 Parameter settings of SFLA, GSO, FA, ABC, GSA, PSO, and their advanced versions
Table 4 Comparison of the best error values of the 2013 CEC continuous benchmark functions with \(D =50\) for SFLA, GSO, FA, and their advanced versions
Table 5 Comparison of the best error values of the 2013 CEC continuous benchmark functions with \(D =50\) for ABC, GSA, PSO, and their advanced versions

According to Tables 4 and 5, MABC performs best on 10 functions (F2, F6, F7, F9, F10, F12, F14, F18, F25, and F28), VSSFA performs best on 5 functions (F3, F16, F20, F21, and F27), CGSA2 performs best on 4 functions (F1, F4, F11, and F19), FA performs best on 4 functions (F13, F15, F17, and F22), ABC performs best on 2 functions (F5 and F23), MSFLA performs best on 2 functions (F8 and F24), and GSA performs best on function F26. These results indicate that MABC is the most effective, VSSFA is the second most effective, and CGSA2 and FA are the third most effective for the continuous benchmark functions that we studied. Furthermore, we find that for most benchmark functions, the performances of the advanced versions of the SI algorithms are better than those of their corresponding basic versions, which indicates that the algorithmic modifications can improve optimization performance.

Next we briefly consider the types of functions for which the various algorithms are best suited. Tables 4 and 5 show that for the unimodal functions (F1–F5), ABC and its advanced version, and GSA and its advanced version, each perform best on two functions, and so they are the most effective algorithms. For the basic multimodal functions (F6–F20) and composition multimodal functions (F21–F28), ABC and its advanced version perform best on ten functions and are the most effective algorithms, and FA and its advanced version perform best on eight functions and are thus the second most effective algorithms. This is consistent with the observation in Table 1 about the best application of the original formulation of the SI algorithms.

The average computing times of the algorithms are shown in the last row of Tables 4 and 5. Here MATLAB® is used as the programming language, and the computer is a 2.40 GHz Intel Pentium® 4 CPU with 4 GB of memory. We find that the algorithms can be ranked from fastest to slowest as follows: FA, VSSFA, SFLA, MSFLA, ABC, MABC, PSO, GSA, LPSO, CPSO, CGSA2, GSO, and SEGSO.

In order to further compare the performance of the SI algorithms, we perform a Holm multiple comparison test with PSO as the control method, that is, the baseline algorithm for comparison. The Holm multiple comparison test is a nonparametric statistical test that obtains a probability (p value) quantifying the degree of difference between a control algorithm and a set of alternative algorithms, assuming that the algorithms have statistically significant differences as a whole (Derrac et al. 2011). To determine whether the set of algorithms shows a statistically significant difference as a whole, we first apply Friedman's test with a significance level \(\alpha = 0.05\) to the mean error rankings. If the test rejects the null hypothesis that all of the algorithms perform similarly, we compare the control method with the remaining algorithms according to their rankings. Additional details about the Holm multiple comparison test can be found in Derrac et al. (2011).
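For reproducibility, the following Python sketch (using SciPy) shows how such a procedure can be implemented: a Friedman test on the error matrix, followed by Holm's step-down comparison of each algorithm against the control, using the z statistic for average ranks described by Derrac et al. (2011). The function name and the error-matrix layout are illustrative assumptions.

```python
import numpy as np
from scipy.stats import friedmanchisquare, norm, rankdata

def holm_vs_control(errors, control=0, alpha=0.05):
    """Friedman test plus Holm step-down comparisons against a control.

    errors  : (n_problems, k) array of mean errors, one column per
              algorithm (lower is better)
    control : column index of the control method (e.g., PSO)
    """
    n_problems, k = errors.shape
    stat, p_overall = friedmanchisquare(*errors.T)     # global difference test
    avg_rank = np.mean([rankdata(row) for row in errors], axis=0)
    se = np.sqrt(k * (k + 1) / (6.0 * n_problems))     # std. error of rank diffs
    others = [j for j in range(k) if j != control]
    p = {j: 2.0 * norm.sf(abs(avg_rank[control] - avg_rank[j]) / se)
         for j in others}
    decisions, rejecting = {}, p_overall < alpha
    for step, j in enumerate(sorted(others, key=p.get)):
        # Holm threshold alpha/(k-1-step); once a test fails, retain the rest
        rejecting = rejecting and p[j] <= alpha / (k - 1 - step)
        decisions[j] = (p[j], rejecting)
    return p_overall, avg_rank, decisions
```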

Table 6 shows the Friedman ranks of all the SI algorithms for the 2013 CEC benchmark functions. We obtain a Friedman statistic of 130.45 and a corresponding p value of 0.00012. Because the p value is smaller than 0.05, the results strongly indicate statistically significant performance differences among the algorithms.

Table 6 Friedman ranks of all SI algorithms for the 2013 CEC benchmark functions

Table 7 shows the results of a Holm multiple comparison test. For the 2013 CEC continuous benchmark functions, MABC is the best with an average rank of 3.62, VSSFA is the second best with an average rank of 4.14, and CGSA2 is the third best with an average rank of 4.57. These results are consistent with those shown in Tables 4 and 5. Furthermore, Table 7 shows statistically significant differences between PSO and all other algorithms except SFLA and GSO, as indicated by p values smaller than 0.05. The larger p values for SFLA and GSO, which are 0.08121 and 0.08902, respectively, indicate that although SFLA and GSO obtain better performance than PSO, the differences are not statistically significant.

Table 7 Holm multiple comparison test results of PSO and other SI algorithms, which show the average rank and the p values

We do not want to use the above results to draw overly broad conclusions. First, for SI algorithms, different tuning values might result in significant performance changes. In general, it can be difficult to determine optimal tuning parameters, and a small change in a tuning value can change the performance of an algorithm in a problem-dependent way. Second, if we used other advanced versions of SFLA, GSO, FA, ABC, and GSA, we might obtain different results. The purpose of the comparisons here is not to tune the parameters of the algorithms to obtain the best possible performance, but rather to quantify performance differences between typically implemented algorithm versions, and to show that the algorithmic differences between SI algorithms can result in significantly different performance.

Table 8 The dimension and parameters of the ten benchmark knapsack problems
Table 9 Performance comparison for six SI algorithms and their advanced versions on benchmark knapsack problems

3.2 Combinatorial knapsack problems

In this section the SI algorithms are applied to knapsack problems, an important and representative class of real-world combinatorial problems (Abulkalamazad et al. 2014; Bhattacharjee and Sarmah 2014; Freville 2004). Many combinatorial problems can be reduced to a knapsack problem.

The 0/1 knapsack problem can be simply described as follows. Consider a set of n items, where the ith item has weight \(w_i \) and profit \(p_i \). The problem is to select a subset of the n items to maximize overall profit without exceeding the weight constraint b. The problem can be mathematically modeled as follows:

$$\begin{aligned}&\text{Maximize}\quad \sum \limits _{i=1}^n p_i x_i \nonumber \\&\text{Subject to}\quad \sum \limits _{i=1}^n w_i x_i \le b, \nonumber \\&\quad \text{where } x_i \in \{0,1\} \quad \forall \, i\in \{1,2,\ldots ,n\} \end{aligned}$$
(9)

Here \(x_i\) takes the value 1 or 0, representing selection or rejection of the ith item, respectively. Ten benchmark knapsack problems are studied here, as summarized in Table 8. The parameters used for the SI algorithms in this section are the same as those in Sect. 3.1.
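Continuous SI algorithms are usually adapted to problem (9) with a binary encoding and a constraint-handling rule. The sketch below shows one common approach: a penalty-based objective plus a sigmoid mapping from a continuous position to a 0/1 selection vector. The penalty coefficient and the helper names are illustrative assumptions, not the encoding used in the cited references.

```python
import numpy as np

def knapsack_fitness(x, profits, weights, b, penalty=1e3):
    """Penalized objective for the 0/1 knapsack problem of Eq. (9).

    x : 0/1 selection vector. Returns a value to be maximized; infeasible
    selections are penalized in proportion to the weight violation.
    """
    overweight = max(0.0, float(np.dot(weights, x)) - b)
    return float(np.dot(profits, x)) - penalty * overweight

def binarize(position, rng=None):
    """Sigmoid-based mapping from a continuous SI position to a 0/1 vector."""
    rng = rng if rng is not None else np.random.default_rng()
    prob = 1.0 / (1.0 + np.exp(-position))
    return (rng.uniform(size=position.shape) < prob).astype(int)
```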

Table 9 shows comparisons of the best performance of each SI algorithm and their advanced versions after 1000 generations, averaged over 25 simulations. The results show that all SI algorithms and their advanced versions perform the same for \(f_{3}\), \(f_{4}\), and \(f_{9}\); SFLA and its advanced version obtain the best performance for five of the problems; GSA and its advanced version obtain the best performance for four and five of the problems, respectively; and FA, ABC and their advanced versions obtain the best performance for two of the problems. This indicates that SFLA and GSA are significantly better than the other SI algorithms for the combinatorial problems that we study. These results are also consistent with the observation in Table 1, which indicates that SFLA is the most appropriate for combinatorial optimization problems, and that GSA can obtain good performance for most types of optimization problems.

4 Conclusions

The similarities and differences of several popular new swarm intelligence algorithms, including SFLA, GSO, FA, ABC, and GSA, have been discussed in detail. Our discussion and comparison have been based on algorithmic motivations and implementation details. These algorithms have many similarities with PSO due to their similar biological motivations. We found that SFLA, GSO, FA, ABC and GSA are equivalent to PSO under certain conditions. In addition, we compared SFLA, GSO, FA, ABC, and GSA, along with their improved variations, with the basic PSO and its improved variations, on a set of continuous benchmark functions and combinatorial knapsack problems. Simulation results show that although the algorithms are identical under certain conditions, their performance levels are quite different under the tested conditions, because each algorithm retains its own distinctions when implemented in its standard form. Given the overlap between algorithms, it is often difficult to know where one SI algorithm ends and another begins, when a new SI algorithm deserves to belong to its own class, or when it should be classified as a variation of an existing SI algorithm. We conclude that it can be helpful to maintain the distinctions between various SI algorithms, because the plethora of algorithms provides a diverse choice of optimization methods, research opportunities, and application opportunities.

SI researchers and practitioners typically want to know which algorithm performs best. However, the no-free-lunch theorem and the simulation results in this paper show that this is a poorly framed question. Relative performance depends on the particular problem, and is greatly affected by algorithmic variations and tuning parameters. One of the challenges for the SI research community is to find a balance between encouraging new research directions and maintaining high standards for the introduction and development of purportedly new SI algorithms. The empirical results in this paper show that the advanced version of ABC performs best on the continuous benchmark functions, and that the advanced versions of SFLA and GSA perform best on the combinatorial knapsack problems. These results contrast with previous publications. For instance, Elbeltagi et al. (2005) indicate that PSO performs better than GA, MA, ACO, and SFL; Davarynejad et al. (2014) indicate that GSA performs better than PSO; Parpinelli et al. (2012) indicate that PSO performs better than BFO and ABC; and Civicioglu and Besdok (2013) indicate that cuckoo search performs better than PSO and ABC. These divergent results simply confirm our hypothesis that performance depends strongly on algorithmic variations and problem selection.

For future work there are several important directions. First, many other SI algorithms exist, such as GWO, BA, FWA, and their variations, which could provide better performance than the algorithms studied in this paper for certain types of problems; it would be interesting to analyze their similarities and differences as well. The second suggested direction is to study theoretical similarities and differences between algorithms based on mathematical models such as Markov chains (Nix and Vose 1992; Suzuki 1995), dynamic systems (Simon 2011), and statistical mechanics (Ma et al. 2015). This effort would provide more definite mathematical conclusions than simulation results. The third suggested direction is to develop improved versions of SI algorithms by incorporating additional natural principles.