1 Introduction

Over the last decades, there has been a growing interest in algorithms inspired by the observation of natural phenomena. Many researchers have shown that such algorithms are suitable tools for solving complex computational problems. Various heuristic approaches have been adopted, including the genetic algorithm (Holland 1975), simulated annealing (Kirkpatrick et al. 1983), artificial immune systems (Farmer et al. 1986), the ant system (Dorigo et al. 1996) and particle swarm optimization (Kennedy and Eberhart 1995; Kennedy and Eberhart 1997). However, no single algorithm achieves the best solution for all optimization problems; some algorithms give better solutions for certain problems than others (Wolpert and Macready 1997; Engelbrecht et al. 2005; Cheng et al. 2007; Elbetagi et al. 2005; Youssef et al. 2001).

The gravitational search algorithm (GSA) is one of the newest heuristic optimization algorithms. It was introduced by Rashedi et al. (2009) based on the metaphor of gravitational interaction between masses. GSA is inspired by Newton's law of gravitation, which states: “Every particle in the universe attracts every other particle with a force that is directly proportional to the product of their masses and inversely proportional to the square of the distance between them”. Gravity is the force pulling together all matter.

The original version of GSA was designed for search spaces of real-valued vectors. However, many optimization problems are set in a binary discrete space. In this article, a version of the algorithm for binary encoding is introduced. In the original GSA, the algorithmic “gravitational forces” lead directly to changes in the position of search points in a many-dimensional continuous space (the search space). In the binary version of GSA (BGSA), the outcome of these forces is converted into a probability value for each element of the binary vector, which determines whether that element takes the value 0 or 1.

This paper is organized as follows. Section 2 provides a brief review of GSA. In Sect. 3, we introduce the basic aspects of the binary version of GSA. An experimental study is given in Sect. 4, where the performance of the algorithm is evaluated on nonlinear benchmark functions. Finally, a conclusion is given in Sect. 5.

2 The gravitational search algorithm

According to Newton's law of gravitation, each particle attracts every other particle with a “gravitational force” (Holliday et al. 1993; Schutz et al. 2003). The gravitational force between two bodies is directly proportional to the product of their masses and inversely proportional to the square of the distance between them (Schutz et al. 2003).

In this section, we give a brief review of GSA (Rashedi et al. 2009). In GSA, agents are considered as objects and their performance is measured by their masses. All these objects attract each other by the gravitational force, and this force causes a global movement of all objects towards the objects with heavier masses. The heavy masses correspond to good solutions of the problem.

In GSA, each mass (agent) has four specifications: its position, its inertial mass, its active gravitational mass, and its passive gravitational mass. The position of the mass corresponds to a solution of the problem, and its gravitational and inertial masses are determined using a fitness function.

In other words, each mass represents a solution, and the algorithm is navigated by properly adjusting the gravitational and inertial masses. As time passes, we expect the masses to be attracted by the heaviest mass, which represents an optimum solution in the search space.

GSA can be considered as an isolated system of masses. It is like a small artificial world of masses obeying the Newtonian laws of gravitation and motion. More precisely, the masses obey the following laws (Rashedi et al. 2009):

  • Law of gravity: each particle attracts every other particle, and the gravitational force between two particles is directly proportional to the product of their masses and inversely proportional to the distance between them, R. We use R here instead of \( R^{2} \) because, according to our experimental results, R provided better results than \( R^{2} \) in all cases.

  • Law of motion: the current velocity of any mass is equal to the sum of a fraction of its previous velocity and the variation in velocity. The variation in velocity, or acceleration, of any mass is equal to the force acting on it divided by its inertial mass.

Now, consider a system with N agents (masses). The position of the ith agent is defined by:

$$ X_{i} = (x_{i}^{1} , \ldots ,x_{i}^{d} , \ldots ,x_{i}^{n} )\quad {\text{for}}\;i = 1,2, \ldots ,N $$
(1)

where \( x_{i}^{d} \) is the position of the ith agent in the dth dimension and n is the dimension of the search space.

At a specific time t, the force acting on mass i from mass j is defined as follows:

$$ F_{ij}^{d} (t) = G(t){\frac{{M_{pi} (t) \times M_{aj} (t)}}{{R_{ij} (t) + \varepsilon }}}(x_{j}^{d} (t) - x_{i}^{d} (t)) $$
(2)

where \( M_{aj} \) is the active gravitational mass related to agent j, \( M_{pi} \) is the passive gravitational mass related to agent i, G(t) is the gravitational constant at time t, ɛ is a small constant, and \( R_{ij}(t) \) is the Euclidean distance between agents i and j. The total force acting on agent i in dimension d is a randomly weighted sum of the dth components of the forces exerted by the Kbest agents:

$$ F_{i}^{d} (t) = \sum\limits_{j \in Kbest,\,j \ne i} rand_{j} F_{ij}^{d} (t) $$
(3)

where \( rand_{j} \) is a random number in the interval [0, 1] and Kbest is the set of the first K agents with the best fitness values and biggest masses. Kbest is a function of time, initialized to \( K_{0} \) at the beginning and decreased with time. In this way, at the beginning all agents apply force; as time passes, Kbest decreases linearly, so that at the end only one agent applies force to the others.

By the law of motion, the acceleration of agent i at time t in direction d, \( a_{i}^{d} (t) \), is given as follows:

$$ a_{i}^{d} (t) = {\frac{{F_{i}^{d} (t)}}{{M_{ii} (t)}}} $$
(4)

where \( M_{ii} \) is the inertial mass of the ith agent. The next velocity of an agent is considered as a fraction of its current velocity added to its acceleration. Therefore, its velocity and position can be calculated as follows:

$$ v_{i}^{d} (t + 1) = rand_{i} \times v_{i}^{d} (t) + a_{i}^{d} (t) $$
(5)
$$ x_{i}^{d} (t + 1) = x_{i}^{d} (t) + v_{i}^{d} (t + 1) $$
(6)

where \( rand_{i} \) is a uniform random variable in the interval [0, 1]. This random number gives a randomized characteristic to the search. The gravitational constant, G, is initialized at the beginning and reduced with time to control the search accuracy. Hence, G is a function of the initial value, \( G_{0} \), and time, t:

$$ G(t) = G(G_{0} ,t) $$
(7)

Gravitational and inertial masses are simply calculated from the fitness evaluation. A heavier mass means a more efficient agent. This means that better agents have higher attractions and walk more slowly. Assuming the equality of the gravitational and inertial masses, the values of the masses are calculated from the map of fitness. The gravitational and inertial masses are updated by the following equations (Rashedi et al. 2009):

$$ M_{ai} = M_{pi} = M_{ii} = M_{i} ,\,\,i = 1,2, \ldots ,N $$
(8)
$$ q_{i} (t) = {\frac{{fit_{i} (t) - worst(t)}}{best(t) - worst(t)}} $$
(9)
$$ M_{i} (t) = {\frac{{q_{i} (t)}}{{\sum_{j = 1}^{N} {q_{j} (t)} }}} $$
(10)

where \( fit_{i}(t) \) represents the fitness value of agent i at time t, and worst(t) and best(t) are defined as follows (for a minimization problem):

$$ best(t) = \mathop {\min }\limits_{j \in \{ 1, \ldots ,N\} } fit_{j} (t) $$
(11)
$$ worst(t) = \mathop {\max }\limits_{j \in \{ 1, \ldots ,N\} } fit_{j} (t) $$
(12)
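
As a small numerical illustration of Eqs. 9–12 (our own example, not taken from the original paper), consider a minimization problem with three agents whose fitness values are 2, 5 and 11, so that best(t) = 2 and worst(t) = 11:

$$ q_{1} = \frac{2 - 11}{2 - 11} = 1,\quad q_{2} = \frac{5 - 11}{2 - 11} = \frac{2}{3},\quad q_{3} = \frac{11 - 11}{2 - 11} = 0 $$
$$ M_{1} = \frac{1}{1 + 2/3 + 0} = 0.6,\quad M_{2} = \frac{2/3}{5/3} = 0.4,\quad M_{3} = 0 $$

The best agent therefore receives the largest mass, while the worst agent receives zero mass and exerts no attraction.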

The principle of GSA is shown in Fig. 1.

Fig. 1 General principle of GSA (Rashedi et al. 2009)

To illustrate how GSA works efficiently, some remarks are noted (Rashedi et al. 2009):

  • Since each agent can observe the performance of the others, the gravitational force is an information-transferring tool.

  • Due to the force that acts on an agent from its neighboring agents, it can sense the space around itself.

  • A heavy mass has a large effective attraction radius and hence a great intensity of attraction. Therefore, agents with higher performance have greater gravitational masses. As a result, agents tend to move toward the best agent.

  • The inertial mass resists motion and makes the mass move slowly. Hence, agents with heavy inertial masses move slowly and therefore search the space more locally. This can be considered as a form of adaptive learning.

  • The gravitational constant adjusts the accuracy of the search, so it decreases with time, similar to the temperature in a simulated annealing algorithm.

  • GSA is memory-less; however, it works efficiently, like algorithms with memory.

Theoretically, GSA belongs to the class of swarm-based heuristic algorithms. Rashedi et al. (2009) gave a comparative study between GSA and a small number of well-known swarm algorithms such as PSO. The results suggest that this approach, inspired by the law of gravity, has merit in the field of optimization.
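
To make the update rules of Eqs. 2–10 concrete, a minimal Python sketch of one GSA iteration is given below. It assumes a minimization problem and a user-supplied fitness function; the variable names, the per-dimension random factor in the velocity update and the linear Kbest schedule are our own reading of the description above, not the authors' original implementation.

```python
import numpy as np

def gsa_step(X, V, fitness, t, T, G0=100.0, eps=1e-12):
    """One GSA iteration for a minimization problem (a sketch of Eqs. 2-10)."""
    N, n = X.shape
    fit = np.array([fitness(x) for x in X])

    # Masses from fitness (Eqs. 8-12): the best agents get the heaviest masses.
    best, worst = fit.min(), fit.max()
    q = (fit - worst) / (best - worst + eps)
    M = q / (q.sum() + eps)

    # Gravitational constant decreasing with time (here the linear form of Eq. 17).
    G = G0 * (1.0 - t / T)

    # Kbest: only the K heaviest agents attract; K decreases linearly from N to 1.
    K = max(1, int(round(N - (N - 1) * t / T)))
    kbest = np.argsort(M)[::-1][:K]

    # Randomly weighted total force and resulting acceleration (Eqs. 2-4).
    # Since M_ai = M_pi = M_ii, the passive mass of agent i cancels out.
    A = np.zeros_like(X, dtype=float)
    for i in range(N):
        for j in kbest:
            if j == i:
                continue
            R = np.linalg.norm(X[i] - X[j])
            A[i] += np.random.rand() * G * M[j] * (X[j] - X[i]) / (R + eps)

    # Velocity and position updates (Eqs. 5-6).
    V = np.random.rand(N, n) * V + A
    X = X + V
    return X, V, fit
```

A full run simply initializes X and V at random and calls this step for t = 1, …, T, keeping track of the best solution found so far.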

3 Binary GSA

There are many optimization problems, such as feature selection and dimensionality reduction (Avishek and Maiti 2010; Beretaa and Burczynski 2007; Wang et al. 2007; Chuang et al. 2008; Zeng et al. 2009), data mining (Srinivasa et al. 2007), unit commitment (Yuan et al. 2009), and cell formation (Wu et al. 2008), in which it is natural to encode solutions as binary vectors. In addition, problems defined in the real space may also be considered in the binary space: each real-valued variable is represented by a number of bits in binary form. The binary search space can be considered as a hypercube, in which an agent may move to nearer or farther corners of the hypercube by flipping various numbers of bits.

In this section, we present a binary version of GSA. To do this, some basic concepts of GSA are modified. In a discrete binary space, every dimension can take only the value 0 or 1. Moving along a dimension means that the corresponding variable changes from 0 to 1 or vice versa. To introduce a binary mode for the gravitational algorithm, the updating procedures of the force, acceleration and velocity are kept the same as in the continuous algorithm (Eqs. 2–5). The main difference between continuous and binary GSA is that in the binary algorithm, updating the position means switching between the values “0” and “1”. This switching is done according to the mass velocity. Our idea is to change the current bit value with a probability that is calculated from the mass velocity. In other words, BGSA updates the velocity based on Eq. 5 and sets the new position to either 1 or 0 with the given probability.

Before defining a transfer function that maps the velocity to a probability of updating the position, it is useful to recall some basic concepts of GSA.

  • A large absolute value of the velocity indicates that the current position of the mass is not proper and a large movement is required to reach the optimum position.

  • A small absolute value of the velocity indicates that the current position of the mass is close to the optimum position and only a small movement is required. Accordingly, once the optimal solution is found, the velocity becomes zero.

Based on the above concepts of GSA, the following points should be taken into account in the implementation of BGSA:

  • A large absolute value of the velocity must give a high probability of changing the position of the mass with respect to its previous position (from 1 to 0 or vice versa).

  • A small absolute value of the velocity must give a small probability of changing the position. In other words, a zero velocity indicates that the mass position is good and must not be changed.

Based on the above-mentioned concepts, a proper probability function should be defined such that for a small \( \left| {v_{i}^{d} } \right| \), the probability of changing \( x_{i}^{d} \) is near zero, and for a large \( \left| {v_{i}^{d} } \right| \), the probability of moving \( x_{i}^{d} \) is high. We define the function \( S(v_{i}^{d} ) \) to transfer \( v_{i}^{d} \) into such a probability. \( S(v_{i}^{d} ) \) should be bounded within the interval [0, 1] and increase with increasing \( \left| {v_{i}^{d} } \right| \). We define \( S(v_{i}^{d} ) \) according to Eq. 13, which is illustrated in Fig. 2. As this figure shows, the proposed function satisfies all the above-mentioned requirements.

$$ S(v_{i}^{d} (t)) = \left| {\tanh (v_{i}^{d} (t))} \right| $$
(13)
Fig. 2 The proposed function

Once \( S(v_{i}^{d} ) \) is calculated, the agents will move according to the rule explained in Eq. 14.

$$ \begin{gathered} {\text{if}} \; rand < S(v_{i}^{d} (t + 1)) \; {\text{then}} \; x_{i}^{d} (t + 1) = {\text{complement}}(x_{i}^{d} (t)) \hfill \\ {\text{else}} \, x_{i}^{d} (t + 1) = x_{i}^{d} (t) \hfill \\ \end{gathered} $$
(14)

To achieve a good convergence rate, we limit the velocity: \( \left| {v_{i}^{d} } \right| < v_{\max } \). Based on our experiments, \( v_{\max } \) is set to 6. Note that in BGSA the distance, R, is computed as the Hamming distance.
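
A minimal sketch of the BGSA position update of Eqs. 13–14, including the velocity limit mentioned above, is shown below (the function and variable names are ours; the velocity is assumed to have been updated beforehand according to Eq. 5):

```python
import numpy as np

V_MAX = 6.0  # velocity limit suggested in the text

def bgsa_position_update(X, V):
    """Binary position update of BGSA (Eqs. 13-14).

    X is an (N, n) array of 0/1 bits and V the corresponding velocities.
    """
    V = np.clip(V, -V_MAX, V_MAX)            # enforce |v| < v_max
    S = np.abs(np.tanh(V))                   # transfer function, Eq. 13
    flip = np.random.rand(*X.shape) < S      # flip each bit with probability S(v)
    X_new = np.where(flip, 1 - X, X)         # complement the selected bits, Eq. 14
    return X_new, V
```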

4 Experimental results

To evaluate the proposed algorithm, a set of 23 minimization and 2 maximization benchmark functions is used in this section, and the results are reported along with those of the genetic algorithm (GA) and binary particle swarm optimization (BPSO).

4.1 Benchmark functions

The benchmark functions are taken from Yao et al. (1999) and Digalakis and Margaritis (2002). These functions are given in Tables 1, 2, 3, and 4, where m is the dimension of the function, \( f_{opt} \) is the optimum value of the function and \( S \subseteq R^{m} \). The first seven functions (F1 to F7) are unimodal. For unimodal functions, the convergence rate of the algorithm is more interesting than the final result of optimization. F8 to F13 are multimodal functions with many local minima; the algorithm must be capable of finding the optimum solution (or a good near-global optimum) without being trapped in local optima. F14 to F23 are multimodal functions with only a few local minima. The functions in Table 4 have a discrete nature.

Table 1 Unimodal test functions
Table 2 Multimodal test functions
Table 3 Multimodal test functions with fix dimension
Table 4 Maximization functions in binary search space

Functions in Tables 1, 2, and 3 are minimization problems. The minimum value, \( f_{opt} \), of the functions of Tables 1 and 2 is zero, except for F8, which has a minimum value of −418.9829 × m. The optimum location, \( X_{opt} \), for the functions of Tables 1 and 2 is [0]^m, except for F5, F12 and F13 with \( X_{opt} \) in [1]^m and F8 in [420.96]^m. A detailed description of the functions of Table 3 is given in the Appendix.
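
As a concrete illustration of the unimodal class, F1 is the sphere function; a minimal sketch is given below, assuming the standard definition from Yao et al. (1999) (the exact bounds are those listed in Table 1):

```python
import numpy as np

def f1_sphere(x):
    """F1 (sphere): f(x) = sum_i x_i^2, with minimum 0 at x = 0."""
    return float(np.sum(np.asarray(x, dtype=float) ** 2))
```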

The functions in Table 4 are binary maximization problems. These functions are defined only over binary 0–1 values. The maximum values of these functions depend on m.

4.2 Comparative study

To see how well the proposed algorithm performs, we applied BGSA to the above benchmark functions and compared the results with those of GA and BPSO. In GA, one-point crossover, uniform mutation and roulette-wheel selection were used. The crossover probability and mutation probability were set to 0.9 and 0.005, respectively (Goldberg 1989).

In the BPSO algorithm, \( v_{i}^{d} \) and \( x_{i}^{d} \) are calculated as follows (Kennedy and Eberhart 1997):

$$ v_{i}^{d} (t + 1) = w(t)v_{i}^{d} (t) + c_{1} r_{1} \left( {pbest_{i}^{d} - x_{i}^{d} (t)} \right) + \, c_{2} r_{2} \left( {gbest^{d} - x_{i}^{d} (t)} \right) $$
(15)
$$ \begin{gathered} {\text{if}}\; rand < \frac{1}{1 + e^{ - v_{i}^{d} (t + 1)}} \quad {\text{then}}\; x_{i}^{d} (t + 1) = 1 \hfill \\ {\text{else}}\; x_{i}^{d} (t + 1) = 0 \hfill \\ \end{gathered} $$
(16)

where \( r_{1} \), \( r_{2} \) and rand are three uniform random variables in the range [0, 1], \( c_{1} \) and \( c_{2} \) are positive constants, and w is the inertia weight. \( X_{i} = (x_{i}^{1} ,x_{i}^{2} , \ldots ,x_{i}^{n} ) \) and \( V_{i} = (v_{i}^{1} ,v_{i}^{2} , \ldots ,v_{i}^{n} ) \) represent the position and velocity of the ith particle, respectively. \( pbest_{i} = (pbest_{i}^{1} ,pbest_{i}^{2} , \ldots ,pbest_{i}^{n} ) \) and \( gbest = (gbest^{1} ,gbest^{2} , \ldots ,gbest^{n} ) \) represent the best previous position of the ith particle and the best previous position among all particles in the population, respectively. In addition, \( c_{1} = c_{2} = 2 \) is taken and the inertia weight, w, is decreased linearly from 0.9 to 0.2.
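
For completeness, a minimal sketch of the BPSO update of Eqs. 15–16 used for comparison is given below; the linearly decreasing inertia weight and the parameter values follow the settings stated above, while the function and variable names are ours:

```python
import numpy as np

def bpso_step(X, V, pbest, gbest, t, T, c1=2.0, c2=2.0, w_max=0.9, w_min=0.2):
    """One BPSO iteration (Eqs. 15-16) on an (N, n) population of bit vectors."""
    N, n = X.shape
    w = w_max - (w_max - w_min) * t / T                          # inertia weight, 0.9 -> 0.2
    r1, r2 = np.random.rand(N, n), np.random.rand(N, n)
    V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)    # Eq. 15
    prob = 1.0 / (1.0 + np.exp(-V))                              # sigmoid of the velocity
    X = (np.random.rand(N, n) < prob).astype(int)                # Eq. 16
    return X, V
```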

In the binary version of GSA, G is considered as a linearly decreasing function of time, as follows:

$$ G(t) = G_{0} (1 - {\frac{t}{T}}) $$
(17)

where \( G_{0} \) is a constant set to 100 and T is the total number of iterations (the total age of the system). Furthermore, \( K_{0} \) (see Eq. 3) is set to N (the total number of agents) and is decreased linearly to 1.
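
In code form, these schedules amount to the following two helper functions (a sketch matching the settings above, with \( G_{0} = 100 \) and \( K_{0} = N \)):

```python
def gravitational_constant(t, T, G0=100.0):
    """Linearly decreasing gravitational constant, Eq. 17."""
    return G0 * (1.0 - t / T)

def kbest_size(t, T, N):
    """Number of attracting agents: starts at K0 = N and decreases linearly to 1."""
    return max(1, round(N - (N - 1) * t / T))
```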

4.3 Results and discussion

We applied the three mentioned algorithms to the benchmark functions. In all cases, the population size is set to 50 (N = 50). The dimension is 5 (m = 5) for the functions of Tables 1 and 2, and the maximum number of iterations is 500 for the functions of Tables 1, 2 and 3. To represent each continuous variable, 15 bits are used; therefore, for each continuous function the dimension of the agents is n = m × 15. The results are averaged over 30 independent runs with 30 different random seeds, and the average best-so-far solution, the standard deviation of the best solution and the median of the best solution in the last iteration are reported.
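
The paper states that each continuous variable is coded with 15 bits but does not spell out the coding itself; the sketch below shows one common choice (fixed-point binary coding over the variable's range), used here only to illustrate how an agent of n = m × 15 bits maps back to an m-dimensional real vector:

```python
import numpy as np

def decode(agent_bits, lower, upper, bits_per_var=15):
    """Map a binary agent to m real-valued variables in [lower, upper]."""
    bits = np.asarray(agent_bits).reshape(-1, bits_per_var)   # one row per variable
    weights = 2 ** np.arange(bits_per_var - 1, -1, -1)        # MSB first
    ints = bits @ weights                                     # integer value per variable
    return lower + (upper - lower) * ints / (2 ** bits_per_var - 1)
```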

4.3.1 Unimodal functions

The results for the unimodal functions are reported in Table 5. As this table illustrates, BGSA provides better results than BPSO and slightly better results than GA for all functions. In addition, the good convergence rate of BGSA can be seen in Fig. 3. According to Table 5 and Fig. 3, BGSA tends to find the global optimum in unimodal functions and has a good convergence rate.

Table 5 Comparison between BGSA, GA and BPSO on unimodal functions of Table 1 with m = 5, where “ABSF”, “MBSF” and “STDV” indicate Average best so far solution, Median best so far solution, and Standard Deviation of the best so far solution in the last iteration, respectively
Fig. 3 Comparison between BGSA, BPSO and GA on unimodal functions a F1, b F7 with m = 5

4.3.2 Multimodal functions

We have two sets of multimodal functions. The functions of Table 2 (F8–F13) contain many local optima, and the number of local minima increases exponentially with the dimension of the function. The dimension of the functions of Table 3 (F14–F23) is fixed, and they have a small number of local minima.

We have carried out experiments on F8–F13. The dimension of these functions is set to 5. The results are averaged over 30 runs, and the average best-so-far solution, the standard deviation of the best-so-far solution and the median of the best-so-far solution in the last iteration are reported in Table 6. For functions F8–F12, BGSA reaches a much better solution than the other algorithms, while for function F13, GA presents a better solution than BGSA. Table 7 shows the comparison between BGSA, GA and BPSO on the multimodal benchmark functions of Table 3 (F14–F23). The results show that BPSO and BGSA give similar solutions on most functions and their performances are almost the same, with BPSO slightly better in some cases. The progress of the average best-so-far solution over 30 independent runs for functions F10, F12, F15 and F19 is shown in Fig. 4.

Table 6 Comparison between BGSA, GA and BPSO on multimodal functions of Table 2 with m = 5, where “ABSF”, “MBSF” and “STDV” indicate Average best so far solution, Median best so far solution, and Standard Deviation of the best so far solution in the last iteration, respectively
Table 7 Comparison between BGSA, GA and BPSO on multimodal functions of Table 3, where “ABSF”, “MBSF” and “STDV” indicate Average best so far solution, Median best so far solution, and Standard Deviation of the best so far solution in the last iteration, respectively
Fig. 4 Comparison between BGSA, BPSO and GA on multimodal functions a F10 with m = 5, b F12 with m = 5, c F15 with m = 4 and d F19 with m = 3

4.3.3 Discrete functions

The functions of Table 4 (Max-Ones and Royal-Road) are binary in nature. These functions should be maximized. For a maximization problem, the concepts of the best and the worst are exchanged (Eqs. 11 and 12); that is, the best is computed according to Eq. 12 and vice versa. To evaluate the ability of the algorithms, the functions of Table 4 are considered with several values of m (m = 32, 64, 80 and 160), where m indicates the dimension of the function. The maximum number of iterations for m = 32, 64 and 80 is set to 1000, while for m = 160 it is set to 2000. The dimension of the agents, n, for the binary functions is equal to m.
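
Max-Ones and Royal-Road can be sketched as follows. Max-Ones simply counts the 1 bits; for Royal-Road we assume the common variant that rewards only complete all-ones blocks of a fixed size (the block size here is our assumption; the exact scheme used in the paper is the one defined in Table 4):

```python
import numpy as np

def max_ones(bits):
    """Max-Ones: the fitness is the number of 1 bits (maximum = m)."""
    return int(np.sum(bits))

def royal_road(bits, block=8):
    """Royal-Road: each block of `block` consecutive bits scores `block` points
    only if all of its bits are 1 (block size assumed; see Table 4)."""
    blocks = np.asarray(bits).reshape(-1, block)
    return int(block * np.count_nonzero(blocks.all(axis=1)))
```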

The results are given in Table 8. As this table suggests, BGSA provides the optimum solution in all cases. The largest differences in performance between BGSA, GA and BPSO occur on the binary functions, especially as the dimension of the functions increases. In addition, the good convergence rate of BGSA can be seen in Fig. 5. According to this figure, BGSA tends to find the global optimum faster than the other tested algorithms.

Table 8 Comparison between BGSA, GA and BPSO on Max-Ones and Royal-Road function with m = 32, 64, 80 and 160, where “ABSF”, “MBSF” and “STDV” indicate Average best so far solution, Median best so far solution, and Standard Deviation of the best so far solution in the last iteration, respectively
Fig. 5 Comparison between BGSA, BPSO and GA on a Max-Ones with m = 32, b Max-Ones with m = 160, c Royal-Road with m = 32, and d Royal-Road with m = 160

4.4 BGSA with different parameters

BGSA has some parameters to be tuned, such as \( G_{0} \) and kbest. To examine the effect of different values of these parameters on the performance of BGSA in detail, a set of experiments has been carried out using different values of \( G_{0} \) and kbest. A set of seven benchmark functions from Tables 1, 2, 3, and 4 was selected for these experiments. The setup of these experiments (population size, maximum number of iterations, etc.) is the same as before. In the first group of experiments, we examine the impact of \( G_{0} \). To do this, the values \( G_{0} = 50 \) and \( G_{0} = 150 \) are examined on the seven benchmark functions. Note that in the previous experiments we used \( G_{0} = 100 \); for ease of comparison, the results for \( G_{0} = 100 \) are also reported along with those of the new values in Table 9. The results are averaged over 30 independent runs, and the average best-so-far solution and the standard deviation of the best-so-far solutions are reported. The results show that \( G_{0} = 100 \) is not the optimal value for all functions, which means that \( G_{0} \) is a problem-dependent parameter.

Table 9 The average best so far solution and standard deviation found by BGSA using different G 0 for seven benchmark functions

In addition, to see why kbest must vary with time, we performed some experiments with different constant values of kbest. The results are given in Table 10. As the table shows, using the linear time-varying function for kbest produces better results than using constant values.

Table 10 The average best so far solution and standard deviation found by BGSA using different kbest for seven benchmark functions

In fact, different parameter values give the algorithm different exploration and exploitation capabilities. In other words, by setting these parameters, one can control the convergence rate and the exploration and exploitation capabilities, and avoid being trapped in local optima. On the other hand, each problem requires specific exploration and exploitation capabilities. Hence, it is acceptable for some parameters to be problem dependent.

5 Conclusion

In recent years, various heuristic optimization algorithms have been developed. GSA is a new heuristic search algorithm constructed on the basis of the law of gravity and the notion of mass interactions. The GSA algorithm uses the theory of Newtonian physics, and its searcher agents are a collection of masses. In this article, a binary version of GSA has been introduced.

In order to evaluate our algorithm, we have examined it on a set of standard benchmark functions. In this paper, BGSA has been tested and compared with a small number of well-known alternative approaches. The results suggest that this approach, inspired by the law of gravity, has merit in the field of optimization in binary search spaces, but much work remains to be done to establish how well BGSA performs against state-of-the-art techniques.