1 Introduction

Optimization problems involve optimizing a given objective function subject to a set of constraints. A generic minimization problem can be stated as in Eq. (1).

$$\begin{aligned} \min \limits _x \{f(x)|x \in S \subseteq \mathfrak {R}^n \} \end{aligned}$$
(1)

where x is the decision variable, f(x) is the objective function and S is the feasible region. Since one can switch between a minimization and a maximization problem by multiplying the objective function by negative one, we consider a minimization problem without loss of generality.

The application of optimization solution methods has gone far beyond day-to-day activities. Optimization has been applied to complex problems arising in different disciplines, including engineering, agriculture, management, economics, food science, politics and music science (Sweitzer 2008; Tilahun et al. 2012; Grachten et al. 2014; Volpato et al. 2008; Ondrisek 2009; Hamadneh et al. 2012; Hernandez and Fontan 2014; Lucia and Xu 1990; Tilahun and Asfaw 2012; Ropponen et al. 2010; Pike et al. 2014; Tilahun and Ong 2012a, b). Hence, contributions to solution methods have come from professionals in different disciplines. Solution methods for optimization problems can broadly be categorized as deterministic and non-deterministic approaches. Metaheuristic algorithms are among the non-deterministic solution methods. Even though these algorithms do not guarantee optimality, they are found to give reasonable and acceptable solutions with appropriate tuning of the algorithm parameters.

Since the introduction of evolutionary algorithms in the mid-1970s, much research has been devoted to heuristic algorithms, and introducing new algorithms remains one of the leading research areas (Yang 2011). Currently, there are hundreds of these algorithms, most of them introduced by mimicking a scenario from nature. For instance, the genetic algorithm is inspired by Darwin's theory of survival of the fittest (Negnevitsky 2005); particle swarm optimization mimics how a swarm moves by following each other (Kennedy and Eberhart 1995); the firefly algorithm is inspired by how fireflies signal each other with flashing light to attract mates or to warn of predators (Yang 2008); and the prey predator algorithm is another metaheuristic algorithm inspired by the interaction between a predator and its prey (Tilahun and Ong 2014; Tilahun et al. 2013). These algorithms achieve different degrees of exploration and exploitation through their different search mechanisms. In addition to introducing new algorithms, merging two or more algorithms to improve overall performance is another research area which has been studied extensively. Hybrid algorithms combine multiple algorithms so that the weakness of one algorithm is compensated by the strength of another; some of these hybridizations are designed to solve one particular problem. Furthermore, the application of metaheuristic algorithms is also a forefront research issue.

The firefly algorithm is a swarm-based metaheuristic algorithm inspired by the flashing behavior of fireflies. Randomly generated solutions are treated as fireflies, and each is assigned a brightness based on its performance on the objective function. A firefly is then attracted towards brighter fireflies. The algorithm has become popular because it is easy to understand and implement, and it also lends itself to parallel implementation. Due to its effectiveness, it has been used in different applications, including engineering, decision science, computer science, economics and medicine (Pan et al. 2013; Reddy and Sekhar 2014; Tilahun and Ong 2013; Alweshah 2014; Kwiecien and Filipowicz 2012; Poursalehi et al. 2013). In order to increase its effectiveness, different modifications of the standard firefly algorithm have been suggested. Hence, in this paper a detailed review and analysis of these modifications for continuous optimization problems is presented. Review papers on the firefly algorithm have been published recently (Fister et al. 2013b; Ali et al. 2014; Ariyaratne and Pemarathne 2015; Khan et al. 2016; Abdelaziz et al. 2015); however, this paper focuses particularly on modifications done to improve the performance of the firefly algorithm on continuous problems (readers interested in discrete versions of the algorithm can refer to Tilahun and Ngnotchouye 2017), and, unlike the other review papers, a detailed analysis of each modification with its strengths and weaknesses is given. Furthermore, the literature has expanded quickly, so very recent modifications are not included in previous review papers. In addition, unlike previous reviews, a simulation-based comparative study is presented.

In the next section, the standard firefly algorithm is discussed, followed by a discussion of the modified versions in Sect. 3. In Sect. 4, a discussion of future work is given along with a summary of the paper. In Sect. 5, a simulation-based comparison is presented, followed by a conclusion in Sect. 6.

2 Standard firefly algorithm

Nature has been an inspiration for many metaheuristic algorithms. It manages to find solutions to problems not by instruction but through experience. Natural selection and survival of the fittest were the main motivations behind the early metaheuristic algorithms, the evolutionary algorithms, and most metaheuristic algorithms since have been inspired by some natural scenario.

The firefly algorithm is a swarm-based metaheuristic algorithm introduced by Yang (2008). The algorithm mimics how fireflies interact using their flashing lights. It assumes that all fireflies are unisex, meaning any firefly can be attracted by any other firefly, and that the attractiveness of a firefly is directly proportional to its brightness, which depends on the objective function. A firefly is attracted to brighter fireflies. Furthermore, the brightness, or light intensity, decreases with distance according to the inverse square law, as given in Eq. (2).

$$\begin{aligned} I\propto \frac{1}{r^2} \end{aligned}$$
(2)

If the light passes through a medium with light absorption coefficient \(\gamma \), then the light intensity at a distance r from the source is given by Eq. (3)

$$\begin{aligned} I=I_0e^{-\gamma r^2} \end{aligned}$$
(3)

where \(I_0\) is light intensity at the source.

Similarly, the brightness, \(\beta \), can be given as in Eq. (4).

$$\begin{aligned} \beta =\beta _0e^{-\gamma r^2} \end{aligned}$$
(4)

A generalized brightness function for \(m\ge 1\) is given in Eq. (5) (Yang 2008). In fact, any monotonically decreasing function can be used.

$$\begin{aligned} \beta =\beta _0e^{-\gamma r^m} \end{aligned}$$
(5)

In the algorithm, randomly generated feasible solutions, called fireflies, are assigned a light intensity based on their performance on the objective function. This intensity is used to compute the brightness of the firefly, which is directly proportional to its light intensity. For a minimization problem, a solution x with the smallest function value is assigned the highest light intensity. Once the intensity or brightness of the solutions is assigned, each firefly follows the fireflies with better light intensity. The brightest firefly, having no brighter firefly to follow, performs a local search by moving randomly in its neighborhood. Hence, for two fireflies i and j with locations \(x_i\) and \(x_j\), respectively, if firefly j is brighter than firefly i, then i moves towards j using the updating formula given in Eq. (6).

$$\begin{aligned} x_i:=x_i+\beta _0e^{-\gamma r^2}(x_j-x_i)+\alpha (\varepsilon ()-0.5) \end{aligned}$$
(6)

where \(\beta _0\) is the attractiveness of \(x_j\) at \(r=0\) (for implementation, \(\beta _0=1\) is suggested), \(\gamma \) is an algorithm parameter which determines the degree to which the update depends on the distance between the two fireflies, \(\alpha \) is an algorithm parameter giving the step length of the local search, and \(\varepsilon ()\) is a random vector of appropriate dimension with each component drawn from a uniform distribution between zero and one. For the brightest firefly, \(x_b\), the second term in Eq. (6) is omitted, as given in Eq. (7).

$$\begin{aligned} x_b:=x_b+\alpha (\varepsilon ()-0.5) \end{aligned}$$
(7)

The iteration continues until a termination criterion is met, such as a maximum number of iterations, a tolerance from the optimum value if it is known, or no improvement over consecutive iterations. The algorithm is summarized in Table 1.

Table 1 The standard firefly algorithm
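
To complement Table 1, the following is a minimal Python sketch of the standard firefly algorithm as described by Eqs. (6) and (7); the sphere objective, population size, iteration budget and parameter values are illustrative placeholders, not values prescribed by Yang (2008).

```python
import numpy as np

def firefly_algorithm(f, lb, ub, n_fireflies=25, t_max=100,
                      alpha=0.5, beta0=1.0, gamma=1.0, seed=0):
    """Minimize f over the box [lb, ub] using the standard firefly algorithm."""
    rng = np.random.default_rng(seed)
    dim = len(lb)
    x = rng.uniform(lb, ub, size=(n_fireflies, dim))   # random initial fireflies
    intensity = np.array([f(xi) for xi in x])          # smaller f => brighter

    for t in range(t_max):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if intensity[j] < intensity[i]:        # firefly j is brighter
                    r2 = np.sum((x[i] - x[j]) ** 2)
                    beta = beta0 * np.exp(-gamma * r2) # attraction term, Eq. (6)
                    x[i] += beta * (x[j] - x[i]) + alpha * (rng.random(dim) - 0.5)
                    x[i] = np.clip(x[i], lb, ub)       # keep the move feasible
                    intensity[i] = f(x[i])
        b = np.argmin(intensity)                       # brightest firefly, Eq. (7)
        trial = np.clip(x[b] + alpha * (rng.random(dim) - 0.5), lb, ub)
        x[b], intensity[b] = trial, f(trial)
    b = np.argmin(intensity)
    return x[b], intensity[b]

# Example: minimize the sphere function on [-5, 5]^2
best_x, best_f = firefly_algorithm(lambda v: np.sum(v ** 2),
                                   lb=np.array([-5.0, -5.0]),
                                   ub=np.array([5.0, 5.0]))
print(best_x, best_f)
```

Note that the brightest firefly moves purely at random (Eq. (7)), so its new position may be worse; several of the modifications reviewed in Sect. 3 address exactly this point.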

3 Modified versions of firefly algorithm

Modification of the standard firefly algorithm is done to increase the effectiveness of the algorithm. Basically, there are two types of modifications. The first is modification at the parameter level, where the algorithm parameters are made adaptive or given a certain structure. The second is modification at the strategy level, which includes modified updating strategies such as a modified updating formula, added mutation operators and the like.

3.1 Parameter level modification

In the standard firefly algorithm, the parameters in Eq. (6) are user-defined constants. Like any other metaheuristic algorithm, the firefly algorithm's performance depends highly on these parameter values, as they control the degree of exploration and exploitation.

3.1.1 Modifying \(\alpha \)

In the standard firefly algorithm, a firefly \(x_i\) moves towards better solutions, which helps the algorithm explore around better solutions, and also moves randomly. The effect of this random movement depends on the parameter \(\alpha \). If \(\alpha \) is chosen to be large, the solution \(x_i\) jumps away from its neighborhood and explores the solution space; if it is too large, the random term may dominate and even move the solution out of the feasible region. If \(\alpha \) is very small, the jump stays within the neighborhood and may become negligible compared to the movement towards brighter fireflies. Different modifications have been proposed to deal with this. We would also like to point out that a fixed value of \(\alpha \) does not mean the movement step length equals \(\alpha \); rather, it is bounded by \(\alpha \), since \(\alpha \) is multiplied by a random (non-unit) vector whose entries lie between −0.5 and 0.5. Some papers proposed a modified \(\alpha \) by assuming that the step length is a fixed value rather than an upper bound (Farahani et al. 2011a, b).

Some of the modifications make \(\alpha \) decrease with iteration. In Subramanian and Thanushkodi (2013), Liu et al. (2015), Coelho and Mariani (2012), a modification is proposed based on preassigned initial and final values for \(\alpha \), using \(\alpha =\alpha _{max}-\frac{t(\alpha _{max}-\alpha _{min})}{t_{max}}\), where t is the current iteration number, \(t_{max}\) is the maximum number of iterations, \(\alpha _{max}\) is the initial step length and \(\alpha _{min}\) is the final step length. This is similar to the linearly decreasing scheme, from (1, \(\alpha _{max}\)) to (\(t_{max}, \alpha _{min}\)), proposed in Yan et al. (2012), Goel and Panchal (2014), except that the linear case is slightly larger, with the difference decreasing with iteration. If \(\alpha _{max}=2.5\) and \(\alpha _{min}=0.2\) with \(t_{max}=100\), the first values for the linear and the first decreasing scheme are 2.5 and 2.477, respectively, and they decrease as shown in Fig. 1.

Fig. 1 Adaptive \(\alpha \) as in Subramanian and Thanushkodi (2013), Liu et al. (2015), Coelho and Mariani (2012) and in a linear decreasing way

Based on given initial and final step lengths, an exponentially decreasing step length is proposed in Shafaati and Mojallali (2012), with updating formula \(\alpha =\alpha _{min}+(\alpha _{max}-\alpha _{min})e^{-t}\). It decreases below the linear schedule. For \(\alpha _{max}=2.5\) and \(\alpha _{min}=0.2\), it starts from about 1.05 at the first iteration, \(t=1\), and reaches the minimum value within ten iterations. Hence, it may not be suitable where a smooth change is needed. Furthermore, \(\alpha \) does not equal the initial or maximum value even at the beginning, so when setting \(\alpha _{max}\) this needs to be considered and a value larger than the intended starting step length should be assigned; perhaps \(\alpha _{max}=e(\alpha _M-\alpha _{min})+\alpha _{min}\), where \(\alpha _M\) is the intended starting step length, can be used.

In addition to modifications of \(\alpha \) with given starting and final step lengths, there are modifications based on an initial step length only. In Wang et al. (2012), \(\alpha \) is made inversely proportional to the square of the iteration number, \(\alpha =\frac{\alpha _{max}}{t^2}\). It decreases quickly, and the random movement of the firefly almost vanishes within a small number of iterations; for example, if \(\alpha _{max}=2.5\), it is 0.625 in the second iteration and 0.0098 within 16 iterations.

In Amaya et al. (2014), \(\alpha ^{(t)}=\alpha ^{(t-1)}(1+K(\varepsilon -1))\), for a random number \(\varepsilon \) drawn uniformly between zero and one and a new parameter K. It should be highlighted that K must lie between zero and one, otherwise a negative step length may result. This updating strategy decreases \(\alpha \) more quickly than a linear function. Figure 2 shows the step length for \(K=0, 0.1, 0.2, \ldots , 0.9, 1\).

Fig. 2 Adaptive \(\alpha \) as given in Amaya et al. (2014) for 100 iterations

Another decreasing scheme for the step length \(\alpha \) is proposed in Shakarami and Sedaghati (2014), Olamaei et al. (2013), Kavousi-Fard et al. (2014), using \(\alpha ^{(t)}=\alpha ^{(t-1)}(\frac{1}{2t_{max}})^{\frac{1}{t_{max}}}\), where \(\alpha ^{(t)}\) and \(\alpha ^{(t-1)}\) are the step lengths in iterations t and \(t-1\), respectively. The step length starts from \(\alpha _{max}\), decreases more quickly than a linear function and approaches zero as the iteration count grows. Unlike the previous two modifications, only a single starting step length needs to be given rather than initial and final values. However, it has the disadvantage that the step length decreases quickly right from the beginning, which usually may not be desired. A similar modification is applied in Brajevic and Ignjatovic (2015), where \(\alpha ^{(t)}=\alpha ^{(t-1)}(\frac{10^{-4}}{9})^{\frac{1}{t_{max}}}\).

In Yang (2013), Manoharan and Shanmugalakshmi (2015), another decreasing approach is used, given by \(\alpha =\alpha _{max}0.9^t\). It decreases faster than a linear function. Continuing the example with \(\alpha _{max}=2.5\), it starts at 2.25 in the first iteration, so the maximum step length needs to be set a little higher than the intended starting step length. As in the previous case, providing an initial step length is enough; no final value for \(\alpha \) is needed. A generalized form of this modification, \(\alpha =\alpha _{max}\theta ^t\) for \(\theta \in (0,1]\), is given in Baghlani et al. (2013), where \(\theta \) is a new algorithm parameter controlling the step length. Figure 3 shows the behaviour of the adaptive \(\alpha \) for \(\theta = 0.1, 0.2, \ldots , 0.9\) with \(\alpha _{max}=2.5\). Note that if \(\theta =1\), then \(\alpha =\alpha _{max}\) for all iterations. From the figure it can be seen that as \(\theta \) decreases, \(\alpha \) decreases more quickly.

Fig. 3 Adaptive \(\alpha \) as in Baghlani et al. (2013) with different values of the new algorithm parameter \(\theta \)

Another similar modification with an additional parameter is proposed in Fu et al. (2015), where \(\alpha \) is updated using \(\alpha =\alpha _{max}-(\alpha _{max}-\alpha _{min})(\frac{t-1}{G_0-1})^\lambda \). Although the authors did not mention it, \(\lambda \) should be non-negative, otherwise a negative step length may result. If \(\lambda =0\), then \(\alpha =\alpha _{min}\) for any iteration, and if \(\lambda =1\), then \(\alpha \) decreases linearly. For \(\lambda <1\), \(\alpha \) decreases more quickly than a linear function, and for \(\lambda >1\) more slowly, as demonstrated in Fig. 4 for \(\alpha _{max}=2.5\), \(\alpha _{min}=0.2\) and \(G_0=95\) with \(t_{max}=100\). As can be seen from Fig. 4, if \(G_0\) is not properly chosen, the value of \(\alpha \) may drop below zero in the final iterations.

Fig. 4 Adaptive \(\alpha \) as in Fu et al. (2015) with different values of the new algorithm parameter \(\lambda \)

In Yu et al. (2015a), the random step length is given by \(\alpha =\frac{0.4}{1+e^{0.005(t-t_{max})}}\). It decreases almost linearly, from 0.2489 in the first iteration to 0.2 in the final, 100th, iteration. Even though it is a decreasing function, \(\alpha \) will be too small for problems with a wide feasible region unless accompanied by a scaling parameter. A similar modification, assigning the inverse of the golden ratio as the step length \(\alpha \), is presented in Dhal et al. (2015b).

In Othman et al. (2015), the algorithm parameters are modified based on the properties and characteristics of the problem. \(\alpha \) is made inversely proportional to the square of the iteration number, which results in a quick decrease as the iteration count increases.

In Wang et al. (2014), the authors mention that the step length decreases, but no decreasing equation is provided.

The decreasing scenarios of these modifications, with \(\alpha _{max}=2.5\) and \(\alpha _{min}=0.2\), are summarized in Fig. 5.

Fig. 5 Adaptive \(\alpha \) based on different modifications: \(\alpha _1\) is from Subramanian and Thanushkodi (2013), Liu et al. (2015), Coelho and Mariani (2012); \(\alpha _2\) from Shafaati and Mojallali (2012); \(\alpha _3\) from Wang et al. (2012); \(\alpha _4\) from Amaya et al. (2014) with \(K=0.5\); \(\alpha _5\) from Shakarami and Sedaghati (2014), Olamaei et al. (2013), Kavousi-Fard et al. (2014); \(\alpha _6\) from Brajevic and Ignjatovic (2015); \(\alpha _7\) from Yang (2013), Manoharan and Shanmugalakshmi (2015), Baghlani et al. (2013) with \(\theta =0.9\) in Baghlani et al. (2013); \(\alpha _8\) from Fu et al. (2015) with \(\lambda =2\); \(\alpha _9\) from Fu et al. (2015) with \(\lambda =0.5\); and \(\alpha _{10}\) from Yu et al. (2015a), for 100 iterations
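
For reproducibility, the following sketch implements several of the schedules plotted in Fig. 5 as plain functions of the iteration number t; the constants mirror the running example (\(\alpha _{max}=2.5\), \(\alpha _{min}=0.2\), \(t_{max}=100\)) and are illustrative only.

```python
import numpy as np

ALPHA_MAX, ALPHA_MIN, T_MAX = 2.5, 0.2, 100

def alpha_linear(t):            # Subramanian and Thanushkodi (2013) and others
    return ALPHA_MAX - t * (ALPHA_MAX - ALPHA_MIN) / T_MAX

def alpha_exponential(t):       # Shafaati and Mojallali (2012)
    return ALPHA_MIN + (ALPHA_MAX - ALPHA_MIN) * np.exp(-t)

def alpha_inverse_square(t):    # Wang et al. (2012)
    return ALPHA_MAX / t ** 2

def alpha_geometric(t, theta=0.9):   # Yang (2013); Baghlani et al. (2013)
    return ALPHA_MAX * theta ** t

def alpha_power(t, g0=95, lam=2.0):  # Fu et al. (2015); lam must be non-negative
    # with g0 < T_MAX this can dip below zero near the end, as noted for Fig. 4
    return ALPHA_MAX - (ALPHA_MAX - ALPHA_MIN) * ((t - 1) / (g0 - 1)) ** lam

def alpha_sigmoid(t):           # Yu et al. (2015a)
    return 0.4 / (1.0 + np.exp(0.005 * (t - T_MAX)))

for t in (1, 10, 50, 100):      # tabulate a few iterations for comparison
    print(t, [round(a(t), 4) for a in (alpha_linear, alpha_exponential,
                                       alpha_inverse_square, alpha_geometric,
                                       alpha_power, alpha_sigmoid)])
```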

A modification of \(\alpha \) based on the performance of the solutions is proposed in AL-Wagih (2015), Wang et al. (2014b), Yu et al. (2013). In AL-Wagih (2015), \(\alpha \) is updated using \(\alpha =\alpha _{max}-(\alpha _{max}-\alpha _{min})\frac{I_{max}-I_{mean}}{I_{max}-I_{min}}\), where \(I_{mean}\) is the average intensity of the fireflies in an iteration. This is similar to the updates in Subramanian and Thanushkodi (2013), Liu et al. (2015), Coelho and Mariani (2012), the only difference being that intensity is used rather than the iteration ratio. \(\frac{I_{max}-I_{mean}}{I_{max}-I_{min}}\) is always a non-negative number at most equal to one. However, it should be highlighted that this updating formula works only as long as the solutions have not all converged to a global or local solution, because in that case a singularity arises with \(I_{max}-I_{min}=0\). Excluding that singular case, the denominator is always greater than or equal to the numerator; hence, \(\alpha \) always lies between \(\alpha _{min}\) and \(\alpha _{max}\), but is not necessarily decreasing.
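
A minimal sketch of the intensity-based rule of AL-Wagih (2015) follows; the guard for the singular case \(I_{max}=I_{min}\), and its fallback value, are our own additions.

```python
import numpy as np

def alpha_intensity(intensities, alpha_min=0.2, alpha_max=2.5):
    """alpha from the spread of firefly intensities (AL-Wagih 2015)."""
    i_max, i_min, i_mean = intensities.max(), intensities.min(), intensities.mean()
    if np.isclose(i_max, i_min):        # all fireflies converged: singular case;
        return alpha_min                # falling back to alpha_min is our choice
    ratio = (i_max - i_mean) / (i_max - i_min)   # always in [0, 1]
    return alpha_max - (alpha_max - alpha_min) * ratio
```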

In Wang et al. (2014b), an update of \(\alpha \) based on the intensity of the population is also proposed. To compute the step length, first a value \(\xi \), based on the light intensity difference between \(x_i\) and a brighter firefly \(x_j\), is calculated using \(\xi =\frac{I_j-I_i}{\max (I)-\min (I)}\). Then \(\alpha _0=\xi \) if \(\xi >\eta \) and \(\alpha _0=\eta \) otherwise, for a new algorithm parameter \(\eta \); using this, \(\alpha \) is updated by \(\alpha =\alpha _0(0.02r_{max})\), where \(r_{max}=\max \{d(x_i,x_j), \forall i,j \}\). For a smaller value of \(\eta \), a bigger light intensity difference between two fireflies results in a bigger step length, which is an acceptable relation between the step length and the performance of the solutions. However, if \(\eta \) is set large, the step length remains the same for any change in the brightness.

In Yu et al. (2013), \(\alpha \) is made adaptive based on the solutions' performance in previous iterations, utilizing memory to store past performance. The updating formula is \(\alpha =1-\frac{1}{\sqrt{(f_{best}-f_{i})^2+h_i^2+1}}\) with \(h_i=\frac{1}{\sqrt{(f_{i-best}^{(t-1)}-f_{i-best}^{(t-2)})^2+1}}\), where \(f_{best}\) is the function value of the best solution found so far, \(f_i=f(x_i)\), \(f_{i-best}^{(t-1)}\) is the best function value attained by \(x_i\) up to the previous iteration, and \(f_{i-best}^{(t-2)}\) up to two iterations earlier. Adjusting the step length based on the performance of the algorithm is a good idea. However, \(\alpha \) is always between zero and one, so unless a scaling factor is added it may be too small for problems with a huge feasible region. As the solution \(x_i\) approaches the best solution, \(\alpha \) decreases. Hence, it is a promising modification based on the performance of the solution, rather than a simple decreasing function.
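
A sketch of this memory-based rule is given below; we read the garbled exponent in the printed formula as \(h_i^2+1\), which keeps \(\alpha \) in [0, 1) as the text states.

```python
import numpy as np

def alpha_memory(f_best, f_i, f_i_best_prev, f_i_best_prev2):
    """Per-firefly adaptive alpha of Yu et al. (2013); result lies in [0, 1)."""
    h_i = 1.0 / np.sqrt((f_i_best_prev - f_i_best_prev2) ** 2 + 1.0)
    return 1.0 - 1.0 / np.sqrt((f_best - f_i) ** 2 + h_i ** 2 + 1.0)
```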

The random step length is not always modified in a decreasing way. For instance, in Coelho and Mariani (2013) it is set using \(\alpha =0.3|G|\), where G is drawn from a normal distribution with mean 0 and variance 1. Figure 6 shows this step length as a function of iteration for 100 iterations. A scaling parameter should be incorporated to adjust the length to the size of the feasible region, as done for example in Maidl et al. (2013), where \(\alpha \) is multiplied by a scaling parameter given by \(\max \limits _k x_i(k) - \min \limits _k x_i(k)\). In a similar way, chaotic maps and other distributions have been used in Amiri et al. (2013), Coelho et al. (2011), Fister et al. (2014), producing a step length that is neither monotonically decreasing nor increasing. A chaotic map with Lévy flights is used in Dhal et al. (2015a), and in Yang (2010), Sahoo and Chandra (2013) a Lévy distribution is used to generate the random vector for the random movement. A scaling parameter for the random movement is also utilized in Farahani et al. (2011a, b), Liu et al. (2015), Brajevic and Ignjatovic (2015), Baghlani et al. (2013), Kanimozhi and Latha (2013).
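
As an illustration of these non-monotone variants, the sketch below combines the Gaussian rule \(\alpha =0.3|G|\) of Coelho and Mariani (2013) with a logistic chaotic map; the particular map, its initial value and the scale factor are illustrative assumptions rather than choices from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def alpha_gaussian():
    """alpha = 0.3|G|, G ~ N(0, 1) (Coelho and Mariani 2013)."""
    return 0.3 * abs(rng.normal())

def alpha_logistic(n_steps, c0=0.87, scale=1.0):
    """Chaotic step lengths from a logistic map; c0 and scale are illustrative."""
    c, out = c0, []
    for _ in range(n_steps):
        c = 4.0 * c * (1.0 - c)     # logistic map stays in (0, 1)
        out.append(scale * c)
    return out
```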

Fig. 6 Adaptive \(\alpha \) as given in Coelho and Mariani (2013) for 100 iterations

Another modification of \(\alpha \) is given in Selvarasu and Kalavathi (2015), where \(\alpha \) is encoded as a parameter in the solution along with other parameters, so that the algorithm adjusts the parameter values itself. It is an interesting idea; however, the complexity issue needs further study. In Sulaiman et al. (2012), the authors mention that a mutation operator is used to tune \(\alpha \), but no description of the operator is given.

3.1.2 Modifying attraction step length, \(\beta \)

The updating formula of the firefly algorithm has two updating terms, as given in Eq. (6). The second term, \(\beta _0e^{-\gamma r^2}(x_j-x_i)=\beta (x_j-x_i)\), represents the attraction of \(x_i\) towards \(x_j\), where \(\beta _0\) is based on the light intensity of firefly j at \(r=0\); the standard firefly algorithm suggests \(\beta _0=1\). The value of \(\beta \) gives the step length of \(x_i\) towards \(x_j\): if \(\beta = 1\), updating \(x_i\) towards \(x_j\) places \(x_i\) at \(x_j\)'s position. The attraction step length \(\beta \) depends on the initial attraction \(\beta _0\), based on the light intensity at the source, on the distance r between the fireflies, and on the light absorption coefficient \(\gamma \). If \(\beta =0\), then \(x_i\) is not attracted and does not move towards \(x_j\). If \(0<\beta <1\), then \(x_i\) moves towards \(x_j\), to a new position on the line joining the two fireflies. If \(\beta >1\), \(x_i\) moves beyond \(x_j\) in the direction from \(x_i\) to \(x_j\). Hence, if \(\beta \) is assigned a value in the neighborhood of 1, \(x_i\) moves into the neighborhood of \(x_j\), intensifying the exploitation of the algorithm around \(x_j\). For \(\beta _0=1\), as suggested by the standard firefly algorithm, the scenarios of \(\beta =e^{-\gamma r^2}\) for \(0 \le \gamma \le 3\) and \(0 \le r \le 3\) are presented in Fig. 7.

Fig. 7 \(\beta =e^{-\gamma r^2}\) as a function of r for \(0\le r \le 3\) and \(0 \le \gamma \le 3\)

In recent studies, modifying the attraction step length is mainly done by modifying \(\gamma \) and \(\beta _0\). In Selvarasu et al. (2013), based on user-assigned \(\beta _{max}\) and \(\beta _{min}\), \(\beta \) is represented as \(\beta =\beta _{min}+(\beta _{max}-\beta _{min})e^{-\gamma r^2}\). In the standard firefly algorithm, if \(\gamma =0\) the attraction step length equals \(\beta _0\), so any firefly is attracted equally to any brighter firefly irrespective of the distance between them, and when r is very large with \(\gamma \ne 0\) the step length vanishes. The updating is therefore equivalent to the standard mechanism if \(\beta _{min}=0\) and \(\beta _{max}=\beta _0\); however, it ensures that the attraction step length stays in the range \([\beta _{min},\beta _{max}]\). A similar modification is presented in Meena and Chitra (2015), with \(\beta _{min}\) computed from the intensity difference of the two fireflies. The same updating equation for \(\beta \) is also used in Selvarasu and Kalavathi (2015), with \(\beta _{max}\) limited between 0 and 1; furthermore, \(\beta _{min}\), along with \(\alpha \) and \(\gamma \), is encoded in the solution \(x_i\), so the dimension of \(x_i\) increases by three.
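
A minimal sketch of the bounded attractiveness of Selvarasu et al. (2013) follows; the default parameter values are illustrative.

```python
import numpy as np

def beta_bounded(r, gamma=1.0, beta_min=0.2, beta_max=1.0):
    """Attraction step length kept inside [beta_min, beta_max] (Selvarasu et al. 2013).

    With beta_min = 0 and beta_max = beta_0 this reduces to the standard
    attractiveness beta_0 * exp(-gamma * r**2)."""
    return beta_min + (beta_max - beta_min) * np.exp(-gamma * r ** 2)
```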

Other modifications of the attraction use chaotic maps, as presented in Gandomi et al. (2013), where twelve different chaotic maps are used to make \(\gamma \) and \(\beta \) adaptive. Similar work is done in Jansi and Subashini (2015) using the Chebyshev map, and in AL-Wagih (2015), Abdel-Raouf et al. (2014) using the sinusoidal map. The attraction term is supposed to be influenced by the light intensity as well as by the distance between the fireflies along with the light absorption coefficient. Furthermore, using a chaotic map for \(\gamma \) already results in chaotic updating behaviour of \(\beta \). For instance, Fig. 8 shows a logistic map with a starting value of \(\gamma =0.87\) and the corresponding values of \(\beta \) with \(\beta _0=1\) and \(r=1\); as can be seen, when \(\gamma \) behaves chaotically, so does \(\beta \). Hence, there is no need to update both \(\gamma \) and \(\beta \) at the same time. Rather, making \(\beta _0\) depend on the light intensity at \(r=0\) while varying \(\gamma \) seems a better idea than updating \(\beta \) directly, so that the influence of the light intensity is not lost. In Coelho et al. (2011) a logistic map is used to update \(\gamma \), and in Long et al. (2015) \(\beta _0\) is updated using the Gauss map.
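
The coupling shown in Fig. 8 is easy to reproduce; the sketch below drives \(\gamma \) with the logistic map \(\gamma _{t+1}=4\gamma _t(1-\gamma _t)\) (our assumed form of the map) from the starting value 0.87 and prints the induced \(\beta \).

```python
import numpy as np

gamma, r, beta0 = 0.87, 1.0, 1.0               # starting value as in Fig. 8
for t in range(1, 11):
    gamma = 4.0 * gamma * (1.0 - gamma)        # logistic map drives gamma
    beta = beta0 * np.exp(-gamma * r ** 2)     # beta inherits the chaotic behaviour
    print(t, round(gamma, 4), round(beta, 4))
```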

Fig. 8 Chaotic (logistic) map for \(\gamma \) and its effect on \(\beta \) with \(r=1\) and \(\beta _0=1\)

In Amaya et al. (2014), \(\gamma \) is modified to decrease with iteration using \(\gamma :=\gamma (1+K(\varepsilon -1))\), for a random number \(\varepsilon \) drawn uniformly between 0 and 1 and a new parameter K. It should be highlighted that K must be between zero and one, otherwise a negative value for \(\gamma \) may result. \(\gamma \) is modified in Coelho and Mariani (2013) using \(\gamma =0.03|G|\), where G is generated from the normal distribution N(0, 1) with \(\beta _0=1\); this limits the value of \(\gamma \) to a certain region, even though theoretically \(\gamma \) can take any positive value. A similar modification with an additional iteration term is given in Coelho and Mariani (2012), using \(\gamma =|G|x'\frac{t}{t_{max}}\), where G is generated from the normal distribution N(0, 0.3) and \(x'\) is the value of x linearly scaled to lie between 0 and 1. The update gives a direct relation between \(\gamma \) and the iteration number t, which results in a decreasing attraction step length. Further modifications of \(\gamma \) as a function of the iteration number are given in Liu et al. (2015), Fu et al. (2015). The update formula in Liu et al. (2015) is \(\gamma =\gamma _{min}+(\gamma _{max}-\gamma _{min})\frac{t}{t_{max}}\), a linearly increasing function starting from \(\gamma _{min}+\frac{\gamma _{max}-\gamma _{min}}{t_{max}}\) at the first iteration, \(t=1\), and reaching \(\gamma _{max}\) in the final iteration; this makes the attraction decrease over iterations for a comparable r. In Fu et al. (2015), on the other hand, a decreasing updating formula is proposed, \(\gamma =\gamma _{max}-(\gamma _{max}-\gamma _{min})(\frac{t}{t_{max}})^2\), with final value \(\gamma _{min}\). Figure 9 shows the two approaches with \(\gamma _{min}=0.5\) and \(\gamma _{max}=3\). Another modification based on \(\gamma \) is given in Lukasik and Zak (2009), where \(\gamma \) is set based on the maximum distance between fireflies using \(\gamma =\frac{\gamma _0}{r_{max}^\omega }\) for \(\omega =1,2\) and \(\gamma _0\) between 0 and 1. Hence, for \(\omega =2\) the squared-distance expression in \(\beta \) is normalized (as the iteration count increases, both \(r_{max}\) and r decrease, so the normalized distance \(r'=r/r_{max}\) stays more or less in a neighborhood of a constant smaller than 1) and \(\beta \) becomes largely independent of the distance, making every firefly visible to every other firefly. If \(\omega =1\), the effect of the distance is not squared but is still present. The adaptive scenarios for \(\gamma \) are summarized in Fig. 10. As can be seen from Fig. 10, \(\gamma _5\) is neither decreasing nor increasing; there are two decreasing scenarios, \(\gamma _2\) and \(\gamma _4\), where \(\gamma _2\) decreases more slowly than a linear function and also more slowly than \(\gamma _4\). The other four approaches give an increasing \(\gamma \): \(\gamma _3\) produces very low values with a slow increase compared to the other increasing scenarios; \(\gamma _7\) grows more slowly than \(\gamma _6\) until around \(t=60\), then starts to grow quickly and overtakes even \(\gamma _1\) around \(t=80\); and \(\gamma _1\) grows more quickly than \(\gamma _6\).

Fig. 9 \(\gamma _1(t)\) is from Liu et al. (2015) and \(\gamma _2(t)\) is from Fu et al. (2015)
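
The two iteration-based schedules of Fig. 9 in code, with the illustrative values \(\gamma _{min}=0.5\), \(\gamma _{max}=3\) and \(t_{max}=100\):

```python
GAMMA_MIN, GAMMA_MAX, T_MAX = 0.5, 3.0, 100

def gamma_liu(t):
    """Liu et al. (2015): linear increase towards GAMMA_MAX."""
    return GAMMA_MIN + (GAMMA_MAX - GAMMA_MIN) * t / T_MAX

def gamma_fu(t):
    """Fu et al. (2015): quadratic decrease from GAMMA_MAX to GAMMA_MIN."""
    return GAMMA_MAX - (GAMMA_MAX - GAMMA_MIN) * (t / T_MAX) ** 2
```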

Fig. 10 Different scenarios of adaptive \(\gamma \): \(\gamma _1\) is from Liu et al. (2015); \(\gamma _2\) from Fu et al. (2015); \(\gamma _3\) from Coelho and Mariani (2012) with x generated randomly between zero and one; \(\gamma _4\) from Amaya et al. (2014) with \(K=0.5\); \(\gamma _5\) from Coelho and Mariani (2013); \(\gamma _6\) from Lukasik and Zak (2009) with \(\omega = 1\), \(\gamma _0=0.75\) and \(r_{max}\) made to decrease exponentially from 10 to 0.2 over the 100 iterations using \(r_{max}=10.4029e^{-0.0395t}\); and \(\gamma _7\) the same as \(\gamma _6\) except \(\omega =2\). Furthermore, \(\gamma _{max}=5\) and \(\gamma _{min}=0.2\) are used whenever needed

In Amaya et al. (2014), Wang et al. (2014b), Tilahun and Ong (2012c), modifications are based on the light intensity difference between the fireflies. Suppose firefly j is brighter than firefly i; then, for firefly i to move towards firefly j, \(\xi \) is calculated using \(\xi =\frac{I_j-I_i}{I_{max}-I_{min}}\), which always puts \(\xi \) between zero and one. Based on \(\xi \), \(\beta _0\) is set as \(\beta _0=\xi \) if \(\xi >\eta \) and \(\beta _0=\eta \) otherwise, where \(\eta \) is a new algorithm parameter (Wang et al. 2014b). A bigger value of \(\eta \) makes \(\beta _0\) effectively constant rather than dependent on the difference in light intensity; hence, a smaller value of \(\eta \) should be used so that the effect of the brightness is reflected in the attraction step length. In Maidl et al. (2013), a scaling parameter a is added, so that \(\beta =a\beta _0e^{-\gamma r^2}\), where \(a=\frac{f(x_j)-f(x_b)}{max\{f(x)\}-min\{f(x)\}}\) for the current global best solution \(x_b\). This depends solely on \(x_j\), not \(x_i\); i.e., the performance of \(x_i\) has no effect. The scaling value is always between 0 and 1. In Tilahun and Ong (2012c), \(\beta _0\) is expressed in terms of the light intensities of fireflies i and j, using \(\beta _0=e^{I_{0,j}-I_{0,i}}\), where \(I_{0,j}\) and \(I_{0,i}\) are the light intensities of fireflies j and i at \(r=0\), respectively. Here \(\beta _0>1\), because firefly j is brighter than firefly i. Hence, for problems with large function values compared to the size of the feasible region, the attraction step length will be too big, and an appropriate scaling parameter needs to be used.
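
A sketch of the intensity-gap rule of Wang et al. (2014b) for \(\beta _0\); the value of \(\eta \) is an illustrative choice, not a value taken from the source.

```python
import numpy as np

def beta0_intensity(i_i, i_j, intensities, eta=0.1):
    """beta_0 from the intensity gap between fireflies i and j (Wang et al. 2014b).

    A small eta keeps beta_0 tied to the brightness difference; a large eta
    makes beta_0 effectively constant."""
    xi = (i_j - i_i) / (intensities.max() - intensities.min())  # in [0, 1]
    return xi if xi > eta else eta
```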

Based on the problem properties, \(\gamma \) was made adaptive with \(\beta _0=1\) in Othman et al. (2015). Another modification is done in Cheung et al. (2014), given by \(\beta _0=\frac{d(x_i,x_b)-\min \limits _j \{d(x_i,x_j)\}}{\max \limits _j \{d(x_i,x_j)\}-\min \limits _j \{d(x_i,x_j)\}}\); hence, \(\beta _0\) is between 0 and 1. Furthermore, \(\gamma =\frac{1}{1+\sigma e^{-\rho \beta _0}}\), for new algorithm parameters \(\sigma \) and \(\rho \). Parameter assignment is not an easy task for a user, so expressing \(\gamma \) through two other parameters does not seem reasonable. Furthermore, \(\beta _0\) is expressed not in terms of light intensity but based on the firefly's location relative to other fireflies; hence, two fireflies with different intensities at \(r=0\) may have the same \(\beta _0\) value, which goes against the original idea of the standard firefly algorithm.

In Lin et al. (2013), the attraction step length is modified to \(\beta =\beta _0\gamma (1-r')\), where \(r'=\frac{r}{r_{max}}\) and \(r_{max}=\max \limits _{i,j}\{d(x_i,x_j)\}\); the effect of the distance thus decreases linearly.

In addition to \(\gamma \) and \(\beta _0\), the way the distance is measured affects the attraction step length, \(\beta \). The Cartesian distance is used in most applications, with a number of exceptions; in fact, any function satisfying the properties of a distance function can be used to compute the distance between two fireflies. The distance has been normalized in Lin et al. (2013) using \(\frac{r-r_{min}}{r_{max}-r_{min}}\) with \(r_{min}=0\) and \(r_{max}=\sqrt{\sum \limits _{d=1}^n(\max \limits _i\{x_i(d)\} - \min \limits _i\{x_i(d)\})^2}\), so that the new distance lies between zero and one. In Yan et al. (2012), the parameter m in the general form of Eq. (5) is replaced by a function of the dimension of the problem and the size of the feasible region, \(r^m=r^{\eta \sqrt{nR}}\), where n is the dimension of the problem and R is the maximum range, \(R=\max \limits _i\{x_{max}(i) - x_{min}(i)\}\) over all dimensions i. A minimum variation is used in place of the Cartesian distance in Sulaiman et al. (2012). In Othman et al. (2015), the distance is defined based on the Cartesian distance formula and the properties of the problem. Another modification of the distance computation is done in Subramanian and Thanushkodi (2013), where, instead of computing the distance in the feasible region, it is computed in the outcome or objective space using \(f(x_j)-f(x_i)\). It should be noted that the solutions explore the feasible region, and two solutions with a big difference in function value may in fact be close in the feasible region, and vice versa; hence, in general, it can be misleading to use the distance in the objective space.
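
A sketch of the normalized distance of Lin et al. (2013), computed against the current population (since \(r_{min}=0\), the normalization reduces to division by \(r_{max}\)):

```python
import numpy as np

def normalized_distance(xi, xj, population):
    """Distance scaled into [0, 1] by the population's diagonal (Lin et al. 2013)."""
    r = np.linalg.norm(xi - xj)                       # Cartesian distance
    spans = population.max(axis=0) - population.min(axis=0)
    r_max = np.sqrt(np.sum(spans ** 2))               # largest possible distance
    return r / r_max if r_max > 0 else 0.0
```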

3.2 Strategy level modifications

In this category, four kinds of modifications are discussed.

3.2.1 Modifying the movement of best or worst solution

The performance of the firefly algorithm highly depends on the updating strategy used for the brightest firefly, \(x_b\). If it is allowed to move randomly with a big step length, it may move to a non-promising area and its performance may decrease. The already-achieved good solution, in a promising region, may then get lost, since the standard firefly algorithm does not use memory to track previous positions and performance. Hence, in Tilahun and Ong (2012c), the random movement of the brightest firefly is modified to choose the best direction, a direction in which the solution's performance improves, from \(m_r\) randomly generated directions, where \(m_r\) is a new algorithm parameter. If none of these directions is improving, the firefly stays in its current position; in that way, the best solution found is not lost. Another similar modification is done in Verma et al. (2016), where the brightest solution checks each component of each of the other solutions for improvement. Hence, the update is done using

$$\begin{aligned} \begin{aligned} y=x_b&\\ \text {for }i=&1:n \text {(for all dimension)}\\&\text {for }j=1:N \text {(for all the solutions)}\\&\quad \quad y(i)=x_j(i)\\&\quad \quad \text {if }[f(y)\text { is better than }f(x_b)] \, x_b=y\text { end if}\\&\text {end for}\\ \text {end for}&\end{aligned} \end{aligned}$$
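
Read literally, the trial vector y above would retain unsuccessful components from earlier fireflies; the sketch below restores the component after an unsuccessful trial, which we take to be the intended behaviour (minimization assumed):

```python
import numpy as np

def refine_best(f, x_b, population):
    """Dimension-wise refinement of the brightest firefly (after Verma et al. 2016).

    For every dimension, try each other firefly's component; keep it only if
    the objective improves. Restoring on failure is our reading of the
    pseudocode."""
    y = x_b.copy()
    f_best = f(x_b)
    for i in range(len(x_b)):              # for all dimensions
        for x_j in population:             # for all solutions
            y[i] = x_j[i]                  # borrow component i from firefly j
            f_trial = f(y)
            if f_trial < f_best:           # improvement: accept the trial
                x_b, f_best = y.copy(), f_trial
            else:                          # otherwise restore the component
                y[i] = x_b[i]
    return x_b, f_best
```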

Both approaches ensure that the brightest solution is replaced only by a better solution, or remains unchanged if no better solution is found. In the first case, a new solution is searched for in the neighborhood, whereas in the second approach the brightest solution copies components from existing solutions, so the result is not necessarily in the neighborhood. In Fu et al. (2015), a Gauss-distribution-based move is applied to the brightest firefly \(x_b\), given by \(x_b:= x_b+x_bN(\mu , \sigma )\). This is applied if the variance of the solutions over a predetermined number M of preceding iterations is less than a given precision parameter \(\eta \). Modifying the movement based on the performance of the algorithm is perhaps a good way to escape local solutions. However, as in the standard firefly algorithm, the best solution may still get lost in the process, unless the methods discussed in Tilahun and Ong (2012c), Verma et al. (2016) for the movement of the brightest firefly are used, or a memory is utilized to keep the global best solution.

The updating of the brightest firefly, along with other top fireflies, is modified in Kazemzadeh-Parsi (2014, 2015), Kazemzadeh-Parsi et al. (2015). Two approaches are proposed. In the first, the top \(m_{top}\) solutions of an iteration are kept in memory and replace weak solutions in the next iteration; this implies that the selected top fireflies repeat the same search paradigm as in the previous iteration, possibly yielding another \(m_{top}\) solutions resembling those in the current population. In the other approach, no update is done on the top solutions. In both cases, the ratio of selected top solutions matters for performance; in general, if the number of solutions is much larger than the number of top fireflies, \(N>>m_{top}\), this can be considered an improvement based on the concept of elitism. If \(N>>m_{top}\) does not hold, however, then rather than using solutions to explore and exploit, we are merely keeping them and wasting the computational power of the algorithm.

In the standard firefly algorithm, the worst solution is attracted to all other fireflies, as it is the dimmest and has the least brightness. However, in Yu et al. (2015b) a new updating mechanism based on opposite numbers is introduced. The new updating formula for the worst solution, \(x_{worst}\), is \(x_{worst}:=x_b\) if \(\varepsilon < p\) and \(x_{worst}:=x_{max}+x_{min}-x_{worst}\) otherwise, for a new parameter p. It should be noted that a larger value of p means the worst solution is likely to be placed at the best solution's position, which decreases the search capacity of the algorithm from N fireflies to \(N-1\), as that location has already been found and is being exploited by \(x_b\). Assigning a smaller value to p instead replaces \(x_{worst}\) by its opposite number; hence, a smaller value of p gives better exploration.
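
A minimal sketch of this opposition-based update; the value of p is an illustrative small choice, and \(x_{min}\), \(x_{max}\) are the bounds of the feasible box.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_worst(x_worst, x_best, x_min, x_max, p=0.1):
    """Opposition-based update of the worst firefly (Yu et al. 2015b).

    With probability p the worst firefly jumps onto the best one; otherwise it
    is replaced by its opposite number in the box [x_min, x_max]."""
    if rng.random() < p:
        return x_best.copy()
    return x_min + x_max - x_worst          # opposite point
```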

3.2.2 Adding mutation operators

In order to escape from local solutions and to extend the search beyond the vicinity of the existing solutions, different modifications have been proposed. These are done by incorporating a mutation operator or a crossover operator, by simply relocating a solution, or by generating a new random feasible solution.

In Kazemzadeh-Parsi (2014, 2015), Kazemzadeh-Parsi et al. (2015), random generation of new solutions is used to replace weak solutions: \(\nu \) random feasible solutions are generated and added, and the \(\nu \) weakest solutions are deleted. In Shakarami and Sedaghati (2014), Olamaei et al. (2013), for each firefly \(x_i\), three fireflies \(x_1, x_2\) and \(x_3\), different from \(x_i\), are selected randomly from the existing fireflies to produce a test firefly \(x_T=x_1+\varepsilon (x_2-x_3)\). Then, based on \(x_T\), two new solutions are generated by \(x_{m_1}(k):=x_T(k)\) if \(\varepsilon _1 < \varepsilon _2\) and \(x_{m_1}(k):=x_b(k)\) otherwise, and \(x_{m_2}=\varepsilon x_b + \varepsilon (x_b-x_i)\), where \(\varepsilon , \varepsilon _1\) and \(\varepsilon _2\) are random numbers between 0 and 1. The best-performing solution among \(x_i, x_{m_1}\) and \(x_{m_2}\) is selected for inclusion in the solution set. A similar modification is proposed in Mohammadi et al. (2013), Kavousi-Fard et al. (2014), where for each \(x_i\) seven new solutions are generated based on three randomly selected solutions, the best solution and the worst solution. Another similar mutation operator, for using the firefly algorithm on multiobjective optimization problems, is proposed in Amiri et al. (2013). In all of these cases, for each solution \(x_i\), new solutions are generated using different but broadly similar operators, and the best solution found replaces \(x_i\). It should be noted that the main motivation of metaheuristic algorithms is that exhaustive search is not always possible; instead, systematic exploration along with exploitation techniques is used. A mutation operator can enhance the standard firefly algorithm by diversifying the solutions and increasing its exploration property. However, generating many solutions within a mutation operator resembles exhaustive search and may increase the complexity of the algorithm significantly, especially for high-dimensional problems. Hence, it needs to be implemented wisely.
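
A sketch of the operator of Shakarami and Sedaghati (2014), Olamaei et al. (2013) follows; treating \(\varepsilon _1\) and \(\varepsilon _2\) as per-component draws, and not excluding \(x_i\) from the random selection of the three donors, are simplifying readings on our part.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate(f, x_i, x_b, population):
    """Mutation around x_i (after Shakarami and Sedaghati 2014; Olamaei et al. 2013).

    population is a 2D array of fireflies; f is the objective (minimization)."""
    dim = len(x_i)
    x1, x2, x3 = population[rng.choice(len(population), size=3, replace=False)]
    eps = rng.random()
    eps1, eps2 = rng.random(dim), rng.random(dim)    # per-component draws
    x_t = x1 + eps * (x2 - x3)                       # test firefly
    x_m1 = np.where(eps1 < eps2, x_t, x_b)           # mix of x_T and the best
    x_m2 = eps * x_b + eps * (x_b - x_i)             # pull towards/past the best
    return min((x_i, x_m1, x_m2), key=f)             # keep the best performer
```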

In Wang et al. (2012), a portion of the fireflies with better brightness are selected into a top-firefly list. For each firefly in the top group, another firefly from the list is chosen at random, and a new solution on the line joining them, near the brighter one, is computed to replace one of the two if its performance is better. This focuses on exploitation, and if the two solutions are in the neighborhood of the same local solution, neither will escape the local solution through this operator.

Another modification in this category is proposed in Bidar and Kanan (2013). The performance of each solution is recorded, and if a hazard condition, a measure reflecting a solution's performance over previous iterations, falls below a given threshold, i.e. if the solution has not been performing well in previous iterations, then it is relocated to a new position. Saving the performance of each solution may significantly affect memory usage and algorithm complexity, especially for high-dimensional problems with a wide feasible region, where the number of random solutions is expected to be large.

3.2.3 New updating formula

In some of the research done to improve the performance of the standard firefly algorithm, modifications are made to the updating formula given in Eq. (6).

In Lin et al. (2013), a new updating formula \(x_i:=x_i+\beta (x_j-x_i)(\alpha \varepsilon )\) is given, with a modified \(\beta =\beta _0\gamma (\frac{r_{ij}}{r_{max}})\), where \(r_{max}\) is the maximum distance between fireflies. Here, the random-movement expression is multiplied with the attraction term with the aim of increasing intensification. However, this simply eliminates the random movement, giving a new \(\beta = \beta \alpha \varepsilon \); the random term merely scales the attraction step length, so a firefly follows brighter fireflies and never moves randomly. That is a major drawback from the exploration point of view. A similar modification is proposed in Hassanzadeh and Kanan (2014), using \(x_i:=x_i+[\beta _0e^{-\gamma r_{ij}^2}(x_j-x_i)+\sum _{h=1}^kA(h)\beta _0e^{-\gamma r_{ih}^2}(x_h-x_i)]\alpha (\epsilon ()-\frac{1}{2})\), where \(A(h)=\frac{f(x_b)}{l(f(x_h)-f(x_b))}\) for a new algorithm parameter l. The authors started from the incorrect claim that "in the standard firefly algorithm, only one firefly in each iteration can affect others and attract its neighbors", and hence make a firefly follow the k best fireflies. In addition, multiplying the attraction term by the random movement only influences the attraction step length and omits the random movement, as discussed above.

Another modification, given in Amaya et al. (2014), Azad (2011), updates a firefly \(x_i\) attracted to \(x_j\) based on the vicinity of the brighter firefly. For example, in Azad (2011) the update formula is \(x_i:=x_j+\beta (x_j-x_i) + \alpha (\varepsilon ()-0.5)\). This is equivalent to \(x_i:=x_i+(\beta +1)(x_j-x_i)+\alpha (\varepsilon ()-0.5)\), so the second term moves \(x_i\) towards \(x_j\) with a step length larger than the distance between them. That is simply a neighborhood search around \(x_j\) in a specific direction, and solutions between \(x_i\) and \(x_j\) have no chance of being explored. In Amaya et al. (2014), the attraction term is omitted and the updating formula is \(x_i:=x_j+\alpha (\varepsilon ()-0.5)\), which is equivalent to \(x_i:=x_i+(x_j-x_i)+\alpha (\varepsilon ()-0.5)\); i.e., the attraction step length \(\beta \) is one, so the firefly moves directly to \(x_j\) and is then updated by the random term. Like the first case, this performs a neighborhood search around \(x_j\). Comparing the two modifications, the first is better, as the neighborhood around \(x_j\) is searched with a larger step length, there being an additional movement beyond the random one.

In Tian et al. (2012), the updating formula is modified by multiplying the first term by a decreasing weight, giving \(x_i:=wx_i+\beta (x_j-x_i)+\alpha (\varepsilon - 0.5)\), where \(w=w_{max}-(w_{max}-w_{min})\frac{t}{t_{max}}\) and \(\beta _0\) is fixed to 1.

In Kazemzadeh-Parsi (2014, 2015), Kazemzadeh-Parsi et al. (2015), the authors propose that instead of a firefly moving step by step towards brighter fireflies, it can move towards a representative point, P, reflecting the overall distribution of the brighter fireflies, namely the average location of all fireflies brighter than \(x_i\). The updating formula becomes \(x_i:=x_i+\beta (P-x_i) +\alpha (\varepsilon ()-0.5)\). In the standard firefly algorithm, if \(x_j\) attracts \(x_i\), then the brightness of \(x_j\) determines the attraction step length; in this modification, however, the authors do not mention whether the brightness is also averaged, whether the brightest firefly's brightness is used, or what other approach is taken. In addition, taking the average direction means the attraction is the same irrespective of how strong the individual attractions are; hence, if many weak solutions that are merely better than \(x_i\) exist in a certain neighborhood far from the global solution while a far better solution lies relatively near \(x_i\), then \(x_i\) will be misled to follow them equally. Furthermore, the computational complexity remains the same whether the solution moves step by step or a representative direction is computed, since computing the representative direction is itself done step by step.
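
A sketch of the representative-point move; since the papers leave the brightness handling open, the fixed step lengths beta and alpha here are illustrative constants of our own choosing.

```python
import numpy as np

rng = np.random.default_rng(0)

def representative_move(x_i, population, values, f_i, beta=0.5, alpha=0.2):
    """Move x_i towards the mean of all brighter fireflies (Kazemzadeh-Parsi 2014).

    values are the objective values of the population (smaller = brighter)."""
    brighter = population[values < f_i]
    if len(brighter) == 0:                    # x_i is the brightest: no such move
        return x_i
    p = brighter.mean(axis=0)                 # representative point P
    return x_i + beta * (p - x_i) + alpha * (rng.random(len(x_i)) - 0.5)
```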

In Shafaati and Mojallali (2012), Hongwei et al. (2015), solutions are updated with an additional attraction towards the global best; i.e., in addition to being attracted to brighter solutions and moving randomly, a further term attracts a solution to the global best, so a memory needs to be used to save and update the global best solution. In a similar manner, in Goel and Panchal (2014), the update includes two additional terms, the first towards the global best and the second towards the best solution of the current iteration. However, we would like to highlight that moving towards the brightest solution is already included in the second term of the updating equation of the standard firefly algorithm, and adding another attraction increases the step length, which may not always give the desired result.

A modification of the random-movement term, designed to control the diversity of the solutions, is given in Yu et al. (2015). Solutions are updated in the usual way as long as the diversity is acceptable, where diversity is measured by \(\frac{1}{NL}\sum \limits _{i=1}^N||x_i-\bar{x}||\), with \(\bar{x}\) the average location of all fireflies, N the total number of solutions and L the largest diagonal of the feasible region. If the diversity falls below a given threshold, then a new updating formula based on \(\bar{x}\) is used, given by \(x_i:=x_i+\beta (x_j-x_i)+\alpha \varepsilon (x_i-\bar{x})\). The intention of this mechanism is to diversify the solution set; the modified random term indeed pushes the firefly away from the average location of all fireflies, and hence diversity may be achieved. However, it would be better if this term were applied only to some of the solutions, chosen by a criterion based on their performance; otherwise, in some cases, it may hinder the search behavior of the algorithm.
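
The diversity measure in code (the update switch that uses it is omitted):

```python
import numpy as np

def diversity(population, L):
    """Diversity of Yu et al. (2015): mean distance to the centroid, scaled by
    the largest diagonal L of the feasible region."""
    x_bar = population.mean(axis=0)
    return np.linalg.norm(population - x_bar, axis=1).sum() / (len(population) * L)
```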

Another updating formula is proposed in Cheung et al. (2014):

$$\begin{aligned} x_i:={\left\{ \begin{array}{ll} x_i+\beta _0e^{-\gamma r^2}(x_j-x_i)+\alpha (\varepsilon ()-\frac{1}{2})(x_{max}-x_{min}), &{} \varepsilon >\frac{1}{2} \\ \frac{t_{max}-t}{t_{max}}(1-\eta )x_i+\eta x_b, &{} otherwise \end{array}\right. } \end{aligned}$$

where \(\eta \) is calculated based on grey relational analysis. This means that half of the time the usual updating formula is used, and the rest of the time the new formula, which is equivalent to \(x_i:=x_i+\eta (x_b-x_i)+\frac{t}{t_{max}}(\eta -1)x_i\). Hence, the attraction is still present, with a different step length, and the random movement is modified in a certain direction. One weakness of this modification is the introduction of a number of new parameters.

In Farahani et al. (2011a, b), Kanimozhi and Latha (2013), after the update using the standard firefly updating formula, an additional updating equation, called social behaviour by the authors, is applied. At the end of the iteration, each solution is updated using \(x_i:=x_i+\alpha (\varepsilon ()-0.5) (1-P)\), where P is a random number from a Gaussian distribution. This means \(x_i\) is updated using \(x_i:=x_i+\beta (x_j-x_i) + \alpha (\varepsilon ()-0.5) + \alpha (\varepsilon ()-0.5)(1-P)\), which, when the same random vector is used, is equivalent to \(x_i:=x_i+\beta (x_j-x_i) + \alpha (\varepsilon ()-0.5)(2-P)\). Hence, the added mechanism is equivalent to rescaling the random movement using a Gaussian distribution.

In Arora and Singh (2014a, b), another modification is proposed in which the brighter firefly donates some of its features, governed by a new algorithm parameter called the probability of mutation, \(p_m\). It is not specified which features, or how many, are copied from the brighter firefly; from the context, it appears that some components of the vector \(x_i\) are replaced by the corresponding components of the brighter firefly \(x_j\). In Arora and Singh (2014b), this feature-copying operator replaces the updating formula of the standard firefly algorithm, whereas in Arora and Singh (2014a) it is applied after the update has taken place using the usual scheme. If the usual updating scheme is replaced, the random movement of the standard firefly algorithm is omitted, which can significantly affect the exploration behavior of the algorithm; perhaps it would be better to add some randomized component to the new updating approach. The new approach works similarly to the attraction term of the standard updating formula, in that the distance between the two fireflies decreases as \(x_i\) approaches, or is moved to, \(x_j\) by copying some of its components from \(x_j\).

3.2.4 Modification on the structure

In a modification presented in Fister et al. (2013a), each component of a solution, \(x_i(k)\), is represented by a quaternion, \(x_i(k)=q_i^{(k)}=(q_i^{(k)}(1),q_i^{(k)}(2),q_i^{(k)}(3),q_i^{(k)}(4))\), and the updating is done over the quaternion space. To compute the brightness and measure the fitness of the solutions, the representation is mapped back to Euclidean space using a norm, \(x_i(k)=||q_i^{(k)}|| \). Clearly, the dimension of the problem becomes fourfold; however, since the approach zooms in on each component and tries to optimize it, a better result can be achieved. It would be interesting to explore the search behavior mathematically.

Opposition-based learning is used in Verma et al. (2016) to generate the initial solutions. As in the standard firefly algorithm, N random feasible solutions are generated, and the opposite number of each firefly \(x_i\) is computed as \(x_{min}+x_{max}-x_i\), giving 2N solutions in total. The best N of these are taken as the initial solutions for the firefly algorithm.
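
A minimal sketch of this opposition-based initialization (minimization assumed):

```python
import numpy as np

def opposition_init(f, n, dim, x_min, x_max, seed=0):
    """Opposition-based initialization (Verma et al. 2016): generate N random
    fireflies plus their opposites, keep the best N of the 2N candidates."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_min, x_max, size=(n, dim))
    candidates = np.vstack([x, x_min + x_max - x])      # 2N candidates
    values = np.array([f(c) for c in candidates])
    return candidates[np.argsort(values)[:n]]           # best N solutions
```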

In Dhal et al. (2015b), starting from the first iteration, the N parent solutions and the N newly updated children are merged into a total of 2N solutions. Then, if \(\varepsilon < (0.5-\varphi )\), a solution is chosen from the low-performing solutions, and otherwise from the top performers, where \(\varphi \) measures diversity based on the average, minimum and maximum fitness of the fireflies, with \(\varphi \) approaching one implying good diversity.

In the standard firefly algorithm, a firefly moves towards all brighter fireflies. In the modifications of Yan et al. (2012), Fateen and Bonilla-Petriciolet (2014), however, a firefly moves towards only a portion of the brighter fireflies. For instance, in Yan et al. (2012) a winking parameter is added, so that a firefly \(x_i\) is attracted to a brighter firefly \(x_j\) only if \(x_j\) is winking. In Fateen and Bonilla-Petriciolet (2014) no winking mechanism is used; instead, a fraction of the population of top fireflies is selected, and a firefly moves towards those fireflies in the top fraction that are brighter than itself. In the modification of Banati and Bajaj (2011), likewise, a firefly does not move towards all brighter fireflies, but towards the one that improves its performance. It should be noted, however, that decreasing the performance of a solution is sometimes useful in order to escape local solutions.

In addition to the modifications mentioned, a multi-swarm approach is proposed in Dugonik and Fister (2014), in which the fireflies are divided into sub-populations, firefly algorithm is applied to each sub-population for a fraction of the total iterations, and a migration scheme between the sub-populations is possible. Furthermore, this multi-swarm approach is used for dynamic environments in Farahani et al. (2011c) and Abshouri et al. (2011). Using different swarms as sub-populations has the advantage of exploring different promising regions for multimodal problems. Furthermore, for high dimensional problems with a wide feasible region, or for complex problems with high computational cost, a parallel implementation can easily be realized using the swarm sub-populations. Parallel implementations of firefly algorithm have already been reported in Wadhwa et al. (2014), Husselmann and Hawick (2011), Subotic et al. (2012) and Paula et al. (2014).
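
A skeleton of such a multi-swarm scheme might look as follows; the even split, the block-wise runs and the best-solution migration are all illustrative assumptions, since the cited works differ in these details.

    import numpy as np

    def multi_swarm(X, fitness_fn, run_ffa, n_swarms, epochs, migrate_every):
        """Skeleton of a multi-swarm firefly scheme: split the population,
        advance each sub-population with `run_ffa` for a block of
        iterations, and periodically migrate each swarm's best solution
        into a randomly chosen swarm (assumed migration rule)."""
        rng = np.random.default_rng()
        swarms = np.array_split(X, n_swarms)
        for epoch in range(epochs):
            swarms = [run_ffa(S, fitness_fn) for S in swarms]
            if (epoch + 1) % migrate_every == 0:
                for S in swarms:
                    best = S[np.argmin(np.apply_along_axis(fitness_fn, 1, S))]
                    target = swarms[rng.integers(n_swarms)]
                    target[rng.integers(len(target))] = best
        return np.vstack(swarms)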

Fig. 11
figure 11

The updating of \(x_1\) towards four brighter fireflies in an iteration; the green arrow represents a random movement using \(x_1:=x_1+\alpha \epsilon \), the broken arrows represent attraction movements and the red arrow represents the move by \(x_1\) due to an attraction and a random movement, i.e. the resultant of the two. (Color figure online)

4 Discussion

As with any metaheuristic algorithm, the performance of firefly algorithm depends on the appropriate tuning of the algorithm parameters. It basically has two parameters, \(\alpha \) and \(\gamma \). \(\alpha \) controls the step length of the random movement of a solution and \(\gamma \) controls the step length of the attraction term. Using the second term of equation (6), a firefly \(x_i\) moves towards another brighter firefly \(x_j\) with a step length \(\beta =\beta _0e^{-\gamma r^2}\). Due to this attraction term \(x_i\) moves in the direction of \(x_j\), resulting in exploitation of the area between the two fireflies, while the random movement acts as an exploration mechanism if the step length is sufficiently large. Figure 11 shows an iteration update of a solution \(x_1\) towards four brighter fireflies. However, keeping the step length large until the end of the solution procedure is not reasonable, because the solution will jump around the approached solution rather than converging to the nearest optimal solution. So, most of the approaches proposed for \(\alpha \) make it decrease adaptively through the iterations (Subramanian and Thanushkodi 2013; Liu et al. 2015; Coelho and Mariani 2012; Yan et al. 2012; Goel and Panchal 2014; Shafaati and Mojallali 2012; Wang et al. 2012; Amaya et al. 2014; Shakarami and Sedaghati 2014; Olamaei et al. 2013; Kavousi-Fard et al. 2014; Brajevic and Ignjatovic 2015; Yang 2013; Manoharan and Shanmugalakshmi 2015; Baghlani et al. 2013; Fu et al. 2015; Yu et al. 2015a; Othman et al. 2015). As can be seen from Fig. 5, most of these schemes decrease quicker than a linear function starting from the first iteration. However, starting to decrease the step length from the beginning will force the algorithm to converge to the nearest, possibly local, solution. Hence, a criterion for when to start decreasing the step length, after a proper exploration of the solution space, needs to be set. It could be based on the performance of the solutions. Some attempts have been made in this regard (AL-Wagih 2015; Wang et al. 2014b; Yu et al. 2013), i.e. making the step length adaptive based on its performance in previous iterations. This means a possible modification which increases the step length, and also decreases it whenever necessary, based on the solution's current situation and previous performance, is perhaps a good area to explore as possible future work. Further research in this direction may well produce promising results. In some studies, different probability distributions are also used (Farahani et al. 2011a, b; Liu et al. 2015; Brajevic and Ignjatovic 2015; Baghlani et al. 2013; Coelho and Mariani 2013; Amiri et al. 2013; Coelho et al. 2011; Fister et al. 2014; Dhal et al. 2015a; Yang 2010; Kanimozhi and Latha 2013). However, it would be interesting to study their strengths and weaknesses based on a comparison using different classes of problems.
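
As an illustration, the following sketch contrasts a step length that decreases from the first iteration with a performance-based rule that shrinks \(\alpha \) only when improvement stalls; the decay, shrink and grow factors are illustrative assumptions rather than values from any cited scheme.

    def alpha_geometric(alpha0, t, decay=0.97):
        # decreases from the first iteration (decay factor assumed)
        return alpha0 * decay**t

    def alpha_performance_based(alpha, improved, shrink=0.9, grow=1.05):
        # shrink only when the best value stopped improving; grow otherwise
        # (both factors are illustrative assumptions)
        return alpha * (grow if improved else shrink)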

In regard to the attraction movement, \(\gamma \) has been modified in a couple of papers. In the standard firefly algorithm \(\beta _0\) is set to one; hence the step length depends on the value assigned to \(\gamma \). A small value of \(\gamma \) results in a larger attraction step length. As can be seen from Fig. 10, different scenarios are presented in which \(\gamma \) increases, decreases or does neither through the iterations for a fixed \(\beta _0\) (Liu et al. 2015; Coelho and Mariani 2012; Lukasik and Zak 2009; Amaya et al. 2014; Fu et al. 2015; AL-Wagih 2015; Coelho and Mariani 2013; Coelho et al. 2011; Selvarasu et al. 2013; Meena and Chitra 2015; Gandomi et al. 2013; Jansi and Subashini 2015; Abdel-Raouf et al. 2014; Long et al. 2015). Some studies propose an adaptive attractiveness based on the light intensity of the solutions (Amaya et al. 2014; Wang et al. 2014b; Tilahun and Ong 2012c). Controlling the step length of the attraction based on the light intensity of the attracting firefly is the motivation given in the standard firefly algorithm, and proposing an update for \(\beta \), through either \(\beta _0\) or \(\gamma \), based on the current and possibly past performance of the solutions can be studied further.
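
For reference, a minimal sketch of one attraction-plus-random move with \(\beta _0=1\) is given below; the distribution of the random vector \(\epsilon \) is an assumption, since the cited papers differ on it.

    import numpy as np

    def attraction_move(x_i, x_j, gamma, alpha, beta0=1.0,
                        rng=np.random.default_rng()):
        """One step of x_i towards a brighter x_j, following the standard
        update x_i + beta0*exp(-gamma*r^2)*(x_j - x_i) + alpha*eps.
        eps is taken uniform on [-0.5, 0.5] per dimension (an assumption)."""
        r2 = np.sum((x_j - x_i) ** 2)
        beta = beta0 * np.exp(-gamma * r2)
        eps = rng.random(x_i.size) - 0.5
        return x_i + beta * (x_j - x_i) + alpha * eps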

The brightest solution needs to explore its neighborhood to improve its performance rather than exploring other regions; otherwise it may lose its current value and its performance may deteriorate. Hence, some studies propose how to update the brightest firefly (Tilahun and Ong 2012c; Verma et al. 2016). In addition, it is possible to keep the best solution in memory and continue with the usual updating strategy. Hence, some studies utilize memory to save the performance of the solutions and direct the updating strategy in future iterations. As long as the memory and time complexity is reasonable, it is a good idea to learn from experience. Promising results are reported which keep the best solution or update it in an 'accept only improving' way (Tilahun and Ong 2012c; Verma et al. 2016).
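
An 'accept only improving' move for the brightest firefly can be sketched as follows; the uniform trial step is an illustrative assumption.

    import numpy as np

    def update_best(x_best, f, alpha, rng=np.random.default_rng()):
        """Try a random step in the neighborhood of the brightest firefly
        and keep it only if it improves the objective (minimization)."""
        trial = x_best + alpha * (rng.random(x_best.size) - 0.5)
        return trial if f(trial) < f(x_best) else x_best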

As can be seen from Fig. 11, the solutions tend to fly over the solution space together, so the exploration property is limited. Increasing the exploration behavior by increasing \(\alpha \) will cause the best solution to wander around the solution space. Hence, perhaps a different \(\alpha \) can be used for the best solution than for the others, as done in Tilahun and Ong (2014), or a mutation operator can be used to generate a solution possibly away from the pack.

Proposing new update formulas which give good exploration is an interesting area to explore and would mitigate premature convergence towards local solutions. Some promising studies incorporating mutation and proposing new formulas to increase the exploration property of the algorithm have been reported (Shakarami and Sedaghati 2014; Olamaei et al. 2013; Kavousi-Fard et al. 2014; Amiri et al. 2013; Kazemzadeh-Parsi 2014, 2015; Kazemzadeh-Parsi et al. 2015; Mohammadi et al. 2013; Bidar and Kanan 2013; Yu et al. 2015). It should also be noted that a decrease in the performance of a solution is not always a bad thing. For example, for a misleading problem where the global solution is far from the packed local solutions, a solution needs to perform weakly, and to move in a non-promising direction, in order to escape local solutions and reach the global one. Perhaps a modification of the updating strategy which incorporates this idea can be done in the future.

Another point that needs attention is the number of algorithm parameters. Replacing a single parameter by introducing more than one new parameter may not always be reasonable. Some studies propose a modification of the standard firefly algorithm in which a number of new parameters are introduced, and tuning those parameters by itself needs another study. Hence, the number of parameters should be taken into consideration. Another direction in modifying firefly algorithm is to map the decision space or the feasible region to an 'easy to search' space. However, only a single study, Fister et al. (2013a), is reported in this respect, and it can be an interesting area for future work.

5 Simulation based comparison

A simulation based comparison of the modified versions along with the standard firefly algorithm is conducted. For the comparison, 14 modified versions with strategy level modifications are selected along with the standard firefly algorithm. The criteria for selecting these algorithms are a clear description which can be replicated for any problem and a small number of additional parameters. These algorithms are: FFA1 from Tilahun and Ong (2012c), FFA2 from Verma et al. (2016), FFA3 from Yu et al. (2015b), FFA4 from Kavousi-Fard et al. (2014), Mohammadi et al. (2013), FFA5 from Shakarami and Sedaghati (2014), Amiri et al. (2013), FFA6 from Azad (2011), FFA7 from Amaya et al. (2014), FFA8 from Hongwei et al. (2015), FFA9 from Maidl et al. (2013), FFA10 from Goel and Panchal (2014), FFA11 from Yu et al. (2015), FFA12 from Farahani et al. (2011a, b), Kanimozhi and Latha (2013), FFA13 from Yang (2010) and FFA14 from Tian et al. (2012).

5.1 Benchmark problems

For the simulation purpose, forty benchmark problems are used. The benchmark problems are constructed by varying the dimension of ten base problems, which are listed below; a short code sketch of some of them follows the list.

  1.

    First base problem: The first problem is Rastrigin function (Molga and Smutnicki 2016). It is a multimodal, continuous, differentiable and separable problem. It is given as in Eq. (8).

    $$\begin{aligned} f_1(x)=10D+\sum \limits _{i=1}^D(x_i^2-10cos(2\pi x_i)) \end{aligned}$$
    (8)

    The feasible region is \(-5.12 \le x_i \le 5.12, \forall i\). The optimum solution is found at \(x^*_i=0, \forall i\) with \(f_1(x^*)=0\).

  2.

    Second base problem: The second problem is the Alpine01 function (Gavana 2013). It is a multimodal, continuous, non-differentiable and separable problem, as given in Eq. (9).

    $$\begin{aligned} f_2(x)=\sum \limits _{i=1}^D\mid x_isin(x_i)+0.1x_i \mid \end{aligned}$$
    (9)

    The feasible region is \(-10\le x_i \le 10, \forall i\). The optimum solution is found at \(x^*_i=0, \forall i\) with \(f_2(x^*)=0\).

  3.

    Third base problem: The third problem is a multimodal, discontinuous, non-differentiable and separable problem (Tilahun et al. 2016; Tilahun 2017). It is given in Eq. (10).

    $$\begin{aligned} f_3(x)=\sum \limits _{j=1}^D \left[ \sum \limits _{i=1}^5 p(i)\lfloor x_j \rfloor ^{5-i}\right] \end{aligned}$$
    (10)

    for \(p=[0.03779 \; \; -0.8405 \; \; 6 \; \; -14.42 \; \; 7.134 ]\). The feasible region is \(-1\le x_i \le 12, \forall i\). The global optimum is found at \(2 \le x^*_i \le 3, \forall i\) with \(f_3(x^*)=-3.82536D\).

  4.

    Fourth base problem: Ackley 2 Function is the fourth test problem (Jamil and Yang 2013). It is a unimodal, continuous, differentiable and non-separable problem given by Eq. (11)

    $$\begin{aligned} f_4(x)=-200e^{-0.02 \sqrt{\sum \limits _{i=1}^D x_i^2}} \end{aligned}$$
    (11)

    The feasible region is \(-32\le x_i \le 32, \forall i\). The optimum solution is found at \(x^*_i=0, \forall i\) with \(f_4(x^*)=-200\).

  5.

    Fifth base problem: XinSheYang01 is the fifth problem selected (Gavana 2013). It is a multimodal, non-differentiable, separable and stochastic problem. It is given as in Eq. (12).

    $$\begin{aligned} f_5(x)=\sum \limits _{i=1}^D rand_i |x_i|^i \end{aligned}$$
    (12)

    The feasible region is \(-5\le x_i \le 5, \forall i\). The optimum solution is found at \(x^*_i=0, \forall i\) with \(f_5(x^*)=0\).

  6.

    Sixth base problem: Cosine Mixture Function is chosen to be the sixth problem (Jamil and Yang 2013). It is a multimodal, discontinuous, non-differentiable, separable problem given as in Eq. (13).

    $$\begin{aligned} f_6(x)=-0.1\sum \limits _{i=1}^D cos(5\pi x_i)+\sum \limits _{i=1}^Dx_i^2 \end{aligned}$$
    (13)

    The feasible region is \(-1\le x_i \le 1, \forall i\). The optimum solution is found at \(x^*_i=0, \forall i\) with \(f_6(x^*)=-0.1D\).

  7.

    Seventh base problem: The seventh problem is Schaffer’s F6 Function (Dieterich and Hartke 2012). It is a multimodal, continuous, differentiable and non-separable problem. It is given in Eq. (14).

    $$\begin{aligned} f_7(x)=0.5+\frac{sin^2\left( \sqrt{\sum \limits _{i=1}^Dx_i^2}\right) -0.5}{(1+0.001\sum \limits _{i=1}^Dx_i^2)^2} \end{aligned}$$
    (14)

    The feasible region is \(-100\le x_i \le 100, \forall i\). The optimum solution is found at \(x^*_i=0, \forall i\) with \(f_7(x^*)=0\).

  8.

    Eighth base problem: The eighth problem is a multimodal, non-differentiable, separable and stochastic problem, as given in Eq. (15) (Gavana 2013).

    $$\begin{aligned} f_8(x)=\sum \limits _{i=1}^D rand_i |x_i-\frac{1}{i}| \end{aligned}$$
    (15)

    where \(rand_i\) is a random number between 0 and 1 from a uniform distribution. The feasible region is \(-5\le x_i \le 5, \forall i\). The optimal solution is found at \(x^*=(1, \frac{1}{2}, \frac{1}{3}, \ldots , \frac{1}{D})\) with \(f_8(x^*)=0\).

  9.

    Ninth base problem: StyblinskiTang is selected to be the ninth problem (Gavana 2013). It is given in Eq. (16).

    $$\begin{aligned} f_9(x)=\frac{1}{2}\sum \limits _{i=1}^D (x_i^4-16x_i^2+5x_i) \end{aligned}$$
    (16)

    The problem is a multimodal, continuous, differentiable and separable problem. The feasible region is \(-5\le x_i \le 5, \forall i\). The optimal solution is found at \(x^*_i=-2.903534018185960, \forall i\) with \(f_9(x^*)=-39.16616570377142D\).

  10.

    Tenth base problem: Michalewicz is selected to be the tenth problem (Bingham 2016). It is a multimodal, continuous, differentiable and separable problem, given as in Eq. (17).

    $$\begin{aligned} f_{10}(x)=-\sum \limits _{i=1}^D sin(x_i)sin^{2m}\left( \frac{ix_i^2}{\pi }\right) \end{aligned}$$
    (17)

    where \(m=10\) and \(0\le x_i \le \pi , \forall i\). The optimal solution depends on the dimension of the problem. For instance, if \(D=2, 5\) and 10, then \(f_{10}(x^*)= -1.8013, -4.687658\) and \(-9.66015\), respectively (Bingham 2016).
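
As an illustration, a few of the base problems above can be written directly from their equations; the function names are ours.

    import numpy as np

    # Minimal sketches of three of the base problems (Rastrigin, Alpine01
    # and Ackley 2), each taking a D-dimensional point x as a numpy array.
    def rastrigin(x):                  # Eq. (8), minimum 0 at x = 0
        return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

    def alpine01(x):                   # Eq. (9), minimum 0 at x = 0
        return np.sum(np.abs(x * np.sin(x) + 0.1 * x))

    def ackley2(x):                    # Eq. (11), minimum -200 at x = 0
        return -200 * np.exp(-0.02 * np.sqrt(np.sum(x**2)))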

5.2 Simulation setup

The simulations are performed on an Intel Core™ i3-3110M CPU @ 2.40 GHz with a 64 bit operating system. MATLAB 7.10.0 (R2010a) is used for the simulations.

The maximum number of iterations is used as the termination criterion and is set to 100. In addition, 30 trials are conducted. The same initial solutions are used for each version of the algorithm in each simulation, with the number of solutions being 100, 200 and 300 for the 3, 5, 8 and 12 dimensional problems. In addition, \(\gamma =1.2\), and \(\alpha \) is set based on the size of the feasible region: it is set to 5, 2.5, 0.8, 8, 1.2, 0.25, 25, 0.25, 0.25 and 0.4 for the first to the tenth problems, respectively. Furthermore, for FFA1 \(m=100\); for FFA3 \(p=0.5\); for FFA10 \(\lambda =1\); and for FFA11 the tolerance for diversification, tol, is set to 0.01. To generate a Levy random direction the inverse erf function is used, i.e. \(\frac{1}{2(erfinv(rand(D,1)))^2}\).
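
For readers replicating the setup outside MATLAB, the Levy direction generator can be sketched as below; it mirrors the MATLAB expression element-wise, with scipy providing the inverse erf.

    import numpy as np
    from scipy.special import erfinv

    def levy_direction(D, rng=np.random.default_rng()):
        """Levy-distributed random vector via the inverse erf function,
        mirroring the MATLAB expression 1 ./ (2*erfinv(rand(D,1)).^2)."""
        u = rng.random(D)
        return 1.0 / (2.0 * erfinv(u) ** 2)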

5.3 Simulation results

Based on the simulation setup given in the previous section, and running each algorithm 30 times, the average and standard deviation of the best functional value, as well as the CPU time, are recorded as given in Table 2.

Table 2 Simulation results (T is CPU time, \(\mu \) is mean value and SD is the standard deviation)

To compare the results of the algorithms, the Friedman test is used (Villegas 2016). The ranks of the algorithms on the 40 problem instances are presented in Table 3.

Table 3 Ranking of the algorithms (D is the dimension, f is the problem and r is a rank of an algorithm in the ith problem instance)

The null hypothesis of the test is that there is no significant difference in the performance of the algorithms. The test statistic, T, can be calculated using Eq. (18) for P problems and K algorithms.

$$\begin{aligned} T=\frac{(P-1)\left[ B-\frac{PK(K+1)^2}{4}\right] }{A-B} \end{aligned}$$
(18)

where A is the sum of the squares of the ranks over all algorithms and problems, i.e. \(A=\sum \nolimits _{i=1}^P \sum \nolimits _{j=1}^K r_{ij}^2\), and B is the sum over the algorithms of the squared rank totals, divided by the number of problems, i.e. \(B=\frac{1}{P}\sum \nolimits _{j=1}^K\left[ \sum \nolimits _{i=1}^Pr_{ij}\right] ^2\).

For our problem, \(A=49274.5\) and \(B=41901.70625\); therefore the test statistic is \(T=18.52\). Hence, the null hypothesis is rejected, as the test statistic is greater than the 0.99 quantile of the F distribution with 14 and 546 degrees of freedom. This implies that there are at least two algorithms with a significant performance difference. In order to compare the performance of each pair of algorithms, a pairwise comparison needs to be performed. Hence, for two algorithms i and j, if the difference in their rank sums fulfills the condition given in Eq. (19), then the two algorithms have different performance.

$$\begin{aligned} |r_i-r_j|>t_{1-\frac{\alpha }{2}}\sqrt{\frac{2P(A-B)}{(P-1)(K-1)}} \end{aligned}$$
(19)

where \(t_{1-\frac{\alpha }{2}}\) is the \(1-\frac{\alpha }{2}\) quantile of the t-distribution with \((P-1)(K-1)\) degrees of freedom. For our case, with \(\alpha =0.01\), \(t_{1-\frac{\alpha }{2}}=2.585\), and hence \(t_{1-\frac{\alpha }{2}}\sqrt{\frac{2P(A-B)}{(P-1)(K-1)}}=84.9621\). Based on this value and the differences in the sums of ranks given in Table 4, it is possible to see which algorithms perform in a similar way and which algorithms outperform others. If an entry is greater than 84.9621, then the corresponding algorithm in the column performs better than the corresponding algorithm in the row; if the entry is less than −84.9621, then the algorithm in the row performs better than the algorithm in the column. If the entry lies in the interval [−84.9621, 84.9621], then the performance of the two algorithms is the same.
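
As a worked sketch, the quantities A, B, T and the pairwise threshold can be computed from the rank matrix of Table 3 as follows; the function name is ours and the quantile is taken from scipy rather than hard-coded.

    import numpy as np
    from scipy.stats import t as t_dist

    def friedman_conover(ranks, alpha=0.01):
        """Friedman statistic T of Eq. (18) and the pairwise comparison
        threshold of Eq. (19), for a P-by-K matrix of ranks
        (P problems, K algorithms)."""
        P, K = ranks.shape
        A = np.sum(ranks ** 2)              # sum of squared ranks
        col_sums = ranks.sum(axis=0)        # rank total per algorithm
        B = np.sum(col_sums ** 2) / P
        T = (P - 1) * (B - P * K * (K + 1) ** 2 / 4) / (A - B)
        df = (P - 1) * (K - 1)
        t_q = t_dist.ppf(1 - alpha / 2, df)
        threshold = t_q * np.sqrt(2 * P * (A - B) / df)
        return T, threshold, col_sums

Applied to the ranks of Table 3 (P = 40, K = 15), this should reproduce the values reported above, i.e. T ≈ 18.52 and a threshold of about 84.96.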

Table 4 Rank difference (each entry is the subtraction of the rank of the algorithm in the column from the rank of the algorithm in the row)

A similar analysis can be done for the CPU time. The results are shown in Fig. 12.

Fig. 12
figure 12

The simulation result based on the Friedman test: a is for the minimum functional value equivalence and b is for the CPU time equivalence; going down, the efficiency decreases, and an arrow between two algorithms indicates that they have similar performance

Fig. 13
figure 13

Sorting of the algorithms based on the simulation results

Hence, based on the simulation results, FFA4, FFA2 and FFA1 are the best performing algorithms, whereas FFA13 is the worst performing. However, looking at the CPU time analysis, FFA9 is the algorithm with the smallest CPU time, whereas FFA5, FFA4 and FFA13 are time-expensive algorithms. Hence, based on resource availability and the sensitivity of the problem, a user can decide which version of firefly algorithm to implement. For instance, if the problem is very sensitive and sufficient time can be given, FFA4 is the best choice, whereas if a reasonable solution is needed quickly, FFA1 is perhaps the best choice, as it is among the top performers in terms of efficiency and not among the worst in terms of CPU time. FFA2 would be the next best choice after FFA1, as it needs slightly more time to run but is among the most efficient versions of firefly algorithm.

5.4 Discussion

Based on the simulation results on the forty problems, we can roughly categorize the algorithms into five categories. The first contains versions which are computationally expensive but effective; it includes FFA4, FFA2 and FFA5. The second category includes algorithms which are computationally expensive and not as effective as the others; it includes FFA13 and FFA8. The third category contains effective algorithms with smaller computational time, namely FFA1 and FFA7. The fourth category includes FFA9, FFA12 and FFA11, whose computational time is smaller but whose performance is not better compared with the others. The last category sits in the middle, both in performance and in running time; it includes FFA, FFA3, FFA6, FFA10 and FFA14. This is summarized in Fig. 13.

If a sensitive problem is being solved with sufficient time, the algorithms in the first category are suitable, whereas if a reasonable solution is needed within a relatively short period of time, the algorithms in the third category are preferable. The simulation is done with a fixed parameter setting based on recommendations from the literature. It should be noted that by adjusting the parameters some of the algorithms could perform better. Hence, possible future work includes a detailed comparison with a wider set of problems and different parameter tunings. In addition, the test problems used and their dimensions are limited. The aim of this paper is to give a detailed theoretical analysis and an initial simulation based comparison; an advanced simulation based comparison is left for future work.

6 Conclusion

In this paper a detailed review of the standard firefly algorithm and its modified versions for continuous problems is presented. Each of the modifications is discussed and categorized into two basic categories. The first category is parameter level modification, where the algorithm parameters are modified to boost the performance of the algorithm. The step length for the random movement, \(\alpha \), has been made to decrease through the iterations, and different probability distributions have been used. Decreasing \(\alpha \) is a good idea, but the iteration at which \(\alpha \) should start decreasing, or the conditions triggering the decrease, need to be studied. The same is done for \(\gamma \), with decreasing, increasing and neither-increasing-nor-decreasing updating schemes, each with different strengths, being proposed. In some studies \(\beta _0\) is computed based on the light intensity of the fireflies, which is the original idea of the algorithm. The second category is strategy level modification, in which modified movements for the best and worst solutions, mutation incorporation, new updating formulas and modifications of the structure are presented. In the movement modification of the brightest firefly, the best performance should not get lost; hence, either a memory should be used to store the global best or only improving moves should be accepted. Mutation operators are used to compute a new solution away from the swarm movement, and that is one promising way of increasing the exploration property of the algorithm. Some studies modify the updating formula in such a way that a better exploration property is added to the algorithm. A change of the search space to an 'easy to search' space and different probability distribution functions to guide the random movement have also been proposed. Each of the modified versions is presented and its weaknesses along with its good aspects are reported. Possible future works are also outlined.