1 Introduction

Optimization is the process of searching for the optimal solution to a specific problem. Mathematically, the optimal solution usually refers to the maximum or minimum of a function. Traditionally, deterministic algorithms based on numeric linear and nonlinear programming methods were the only methods available for solving optimization problems. By exploiting gradient information heavily, deterministic algorithms can obtain the global optimum in certain simple and ideal systems. However, with the increasing complexity of practical problems in recent decades, the tendency of deterministic algorithms to become trapped in local optima has become more pronounced. Deterministic algorithms have difficulty finding global optimal solutions when applied to optimization problems with many local solutions and constraints. Therefore, new optimization algorithms must be studied to overcome the drawbacks of traditional methods.

Stochastic algorithms are a class of optimization methods that search for global optimal solutions using random operators. The random strategy can effectively help the algorithms to escape the local optimum. In contrast to deterministic algorithms, stochastic algorithms do not rely on substantial gradient information during the optimization process; instead, these algorithms evaluate objective functions multiple times. Therefore, from a stochastic perspective, optimization problems are generally equivalent to black boxes, which is why these algorithms are more widely applied and are more suitable for solving an unknown search space than are traditional methods. Due to the abovementioned advantages, stochastic algorithms have replaced deterministic algorithms to become the main methods for solving modern optimization problems.

In general, stochastic algorithms can be divided into two categories: single-based and population-based algorithms. As the name implies, single-based algorithms generate only one random solution that is updated throughout the optimization process. Hill climbing [24] and simulated annealing (SA) [13] are two typical single-based algorithms. Hill climbing is a simple greedy search algorithm that selects an optimal solution from the neighborhood of current solutions until a local optimal solution is reached. SA simulates the annealing process of solid materials and can accept a solution that is worse than the current one with a certain probability. This mechanism helps to enhance the exploration capability of SA.

Although single-based algorithms have the advantages of lower computational cost and fewer function evaluations, they tend to converge prematurely. Population-based algorithms have a superior ability to avoid local optima at the expense of higher computational cost and more function evaluations. In general, these costs are acceptable. Thus, population-based algorithms are more suitable for solving modern complex optimization problems than are single-based algorithms.

In contrast to single-based algorithms, population-based algorithms create multiple solutions randomly and improve them over the course of optimization. The most widely used algorithms are the genetic algorithm (GA) [9], particle swarm optimization (PSO) [6] and ant colony optimization (ACO) [5]. GA is based on Darwin’s theory of biological evolution and the survival of the fittest. In the algorithm, optimization is treated as a process of biological evolution, and the population is improved via selection, crossover and mutation, with poorer individuals phased out and superior individuals retained and allowed to reproduce. PSO is based on the flocking behavior of birds. Each particle position is updated based on its individual best position and the global optimum position found by the swarm, and current velocity information is also incorporated. ACO is inspired by the foraging behavior of ants. Ants transfer information among individuals, which enables them to identify the shortest path between the nest and food source. According to the source of inspiration, algorithms can generally be classified into three principal categories: evolutionary, based on the laws of natural selection; swarm intelligence, based on the behavior of groups of animals in nature; and physics-based, based on the physical rules of nature. Some of the classic and novel algorithms of each subclass are as follows:

Evolutionary: differential evolution (DE) [25], self-organizing centroid optimization (SOC-opt) [3], and monkey king evolution (MKE) [21].

Physics-based: gravitational search algorithm (GSA) [23], electro-search (ES) [27], and thermal exchange optimization (TEO) [12].

Swarm intelligence: artificial bee colony (ABC) [11], cuckoo search (CS) [32], firefly algorithm (FA) [31], grey wolf optimizer (GWO) [18], spider monkey optimization (SMO) [2], moth-flame optimization (MFO) algorithm [16], crow search algorithm (CSA) [1], whale optimization algorithm (WOA) [17], moth search (MS) [28], and satin bowerbird optimizer (SBO) [19].

Other algorithms have different sources of inspiration, such as teaching-learning-based optimization (TLBO) [22], joint operations algorithm (JOA) [26], and kidney-inspired algorithm (KA) [10].

The above algorithms share the basic idea that they gradually narrow the search range and constantly improve the accuracy based on an approximate global optimal solution, which is found by performing a wide range of searches. The main difference among the algorithms stems from the different degrees of emphasis on exploration and exploitation. Exploration embodies the global search capability of the algorithm, and exploitation represents the local search capability around the near-optimal solution. However, these two capabilities are conflicting, with one bound to weaken if the other is strengthened. For example, certain algorithms select only the individuals with better performance during the optimization process and ignore individuals with worse performance. The excessive pursuit of exploitation ability compromises the performance of algorithms through a loss of population diversity. However, for any optimization problem, the exact tradeoff between exploration and exploitation in the algorithm is unknown. Therefore, identifying a superior method for balancing exploration and exploitation remains the focus of optimization algorithm research. Furthermore, the conflict between exploration and exploitation is especially pertinent when considering large-scale problems that adversely affect the efficiency of the algorithm. Therefore, in addition to effectively balancing exploration and exploitation, an adjustment strategy is required for a successful algorithm. This strategy can adjust the search direction of the individual through the individual’s own search experience or through the gradient information of the objective function to improve the overall search efficiency.

Although new algorithms are constantly emerging and being applied in various engineering fields, a fundamental question remains: is it meaningful to propose additional algorithms? The answer to this question is affirmative. According to the no-free-lunch theorem [30], no single algorithm can solve all optimization problems. From this perspective, the no-free-lunch theorem can be regarded as a conservation law: if an algorithm performs well on certain problems, it will inevitably perform poorly on others. Namely, the average performance of any optimization method is the same if all optimization problems are considered. Thus, a number of specific optimization problems remain to be addressed by new algorithms rather than by current optimization techniques. Therefore, a novel population-based algorithm inspired by the behavior of birds during the foraging process, called birds foraging search (BFS), is proposed. This algorithm works in three phases: exploration, exploitation and mutation. The main contribution of this paper is to propose a new population-based algorithm to further address the global optimization problem. Compared with existing methods, BFS has no redundant control parameters and is thus easy to implement and to adapt to a wide variety of applications. Furthermore, BFS has advantages in terms of accuracy, convergence rate and stability.

The rest of this paper is organized as follows. Section 2 discusses the analogy between birds foraging and population-based algorithms. Section 3 describes the mathematical model of the BFS algorithm in detail. Section 4 describes the experimental setting and demonstrates the experimental results. Section 5 presents conclusions and suggestions for further work.

2 Comparison of birds foraging and population-based algorithms

Birds live and breed on all seven continents and in most terrestrial habitats. Approximately 10,000 species of birds are found worldwide. Foraging behavior accounts for the majority of their daily activity: birds need to constantly search for food to survive and adapt to changing habitats. In general, three types of behavior are observed during the foraging process: flying search behavior, territorial behavior, and cognitive behavior. First, birds rely on their flight advantages to conduct extensive reconnaissance of the ground from the air to discover new food. Raptors are typical birds with long, broad wings and sharp vision, and they use air currents for long circling flight to extensively search for prey in vast airspace [20]. Second, most birds, such as red-winged blackbirds and white-bellied antbirds, present territorial behavior [7, 33]. This process can be described as follows: certain birds in the group will occupy the area as their habitat or rut after discovering rich resources, and when other birds approach the territory, the individuals that occupy the area will drive them away by twittering to protect their territory. Finally, certain birds, such as hummingbirds and crows, exhibit cognitive behavior during foraging—a result of strong memory and learning abilities [8, 29]. For example, hummingbirds can perform targeted searches based on prior experience, which helps them judge the amount of nectar in flowers according to shape or color. If, based on the bird’s experience, a flower is judged to be of high quality, then the hummingbird will further exploit this flower; otherwise, the hummingbird will quickly leave and search for the next target. Thus, the efficiency of food searching is improved.

An appropriate tradeoff of exploration and exploitation is the key to obtaining powerful optimization capabilities for all population-based algorithms. Exploration is employed to enhance the ability to jump out of local optima by searching new regions using a randomization method. Exploitation is applied to improve the convergence speed of the population-based algorithms by identifying a better solution near the current optimal solution. With the development of human society, a number of extremely complex practical optimization problems continue to emerge, and “efficiency” is one of the major challenges. These problems include complicated objective functions with a large number of decision variables. Moreover, the optimal solutions for certain nonlinear constrained optimization problems with many local optima are difficult to determine, and the solving processes consume large amounts of time, which leads to low efficiency. In such situations, simply balancing exploration and exploitation is not sufficient, and an adjustment strategy for the algorithms based on previous gradient information is often required to improve the searching efficiency.

A comparison between the foraging process and the structure of population-based algorithms provides a number of interesting analogies. When explicit food information is missing, birds must expand their search area and cover as much of the search space as possible to avoid missing valuable information. Thus, flying search behavior has a role similar to that of exploration. When a bird occupies a territory, other birds will obtain directional information, which attracts them to the target area, and the search area is gradually reduced to the target area. This behavior is similar to the exploitation ability of algorithms. Cognitive behavior is similar to the adjustment strategies of algorithms. Birds will improve the search efficiency of the overall algorithm according to previous search experience, which further improves the local exploitation ability based on the balance of exploration and exploitation.

In short, these similarities provide new ideas for the novel BFS algorithm presented here. Concrete details of the BFS algorithm will be presented in the next section.

3 Birds foraging search algorithm

According to the previous analysis, the BFS can be divided into three phases: flying search behavior phase, territorial behavior phase and cognitive behavior phase.

The BFS aims to find the optimal food source using individuals. The position of the optimal food source is represented by the optimal solution, and the position of each bird should be equivalent to the position of the corresponding food source, which is a feasible solution of the optimization problem. The amount of food provided by a food source (the value of the fitness function) represents the quality of the solution. The number of birds is equivalent to the number of feasible solutions. Similar to other optimization algorithms, initialization is performed before implementing the three phases. The initial positions of the birds in space are represented as follows:

$$ X_{i} = UB - r_{1} \cdot (UB - LB) $$
(1)

where \( X_{i} \) is the i-th position of the bird, \( UB \) and \( LB \) represent the upper and lower bounds of the variables, respectively, and \( r_{1} \) is a random value in the range \( [0,1] \).
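As a concrete illustration, the initialization in Eq. (1) can be sketched in Python. This is a minimal NumPy sketch under the equation as stated; the function name and array shapes are illustrative, not from the original paper:

```python
import numpy as np

def initialize_population(n_birds, dim, lb, ub, rng=None):
    """Sample initial bird positions per Eq. (1): X_i = UB - r1 * (UB - LB).

    With r1 uniform in [0, 1], the positions are uniformly
    distributed over the box [LB, UB] in every dimension.
    """
    rng = np.random.default_rng(rng)
    r1 = rng.random((n_birds, dim))  # one r1 per bird and dimension
    return ub - r1 * (ub - lb)
```

Because \( r_{1} \) is drawn independently per dimension, each bird starts at a uniformly random point of the search box.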

Before the three phases are elaborated, four ideal rules of the BFS are listed as follows:

  1. A bird in the algorithm does not refer to a particular type of bird but represents an artificial bird with the characteristics of all aforementioned birds.

  2. All birds live in flocks.

  3. Only the current best individual possesses territory.

  4. The territory is regarded as a space particle whose position is the same as that of the current best individual.

3.1 Flying search behavior phase

Raptors, such as eagles, often hover over an area with the potential to contain abundant food resources. After intensive study, researchers have determined that the flight path of a raptor is similar to the shape of a logarithmic spiral [15]. The following logarithmic spiral formula [16] can be used to describe this flight path:

$$ X_{i}^{iter + 1} = D_{i - p} \cdot e^{\theta } \cdot \cos (2\pi \theta ) + PA^{iter} $$
(2)

where \( i \) is a random index selected from \( \left[ {1,2,3, \ldots ,N} \right] \), with \( N \) representing the population size; \( \theta \) is a random number in the range \( [ - 1,1] \); \( D_{i - p} = \left| {X_{i}^{iter} - PA^{iter} } \right| \) is the distance of the i-th bird from the potential area, with \( X_{i}^{iter} \) representing the position of the i-th bird at the \( iter \)-th iteration; and \( PA^{iter} \) is the position of the potential area at the \( iter \)-th iteration, which indicates the current best solution, \( X_{best}^{iter} \). The new position replaces the old position if it has a better fitness value.

As shown in Fig. 1, the value of \( \theta \) defines the proximity of the next position of the bird to the potential area (\( \theta = - 1 \) is the closest position to the potential area, and \( \theta = 1 \) is the farthest). In detail, the bird will fly around the potential area instead of simply approaching it in a straight line (as shown in B and D). The most distinguishing characteristic of this movement is that each bird can fly along any direction in the potential area. In other words, the next position of the individual can be in any direction from the best solution, which means that the next position is outside (as shown in C) or inside (as shown in E) the space between the current position and the potential area. Therefore, the flight mode has strong global search capability and simultaneously guarantees the exploitation capability of the algorithm.
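The spiral move of Eq. (2) can be sketched as follows. This is a minimal NumPy sketch; the function and variable names are illustrative, not from the original paper:

```python
import numpy as np

def spiral_flight(x_i, pa, rng=None):
    """Flying search move per Eq. (2): fly along a logarithmic spiral
    around the potential area PA (the current best position).

    theta = -1 lands closest to PA; theta = 1 lands farthest from it.
    """
    rng = np.random.default_rng(rng)
    theta = rng.uniform(-1.0, 1.0)
    d = np.abs(x_i - pa)  # distance of the bird from the potential area
    return d * np.exp(theta) * np.cos(2 * np.pi * theta) + pa
```

In a full run, the new position would replace the old one only when its fitness value is better, as the text specifies.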

Fig. 1 Logarithmic spiral updating position in flying search behavior

3.2 Territorial behavior phase

In this phase, the current best individual is called the territory bird, and the other birds are called incursion birds. These two types of birds have different types of movement. The territory bird patrols small areas around its territory to find better food sources and protect its territory from the approach of other birds. This behavior can be formulated as follows:

$$ X^{T,iter + 1} = X^{T,iter} + r_{d} \cdot \lambda $$
(3)

where \( X^{T,iter} \) is the position of the territory bird in the \( iter \)-th iteration; \( r_{d} \) is a random value in the range \( [ - 1,1] \), which represents the direction of the search; and \( \lambda \) is a scale factor that lets the bird move slightly around its current position. Here, \( \lambda \) is set to \( (X^{T,iter} - X^{S,iter} ) \), where \( X^{S,iter} \) is the position of the current suboptimal individual. \( X^{T,iter + 1} \) is accepted if it provides a better fitness value.

After an area is occupied by the territory bird, all incursion birds will move toward the territory bird, and the territory bird will drive other birds away by calling to defend its territory. This case includes two states.

State 1: The incursion bird \( j \) is not affected by the warning of the territory bird and moves quickly to the territory bird, and its position is updated as follows:

$$ X_{j}^{I,iter + 1} = X_{j}^{I,iter} + r_{2} \cdot (X^{T,iter} - IF \cdot X_{j}^{I,iter} ) $$
(4)

where \( X_{j}^{I,iter} \) is the position of the j-th incursion bird at the \( iter \)-th iteration; \( r_{2} \) is a random value in the range \( [0,1] \); and \( IF \) is the incursion factor that determines how strongly the positions of the incursion birds are changed. Figure 2 provides a vector representation of this state using different \( IF \) values. As shown in Fig. 2, \( IF = 1 \) results in a local search (\( X_{j}^{I,iter + 1} \) lies between \( X_{j}^{I,iter} \) and \( X^{T,iter} \)), whereas \( IF = 2 \) leads to a global search (\( X_{j}^{I,iter + 1} \) is outside the segment between \( X_{j}^{I,iter} \) and \( X^{T,iter} \)). To balance the exploration and exploitation capabilities, the value of this factor is 1 or 2, selected randomly with equal probability as \( IF = round[1 + rand(0,1) \cdot (2 - 1)] \).

Fig. 2 Vector representation of an incursion bird’s movement with different incursion factors in state 1

State 2: The warning of the territory bird produces the intended effect. The incursion bird \( j \) is frightened and flees to the search space randomly. This process is described by the following formula:

$$ X_{j,d}^{I,iter + 1} = X_{j,d}^{I,iter} + r_{3} \cdot (X_{k,d}^{I,iter} - X_{m,d}^{I,iter} ) + r_{4} \cdot (X_{l,d}^{I,iter} - X_{h,d}^{I,iter} ) $$
(5)

where \( j,k,m,l,h \in \{ 1,2,3, \cdots ,N - 1\} \), \( j \ne k \ne m \ne l \ne h \), \( d \in \{ 1,2,3, \cdots ,D\} \), with \( D \) indicating the dimension of the problem; and \( r_{3} \) and \( r_{4} \) indicate random values in the range \( [0,1] \).

Overall, states 1 and 2 can be expressed as follows:

$$ \left\{ {\begin{array}{ll} {X_{j}^{I,iter + 1} = X_{j}^{I,iter} + r_{2} \cdot (X^{T,iter} - IF \cdot X_{j}^{I,iter} )} & {if\;P^{iter} \le rand} \\ {X_{j,d}^{I,iter + 1} = X_{j,d}^{I,iter} + r_{3} \cdot (X_{k,d}^{I,iter} - X_{m,d}^{I,iter} ) + r_{4} \cdot (X_{l,d}^{I,iter} - X_{h,d}^{I,iter} )} & {otherwise} \\ \end{array} } \right. $$
(6)

where \( P^{iter} \) is the probability that the warning is effective, and \( rand \) is a random value in the range \( [0,1] \). Over time, incursion birds will gradually discover the territory bird’s bravado. In addition, the physical strength of the territory bird will also decline. Therefore, the warning will gradually lose efficacy. Accordingly, \( P^{iter} \) is determined as follows:

$$ P^{iter} = 1 - \frac{iter}{Max\_iter} $$
(7)

\( P^{iter} \) decreases linearly with iteration from 1 to 0. From the perspective of optimization, \( P^{iter} \) enables the algorithm to perform a global search with a larger probability in the early stages and then gradually reduces the search space and emphasizes exploitation in the later stages. Thus, \( P^{iter} \) balances the exploration and exploitation abilities of the algorithm. Finally, the incursion bird \( j \) updates its position only if the new position is better; otherwise, its position remains unchanged.

In addition, a role-switching mechanism is implemented throughout the search process. If an incursion bird finds a better area than that found by other incursion birds and the current territory bird, then in the next searching round, the incursion bird becomes the new territory bird and the original territory bird joins the team of incursion birds. This role-switching mechanism helps the algorithm avoid becoming trapped in local optima.
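The territorial behavior phase of Eqs. (3)–(7) can be sketched in Python as follows. This is a minimal sketch that follows the update rules directly; all function names are illustrative, not from the original paper:

```python
import numpy as np

def warning_probability(it, max_it):
    """Eq. (7): the warning probability decays linearly from 1 to 0."""
    return 1.0 - it / max_it

def patrol(x_t, x_s, rng):
    """Eq. (3): the territory bird moves slightly around its position,
    scaled by its distance to the suboptimal individual x_s."""
    r_d = rng.uniform(-1.0, 1.0, size=x_t.shape)
    return x_t + r_d * (x_t - x_s)

def incursion_move(x_j, x_t, rng):
    """Eq. (4): an incursion bird moves toward the territory bird with
    incursion factor IF equal to 1 or 2 with equal probability."""
    inc_f = round(1 + rng.random() * (2 - 1))
    r2 = rng.random(x_j.shape)
    return x_j + r2 * (x_t - inc_f * x_j)

def flee_move(pop, j, rng):
    """Eq. (5): a frightened bird escapes along two random difference
    vectors built from four distinct other incursion birds."""
    others = [i for i in range(len(pop)) if i != j]
    k, m, l, h = rng.choice(others, size=4, replace=False)
    r3 = rng.random(pop.shape[1])
    r4 = rng.random(pop.shape[1])
    return pop[j] + r3 * (pop[k] - pop[m]) + r4 * (pop[l] - pop[h])
```

Per Eq. (6), `incursion_move` would be applied when \( P^{iter} \le rand \) and `flee_move` otherwise, with each candidate accepted only if it improves the fitness.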

3.3 Cognitive behavior phase

In general, cognitive behavior is a process of self-learning that is based on accumulated historical experience; thus, birds can avoid unnecessary searching and improve their foraging efficiency based on experience gained from previous searches. To implement this process, we compare the current and last retained position information, the results of which constitute the experience for the next search. The entire cognitive behavior phase includes two states.

State 1: Birds continuously find better sources of food [i.e., the previous two positions are different (\( X_{i}^{iter} \ne X_{i}^{iter - 1} \))]. This state indicates that the search direction is correct and promising. As a result, birds will learn according to the original gradient information. This targeted search can accelerate the convergence rate of the algorithm. The following formula illustrates this situation:

$$ X_{i}^{iter + 1} = X_{i}^{iter} + r_{5} \cdot (X_{i}^{iter} - X_{i}^{iter - 1} ) $$
(8)

where \( X_{i}^{iter} \) and \( X_{i}^{iter - 1} \) are the positions of the bird at the \( iter \)-th and \( (iter - 1) \)-th iterations, respectively; and \( r_{5} \) is a random value in the range \( [0,1] \).

State 2: Birds continuously search but fail to find better results [i.e., the previous two positions are the same (\( X_{i}^{iter} = X_{i}^{iter - 1} \))]. The search route is changed randomly because experience tells the birds that the original search direction is no longer applicable (the algorithm may fall into local optima). The process is implemented based on a Gaussian distribution:

$$ X_{i}^{iter + 1} = Gaussian(X_{best}^{iter} ,\xi ) $$
(9)

The objective of formula (9) is to enable the birds to seek feasible positions within a region determined by the best individual. \( Gaussian(X_{best}^{iter} ,\xi ) \) is a random number sampled from a Gaussian distribution with a mean value of \( X_{best}^{iter} \) and a standard deviation of \( \xi \). \( X_{best}^{iter} \) is the best solution in the \( iter \)-th iteration, and \( \xi \) is calculated as follows:

$$ \xi = (\log (iter)/iter) \cdot abs(X_{best}^{iter} - r_{6} \cdot X_{i}^{iter} ) $$
(10)

where \( r_{6} \) is a random value in the range \( [0,1] \) that is used to enhance the distribution of new individuals when \( X_{best}^{iter} \) is equal to \( X_{i}^{iter} \); and \( \log (iter)/iter \) is used to adjust the size of the standard deviation. The value of \( \log (iter)/iter \) decreases as the number of iterations increases, and the standard deviation decreases accordingly. As a result, this process leads to a more concentrated stochastic distribution and a slight disturbance around the global best solution, which improves the local search capability. Similarly, \( X_{i}^{iter + 1} \) is accepted only if its fitness value is better than that of \( X_{i}^{iter} \).
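The two cognitive states of Eqs. (8)–(10) can be sketched as a single dispatch on whether the last two retained positions differ. This is a minimal sketch; the names are illustrative, not from the original paper:

```python
import numpy as np

def cognitive_move(x_cur, x_prev, x_best, it, rng):
    """Cognitive behavior phase: Eq. (8) when the last move improved
    the position, Eqs. (9)-(10) when the search has stalled."""
    if not np.array_equal(x_cur, x_prev):
        # State 1, Eq. (8): keep learning along the successful direction.
        r5 = rng.random(x_cur.shape)
        return x_cur + r5 * (x_cur - x_prev)
    # State 2, Eqs. (9)-(10): Gaussian re-sampling around the best
    # solution; the scale shrinks as log(iter)/iter decays.
    r6 = rng.random(x_cur.shape)
    xi = (np.log(it) / it) * np.abs(x_best - r6 * x_cur)
    return rng.normal(loc=x_best, scale=xi)
```

As in the other phases, the resulting position would be kept only if it improves the fitness.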

In addition, certain individuals may search beyond the borders. To prevent this invalid search, we introduce a border control strategy as follows:

$$ X_{i,d}^{iter} = UB - r_{7} \cdot (UB - LB)\quad if\;X_{i,d}^{iter} < LB\;\;or\;\;X_{i,d}^{iter} > UB $$
(11)

where \( X_{i,d}^{iter} \) is the d-th dimension of the i-th solution at the \( iter \)-th iteration, and \( r_{7} \) is a random value in the range \( [0,1] \). The overall procedure of the standard BFS algorithm is shown in Algorithm 1.

Algorithm 1 The overall procedure of the standard BFS algorithm
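The border control strategy of Eq. (11) amounts to re-sampling any out-of-bounds dimension uniformly within the box. A minimal sketch with an illustrative function name:

```python
import numpy as np

def enforce_bounds(x, lb, ub, rng=None):
    """Eq. (11): any dimension that leaves [LB, UB] is re-sampled
    uniformly in the box; in-bounds dimensions are left untouched."""
    rng = np.random.default_rng(rng)
    out = (x < lb) | (x > ub)
    resampled = ub - rng.random(x.shape) * (ub - lb)
    return np.where(out, resampled, x)
```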

4 Salient features of BFS

Our BFS algorithm is similar to previously proposed algorithms in certain respects. For example, similar to most population-based algorithms, the BFS represents the candidate solution by only its position. Moreover, during the flying search behavior phase, the spiral flight strategy is inspired by the MFO algorithm. Finally, during the territorial behavior phase, we are inspired by “DE/rand/2” to simulate the random escape of incursion birds. However, as a new optimization algorithm, the BFS presents certain salient features that differentiate it from existing algorithms. First, although certain algorithms, such as the PSO and CSA algorithms, also imitate bird foraging behavior, the BFS algorithm differs from these algorithms in terms of both the overall structure and the specific mathematical model. The BFS uses a flying search behavior phase, territorial behavior phase, and cognitive behavior phase to search for the optimal solution, a three-phase structure that has not been applied in existing algorithms. Second, many population-based algorithms contain several parameters that need to be tuned before running. For example, the ABC requires a predetermined abandonment cycle (limit). The SMO algorithm considers the following four parameters: perturbation rate, local leader limit, global leader limit and maximum group. By contrast, specific control parameters do not need to be adjusted for the BFS. In other words, compared with previous algorithms with multiple control parameters, the BFS is easy to implement and is adaptable to a wide variety of optimization problems, which represents one of its great advantages. Third, the search steps of many algorithms, such as the CS, CSA and SBO algorithms, are fixed constants. In the BFS, when an individual moves closer to the optimal area, we use the incursion factor [reflected in Eq. (4)] instead of a traditional fixed search step to balance exploration and exploitation.
Fourth, from the population update perspective, the position update equations of many algorithms, such as DE, ABC and SMO, are based on the difference vector. The BFS is updated in a variety of ways. In addition to the difference vector, a spiral search mode and Gaussian variation are included. Multiple update methods also increase the probability that the BFS will escape the local optimum. Moreover, to improve the search efficiency, we are inspired by the cognitive behavior of the birds and use the individual’s own gradient information to improve the overall exploitation ability of the algorithm, which is a novel concept. The abovementioned salient features of the BFS demonstrate that it is different from existing algorithms.

5 Experimental study

To verify the performance of the BFS algorithm, experimental studies were performed on classical and modern challenging benchmark functions in this section. All experiments were conducted in MATLAB 2016a in a Windows 10 environment on a 2.60 GHz Intel(R) Core(TM) i7-6700HQ computer with 16 GB of RAM.

5.1 Experiment I: classical benchmarks

In total, 13 standard benchmark functions are used to test the relative performance of the BFS algorithm on classical benchmark functions. These functions are the classical functions utilized by many researchers [1, 16, 18, 19]. These functions are characterized as being unimodal, multimodal, separable and nonseparable. Nonseparable functions are more difficult to handle than separable functions because each variable in a function is related to all other variables. The difficulty of a problem also increases with the function’s dimensionality. Some functions, such as f08, are highly deceptive because the global optimum of the function is far from the suboptimal value, and the algorithm is often induced to converge in the wrong direction. Some functions, such as f10, have global optimal values on the edge of the search space. If the algorithm does not have strong exploration capability, the global optimal value is difficult to find. In addition, some functions with flat surfaces, such as f05 and f06, are difficult to handle because the flatness of the function does not provide the algorithm any useful information from which to find the global optimum. In this section, to examine the exploration and exploitation of algorithms, these benchmark functions are divided into two respective categories: unimodal (f01–f07) and multimodal (f08–f13). Unimodal signifies that the function has only one global optimal value, which can be used primarily to test the exploitation ability of the algorithm. Multimodal signifies that the function has many local optimal values, which can be used to investigate the exploration ability of the algorithm. Table 1 summarizes the 13 benchmarks, including the cost functions, bounds of variables and optimal values.

Table 1 Description of the classic benchmark functions (U: unimodal, M: multimodal, Dim: dimension)

Our BFS algorithm is compared with 6 well-known algorithms that have been widely used in various fields: PSO [6], DE [25], ABC [11], CS [32], GSA [23], and FA [31]. The maximum number of function evaluations is set to 300,000 for f01–f13. Each algorithm is run independently 30 times on each test function. The control parameter settings for the compared algorithms in this test are as follows:

  1. PSO: \( \omega = 0.6 \) as a weight factor and \( c_{1} = c_{2} = 2 \) as in [4];

  2. DE: \( F = 0.5 \) and \( CR = 0.9 \) as in [25];

  3. ABC: \( limit = (N \cdot D)/2 \), with the numbers of employed and onlooker bees each set to half the colony size, as in [11];

  4. GSA: \( G_{O} = 100 \) and \( \alpha = 20 \) as in [23];

  5. CS: \( \beta = 1.5 \) and \( p_{a} = 0.25 \) as in [32];

  6. FA: \( \alpha = 0.2 \), \( \beta_{0} = 1 \), and \( \gamma = 1 \) as in [31].

For a fair comparison, all the algorithms should use an equal number of function evaluations for the same maximum number of iterations. In terms of the structure of the algorithms, the BFS and CS possess three phases and two phases, respectively, whereas the other algorithms have only one phase. Therefore, the population size of the BFS is set to 50, the population size of the CS is set to 75, and the population sizes of the remaining algorithms are set to 150. As a result, all the algorithms require the same number of function evaluations for each iteration.
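The population sizes above keep the per-iteration evaluation budget equal across algorithms. A quick arithmetic check, under the stated assumption that each phase evaluates every individual once:

```python
# Function evaluations per iteration: phases × population size
# (assumption: each phase evaluates every individual once).
evals_per_iter = {
    "BFS": 3 * 50,   # three phases, population of 50
    "CS":  2 * 75,   # two phases, population of 75
    "PSO": 1 * 150,  # one phase, population of 150 (likewise DE, ABC, GSA, FA)
}
assert len(set(evals_per_iter.values())) == 1  # all equal 150
```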

Table 2 summarizes the results obtained by applying the BFS and the six abovementioned algorithms to the unimodal functions (f01–f07). To perform a comprehensive and reliable comparison, each algorithm is evaluated using the best, worst, and mean solutions and the corresponding standard deviations based on 30 independent runs. For convenience, the best and mean solutions and the standard deviations are represented by “Best”, “Mean”, and “SD”, respectively. The algorithms are ranked from the smallest to largest mean value. Moreover, we calculate the average ranks and obtain the overall rank to reflect the performance of each algorithm more intuitively. The best results among all algorithms for each function are shown in bold text. Table 2 shows that the BFS outperforms the PSO, DE, ABC, CS, GSA and FA for f01–f04, f06, and f07, whereas its performance is not superior for f05. The findings show that the BFS ranks first among the algorithms. The ABC algorithm shows the best performance for f05, and the BFS presents the second-best performance for this function. The main purpose of investigating unimodal functions is to examine the exploitation capability of the algorithms. Thus, the convergence rate is more important for the algorithm than the final result. Figure 3 depicts the graphical results of all studied algorithms for 6 typical functions and shows that the BFS has a faster convergence rate than do the other algorithms for four test functions.

Table 2 Optimization results of the unimodal functions for the comparative algorithms
Fig. 3
figure 3

Convergence curves of PSO, DE, ABC, CS, GSA, FA and BFS for 6 classic unimodal benchmarks

The experimental results obtained by PSO, DE, ABC, CS, GSA, FA, and BFS on the multimodal functions (f08–f13) over 30 independent runs are presented in Table 3. Compared with unimodal functions, multimodal functions contain many local minima, the number of which increases exponentially with the problem size. Therefore, such functions are often used to test the global search capability of algorithms. As shown in Table 3, the BFS performs excellently on multimodal functions: of the 6 evaluated functions, it achieves the best results on all except f08 and f13. The DE shows promising performance on f11 and f13, and the ABC performs best on f08. Moreover, across all multimodal functions, the BFS presents the best overall solutions, which implies that the BFS has a powerful ability to search for globally optimal solutions. To demonstrate the convergence characteristics of the seven algorithms, we also select four functions and plot their convergence curves over 30 independent runs (Fig. 4). As shown, the BFS outperforms the compared algorithms in terms of convergence rate.

Table 3 Optimization results of the multimodal functions for different algorithms
Fig. 4
figure 4

Convergence curves of PSO, DE, ABC, CS, GSA, FA and BFS for 4 classic multimodal benchmarks

In total, 13 functions are used to test whether significant differences exist between the BFS and its competitors. The Wilcoxon signed-rank test, a nonparametric test that makes no assumption about the underlying distribution of the samples and can detect statistically significant differences between two paired samples, is used here at the 0.95 confidence level (\( \alpha = 0.05 \)). Table 4 shows the statistical results for each function; the signs "+", "≈" and "−" indicate that the BFS is significantly better than, not significantly different from, and significantly worse than the compared algorithm, respectively. "R+" denotes the sum of the ranks for which the BFS outperformed the corresponding competitor, and "R−" represents the sum of the ranks for the opposite case. The "+/≈/−" counts in the last row of Table 4 show that the BFS obtains more "+" signs than any other outcome against every competitor, which reflects its excellent performance relative to all other algorithms under the Wilcoxon signed-rank test at the 0.95 confidence level.
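The per-function statistical comparison can be reproduced with SciPy's paired Wilcoxon signed-rank test. The sketch below runs on synthetic stand-in data (the paper's raw run results are not available here); the sign convention follows the table description, assuming minimization:

```python
# Illustrative sketch of the paired Wilcoxon signed-rank test used in Table 4,
# on synthetic results for 30 independent runs of two algorithms.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
bfs = rng.normal(1.0, 0.1, 30)          # stand-in BFS results (smaller = better)
competitor = rng.normal(1.5, 0.1, 30)   # stand-in competitor results

stat, p = wilcoxon(bfs, competitor)     # paired, two-sided test
if p < 0.05:
    sign = "+" if bfs.mean() < competitor.mean() else "-"
else:
    sign = "≈"
```

With these clearly separated samples the test reports a significant difference in favor of the first algorithm, i.e., a "+" entry.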

Table 4 The results of the Wilcoxon signed ranks test based on the best solution for each function among 30 independent runs (α=0.05)

5.1.1 Time consumption analysis of BFS

Runtime is an important indicator of algorithm performance, and implementations that require less time are preferable. Thus, the mean time spent by the BFS and the other 6 algorithms over 30 independent runs on the 13 test functions is summarized in Table 5, with the algorithms ranked from shortest to longest mean time. The results show that the PSO algorithm consumes the least time and ranks first, followed by the CS and ABC algorithms. The BFS ranks fourth but requires considerably less time than the remaining three algorithms. The BFS is followed by the DE and FA algorithms; the GSA is the most time-consuming algorithm and ranks last. Given its excellent performance on the 13 benchmark functions, the slight increase in the time consumption of the BFS is acceptable.
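A minimal sketch of the timing protocol behind Table 5 is shown below; `mean_runtime` and the stand-in optimizer are assumptions for illustration, not the paper's harness:

```python
# Mean wall-clock time over independent runs, as reported in Table 5.
import time

def mean_runtime(optimizer, runs=30):
    """Average wall-clock time of `optimizer()` over `runs` executions."""
    total = 0.0
    for _ in range(runs):
        start = time.perf_counter()
        optimizer()                      # one full optimization run
        total += time.perf_counter() - start
    return total / runs

# usage with a cheap stand-in workload instead of a real optimizer
t = mean_runtime(lambda: sum(i * i for i in range(1000)), runs=5)
```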

Table 5 Mean time consumption in seconds for PSO, DE, ABC, CS, GSA, FA and BFS with 30 independent runs on 13 classic functions

5.1.2 Search behavior and parameter analysis of BFS

The preliminary studies showed that the BFS algorithm performs excellently on optimization problems. However, before discussing the BFS further, a question should be answered: are all three phases necessary for solving optimization problems? Therefore, in this experiment, the proposed algorithm is compared with three reduced configurations, each omitting the phase indicated by its suffix: (1) only flying search behavior and territorial behavior, denoted BFS-C; (2) only territorial behavior and cognitive behavior, denoted BFS-F; and (3) only flying search behavior and cognitive behavior, denoted BFS-T. The population sizes of the three reduced configurations are set to 75 because each has two phases. These three configurations, along with the standard BFS, are evaluated on 8 typical functions from the abovementioned classical benchmark set. Table 6 lists the average results of 30 runs for BFS-C, BFS-F, BFS-T and BFS, and Fig. 5 presents the convergence graphs of each configuration for several functions.

Table 6 Statistical results of the investigation of different BFS behaviors
Fig. 5
figure 5

Convergence rates of the BFS and three alternative configurations for 6 classic benchmark functions

The results in Table 6 and Fig. 5 lead to three significant conclusions. First, the standard BFS is superior to the other three configurations in terms of accuracy and convergence speed, which indicates that including all of flying search behavior, territorial behavior, and cognitive behavior is important. Second, BFS-T performs worst, which means that territorial behavior has the greatest impact on the performance of the BFS. Finally, although flying search behavior has less impact on the unimodal functions than the other two behaviors, and cognitive behavior has less influence on the multimodal functions than the other two behaviors, both still have important effects on the performance of the BFS. In general, the simulation results demonstrate that the flying search, territorial and cognitive behaviors each serve a unique function in the BFS and affect the quality of the final results to varying degrees. Therefore, every behavior is indispensable to the BFS algorithm. Furthermore, the results demonstrate that the standard BFS is stable and reliable for solving global optimization problems.

As previously stated, the BFS contains no special control parameters, in contrast to other algorithms. However, as a population-based algorithm, the BFS is sensitive to the population size \( N \). To study the influence of the population size on the performance of the BFS, the proposed algorithm is tested on the above eight benchmark functions with eight different population sizes over 30 independent runs. The MaxFEs of each function are set according to the previous experiments. The means and standard deviations are listed in Table 7, which shows that the BFS achieves the best performance with \( N = 50 \) on all test functions except f05 and f08. Furthermore, the performance of the algorithm occasionally worsens as \( N \) increases for certain functions, which means that increasing the population size does not necessarily improve, and can even degrade, the performance of the BFS. Figure 6 shows the influence of the population size on the convergence speed for certain benchmark functions, and Fig. 7 shows the mean time consumption on the 8 test functions. Figures 6 and 7 show that the convergence rate is inversely related to the population size: the BFS with \( N = 10 \) converges faster than the BFS with larger populations, although its computational time is the longest. Based on the above analysis and balancing performance against time consumption, the optimal population size is 50.
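The trade-off in the population-size study follows from the fixed evaluation budget: since the three-phase BFS spends \( 3N \) evaluations per iteration, a smaller \( N \) buys more iterations. A small sketch of this accounting (the MaxFEs value of 300,000 and the `iterations_for` helper are illustrative assumptions):

```python
# With a fixed evaluation budget, a smaller population yields more iterations,
# because the three-phase BFS spends 3 * N evaluations per iteration.
def iterations_for(N, max_fes=300_000, phases=3):
    """Number of whole iterations affordable within the evaluation budget."""
    return max_fes // (phases * N)

budget = {N: iterations_for(N) for N in (10, 25, 50, 100, 150)}
# e.g. N=10 -> 10000 iterations, N=50 -> 2000, N=150 -> 666
```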

Table 7 Optimum results obtained by BFS for the 8 typical classical functions with different values of N
Fig. 6
figure 6

Effect of N on the convergence rate for 4 classical test functions

Fig. 7
figure 7

Effect of N on the average time consumption for 8 classic test functions

5.1.3 Convergence analysis of BFS

A theoretical analysis of the convergence of an optimization algorithm facilitates a deep understanding of the algorithm and guides its improvement. Here, convergence refers to whether the individuals in the population approach the global optimal solution as the number of iterations tends to infinity. In this section, we use functional analysis theory to prove the convergence of the BFS algorithm.

Assume that \( f(a) \) is the objective function. Its solution space is expressed as \( \Omega = \{ A \mid A = (a_{1}, a_{2}, \ldots, a_{D}) \} \), where \( D \) is the dimension of the solution vector and \( l_{i} \) and \( u_{i} \) are the lower and upper bounds of \( a_{i} \), respectively. Clearly, the solution space is a bounded subset of the \( D \)-dimensional Euclidean space \( R^{D} \).

Definition 1

Let \( A \) be a nonempty set and \( d \) a mapping from \( A \times A \) to \( R \). If, for any elements \( a, b, c \in A \), the following three properties hold:

  1. Nonnegativity: \( d(a,b) \ge 0 \), and \( d(a,b) = 0 \) if and only if \( a = b \);

  2. Symmetry: \( d(a,b) = d(b,a) \);

  3. Triangle inequality: \( d(a,b) \le d(a,c) + d(c,b) \);

then, \( d \) is called a metric on \( A \), and \( (A,d) \) is called a metric space.

Definition 2

Let \( (A,d) \) be a metric space and \( a_{n} \) a sequence in \( A \). If there exists \( a \in A \) such that for any \( \varepsilon > 0 \) there is a positive integer \( N \) with \( d(a_{n}, a) < \varepsilon \) for all \( n > N \), then the sequence \( a_{n} \) is said to converge to \( a \) in \( (A,d) \).

Definition 3

A sequence \( a_{n} \) in the metric space \( (A,d) \) is called a Cauchy sequence if for any given \( \varepsilon > 0 \) there is always a positive integer \( N \) such that \( d(a_{m}, a_{n}) < \varepsilon \) whenever \( m,n > N \); equivalently, \( d(a_{m}, a_{n}) \to 0 \) as \( m,n \to \infty \). If every Cauchy sequence in \( (A,d) \) converges, then \( (A,d) \) is a complete metric space.

Definition 4

Let \( (A,d) \) be a complete metric space and \( f \) a mapping from \( A \) to \( A \). If there exists a constant \( \zeta \in (0,1) \) such that \( d(f(a),f(b)) \le \zeta d(a,b) \) for all \( a,b \in A \), then \( f \) is called a contraction mapping on \( A \).

Theorem 1 (Banach contraction mapping theorem)

Let \( (A,d) \) be a complete metric space and \( f \) be a contraction mapping from \( A \) to \( A \); then \( f \) has one and only one fixed point \( a^{*} \in A \), and for any \( a_{0} \in A \), \( a^{*} = \lim_{k \to \infty} f^{k}(a_{0}) \), where \( f^{0}(a_{0}) = a_{0} \) and \( f^{k+1}(a_{0}) = f(f^{k}(a_{0})) \).
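A toy numerical illustration of Theorem 1, using an elementary contraction on \( R \) rather than the BFS update itself: iterating \( f(a) = 0.5a + 1 \) (Lipschitz constant \( 0.5 < 1 \)) converges to its unique fixed point \( a^{*} = 2 \) from any starting point.

```python
# Fixed-point iteration of a contraction mapping f(a) = 0.5*a + 1 on R.
# Its unique fixed point is a* = 2, since 2 = 0.5*2 + 1.
def iterate(f, a0, k):
    """Return f^k(a0), the k-fold composition of f applied to a0."""
    a = a0
    for _ in range(k):
        a = f(a)
    return a

f = lambda a: 0.5 * a + 1.0
# convergence is independent of the starting point, as Theorem 1 guarantees
assert abs(iterate(f, 100.0, 60) - 2.0) < 1e-9
assert abs(iterate(f, -7.0, 60) - 2.0) < 1e-9
```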

Theorem 2

As the number of iterations tends to infinity, the BFS algorithm converges to the optimal solution of the population.

Proof

Let \( x_{\text{best}} \) be the optimal solution for the population, let \( X^{k} \) be the set of all individuals' positions at the \( k \)-th iteration of the BFS algorithm (that is, \( X^{k} = \{ x_{1}^{k}, x_{2}^{k}, \ldots, x_{N}^{k} \} \)), let the set of all \( X^{k} \) be denoted by \( S \), and let the metric \( d: S \times S \to R \) be defined as

$$ d(X^{n},X^{m}) = \begin{cases} \sum\limits_{x_{i}^{n} \in X^{n}} \left| f(x_{i}^{n}) - H \right| + \sum\limits_{x_{i}^{m} \in X^{m}} \left| f(x_{i}^{m}) - H \right|, & X^{n} \ne X^{m} \\ 0, & X^{n} = X^{m} \end{cases} $$
(12)

where \( H \) is a lower bound of \( f(x) \), which is assumed to exist. By Definition 1, the formula above satisfies the following three properties:

  1. \( d(X^{n},X^{m}) \ge 0 \), and \( d(X^{n},X^{m}) = 0 \) if and only if \( X^{n} = X^{m} \);

  2. \( d(X^{n},X^{m}) = d(X^{m},X^{n}) \);

  3. \( d(X^{n},X^{m}) = \sum\limits_{x_{i}^{n} \in X^{n}} \left| f(x_{i}^{n}) - H \right| + \sum\limits_{x_{i}^{m} \in X^{m}} \left| f(x_{i}^{m}) - H \right| \le \sum\limits_{x_{i}^{n} \in X^{n}} \left| f(x_{i}^{n}) - H \right| + \sum\limits_{x_{i}^{k} \in X^{k}} \left| f(x_{i}^{k}) - H \right| + \sum\limits_{x_{i}^{k} \in X^{k}} \left| f(x_{i}^{k}) - H \right| + \sum\limits_{x_{i}^{m} \in X^{m}} \left| f(x_{i}^{m}) - H \right| = d(X^{n},X^{k}) + d(X^{k},X^{m}) \).

Therefore, the space of populations \( (S,d) \) forms a metric space.
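The metric of Eq. (12) is simple to compute; the following sketch implements it and checks the symmetry and triangle-inequality properties on toy populations (the objective `f`, the bound `H = 0`, and the tuple representation of populations are assumptions for illustration):

```python
# Illustrative implementation of the population metric d from Eq. (12):
# populations are tuples of solutions, f is the objective, H a lower bound of f.
def d(Xn, Xm, f, H):
    if Xn == Xm:
        return 0.0
    return sum(abs(f(x) - H) for x in Xn) + sum(abs(f(x) - H) for x in Xm)

f = lambda x: x * x                     # toy objective, lower bound H = 0
A, B, C = (1.0, 2.0), (3.0,), (0.5, 0.5)

# symmetry and the triangle inequality hold for these populations
assert d(A, B, f, 0.0) == d(B, A, f, 0.0)
assert d(A, B, f, 0.0) <= d(A, C, f, 0.0) + d(C, B, f, 0.0)
```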

By Definition 3, for any Cauchy sequence \( X^{k} \) in \( (S,d) \) and any given \( \varepsilon > 0 \), there is always a positive integer \( K \) such that \( d(X^{n},X^{m}) < \varepsilon \) when \( m,n > K \); that is, \( d(X^{n},X^{m}) \to 0 \) as \( m,n \to \infty \), which by the definition of \( d \) implies \( X^{n} = X^{m} \). Hence, every Cauchy sequence \( X^{k} \) of populations converges, and \( (S,d) \) is a complete metric space.

The workflow of the BFS algorithm shows that solving with the BFS is a cyclic iterative process. During each iteration, the BFS algorithm completes its search through the flying search behavior phase, the territorial behavior phase and the cognitive behavior phase. Therefore, all operations that generate the population of the next BFS iteration can be regarded as a mapping \( f: S \to S \) on the solution space \( \Omega \). Since the BFS algorithm uses a greedy strategy when updating individuals' positions, the population generated by the next iteration can never be worse than that of the previous iteration. Therefore, the sequence of fitness values corresponding to the individuals' positions over the iterations is monotonically nonincreasing; i.e., \( f(x_{i}^{k+1}) \le f(x_{i}^{k}) \le f(x_{i}^{k-1}) \), or \( f(f(f(x_{i}^{k}))) \le f(f(x_{i}^{k})) \le f(x_{i}^{k}) \), where \( f(x_{i}^{k}) \) is the fitness value of the position \( x_{i} \) of the \( i \)-th individual at the \( k \)-th iteration. Thus, for the mapping \( f \),

$$ \begin{aligned} d(f(X^{n}),f(X^{m})) & = \sum\limits_{x_{i}^{n} \in X^{n}} \left| f(f(x_{i}^{n})) - H \right| + \sum\limits_{x_{i}^{m} \in X^{m}} \left| f(f(x_{i}^{m})) - H \right| \\ & \le \sum\limits_{x_{i}^{n} \in X^{n}} \left| f(x_{i}^{n}) - H \right| + \sum\limits_{x_{i}^{m} \in X^{m}} \left| f(x_{i}^{m}) - H \right| \\ & \le \zeta d(X^{n},X^{m}) \end{aligned} $$
(13)

where \( 0 < \zeta < 1 \). By Definition 4, the map \( f \) is a contraction mapping. In summary, the BFS algorithm satisfies the conditions of the Banach contraction mapping theorem: \( (S,d) \) is a complete metric space, and the map \( f \) is a contraction mapping. According to Theorem 1, the algorithm is convergent, and there must be a unique fixed point \( x_{best} \in S \) such that for any \( x \in S \), \( x_{best} = \lim_{k \to \infty} f^{k}(x) \), where \( f^{0}(x) = x \) and \( f^{k+1}(x) = f(f^{k}(x)) \). When all individuals in \( S \) equal \( x_{best} \), the algorithm has converged to the unique fixed point \( x_{best} \), and the theorem is proven.□
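The greedy acceptance rule on which the proof relies can be sketched as follows; the objective and candidate positions are toy stand-ins, and `greedy_update` is an assumed name for the per-individual update step:

```python
# Greedy acceptance: a candidate replaces the current position only if it does
# not worsen the fitness, so each individual's fitness sequence is
# monotonically nonincreasing (minimization).
def greedy_update(x, candidate, f):
    return candidate if f(candidate) <= f(x) else x

f = lambda x: (x - 3.0) ** 2            # toy objective with minimum at x = 3
x = 10.0
history = [f(x)]
for cand in (8.0, 12.0, 5.0, 2.0, 3.5):  # arbitrary candidate positions
    x = greedy_update(x, cand, f)
    history.append(f(x))

# the fitness sequence never increases: [49.0, 25.0, 25.0, 4.0, 1.0, 0.25]
assert all(a >= b for a, b in zip(history, history[1:]))
```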

5.2 Experiment II: CEC2014 benchmarks

In this experiment, all 30 numerical functions of the 2014 Congress on Evolutionary Computation (CEC2014) Special Session and Competition on Single Objective Real-Parameter Numerical Optimization suite, listed in Table 8, are selected to test the performance of the BFS on complex real-parameter numerical optimization problems. These functions include several novel features, such as new basic problems and test problems constructed by dimension-wise extraction of features from several problems, including graded levels of linkage and rotated trap problems. Traditional test functions often place the global optimum in the central area of the search space and the local optima on the coordinate axes, so these functions cannot truly reflect practical optimization problems. The CEC2014 suite addresses these issues by shifting the global optima and rotating the test functions; its problems are therefore more similar to real-world optimization problems. The CEC2014 test functions comprise rotated unimodal, shifted and rotated multimodal, hybrid, and composition test functions; the details are provided in [14]. The unimodal functions have no local optima and a single global optimum; thus, they are suitable for evaluating the exploitation ability of algorithms. By contrast, the multimodal functions have large numbers of local solutions and are useful for testing the exploration ability of an algorithm. In the hybrid test functions, enormous numbers of local optima are distributed across the search space, and the variables are randomly divided into subcomponents, with different basic functions used for different subcomponents. The purpose of these functions is to benchmark the ability of an algorithm to escape local optima when faced with complex problems. The composition test functions are rotated, shifted, biased, and combined versions of several unimodal, multimodal and hybrid functions. This set of functions is extremely complex because the functions have different shapes in different regions of the search space. These characteristics are beneficial for testing an algorithm's ability to balance exploration and exploitation.

Table 8 Descriptions of the CEC2014 test functions

The proposed BFS is compared with the following state-of-the-art algorithms, which have previously been demonstrated to perform better than classic optimization algorithms: TLBO [22], GWO [18], SMO [2], MFO [16], WOA [17], CSA [1] and SBO [19]. No specific algorithm parameters are required for the TLBO, and the control parameter settings of the remaining algorithms in this test are as follows:

  1. GWO: \( a = 2 - l \times (2/MaxCycle) \), where \( l \) is the current iteration, as in [18];

  2. SMO: \( MG = 5 \), \( GlobalLeader\,Limit = 50 \), \( LocalLeader\,Limit = 1500 \) and \( pr_{G+1} = pr_{G} + (0.4 - 0.1)/MIR \) as in [2];

  3. MFO: \( b = 1 \) and \( flame\,no = \text{round}(N - l \cdot (N - 1)/T) \) as in [16];

  4. WOA: \( b = 1 \) and \( a = 2 - l \times (2/MaxCycle) \) as in [17];

  5. CSA: \( AP = 0.1 \) and \( fl = 2 \) as in [1];

  6. SBO: \( \alpha = 0.94 \), \( Z = 0.02 \) and a mutation probability of 0.05 as in [19].

The dimensions and the maximum number of function evaluations are set to 30 and 300,000, respectively. To ensure a fair comparison, all algorithms run with the same number of function evaluations and iterations. Therefore, the population sizes of the BFS and the SMO are set to 50 because they contain three update phases; correspondingly, the population size of the TLBO is set to 75 because it includes two phases, and the population sizes of the remaining 5 algorithms are set to 150 because they include one phase. Tables 9, 10, 11 and 12 present the statistical results of the BFS and the 7 compared algorithms for each function over 30 independent runs. Similar to the previous test, we rank the algorithms from the smallest to the largest mean value. Finally, we calculate the average rank of each algorithm and draw conclusions.

Table 9 Optimization results of the CEC2014 unimodal functions for 8 algorithms
Table 10 Optimization results of the CEC2014 simple multimodal functions for 8 algorithms
Table 11 Optimization results of the CEC2014 hybrid functions for 8 algorithms
Table 12 Optimization results of the CEC2014 composition functions for 8 algorithms

(1) Unimodal functions group (f01(CEC)–f03(CEC)): Table 9 provides the experimental results obtained by all algorithms over 30 independent runs, and Fig. 8 depicts the graphical results for f01(CEC) and f03(CEC). Table 9 shows that the BFS has clear advantages over the other algorithms on all three functions; in addition, the BFS finds the global optimum of f02(CEC). Its outstanding accuracy and convergence speed are highlighted in Fig. 8. The results in Table 9 show that the BFS algorithm has good exploitation capability, mainly due to its full use of the global best individual's information and its own gradient information, which effectively improves the local exploitation ability of the algorithm.

Fig. 8
figure 8

ANOVA test and convergence graphs of 2 unimodal functions for all algorithms

(2) Simple multimodal functions group (f04(CEC)–f16(CEC)): The final results obtained by the BFS and the 7 remaining algorithms over 30 independent runs are reported in Table 10, and the convergence rates and ANOVA tests for several typical multimodal functions are plotted in Fig. 9. Table 10 shows that the BFS obtains the best mean value for five functions (f04(CEC), f05(CEC), f07(CEC), f14(CEC), and f15(CEC)) and almost reaches the global optimum of f07(CEC). As shown by the overall rank in the last line of Table 10, the BFS ranks first on the simple multimodal functions of the CEC2014. Moreover, the SMO obtains the best mean values for f06(CEC), f08(CEC), f09(CEC) and f13(CEC); the GWO performs better than the other compared algorithms on f11(CEC) and f16(CEC); and the TLBO and the SBO exhibit the best performance on f10(CEC) and f12(CEC), respectively. The graphical results in Fig. 9 indicate that the BFS outperforms the other algorithms in terms of convergence rate and accuracy on the listed typical multimodal functions. Summarizing the above analysis, the BFS has good exploration capability, mainly due to the extensive search of the space via a logarithmic spiral pattern in the flying search behavior phase.

Fig. 9
figure 9

Graphical results and ANOVA test of 6 CEC2014 simple multimodal functions for 8 algorithms

(3) Hybrid functions group (f17(CEC)–f22(CEC)): Compared with the previous two sets of test functions, this set has a more complex form; thus, finding the global optimum is more difficult. Table 11 summarizes the statistical results obtained by the BFS and its competitors over 30 independent runs. The BFS is superior on f18(CEC), f19(CEC) and f20(CEC); the CSA outperforms the other algorithms on f17(CEC) and f18(CEC), although the BFS finds the minimum best solutions for these two functions; and the SMO shows the best performance on f22(CEC). The overall rank in the last line of Table 11 shows that the overall performance of the BFS on the hybrid functions is the best. The graphical results for 4 typical functions are depicted in Fig. 10, which shows that the BFS performs significantly better than the compared algorithms in terms of convergence rate and solution quality. These results indicate that the BFS algorithm has a strong ability to escape local optima when applied to complex problems because it incorporates various random search strategies, such as spiral search [Eq. (2)], random escape [Eq. (5)], and Gaussian mutation [Eq. (9)], throughout the optimization process. The different search methods guide individuals to search collaboratively, thereby improving the global search ability of the algorithm and increasing its probability of escaping local optima.

Fig. 10
figure 10

Graphical results and ANOVA test of 4 CEC2014 hybrid functions for different algorithms

(4) Composition functions group (f23(CEC)–f30(CEC)): This set of functions is extremely complex, and finding their optimal solutions presents the greatest difficulty. Table 12 lists the optimization results obtained by the BFS and the other 7 compared algorithms. The BFS algorithm ranks first for five composition functions (f23(CEC), f24(CEC), f25(CEC), f27(CEC), and f28(CEC)), and the SMO algorithm shows the best performance for f26(CEC), f29(CEC) and f30(CEC). The BFS ranks second on f26(CEC), and its best solution ranks first on f29(CEC). Examining the overall rank in the last line of Table 12, the BFS undoubtedly ranks first, the SMO and CSA rank second and third, respectively, and the WOA exhibits the worst performance. From a graphical perspective, the convergence rate and ANOVA test graphs of 6 selected functions obtained by the BFS and its competitors are presented in Fig. 11; they show that the BFS clearly outperforms the compared algorithms. In summary, the BFS balances exploration and exploitation well, as shown in Table 12 and Fig. 11, for two main reasons: (1) in each iteration, the three phases are executed sequentially to jointly perform a complete search; and (2) although different phases focus on different capabilities, each phase attempts to balance exploration and exploitation. For example, in the flying search behavior phase, the spiral path also guarantees some degree of exploitation; in the territorial behavior phase, the probability \( P^{iter} \) plays a role in balancing exploration and exploitation; and in the cognitive behavior phase, individuals use gradient information to increase the search efficiency while implementing Gaussian mutation to improve the ability of the BFS to escape local optima.

Fig. 11
figure 11

Graphical results and ANOVA test of 6 CEC2014 composition functions for various methods

To compare the overall performance of the BFS and the other competitors on the CEC2014 benchmark functions, we again employ the Wilcoxon signed-rank test at the 0.95 confidence level (\( \alpha = 0.05 \)) for paired comparisons. The comparative results for each function are listed in Table 13. The last row of Table 13 shows that, except for the 19 "+" signs in the TLBO comparison, the BFS obtains 20 or more "+" signs against each of the remaining algorithms. In other words, the BFS presents more "+" outcomes than "−" or "≈" outcomes against every competitor; thus, the BFS performs statistically better than its competitors under the Wilcoxon signed-rank test at the 0.95 confidence level. Furthermore, the SMO performs better than the BFS on 8 functions, the largest number among all competitors, whereas the WOA has no function on which it performs better than or comparably to the BFS. For the TLBO, 5 functions show no significant difference from the BFS, the most among all competitors.

Table 13 The results of the Wilcoxon signed ranks test based on the best solution for each function among 30 independent runs (α=0.05)

6 Conclusions

Because an increasing number of practical applications require optimization techniques, optimization algorithms will remain in demand for decades to come. By simulating three different behaviors of birds during foraging, a new algorithm is introduced that can be divided into three phases. In the first phase, the algorithm uses a spiral path to conduct an extensive search, highlighting its exploration ability. In the second phase, the algorithm mimics the territorial behavior of birds, emphasizing its exploitation ability. In the last phase, the algorithm simulates the cognitive behavior of birds, improving its overall search efficiency. In total, 13 classical benchmark functions and the 30 modern complex benchmark functions of the CEC2014 are used to test the performance of the BFS. The results demonstrate that the BFS performs significantly better than other classical and state-of-the-art algorithms. Therefore, the BFS has potential for handling global optimization problems.

Future work can proceed in several directions. First, binary and multiobjective versions of the BFS are worth researching. Second, different flight search behaviors, such as Lévy flight, could be integrated into the BFS to improve its performance. Finally, applying the BFS to engineering design problems is of considerable practical significance.