1 Introduction

Solving optimization problems is the norm in almost all disciplines of engineering [1, 2] and science [3, 4], and the need for more robust solutions is ever increasing. This means we need capable algorithms that can fit the intricate nature of up-to-date scientific and engineering challenges. A survey of the literature reveals a wide variety of existing optimization methods [5]. These range from traditional techniques based on linear and nonlinear programming [6] to the newer nature-inspired meta-heuristics [7, 8], each with their own strengths and weaknesses. Despite their success in solving well-known optimization problems [2, 9], traditional algorithms suffer from an inherent dependency on gradient information and from the need for a promising initial starting vector within the search space [2, 9]. The existing nature-inspired meta-heuristic optimizers, on the other hand, are highly problem dependent: they may be very successful on certain problems yet fail to provide satisfactory solutions for others [10]. This is partly ascribed to the common tendency of these meta-heuristics to become trapped in local or suboptimal solutions [11].

What makes contemporary optimization problems hard can be summarized in a few points. Such problems are very likely nonlinear, involve numerous decision variables, and their objective functions are, in some cases, complex, bound by several constraints and characterized by many peaks [12]. For these problems, it is important to start from an encouraging initial point in the hope of finally reaching the global optimum. After many years of research, the community has found that traditional methods are often not the best choice for solving these contemporary optimization problems [13]. More specifically, the obtained solution must be accurate enough to be accepted, and the time needed to solve the problem should fall within reasonable limits [14]. To this end, researchers have turned their attention toward nature-inspired meta-heuristics, which have shown very encouraging capabilities in dealing with intricate forms of challenging optimization problems [15, 16]. Meta-heuristic techniques are global optimization methods designed on the basis of simulations and mechanisms inspired by nature and are widely applied to global optimization problems [17, 18].

In contrast to traditional algorithms, meta-heuristic methods have become strikingly popular. This reputation stems from the fact that these methods are very flexible, do not require gradient information and have proven successful in escaping from local minima when solving real-world scientific or engineering problems that have several local solutions [11, 19]. The first and second merits follow from the fact that meta-heuristics tackle optimization problems as black boxes: they only require knowledge of the inputs and outputs of the problem, so there is no need to compute derivatives over the search space. Also, meta-heuristics belong to the family of stochastic optimization methods, in that they make use of stochastic operators. This feature has been broadly confirmed: meta-heuristics have proven successful in staying away from local minima when addressing real problems that often have a large number of them [11, 18]. This explains their suitability for challenging optimization problems in diverse domains [20, 21]. More concretely, meta-heuristics have been harnessed to tackle hard real-life problems in a variety of scientific and engineering disciplines. Examples of such domains encompass, but are not limited to, image processing [22, 23], signal processing [24], process control [25], text clustering [26], classification problems [27] as well as several other domains [28, 29].

1.1 Motivations of the work

According to the “no-free-lunch” (NFL) theorem [30], no single meta-heuristic algorithm can be expected to solve all possible optimization problems [31]. In practice, one meta-heuristic algorithm might do a good job of optimizing certain problems in particular fields but fail to find the global optima in others [11]. This has motivated researchers in this field, ourselves included, to look for new and innovative nature-inspired methods that achieve superior scores on current and new hard real-life problems [32]. The door is still open, and here we present a novel meta-heuristic algorithm based on human behavior, with the famous tale of Ali Baba and the forty thieves as our inspiration, targeting numerical optimization problems.

1.2 Contributions of the work

The core of this paper is to establish a novel nature-inspired algorithm, referred to as Ali Baba and the forty thieves (AFT), to solve global optimization problems. As its name suggests, AFT falls into the category of human-based algorithms, as it is inspired by human interactions and conduct in a human-related story. The behavior of the thieves in the tale of Ali Baba and the forty thieves in pursuing Ali Baba, and the intelligent methods that Ali Baba’s maid used to save him from the thieves, inspired us to simulate this behavior in an optimization algorithm. This anecdote contains behavior that has many similarities with optimization processes. From the point of view of optimization, the thieves are the search agents, the environment (i.e., the town of Ali Baba) is the search space, each position in the town corresponds to a feasible solution, the home of Ali Baba is the objective function and Ali Baba is the global solution. Based on these similarities, the AFT algorithm was developed to mimic the behavior of the thieves and the maid in order to locate the global solution of the considered optimization problems. The performance of AFT was evaluated on sixty-two benchmark test functions, and it was applied to optimize the designs of five engineering problems.

Section 2 presents the literature and related works. Section 3 shows the tale of Ali Baba and the forty thieves and the key concepts of this tale. Section 4 presents the mathematical models and analysis of the AFT method. Some of the possible expansions of AFT from several aspects are given in Sect. 5. Section 6 then presents a conceptual comparison of AFT with other existing optimizers. The experimental, qualitative and statistical analysis results are introduced in Sect. 7. Section 8 presents the applicability and reliability of AFT in solving five engineering problems. The conclusion comments and some further research paths are shown in Sect. 9.

2 Related works

This section looks at the most recent developments in the field of optimization. This field has many branches, such as multi-objective, single-objective, constrained optimization and others [33]. Since the meta-heuristic algorithm proposed in this work is aimed at single-objective optimization problems, the main focus of this section is on the relevant works in single-objective optimization.

2.1 Single-objective optimization problems

In single-objective optimization problems, there is only one objective to be maximized or minimized. This kind of optimization might be subject to a set of constraints, which fall into two categories: equality and inequality [11]. Without loss of generality, single-objective optimization problems can be expressed as either minimization or maximization problems. The search space is defined by a set of variables, an objective, the ranges of the variables and the constraints. For low-dimensional problems, the search space can be plotted in a Cartesian coordinate system and its shape observed. Having a large number of decision variables is the first challenge when addressing optimization problems. The ranges of the variables bound the search space and can differ from one variable to another. The variables can be discrete or continuous, creating either a discrete or a continuous search space. In the first case, there is a finite set of points between any two points in the search space, while in the second case there is an infinite number of points between every two points [11].
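As a concrete illustration (a standard textbook formulation, not quoted from any specific reference), a constrained single-objective minimization problem over these elements can be stated as:

```latex
\begin{aligned}
\min_{x \in \mathbb{R}^{d}} \quad & f(x) \\
\text{subject to} \quad & g_{k}(x) \le 0, \quad k = 1, \ldots, p
  && \text{(inequality constraints)} \\
& h_{l}(x) = 0, \quad l = 1, \ldots, q
  && \text{(equality constraints)} \\
& l_{j} \le x_{j} \le u_{j}, \quad j = 1, \ldots, d
  && \text{(variable ranges)}
\end{aligned}
```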

Usually, an optimization method might begin with an initial range and extend it during optimization. The constraints restrict the search space even further and typically create gaps in it, because solutions in the excluded areas are not acceptable when solving an optimization problem. A set of constraints can even divide the search space into several disconnected regions. Solutions that violate the constraints are called infeasible, while solutions that satisfy them are called feasible; the corresponding portions of the search space are referred to as the feasible and infeasible regions. A constrained search space can render an optimization method ineffective despite its sensible performance on an unconstrained search space [34]. Thus, optimization methods must be equipped with adequate operators to deal effectively with the constraints [34]. Another challenge that arises when tackling optimization problems is the existence of local solutions.

In a single-objective search space, there is usually one global optimal solution that returns the best objective value. However, there are normally several other solutions that yield values close to that of the global optimum [33]. Such a solution is called a local solution, as it is the best solution only when considering the search space in its vicinity; it is not the best solution globally when the whole search space is taken into account. The existence of local solutions causes many optimization algorithms to become trapped in them [8]. A real search space generally contains a large number of local solutions, so an optimization method must be able to efficiently avoid them in order to find the global optimum. However, an algorithm that is able to escape local solutions is not necessarily capable of converging to the global optimum; avoiding local solutions only yields the approximate position of the global optimum. Convergence speed is a further difficulty when solving optimization problems [8]. Needless to say, overly rapid convergence leads to stagnation in local optima. In contrast, abrupt variations in the solutions help avoid local optima but slow down the convergence toward the global optimum. This trade-off is among the key challenges that an optimization algorithm must handle. There are other kinds of difficulties in a single-objective search space, such as isolation of the optimum, dynamic objective functions and many more [11]. Each of these challenges demands special attention. These concepts are outside the scope of this paper, so interested readers are referred to the study conducted by Boussaid [33].
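To make the notion of local solutions concrete, consider the Rastrigin function, a standard multimodal benchmark (our illustrative example, not one discussed above): its global minimum at the origin is surrounded by a grid of local minima that can trap a purely local search.

```python
import numpy as np

def rastrigin(x):
    """Rastrigin function: f(x) = 10 d + sum(x_j^2 - 10 cos(2 pi x_j)).
    The global minimum is f(0, ..., 0) = 0; every near-integer grid
    point lies close to a local minimum."""
    x = np.asarray(x, dtype=float)
    return 10 * x.size + float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))

print(rastrigin([0.0, 0.0]))                # 0.0 (global optimum)
print(round(rastrigin([0.995, 0.0]), 2))    # 0.99, a nearby local minimum
```

A greedy descent started near (1, 0) would settle at the local minimum with value near 1 and never reach the global optimum at the origin.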

2.2 Single-objective optimization algorithms

In the literature, optimization algorithms can be split into two broad categories:

  • Deterministic algorithms: these algorithms always locate the same solution for a particular problem if they commence from the same starting point. Their main merit is reliability, as they assuredly find a solution in each run. However, stagnation in local optima is a flaw, since these algorithms typically contain no random behavior when solving optimization problems.

  • Stochastic algorithms: these algorithms benefit from stochastic operators. As a result, they may find a different solution in each run even if the starting point remains unaltered, which makes stochastic methods less reliable than deterministic ones. However, the stochastic behavior has the power to avoid local optima. The reliability of stochastic algorithms can be boosted by raising the number of runs. Stochastic methods fall into two classes:

2.2.1 Individualist algorithms

The stochastic methods in this class start and carry out optimization with a single solution. This solution is randomly changed and enhanced for a predefined number of steps or until a termination criterion is met. The most well-regarded algorithms in this class are Tabu search [35], hill climbing [36] and iterated local search [37]. The chief feature of the algorithms in this set is the low computational effort and the need for few function evaluations.

2.2.2 Collective algorithms

Collective techniques generate and evolve multiple random solutions during optimization. Usually, the collection of solutions collaborates to better identify the global optimum in the search domain. Multiple solutions reduce the chance of stagnating in local optima [38], which is a key merit of these algorithms. However, each of the solutions requires a function evaluation at every iteration, and building efficient cooperation between the solutions is a challenge. Despite these two drawbacks, collective stochastic optimization methods are widely used for optimization problems [11], owing to their desirable features.

Irrespective of the distinctions between collective algorithms, they all pursue the same course of action for finding the global optimum. Optimization first begins with a pool of random solutions, which are combined and changed randomly, quickly and abruptly. This causes the solutions to move globally. This stage is called exploration of the search space, because the abrupt changes attract the solutions toward various areas of the search space [38]. After sufficient exploration, the solutions begin to change sparingly and move locally around the most promising solutions of the search space in order to raise their quality. This phase is called exploitation, and its key aim is to enhance the precision of the best solutions obtained in the exploration phase [39]. Although avoidance of local optima may still occur in the exploitation phase, the coverage of the search area is not as broad as during exploration; in this case, the solutions evade only local solutions in the proximity of the global optimum. We can thus deduce that the exploration and exploitation phases pursue conflicting goals [40], so most methods require the search agents to transition smoothly from exploration to exploitation using adaptive strategies. A convincing recommendation for good performance is to achieve an adequate balance between the two [38]. Owing to their random behavior, meta-heuristics can be deemed stochastic collective algorithms [21, 41]. Continuing from this last point, any meta-heuristic algorithm can fall into one of the following classes:

  • Physics-Based (PB) algorithms: these methods draw on physical principles observed on Earth in particular, or in our universe at the broadest level. The general technique of PB methods differs from other meta-heuristics’ mechanisms, because the search agents of these methods move around the search space according to physical rules grounded in physical processes. Some of the most prominent examples of PB algorithms include the simulated annealing (SA) algorithm [42, 43], gravitational search algorithm (GSA) [44], multi-verse optimizer (MVO) [45], Henry gas solubility optimization (HGSO) [46] and equilibrium optimizer (EO) [47].

  • Evolutionary Algorithms (EAs): EAs follow the Darwinian theory of natural selection, i.e., survival of the fittest, and simulate the well-known evolution mechanisms in biology. These methods often do well at finding near-optimal solutions because they make no assumptions about the underlying fitness landscape. The list of EAs includes, but is not limited to, evolutionary strategy (ES) [48], genetic algorithm (GA) [49, 50], genetic programming (GP) [51] and the differential evolution (DE) algorithm [52].

  • Swarm Intelligence (SI) algorithms: these algorithms use the intelligence of the social collective behavior of various societies of creatures such as birds, bees, ants and the like. This class includes a large variety of algorithms such as particle swarm optimization (PSO) [53], ant colony optimization (ACO) [54], the artificial bee colony (ABC) algorithm [55], grey wolf optimizer (GWO) [56], dragonfly algorithm (DA) [57], salp swarm algorithm [11, 58], coral reefs optimization (CRO) [59] and many others [7, 8].

  • Human-based algorithms: the algorithms of this class originate from human interactions in societies. The inspiration for researchers in this realm comes from experiences and stories related to human conduct and human actions [60]. Previous works in this area include harmony search (HS) [61], the seeker optimization algorithm [62], Human Group Formation (HGF) [63], the Social-Based Algorithm (SBA) [64], the Interior Search Algorithm (ISA) [65] and the football game inspired algorithm (FGIA) [66].

3 Inspiration

The proposed AFT algorithm is based on the well-known tale of Ali Baba and the forty thieves. We have found in this anecdote several intrinsic traits that inspired us to develop the AFT algorithm. Rather than literally retelling the story in this section, we prefer to link some of the events that took place in the tale to the attributes that constitute the AFT algorithm. The tale itself can be found in several books as well as web pages. We refer the interested readers to [67], as an example.

The nature of the tale is search-based: a gang of forty thieves goes after Ali Baba. The ultimate goal of the gang is to catch Ali Baba for revenge and to get their treasure back. The search carried out by the gang for Ali Baba is iterative in nature, proceeding in several rounds, each time reinforcing the solution found in previous iterations. The search is based upon the collective behavior of the thieves, represented as a population in the proposed algorithm. The countermeasures taken by the main character of the tale, named Marjaneh, prevented the gang at each iteration from fulfilling their search mission. The big town where Ali Baba lives represents the search space. The tale shows the success of the forty thieves in tracking down Ali Baba and spotting the location of his house. The looters’ successful actions in achieving their strategic target and finding Ali Baba were based on smart tricks and tactics. However, the acumen of Ali Baba’s savvy maid, Marjaneh, saved Ali Baba’s life on each of these occasions, as summarized below [68].

In the first trial, the gang’s assistant captain disguised himself as a foreigner and entered the town in the hope to hear any talk or find any clue that could lead him to Ali Baba. He managed to get to Ali Baba’s house and marked his door with an ‘X’ sign in his pursuit to later fulfill the ultimate mission of the gang. Marjaneh observed the mark and, in response, she placed similar marks on the doors of all houses in the neighborhood, rendering the plan useless. The tactics followed by the robber are harnessed in the proposed algorithm to maximize the exploration efficiency.

The second trial took place when another assistant of the captain took over the mission. He built upon the procedures previously followed by his comrade. In our algorithm, this is mirrored by making the search in every iteration build upon the best solution found so far in previous iterations. This time, the robber marked Ali Baba’s house with a sign that was not easy to observe by chance. In AFT, this again reflects the utilization and enhancement of previously found solutions [69]. This leads to strong exploration and exploitation features in the proposed algorithm.

The third incident occurred when the captain decided to change the plan and take upon himself the task of capturing Ali Baba, while building upon the achievements attained so far by his two assistants. The captain and his followers succeeded in arriving at Ali Baba’s house, and they took every possible measure not to be discovered while attacking Ali Baba, but they failed.

The final attempt at targeting Ali Baba was made by the captain alone with a totally new plan: he disguised himself as a merchant selling silks and introduced himself to the son of Ali Baba. The perseverance and persistence of the gang’s captain are good traits for a successful search technique.

The approaches adopted in the tale, such as the attempts of the thieves and Marjaneh’s intelligence in disrupting these attempts, are reflected in the exploration and exploitation mechanisms built into the proposed algorithm. They led to the mathematical models developed to design AFT and perform optimization. The proposed algorithm is described in detail below.

4 Ali Baba and the forty thieves algorithm

The overall goal of this work is to present a new optimization method that imitates the tale of Ali Baba and the forty thieves as a coordinated model of human social behavior. The following principles derived from this tale form the basic assumptions of the algorithm:

  • The forty thieves collaborate in a group and get guidance from someone or from one of the thieves to find Ali Baba’s house. This information may or may not be correct.

  • The forty thieves will travel a distance starting from an initial distance until they can locate Ali Baba’s house.

  • Marjaneh can deceive the thieves many times with astute tricks, protecting Ali Baba from them with a certain probability.

The behaviors of the thieves and Marjaneh can be drawn up in such a manner that they can be linked to an objective function to be optimized. This makes it feasible to evolve a new meta-heuristic algorithm as detailed below.

4.1 Random initialization

The AFT algorithm is initiated by randomly initializing the positions of n individuals in a d-dimensional search space, as shown below:

$$\begin{aligned} x = \begin{bmatrix} x_{1}^{1} & x_{2}^{1} & x_{3}^{1} & \ldots & x_{d}^{1} \\ x_{1}^{2} & x_{2}^{2} & x_{3}^{2} & \ldots & x_{d}^{2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{1}^{n} & x_{2}^{n} & x_{3}^{n} & \ldots & x_{d}^{n} \end{bmatrix} \end{aligned}$$
(1)

where x is the position of all thieves, d is the number of variables of a given problem and \(x_j^i\) represents the jth dimension of the ith thief.

The initial positions of the population (i.e., the thieves) can be generated as shown in Eq. 2.

$$\begin{aligned} x_{j}^{i} = l_{j} + r \times (u_{j} - l_{j}) \end{aligned}$$
(2)

where \(x_j^i\) is the jth dimension of the position of the ith thief, which denotes a candidate solution to the problem, \(l_{j}\) and \(u_{j}\) refer to the lower and upper bounds of the jth dimension, respectively, and r is a uniformly distributed random number in the range from 0 to 1, drawn independently for each dimension.
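The initialization of Eqs. 1 and 2 can be sketched as follows (a minimal NumPy sketch; the function name and the fixed seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def init_positions(n, d, lower, upper):
    """Eq. (2): each entry is l_j + r * (u_j - l_j), with r ~ U(0, 1)
    drawn independently for every thief i and dimension j, yielding
    the n x d position matrix of Eq. (1)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    r = rng.random((n, d))
    return lower + r * (upper - lower)

x = init_positions(n=40, d=3, lower=[-10, -10, -10], upper=[10, 10, 10])
print(x.shape)                                   # (40, 3)
print(bool(np.all((x >= -10) & (x <= 10))))      # True
```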

The wit level of Marjaneh with respect to all thieves can be initialized as shown below:

$$\begin{aligned} m = \begin{bmatrix} m_{1}^{1} & m_{2}^{1} & \ldots & m_{d}^{1} \\ m_{1}^{2} & m_{2}^{2} & \ldots & m_{d}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ m_{1}^{n} & m_{2}^{n} & \ldots & m_{d}^{n} \end{bmatrix} \end{aligned}$$
(3)

where \(m_j^i\) denotes the astute level of Marjaneh in relation to the ith thief at the jth dimension.

4.2 Fitness evaluation

The values of the decision variables are inserted into a user-defined fitness function that is evaluated for each thief’s position. The corresponding fitness values are stored in an array as given in the following form:

$$\begin{aligned} f = \begin{bmatrix} f_{1}(x_{1}^{1}, x_{2}^{1}, \ldots, x_{d}^{1}) \\ f_{2}(x_{1}^{2}, x_{2}^{2}, \ldots, x_{d}^{2}) \\ \vdots \\ f_{n}(x_{1}^{n}, x_{2}^{n}, \ldots, x_{d}^{n}) \end{bmatrix} \end{aligned}$$
(4)

where \(x_{d}^{n}\) is the dth dimension of the position of the nth thief.

In the simulation of the AFT algorithm, the solution quality of each thief’s new location is evaluated using the defined fitness function. The thief’s location is then updated if the new solution is better than the current one; otherwise, the thief stays in his current location.
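This greedy replacement rule can be sketched as follows (minimization assumed; the helper name is ours):

```python
import numpy as np

def greedy_update(pos, new_pos, fitness_fn):
    """Accept each thief's new location only where it improves (lowers)
    the fitness; otherwise the thief stays at its current location."""
    old_f = np.apply_along_axis(fitness_fn, 1, pos)
    new_f = np.apply_along_axis(fitness_fn, 1, new_pos)
    keep_new = new_f < old_f
    return np.where(keep_new[:, None], new_pos, pos)

sphere = lambda v: float(np.sum(v**2))
pos = np.array([[3.0, 4.0], [1.0, 1.0]])
cand = np.array([[1.0, 1.0], [2.0, 2.0]])   # first improves, second worsens
updated = greedy_update(pos, cand, sphere)
print(updated)   # the improved row is replaced, the worse one is kept
```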

4.3 Proposed mathematical model

As discussed above, three fundamental cases may occur while the thieves search for Ali Baba. In each case, it is assumed that the thieves search efficiently throughout the surrounding environment, while with some probability Marjaneh’s intelligence forces the thieves to search in random locations. This searching behavior can be mathematically modeled as follows:

Case 1 The thieves may track down Ali Baba with the help of information obtained from someone. In this case, the new locations of the thieves can be obtained as follows:

$$\begin{aligned} x^{i}_{t+1} = gbest_{t} + \Bigl[ Td_{t} \left( best^{i}_{t} - y^{i}_{t}\right) r_{1} + Td_{t} \left( y^{i}_{t} - m^{a(i)}_{t} \right) r_{2} \Bigr] \mathrm{sgn}(rand - 0.5); \quad r_{3} \ge 0.5, \;\; r_{4} > Pp_{t} \end{aligned}$$
(5)

where \(x^{i}_{t+1}\) represents the position of thief i at iteration \(t+1\), \(y^{i}_{t}\) is the position of Ali Baba in relation to thief i at iteration t, \(best^i_t\) represents the best position achieved so far by thief i up to iteration t, \(gbest_t\) represents the global best position obtained so far by any thief up to the tth iteration, \(m^{a(i)}_{t}\) represents Marjaneh’s intelligence level used to camouflage thief i at iteration t, \(Td_{t}\) is the tracking distance of the thieves at iteration t, \(Pp_{t}\) denotes the perception potential of the thieves with respect to Ali Baba at iteration t, rand, \(r_{1}\), \(r_{2}\) and \(r_{4}\) are random numbers generated with a uniform distribution between 0 and 1, the condition \(r_{3} \ge 0.5\) evaluates to 1 or 0 to indicate whether the information obtained by the thieves is true or false, respectively, and \(\mathrm{sgn}(rand-0.5)\) gives either 1 or −1 to change the direction of the search process.

The parameter a in \(m^{a(i)}_{t}\) can be defined as follows:

$$\begin{aligned} a = \lceil (n-1) \cdot \mathrm{rand}(n,1) \rceil \end{aligned}$$
(6)

where rand(n, 1) represents a vector of random numbers generated with a uniform distribution in the range of [0, 1].

Marjaneh updates her astute plans if the quality of the new solution that the thieves come up with is better than their previous position. In this case, Eq. 7 is used to update her plans.

$$\begin{aligned} m^{a(i)}_{t} = {\left\{ \begin{array}{ll} x^{i}_{t} & \text {if } f\left( x^{i}_{t}\right) \ge f\left( m^{a(i)}_{t}\right) \\ m^{a(i)}_{t} & \text {if } f\left( x^{i}_{t}\right) < f\left( m^{a(i)}_{t}\right) \end{array}\right. } \end{aligned}$$
(7)

where \(f(\cdot )\) stands for the score of the fitness function.
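As a hedged sketch, the Case 1 update of Eq. 5 for a single thief can be written as below. Whether rand, \(r_1\) and \(r_2\) are scalars or per-dimension draws is not fixed above; per-dimension draws for \(r_1\) and \(r_2\) and a scalar sign are assumed here, and the variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def case1_step(gbest, best_i, y_i, m_a, Td):
    """Eq. (5): a move around the global best, pulled toward the thief's
    own best position and away from Marjaneh's camouflage plan m_a,
    scaled by the tracking distance Td and flipped by a random sign."""
    d = gbest.size
    r1, r2 = rng.random(d), rng.random(d)
    sgn = 1.0 if rng.random() >= 0.5 else -1.0   # sgn(rand - 0.5)
    return gbest + (Td * (best_i - y_i) * r1 + Td * (y_i - m_a) * r2) * sgn

new = case1_step(gbest=np.zeros(3), best_i=np.ones(3),
                 y_i=np.full(3, 0.5), m_a=-np.ones(3), Td=0.8)
print(new.shape)   # (3,)
```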

The tracking distance parameter \({\rm Td}_{t}\) is defined as given in Eq. 8.

$$\begin{aligned} Td_{t} = \alpha _0\, e^{-\alpha _1 (t/T)^{\alpha _1}} \end{aligned}$$
(8)

where t and T denote the current and maximum number of iterations, respectively, \(\alpha _0\) (\(\alpha _0=1\)) represents the initial estimate of the tracking distance at the first iteration and \(\alpha _1\) is a constant value used to manage exploration and exploitation capabilities.

Equation 8 shows that \({\rm Td}_{t}\) is iteratively updated during the course of iterations of the AFT algorithm. Figure 1 shows the values of \({\rm Td}_t\) over 1000 iterations.

Fig. 1: Proposed exponential iterative function for \(Td_{t}\)

The tracking distance, as shown in Fig. 1, greatly affects the search ability, which has a large impact on both the exploration and exploitation power of the AFT algorithm. As presented in Fig. 1, the parameter \(Td_t\) starts from a value of 1.0 and decays to its lowest value, at which point it is assumed that the thieves have arrived at Ali Baba’s house. Large values of \(Td_t\) result in a global search oriented toward further exploration, which helps avoid local optimal solutions. On the other hand, small values of \(Td_t\) lead to a local search, which increases the exploitation ability of AFT so that the thieves have a good chance of finding Ali Baba.

Similarly, the perception potential parameter \(Pp_{t}\) is defined as given in Eq. 9.

$$\begin{aligned} Pp_{t} = \beta _0 \log \left( \beta _1 (t/T)^{\beta _0}\right) \end{aligned}$$
(9)

where \(\beta _0\) (\(\beta _0=0.1\)) represents a final rough estimation of the probability that the thieves will realize their target at the end of the iterative process of AFT and \(\beta _1\) is a constant value used to manage exploration and exploitation capabilities.

Figure 2 shows the values of \({\rm Pp}_{t}\) over 1000 iterations.

Fig. 2: Proposed logarithmic iterative function for \(Pp_{t}\)

As shown in Fig. 2, by gradually increasing the value of \(Pp_t\), AFT tends to move from a global search to a local search in the most promising areas, where a potential solution could be found. In other words, large values of \(Pp_t\) lead to a local search that intensifies the search in the most appropriate areas of the search space. In contrast, small values reduce the possibility of searching in the vicinity of the current good solutions; a small value thus stimulates AFT to explore the search space on a global scale and to diversify the search across all areas of the search space.

For all of the problems solved in this work, \(\alpha _1\) and \(\beta _1\) are both set to 2.0. These values were found by experimental testing on a large subset of test functions.
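With these settings, the two schedules of Eqs. 8 and 9 can be computed directly (a short sketch; the function names are ours):

```python
import numpy as np

def Td(t, T, a0=1.0, a1=2.0):
    """Eq. (8): tracking distance, decaying from about a0 at the start
    to a0 * exp(-a1) at the final iteration."""
    return a0 * np.exp(-a1 * (t / T) ** a1)

def Pp(t, T, b0=0.1, b1=2.0):
    """Eq. (9): perception potential, growing toward b0 * log(b1)
    as t approaches T."""
    return b0 * np.log(b1 * (t / T) ** b0)

T = 1000
print(round(Td(1, T), 4))    # 1.0    (near a0 at the first iteration)
print(round(Td(T, T), 4))    # 0.1353 (= exp(-2) at the last iteration)
print(round(Pp(T, T), 4))    # 0.0693 (= 0.1 * log(2))
```

Note that \(Td_t\) is monotonically decreasing while \(Pp_t\) is monotonically increasing, matching the exploration-to-exploitation transition described above.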

Case 2 The thieves may grasp that they have been deceived, so they will randomly explore the search space for Ali Baba. In this case, the new locations of the thieves can be obtained as follows:

$$\begin{aligned} x^{i}_{t+1}= {\rm Td}_{t} \left[ \left( u_j - l_j\right) {\rm rand} + l_j\right] ; \;\; r_{3} \ge 0.5,\; r_{4} \le {\rm Pp}_{t} \end{aligned}$$
(10)

The parameter \({\rm Td}_t\) is incorporated in Eq. 10 because the thieves have a good level of knowledge for discerning the most propitious areas of the search space where Ali Baba’s house could be.

Case 3 In order to ameliorate the exploration and exploitation features of the AFT algorithm, this study also considers the search in other positions than those that could be obtained using Eq. 5. In this case, the new locations of the thieves can be obtained as follows:

$$\begin{aligned} x^{i}_{t+1} = gbest_{t} - \Bigl[ Td_{t} \left( best^{i}_{t} - y^{i}_{t}\right) r_{1} + Td_{t} \left( y^{i}_{t} - m^{a(i)}_{t} \right) r_{2} \Bigr] \mathrm{sgn}(rand - 0.5); \quad r_{3} < 0.5 \end{aligned}$$
(11)

The pseudo-code of the AFT algorithm can be briefly described by the iterative steps given in Algorithm 1.

Algorithm 1: Pseudo-code of the AFT algorithm

Algorithm 1 shows that AFT begins solving an optimization problem by randomly generating a set of positions (i.e., potential solutions), considering the upper and lower bounds of the problem variables. After that, the best positions, the global best position of the thieves and Marjaneh’s wit plans are initialized. The quality of each created solution is assessed using a pre-defined fitness function, and the fitness of each solution is recalculated within each iteration in order to identify the thief with the best solution. For each dimension, the new position of the thieves is computed iteratively within each iteration using Eqs. 5, 10 and 11. The feasibility of each new position is examined to see whether it moves out of the search area; in such a case, it is brought back to the boundary, following the simulated steps of AFT. Then, the new positions, the best positions, the global best position of the thieves and the wit plans of Marjaneh are assessed and updated accordingly. All the steps of AFT shown in Algorithm 1, except the initialization steps, are performed iteratively until the termination condition is reached. At the end, the best position of the thieves is reported as the solution of the optimization problem.
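The steps above can be gathered into a compact sketch (minimization assumed). This is our reading of the algorithm, not the authors’ reference implementation: the identification of \(y^i_t\) with the thief’s current position, the plan-update rule and the boundary clamping are assumptions, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def aft(fitness, lb, ub, n=40, d=2, T=500):
    lb, ub = np.full(d, lb, float), np.full(d, ub, float)
    x = lb + rng.random((n, d)) * (ub - lb)            # Eq. (2): thieves
    m = lb + rng.random((n, d)) * (ub - lb)            # Eq. (3): wit plans
    best = x.copy()                                    # per-thief best
    best_f = np.array([fitness(p) for p in x])
    g_i = int(np.argmin(best_f))
    g, g_f = best[g_i].copy(), float(best_f[g_i])      # global best
    for t in range(1, T + 1):
        Td = np.exp(-2.0 * (t / T) ** 2.0)             # Eq. (8)
        Pp = 0.1 * np.log(2.0 * (t / T) ** 0.1)        # Eq. (9)
        for i in range(n):
            a = rng.integers(n)                        # Eq. (6): plan index
            r1, r2 = rng.random(d), rng.random(d)
            r3, r4 = rng.random(), rng.random()
            sgn = 1.0 if rng.random() >= 0.5 else -1.0
            step = (Td * (best[i] - x[i]) * r1
                    + Td * (x[i] - m[a]) * r2) * sgn
            if r3 >= 0.5 and r4 > Pp:                  # Case 1, Eq. (5)
                new = g + step
            elif r3 >= 0.5:                            # Case 2, Eq. (10)
                new = Td * ((ub - lb) * rng.random(d) + lb)
            else:                                      # Case 3, Eq. (11)
                new = g - step
            new = np.clip(new, lb, ub)                 # keep positions feasible
            f_new = fitness(new)
            if f_new < fitness(x[i]):                  # greedy replacement
                m[a] = x[i].copy()                     # Eq. (7), our reading
                x[i] = new
                if f_new < best_f[i]:
                    best[i], best_f[i] = new.copy(), f_new
                    if f_new < g_f:
                        g, g_f = new.copy(), f_new
    return g, g_f

sphere = lambda v: float(np.sum(np.asarray(v) ** 2))
best_x, best_f = aft(sphere, -10, 10)
print(best_f)   # a small value near 0 on this convex test function
```

On a convex function such as the sphere, the greedy acceptance makes the best fitness monotonically non-increasing, so the sketch settles close to the origin after a few hundred iterations.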

4.4 Exploration ability of AFT

There are many parameters in AFT that lead to exploration, explained as follows:

  • \(Td_{t}\): this parameter controls the amount of exploration in AFT. It determines how close the new locations of the thieves are to the house of Ali Baba. Selecting appropriate values of \(\alpha _0\) and \(\alpha _1\) for \(Td_t\) reduces the probability of stagnation in local optima and increases the probability of approaching the global optimum. Based on experimental tests, \(\alpha _{0} = 1\) and \(\alpha _1 = 2\) offer a good balance between exploration and exploitation.

  • \(Pp_{t}\): this parameter underlines the high exploration capacity of AFT when it takes relatively small values, and it is gradually increased during the iterative process. The choice of the values of \(\beta _0\) and \(\beta _1\) in \(Pp_{t}\) is somewhat arbitrary, but was made based on pilot testing over a large set of test functions. In the initial iterations, the candidates are all far away from each other. Updating the parameter \(Pp_{t}\) improves AFT's ability to avoid stagnation in local optima and to approach the global optimum. Based on empirical testing, \(\beta _0 = 0.1\) and \(\beta _1 = 2\) provide a good balance between exploration and exploitation.

  • \(\mathop {\mathrm {sgn}}({\rm rand}-0.5)\): this parameter manages the direction of exploration. Since rand takes values between 0 and 1 with a uniform distribution, there is an equal probability of negative and positive signs.

  • Marjaneh’s intelligence plans: using this mechanism directly improves AFT's exploration ability.

4.5 Exploitation ability of AFT

The key parameters that help to perform local search and exploitation in AFT can be described as follows:

  • \(Td_{t}\): as iterations pass, exploration fades out and exploitation fades in. Small values of \(Td_{t}\) lead to local searches in promising areas of the search space. As a result, in the last iterations, where the thieves are close to the house of Ali Baba, the position updating process with cases 1, 2 and 3 assists the local search around the best solution, leading to exploitation.

  • \(Pp_{t}\): this parameter controls the exploitation feature by quantifying the amount of in-depth search around the best solution. As iterations pass, the exploitation stage intensifies with the relatively large values this parameter takes. Thus, the position updating process with cases 1 and 2 enhances AFT's ability to intensify local searches in the space, which results in further exploitation.

  • \(\mathop {\mathrm {sgn}}({\rm rand}-0.5)\): this parameter controls the quality of exploitation by changing the direction of the search.

4.6 Computational complexity analysis

Computational complexity of an optimization method can be defined by a function that links the runtime of the method to the input size of the problem. To express this, the widespread Big-O notation is applied here. The time complexity of AFT can be given as follows:

$$\begin{aligned} {\mathcal {O}}({\rm AFT}) = {\mathcal {O}}({\rm problem\;definition}) + {\mathcal {O}}({\rm initialization}) + {\mathcal {O}}(t\,({\rm position\;update})) + {\mathcal {O}}(t\,({\rm cost\;function})) + {\mathcal {O}}(t\,({\rm boundary\;control})) \end{aligned}$$
(12)

As Eq. 12 suggests, the time complexity of AFT relies on the number of iterations (t), the number of thieves (n), the dimension of the problem (d) and the cost of the objective function (c). In concrete terms, the overall time complexity of AFT under the termination method can be computed as follows:

$$\begin{aligned} {\mathcal {O}}({\rm AFT}) = {\mathcal {O}}(1) + {\mathcal {O}}(nd) + {\mathcal {O}}(vtnd) + {\mathcal {O}}(vtnc) + {\mathcal {O}}(vtnd) \end{aligned}$$
(13)

where v denotes the number of evaluation experiments.

The number of iterations (t) is typically greater than the number of thieves (n), the cost of the fitness function (c) and the number of the problem's variables (d). Also, d and c are usually smaller than n. Accordingly, the parameters t and n are the dominant factors in assessing the computational complexity. As \(nd \ll tnd\) and \(nd \ll tcn\), the terms \({\mathcal {O}}(1)\) and \({\mathcal {O}}(nd)\) can be dropped from Eq. 13; also, \(2vtnd \cong vtnd\). Therefore, the time complexity of AFT can be expressed as follows:

$$\begin{aligned} {\mathcal {O}}({\rm AFT}) \cong {\mathcal {O}}(vtnd + vtnc) \end{aligned}$$
(14)

As shown, the time complexity of AFT is of polynomial order, so AFT can be deemed an efficient meta-heuristic optimization algorithm.

4.7 Characteristics of AFT

Human-based algorithms possess two abilities, exploration and exploitation, used to search the space of a problem. In AFT, these abilities are realized by the convergence of the thieves toward the global optimum solution. To be precise, convergence means that most of the thieves gather at the same position in the search space. AFT utilizes several parameters that drive exploration and exploitation, as explained in Subsects. 4.4 and 4.5, respectively; these parameters carry out the convergence process of AFT. The thieves (i.e., search agents) in AFT change their positions according to a mathematical model and tuning criteria implemented by three basic cases that may occur while the thieves search for Ali Baba. These cases are presented in Eqs. 5, 10 and 11. In each case, it is assumed that the thieves search the surrounding environment efficiently, while there is also a percentage of moves, due to Marjaneh's intelligence, that forces the thieves to search random locations. Two important parameters in AFT, referred to as tracking distance and perception potential, are presented in Eqs. 8 and 9, respectively. With these two parameters, AFT can better search the space of all possible solutions to identify the optimal or suboptimal ones. Another important ingredient of AFT is the simulation of Marjaneh's intelligent ways of deceiving the thieves, whereby the thieves explore the search space at different locations and in different directions, which implies that better solutions may be found in other promising areas.

In short, AFT has several distinct merits according to its basic principles, summarized as follows: (1) the position updating models for cases 1 and 3 effectively assist the individuals of the population to explore and exploit every area of the search space; (2) the random search that the thieves perform using case 2 not only enhances the diversity of the population but also maintains the speed of convergence, indicating an efficient balance between exploration and exploitation; (3) the number of parameters in AFT is small, yet they have a good ability to improve its strength and performance; (4) the computational burden of AFT is low, as discussed in Subsect. 4.6.

Consequently, there is considerable room for enhancing the performance of AFT based on the above-mentioned characteristics, as presented in the following section.

5 Possible developments of AFT

To further study the potential performance of the AFT algorithm, it is elaborated from several aspects as shown in the following subsections.

5.1 Self-adaptation of tracking distance of AFT

Self-adaptive tracking distance is used to strike a better balance between exploration and exploitation during the search process [70]. This distance decreases as a function of time, indicating that the exploration capacity gradually fades out while the exploitation capacity fades in. However, search agents trapped in a local optimum require reasonable exploration to escape from it. Some search agents require a large tracking distance to explore the search space, while others exploit a local area with a small tracking distance. It is therefore imperative for each search agent to have its own tracking distance to balance exploration and exploitation. When the fitness value of a search agent worsens or stays unaltered, it denotes that the search agent has likely identified a local optimal area; in this case, the search agent requires a large tracking distance to move away from that area. When the fitness value of a search agent improves, it indicates that the search agent has a superior chance of getting close to the optimal solution. Hence, the tracking distance, Td(t), of the search agents of AFT should be increased. Figure 3 presents an illustration of the self-adaptive tracking distance [70].

Fig. 3 Illustration of self-adaptive tracking distance in AFT [70]

In Fig. 3, the pentagram, sphere, circle and arrowhead stand for the global optimum area, the local optimum area, a search agent and the tracking distance, respectively. Case 1 shows that if a search agent is trapped in a local optimal area, it needs an appropriate tracking distance to strengthen its ability to escape from this area. Case 2, on the other hand, indicates that a search agent rapidly moves to the global optimal area with an increasing tracking distance. In this context, to estimate the situation of the search agents in AFT, the counters \(ns_i\) and \(nf_i\) are presented in Eqs. 15 and 16, respectively, to record the fitness results of the ith search agent [70].

$$\begin{aligned} ns_{i}^{t} = {\left\{ \begin{array}{ll} ns_{i}^{t-1}+1 & {\rm if}\;\; f_i(t) < f_i(t-1)\\ 0 & {\rm if}\;\; f_i(t) \ge f_i(t-1) \end{array}\right. } \end{aligned}$$
(15)
$$\begin{aligned} nf_{i}^{t} = {\left\{ \begin{array}{ll} nf_{i}^{t-1}+1 & {\rm if}\;\; f_i(t) > f_i(t-1)\\ 0 & {\rm if}\;\; f_i(t) \le f_i(t-1) \end{array}\right. } \end{aligned}$$
(16)

Equations 15 and 16 adjust the parameter Td(t) according to the fitness of the objective function, which is employed to estimate the search condition of the search agents. If a search agent fails to obtain a better solution over many iterations, it is stuck in a local optimal area with high probability. If a search agent improves over many iterations, it may be migrating toward the global optimum.
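The success and failure counters of Eqs. 15 and 16 can be sketched directly; the helper name `update_counters` and the sample fitness trace are illustrative only.

```python
def update_counters(ns, nf, f_t, f_prev):
    """Success/failure counters of Eqs. 15 and 16 for one search agent:
    ns counts consecutive improvements, nf consecutive deteriorations;
    either counter resets to 0 when its condition fails."""
    ns = ns + 1 if f_t < f_prev else 0   # Eq. 15
    nf = nf + 1 if f_t > f_prev else 0   # Eq. 16
    return ns, nf

# A steadily improving fitness trace increments ns and keeps nf at 0
ns, nf = 0, 0
for f_prev, f_t in [(5.0, 4.0), (4.0, 3.5)]:
    ns, nf = update_counters(ns, nf, f_t, f_prev)
print(ns, nf)
```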

A threshold \(\theta\) and a probability p are applied to govern the update of the tracking distance of the search agents over the course of iterations. If \(ns^t_i\) exceeds \(\theta\), the tracking distance is increased to speed up the convergence of the ith search agent toward the best search agent. Likewise, if \(nf^t_i\) exceeds \(\theta\), the tracking distance is increased to improve the ability to avoid local optima while searching. The self-adaptive \({\rm Td}_i(t)\) of the ith search agent is defined in Eq. 17:

$$\begin{aligned} {\rm Td}_{i}(t)={\left\{ \begin{array}{ll} {\rm Td}_{i}(t)\cdot r_i(t) & {\rm if}\;\; {\rm count}> \theta \;\, {\rm and}\;\, {\rm rand} < p\\ {\rm Td}_{i}(t) & {\rm otherwise} \end{array}\right. } \end{aligned}$$
(17)

where count comprises two counters \(ns^t_i\) and \(nf^t_i\), and rand is a uniformly distributed random value in the interval [0, 1].

In Eq. 17, when count exceeds \(\theta\) and rand is below p, the search agent requires a large tracking distance to strengthen its exploration capability, so the tracking distance is multiplied by \(r_i(t)\). Otherwise, the tracking distance is left at its current AFT value.

The tracking distance of the search agents is related to the distance at which the thieves follow Ali Baba, as given in Eq. 8. There is also another parameter related to the thieves' perception potential for Ali Baba. Therefore, the tracking distance and perception potential of the ith search agent are used to adapt its \({\rm Td}_i(t)\): the ratio of these two constants can be thought of as the new tracking distance value for the search agent. Here, \(r_i(t)\) stands for the adjustment factor of \({\rm Td}_i(t)\) and is presented in Eq. 18:

$$\begin{aligned} r_{i}(t)={\left\{ \begin{array}{ll} \frac{2}{c} & {\rm if}\;\; c < 1\\ c & {\rm if}\;\; c \ge 1 \end{array}\right. } \end{aligned}$$
(18)

where c is the ratio of the tracking distance and perception potential constants, as shown in Eq. 19.

$$\begin{aligned} c= \left| \log \left( \frac{{\rm Td}_i(t)}{{\rm Pp}_i(t)}\right) \right| \end{aligned}$$
(19)

where \({\rm Td}_i(t)\) and \({\rm Pp}_i(t)\) are the tracking distance and perception potential of search agent \(x_i\) at iteration t, respectively.

Equation 18 states that \(r_i(t)\) is set to \(2/c\) when c is less than 1, to ensure that the search agent obtains a reasonably large tracking distance; otherwise, \(r_i(t)\) is set to c.
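Equations 17-19 combine into a short update rule. The sketch below assumes illustrative values for \(\theta\), p and the current Td and Pp; the function name `adapt_td` is hypothetical.

```python
import math
import random

def adapt_td(td, pp, count, theta=3, p=0.5, rng=random.random):
    """Self-adaptive tracking distance of Eqs. 17-19.
    td, pp: current tracking distance and perception potential;
    count: the relevant counter (ns or nf) of the agent."""
    c = abs(math.log(td / pp))           # Eq. 19
    r = 2.0 / c if c < 1 else c          # Eq. 18
    if count > theta and rng() < p:      # Eq. 17
        return td * r                    # enlarge the tracking distance
    return td                            # keep the current value

random.seed(1)
td = adapt_td(td=0.8, pp=0.4, count=5)
print(td)
```

Here c = |log(0.8/0.4)| = log 2 < 1, so the agent's tracking distance is scaled up by 2/c.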

The original and self-adaptive tracking distance constants are presented in Figs. 1 and 4, respectively.

Fig. 4 Proposed exponential function for the self-adaptive tracking distance

The tracking distance constant in Fig. 4 changes in accordance with the search condition of the search agents, where the red and blue lines indicate the modifications of Case 1 and Case 2, respectively. A search agent is in a state of failure when its search status worsens; in contrast, it is in a state of success when its search status improves.

In sum, during the search process, various search agents are in different search situations. Some of them need vigorous exploration to explore the solution space, while others need extensive exploitation to locate a better solution. Therefore, each search agent adjusts its own Td to balance exploration and exploitation.

5.2 Population hierarchical structure of AFT

In a hierarchical population structure, the search agents are arranged in different layers according to specific characteristics [71, 72]. These layers can be viewed as levels ordered from top to bottom according to the actual effect of each layer on the search agents: the top layer leads the layer below it, which in turn leads the next one, and so on. In this light, the interactive relationship between the layers forms a hierarchical structure that guides the evolution direction of the search agents [72]. Here, a hierarchical population structure is used in the AFT algorithm so that premature convergence can be mitigated and the search agents can elicit correct interactive information to realize a better development. Further, the search agents obtain ample opportunity to escape from local optima and approach the global optimum. A three-layer hierarchical structure is constructed for hierarchical AFT (HAFT) as follows:

  • Bottom layer The distribution of all search agents in the current population is displayed on this layer. Search agents move toward better ones, namely the best search agents on the middle layer. This layer can reveal the landscape of the function sampled by a large number of search agents.

  • Middle layer To effectively guide the development of the general search agents, a predetermined number of best search agents are ranked on this layer. At each iteration, the middle layer leads the bottom layer to fulfill the position update of the search agents. Each search agent needs a large tracking distance and a small perception potential to globally explore the whole search area in the first few iterations of the search process, and these parameters are gradually updated over the iterations. This means that the exploration ability of AFT needs to be supported by a suitably large Td(t), while its exploitation capacity demands a small one. Thus, to improve the exploration ability of AFT, a new Td(t) is presented to supersede the original one given in Eq. 8. In the proposed HAFT, a log-sigmoid transfer function is used to design the new constant Td(t) with the formula given below:

    $$\begin{aligned} {\rm Td}(t) = \frac{T_{0}}{1+ e^{(t-T/2)/L}} \end{aligned}$$
    (20)

    where L is a step length and T is the maximum number of iterations. It can be observed that the value of Td(t) in the graph shown in Fig. 1 decreases rapidly before 500 iterations, indicating that the exploration ability of AFT diminishes quickly. In contrast, the value of Td(t) in Fig. 5 consistently preserves a large value before 500 iterations and then drops rapidly to near zero. This ensures a powerful exploration ability of AFT in the early stage, giving it enough time to find an approximate optimum that can be further improved by the subsequent exploitation. The number of best search agents on this layer dynamically decreases over the iterations, which helps the global optimum search agent on the top layer effectively direct the many elite search agents in the current population and provide a better development tendency for all search agents on the bottom layer.

  • Top layer To provide efficient management for the middle layer, a global optimum search agent is determined and placed on this layer. At each iteration, the pre-identified best search agents on the middle layer are compared with the global optimum search agent; if a better search agent exists, the global optimal one is replaced by it. The global optimal search agent has the best position in the current population and hence attracts many of the best search agents toward it. This strategy can prevent the best search agents from being trapped in local optima and can accelerate the convergence speed of the population. According to this strategy, the formula for updating the positions of the best search agents, oriented by the global optimal search agent, is presented as follows:

    $$\begin{aligned} x^{i}_{t+1} = g_{{\rm opt}_{t}} - \Bigl [{\rm Td}_{t} \left( y^{i}_{t} - m^{i}_{t} \right) r_{2} \Bigr ] \mathop {\mathrm {sgn}}(0.5-r); \quad r_{3} < 0.5 \end{aligned}$$
    (21)

where \(g_{\rm opt}\) denotes the global optimal search agent and r represents a random number with a uniform distribution in the interval [0, 1].

In HAFT, Eq. 21 is used to update the positions of the search agents on the top layer, which can efficaciously alleviate the early convergence of AFT and improve its performance.
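The log-sigmoid tracking distance of Eq. 20 can be sketched as follows; the midpoint is taken at T/2 with T the maximum iteration count, and the values of `T0` and `L` are illustrative assumptions, not tuned settings from the paper.

```python
import math

def td_sigmoid(t, T, T0=2.0, L=50.0):
    """Log-sigmoid tracking distance of Eq. 20 (HAFT): stays near T0 in
    the early iterations, passes T0/2 at t = T/2 and decays toward 0."""
    return T0 / (1.0 + math.exp((t - T / 2) / L))

T = 1000
early = td_sigmoid(10, T)       # early stage: strong exploration
mid = td_sigmoid(T // 2, T)     # midpoint: exactly T0/2
late = td_sigmoid(990, T)       # late stage: near-zero, exploitation
print(round(early, 3), round(mid, 3), round(late, 3))
```

The flat early plateau is what distinguishes this schedule from the rapidly decaying original Td(t) of Fig. 1.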

In order to further clarify the apparent properties of HAFT, Fig. 6 was drawn to show its operating precept on the multimodal landscape with local optima.

Fig. 5 The new tracking distance parameter Td(t)

Fig. 6 The illustrative diagrams of the operating precept of HAFT [71]

It is obvious from Fig. 6a that search agents \(x_3\) and \(x_4\) are heading toward a local optimum, while the global optimal search agent \(g_{\rm opt}\) provides additional guidance to assist them in escaping from it. When search agents \(x_3\) and \(x_4\) do not fall into premature convergence, as shown in Fig. 6b, they can draw others to move toward them. Meanwhile, \(g_{\rm opt}\) accelerates the movement of \(x_3\) and \(x_4\) in order to improve the convergence rate of the population. Thus, the capabilities of exploration and exploitation are both enhanced in the search process. More details about hierarchical population structures can be found in [71, 72].

5.3 Exploration and exploitation

Exploration and exploitation are the two most important properties of meta-heuristic algorithms for achieving success when addressing optimization problems [38]. With regard to these two concepts, empirical experiments have shown that there is a strong relationship between the exploration and exploitation abilities of a particular search method and its convergence speed. In particular, while exploitation procedures are known to improve convergence toward the global optimum, they are also known to raise the likelihood of entrapment in local optima [38]. Conversely, search strategies that promote exploration over exploitation tend to increase the likelihood of locating the areas of the search space where the global optimum is more probable to be found, at the cost of deteriorating the convergence speed of the optimization algorithm [39]. In recent years, the question of how exploration and exploitation are realized in meta-heuristics has remained open and, although it appears trivial, has stayed a source of contention among many researchers [40]. Although many thoughts and notions may sound opposed, there appears to be a common consensus within the research community that an adequate ratio between exploration and exploitation is necessary to ensure reasonable performance of this type of search method.

Meta-heuristics use a set of candidate solutions to explore the search area with the goal of finding satisfactory solutions to an optimization problem. Generally, the search agents with the superior solutions tend to guide the search process toward themselves. As a result of this attraction, the distance between the search agents shrinks and the impact of exploitation grows. Conversely, when the distance between the search agents increases, the influence of the exploration strategy is more pronounced. To quantify the increase and decrease in distance between the search agents, a diversity measurement [73] is taken into account. Under this method, population diversity is stated as follows [38]:

$$\begin{aligned} {\rm Div}_j = \frac{1}{N}\sum _{i=1}^{N} \left| {\rm median}(x^j) - x_i^j\right| \end{aligned}$$
(22)
$$\begin{aligned} {\rm Div} = \frac{1}{m}\sum _{j=1}^{m}{\rm Div}_j \end{aligned}$$
(23)

where \({\rm median}(x^j)\) represents the median of dimension j over the entire population, \(x^j_i\) is dimension j of search agent i, N stands for the number of search agents and m denotes the number of design variables of the optimization problem.

The diversity in each dimension, \({\rm Div}_j\), is the average distance between dimension j of each search agent and the median of that dimension. The balance response is defined as the percentage of exploration and exploitation utilized by a given algorithm. These values are calculated at each iteration using the following formulas [38]:

$$\begin{aligned} {\rm XPL}\% = \left( \frac{{\rm Div}}{{\rm Div}_{\rm max}}\right) \times 100 \end{aligned}$$
(24)
$$\begin{aligned} {\rm XPT}\% = \left( \frac{\left| {\rm Div}-{\rm Div}_{\rm max}\right| }{{\rm Div}_{\rm max}}\right) \times 100 \end{aligned}$$
(25)

where \({\rm Div}_{\rm max}\) stands for the maximum diversity value present in the whole optimization process.

The percentage of exploration (XPL%) corresponds to the relationship between the diversity at each iteration and the maximum diversity reached. The percentage of exploitation (XPT%) represents the level of exploitation [38]. As can be observed, the elements XPL% and XPT% are mutually conflicting and complementary. In assessing the balance response, the use of the median value averts discrepancies by providing a robust reference element. The balance response is also affected by \({\rm Div}_{\rm max}\), the maximum diversity found during the whole optimization process, which is employed as a reference to assess the rate of exploration and exploitation.
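Equations 22-25 translate into a few lines of array code. The population below and the value of \({\rm Div}_{\rm max}\) are toy inputs chosen for illustration; the function name `balance_response` is hypothetical.

```python
import numpy as np

def balance_response(pop, div_max):
    """Population diversity (Eqs. 22-23) and the exploration/exploitation
    percentages (Eqs. 24-25). pop has shape (N, m): N agents, m variables."""
    div_j = np.mean(np.abs(np.median(pop, axis=0) - pop), axis=0)  # Eq. 22
    div = float(np.mean(div_j))                                    # Eq. 23
    xpl = div / div_max * 100.0                                    # Eq. 24
    xpt = abs(div - div_max) / div_max * 100.0                     # Eq. 25
    return div, xpl, xpt

# Three agents spread along the diagonal of a 2-variable space
pop = np.array([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]])
div, xpl, xpt = balance_response(pop, div_max=2.0)
print(div, xpl, xpt)
```

As the agents cluster, Div shrinks, so XPL% falls and XPT% rises, mirroring the conflicting-and-complementary behavior described above.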

5.4 Chaos in meta-heuristic algorithms

Chaos theory is one of the most effective strategies used to improve the performance of meta-heuristics by fostering their exploration and exploitation features. Chaos is the irregular motion often encountered in nonlinear dynamic systems [74, 75]: seemingly random, unexpected behavior that a deterministic nonlinear system can present under deterministic conditions. A chaotic system thus changes irregularly and, given a long enough time period, ultimately passes through every state in its space. The applications of chaos in global optimizers fall into two categories.

5.4.1 Chaotic maps and sequences

Chaotic maps are one of the preferable ways to reinforce the performance of meta-heuristics in terms of both local optima avoidance and convergence. They are widely used to improve population diversity and solution quality by substituting for random values and adjusting parameters in the population initialization and iterative loop procedures [74, 75]. Chaotic properties have been used in improved and new meta-heuristics, such as EAs [74, 75], the immune system algorithm [76], PSO [77] and DE [78]. These chaotic meta-heuristics have achieved a high level of performance through the use of chaotic sequences to replace random variables and parameters; in this way, they have presented superb performance compared to the corresponding standard meta-heuristics.

5.4.2 Chaotic local search

Chaotic local search (CLS) is an applicable option that makes use of the randomness and ergodicity of chaos [74, 75]. Chaotic search is a mechanism that can be applied to improve search accuracy and convergence speed. For this reason, CLS has been integrated with several meta-heuristic algorithms and has achieved splendid success in enhancing their performance, for example in chaotic PSO [79], chaotic DE [80] and chaotic GSA [81]. The reported outcomes show that CLS can prominently strengthen search capacity and reduce the chance of getting stuck in local optima. It has been widely demonstrated that meta-heuristics with CLS achieve better performance, in terms of convergence rate and solution accuracy, than the corresponding original versions [74, 75].
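A minimal CLS sketch follows: the current best solution is perturbed with a logistic-map sequence inside a gradually shrinking neighbourhood, keeping any improvement. The function name, shrink factor and radius are illustrative assumptions, not a specific published CLS variant.

```python
def chaotic_local_search(best, fitness, radius=0.1, steps=20, x0=0.7):
    """Greedy chaotic local search around `best`: a logistic-map variable
    z drives perturbations in [-radius, radius], and the radius shrinks
    each step to focus the search."""
    x, f_best = list(best), fitness(best)
    z = x0
    for _ in range(steps):
        z = 4.0 * z * (1.0 - z)                     # chaotic variable in (0, 1)
        cand = [xi + radius * (2.0 * z - 1.0) for xi in x]
        f_cand = fitness(cand)
        if f_cand < f_best:                         # keep only improvements
            x, f_best = cand, f_cand
        radius *= 0.95                              # gradually narrow the search
    return x, f_best

x, f = chaotic_local_search([0.3, -0.2], lambda v: sum(t * t for t in v))
print(f)
```

Because candidates are accepted only when they improve, the returned cost can never exceed that of the starting point.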

5.5 Theoretical analysis of the AFT algorithm

To theoretically analyze the performance of the AFT algorithm from the perspective of complex networks, a relationship among its search agents needs to be established. This analysis is helpful and vital to explain the essence of the algorithm and to discover guiding methods that overcome its limitations in order to better foster its performance [82]. For this purpose, we use the population interaction network (PIN) method reported in [83, 84] to establish the relationship between the search agents of AFT and to explore and analyze the intrinsic phenomena that occur in a complex network. A clustering method is used to classify the search agents [83, 84]: each search agent is regarded as a vertex, and the position update mechanism between search agents generates the edges. The PIN method can be used to obtain both the intrinsic connection of knowledge and the characteristics of the network formed by the population. The way the interaction of the population in AFT is composed is displayed in Fig. 7.

Fig. 7 A descriptive schematic diagram of the PIN method in the AFT algorithm [83, 84]

In Fig. 7, the blue circles, transparent circles, transparent rectangles, blue rectangles and blue diamonds represent the search agents (i.e., vertices) in the current population, the clusters, the search agents \(x_p\) that will be replaced, the search agents that have been replaced and the newly created search agent \(U_n\), respectively. The circle, square and triangle denote the current, newly constructed and old substituted vertices, respectively. It can be noticed from Fig. 7 that the distribution of search agents changes in the whole population and that the number of search agents in each class changes accordingly. In sum, the initial construction process of PIN can be described as follows:

  1. There are three classes and nine basic search agents in the population;

  2. Two search agents, \(x_{s1}\) and \(x_{s2}\), chosen from two classes yield the newly created search agent \(U_{n}\), which is compared with the previous search agent \(x_{p}\). A vertex and two edges are created at the same time;

  3. If the search agent \(U_n\) outperforms \(x_p\), i.e., \(f(U_n) < f(x_p)\), then \(U_n\) replaces \(x_p\);

  4. Another search agent \(x_s\) is chosen from one class to create \(U_n\) to replace \(x_p\), which creates a vertex and an edge;

  5. The replacement is carried out once more when \(f(U_n) < f(x_p)\);

  6. At the next iteration, the clustering method classifies the search agents into three classes again; this continues until the algorithm terminates and the PIN topology is obtained.

Readers are referred to [83, 84] for a detailed description of the PIN method.

6 Comparative analysis of AFT with other meta-heuristics

This section presents a comparative analysis of AFT with other meta-heuristics such as PSO, GSA, DE, GA, covariance matrix adaptation-evolution strategy (CMA-ES) and ant colony optimization (ACO) algorithm.

6.1 Particle swarm optimization

PSO [85] mimics the cooperative social behavior of living creatures such as flocks of birds. Optimization begins with randomly generated solutions known as artificial particles, each of which has a randomly generated velocity. If \(x_i\) is the position of the ith particle with velocity \(v_i\), then the position updating strategy of PSO is given as follows [86]:

$$\begin{aligned} v_i(t + 1)&= wv_i(t) + c_1(Pbest_i - x_i(t)) r_{1} \\&\quad + c_2(Gbest - x_i(t)) r_{2} \end{aligned}$$
(26)
$$\begin{aligned} x_i(t + 1)= x_i(t) + v_i(t + 1) \end{aligned}$$
(27)

where w is the inertia weight, \(c_1\) and \(c_2\) are the cognitive and social constants, respectively, \(r_{1}\) and \(r_{2}\) are uniformly distributed random numbers in the interval [0, 1], \(Pbest_i\) is the local best solution of the ith particle and Gbest is the global best solution among all particles.
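Equations 26 and 27 can be sketched in a few lines; the parameter values, the toy swarm and the function name `pso_step` are illustrative assumptions.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=None):
    """One PSO iteration per Eqs. 26-27: the new velocity blends inertia,
    a pull toward each particle's personal best and a pull toward the
    swarm's global best; positions are then shifted by the velocity."""
    if rng is None:
        rng = np.random.default_rng(0)
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # Eq. 26
    return x + v_new, v_new                                        # Eq. 27

x = np.zeros((4, 2))                 # 4 particles in 2 dimensions
v = np.zeros((4, 2))
pbest = np.ones((4, 2))              # assumed personal bests
gbest = np.array([1.0, 1.0])         # assumed global best
x, v = pso_step(x, v, pbest, gbest)
print(x.shape)
```

With zero initial velocity, every particle moves strictly toward the shared best at (1, 1), illustrating the single-strategy update that AFT's three-case mechanism is contrasted against below.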

6.1.1 AFT versus PSO

Similar to PSO, AFT initiates the optimization process by motivating the search agents to move through the search space in search of their target. However, the position updating mechanism of AFT is entirely different from that of PSO. Some of the main differences are described as follows:

  1. In PSO, the movement update of the ith particle is driven by \(Pbest_i\) and Gbest as given in Eq. 26, where the effect of these two parameters determines the new position of the particles in the search space. In AFT, the new positions of the search agents are obtained through three different cases, as given in Eqs. 5, 10 and 11. In other words, PSO updates all solutions with one strategy, presented in Eq. 27, while the search agents of AFT use three strategies to update their positions in the search space.

  2. The PSO algorithm is greatly influenced by the initial values of the cognitive and social parameters as well as the weighting strategy of the velocity vector, since these parameters are used as each particle develops a new position. In the AFT model, the thieves develop new positions with the help of the tracking distance, which is adapted during the iterative loops. This enables AFT to alternate between local and global searches.

  3. The behavior of the thieves' movement is affected by the information given by someone about the whereabouts of Ali Baba's house, which is modeled with a random number (\(r_{3}\)); accordingly, case 1 or case 3, shown in Eqs. 5 and 11, is used, respectively. The inclusion of this random number in the AFT model suddenly redirects the thieves' movement and thus improves exploration and exploitation in AFT. PSO does not use such behavior.

  4. The simulation of the thieves' behavior in Eq. 10 provides an opportunity for random movement of the thieves. This enables AFT to avoid getting stuck in local optimal areas. This behavior is not used in PSO due to the natural behavior of swarms.

  5. The use of Marjaneh's intelligence, formulated in Eq. 7, improves the exploration feature of AFT; there is no such mechanism in the PSO algorithm.

  6. The use of tracking distance and perception potential in AFT enables it to conduct local searches in local areas at some times and global exploration of the search space at others, as well as to strike an appropriate balance between exploration and exploitation. These two parameters are not present in the PSO algorithm.

6.2 Gravitational search algorithm

Gravitational search algorithm is a physics-based algorithm evolved on the basis of the law of gravity [44]. Each individual (i.e., agent) evolves its position according to the gravitational force among individuals. The mechanism of GSA is based on the interaction of masses in the universe by means of the Newtonian law of gravitation. To describe GSA, consider a system with N masses (i.e., agents), where the position of the ith mass is defined as follows:

$$\begin{aligned} X_i = (x^1_i , x^2_i , \ldots , x^d_i )\;\;\;\;\; i\in 1, 2, 3, \ldots , N \end{aligned}$$
(28)

where \(x^d_i\) denotes the position of the ith mass in the dth dimension and d represents the total number of dimensions in the search space.

The mass of the ith agent is computed after calculating the fitness of the current populations, which is defined as follows:

$$\begin{aligned} m_i(t)= & {} \frac{f_i(t)-{\rm worst}(t)}{{\rm best}(t)-{\rm worst}(t)} \end{aligned}$$
(29)
$$\begin{aligned} M_i(t)= & {} \frac{m_i(t)}{\sum _{j=1}^{N}m_j(t)} \end{aligned}$$
(30)

where \(M_i(t)\) and \(f_i(t)\) represent the mass and fitness values of the ith agent at iteration t, respectively, and best(t) and worst(t) represent the best and worst fitness values of the current population in the tth iteration, respectively. For a minimization problem, worst(t) is defined as follows:

$$\begin{aligned} {\rm worst}(t) = \max \; fit_j(t), \;\;\;\;\;\;\;\; j \in \left\{ 1, 2, \ldots , N\right\} \end{aligned}$$
(31)

The gravitational force between agents \(X_i\) and \(X_j\) in the dth dimension can be computed as follows:

$$\begin{aligned} F_{ij}^d(t)= G(t) \frac{M_i(t) \times M_j(t)}{R_{ij}(t)+\epsilon } \left( x_j^d(t) -x_i^d(t) \right) \end{aligned}$$
(32)

where \(R_{ij}(t)\) stands for the Euclidean distance between agents i and j, \(\epsilon\) is a small value used to eschew division by zero and G(t) is a gravitational constant given as a function of time as shown below:

$$\begin{aligned} G(t) = G_0 \times e^{-\alpha \frac{t}{T}} \end{aligned}$$
(33)

where \(G_0\) represents an initial value, \(\alpha\) represents a constant value, and t and T represent the current iteration number and the maximum number of iterations, respectively. The total gravitational force \(F^d_i(t)\) for agent \(X_i\) is given as follows:

$$\begin{aligned} F_i^d(t) = \sum _{j \in Kbest, j \ne i}{\rm rand}_iF_{ij}^d(t) \end{aligned}$$
(34)

where Kbest refers to the set of the first K agents with the best fitness values and biggest masses, K is the number of agents in Kbest and \({\rm rand}_i\) is a uniformly distributed random number in the range [0, 1].

Hence, the acceleration \(a^d_i(t)\) of agent \(X_i\) in the dth dimension at time t can be computed using a law of motion as shown in Eq. 35.

$$\begin{aligned} a_i^d(t)= \frac{F_i^d(t)}{M_i(t)} \end{aligned}$$
(35)

Then, the velocity \(v^d_i(t + 1)\) and position \(x^d_i(t + 1)\) of agent \(X_i\) are updated, respectively, as follows:

$$\begin{aligned} v_i^d(t+1)= & {} {\rm rand} _i \times v_i^d (t) + a_i^d(t) \end{aligned}$$
(36)
$$\begin{aligned} x_i^d(t+1)= & {} x_i^d (t) + v_i^d(t+1) \end{aligned}$$
(37)

where \({\rm rand}_i\) is a uniformly distributed random value in the interval from 0 to 1.
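The update equations above (Eqs. 29–37) can be sketched in Python as follows. The values of \(G_0\) and \(\alpha\), and the linearly shrinking Kbest schedule, are common choices from the GSA literature rather than prescriptions of this text, so treat them as assumptions:

```python
import numpy as np

def gsa_step(X, V, fitness, t, T, G0=100.0, alpha=20.0, eps=1e-10, rng=None):
    """One GSA iteration following Eqs. 29-37 (minimization).

    X: (N, d) positions, V: (N, d) velocities, fitness: (N,) values.
    G0, alpha and the shrinking-Kbest schedule are illustrative defaults.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, d = X.shape
    best, worst = fitness.min(), fitness.max()
    m = (fitness - worst) / (best - worst - eps)         # Eq. 29
    M = m / (m.sum() + eps)                              # Eq. 30
    G = G0 * np.exp(-alpha * t / T)                      # Eq. 33

    # Kbest shrinks linearly from N agents down to 1 over the run.
    K = max(1, int(round(N - (N - 1) * t / T)))
    kbest = np.argsort(fitness)[:K]

    F = np.zeros_like(X)
    for i in range(N):
        for j in kbest:
            if j == i:
                continue
            R = np.linalg.norm(X[i] - X[j])
            # Eq. 32 combined with the random weighting of Eq. 34
            F[i] += rng.random() * G * (M[i] * M[j] / (R + eps)) * (X[j] - X[i])

    a = F / (M[:, None] + eps)                           # Eq. 35
    V = rng.random((N, d)) * V + a                       # Eq. 36
    return X + V, V                                      # Eq. 37
```

Note that the force on agent i carries the factor \(M_i\) while the acceleration of Eq. 35 divides by it, so the two largely cancel, as in the original GSA formulation.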

6.2.1 AFT versus GSA

  1. As shown in Eqs. 5, 10 and 11, the position updating mechanism of the search agents of AFT is totally different from that of GSA as defined in Eq. 37.

  2. GSA uses acceleration and velocity vectors for the movement of its agents, while AFT generates new directions of movement for its search agents by various mechanisms.

  3. AFT has its own adaptive parameters, such as \({\rm Td}_t\) and \({\rm Pp}_t\). GSA does not use these parameters; instead, it has its own parameters such as \(\epsilon\) and \(R_{ij}(t)\).

  4. The AFT algorithm incorporates the concept of random movement of the thieves based on the parameter \(Pp_t\), and uses Marjaneh’s intelligence in its position updating mechanism. Obviously, the generation of new solutions by AFT is very different from the update mechanism of the agents of GSA.

6.3 Conventional differential evolution algorithm

The differential evolution (DE) algorithm is a population-based evolutionary algorithm developed to solve real-valued optimization problems [87]. The evolutionary process of DE involves mutation, crossover and selection strategies similar to those used by GAs. The initialization of each individual \(X_i , i \in \left\{ {1, 2, \ldots , {\rm NP}}\right\}\) in DE is described as follows:

$$\begin{aligned} X_i^d= X_i^{l} + {\rm rand} (0, 1) \cdot (X_i^{u} - X_i^{l}) \end{aligned}$$
(38)

where NP is the population size, \(d \in \left\{ {1, 2, \ldots , D}\right\}\) denotes the dimension of the problem, u and l represent the upper and lower bounds of \(X_i\) in the dth dimension, respectively.

The mutation strategy of DE creates a mutant vector as an intermediate variable \(V_i\) for evolution according to:

$$\begin{aligned} V_i= X_{r1} + F \cdot (X_{r2}-X_{r3}) \end{aligned}$$
(39)

where \(r_1, r_2\) and \(r_3 \in \left\{ {1, 2, \dots , {\rm NP}}\right\}\) are mutually distinct random indices with \(i \ne r_1 \ne r_2 \ne r_3\), and F is a constant scaling factor that controls the amplification of the differential variation.

The crossover strategy of DE boosts diversity by combining the original agent \(X_i\) with the intermediate variable \(V_i\) to form a new agent \(U_i\), defined as follows:

$$\begin{aligned} U_{i}^{d}={\left\{ \begin{array}{ll} V_{i}^{d} &{} {\rm if} \;\; {\rm rand}(0, 1) \le {\rm CR} \;\; {\rm or} \;\; d=d_{\rm rand}\\ X_{i}^{d} &{} {\rm otherwise} \end{array}\right. } \end{aligned}$$
(40)

where CR represents a crossover control parameter and \(d_{\rm rand} \in \left\{ 1, 2, \ldots , D\right\}\) denotes a randomly chosen dimension index.

The selection process in DE is performed in each iteration by comparing \(U_i\) with \(X_i\) using a greedy criterion, so that the better agent is retained in the population for the next iteration. Through these evolutionary processes, DE can converge rapidly and eventually reach the global optimum.
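Putting Eqs. 38–40 and the greedy selection together, one generation of the classical DE/rand/1/bin scheme can be sketched as follows; the F and CR defaults are illustrative, not values prescribed by the text:

```python
import numpy as np

def de_generation(pop, fitness, func, F=0.5, CR=0.9, rng=None):
    """One DE/rand/1/bin generation following Eqs. 38-40 plus the
    greedy selection step (minimization). F and CR are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    NP, D = pop.shape
    new_pop, new_fit = pop.copy(), fitness.copy()
    for i in range(NP):
        # Eq. 39: three mutually distinct indices, all different from i
        r1, r2, r3 = rng.choice([j for j in range(NP) if j != i], 3, replace=False)
        v = pop[r1] + F * (pop[r2] - pop[r3])            # mutation, Eq. 39
        d_rand = rng.integers(D)                         # guaranteed crossover dim
        mask = rng.random(D) <= CR
        mask[d_rand] = True
        u = np.where(mask, v, pop[i])                    # crossover, Eq. 40
        fu = func(u)
        if fu <= fitness[i]:                             # greedy selection
            new_pop[i], new_fit[i] = u, fu
    return new_pop, new_fit
```

Because of the greedy selection, the best fitness in the population is non-increasing from one generation to the next.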

6.3.1 AFT versus DE

Generally speaking, since AFT is a human-based optimization algorithm, it does not need evolutionary processes such as crossover, mutation and selection. The main differences between DE and AFT can be summarized in the following points:

  1. The AFT algorithm preserves search space information over subsequent iterations, while the DE algorithm discards the information of previous generations once a new population is formed.

  2. AFT involves fewer operators to adjust and run as compared to DE, which uses several operations such as selection and crossover. Moreover, AFT utilizes a parameter representing Marjaneh’s plans, while DE does not memorize the best solution obtained so far.

  3. In DE, exploration is enhanced using crossover and selection operations, while in AFT, it is enhanced by allowing the thieves to randomly explore the search space.

  4. In DE, mutation is implemented mainly to enhance exploitation. In AFT, better exploitation is achieved with the perception probability parameter.

6.4 Genetic algorithm

GA was first put forward by Holland [88]. It is a global optimization algorithm inspired by biological mechanisms such as evolution and genetics. When using GAs, the search space is encoded into chromosomes, whereby every possible solution is coded as a chromosome (i.e., individual). Optimization with a GA begins with a group of randomly formed individuals in a population. The fitness score of each individual is computed in each generation, and the variables of the solutions are adjusted based on their fitness values. Since the best individuals are given a higher probability of participating in enhancing other solutions, the random initial solutions are very likely to be improved. Based on a fitness function, chromosomes are selected, and genetic operators such as mutation and crossover are applied to the selected chromosomes to form new ones. The idea is that these chromosomes evolve and continually create better individuals until they reach the global optimum [89].
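The generational loop described above can be illustrated with a minimal real-coded GA generation. The specific operators used here (tournament selection, arithmetic crossover, Gaussian mutation) are illustrative choices of ours, since the text does not fix a particular GA variant:

```python
import numpy as np

def ga_generation(pop, func, pc=0.9, pm=0.1, rng=None):
    """One generation of a minimal real-coded GA for minimization:
    tournament selection, arithmetic crossover, Gaussian mutation.
    Operator and parameter choices are illustrative only."""
    rng = np.random.default_rng() if rng is None else rng
    N, D = pop.shape
    fit = np.array([func(x) for x in pop])

    def tournament():
        # binary tournament: fitter of two random individuals wins
        a, b = rng.integers(N, size=2)
        return pop[a] if fit[a] <= fit[b] else pop[b]

    children = []
    while len(children) < N:
        p1, p2 = tournament(), tournament()
        if rng.random() < pc:              # arithmetic crossover
            w = rng.random()
            c = w * p1 + (1 - w) * p2
        else:
            c = p1.copy()
        if rng.random() < pm:              # Gaussian mutation
            c = c + rng.normal(0, 0.1, D)
        children.append(c)
    return np.array(children)
```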

6.4.1 AFT versus GAs

Both GAs and AFT are population-based techniques; however, the key differences between them can be summarized as follows:

  1. GA, like DE, uses crossover, mutation and selection operations; the AFT algorithm does not use these operations.

  2. The AFT algorithm uses Marjaneh’s intelligence, while GA does not use such a parameter and does not save the best solutions obtained so far.

  3. GA evolves and updates its population using crossover, mutation and selection operations, while AFT improves its exploration ability with the concept of random relocation of the thieves that is managed by a perception probability parameter.

6.5 Covariance matrix adaptation-evolution strategy

The CMA-ES [90, 91] is an evolutionary algorithm for nonlinear, non-convex optimization problems in continuous domains. Specifically, it is a second-order method that adapts a covariance matrix within an iterative procedure according to the principle of maximum likelihood, so as to increase the probability of reproducing successful search steps. At each iteration, the members of the new population are sampled from a multivariate normal distribution \({\mathcal {N}}\) with covariance \(C\in {\mathbb {R}}^{n\times n}\) and mean \(m\in {\mathbb {R}}^n\). The new individuals at generation \(i + 1\) are sampled as:

$$\begin{aligned} x^{i+1} _k \sim m^i + \sigma ^i{\mathcal {N}}\left( 0, C^i \right) \;\;\;\;\;\;\;\;\;\;\;\;\;\;k = 1,\ldots ,\lambda \end{aligned}$$
(41)

where \(\sigma ^i\) is the step size at generation i and \(x^i_k\) is the kth individual at generation i.

The \(\lambda\) sampled points are ranked in ascending order of fitness, and the best \(\mu\) points are chosen. The mean of the sampling distribution given in Eq. 41 is updated using weighted intermediate recombination of these selected points:

$$\begin{aligned} m^{i+1} = \sum _{j=1}^{\mu }\omega _j X_{j:\lambda }^{i+1} \end{aligned}$$
(42)

with

$$\begin{aligned} \sum _{j=1}^{\mu }\omega _j =1, \;\;\;\;\;\;\;\;\; \omega _1 \ge \omega _2\ge \ldots \ge \omega _{\mu }>0 \end{aligned}$$
(43)

where \(\omega _j\) are positive weights, and \(x_{j:\lambda }^{i+1}\) stands for the jth ranked individual of the \(\lambda\) sampling points \(x_{k} ^{i+1}\). In the standard CMA-ES implementation, the weights decrease logarithmically as:

$$\begin{aligned} \omega _j =\log \left(\frac{\lambda - 1}{2} + 1\right) - \log (j) \end{aligned}$$
(44)

The covariance matrix can be adapted for the next generation using a combination of rank-\(\mu\) and rank-one update as follows:

$$\begin{aligned} C^{i+1}= & {} (1 -c_{cov})C^i + \frac{c_{cov}}{\mu _{cov}} p_c^{i+1} p_c^{{(i+1)}^T} \nonumber \\+ & {} c_{cov}\left(1-\frac{1}{\mu _{cov}}\right)\sum _{j=1}^{\mu }\omega _j y_{j:\lambda }^{i+1}(y_{j:\lambda }^{i+1})^T \end{aligned}$$
(45)

where \(\mu _{cov} \ge 1\) weights the rank-one update against the rank-\(\mu\) update, \(c_{cov} \in [0, 1]\) is the learning rate for the covariance matrix update, and \(y_{j:\lambda }^{i+1} = (X_{j:\lambda }^{i+1}- m^i)/\sigma ^i\). The evolution path \(p_c^{i+1}\) and the step size \(\sigma ^i\) are updated by an adaptation formula [91].
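The sampling and recombination steps of Eqs. 41–44 can be sketched as follows. The covariance and step-size adaptation of Eq. 45 is deliberately omitted, so this is only a partial illustration of CMA-ES, not a full implementation:

```python
import numpy as np

def cmaes_sample_and_update_mean(m, sigma, C, func, lam=12, rng=None):
    """Sampling (Eq. 41) and weighted recombination of the mean
    (Eqs. 42-44) for minimization. The covariance and step-size
    adaptation of Eq. 45 is omitted; this is a partial sketch."""
    rng = np.random.default_rng() if rng is None else rng
    n = m.size
    mu = lam // 2
    # Eq. 44 raw weights, then normalized to satisfy Eq. 43
    w = np.log((lam + 1) / 2) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    # Eq. 41: draw lambda offspring from N(m, sigma^2 C)
    A = np.linalg.cholesky(C)
    X = m + sigma * (rng.standard_normal((lam, n)) @ A.T)
    order = np.argsort([func(x) for x in X])   # rank by ascending fitness
    m_new = w @ X[order[:mu]]                  # Eq. 42
    return m_new, X
```

Even with a fixed covariance matrix and step size, repeated application of this selection-and-recombination step drives the mean toward better regions of the search space.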

6.5.1 AFT versus CMA-ES

Both CMA-ES and AFT are population-based techniques; however, the major differences between them are summarized as shown below:

  1. The CMA-ES parameterizes the multivariate normal distribution \({\mathcal {N}}(\mathbf{m} , \sigma ^2C)\), which consists of three components: the mean vector m, the step size \(\sigma\) and the covariance matrix C. The AFT algorithm does not use these components; rather, it uses Eqs. 5, 10 and 11 to update the positions of its search agents.

  2. The CMA-ES uses two evolution paths that accumulate consecutive steps of the mean vector update, one for the cumulative step-size adaptation and one for the rank-one update of the covariance matrix. The AFT algorithm does not use such evolution paths.

  3. AFT uses several mechanisms to alternate between local and global search when updating its search agents’ positions, whereas the CMA-ES adapts a covariance matrix for the next generation using a combination of the rank-\(\mu\) and rank-one updates.

6.6 Ant colony optimization

ACO is a meta-heuristic algorithm that distributes the search activities to so-called “ants” [92]. The activities are split among agents with simple basic abilities that imitate, to some extent, the foraging behavior of real ants. It is crucial to underline that ACO was not developed as a simulation of ant colonies, but to employ the metaphor of artificial ant colonies as an optimization tool. At the start of processing in ACO, when there is no information about how to go from one point to another, the ants choose their paths completely at random. During processing, the intention is that if an ant has to choose between different paths at a given point, those that have been chosen heavily by the preceding ants (i.e., those with a high trail level) are chosen with a higher probability. Generally, the ACO approach tries to address an optimization problem by iterating the following two steps:

  • A pheromone model, i.e., a parameterized probability distribution over the solution space, is used to construct candidate solutions;

  • The candidate solutions are utilized to adjust the pheromone values in a manner that is deemed to bias future sampling toward higher quality solutions.

At each construction step, an ant probabilistically chooses the next solution component using the pheromone model. An ant moves from node i to node j using the following rule:

$$\begin{aligned} p_{(i, j)} = \frac{\tau _{(i, j)}^{\alpha } \cdot n_{(i, j)}^{\beta }}{\sum _{l \in \mathrm{allowed}(i)} \tau _{(i, l)}^{\alpha } \cdot n_{(i, l)}^{\beta }} \end{aligned}$$
(46)

where \(\tau _{(i, j)}\) is the pheromone value associated with edge (i, j), \(n_{(i, j)}\) is the heuristic value associated with edge (i, j), \(\alpha\) is a positive real parameter that controls the influence of the pheromone value, and \(\beta\) is a positive real parameter that controls the influence of the heuristic value.

Once a solution is built, the pheromone values are updated using Eq. 47, where the amount of pheromone each ant deposits is specified by Eq. 48.

$$\begin{aligned} \tau _{(i, j)} = (1- \rho )\tau _{(i, j)} + \delta \tau _{(i, j)} \end{aligned}$$
(47)

where \(\tau _{(i, j)}\) is the pheromone value associated with edge (i, j), \(\rho \in (0, 1]\) is the pheromone evaporation rate and \(\delta \tau _{(i, j)}\) is the amount of pheromone deposited, typically given by:

$$\begin{aligned} \delta \tau _{(i, j)}^k ={\left\{ \begin{array}{ll} 1/L_k &{} {\rm if} \; {\rm ant} \; k \; {\rm travels} \; {\rm on} \; {\rm edge} \; (i, j)\\ 0 &{} {\rm otherwise} \end{array}\right. } \end{aligned}$$
(48)

where \(L_k\) is the cost of the kth ant’s tour.
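The transition rule of Eq. 46 and the pheromone update of Eqs. 47–48 can be sketched as follows for a symmetric routing problem; the \(\alpha\), \(\beta\) and \(\rho\) defaults are illustrative values, not settings taken from the text:

```python
import numpy as np

def transition_probs(tau, eta, current, unvisited, alpha=1.0, beta=2.0):
    """Probability of moving from `current` to each node in `unvisited`,
    following Eq. 46. The alpha/beta defaults are illustrative."""
    w = (tau[current, unvisited] ** alpha) * (eta[current, unvisited] ** beta)
    return w / w.sum()

def update_pheromone(tau, tours, lengths, rho=0.5):
    """Evaporation and deposit following Eqs. 47-48 on a symmetric graph."""
    tau = (1 - rho) * tau                      # evaporation, Eq. 47
    for tour, L in zip(tours, lengths):
        # successive edges of the closed tour, including the return edge
        for i, j in zip(tour, tour[1:] + tour[:1]):
            tau[i, j] += 1.0 / L               # deposit, Eq. 48
            tau[j, i] += 1.0 / L
    return tau
```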

6.6.1 AFT versus ACO

ACO and AFT may appear similar at first glance, but they are quite different, presenting several differences in their formulation and position updating mechanisms.

  1. Both ACO and AFT distribute the search for the optimal solution among multiple agents. In ACO, the idea is to create a pool of artificial ants that move randomly around an environment. In AFT, the thieves search for Ali Baba by exploring and exploiting each area in the search space using three different cases in the position updating process.

  2. In ACO, the candidate solutions are constructed using a pheromone model. However, AFT uses the local and global best solutions of the thieves to locate the optimum solutions.

  3. In ACO, a new solution is constructed by the probabilistic rule of Eq. 46, which is conceptually different from the position updating strategy of AFT given in Eqs. 5 to 11. The updating strategy of AFT is a kind of directed and undirected search approach, in which new solutions are forced to move toward a better solution.

  4. The AFT algorithm uses a memory parameter (i.e., Marjaneh’s plans) in its updating process. On the other hand, ACO does not use such a parameter in updating its new solutions. Apart from this, AFT also uses a stochastic location updating strategy as shown in Eq. 10 to improve its exploration feature, while ACO does not use such a strategy.

  5. AFT has two parameters that can be adapted during its iterative process to enhance exploration and exploitation features and to balance them. However, ACO does not use any parameters to be adapted over the course of iterations.

As previously discussed, an effective meta-heuristic must strike an appropriate balance between exploration and exploitation, yet there is no rule of thumb [93] for achieving this. Slight differences in solution updates and random distributions can have a significant effect on the performance of the designed algorithms [94]. Therefore, AFT is a good competitor to the existing meta-heuristics.

7 Experimental results and analysis

In this section, to assess the accuracy of the proposed AFT algorithm, we conducted intensive evaluations on a set of 62 test functions, involving commonly used unimodal, multimodal, hybrid and composition functions. These comprise: (1) 23 benchmark test functions (described in Table 33 in "Appendix A"), (2) 29 functions taken from the IEEE CEC-2017 benchmark suite [95] (described in Table 34 in "Appendix B"), and (3) a set of 10 IEEE CEC-2019 functions (described in Table 35 in "Appendix C"). These functions are repeatedly used in the literature to test the performance of new meta-heuristic algorithms. The experiments designed to verify the performance of AFT are outlined as follows:

  • First, exhaustive comparative studies were presented to verify the reliability and accuracy of the AFT algorithm in relation to other meta-heuristics.

  • Second, a set of qualitative measures including search history, trajectory, convergence curves and average fitness values were plotted to examine the adequacy of AFT in addressing several types of test functions.

  • Third, the optimization performance of AFT was studied in light of several developments applied to AFT from several aspects.

  • Fourth, Friedman’s and Holm’s test methods were used to verify the significance of the outcomes obtained by AFT.

7.1 Experimental setup

The results produced by AFT in the optimization of the three benchmark suites mentioned above are compared with those produced by other well-regarded algorithms. The parameter settings of the AFT algorithm and the comparative algorithms are given in Table 1.

Table 1 Parameter setting values of the AFT algorithm and other algorithms

In order to provide a fair comparison between the proposed algorithm and the other selected algorithms, we followed the same initialization process for all compared algorithms. For all experiments, the common parameter settings are as follows: the number of individuals used in the search process is set to 30, the number of iterations is set to 1000, and the maximum number of function evaluations (NFEs) for all benchmark functions is set to \(d\times 10^3\), where d represents the dimension of the test functions. For each function, each algorithm was run 30 times independently to obtain the statistical results. The stop condition for all algorithms was the maximum number of iterations. The average fitness (Ave) and standard deviation (Std) values were computed over the thirty independent runs to assess the accuracy and stability of the proposed algorithm compared to the others. The best scores are shown in bold throughout this paper.
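The evaluation protocol above (30 independent runs per function, reporting Ave and Std of the best fitness) can be sketched as follows; `algorithm` is a hypothetical placeholder for any of the compared optimizers:

```python
import statistics

def evaluate(algorithm, runs=30):
    """Run an optimizer `runs` times and report the Ave and Std of its
    best fitness values, matching the protocol described above.
    `algorithm` is any zero-argument callable returning a best fitness;
    the name is a placeholder, not one of the compared algorithms."""
    scores = [algorithm() for _ in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores)
```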

7.2 Classical benchmark test functions

Twenty-three widely used benchmark test functions were used to evaluate the overall performance of AFT and to compare it to other optimization algorithms. These test functions are minimization problems that can be categorized into three main kinds: unimodal [99], multimodal [96] and fixed-dimension multimodal [99]. "Appendix A" presents the mathematical description of these categories. The characteristics of the unimodal (\(\hbox {F}_1\)–\(\hbox {F}_7\)), multimodal (\(\hbox {F}_8\)–\(\hbox {F}_{{13}}\)) and fixed-dimension multimodal (\(\hbox {F}_{{14}}\)–\(\hbox {F}_{{23}}\)) functions are detailed in Table 33. These three groups of test functions are widely accepted in the literature as benchmark evaluation functions because:

  • The functions in the first group (i.e., unimodal functions) have only one global optimum and no local optima. This group is usually used to assess the convergence behavior as well as the exploitation power of any new or enhanced optimization algorithm.

  • The functions in the second and third groups (i.e., multimodal and fixed-dimension multimodal functions) have several local optima and more than one global optimum. These functions are effective in examining the ability of AFT to avoid local optima and in evaluating its exploration capability.

The average processing times elapsed by AFT and other comparative algorithms in optimizing the classical benchmark test functions are given in Table 2 and graphically illustrated in Fig. 8.

Table 2 Average running time of AFT and other meta-heuristic algorithms
Fig. 8
figure 8

Average running time of the proposed algorithm and other optimization algorithms

It can be seen that AFT outperforms the other algorithms, as it takes less processing time. Therefore, it can be deduced that the computational efficiency of AFT is much better than that of its competitors.

7.2.1 Performance of AFT in unimodal functions

As discussed above, the unimodal functions (\(\hbox {F}_1\)–\(\hbox {F}_7\)) are useful for assessing the exploitation capability of optimization algorithms. The average fitness and standard deviation results of AFT and the other algorithms are displayed in Table 3. These values were recorded after running the experiments 30 times for each algorithm.

Table 3 Results of the AFT algorithm and other meta-heuristics in unimodal benchmark test functions

It can be seen from Table 3 that the performance of AFT is very efficient compared to the other competitors. In particular, the AFT algorithm achieved the best scores for functions \(\hbox {F}_4\), \(\hbox {F}_5\) and \(\hbox {F}_6\). It also achieved the best score, jointly with the SHO algorithm, for functions \(\hbox {F}_1\), \(\hbox {F}_2\) and \(\hbox {F}_3\). From an engineering point of view, it does not matter whether the algorithm finds an optimal result such as 10E−05 or 10E−25; both are considered zero. The SHO algorithm is clearly competitive, as it obtained the best scores for functions \(\hbox {F}_1\) and \(\hbox {F}_3\); however, the difference between it and AFT is very small. The SOA algorithm obtained the best results for function \(\hbox {F}_7\). The proposed AFT algorithm also obtained the best standard deviation values for functions \(\hbox {F}_4\)–\(\hbox {F}_6\). These outcomes clearly indicate that the AFT algorithm is robust, reliable and highly effective compared to other widely used algorithms in the literature. Overall, AFT achieved either the best score or competitive results on functions \(\hbox {F}_1\)–\(\hbox {F}_7\), and its very small Std values reveal a high level of stability.

7.2.2 Performance of AFT in multimodal functions

In order to evaluate the capability of optimization algorithms to avoid local optima and to explore the search space, the multimodal (\(\hbox {F}_8\)–\(\hbox {F}_{{13}}\)) and fixed-dimension multimodal (\(\hbox {F}_{{14}}\)–\(\hbox {F}_{{23}}\)) functions are employed in the literature as benchmarks for this purpose. The outcomes are given in Tables 4 and 5, showing the Ave and Std, over 30 independent runs, for all compared algorithms in multimodal and fixed-dimension multimodal functions, respectively.

Table 4 Results of the AFT algorithm and other meta-heuristic algorithms in multimodal benchmark test functions

The results shown in Table 4 indicate that the performance of AFT is also very efficient when employed to solve multimodal problems. In particular, the AFT algorithm achieved the best scores for functions \(\hbox {F}_{{8}}\), \(\hbox {F}_{{10}}\), \(\hbox {F}_{{12}}\) and \(\hbox {F}_{{13}}\). GSA got the second best score for function \(\hbox {F}_{{13}}\), and SHO got the best scores for functions \(\hbox {F}_9\) and \(\hbox {F}_{{11}}\), where the results of SHO on these functions are close to those released by the AFT algorithm. These outcomes once again confirm the reliability and stability of AFT, since it has very small values of Std.

Table 5 Results of the AFT algorithm and other meta-heuristic algorithms in fixed-dimension multimodal functions

The results given in Table 5 confirm the superiority of AFT, which gained the best mean fitness values, either individually or jointly with other algorithms. The CSA was competitive and matched the best average fitness of the proposed AFT algorithm on the \(\hbox {F}_{{14}}\), \(\hbox {F}_{{16}}\), \(\hbox {F}_{{17}}\) and \(\hbox {F}_{{18}}\) test functions. Although the AFT algorithm did not achieve the best average fitness values for \(\hbox {F}_{{19}}\) and \(\hbox {F}_{{20}}\), its results are very close to those of the algorithms that obtained the best fitness in these two cases (i.e., GSA and SHO). The outcomes in Table 5 indicate that the average fitness values of AFT are better than those of the other algorithms on most of the test functions. With regard to the values of Std, AFT performed better than the other algorithms in six out of ten test functions (\(\hbox {F}_{{14}}\), \(\hbox {F}_{{17}}\) to \(\hbox {F}_{{21}}\)). This supports our earlier conclusion that AFT has a high degree of stability when applied to different search spaces. The results in Tables 4 and 5 indicate that AFT ranks first, in terms of the average fitness value, in twelve out of sixteen test functions (i.e., \(\hbox {F}_8\), \(\hbox {F}_{{10}}\), \(\hbox {F}_{{12}}\) to \(\hbox {F}_{{18}}\) and \(\hbox {F}_{{21}}\)–\(\hbox {F}_{{23}}\)). This reinforces that AFT has good exploration ability when employed on these search problems.

7.3 Performance of self-adaptive AFT algorithm

The amendment value of the self-adaptive tracking distance of the AFT algorithm, referred to as SAFT, is computed using the ratio of the current tracking distance and perception potential constants of the search agents over the course of iterations. The SAFT algorithm has two main parameters, referred to as \(\theta\) and \(\rho\), that influence the search performance; they are the adjustment frequency and the adjustment opportunity of the tracking distance constant, respectively. Small values of \(\theta\) and \(\rho\) may cause the search agents to move too frequently, which implies that they will have difficulty converging. On the other hand, large values of \(\theta\) and \(\rho\) make the search agents lose the globally optimal area and converge to a locally optimal area, which leads to premature convergence. To locate the best settings of \(\theta\) and \(\rho\), several experiments were performed on some of the unimodal functions with different values of these parameters, where \(\theta \in \left\{ 1, 2, 3\right\}\) and \(\rho \in \left\{ 0.2,0.5,0.8\right\}\). The results obtained with these parameter settings after running the experiments 30 times are shown in Table 6.

Table 6 Results of SAFT for different values of \(\theta\) and \(\rho\) on some unimodal benchmark functions

It is evident from the results shown in Table 6 that the parameter settings \(\theta = 3\) and \(\rho = 0.2\) are the best settings.

7.4 Performance of hierarchical AFT algorithm

To illustrate the performance of the hierarchical AFT (HAFT) algorithm, it is tested on 23 benchmark functions with various dimensions. The parameter settings of HAFT are given as follows: \(T_0 = 1\), \(L = 100\), \(n= 30\) and \(T=1000\). The experimental results obtained by HAFT which consist of mean and standard deviation values are given in Table 7.

Table 7 Experimental results of HAFT algorithm in standard benchmark test functions

Table 7 shows that HAFT can effectively and efficiently find optimal solutions for many benchmark functions. This indicates that HAFT is efficacious in balancing the search capability of AFT between exploration and exploitation in order to boost its accuracy from the onset to the end of the optimization process. A comparison between HAFT in Table 7 and AFT in Tables 3, 4 and 5 shows that HAFT has better performance.

7.5 Evaluation of the balance of exploitation and exploration of AFT

The multimodal function with fixed dimension \(\hbox {F}_{{16}}\) shown in Eq. 49 is used as an example to illustrate the evaluation of the balance response of AFT.

$$\begin{aligned} F_{16}(x_1, x_2) =4x_{1}^{2}-2.1x_{1}^{4}+\frac{1}{3}x_{1}^{6}+x_{1}x_{2}-4x_{2}^{2}+4x_{2}^{4} \end{aligned}$$
(49)

In Eq. 49, the range of \(x_1\) and \(x_2\) is set to: \(-5 \le x_i \le 5\) with the dimension set to 2. Figure 9 shows the performance behavior yielded by AFT in the function \(\hbox {F}_{{16}}\), over 500 iterations, in terms of evaluating the balance given by Eqs. 24 and 25.
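For reference, Eq. 49 is the six-hump camel back function, which can be evaluated directly; its two known global minima lie at approximately \((\pm 0.0898, \mp 0.7126)\) with \(F_{16} \approx -1.0316\):

```python
def f16(x1, x2):
    """Six-hump camel back function of Eq. 49 (2-D minimization)."""
    return (4 * x1**2 - 2.1 * x1**4 + x1**6 / 3
            + x1 * x2 - 4 * x2**2 + 4 * x2**4)
```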

Fig. 9
figure 9

Performance of AFT during 500 iterations which describes the balance evaluation given by Eqs. 24 and 25

In Fig. 9, five points (1), (2), (3), (4) and (5) have been chosen to represent the diversity of solutions and the balance assessment at each of them. Point (1) represents a premature stage of the AFT algorithm, where the balance evaluation values of XPL% and XPT% are 90 and 10, respectively. With these percentages, AFT works with a clear tendency to explore the search space, from which it can be inferred that the solutions preserve a high dispersion over the search space. Point (2) corresponds to iteration 70, where the balance evaluation holds a value of XPL% = 70 in conjunction with XPT% = 30. At this position, AFT fundamentally conducts exploration with a low degree of exploitation. Points (3) and (4) correspond to iterations 75 and 100, respectively, where the balance assessments have exploration and exploitation values of XPL% = 25, XPT% = 75 and XPL% = 5, XPT% = 95, respectively. At these percentages, the behavior of AFT has flipped to promote more exploitation than exploration. Under these configurations, the solutions are spread out in several bunches, which reduces the overall diversity. Finally, point (5) represents the last stage of the AFT algorithm. In this situation, the AFT algorithm sustains a clear trend toward the exploitation of the best solutions found, without taking into account any exploration strategy.

7.6 Chaotic maps for AFT

This work integrates four chaotic maps into one of the components of AFT in order to further investigate the effectiveness of chaos theory in improving the exploration and/or exploitation of AFT. The chaotic maps are applied to define the selection probability of the defined cases of AFT, with the aim of improving its performance. The four chaotic maps selected in this study are listed in Table 8.

Table 8 Four different types of chaotic maps used to improve AFT

The chaotic maps shown in Table 8 were selected to exhibit different behaviors, with the initial point set to 0.7 for each of them. As mentioned earlier, the chaotic maps are used to manipulate the selection process of the three defined cases of the AFT algorithm, a process originally governed by a probability drawn uniformly at random (rand). Here, the chaotic maps provide chaotic behavior for this probability, and the value returned by each map lies within the range [0, 1]. The proposed Chaotic AFT (CAFT) algorithm is benchmarked on the 23 benchmark test functions with the chaotic iterative (I), circle (C), logistic (L) and piecewise (P) selection operators. The results are averaged over 30 independent runs. The mean and standard deviation of the best solutions found at the last iteration are reported in Table 9.
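For illustration, the four maps can be implemented as follows. The parameter values (a = 4 for the logistic map, a = 0.5 and b = 0.2 for the circle map, a = 0.7 for the iterative map, and p = 0.25 for the piecewise map) are common choices from the chaos literature and are assumptions here; the exact forms used by CAFT are those of Table 8.

```python
import math

def logistic(x, a=4.0):
    """Logistic map: x_{k+1} = a * x_k * (1 - x_k)."""
    return a * x * (1.0 - x)

def circle(x, a=0.5, b=0.2):
    """Circle map, folded back into [0, 1) with mod 1."""
    return (x + b - (a / (2.0 * math.pi)) * math.sin(2.0 * math.pi * x)) % 1.0

def iterative(x, a=0.7):
    """Iterative map: x_{k+1} = |sin(a*pi / x_k)|; small guard avoids x = 0."""
    return abs(math.sin(a * math.pi / max(x, 1e-12)))

def piecewise(x, p=0.25):
    """Piecewise linear chaotic map with control parameter p."""
    if x < p:
        return x / p
    if x < 0.5:
        return (x - p) / (0.5 - p)
    if x < 1.0 - p:
        return (1.0 - p - x) / (0.5 - p)
    return (1.0 - x) / p

def chaotic_sequence(map_fn, x0=0.7, length=500):
    """Generate a chaotic sequence to stand in for the uniform probability rand."""
    seq, x = [], x0
    for _ in range(length):
        x = map_fn(x)
        seq.append(x)
    return seq
```

Each call to `chaotic_sequence` yields values in [0, 1] that can replace rand in the case-selection step of AFT.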

Table 9 Results of the CAFT algorithm on \(\hbox {F}_1\)\(\hbox {F}_{{23}}\) with several chaotic selection operators

Considering the results in Table 9, it can be said that chaotic maps improve the performance of AFT not only in terms of exploitation but also in terms of exploration. These results are superior to the corresponding results of the standard AFT shown in Tables 3, 4 and 5.

7.7 Analysis of the number of clusters of AFT

To investigate the degree of population interaction of the AFT algorithm using the PIN method reported in [83, 84], it is tested with different numbers of clusters c in the optimization of the group of 23 benchmark functions. The parameter c is set as follows: \(c \in \{3, 7, 11\}\). The other parameters of AFT remain the same during each experiment. The AFT algorithm is run 30 times for each function, and the mean and standard deviation are obtained and displayed in Table 10.

Table 10 The results obtained by the AFT algorithm with different number of clusters (c)

It can be found from Table 10 that the effect of population interaction is reasonably positive and can significantly augment the performance of AFT in optimization. The performance of AFT is best at \(c = 3\) according to the mean rank, although it does not differ substantially for the other values of c. According to Table 10 as well as Tables 3, 4 and 5, it is evident that AFT with the use of PIN somewhat outperforms the standard AFT in optimization.

7.8 Qualitative analysis of AFT

The qualitative results, including search landscapes, convergence curves, average fitness curves in logarithmic shapes, search history and trajectory of the first individual, associated with the AFT algorithm in solving a selected set of test functions, are shown, for up to 200 iterations, in Fig. 10.

Fig. 10
figure 10

Qualitative results of AFT for test functions \(\hbox {F}_{{1}}\), \(\hbox {F}_{{3}}\), \(\hbox {F}_{{5}}\), \(\hbox {F}_{{11}}\), \(\hbox {F}_{{13}}\) and \(\hbox {F}_{{16}}\): Search landscapes, convergence curves, average fitness curves of all thieves, search histories and trajectories in the first dimension of the first thief

The qualitative metric measures used to plot the curves in Fig. 10 can be interpreted as follows:

  • Convergence curves these curves are used to describe how well AFT converges toward the global optimum, which demonstrates its ability to fulfill the exploration and exploitation phases. They show the best solutions found after each iteration. It is observed that the fitness value of the convergence curves improves after each iteration. The convergence curves of the unimodal functions \(\hbox {F}_1\), \(\hbox {F}_3\) and \(\hbox {F}_5\) show some slight differences from those of the multimodal functions \(\hbox {F}_{{11}}\) and \(\hbox {F}_{{13}}\) and the fixed-dimension multimodal function \(\hbox {F}_{{16}}\). This is mainly ascribed to the fact that unimodal functions have only one global optimum and no local optima. Broadly, AFT provides a fast convergence response for \(\hbox {F}_1\) and \(\hbox {F}_{{3}}\), a sensible convergence response for \(\hbox {F}_{{5}}\) and \(\hbox {F}_{{13}}\), and the optimal convergence for \(\hbox {F}_{{11}}\) and \(\hbox {F}_{{16}}\). As shown in the convergence curves, the AFT algorithm first explores the search space, and most of the thieves move toward the optimal areas of the search space in the first 50 iterations.

  • Average fitness curves these curves characterize the average fitness values of the thieves at the end of each iteration. They indicate that, as iterations proceed, the average fitness values decline. This underscores that AFT not only improves the global best fitness of one thief, but also improves the fitness of all thieves.

  • Search history this drawing shows the search history of the thieves during their search for the global optimum. It can be observed that the sample points of the unimodal functions \(\hbox {F}_1\), \(\hbox {F}_3\) and \(\hbox {F}_5\) are distributed in promising areas. In contrast, some of the sample points of the multimodal and fixed-dimension multimodal functions \(\hbox {F}_{{11}}\), \(\hbox {F}_{{13}}\) and \(\hbox {F}_{{16}}\), respectively, are marginally spread around unpromising areas. This is related to the degree of difficulty of these test functions. As the search history drawings demonstrate, the sample points are roughly scattered around the correct optimum solution, and thus this constitutes visual evidence of the ability of AFT in exploitation. It also means that the thieves quickly explore the whole search space at first and gradually move toward the global optimum.

  • Trajectory this plot depicts the thieves’ path in the first dimension. Displaying the trajectory of the thieves across all dimensions complicates the analysis process; thus, we have stored and illustrated the search path of the thieves in one dimension. This search plot presents the exploitation and exploration capacities of AFT. Figure 10 shows that the thieves go through abrupt fluctuations in the initial phases, progressive variations after the initial phases, and a steady situation in the final stages. This confirms the ability of AFT in exploring the search space at the initial stages of the search and exploiting the global optimum areas in the consequent stages.

Overall, the metric measures presented in Fig. 10 show that AFT is able to efficiently explore the search space, exploit each promising area, avoid local optimal solutions, achieve high levels of accuracy and reasonably converge toward optimality.

7.9 Performance of AFT in CEC-2017 benchmark

With the aim of further challenging the proposed AFT algorithm, a recent and challenging benchmark test suite, IEEE CEC-2017, was used. This test suite comprises 30 functions with varying difficulty levels. It should also be pointed out that, due to the unstable behavior of the C-2017-f2 function, particularly in higher dimensions, it has been set aside from this test suite [107]. The remaining 29 benchmark test functions consist of 2 unimodal functions (C-2017-f1 and C-2017-f3), 7 simple multimodal functions (C-2017-f4 - C-2017-f10), 10 hybrid functions (C-2017-f11 - C-2017-f20) and 10 composition functions (C-2017-f21 - C-2017-f30). These benchmark test functions capture many properties of real-world problems, and thus many algorithms in the literature have been examined on them. The performance of AFT on this test suite is compared with a set of well-known algorithms that have previously reported promising performance in the literature. The parameter settings for each optimization algorithm are discussed in Subsect. 7.1 and listed in Table 1.

Table 11 shows the optimization results of AFT and different methods in this test suite. The results of other algorithms were taken from reference [47].

Table 11 Optimization results of the AFT algorithm and other algorithms in IEEE CEC-2017 test functions

The results in Table 11 confirm the superiority of the AFT algorithm over the others in optimizing very challenging optimization functions. The AFT algorithm again reported the best average fitness values in twenty-eight out of twenty-nine benchmark test functions. With regard to the Std values, AFT performed significantly better than the other algorithms in twenty-one out of twenty-nine test functions. This supports our earlier inference that AFT has a high degree of stability when applied to various search domains and test beds. These findings indicate that AFT ranks first in terms of exploration and exploitation capabilities, which translates into high performance in both accuracy and standard deviation. This achievement is further evidence of the ability of AFT to outperform well-studied meta-heuristic algorithms while achieving highly competitive results against high-performance methods on complex test functions.

7.10 Performance of AFT in CEC-C06 2019 benchmark

A set of ten benchmark functions, named IEEE CEC-C06 2019, have been developed by Professor Suganthan and his colleagues [108] for use in evaluating single-objective optimization problems. Table 35 in "Appendix C" describes this group of test functions. These benchmark functions are designed to be scalable. The functions CEC04 - CEC10 are shifted and rotated, while CEC01 - CEC03 are neither shifted nor rotated. Table 1 shows the parameter settings of AFT and other optimization algorithms, and a description of these parameter settings is presented in Subsect. 7.1.

The average fitness and standard deviation values were computed to compare between the accuracy of AFT and other algorithms. The results of DA, WOA, and SSA were taken from [109]. Table 12 displays the Ave and Std values for the AFT algorithm and other algorithms when used to search for the global optimum in the CEC-C06 2019 benchmark test functions.

Table 12 Results of AFT and other meta-heuristic algorithms in IEEE CEC-C06 2019 benchmark test functions

The outcomes presented in Table 12 show that the AFT algorithm ranked first in six out of ten functions (i.e., CEC01, CEC05 and CEC07-CEC10) in terms of the average fitness. The MFO algorithm is in second place, since it achieved the best Ave values in the CEC02 and CEC03 functions, while the MVO algorithm achieved the best Ave value in the CEC04 function. Nevertheless, the results obtained by AFT in CEC02, CEC03 and CEC04 are comparable. To measure the stability of AFT, the Std value was calculated and compared with those of the other competitive algorithms. The AFT algorithm achieved the lowest Std values in seven out of ten CEC-C06 functions. This once again confirms that AFT produces the most stable results among the compared algorithms.

7.11 Statistical test analysis

The average fitness and standard deviation results tabulated in Tables 3, 4, 5, 11 and 12 presented a general view of the performance of AFT and of how stable it was over the 30 independent runs. The qualitative analysis presented in Sect. 7.8 illustrated the exploitation and exploration skills of AFT, but did not quantify them. This subsection presents a statistical analysis using Friedman's and Holm's tests [110] to examine whether the results in Tables 3, 4, 5, 11 and 12 are statistically significant with respect to the results of the other competitors.

The significance level (\(\alpha\)) in Friedman's test is set to 0.05 in the experiments shown below. If the p-value calculated by Friedman's test is less than or equal to \(\alpha\), the null hypothesis is rejected, which indicates that there are significant differences between the performance of the evaluated algorithms. In this study, Friedman's test is followed by Holm's method, as a post hoc test, to counteract the problem of multiple comparisons. The lowest-ranked method according to Friedman's test is used as the control method for the post hoc analysis.
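As a sketch of this procedure on hypothetical scores (not the paper's data), Friedman's statistic is computed from the per-problem ranks and compared with the chi-square critical value, after which Holm's step-down rule screens the post hoc p-values:

```python
def friedman(scores):
    """Friedman chi-square statistic and mean ranks.
    scores[i][j]: result of algorithm j on problem i (lower is better).
    Ties are ignored for simplicity."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for rank, j in enumerate(sorted(range(k), key=lambda jj: row[jj]), 1):
            rank_sums[j] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi2, [r / n for r in rank_sums]

def holm(p_values, alpha=0.05):
    """Holm's step-down method: reject the i-th smallest p while p <= alpha/(m - i)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break
    return reject

# Hypothetical mean errors of 4 algorithms on 6 functions (rows = functions).
scores = [[0.01, 0.30, 0.25, 0.40], [0.02, 0.28, 0.35, 0.33],
          [0.00, 0.15, 0.20, 0.18], [0.05, 0.40, 0.30, 0.55],
          [0.01, 0.22, 0.27, 0.31], [0.03, 0.35, 0.24, 0.47]]
chi2, mean_ranks = friedman(scores)
# chi2 = 13.4 here, exceeding the chi-square critical value 7.815
# (df = 3, alpha = 0.05), so equal performance is rejected.
```

The algorithm with the lowest mean rank then serves as the control method for Holm's post hoc comparisons.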

7.11.1 Statistical test on functions \(\hbox {F}_1\)\(\hbox {F}_7\)

Table 13 shows a summary of the ranking results generated by Friedman’s test when applied to the results presented in Table 3.

Table 13 A summary of the results generated by Friedman’s test on the average results of unimodal test functions in Table 3

The p value given by Friedman’s test on the average results of unimodal functions is 4.280198E-05. Thus, the null hypothesis of equivalent accuracy is rejected, confirming the existence of a statistically significant difference between the comparative algorithms. It can be observed from Table 13 that AFT is statistically significant, and it is the best one among all other algorithms. As shown in Table 13, the proposed AFT algorithm achieved the best statistical score on the set of unimodal test functions studied in this work, followed sequentially by SOA, SHO, CMA-ES, GWO, ACO, CSA, PSO, GA, DE, GSA and MFO in the last rank.

After the application of Friedman's test, Holm's method was used to decide whether there were statistically significant differences between the AFT algorithm and the others. Table 14 displays the statistical outcomes generated by Holm's method when applied to the results shown in Table 13. In Table 14, \(R_0\) represents the Friedman rank assigned to the control method (i.e., AFT), \(R_i\) represents the Friedman rank assigned to algorithm i, z represents the statistical difference between two algorithms, and finally ES is the effect size of AFT on algorithm i.

Table 14 Results of Holm’s method according to Friedman’s results in Table 13 with \(\alpha =0.05\)

Holm’s method in Table 14 rejects the hypotheses that have p-value \(\le 0.00625\). It is apparent from Table 14 that AFT yielded the best statistical results. In addition to that, the p-values presented in Table 14 confirm the reliability of the results generated by AFT in unimodal test functions.

7.11.2 Statistical test on functions \(\hbox {F}_8\)\(\hbox {F}_{{23}}\)

Table 15 presents a summary of the ranking results generated by Friedman’s test when it is applied to the mean results presented in Table 4.

Table 15 The results generated by Friedman’s test on the average results of multimodal functions in Table 4

The p-value generated by Friedman's test on the average fitness values of multimodal functions is 0.091583, on the basis of which the null hypothesis is rejected, indicating a statistically significant difference between the comparative algorithms. It is clearly observed from the results shown in Table 15 that AFT is statistically significant and achieved the best score among the other algorithms. In sum, the ranking produced by Friedman's test on the multimodal functions in Table 4 places AFT first, followed in order by CMA-ES, GWO, SOA, PSO, GA, DE, SHO, ACO, GSA, CSA and, last, MFO.

The statistical results generated by Holm's method on the Friedman results presented in Table 15, which relate to the results of Table 4, are shown in Table 16.

Table 16 Results of Holm’s method according to Friedman’s test results in Table 15 with \(\alpha =0.05\)

In Table 16, the hypotheses with p-value \(\le 0.004545\) were rejected by Holm’s method. The results in Table 16 confirm that AFT is statistically superior when it is compared to other competitors.

A summary of the ranking results obtained based on applying Friedman’s statistical test on the average results in Table 5 is displayed in Table 17.

Table 17 A summary of the ranking results obtained based on Friedman’s test on the average results shown in Table 5

In Table 17, the proposed AFT outperformed all other algorithms, with the lowest rank of 2.2, followed in order by CSA, CMA-ES, DE, GSA, ACO, SHO, MFO, GWO, PSO and, finally, SOA.

The statistical results obtained by applying Holm's test method to the results presented in Table 17, which relate to the results reported in Table 5, are shown in Table 18.

Table 18 Results of Holm’s method according to Friedman’s test results in Table 17

In Table 18, the hypotheses with p value \(\le 0.008333\) were rejected by Holm’s test method. The results in this table illustrate that AFT is statistically the best one, when it is compared to the others.

7.11.3 Statistical test on IEEE CEC-2017 benchmark

Table 19 displays a summary of the statistical results obtained by Friedman’s test when it is applied to the average results shown in Table 11.

Table 19 Ranking results of the Friedman’s test when applied to the Ave results shown in Table 11

The p-value obtained by Friedman's test when applied to the results of the IEEE CEC-2017 test suite is 4.990430E-11. It can be seen from Table 19 that AFT is statistically the best algorithm among all the compared algorithms. As can be noted from Table 19, AFT has the best statistical score in the IEEE CEC-2017 test suite, followed in order by CMA-ES, EO, PSO, GWO and GSA. Holm's test method was then applied to the results in Table 19 to determine whether there were statistically significant differences between AFT and the other algorithms. The results of this method are presented in Table 20.

Table 20 Results of Holm’s method based on the results reported in Table 19

It is obvious from Table 20 that the AFT algorithm delivered the best outcomes.

7.11.4 Statistical test on CEC-C06 2019 benchmark

Table 21 displays the ranking results revealed by the application of Friedman’s test on the results given in Table 12.

Table 21 The ranking results obtained based on Friedman’s test on the IEEE CEC-C06 2019 benchmark with \(\alpha =0.05\)

The p-value computed by Friedman’s test when it was applied to the average results of CEC-C06 functions in Table 12 is 2.535367E-7. As clearly shown in Table 21, the AFT algorithm is statistically significant as it has the best score among all the other algorithms. The order of the algorithms produced after applying Friedman’s test to this test suite is AFT, CSA, MVO, MFO, SSA, SCA, WOA, and DA. The statistical results obtained after applying Holm’s method on these benchmark functions are given in Table 22.

Table 22 Results of Holm’s method based on the Friedman’s statistical test results displayed in Table 21

As displayed in Table 22, the hypotheses having a p-value \(\le 0.016666\) were rejected by Holm's method. It can be seen from Table 22 that the AFT algorithm is robust in optimizing the IEEE CEC-2017 and IEEE CEC-C06 2019 test suites. The statistical findings presented in this subsection also indicate that AFT has better exploitation skills than exploration skills; this can be inferred by comparing the results obtained on the unimodal functions with those obtained on the multimodal, hybrid and composition functions. However, this is not a serious concern, since the exploration of the AFT algorithm remains plausible thanks to the updating mechanism it follows to explore the search space to a large extent. A small degree of exploration is usually not sufficient to find the global optimum, since optimization problems always require a convenient balance between exploitation and exploration. These impressive results make the AFT algorithm applicable to solving real-world engineering problems, as shown below.

8 Real engineering design problems

To further substantiate the robustness of the proposed AFT algorithm, its optimization aptitudes were assessed on five real-world engineering design problems, namely: the welded beam problem, the pressure vessel problem, the tension/compression spring problem, the speed reducer problem and the rolling element bearing problem. What distinguishes these design problems is that they involve a host of constraints. Hence, in order to address these design problems, it is crucial for the AFT algorithm to be equipped with a constraint-handling method.

8.1 Constraint handling

The AFT algorithm was equipped with a static penalty approach, given in Eq. 50, so that it can deal with the constraints of the aforementioned engineering design problems while addressing them, as elaborated in the following subsections.

$$\begin{aligned} \zeta (z) = f(z) \pm \left[ \sum _{i=1}^{m} l_i \cdot {\rm max} (0, t_i(z))^\alpha + \sum _{j=1}^{n} o_j \left| U_j(z) \right| ^\beta \right] \end{aligned}$$
(50)

where \(\zeta (z)\) stands for the penalized objective function, \(f(z)\) is the original objective function, \(l_i\) and \(o_j\) define positive penalty constants, and \(t_i(z)\) and \(U_j(z)\) are the inequality and equality constraint functions, respectively. The values of the parameters \(\beta\) and \(\alpha\) were set to 2.0 and 1.0, respectively.

The static penalty approach assigns a penalty value to each infeasible solution. This feature encouraged us to use the static penalty function to handle the constraints of the aforementioned design problems, since it can assist the search agents of AFT in moving toward the feasible region of the search space of the given problems. The number of search agents and the number of iterations of AFT in solving the following design problems were set to 30 and 1000, respectively.
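A minimal sketch of Eq. 50 in its minimization form (the "+" branch of the \(\pm\)) is given below; the penalty constants \(l_i\) and \(o_j\) are assumed values, as the paper does not report them here.

```python
def penalized(f, ineq, eq, l=1.0e6, o=1.0e6, alpha=1.0, beta=2.0):
    """Build the static-penalty objective of Eq. 50 (minimization form).
    f     : original objective f(z)
    ineq  : inequality constraint functions t_i(z) <= 0
    eq    : equality constraint functions U_j(z) = 0
    l, o  : positive penalty constants (assumed values here)
    """
    def zeta(z):
        p_ineq = sum(l * max(0.0, t(z)) ** alpha for t in ineq)
        p_eq = sum(o * abs(u(z)) ** beta for u in eq)
        return f(z) + p_ineq + p_eq
    return zeta

# Example: minimize z0^2 subject to z0 >= 1, i.e. t(z) = 1 - z0 <= 0.
zeta = penalized(lambda z: z[0] ** 2, ineq=[lambda z: 1.0 - z[0]], eq=[])
```

Feasible solutions keep their original fitness, while infeasible ones are pushed back toward the feasible region by the large penalty term.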

8.2 Welded beam design problem

The main goal of this design problem is to minimize the manufacturing cost of the welded beam design displayed in Fig. 11 [111].

Fig. 11
figure 11

A schematic diagram of a welded beam design

The elements of the welded beam structure shown in Fig. 11 are a beam, A, and the weld required to attach it to the member, B. The constraints of this design problem are the bending stress in the beam (\(\theta\)), the buckling load on the bar (\(P_c\)), the end deflection of the beam (\(\delta\)) and the shear stress (\(\tau\)). To address this problem optimally, it is essential to solve for the structural parameters of the welded beam. Specifically, the design variables of this problem are: the thickness of the weld (h), the thickness of the bar (b), the length of the clamped bar (l) and the height of the bar (t). The variable vector of this problem can be written as \(\mathbf {x} = [x_1, x_2, x_3, x_4]\), where \(x_1\), \(x_2\), \(x_3\) and \(x_4\) represent the values of the variables h, l, t and b, respectively. The cost function to be minimized is formulated in the following form:

Minimize: \(f(\mathbf {x}) = 1.10471x^2_1x_2 +0.04811x_3x_4(14.0+x_2)\)

Subject to the following constraints,

\(g_1(\mathbf {x}) = \tau (\mathbf {x})-\tau _{max}\le 0\)

\(g_2(\mathbf {x}) = \sigma (\mathbf {x})- \sigma _{max} \le 0\)

\(g_3(\mathbf {x}) = x_1 -x_4 \le 0\)

\(g_4(\mathbf {x}) = 1.10471x^2_1 +0.04811x_3x_4(14.0+x_2) -5.0 \le 0\)

\(g_5(\mathbf {x}) = 0.125- x_1 \le 0\)

\(g_6(\mathbf {x}) = \delta (\mathbf {x})- \delta _{max} \le 0\)

\(g_7(\mathbf {x}) = P-P_c(\mathbf {x}) \le 0\)

where the remaining variables of the welded beam design are drawn up as follows:

\(\tau (\mathbf {x})=\sqrt{((\tau ')^2 + (\tau '')^2)+\frac{2\tau '\tau '' x_2}{2R}}, \tau '=\frac{P}{\sqrt{2}x_1x_2}\)

\(\tau ''=\frac{MR}{J}, M=P(L+\frac{x_2}{2}), R=\sqrt{\left(\frac{x_1+x_3}{2}\right)^2+\frac{x_2^2}{4}}\)

\(J=2\left\{ \sqrt{2}x_1x_2\left[ \frac{x_2^2}{12}+(\frac{x_1+x_3}{2})^2\right] \right\} , \sigma (\mathbf {x})=\frac{6PL}{x_4x_3^2}\)

\(\delta (\mathbf {x})=\frac{4PL^3}{Ex_4x_3^3}, P_c(\mathbf {x})=\frac{4.013E\sqrt{x^2_3x_4^6/36}}{L^2}\left( 1-\frac{x_3}{2L}\sqrt{\frac{E}{4G}}\right)\)

where \(P = 6000\) lb, \(L = 14\) in, \(\delta _{\rm max} = 0.25\) in, \(E = 30\times 10^6\) psi, \(G = 12\times 10^6\) psi, \(\tau _{\rm max} = 13600\) psi and \(\sigma _{\rm max} = 30000\) psi. The ranges of the design variables are \(0.1 \le x_i \le 2.0\) for \(i = 1, 4\) and \(0.1 \le x_i \le 10.0\) for \(i = 2, 3\).
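The formulation above can be transcribed into a cost-and-constraint evaluator. The sketch below is an illustration, not the authors' code; it uses the buckling-load expression \(P_c = \frac{4.013E\sqrt{x_3^2x_4^6/36}}{L^2}\left(1-\frac{x_3}{2L}\sqrt{E/4G}\right)\), the form of this benchmark under which the published optimal designs are feasible (an assumption on our part), and the check point is a near-optimal design reported in the literature.

```python
import math

P, L, E, G = 6000.0, 14.0, 30.0e6, 12.0e6
TAU_MAX, SIGMA_MAX, DELTA_MAX = 13600.0, 30000.0, 0.25

def welded_beam(x):
    """Cost and constraint values g_i(x) <= 0 for the welded beam design."""
    x1, x2, x3, x4 = x
    cost = 1.10471 * x1 ** 2 * x2 + 0.04811 * x3 * x4 * (14.0 + x2)
    tau_p = P / (math.sqrt(2.0) * x1 * x2)                      # tau'
    M = P * (L + x2 / 2.0)
    R = math.sqrt(x2 ** 2 / 4.0 + ((x1 + x3) / 2.0) ** 2)
    J = 2.0 * math.sqrt(2.0) * x1 * x2 * (x2 ** 2 / 12.0 + ((x1 + x3) / 2.0) ** 2)
    tau_pp = M * R / J                                          # tau''
    # 2*tau'*tau''*x2/(2R) simplifies to tau'*tau''*x2/R
    tau = math.sqrt(tau_p ** 2 + tau_p * tau_pp * x2 / R + tau_pp ** 2)
    sigma = 6.0 * P * L / (x4 * x3 ** 2)
    delta = 4.0 * P * L ** 3 / (E * x4 * x3 ** 3)
    p_c = (4.013 * E * math.sqrt(x3 ** 2 * x4 ** 6 / 36.0) / L ** 2
           * (1.0 - x3 / (2.0 * L) * math.sqrt(E / (4.0 * G))))
    g = [tau - TAU_MAX, sigma - SIGMA_MAX, x1 - x4,
         1.10471 * x1 ** 2 + 0.04811 * x3 * x4 * (14.0 + x2) - 5.0,
         0.125 - x1, delta - DELTA_MAX, P - p_c]
    return cost, g
```

A design is feasible when every entry of `g` is non-positive; the penalized objective of Eq. 50 can be built directly from this evaluator.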

A comparison of the best solutions obtained from AFT and other optimization algorithms reported in the literature is presented in Table 23.

Table 23 A comparison of the optimal costs attained by AFT and other optimization algorithms for the welded beam problem

Succinctly, the results of Table 23 indicate that the AFT algorithm converges toward the optimal design and offers the best solution among all other competitors. A statistical comparison of AFT with other competitors, after 30 independent runs, in terms of the best score, worst score, average score Ave and standard deviation score Std, is presented in Table 24.

Table 24 Statistical results obtained from AFT and other algorithms for the welded beam design problem

The results of Table 24 show that AFT again behaves much better than the other optimization algorithms in terms of the Ave and Std results. This confirms the reliability of the AFT algorithm in addressing this design problem.

8.3 Pressure vessel design problem

The pressure vessel design is another well-respected engineering problem that has been extensively considered in optimization [101]. The goal of this problem is to minimize the total cost of the material, forming and welding of a cylindrical vessel, which is capped at both ends with hemispherical heads. Figure 12 illustrates a schematic diagram of this design.

Fig. 12
figure 12

A structural representation of the cross section of a pressure vessel design

The variables of this design problem are the thickness of the shell (\(T_s\)), the thickness of the head (\(T_h\)), the inner radius (R) and the length of the cylindrical section of the vessel without considering the head (L). These variables can be outlined by a vector \(\mathbf {x} = [x_1, x_2, x_3, x_4]\), whose elements stand for the values of \(T_s\), \(T_h\), R and L, respectively. The pressure vessel design problem can be mathematically defined as follows:

$$\begin{aligned} \text {Minimize}: f(\mathbf {x}) = 0.6224x_1x_3x_4 + 1.7781 x_2x^2_3 + 3.1661x^2_1x_4+19.84x^2_1x_3 \end{aligned}$$

This problem is subject to four constraints, as outlined below,

\(g_1(\mathbf {x}) = -x_1 +0.0193x_3\le 0\)

\(g_2(\mathbf {x}) = -x_2 +0.00954x_3 \le 0\)

\(g_3(\mathbf {x}) = -\pi x^2_3x_4 -\frac{4}{3}\pi x^3_3+1296000 \le 0\)

\(g_4(\mathbf {x}) = x_4 -240 \le 0\)

where \(0 \le x_1 \le 99\), \(0 \le x_2 \le 99\), \(10 \le x_3 \le 200\) and \(10 \le x_4 \le 200\).
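As with the welded beam, this formulation can be transcribed into a small evaluator (an illustrative sketch; the check point is a near-optimal continuous design from the literature):

```python
import math

def pressure_vessel(x):
    """Cost and g_i(x) <= 0 for the pressure vessel design, x = [Ts, Th, R, L]."""
    x1, x2, x3, x4 = x
    cost = (0.6224 * x1 * x3 * x4 + 1.7781 * x2 * x3 ** 2
            + 3.1661 * x1 ** 2 * x4 + 19.84 * x1 ** 2 * x3)
    g = [-x1 + 0.0193 * x3,                                   # shell thickness
         -x2 + 0.00954 * x3,                                  # head thickness
         -math.pi * x3 ** 2 * x4 - 4.0 / 3.0 * math.pi * x3 ** 3 + 1296000.0,
         x4 - 240.0]                                          # length limit
    return cost, g
```

The first two constraints tie the shell and head thicknesses to the radius, while the third enforces the minimum enclosed volume.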

A comparison of the optimal solutions obtained from the proposed AFT algorithm and other optimization algorithms reported in the literature is given in Table 25.

Table 25 A comparison of the results achieved by AFT and other algorithms for the pressure vessel design problem

As can be clearly seen from Table 25, the AFT algorithm provided the best design with a minimum cost of about 5885.332773. No other competing algorithm was capable of achieving this cost.

The statistical results of the AFT algorithm for the pressure vessel design problem compared to others, in terms of the best, average, worst and standard deviation scores after 30 independent runs, are presented in Table 26.

Table 26 Statistical results obtained from the AFT algorithm and other algorithms in solving the pressure vessel design problem

The statistical results in Table 26 assert that the AFT algorithm performs better than all competing algorithms in terms of the best, Ave and Std values.

8.4 Tension–compression spring design problem

The structure of a tension/compression spring design problem is shown in Fig. 13 [112].

Fig. 13
figure 13

A schematic diagram of a tension/compression spring design

In the tension/compression spring design problem, we strive to reduce the weight of the spring. This optimization problem is subject to constraints on the surge frequency, the shear stress and the minimum deflection. The variables of this design problem are the wire diameter (d), the mean coil diameter (D) and the number of active coils (N). These variables can be characterized as \(\mathbf {x} = [x_1, x_2, x_3]\), where the parameters of \(\mathbf {x}\) represent d, D and N, respectively. The mathematical representation of this problem can be described in the following way:

Minimize: \(f(\mathbf {x}) = (x_3 +2)x_2x^2_1\)

This problem is subject to the following constraints:

\(g_1(\mathbf {x}) = 1-\frac{x^3_2x_3}{71785x^4_1}\le 0\)

\(g_2(\mathbf {x}) =\frac{4x^2_2-x_1x_2}{12566(x_2x^3_1-x^4_1)}+\frac{1}{5108x^2_1}-1 \le 0\)

\(g_3(\mathbf {x}) = 1-\frac{140.45x_1}{x^2_2x_3} \le 0\)

\(g_4(\mathbf {x}) = \frac{x_1+x_2}{1.5} -1\le 0\)

where \(0.05 \le x_1 \le 2.0\), \(0.25 \le x_2 \le 1.3\) and \(2 \le x_3 \le 15.0\).
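This formulation can likewise be transcribed directly (an illustrative sketch; the check point is the near-optimal design commonly reported in the literature, with x = [d, D, N]):

```python
def spring(x):
    """Weight and g_i(x) <= 0 for the tension/compression spring, x = [d, D, N]."""
    x1, x2, x3 = x
    cost = (x3 + 2.0) * x2 * x1 ** 2
    g = [1.0 - x2 ** 3 * x3 / (71785.0 * x1 ** 4),            # minimum deflection
         (4.0 * x2 ** 2 - x1 * x2) / (12566.0 * (x2 * x1 ** 3 - x1 ** 4))
         + 1.0 / (5108.0 * x1 ** 2) - 1.0,                    # shear stress
         1.0 - 140.45 * x1 / (x2 ** 2 * x3),                  # surge frequency
         (x1 + x2) / 1.5 - 1.0]                               # outer diameter
    return cost, g
```

At the near-optimal design the first two constraints are active (close to zero), which is why this problem is a good stress test for constraint handling.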

Table 27 shows a comparison between the AFT algorithm presented in this work and other algorithms reported in the literature.

Table 27 A comparison of the best solutions achieved by AFT and other algorithms for the tension/compression spring design problem

Examining the results of Table 27 in terms of the optimal costs, we can clearly see that AFT scored the best solution among all the optimization algorithms for this problem and achieved the best design with a cost of 0.012665, which no other competitor has achieved. A comparison of the statistical results obtained by AFT and other algorithms reported in the literature for this design problem is presented in Table 28.

Table 28 Statistical results of the proposed AFT algorithm and other optimization algorithms for the tension/compression spring design problem

The results in Table 28 show that the AFT algorithm once again performs much better than the other algorithms in terms of the statistical results.

8.5 Speed reducer design problem

The speed reducer design problem is a challenging benchmark problem because it involves seven design variables [113]. The structural description of a speed reducer design is shown in Fig. 14.

Fig. 14
figure 14

A structural design of a speed reducer problem

The prime objective of this problem is to reduce the weight of the speed reducer, which is subject to the following constraints: transverse deflections of the shafts, surface stress, bending stress of the gear teeth and stresses in the shafts [99]. The seven variables of this problem are: the face width (b), the module of teeth (m), the number of teeth in the pinion (z), the length of the first shaft between bearings (\(l_1\)), the length of the second shaft between bearings (\(l_2\)), the diameter of the first shaft (\(d_1\)) and the diameter of the second shaft (\(d_2\)). These variables can be represented as \(\mathbf {x}= [x_1, x_2, x_3, x_4, x_5, x_6, x_7]\). The mathematical description of this design problem can be given as follows:

Minimize: \(f(\mathbf {x}) = 0.7854x_1x_2^2(3.3333x_3^2+14.9334x_3-43.0934) -1.508x_1(x_6^2+x_7^2)+7.4777(x_6^3+x_7^3)+0.7854(x_4x_6^2+x_5x_7^2)\)

This function is subject to eleven constraints, which are described as follows:

\(g_1(\mathbf {x}) = \frac{27}{x_1x_2^2x_3}-1 \le 0\)

\(g_2(\mathbf {x}) = \frac{397.5}{x_1x_2^2x_3^2}-1 \le 0\)

\(g_3(\mathbf {x}) = \frac{1.93 x_4^3}{x_2x_6^4x_3}-1 \le 0\)

\(g_4(\mathbf {x}) = \frac{1.93 x_5^3}{x_2x_7^4x_3}-1 \le 0\)

\(g_5(\mathbf {x}) = \frac{[(745(x_4/x_2x_3))^2+16.9\times 10^6]^{1/2}}{110x_6^3}-1 \le 0\)

\(g_6(\mathbf {x}) = \frac{[(745(x_5/x_2x_3))^2+157.5\times 10^6]^{1/2}}{85x_7^3}-1 \le 0\)

\(g_7(\mathbf {x}) = \frac{x_2x_3}{40}-1 \le 0\)

\(g_8(\mathbf {x}) = \frac{5x_2}{x_1}-1 \le 0\)

\(g_9(\mathbf {x}) = \frac{x_1}{12x_2}-1 \le 0\)

\(g_{10}(\mathbf {x}) = \frac{1.5x_6+1.9}{x_4}-1 \le 0\)

\(g_{11}(\mathbf {x}) = \frac{1.1x_7+1.9}{x_5}-1 \le 0\)

where the ranges of the design variables \(b, m, z, l_1, l_2, d_1\) and \(d_2\) were used as \(2.6\le x_1\le 3.6\), \(0.7\le x_2\le 0.8\), \(17\le x_3\le 28\), \(7.3\le x_4\le 8.3\), \(7.3\le x_5\le 8.3\), \(2.9\le x_6\le 3.9\) and \(5.0\le x_7\le 5.5\), respectively.
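An illustrative evaluator for this problem (not the authors' code) is sketched below; the coefficient 1.93 in \(g_3\) and \(g_4\) follows the standard benchmark definition, and the check point is a near-optimal design from the literature.

```python
import math

def speed_reducer(x):
    """Weight and g_i(x) <= 0 for the speed reducer, x = [b, m, z, l1, l2, d1, d2]."""
    x1, x2, x3, x4, x5, x6, x7 = x
    cost = (0.7854 * x1 * x2 ** 2 * (3.3333 * x3 ** 2 + 14.9334 * x3 - 43.0934)
            - 1.508 * x1 * (x6 ** 2 + x7 ** 2)
            + 7.4777 * (x6 ** 3 + x7 ** 3)
            + 0.7854 * (x4 * x6 ** 2 + x5 * x7 ** 2))
    g = [27.0 / (x1 * x2 ** 2 * x3) - 1.0,                    # bending stress of teeth
         397.5 / (x1 * x2 ** 2 * x3 ** 2) - 1.0,              # surface stress
         1.93 * x4 ** 3 / (x2 * x6 ** 4 * x3) - 1.0,          # shaft 1 deflection
         1.93 * x5 ** 3 / (x2 * x7 ** 4 * x3) - 1.0,          # shaft 2 deflection
         math.sqrt((745.0 * x4 / (x2 * x3)) ** 2 + 16.9e6) / (110.0 * x6 ** 3) - 1.0,
         math.sqrt((745.0 * x5 / (x2 * x3)) ** 2 + 157.5e6) / (85.0 * x7 ** 3) - 1.0,
         x2 * x3 / 40.0 - 1.0,
         5.0 * x2 / x1 - 1.0,
         x1 / (12.0 * x2) - 1.0,
         (1.5 * x6 + 1.9) / x4 - 1.0,
         (1.1 * x7 + 1.9) / x5 - 1.0]
    return cost, g
```

Several constraints (e.g., the shaft-stress terms \(g_5\), \(g_6\) and \(g_{11}\)) are active at the optimum, which makes this a demanding test of the penalty-based constraint handling.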

Table 29 displays the best designs and optimum costs achieved by the AFT algorithm and other algorithms for the speed reducer design problem.

Table 29 A comparison of the best results obtained by AFT and other algorithms for the speed reducer design problem

Table 29 corroborates that AFT provides the optimal design compared to others with a cost of approximately 2994.471066. The statistical results of AFT and other competitive methods, over 30 independent runs, for the speed reducer design problem are tabulated in Table 30.

Table 30 Statistical results of the AFT algorithm and other optimization algorithms for the speed reducer design problem

According to the statistical results in Table 30, the AFT algorithm found the best results compared to other promising optimization algorithms.

8.6 Rolling Element Bearing Design Problem

The main goal of this design problem is to maximize the dynamic load-carrying capacity of a rolling element bearing. The schematic diagram of this design problem is shown in Fig. 15 [99].

Fig. 15
figure 15

A schematic view of a rolling element bearing

This problem consists of ten decision variables given as follows: ball diameter (\(D_b\)), pitch diameter (\(D_m\)), number of balls (X), inner (\(f_i\)) and outer (\(f_o\)) raceway curvature factors, \(K_{Dmin}\), \(K_{Dmax}\), e, \(\epsilon\) and \(\zeta\). The mathematical representation of this design problem is as follows:

Maximize: \(C_d = f_c X^{2/3}D_b^{1.8}\quad \text{if}\;\;D_b \le 25.4\,\text{mm}\)

\(C_d=3.647f_cX^{2/3}D_b^{1.4}\quad \text{if}\;\;D_b>25.4\,\text{mm}\)

The constraints and \(f_c\) of this design problem are presented as follows:

\(g_1(\mathbf {x}) = \frac{\phi _0}{2\sin ^{-1}(D_b/D_m)}-X +1 \le 0\)

\(g_2(\mathbf {x}) = 2D_b-K_{Dmin}(D-d) \ge 0\)

\(g_3(\mathbf {x}) = K_{Dmax} (D-d)-2D_b \ge 0\)

\(g_4(\mathbf {x}) = \zeta B_w-D_b \le 0\)

\(g_5(\mathbf {x}) = D_m-0.5(D+d) \ge 0\)

\(g_6(\mathbf {x}) = (0.5+e) (D+d) - D_m \ge 0\)

\(g_7(\mathbf {x}) = 0.5 (D-D_m-D_b)-\epsilon D_b \ge 0\)

\(g_8(\mathbf {x}) = f_i \ge 0.515\)

\(g_9(\mathbf {x}) = f_o \ge 0.515\)

$$\begin{aligned} f_c&= 37.91\biggl [1+\biggl \{1.04\left( \frac{1-\gamma }{1+\gamma }\right) ^{1.72}\left( \frac{f_i(2f_o-1)}{f_o(2f_i-1)} \right) ^{0.41}\biggr \}^{10/3}\biggr ]^{-0.3}\nonumber \\&\quad \times \left[ \frac{\gamma ^{0.3}(1-\gamma )^{1.39}}{(1+\gamma )^{1/3}} \right] \left[ \frac{2f_i}{2f_i-1} \right] ^{0.41}\nonumber \\ x&= \left\{ \frac{D-d}{2}-\frac{3T}{4}\right\} ^2 + \left\{ \frac{D}{2}-\frac{T}{4}-D_b\right\} ^2 \nonumber \\&\quad - \left\{ \frac{d}{2}+\frac{T}{4}\right\} ^2 \nonumber \\ y&= 2\left\{ \frac{D-d}{2}-\frac{3T}{4}\right\} \left\{ \frac{D}{2}-\frac{T}{4}-D_b\right\} \end{aligned}$$
(51)

\(\phi _0=2\pi - 2\cos ^{-1} \left( \frac{x}{y} \right)\)

\(\gamma =\frac{D_b}{D_m}, f_i=\frac{r_i}{D_b}, f_o=\frac{r_o}{D_b}, T=D-d-2D_b\)

\(D=160, d=90, B_w=30, r_i=r_o=11.033\)

\(0.5(D+d) \le D_m \le 0.6(D+d), 0.15(D-d)\le D_b \le 0.45(D-d)\)

\(4 \le X \le 50\), \(0.515 \le f_i \le 0.6\) and \(0.515 \le f_o \le 0.6\)

\(0.4 \le K_{Dmin} \le 0.5, 0.6 \le K_{Dmax} \le 0.7\)

\(0.3 \le e \le 0.4\), \(0.02 \le \epsilon \le 0.1\), \(0.6 \le \zeta \le 0.85\)
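Using the fixed parameters above (\(D=160\), \(d=90\), \(B_w=30\)), the capacity objective and the assembly-angle constraint \(g_1\) can be sketched as follows. This is an illustrative Python rendering, not the AFT implementation; the function names are assumptions, and the test point used below is a near-optimal design commonly reported in the literature for this problem.

```python
import math

# Fixed parameters from the problem statement.
D, d, Bw = 160.0, 90.0, 30.0

def dynamic_load_capacity(Dm, Db, X, fi, fo):
    """Dynamic load-carrying capacity C_d of the rolling element bearing.

    Dm: pitch diameter, Db: ball diameter, X: number of balls,
    fi/fo: inner/outer raceway curvature factors.
    """
    gamma = Db / Dm
    fc = (37.91
          * (1 + (1.04 * ((1 - gamma) / (1 + gamma))**1.72
                  * (fi * (2 * fo - 1) / (fo * (2 * fi - 1)))**0.41)**(10 / 3))**-0.3
          * gamma**0.3 * (1 - gamma)**1.39 / (1 + gamma)**(1 / 3)
          * (2 * fi / (2 * fi - 1))**0.41)
    # Piecewise objective: the exponent on Db changes at Db = 25.4 mm.
    if Db <= 25.4:
        return fc * X**(2 / 3) * Db**1.8
    return 3.647 * fc * X**(2 / 3) * Db**1.4

def assembly_angle_constraint(Dm, Db, X):
    """g1: the assembly angle phi_0 must accommodate X balls (g1 <= 0)."""
    T = D - d - 2 * Db
    x = ((D - d) / 2 - 3 * T / 4)**2 + (D / 2 - T / 4 - Db)**2 - (d / 2 + T / 4)**2
    y = 2 * ((D - d) / 2 - 3 * T / 4) * (D / 2 - T / 4 - Db)
    phi0 = 2 * math.pi - 2 * math.acos(x / y)
    return phi0 / (2 * math.asin(Db / Dm)) - X + 1
```

For example, at the design \(D_m \approx 125.719\), \(D_b \approx 21.426\), \(X = 11\), \(f_i = f_o = 0.515\), this evaluation gives \(C_d \approx 8.18 \times 10^4\) with the assembly-angle constraint active, which is in the same range as the optimal costs compared in Table 31.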

Table 31 shows a comparison of the best solutions for the rolling element bearing design obtained by AFT and other optimization algorithms.

Table 31 Optimization results of the rolling element bearing design problem achieved by AFT and other algorithms

As per the optimum costs of the rolling element bearing design problem reported in Table 31, the AFT algorithm achieved the best design, with an optimal cost of about 85206.641. The statistical results of AFT and other optimization methods over 30 runs are shown in Table 32.

Table 32 Statistical results obtained from the AFT algorithm and others for the rolling element bearing design problem

It may be observed from Table 32 that the AFT algorithm has once again obtained the best optimal solutions for the rolling element bearing design problem compared with the other algorithms.

In a nutshell, the overall performance of the proposed AFT algorithm corroborates its reliability and efficiency in addressing the above five classical engineering design problems. We can therefore deduce that the AFT algorithm is an appropriate and effective optimizer and a promising candidate for solving real-world contemporary problems.

9 Conclusion and Future Work

This paper has proposed a novel human-based meta-heuristic algorithm called Ali Baba and the forty thieves (AFT) for solving global optimization problems. The performance of the AFT algorithm was benchmarked on sixty-two basic and challenging test functions drawn from three suites: the classic benchmark functions, the IEEE CEC-2017 benchmark functions and the IEEE CEC-C06 2019 benchmark functions. Several enhancements were incorporated into the AFT algorithm to strengthen its exploration and exploitation abilities. Extensive comparisons with many well-studied, recent and high-performance algorithms have shown that AFT is highly reliable and effective in finding near-optimal or optimal solutions for most of the test functions studied. On the practical side, the AFT algorithm was applied to five engineering design problems as evidence of its reliability and applicability to real-life applications. In future work, a parallel optimization algorithm could be developed by combining the AFT algorithm with other algorithms to further improve its performance. The AFT algorithm can also be extended by adapting, implementing and testing binary and multi-objective versions to solve large-scale real-world problems.