1 Introduction

Constraint optimization is a search process for finding the best solution(s) meeting a set of problem requirements (constraints) and optimizing one or more objectives [1]. The main challenge is to find the optimal solution (assuming one single objective function) in a reasonable amount of time. Optimization algorithms are classified into: exact and approximate. While they guarantee to return the optimal solution, exact algorithms are limited, in practice, by their exponential time cost. For those problems where these methods fail to return an optimal solution within a reasonable time, approximate methods including metaheuristics can be a good alternative. Indeed, metaheuristics do not explore the entire search space, and therefore trade the quality of the solution returned for the running time. Metaheuristics include nature-inspired (NI) which mimic a natural phenomenon from biology, physics, or ethology [2].

In the last decades, there has been a lot of attention devoted to NI algorithms, which are classified into: single-based and population-based solutions. In 1960s, L.J Fogel et al. introduced the basic concept of Evolutionary Programming (EP) [3]. In 1964, Rechenberg et al. introduced the evolutionary strategies (ESs). In 1975, John Holland introduced the basic concept of genetic algorithms (GAs) [4]. In 1983, Kirkpatrick et al. proposed the simulated annealing (SA) algorithm that is inspired by the simulated annealing phenomena [5]. In 1995, Kennedy and Eberhart introduced the particle swarm optimization (PSO) algorithm that mimics the movement of a flock of birds [6]. In 2002, Passino introduced the bacterial foraging (BF) algorithm that is built on the foraging behavior of E. coli bacteria [7]. In 2007, Atashpaz-Gargari introduced the first idea of imperialist competitive algorithm (ICA) that is built on the imperialist competitive idea [8]. In 2013, Cheng et al. introduced the competitive swarm optimizer (CSO) that is built on the same concept of PSO with some diffirences [9]. There are many other NI algorithms that have been proposed and achieved promising results [10]. In the paper, we review just few algorithms that have been implemented in too many different applications and proved their efficacy.

The focus of our paper is on when we can use NI algorithms, and what the main drawbacks of using them. The remaining of this paper is structured as follows. Section 2 presents an overview of optimization algorithms, including exact and approximation methods. Section 3 describes the basic concept of single-solution based NI algorithms, such as SA. Section 4 presents the population-based NI algorithms. Section 4.1 outlines the basic steps of Evolutionary Computation (EC) techniques: GAs, ESs, and EP are discussed as examples of EC algorithms. Section 4.2 presents a special class of NI algorithms called “Swarm Intelligence” (SI). The last sections describe the methodology to assess, in practice, the performance of a NI technique.

2 Overview of Optimization Algorithms

In [11], Archetti et al. classified global optimization algorithms into two main categories: exact (deterministic) and approximate (probabilistic), as shown in the taxonomy presented in Fig. 1. In addition, Fister et al. classified many swarm intelligence and bio-inspired algorithms in a well-organized way [12]. A taxonomy of data-driven metaheuristics has also been reported by El Ghazali in [13].

Fig. 1
figure 1

Optimization algorithms classification

2.1 Exact and Approximate Algorithms

Exact methods are systematic search techniques that explore the entire search space in order to find the optimal solution, such as dynamic programming, backtracking and its variants, branch-and-bound, and constraint programming techniques [14]. These algorithms are deterministic, meaning that they follow the same search path across different runs for solving the same problem instance. While they guarantee to return the optimal solution, in some cases, these methods can be unpractical due to their exponential time cost, especially when dealing with NP-hard problems [15]. Exact algorithms can be used to solve hard optimization problems that have different levels of difficulties, such as size of the problem, hardness (tightness) of the constraints, and nonlinear characteristics. For example, in [16], the traveling salesman problem has been successfully solved for different instances using an exact method. In other words, some large instances might be solved by exact methods in a reasonable time frame, while smaller instances fail to be solved. For many NP-hard problems, there is a region, called the phase transition, that includes the hardest problems to solve, regardless of the problem size. The phase transition can be seen as the borderline between solvable and unsolvable problems [17].

Approximate algorithms are built upon the concept of randomness guided by different rules, without a guarantee of convergence. NI algorithms represent well-known and efficient subclass of approximate algorithms that is inspired by different natural phenomena. NI are considered general-purpose algorithms, and they are suitable for a wide range of problems and problem instances. This class of algorithms have one (single-solution-based algorithms) or more (population-based algorithms) candidate solutions (also called agents). These latter are evaluated and rewarded with fitness values. Candidate solutions change over time depending on the application of operators, following some random parameters. NI techniques are designed following a trade-off between two main strategies: exploration and exploitation [10]. Exploration, also known as diversification, is the process of exploring a wider search space. Exploitation, also known as intensification, is the process of exploiting the best solutions found and intensifying the search locally. Using a good balance between these two strategies will help delivering satisfactory solutions in a reasonable time frame for black-box problems (problems that do not have clear properties) and large-size industrial problem instances. NI have been used in many different industrial applications and has proven both efficiency and efficacy [18, 19]. In addition, this class of algorithms is the corner stone of many machine learning techniques [13]. The main disadvantage is that inconsistency cannot be proven for those overconstrained problems; however, it has the ability to efficiently produce good solutions.

2.1.1 Why and When to Use NI Algorithms

The choice of an appropriate algorithm for a given optimization problem is an open question in the optimization research community. There is no claim that a specific optimization problem can be solved with only one optimization algorithm, but the selection process of the right algorithm to solve a given problem in a reasonable time frame using minimal resources is a difficult and a tedious task. For instance, a discontinuous objective function cannot be effectively solved using a classical gradient-based approach, such as hill-climbing, while the highly nonlinear problems can be solved with NI population-based methods [20]. Similarly, a constraint problem can be solved in polynomial time if its constraint graph representation is a tree [14].

There are several factors should be taken in consideration: the size of the problem (small, medium, and large), the type or structure of the problem (linear or nonlinear, continuous or discrete), the time limit [10] (fast or slow), the desired quality of solutions (exact or approximate), the availability of resources, the easiness of implementing the algorithm [20]. The structure of an optimization problem plays a significant role over the size of the problem, because some easy–medium and large-size problems could be solved by exact algorithms in reasonable time frame. In some cases, a user needs an efficient algorithm to reduce the number of function evaluations (time), because each iteration may take few hours or even weeks as in case of optimizing the mixing index of biomass optimization problem.

The first step to solving an optimization problem is to analyze its time and space complexity, and determine whether the problem can be reduced to a tractable one from the literature. In other words, a user might search in literature for the same or similar problem to the one that the user wants to tackle. The next step is to find the best appropriate optimization algorithm to solve that given problem in a time frame fit the user requirements. User should be careful and follow the recommended tuning parameters from literature to avoid a deteriorated performance of the selected algorithm. Otherwise, a related problem could be found, so that its methodology might be followed. We also want to highlight those portfolio-based algorithm selectors that can be used as guidance for the appropriate choice of the nature-inspired technique depending on the problem instance features and properties [21].

Exact algorithms are the best choice whenever they can be used to solve or assist in solving a given problem within a reasonable time frame, such as some polynomial time problems and some large-scale linear continuous problems. Therefore, it will be an unwise decision to use NI algorithms to solve easy optimization problems, where an exact algorithm is available.

On the other hand, the use of NI algorithms is desirable and can be advantageous for approximate solutions. Using NI algorithms can be the best choice when a simplified model is used or the model parameters are estimated using inexact or limited data [22]. For such problems that are inaccurately represented (due to incomplete/uncertain information about constraints or objectives), using approximate methods can be better than time-consuming exact methods. In addition, NI algorithms could be used when a reliable exact method is unavailable or the available exact method is computationally undesirable (excessive time and or space). In those cases, NI algorithms can sometimes produce a reasonably good solution with minimal time and space requirements. NI algorithms can also be used to enhance the performance of an existing exact method by providing good starting solutions or guiding the search. Furthermore, NI algorithms have been recommended to be used to save time when the same problem is going to be solved frequently, because they are simple to implement and easy to understand. Moreover, they help to gain insight into complex problems, while using minimal resources [23].

3 Single-Solution-Based NI Algorithms

Single-solution-based (SS-based) NI class of algorithms is also known as trajectory methods. Any member of this class starts with a single agent, candidate solution, and improves its fitness value in each generation (iteration). The candidate solution moves through neighborhoods or search trajectories within a predefined search space [24]. The trajectories are produced by an iterative mechanism and move from one position to another in the predefined search space. This class performs two fundamental operations into two phases: generation and replacement. In the following subsection, we discuss the SA algorithm, a popular and well-known single-solution-based NI algorithm.

3.1 Simulated Annealing Algorithm

Annealing is a physical process where a certain alloy of metal, glass, or crystal is heated above its melting point. Then, it is cooled until it is eventually frozen into a perfect crystalline structure. The annealing process can produce high-quality materials. The SA algorithm is a single-solution-based algorithm, and it is also known as Boltzmann annealing. It updates its candidate solution by random ascent that moves to avoid being trapped in a local minimum. It is considered a Monte Carlo algorithm for finding a global minimum for continuous functions. According to Boltzmann distribution, the probability of a physical system (\(P_{\alpha }\)) being at state \(\alpha\) with energy \(E_{\alpha }\) at temperature T is given by:

$$\begin{aligned} P_{\alpha } = \frac{1}{Z}e^{(\frac{-E_{\alpha }}{K_{B}}T)}, \end{aligned}$$
(1)

where \(K_{B}\) is the Boltzmann constant, T is the absolute temperature, and Z is a partial function described by:

$$\begin{aligned} Z = \sum _{\beta } e^{(\frac{-E_{\beta }}{K_{B}}T)}, \end{aligned}$$
(2)

where the summation is over all states \(\beta\) with energy \(E_{\beta }\) at temperature T. The Boltzmann distribution exhibits uniform preference for all states at high T regardless of the energy. However, when T decreases to zero, only states with minimum energy have nonzero probability of occurrence.

Several studies have been conducted to enhance the performance of the SA search algorithm in terms of accuracy and convergence rate. In [25], Harold et al. introduced a variant of SA that replaces the Boltzmann probability density with the Cauchy probability density. In [26], the generalized SA was proposed to generalize both Cauchy annealing [25] and Boltzmann annealing [5]. Microcanonic annealing (MA) is a variant of SA, that is built on the Creutz algorithm rather than the Metropolis algorithm [27]. In [28], Dueck et al. introduced a variant of SA known as threshold accepting (TA) methods. In [28], the results showed that TA outperforms SA in terms of search time.

3.2 Advantages, Limitations, and Applications of SS-based Algorithms

In general, SS-based methods are effective, because they provide sufficient knowledge about their behavior. In case of large instances, SS-based algorithms might perform better than population-based (P-based) algorithms [29]. SS-based algorithms achieved a competitive performance against P-based algorithms on different Traveling Thief Problem (TTP) instances [29]. SA is more suitable for the job shop scheduling problem than other algorithms, such as a tailored heuristic algorithm [30]. SA is the most dominant SS-based NI algorithm that has been used extensively in recent decades. The main advantage of SA is that it is a simple search algorithm to implement, and it is appropriate for solving black-box optimization problems [31, 32]. In addition, SA has a fast convergence compared to exact methods and other heuristic algorithms, and it is suitable to be used with problems that have a large number of local minima [33]. In [34], SA achieved better results than Bayesian algorithms (they are variations on a method developed by H. J. Kushner) in terms of computation time and number of function evaluations in optimizing the results of a computer simulation. In [30], the results showed that SA produced better results than tailored heuristic algorithm in solving the job shop scheduling problem. SA has been used in optimal design problems where many researchers consider SA as a tool in the development process of optimal experimental design [35,36,37,38].

There are some limitations of using SS-based algorithms. They suffer from many function evaluations to reach the global optimum, which increase in search time [32]. The initial temperature of the cooling schedule should be set to an appropriate high value, which is another issue. A low initial temperature may cause the algorithm to get stuck in a local minimum, and a high initial temperature may cause difficulty in reaching the global optimal solution. In addition, the number of iterations at each temperature step should be large enough to exploit each region in the search space [32]. Furthermore, the cooling schedule should be carefully selected, because it may affect the quality of solutions. Users who have limited knowledge about SA may find the selection process of these parameters to be difficult. SA is not suitable for problems that have a limited number of local minima [33]. In SA, the neighborhood operators cannot effectively deal with clustered data when solving the vehicle routing problem. Thus, it is recommended to combine the improved SA with a data clustering method, such as k-means clustering [39]. In [40], it was shown that SA does not guarantee to solve a large instance of graph coloring problem. The authors suggested to split the time interval into several runs and return the minimum value over all runs.

SA was proposed to solve combinatorial optimization applications, and it has been successful in this regard. For instance, in [41], Emden et al. applied the SA algorithm to solve the airline crew pairing problem. In [42], Rosmalina et al. implemented SA along with a heuristic method to solve the railway crew scheduling problem. In [177], Wong et al. successfully implemented SA to solve the layout-routing of electronic circuits problem. In [170], Supatcha et al. implemented the SA algorithm along with a hill-climbing local search method to solve large-scale aircraft trajectory planning. In [39], an improved SA variant was applied to solve the vehicle routing problem with time windows. In [40], Alper et al. applied the SA algorithm to solve the graph coloring problem. In [31, 43], Delahaye et al. implemented the SA algorithm to solve two NP-hard combinatorial optimization problems: the traveling salesman problem and the knapsack problem. SA has been implemented as well to solve continuous optimization problems. In [44], the enhanced simulated annealing (ESA) variant was proposed to solve multimodal functions. In [45], David et al. have adapted SA for solving the quadratic assignment problem. In [46], Alfonsas introduced an adapted version of SA called M-SA-QAP that has an advanced formula for calculating the initial and final temperature, and he proposed an original cooling schedule with oscillation that allows for both decreasing and increasing the temperature.

4 Population-Based NI Algorithms

Population-based NI algorithms use a group of solutions rather than one single solution. This class of algorithms has two main subcategories: EC and SI. In EC algorithms, individuals in the population are updated through recombination and mutation operators. These algorithms are built on Darwin’s evolutionary theory. In SI algorithms, individuals in the population communicate in an intelligent way to explore different regions in the search space rather than using individual cognitive abilities alone. These algorithms are built on collective behavior of an organized group of, e.g., insects, animals, or plants. P-based NI algorithms include many well-known algorithms, such as GAs, ESs, EP, PSO, and BF.

4.1 Evolutionary Computation Algorithms

EC algorithms are also known as evolutionary algorithms (EAs). EC algorithms are built on Darwinian principle of nature’s capability to evolve and adapt to their environment. This class of algorithms includes genetic algorithms (GAs), evolutionary strategies (ESs), evolutionary programming (EP), genetic programming (GP), and so forth. All members of this class have the same idea of simulating the evolution of the candidate solutions using some operators: selection, recombination, mutation, and reproduction. These algorithms have been successfully implemented on different optimization problems, such as data mining and knowledge discovery [47].

4.1.1 Genetic Algorithms

GAs take the basic idea of genetics in order to artificially construct an optimization search algorithm that is robust and can tackle complex black-box problems [48]. In 1975, John Holland introduced the basic concept of GAs [4]. The basic mechanism of GAs involves nothing more than random number generations, string copies, and partial string exchanges [49]. GAs have three basic operators: initialization, selection, and reproduction. In GAs, the initial population of chromosomes (called candidate solutions) is generated randomly in the problem search space and then encoded (as binary or real value). The solution process includes many generations (iterations); and each generation consists of many function evaluations (objective function evaluations) of candidate solutions. The size of the initial population is a controversial question in the literature. One group of researchers suggested using a population that is sufficiently large to enhance the search diversity [49]. However, another group of researchers introduced the term micro-genetic (microGA) that refers to a small initial population with reinitialization [50]. In [51], Krishnakumar conducted the first comparison between the microGA and the basic GA. He concluded that microGA is faster and provides better results when applied to two stationary functions and a real-world engineering problem (a wind-shear controller task) [51].

4.1.1.1 Genetic Algorithms Variants

GAs are adaptable algorithms, so that they have many variants in different applications. A real-coded GA (the encoding is over the real number) can be used for global continuous optimization problems. The crossover operator is the main search operator in the GA. Researchers proposed many crossover operators for the real-coded GAs. In [52], Syswerda introduced the concept of the Uniform Crossover (UX) operator. In [53], Ono et al. introduced a real-coded GA using Unimodal Normal Distribution Crossover (UNDX). In [54], Ono et al. introduced a crossover approach that combines UNDX and UX. The results showed that the proposed crossover approach can solve more different functions than GA using only the UNDX. In [55], Deb et al. introduced a crossover operator called Simulated Binary Crossover (SBX). In [56], Sánchez et al. introduced a hybrid crossover operator that generates multiple descendants from two parents, and the best two offspings will replace the parents in the next generation. In [57], Eshelman et al. introduced the blend crossover (BLX-\(\alpha\)), but it faces some difficulties when it is used to solve non-separable functions. In [58], Takahashi et al. introduced a crossover operator that combines the BLX-\(\alpha\) and the Independent Component Analysis (ICA). Mutation operator has been used to improve the performance of GAs for functions optimization. In [59], Munteanu et al. introduced a mutation operator for real-coded GA called principal component analysis mutation (PCA-mutation). In [60], Korejo et al. proposed the Directed Mutation (DM) operator that allows a GA to explore promising areas in the search space. In [61], the authors introduced a popular non-domination-based genetic algorithm for multi-objective optimization.

4.1.2 Evolution Strategies

ESs are a class of EA introduced in 1964 by Rechenberg and Schewefel [62]. The first ES algorithm, (1+1)-ES (two membered ES), is a simple mutation-selection schema, and it has a population of two individuals. The two membered ES algorithm produces a single offspring using normal, Gaussian, distribution mutation. For the next generation, the best individual is elected by a selection operator. Rechenberg developed later the concept of the multimembered ES, (\(\mu\)+1)-ES, where \(\mu\) is denoted to the number of parents, and they collaborate to generate \(\lambda\) offsprings. In the multimembered ES, the best individuals among offsprings will be the parents of the next generation, while the current parents will be removed. In [63], Schwefel introduced two further variants of multimembered ES: (\(\mu\)+\(\lambda\))-ES and (\(\mu\), \(\lambda\))-ES. In the (\(\mu\)+\(\lambda\))-ES, parents (\(\mu\)) generate offsprings (\(\lambda\)) using recombination and mutation operators. The selection operator then selects the best individuals, equal to the number of parents, among the parents and offsprings (\(\mu\)+\(\lambda\)) and discards the rest. In (\(\mu\), \(\lambda\))-ES, the number of generated offsprings is greater than the number of parents. Then, the best individuals, equal to the number of parents, is selected from the offsprings (\(\lambda\)), and the parents of the offsprings are discarded no matter how good or bad their fitness value. In most recent variants of ES, the population of size \(\mu\) is used, and an additional operator called recombination (\(\rho\)) is implemented. There are two other variants were built on the concept of that recombination operator: (\(\mu\)/\(\rho\)+\(\lambda\))-ES and (\(\mu\)/\(\rho\), \(\lambda\))-ES

4.1.3 Evolutionary Programming (EP)

EP is a stochastic optimization search approach that belongs to the EAs family. In [3], in 1960s, L.J Fogel et al. introduced the basic idea of EP to artificial intelligence. In [64], in 1990s, David B Fogel introduced again the EP concept to solve many problems, such as numerical and combinatorial optimization and machine learning [65,66,67]. Mutation is the main operator in EP, while crossover operator is the main operator in GAs. The EP algorithm has two major operators: mutation and selection, but it does not have any kind of recombination operators. Mutation generates offsprings, while selection selects the best individuals among parents and offsprings for the next generation. In EP, the initial population is randomly generated based on a density function, and each individual is evaluated using an objective function. Then, mutation is implemented for generating new offsprings. The mutation operator is implemented by perturbing each parent in the population . This is done by adding a random number of specific distribution, e.g., normal distribution. In [173], David B Fogel et al. introduced the meta-evolutionary programming (meta-EP) idea. Meta-EP has the capability to discover the appropriate degree of perturbation for a given problem that makes the meta-EP has a self-adaptation of variances for the mutation operator. The self-adaptive EP is widely used for continuous parameter optimization problems. The first non-Gaussian mutation was introduced in the mid-1990s by Yao. In [68], Yao et al. introduced an EP variant called fast EP (FEP), which uses Cauchy instead of Gaussian mutation operator as the primary search operator. Cauchy probability distribution has a much longer tail; therefore, the offsprings could be totally different from their parents. In [69], Yao et al. later introduced an improved FEP (IFEP) variant. In [70], Lee et al. introduced a generalized version of FEP by using Lévy probability distribution, which is a general case of Cauchy probability distribution.

4.1.4 Advantages, Limitations, and Applications of EC Algorithms

EC algorithms receive much attentions for their advantages and capability in solving hard real-world optimization problems in different fields of science. They have been applied to problems that could not be solved using heuristic algorithms. In [71], David B Fogel summarized the main advantages of EC algorithms. The main advantage is that they are conceptually simple, and they can be applied to any optimization problem. EC algorithms can be applied using any representation; therefore, the same procedure can be used for discrete and continuous optimization problems. In addition, EC algorithms outperform classical methods on real-world problems, and they have significant advantage over classical methods in solving mutlimodal functions. Furthermore, EC algorithms provide a methodological framework that is usable as it is or can be combined with other optimization method. They can be used as appropriate methods when problems have: dynamic situations, i.e., the goal or constraint changes over time [72], parameter adjustments, rough or discontinuous landscape, and disturbed fitness measurements [73]. EC algorithms are a highly parallel process, and they have the capability to optimize their parameters as a part of the search for optimal solutions. Perhaps the greatest advantage is that EC algorithms have the ability to address problem where human experts do not exist. The EC community has been criticized for considering uniform standard instances, which are much easier than real-world applications. However, in [74], Dimopoulos et al. introduced a significant work to present the contribution of EC algorithms towards realistic problems taken from manufacturing plants. In theory, the effectiveness of an EC algorithm depends on the relationship between crossover and mutation as applied to a chosen representation [71]. They can be combined with other classical or heuristic algorithms [32]. The GA achieves much better solution than exact approaches in solving pipe optimization problem in terms of speed and quality of solutions [75]. In [76], Chu et al. introduced a GA to solve the generalized assignment problem. In [77], Alba et al. introduced a good survey to prove that parallel GAs enhances the computation over regular GAs, and helps in producing better solution. In [78], the authors implemented adaptive genetic algorithm along with fuzzy logic to propose a classifier to diagnosing heart disease. In [79], the authors implemented the genetic algorithm to solve supercapacitor charging problem.

However, EC algorithms have some limitations as well. EC algorithms should not be used to solve a given problem where there is a traditional method that can solve it, because EC algorithms cannot be better with less computational effort [73]. In [80], Back concluded that EC algorithms are not appropriate methods to solve strongly convex problems. In [81], Moslemipour et al. summarized some drawbacks about GAs. They may find a suboptimal solution, and they are not guaranteed to reach the global solutions. The mutation and crossover rates affect the stability of the algorithm, and they have difficult encoding schema. In [82], Leung et al. concluded that GAs suffer from premature convergence. The authors suggested that increasing the population size plays an important role to help overcome this problem. The effectiveness of GAs depends on the population size and crossover and mutation rates; therefore, these parameters should be selected carefully [33]. For instance, increasing the population size or the number of generation will lead to an increase in search time. In addition, the formulation of the fitness function is not an easy process. In [83], a binary-coded GA presents unsatisfactory results in solving multimodal functions with respect to real-coded differential evolution (DE). DE may offer easier convergence by increasing the number of parents and reducing the scaling factor; however, DE, may then suffer from high computational time. The basic EP algorithm has a slow conversion in solving some multimodal optimization problems [69]. However, in [84], EP outperforms GAs in solving constraint optimization problems. ESs and EP share common features, but EP does not have a recombination operator. ES algorithms were developed to solve continuous optimization problems. They have the flexibility to develop a new robust method for a given problem [73]. In [85], Swayamsiddha et al. concluded that the differential evolution algorithm provided the best results for solving a nonlinear system identification compared to GA and PSO. The differential evolution algorithm outperforms GA in solving a suit of benchmarks in terms of number of function evaluation [86]. In [87], the Bayesian optimization algorithm achieved better performance than a complex multi-population GA in tackling Nurse Scheduling Problem (NSP).

EC algorithms have been extensively used in many applications. In routing problems, the traveling salesman problem [88] is one of the most well-known combinatorial optimization problems that has been solved using EC algorithms [89]. In scheduling problems, the job shop scheduling problem is an NP-complete problem that has been solved using EC algorithms [90]. In packing problems, a design of layout for integrated circuits is a well-known example [91]. EC algorithms have been implemented in different design applications, such as finite impulse response (FIR) [92, 93], infinite impulse response (IIR) [94, 95], signal processing [96, 172], integrated circuit design [96, 176, 180], artificial neural networks [175, 178, 179], telecommunication [168, 171], and engineering applications [96, 97]. GAs have been implemented in different applications in mechanical engineering: material science and manufacturing [98]. EC algorithms have been used in system identification and simulation applications [99]. System identification is used for model structure selection and parameter estimation, while system simulation process is to determine how the system will behave. In [100], Abd Samad introduced a good survey about the usage of EC algorithms in the field of system identification in the last 40 years. In addition, EC algorithms have been implemented for critical applications: on-line and off-line control system engineers [101]. In [102], Fleming et al. introduced a survey of using EC algorithms in control system, and they concluded the importance of EC algorithms in control system applications. Furthermore, EC algorithms have been used extensively in data mining [103], image processing [104, 105], and classification [106].

4.2 Swarm Intelligence Algorithms

SI algorithms are inspired by the collective behavior of species, such as fish, birds, ants, wasps, bees, termites, and bacteria [107]. The SI theory is built on the social behavior of those species that compete to obtain the best source of food. The SI Population consists of particles (agents) that cooperate by an indirect communication medium to improve their fitness in the search space. PSO is the most dominant SI algorithm, and there are thousands of papers about its variants and applications in different fields. We give few popular examples of SI algorithms: PSO, BF, and ICA.

4.2.1 The Particle Swarm Optimization Algorithm

In 1995, Eberhart and Kennedy introduced the first idea of particle swarm optimization [6, 108, 109]. PSO mimics the movement of a flock of birds. Each bird in the flock is associated to a particle (candidate solution). The position of each particle in the search space is updated based on the previous best position of the particle itself (local position) and the best position of the entire flock (global position). The PSO algorithm updates the position of each particle using the following equation [108]:

$$\begin{aligned} x_{id}^{k+1} = x_{id}^{k} + v_{id}^{k+1}, \end{aligned}$$
(3)

where \(x_{id}\) is the position of a particle i, the superscript k denotes the iteration rank, and \(v_{id}\) is the velocity of the particle i. The velocity of the particle i is updated using the following equation:

$$\begin{aligned} v_{id}^{k+1} = v_{id}^{k} + c_{1} \times r_{1} [P_{id}^{k} - x_{id}^{k}] + c_{2} \times r_{2} [P_{gd}^{k} - x_{id}^{k}], \end{aligned}$$
(4)

where the \(v_{id}^{k}\) is the previous velocity of the particle i that provides the necessary momentum for moving around the search space. The constants \(c_{1}\) and \(c_{2}\) are also known as the acceleration coefficients, and \(r_{1}\) and \(r_{2}\) are uniform distribution random numbers in range [0,1]. \(P_{id}^{k}\) is the local best position for the particle i at iteration k, and \(P_{gd}^{k}\) is the global best position at iteration k.

PSO has too many variants that have been introduced and marked the history of PSO over the last three decades. In [109, 110], Shi and Eberhart proposed a variant of PSO called inertia weight PSO that has an extra parameter called inertia weight (\(\omega\)). In [110], Shi and Eberhart proposed a variant called PSO time varying inertia weight (PSO-TVIW) in which the inertia weight decreases along with time. In [111], Zheng and Ma proposed another PSO variant that increases the inertia weight value during the course of the run. In [109], Shi et al. reported that large inertia weight values enhance the global search, while small inertia weight values enhance the local search. In [112], Clerc and Kennedy introduced the canonical PSO variant, which has a constriction factor that helps in controlling the convergence properties of the particles. In [113], Mendes and Kennedy proposed another variant of PSO called Fully Informed Particle Swarm (FIPS). In [114], Ratnaweera and Halgamuge proposed another variant of the PSO algorithm called PSO time varying acceleration coefficient (PSO-TVAC) based on the PSO-TVIW variant. In [174], Higashi et al. introduced the “mutation” concept to the PSO algorithm and proposed a variant called mutation PSO (MPSO). In [114], the authors introduced a variant called mutation PSO with a time varying acceleration coefficient (MPSO-TVAC). In [114], Ratnaweera and Halgamuge introduced self-organizing hierarchical PSO (HPSO). The CSO is proposed to tackle large-scale problems. In [9], although CSO is built on the PSO idea, the neither the local best position nor the global best position is involved in updating the particles’ positions. In addition, CSO updates only half of the population in each iteration. The CSO is introduced to tackle large-scale optimization problems to solve problems of high dimension up to 5000. In [115], a modified CSO (MCSO) is proposed as a varaint of CSO where two-thirds of of the population are updated by a tri-competitive criterion. The results show the superiority of MCSO over the basic CSO version. In [116], a variant of CSO is introduced called Inherited Competitive Swarm Optimizer (ICSO). The variant is built on both the human learning principles and the CSO, and the results show better performance than the basic CSO over CEC2008 benchmark problems.

4.2.2 Bacterial Foraging Algorithm

E. coli have a control system that gives them the ability to search for food and avoid noxious regions [7]. In [7], Passino introduced a SI algorithm called bacterial foraging. The algorithm is built on the foraging behavior of E.coli. The BF algorithm consists of four phases: swim or tumble, chemotactic, reproduction, and elimination and dispersal. The swimming phase is represented by one or more steps in the same direction as its previous step. The swimming decision is made when a bacterium achieves better fitness value, while the tumble (change direction) decision is made when the bacterium receives a worse fitness value. Thus, a bacterium keeps swimming until it reaches the maximum number of swimming steps \(N_s\) or it reaches a anxious region, and it is called the chemotactic phase. After completing chemotactic steps \(N_c\) steps, the fitness value of each bacterium during its lifetime is accumulated. Then, the bacteria are sorted in an ascending order based on the accumulated fitness value. The new population is divided into two equal parts, least healthy bacteria and most healthy bacteria. The least healthy bacteria die, removed from the search space, and most healthy bacteria is split into two bacteria that are placed at the same location, which is called reproduction phase. Thus, the number of individual in the population after each reproduction phase is constant. After completing all reproduction \(N_{re}\) steps, the elimination and dispersal event occurs. In the elimination and dispersal event, each bacterium in the new population is subjected to be eliminated and dispersed with probability \(P_{ed}\).

4.2.3 Imperialist Competitive Algorithm

The ICA is built on the socio-politically imperialism concept where an agent in the population is represented by a country. The agent in the population could be colonies or imperialists where the stronger empires try to colonize the weakest colonies from weaker empires and make them part of their colonies [8]. The ICA was introduced and evaluated on different benchmark functions, and the results show its ability to tackle different optimization problems. The basic version of ICA was proposed to solve continuous optimization problems such as tuning neural network weights for UCAV global path planning [117]. Later version of ICA was implemented to tackle discrete optimization problem such as Traveling salesman problem (TSP), flowlines scheduling problems (FSP), facility line design problem FLP.

4.2.4 Advantages, Limitations, and Applications of SI Algorithms

SI have many advantages over traditional optimization techniques and the main ones are: scalability, adaptability, collective robustness, and individual simplicity [118, 119]. SI algorithms are highly scalable, and their control mechanism does not depend on the population size. They are self organized techniques whose individuals interact in direct or indirect ways through the search space. Thus, they have the ability to adapt the behavior of individuals in the population to any dynamic changes on the run time. The performance of SI algorithms in tackling different optimization problems proves their robustness where there is no single point of failure, but their individuals cooperate and repeat the same behavior. In [120], Elbeltagi et al. compared five algorithms: genetic algorithms, memetic algorithms, particle swarm, ant colony systems, and shuffled frog leaping. The results showed that PSO is the best among all algorithms in terms of computational efficiency. The main advantage of PSO over GA is that it is more simpler, robust, and faster in convergence. In [121], Hassan et al. conducted a number of experiments and concluded that PSO is more computationally efficient than GA. PSO has the ability to control its convergence using its inertia weight better than GA using the rates of crossover and mutation [122]. PSO using small population size performs better than GA using large population size. In [169], Veeramachaneni et al. concluded that PSO is better than GA in solving continuous optimization problem. In [123], the results showed that PSO is better than GA in solving profiled corrugated horn antenna design. In [124], Afandie et al. conducted an experiment to compare between BF and EP in solving optimal load shedding. The results showed that BF outperforms EP in terms of quality of solutions and speed. In [125], Alsariera et al. implemented BF and the bat algorithm on several continuous benchmark functions. The results showed that BF provides more accurate solutions compared to the bat algorithm (BA), but BA exhibits faster convergence rate. In [126], Kamalanand et al. summarized that BF achieved higher efficiency than PSO in computing the optimal dosage of antiretroviral drugs for therapy planning in HIV/AIDS patients. The ICA has a key feature, which is its fast convergence. It has been implemented to tackle different optimization problems [127].

SI algorithms are growing fast, and they offer an alternative way for tackling complex problems, but they have some limitations and challenges: parameter tuning, stagnation, and time critical applications. Unlike deterministic methods, stochastic algorithms, including nature-inspired techniques, embed inherent randomness that makes them sensitive to parameters tuning. Parameter tuning for SI and other randomized algorithms is one of the most important but difficult tasks. Regardless of how sensitive a given randomized algorithm is to its random parameters, tuning should be conducted following an appropriate methodology in order to have a fair comparative assessment of performance. Actually, parameter tuning can be seen as an optimization problem requiring an adequate solving method. There are two main categories of parameters’ tuning: off-line and online. Off-line parameter tuning can be done manually following a trial-and-error manner, or automatically [128]. A group of researchers suggested that SI algorithm parameters are predefined in a trial and error manner based on the problem characteristics, which is considered as an old fashion. Automatic parameter tuning can be used to enhance the performance of an algorithm without requiring a prior knowledge. In addition, it prevents us from missing relevant parameters values, which can be the case if we follow a trial-and-error manner. In this regard, in [129, 130], racing techniques use statistical test to exclude parameters’ values that achieve lower performance.

Online tuning takes place during the run time period, and can be adaptive/self-adaptive or deterministic [131,132,133,134]. This means that a trial start of parameters’ values is initiated, and these latter are improved during the run-time of the algorithm. In the online-deterministic technique, the parameters’ values are updated based on deterministic rules (such as increasing or decreasing weights in a PSO technique) that are updated every set of iterations. The online-deterministic method is difficult as determining the number of iterations at which the parameters will update is not clear. The self-adaptive method is a process where the parameters’ values are updated based on the fitness of solution. A good survey paper on self-adaptive methods can be found in [135].

SI algorithms may suffer from stagnation and convergence to local minima, because they do not have central coordination. In [136], Clerc introduced stagnation analysis for different PSO variants. For instances, in [137], the author concluded that a hybrid algorithm that combined Hopfield neural network and MNC local search outperforms PSO in tackling CSP problem. The ICA also suffers from stagnation problem when it is implemented for high dimension and complex multimodal functions. In [138], Abdi et al. introduced a variant of ICA called GICA to overcome the stagnation problem and the results show a a significant improvement over the basic ICA. In addition, the ICA effectiveness, limitations, and applicability in many domains are investigated [139].

In BF, elimination and dispersal helps in reducing the stagnation behavior [140]. In [141], stagnation occurs in ACO when all ants to follow the same path to reach destination. Sharvani discussed a few way to alleviate stagnation in ACO. In [142], Soundappan introduced a way to avoid stagnation in ABC. In addition, time critical applications require critical decision, control of the system, and acceptable solutions within restricted time frames; such as in underwater sensor networks [143]. SI algorithms are not applicable for time-critical applications, because the solutions of SI algorithms are not predefined. However, they are suitable for non-time critical applications. In [144], Pal et al. compared four different SI algorithms: PSO, ACO, ABC, and FA. The author summarized the advantages and disadvantages of each SI algorithm, and summarized in general the advantages and limitations of SI algorithms.

SI algorithms have been successfully implemented in many applications. In [145], Fornarelli et al. introduced a comprehensive reference of using SI algorithms in the field of electrical and electronic engineering. In addition, Bai et al. introduced a survey of implementing the SI algorithms in the electric power system. In [18], Karaboga et al. introduced a comprehensive survey about ABC and its applications in electrical, electronic, and control engineering. SI algorithms have been implemented in different applications in mechanical engineering, such as modeling of mechanical properties of as-cast Mg-Li-Al alloys [146], structural damage assessment using FRF [147]. In [18, 19], the reader can find more applications in mechanical engineering. In addition, SI algorithms have been widely used to solve different optimization problems in the civil engineering [19, 148]. Furthermore, they have been used in medical engineering, such as diagnosing the medical diseases, classification of magnetic resonance, parameters adjustment of medical microdevices [19]. In [149], Omran completed his PhD in image processing using PSO. Furthermore, they have been used extensively in biomedical research. The key challenge in biomedical problems is located in the huge amount of their data; therefore, problems require approximate algorithms rather than exact algorithms. In [150], Poli et al surveyed more than 25 different biomedical problems that have been solved using PSO, such as gene selection and cancer classification [151], cancer survival prediction [152], protein structure prediction in the 3D HP model [153], identify transcription factor binding sites [154], drug design [155]. In addition, they have been implemented in communication theory, such as antenna selection in multiple-input-multiple-output (MIMO) system [156], optimizing coverage in indoor Ultra-wideband (UWB) communication system [157], scheduling multi-channel and multi-timeslot in time constrained wireless sensor networks [158], and non-linear channel equalization [159]. In [160], a hybrid ICA and GA is implemented to the multi-processor open shop scheduling problem. In [161], the CIA was implemented to design a linear induction motor. In [162], ICA was utilized to optimize the skeletal structures. In [163], the authors proposed a variant of ICA called chaotic ICA, and it was used as image matching approach. In [164], ICA is implemented in solving integrated product mix-outsourcing optimization problem. In [165], the authors implemented ICA for materials property evaluation from indentation test curve. In [166], ICA was implemnted to tune the IPD controller. In [167], ICA was implemented to tackle the scheduling of receiving and shipping trucks in cross-docking systems. In [139], ICA is implemented to tackle the assembly sequence planning problem.

5 Performance Analysis of NI Algorithms

The performance analysis of NI algorithms is a significant task that should be done fairly. In this section, we discuss several guidelines that should be taken into consideration when evaluating a NI algorithm and/or comparing NI algorithms rigorously. Three phases should be implemented when conducting a performance analysis of a NI algorithm rigorously: experimental design, measurement, and reporting [181, 182]. During the experimental design phase, the goals of the experiments are set up, and the input instances are defined. During the measurement phase, all measures to be computed are selected. The results that are obtained by applying different statistical analyses should be reproducible. The reporting phase is the final stage in which the results are presented in a comprehensive way.

5.1 Experimental Design

In the experimental design phase, all goals should be clearly defined as a first step, such as search time quality of solution. The second step is the appropriate selection of the input instances. There are two main types of input instances: real-life instances and constructed instances. Real-life instances are considered the most appropriate benchmarks for a performance evaluation of NI algorithms; however, it is not easy to obtain. On the other hand, constructed instances, also known as standard instances, are available to the public on the internet, and they include well-known instances for continuous optimization or discrete optimization. The main disadvantage of the standard instances is that they do not reflect the level of difficulty of the real-world problems. In continuous optimization, there are several well-known benchmarks, such as Schaffer, Ackley, Griewank, Rastrigin, Rosenbrock,...etc. This group of standard instances has different properties, such as uncorrelated, nonseparable, nonlinear, and nonsymmetric. These properties help to mimic the real-world problems. In addition, NI algorithms have parameters that should be tuned, because those parameters’ values have significant influence on the robustness of the algorithms and the obtained quality of solutions. In the performance evaluation process, input instances should be divided into two parts: parameters calibration and performance evaluation. The obtained values of those parameters should be the same for all instances during the experiment.

5.2 Measurements

In the measurement phase, the performance indicators are selected for evaluation, and statistical analysis is applied to obtain the desired results. Exact algorithms guarantee the global optimal solution. Search time is considered the main indicator to evaluate the efficiency of an exact algorithm; however, other indicators beside the search time should be taken into consideration when evaluating NI algorithms, such as quality of solutions, computational effort, and robustness [181]. The quality of solutions is evaluated in terms of precision. Computational effort represents the computation time; it is defined as CPU time including preprocessing and postprocessing time [182]. The number of function evaluations is used often as an indicator for computational effort. Robustness is the insensitivity against small changes in the input instances or NI parameters. It is one of the indicators to measure the performance according to input instances that have different properties.

Given the randomness of the NI techniques and also the problem instances (when randomly generated instances are used). Statistical analysis methods are applied after the results are obtained for different measures. Those methods are used to conduct an assessment of the evaluated NI algorithm. Average and standard deviations are aggregate numbers that should be taken into consideration when using any performance indicator, such as quality of solutions and its associated computational effort. Then, statistical tests are applied to analyze and compare NI algorithms. Statistical tests are used to estimate the confidence of results being scientifically valid, such as t-tests and ANOVA tests. The t-test is applied under normal conditions, and ANOVA test is used to compare more than two algorithms.

5.3 Reporting Experimental Evaluation Results

Presenting the results is an important factor that helps researchers gain insight of the experiments. Using tables to present large amounts of data is not sufficient; therefore, visualizing the results (through charts) is considered as a complementary step in understanding the results [183]. The relationship between performance indicators, such as quality of solutions, search time, robustness, and size of instances can be represented graphically. Graphical representation has different forms, such as deviation bars, scatter plots, and interaction plots. The compromise between different performance measures can also be represented using scatter plots, such as the relationship between quality of solutions versus time, or robustness versus time or quality.

6 Conclusion and Future Directions

This work reports on a survey on nature-inspired methods. These techniques are good alternatives, when exact methods fail to solve a given combinatorial problem in a reasonable amount of time. The survey includes some of the most popular NI methods and their significant variants as described in the literature. The advantages and limitations of NI algorithms have been discussed for each category. We support this work with additional valuable references that help the reader to get a better understanding of NI algorithms for real-world applications. The significant number and variety of the applications that have been successfully solved using NI tech demonstrate their efficiency. A well-designed experiment to evaluate a NI algorithm is explained at the end of the survey [184].

Each NI algorithm depends on tunable parameters, and the tuning process is an open area of research. Those parameters influence the complexity of the algorithm and make the analysis process more complex. There are several studies that have been conducted to help resolve this issue [131, 134, 185]. For instance, the parameters may be adaptively tuned during run time. Recent research has concluded that the tuning problem of NI algorithms is somewhat similar to the tuning problem faced in machine learning [186].

One of the main challenges is that the performance of most NI algorithms deteriorates when the dimension and size of the problem increases. Recent research trends involve the development of powerful new NI algorithms to tackle large-scale optimization problem. Thus, scalability for high-dimensional problem becomes a significant factor when proposing new NI algorithms. For instance, cooperative coevolving PSO (CCPSO) is a variant of PSO that was proposed to address the issue of scaling up when solving large-scale optimization problems (up to 2000 real-valued variables). In addition, there is another research trend to enhance the performance of NI algorithms by developing hybrid optimization methods. Many research works have been conducted in the last decade regarding the hybridization of NI algorithms with other algorithms: exact or approximate. In [187], Christian introduced a survey about hybridization of NI algorithms with other algorithms.

In addition, we listed guidelines to be followed by anyone planning to propose a new nature-inspired technique (in addition to sharing the related code, as stated earlier). These guidelines are meant to address the address the “metaphor-based nethodologies” that some researchers are proposing and claiming to be new techniques, while they are actually embedding old ideas taken for known metaheuristics.

In recent years, we are experiencing more and more proposed nature-inspired methods. Unfortunately, some of these new techniques are more of “metaphor-based nethodologies” disguising and embedding old ideas taken for known metaheuristics [188]. In order to address this issue, several guidelines have been discussed in order to refrain from proposing similar methods [182, 188, 189]. We can summarize these guidelines as follows. Any new proposed nature-inspired method should have innovative basic ideas [188]. The components of a proposed algorithm should be well described and evaluated. The relations between these components and those in existing techniques such as GAs and PSOs should be well defined. The proposed method should demonstrate an actual contribution to the field. A rigorous methodology, including experiments demonstrating the merits of the new method, through fair comparative performance, should be conducted. The results, validated through statistical models, should clearly show the superiority of the new technique for some relevant instances.

Scientists concluded that we are in need of a unique publicly available software framework for NI algorithms to reduce the development effort and help compare NI methods [10]. In [189], Burke et al proposed an object oriented framework that is used in evaluating approximate search algorithms. We suggest a framework/platform that includes the code and problem instances for all published algorithms. This platform can be overseen by a dedicated NI committee who will be responsible for collecting and reviewing code and problem instances. We also encourage any researcher who plans to publish a new algorithm or a variant, to submit the related code and instances to the NI committee.

Finally, proposing efficient topologies is an open area of research. Topologies describe how agents in the same population communicate with one another. Many theoretical studies on such topologies have been conducted to improve the performance of different algorithms. Understanding topologies is an important step toward understanding the behavior of different search components in NI algorithms.