Keywords

1 Introduction

Optimization is ubiquitous in many applications in engineering and industry. In essence, optimization is a process of searching for the optimal solutions to a particular problem of interest, and this search process can be carried out using multiple agents which essentially form a system of evolving agents. This system can evolve by iterations according to a set of rules or mathematical equations. Consequently, such systems will show some emergent characteristics, leading to self-organizing states which correspond to some optima in the search space. Once the self-organized states are reached, we say the system has converged. Therefore, the design of an efficient optimization algorithm is equivalent to mimicking the evolution of a self-organizing system.

In almost all applications in engineering and industry, we are always trying to optimize something—whether to minimize the cost and energy consumption, or to maximize the profit, output, performance and efficiency. In reality, resources, time and money are always limited; consequently, optimization is far more important in practice [2, 13, 30, 32, 36, 39, 41].

It is worth pointing out that computational effort is a major issue in many optimization problems in engineering and industry, because the most time-consuming part of the optimization process is the evaluation of objective functions [17, 39]. The use of the most efficient algorithms is just one way of tackling the problem; an alternative is to use surrogate-based models, which can often be more efficient if the number of evaluations of high-fidelity models is significantly reduced [18–20]. Such surrogate-based optimization using a combination of low-fidelity and high-fidelity models can be a powerful tool for many real-world applications. This book contains many examples of surrogate-based modelling and optimization. In this chapter, our focus is mainly on the introduction of some widely used new algorithms and a well-chosen set of design benchmarks.
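As a concrete illustration, the sketch below (with a hypothetical one-dimensional 'expensive' model standing in for a high-fidelity simulation) alternates between building a cheap quadratic surrogate through the three best samples and spending exactly one high-fidelity evaluation at the surrogate optimum:

```python
import math

def expensive_model(x):
    """Stand-in for a costly high-fidelity simulation (hypothetical)."""
    return (x - 1.7) ** 2 + 0.1 * math.sin(5.0 * x)

# Three initial high-fidelity samples spanning the region of interest
pts = [(x, expensive_model(x)) for x in (0.0, 2.0, 4.0)]

for _ in range(12):
    # Cheap surrogate: the parabola through the three samples kept so far
    pts.sort(key=lambda p: p[1])
    (x1, y1), (x0, y0), (x2, y2) = pts
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if abs(den) < 1e-12:
        break                            # surrogate degenerate: stop refining
    x_new = x1 - 0.5 * num / den         # vertex of the surrogate parabola
    # Spend one high-fidelity evaluation at the surrogate optimum,
    # replacing the worst of the three samples
    pts[2] = (x_new, expensive_model(x_new))

best_x, best_y = min(pts, key=lambda p: p[1])
print(best_x, best_y)
```

Each iteration costs a single expensive evaluation; the surrogate itself is essentially free. Practical surrogate-based optimization replaces the parabola with, for example, kriging or radial basis functions, but the division of labour between low-fidelity and high-fidelity models is the same.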

2 Optimization Algorithms and Self-organization

2.1 Self-organizing Systems

Self-organization exists in many systems, from physical and chemical to biological and artificial systems. Emergent phenomena such as Rayleigh–Bénard convection, Turing pattern formation [26], organisms and thunderstorms [15] can all be called self-organizing. Though there is no universal theory for self-organizing processes, some aspects of self-organization can be partly understood using theories based on nonlinear dynamical systems, far-from-equilibrium systems [23], multiple interacting agents, and closed systems under unchanging laws [3]. As pointed out by the cyberneticist and mathematician Ross Ashby, every isolated determinate dynamic system obeying unchanging laws will ultimately develop some sort of ‘organisms’ that are adapted to their ‘environments’ [3].

For simple systems, going to equilibrium is trivial. For a complex system, however, if its size is so large that its equilibrium states are just a small fraction of the vast number of possible states, and if the system is allowed to evolve long enough, some self-organized structures may emerge. Changes in the environment can apply pressure on the system to re-organize and adapt to such changes. If the system has sufficient perturbations or noise, often working at the edge of chaos, some spontaneous formation of structures will emerge as the system moves far from equilibrium and selects some states, thus reducing the uncertainty or entropy.

The state set S of a complex system such as a machine may change from initial states S(ψ) to other states S(ϕ), subject to the change of a parameter set α(t) which can be time-dependent. That is,

$$ S(\psi) \stackrel{\alpha(t)}{\longrightarrow} S(\phi), $$
(1)

where α(t) must come from external conditions such as the heat flow in Rayleigh–Bénard convection, not from the states S themselves. Obviously, S+α(t) can be considered as a larger, closed system [3]. In this sense, self-organization is equivalent to a mapping from some high-entropy states to low-entropy states.

An optimization algorithm can be viewed as a complex, dynamical system. If we can consider the convergence process as a self-organizing process, then there are strong similarities and links between self-organizing systems and optimization algorithms.

2.2 Algorithms for Self-organization

Mathematically speaking, an algorithm is a procedure to generate outputs for given inputs. From the optimization point of view, an optimization algorithm generates a new solution x t+1 to a given problem from a known solution x t at iteration or time t. That is,

$$ \boldsymbol{x} ^{t+1}=\boldsymbol{A} \bigl(\boldsymbol{x} ^t, \boldsymbol{p} (t)\bigr), $$
(2)

where A is a nonlinear mapping from a given solution, or d-dimensional vector, x t to a new solution vector x t+1. The algorithm A has k algorithm-dependent parameters p(t)=(p 1,…,p k ) which can be time-dependent and can thus be tuned if necessary.

To find the optimal solution x ∗ to a given optimization problem S with an often infinite number of states is to select some desired states ϕ from all states ψ, according to some predefined criterion D. We have

$$ S(\psi) \stackrel{\boldsymbol{A} (t)}{\longrightarrow} S\bigl(\phi(x_*)\bigr), $$
(3)

where the final converged state ϕ corresponds to an optimal solution x ∗ of the problem of interest. The selection of the system states in the design space is carried out by running the optimization algorithm A. The behaviour of the algorithm is controlled by p, the initial solution x t=0 and the stopping criterion D. We can view the combined S+A(t) as a complex system with a self-organizing capability.

The change of states or solutions of the problem of interest is controlled by the algorithm A. In many classical algorithms such as hill-climbing, gradient information is often used to select states, say, the minimum value of the landscape, and the stopping criterion can be a given tolerance or accuracy, or zero gradient etc. Alternatively, an algorithm can act like a tool to tune a complex system. If an algorithm does not use any state information of the problem, then it is more likely to be versatile to deal with many types of problems. However, such black-box approaches can also imply that the algorithm may not be as efficient as it could be for a given type of problem. For example, if the optimization problem is convex, algorithms that use such convexity information will be more efficient than those that do not use such information. In order to select states/solutions efficiently, the information from the search process should be used to enhance the search process. In many cases, such information is often fed into the selection mechanism of an algorithm. By far the most widely used selection mechanism is to identify and keep the best solution found so far. That is, some form of ‘survival of the fittest’ is used.
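This view can be made concrete in a few lines. In the sketch below, the map A of Eq. (2) is simply a random perturbation with one parameter p (the step size), the objective is a hypothetical one-dimensional parabola, and selection keeps the best state found so far:

```python
import random

random.seed(1)

def objective(x):
    """Hypothetical test landscape: a 1-D parabola with minimum at x = 3."""
    return (x - 3.0) ** 2

def A(x, step):
    """One application of the algorithm map A(x, p): a random perturbation."""
    return x + random.uniform(-step, step)

x_best = 0.0
p = 0.5                      # algorithm-dependent parameter p(t): step size
for t in range(2000):
    x = A(x_best, p)         # generate a new candidate state from the best
    if objective(x) < objective(x_best):
        x_best = x           # 'survival of the fittest': keep the best state
print(x_best)                # settles close to the optimum x = 3
```

Even this crude selection mechanism drives the system towards its self-organized state; the metaheuristics below differ mainly in how much cleverer the map A and its parameters p(t) are.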

Optimization algorithms can be very diverse. There are dozens of widely used algorithms. The main characteristics of different algorithms will only depend on the actual, often highly nonlinear or implicit, forms of A(t) and their parameters p(t).

In many situations concerning optimization, the generation and verification of the new solutions can often involve computationally expensive computer simulations or even measurements of the physical system. In such cases, the expensive model of the system under consideration is often replaced by its cheaper representation, called a surrogate model, and the algorithm A uses that model to produce a new solution. The parameters p(t) may then include variables that are used to align the surrogate with the expensive model to make it a reliable representation of the latter.

3 Three New Algorithms

In this chapter, we illustrate the concept of a self-organizing optimization algorithm using a specific class of algorithms called metaheuristics. Metaheuristics use stochastic components that enable an algorithm to escape from being trapped in a local optimum. This often makes the search process more ergodic, so such algorithms can generate high-quality solutions over the search space during iterations, and may ultimately converge towards the true optimum of the problem of interest.

There are well over two dozen metaheuristic algorithms now in use for optimization [16, 30, 34]. All metaheuristic algorithms have to balance exploration and exploitation during the search process by using some sort of algorithm-dependent parameter setting. From the viewpoint of a self-organizing system, parameter settings will affect the way and routes by which the optimization process converges to an organized state. Here we analyse three relatively new nature-inspired algorithms and see the ways in which they can quickly converge towards optimality.

3.1 Firefly Algorithm

The first algorithm to be discussed is the firefly algorithm, which is essentially a dynamical system. The firefly algorithm (FA), first developed by Xin-She Yang in 2008 [29, 30], was based on the flashing patterns and behaviour of fireflies. In essence, FA uses the following three idealized rules:

  • Fireflies are unisex, so one firefly can be attracted to any other one.

  • The attractiveness is proportional to the brightness, and both decrease as the distance between two fireflies increases. Thus, for any two flashing fireflies, the less bright one will move towards the brighter one. If no firefly is brighter than a particular firefly, it will move randomly.

  • The brightness of a firefly is determined by the landscape of the objective function.

As a firefly’s attractiveness is proportional to the light intensity seen by adjacent fireflies, we can now define the variation of attractiveness β with the distance r by

$$ \beta= \beta_0 e^{-\gamma r^2}, $$
(4)

where β 0 is the attractiveness at r=0.

The movement of a firefly i that is attracted to another more attractive (brighter) firefly j is determined by

$$ \boldsymbol{x} _i^{t+1} =\boldsymbol{x} _i^t + \beta_0 e^{-\gamma r^2_{ij} } \bigl(\boldsymbol{x} _j^t- \boldsymbol{x} _i^t\bigr) + \alpha \boldsymbol{\epsilon} _i^t, $$
(5)

where the second term is due to the attraction. The third term is randomization with α being the randomization parameter, and \(\boldsymbol{\epsilon} _{i}^{t}\) is a vector of random numbers drawn from a Gaussian distribution or uniform distribution at time t. If β 0=0, it becomes a simple random walk. Furthermore, the randomization \(\boldsymbol{\epsilon} _{i}^{t}\) can easily be extended to other distributions such as Lévy flights. A Lévy flight essentially provides a random walk whose random step length is drawn from a Lévy distribution

$$ \textrm{L\'{e}vy} \sim u = t^{-\lambda} \quad(1 < \lambda\le3), $$
(6)

which has an infinite variance with an infinite mean. Here the steps essentially form a random walk process with a power-law step-length distribution with a heavy tail. Some of the new solutions should be generated by a Lévy walk around the best solution obtained so far; this will speed up the local search. A demo version of a firefly algorithm implementation, without Lévy flights, can be found at the Mathworks file exchange web site. FA has attracted much attention recently [1, 11, 25].
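A minimal sketch of the core FA iteration, Eqs. (4)–(5), is given below (without Lévy flights). The sphere test function and all parameter values are illustrative assumptions, not the reference implementation; for minimization, a lower objective value means a brighter firefly:

```python
import math
import random

random.seed(0)

def f(x):                      # hypothetical test objective: sphere function
    return sum(xi ** 2 for xi in x)

n, d = 15, 2                   # number of fireflies, problem dimension
beta0, gamma, alpha = 1.0, 1.0, 0.2
X = [[random.uniform(-5, 5) for _ in range(d)] for _ in range(n)]

for t in range(100):
    alpha *= 0.97              # gradually reduce randomness (common practice)
    for i in range(n):
        for j in range(n):
            if f(X[j]) < f(X[i]):      # firefly j is brighter, so i moves to j
                r2 = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
                beta = beta0 * math.exp(-gamma * r2)            # Eq. (4)
                X[i] = [a + beta * (b - a) + alpha * random.gauss(0, 1)
                        for a, b in zip(X[i], X[j])]            # Eq. (5)

best = min(f(x) for x in X)
print(best)                    # the swarm settles near the global minimum
```

The decaying α mimics the common practice of reducing randomness as the swarm self-organizes around the modes of the landscape.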

A discrete version of FA can efficiently solve NP-hard scheduling problems [25], while a detailed analysis has demonstrated the efficiency of FA over a wide range of test problems, including multiobjective load dispatch problems [1]. Highly nonlinear and non-convex global optimization problems can be solved efficiently using FA [11, 42].

From the self-organization point of view, FA acts as a simple dynamic system with diverse characteristics that can automatically subdivide the entire population into subgroups, and each subgroup can swarm around a local mode. Among all the local modes, there is always a global optimum; thus FA can find the global optimum and the local optima simultaneously if the number of fireflies is sufficiently larger than the number of modes.

3.2 Cuckoo Search

Cuckoo search (CS) is one of the latest nature-inspired metaheuristic algorithms, developed in 2009 by Xin-She Yang and Suash Deb [34]. CS is based on the brood parasitism of some cuckoo species. In addition, this algorithm is enhanced by Lévy flights, rather than by simple isotropic random walks. Recent studies show that CS is potentially far more efficient than particle swarm optimization (PSO) and genetic algorithms [35].

Cuckoos are fascinating birds, not only because of the beautiful sounds they can make, but also because of their aggressive reproduction strategy. Some species such as the Ani and Guira cuckoos lay their eggs in communal nests, though they may remove others’ eggs to increase the hatching probability of their own eggs. Quite a number of species engage in obligate brood parasitism by laying their eggs in the nests of other host birds (often other species).

For simplicity in describing CS, we now use the following three idealized rules:

  • Each cuckoo lays one egg at a time, and dumps it in a randomly chosen nest.

  • The best nests with high-quality eggs will be carried over to the next generations.

  • The number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability p a ∈[0,1]. In this case, the host bird can either get rid of the egg, or simply abandon the nest and build a completely new nest.

As a further approximation, this last assumption can be implemented by replacing a fraction p a of the n host nests with new nests (with new random solutions). For a maximization problem, the quality or fitness of a solution can simply be proportional to the value of the objective function. Other forms of fitness can be defined in a similar way to the fitness function in genetic algorithms.

This algorithm uses a balanced combination of a local random walk and a global explorative random walk, controlled by a switching parameter p a . The local random walk can be written as

$$ \boldsymbol{x} _i^{t+1}=\boldsymbol{x} _i^t +\alpha s \otimes H(p_a-\epsilon) \otimes\bigl(\boldsymbol{x} _j^t-\boldsymbol{x} _k^t\bigr), $$
(7)

where \(\boldsymbol{x} _{j}^{t}\) and \(\boldsymbol{x} _{k}^{t}\) are two different solutions selected randomly by random permutation, H(u) is a Heaviside function, ϵ is a random number drawn from a uniform distribution, and s is the step size. On the other hand, the global random walk is carried out by using Lévy flights

$$ \boldsymbol{x} _i^{t+1}=\boldsymbol{x} _i^t+ \alpha L(s,\lambda ), $$
(8)

where

$$ L(s,\lambda )=\frac{\lambda \varGamma (\lambda ) \sin (\pi\lambda /2)}{\pi} \frac{1}{s^{1+\lambda }} \quad(s \gg s_0>0). $$
(9)

Here α>0 is the step size scaling factor, which should be related to the scales of the problem of interest. In most cases, we can use α=O(L/10), where L is the characteristic scale of the problem of interest, while in some cases α=O(L/100) can be more effective and avoid flying too far.

The above equation is essentially the stochastic equation for a random walk. In general, a random walk is a Markov chain whose next status/location only depends on the current location (the first term in the above equation) and the transition probability (the second term). However, a substantial fraction of the new solutions should be generated by far-field randomization, and their locations should be far enough from the current best solution to make sure that the system will not be trapped in a local optimum [34].

Though the pseudo-code given in many papers is sequential, vectorized operations should be used in any implementation, as vectors are far more efficient than explicit loops. A Matlab implementation is given by the author, and it can be downloaded.
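The two branches, Eqs. (7)–(9), can be sketched compactly as follows. The Lévy steps are drawn using Mantegna's algorithm with λ=1.5, and the sphere objective and parameter values are illustrative assumptions rather than the author's Matlab code:

```python
import math
import random

random.seed(0)

def f(x):                      # hypothetical test objective: sphere function
    return sum(xi ** 2 for xi in x)

def levy_step(lam):
    """Mantegna's algorithm for a Levy-stable step with exponent lam."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    return random.gauss(0, sigma) / abs(random.gauss(0, 1)) ** (1 / lam)

n, d, pa, alpha, lam = 15, 2, 0.25, 0.01, 1.5
nests = [[random.uniform(-5, 5) for _ in range(d)] for _ in range(n)]
fit = [f(x) for x in nests]

for t in range(200):
    best = nests[fit.index(min(fit))]
    for i in range(n):
        # Global explorative walk, Eq. (8): Levy flights, biased via (x - best)
        cand = [xj + alpha * levy_step(lam) * (xj - bj)
                for xj, bj in zip(nests[i], best)]
        if f(cand) < fit[i]:
            nests[i], fit[i] = cand, f(cand)
    # Local random walk, Eq. (7): H(pa - eps) switches each move on or off
    perm1 = random.sample(range(n), n)
    perm2 = random.sample(range(n), n)
    for i in range(n):
        s = random.random()                              # step size factor
        cand = [xj + s * (1 if random.random() < pa else 0) * (xa - xb)
                for xj, xa, xb in zip(nests[i], nests[perm1[i]], nests[perm2[i]])]
        if f(cand) < fit[i]:
            nests[i], fit[i] = cand, f(cand)

print(min(fit))                # approaches the global minimum f = 0
```

Note that the greedy replacement of nests plays the role of the 'best nests carried over' rule, while the abandonment of a fraction p a of nests appears as the switched local moves.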

The literature on CS is expanding rapidly. The algorithm has attracted much attention, and many recent studies have applied it to a diverse range of problems [7, 11, 27, 37]. Walton et al. improved the algorithm by formulating a modified CS algorithm [27], while Yang and Deb extended it to multiobjective optimization problems [37].

Looking at CS in terms of self-organization, we can see that this swarm-intelligence-based algorithm uses multiple interacting Markov chains, switching between a global-search branch (Lévy flights) and a local-search branch (random walks) so that a balance between global exploration and local exploitation can be achieved during the optimization process.

3.3 Bat Algorithm

A third way of looking at an algorithm, apart from dynamic systems and Markov chains, is by using a varying parameter setting. This idea is used in the bat algorithm; the parameter tuning is essentially achieved by frequency tuning and mimicking the hunting strategy of microbats.

The bat algorithm (BA) is a relatively new metaheuristic, developed by Xin-She Yang in 2010 [40], which was inspired by the echolocation behaviour of microbats. Microbats use a type of sonar, called echolocation, to detect prey, avoid obstacles and locate their roosting crevices in the dark. These bats emit a very loud sound pulse and listen for the echo that bounces back from surrounding objects. Their pulses have varying properties and can be correlated with their hunting strategies, depending on the species. Most bats use short, frequency-modulated signals to sweep through about an octave, while others more often use constant-frequency signals for echolocation. The signal bandwidth varies depending on the species, and is often increased by using more harmonics.

The bat algorithm has three idealized rules:

  • All bats use echolocation to sense distance, and they also ‘know’ the difference between food/prey and background barriers in some magical way.

  • Bats fly randomly with velocity v i at position x i with a fixed frequency f min, varying wavelength λ and loudness A 0 to search for prey. They can automatically adjust the wavelength (or frequency) of their emitted pulses and adjust the rate of pulse emission r∈[0,1], depending on the proximity of their target.

  • Although the loudness can vary in many ways, we assume that it varies from a large (positive) A 0 to a minimum constant value A min.

Obviously, we have to define the rules of how their positions x i and velocities v i in a d-dimensional search space are updated. The new solutions \(\boldsymbol{x} _{i}^{t}\) and velocities \(\boldsymbol{v} _{i}^{t}\) at time step t are given by

$$\begin{aligned} &f_i =f_{\min} + (f_{\max}-f_{\min}) \beta, \end{aligned}$$
(10)
$$\begin{aligned} &\boldsymbol{v} _i^{t} = \boldsymbol{v} _i^{t-1} + \bigl(\boldsymbol{x} _i^{t-1} - \boldsymbol{x} _*\bigr) f_i , \end{aligned}$$
(11)
$$\begin{aligned} &\boldsymbol{x} _i^{t}=\boldsymbol{x} _i^{t-1} + \boldsymbol{v} _i^t, \end{aligned}$$
(12)

where β∈[0,1] is a random vector drawn from a uniform distribution. Here x ∗ is the current global best location (solution), which is located after comparing all the solutions among all the n bats at each iteration t. As the product λ i f i is the velocity increment, we can use f i (or λ i ) to adjust the velocity change while fixing the other factor λ i (or f i ), depending on the type of problem of interest. In our implementation, we will use f min=0 and f max=O(1), depending on the domain size of the problem of interest. Initially, each bat is randomly assigned a frequency drawn uniformly from [f min,f max].

For the local search part, once a solution is selected among the current best solutions, a new solution for each bat is generated locally using random walk

$$ \boldsymbol{x}_{\mathrm{new}}=\boldsymbol{x}_{\mathrm{old}} + \boldsymbol{\epsilon} A^{t}, $$
(13)

where ϵ is a random number vector drawn from [−1,1], while \(A^{t}=\langle A_{i}^{t}\rangle \) is the average loudness of all the bats at this time step.

Furthermore, the loudness A i and the rate r i of pulse emission have to be updated accordingly as the iterations proceed: the loudness usually decreases once a bat has found its prey, while the rate of pulse emission increases. The initial loudness can be chosen as any value of convenience. For simplicity, we can use A 0=1 and A min=0, where A min=0 means that a bat has just found the prey and will temporarily stop emitting any sound. Now we have

$$\begin{aligned} &A_i^{t+1}=\alpha A_{i}^{t}, \end{aligned}$$
(14)
$$\begin{aligned} &r_i^{t+1}= r_i^0 \bigl[1-\exp(- \gamma t)\bigr], \end{aligned}$$
(15)

where α and γ are constants. In fact, α is similar to the cooling factor of a cooling schedule in simulated annealing. For any 0<α<1 and γ>0, we have

$$ A_i^t \rightarrow0, \qquad r_i^t \rightarrow r_i^0,\quad \textrm{as}\ t \rightarrow \infty. $$
(16)

In the simplest case, we can use α=γ, and we have used α=γ=0.95 to 0.97 in our simulations.
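A compact sketch of the update rules, Eqs. (10)–(15), is given below. The sphere test function, the parameter values and the 0.1 scaling of the local random walk in Eq. (13) (chosen to match the search domain) are illustrative assumptions rather than a reference implementation:

```python
import math
import random

random.seed(0)

def f(x):                          # hypothetical test objective: sphere function
    return sum(xi ** 2 for xi in x)

n, d = 20, 2
f_low, f_high = 0.0, 1.0           # frequency range [f_min, f_max]
A = [1.0] * n                      # loudness A_i, initially A_0 = 1
r = [0.0] * n                      # pulse emission rate r_i
r0, alpha, gamma = 0.5, 0.97, 0.97
x = [[random.uniform(-5, 5) for _ in range(d)] for _ in range(n)]
v = [[0.0] * d for _ in range(n)]
fit = [f(xi) for xi in x]
best = list(x[fit.index(min(fit))])

for t in range(1, 301):
    A_mean = sum(A) / n
    for i in range(n):
        fi = f_low + (f_high - f_low) * random.random()       # Eq. (10)
        v[i] = [vj + (xj - bj) * fi
                for vj, xj, bj in zip(v[i], x[i], best)]      # Eq. (11)
        cand = [xj + vj for xj, vj in zip(x[i], v[i])]        # Eq. (12)
        if random.random() > r[i]:
            # Local walk around the best, Eq. (13); the 0.1 factor scales the
            # step to this search domain (an assumption of this sketch)
            cand = [bj + 0.1 * random.uniform(-1, 1) * A_mean for bj in best]
        f_cand = f(cand)
        if f_cand < fit[i] and random.random() < A[i]:        # accept if better
            x[i], fit[i] = cand, f_cand
            A[i] *= alpha                                     # Eq. (14)
            r[i] = r0 * (1 - math.exp(-gamma * t))            # Eq. (15)
        if f_cand < f(best):
            best = list(cand)

print(f(best))                     # approaches the global minimum f = 0
```

Acceptance is gated by the loudness A i, while the pulse rate r i gradually turns the local search around the best solution off, which is exactly the exploration-to-exploitation transition described above.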

BA has been extended to the multiobjective bat algorithm (MOBA) by Yang [31], and preliminary results suggest that it is very efficient.

Again looking at BA from the self-organization point of view, the convergence is controlled by loudness and pulse emission rate so that it can explore the vast search space in the earlier stage and then focus on the local exploitation in the more promising regions. Compared with FA and CS, where fixed parameters are used in terms of balancing exploration and exploitation, BA uses a more dynamic approach to balance exploration and exploitation.

4 Engineering Optimization and Applications

Engineering optimization is very diverse, with vast collections of case studies, some of which require lengthy descriptions to provide sufficient details [6, 10–12, 24]. Here we provide nine case studies as a subset of design optimization benchmarks in engineering and industrial applications.

4.1 Bending Beam Design

Probably the simplest design problem with engineering relevance is the design of a cantilever beam with five different square cross sections with heights/widths of x 1,x 2,…,x 5, respectively. The thickness is fixed with t=2/3, and the objective is to minimize [5, 9]

$$ f(\boldsymbol{x} )=0.0624(x_1+x_2+x_3 + x_4 +x_5), $$
(17)

subject to

$$ g(\boldsymbol{x} )=\frac{61}{x_1^3} + \frac{37}{x_2^3} +\frac{19}{x_3^3} + \frac{7}{x_4^3}+\frac{1}{x_5^3}-1 \le0. $$
(18)

It is straightforward to use all three algorithms discussed earlier to find the best solution

$$ \boldsymbol{x} =(6.0089,\ 5.3049,\ 4.5023,\ 3.5077,\ 2.1504), $$
(19)

which gives

$$ f_{\min}=1.33999. $$
(20)
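As a quick sanity check, the reported optimum can be substituted back into the objective and constraint (using the standard five-variable form of Eqs. (17)–(18)):

```python
x = [6.0089, 5.3049, 4.5023, 3.5077, 2.1504]

f = 0.0624 * sum(x)                                   # Eq. (17)
g = (61 / x[0]**3 + 37 / x[1]**3 + 19 / x[2]**3
     + 7 / x[3]**3 + 1 / x[4]**3) - 1                 # Eq. (18)

print(round(f, 5), g)   # f matches 1.33999, with g marginally feasible
```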

4.2 Spring Design

Tensional and/or compressional springs are used widely in engineering. A standard spring design problem has three design variables: the wire diameter w, the mean coil diameter d and the length (or number of coils) L.

The objective is to minimize the weight of the spring, subject to various constraints such as maximum shear stress, minimum deflection and geometrical limits. For a detailed description, please refer to earlier studies [2, 4]. This problem can be written compactly as

$$ \textrm{Minimize}\quad f(\boldsymbol{x} )=(L+2) w^2 d, $$
(21)

subject to

$$ \begin{array}{l} g_1(\boldsymbol{x} ) =1-\displaystyle\frac{d^3 L}{71785 w^4} \le0, \\ g_2(\boldsymbol{x} ) =1-\displaystyle\frac{140.45 w}{d^2 L} \le0, \\ g_3(\boldsymbol{x} ) =\displaystyle\frac{2(w + d)}{3}-1 \le0, \\ g_4(\boldsymbol{x} )= \displaystyle\frac{d (4 d-w)}{12566\, w^3 (d - w)} + \displaystyle\frac{1}{5108 w^2} -1 \le0, \end{array} $$
(22)

with the following limits:

$$ 0.05 \le w \le2.0, \qquad0.25 \le d \le1.3, \qquad2.0 \le L \le 15.0. $$
(23)

Using any of the algorithms discussed earlier, we can easily obtain the same or slightly better solutions than the best solution obtained by [4]:

$$ f_*=0.012665 \quad\textrm{at}\ (0.051690, 0.356750, 11.287126), $$
(24)

but both CS and FA use significantly fewer evaluations.
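The reported best solution can be verified against the objective and the first three constraints (the remaining shear-stress constraint g 4 is omitted here for brevity):

```python
w, d, L = 0.051690, 0.356750, 11.287126

f  = (L + 2) * w**2 * d                    # Eq. (21): spring weight
g1 = 1 - d**3 * L / (71785 * w**4)         # deflection constraint
g2 = 1 - 140.45 * w / (d**2 * L)           # surge frequency constraint
g3 = 2 * (w + d) / 3 - 1                   # geometry constraint

print(round(f, 6), g1, g2, g3)   # f matches 0.012665; g1 is nearly active
```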

4.3 Three-Bar Truss Design

The three-bar truss design is a simple but practical example, first presented by Nowacki [22], which requires one to find the optimal cross-sectional areas A 1 and A 2. The problem can be formulated as

$$ \textrm{Minimize}\quad f(A_1,A_2)=(2 \sqrt{2} A_1 + A_2) L, $$
(25)

subject to

$$\begin{aligned} &g_1=\frac{(\sqrt{2} A_1 + A_2)P}{\sqrt{2} A_1^2 + 2 A_1 A_2} -\sigma\le0, \end{aligned}$$
(26)
$$\begin{aligned} &g_2=\frac{A_2 P}{\sqrt{2} A_1^2 + 2 A_1 A_2} -\sigma \le0, \end{aligned}$$
(27)
$$\begin{aligned} &g_3=\frac{P}{A_1 + \sqrt{2} A_2} -\sigma\le0, \end{aligned}$$
(28)

where σ=2,000 N/cm2 is the stress constraint, P=2,000 N/cm2 is the load, and L=100 cm. The simple limits are

$$ 0 \le A_1, A_2 \le1. $$
(29)

Using CS and BA, it is easy to find the optimal solution

$$ \boldsymbol{x} _*=(A_1, A_2) =(0.78867, 0.40902), $$
(30)

and

$$ f_{\min}=263.97156. $$
(31)
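This solution can be checked directly. We take the bar length L=100 cm, the value used for this benchmark in the literature (an assumption here), together with the standard truss objective f=(2√2 A 1+A 2)L:

```python
import math

A1, A2 = 0.78867, 0.40902
L = 100.0                      # bar length in cm (assumed benchmark value)
P = sigma = 2000.0             # load and stress limit, as given above

f   = (2 * math.sqrt(2) * A1 + A2) * L                 # Eq. (25)
den = math.sqrt(2) * A1**2 + 2 * A1 * A2
g1  = (math.sqrt(2) * A1 + A2) * P / den - sigma       # Eq. (26)
g2  = A2 * P / den - sigma                             # Eq. (27)
g3  = P / (A1 + math.sqrt(2) * A2) - sigma             # Eq. (28)

print(round(f, 5), g1, g2, g3)   # f matches 263.97156; g1 is nearly active
```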

4.4 Discrete Beam Design

Reinforced concrete beam designs are relevant in many applications in engineering and construction. One class of beam designs is the discrete beam design, where the dimensions and some design variables can only take discrete values [11, 21]. For example, a very simple design benchmark of a reinforced concrete beam can be written as

$$ \textrm{Minimize}\quad f(A_s, b, h)=0.6 b h + 29.4 A_s, $$
(32)

subject to

$$\begin{aligned} &g_1=\frac{h}{b}-4 \le0, \end{aligned}$$
(33)
$$\begin{aligned} &g_2=\frac{7.375 A_s^2}{b} +180 -A_s h \le0. \end{aligned}$$
(34)

However, the area A s can only take one of the discrete values {6.0, 6.16, 6.32, 6.6, 7.0, 7.11, 7.2, 7.8, 7.9, 8.0, 8.4} in2, b only takes integer values from {28,29,30,…,39,40}, while h is continuous with 5≤h≤10 [21].

By using FA and CS, we have found the best solution

$$ f_{\min}=359.2080, $$
(35)

with

$$ (A_s, b, h)=(6.32, 34, 8.5), $$
(36)

which is better than any solution previously reported in the literature [11].
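The reported minimum can be reproduced from this design, using the coefficient 29.4 on the steel area A s from the standard form of this benchmark [21]:

```python
As, b, h = 6.32, 34, 8.5     # reported best design (in^2, in, in)

f = 29.4 * As + 0.6 * b * h  # objective with the standard coefficient 29.4
print(f)                     # reproduces the reported minimum 359.208
```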

4.5 Heat Exchanger Design

The heat exchanger design is a problem with six constraints [38], which can be expressed in the simplest case as the following minimization problem with eight design variables:

$$ \textrm{Minimize} \quad f(\boldsymbol{x} )=x_1+x_2+x_3, $$
(37)

subject to

$$\begin{aligned} &g_1(\boldsymbol{x} )=0.0025 (x_4+ x_6)-1 \le0, \end{aligned}$$
(38)
$$\begin{aligned} &g_2(\boldsymbol{x} )=0.0025 (x_5+x_7-x_4)-1 \le 0, \end{aligned}$$
(39)
$$\begin{aligned} &g_3(\boldsymbol{x} )=0.01(x_8-x_5)-1 \le0, \end{aligned}$$
(40)
$$\begin{aligned} &g_4(\boldsymbol{x} )=833.33252 x_4 +100 x_1 -x_1 x_6-83333.333 \le0, \end{aligned}$$
(41)
$$\begin{aligned} &g_5(\boldsymbol{x} )=1250x_5+x_2 x_4-x_2 x_7 -1250 x_4 \le0, \end{aligned}$$
(42)
$$\begin{aligned} &g_6(\boldsymbol{x} )=x_3 x_5 -2500 x_5 -x_3 x_8 +1250000 \le0. \end{aligned}$$
(43)

For example, using CS with n=20 cuckoos, we can easily find the optimal solution for these eight design variables as

$$\begin{aligned} x^*=&(579.3068, 1359.9708, 5109.9705, 182.0177, \\&\hphantom{(} 295.6012, 217.9823, 286.4165,395.6012). \end{aligned}$$
(44)
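Substituting this solution back confirms its quality: the objective is f=x 1+x 2+x 3≈7,049.25, and several constraints are tight (g 2, g 5 and g 6 are omitted below for brevity):

```python
x1, x2, x3, x4, x5, x6, x7, x8 = (579.3068, 1359.9708, 5109.9705, 182.0177,
                                  295.6012, 217.9823, 286.4165, 395.6012)

f  = x1 + x2 + x3                                      # Eq. (37)
g1 = 0.0025 * (x4 + x6) - 1                            # Eq. (38)
g3 = 0.01 * (x8 - x5) - 1                              # Eq. (40)
g4 = 833.33252 * x4 + 100 * x1 - x1 * x6 - 83333.333   # Eq. (41)

print(round(f, 4), g1, g3, g4)   # f ~ 7049.25; these constraints are active
```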

4.6 Welded Beam Design

The welded beam design is another standard test problem for constrained design optimization [4, 38]. The problem has four design variables: the width w and length L of the welded area, and the depth d and thickness h of the main beam. The objective is to minimize the overall fabrication cost, under the appropriate constraints of shear stress τ, bending stress σ, buckling load P and maximum end deflection δ.

The problem can be written as

$$ \textrm{minimize}\quad f(\boldsymbol{x} )=1.10471 w^2 L + 0.04811 d h (14.0+L), $$
(45)

subject to

$$ \begin{array}{l} g_1(\boldsymbol{x} )=w -h \le0, \\ g_2(\boldsymbol{x} ) =\delta(\boldsymbol{x} ) - 0.25 \le 0, \\ g_3(\boldsymbol{x} )=\tau(\boldsymbol{x} )-13{,}600 \le0, \\ g_4(\boldsymbol{x} )=\sigma(\boldsymbol{x} )-30{,}000 \le 0, \\ g_5(\boldsymbol{x} )=0.10471 w^2 +0.04811 h d (14+L) -5.0 \le0, \\ g_6(\boldsymbol{x} )=0.125 - w \le0, \\ g_7(\boldsymbol{x} )=6000 - P(\boldsymbol{x} ) \le0, \end{array} $$
(46)

where

$$ \begin{array}{l@{\qquad}l} \sigma(\boldsymbol{x} )=\displaystyle\frac{504{,}000}{h d^2}, & Q=6000 \biggl(14+\displaystyle\frac{L}{2}\biggr), \\ D=\displaystyle\frac{1}{2} \sqrt{L^2 + (w+d)^2}, & J=\sqrt{2} w L \biggl[ \displaystyle\frac {L^2}{6} + \frac{(w+d)^2}{2}\biggr], \\ \delta=\displaystyle\frac{65{,}856}{30{,}000 h d^3}, & \beta=\displaystyle\frac{QD}{J}, \\ \alpha=\displaystyle\frac{6000}{\sqrt{2} w L}, & \tau(\boldsymbol{x} )=\sqrt{\alpha^2 + \displaystyle\frac {\alpha\beta L}{D}+\beta^2}, \\ P=0.61423 \times10^6\displaystyle\frac{d h^3}{6} \biggl(1-\displaystyle\frac{d \sqrt {30/48}}{28}\,\biggr). & \end{array} $$
(47)

The simple limits or bounds are 0.1≤L, d≤10 and 0.1≤w, h≤2.0. For example, using both CS and FA, we have obtained the following optimal solution:

$$\begin{aligned} \boldsymbol{x} _*=&(w ,L, d, h) \\=& (0.20572963978, 3.47048866563, 9.03662391036, 0.20572963979),\qquad \end{aligned}$$
(48)

with

$$ f_{\min} = 1.72485230859. $$
(49)

This solution is exactly the same as the one reported in the literature [4]:

$$ f_*=1.724852 \quad\textrm{at}\ (0.205730, 3.470489, 9.036624, 0.205729). $$
(50)

We have also solved this problem using BA, and we got exactly the same solution.
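Substituting the solution back into Eqs. (45)–(47) confirms the objective value and shows that the shear stress, bending stress and buckling constraints are all active:

```python
import math

w, L, d, h = 0.20572963978, 3.47048866563, 9.03662391036, 0.20572963979

f     = 1.10471 * w**2 * L + 0.04811 * d * h * (14.0 + L)     # Eq. (45)
sigma = 504000 / (h * d**2)                                   # bending stress
delta = 65856 / (30000 * h * d**3)                            # end deflection
Q     = 6000 * (14 + L / 2)
D     = 0.5 * math.sqrt(L**2 + (w + d)**2)
J     = math.sqrt(2) * w * L * (L**2 / 6 + (w + d)**2 / 2)
alpha = 6000 / (math.sqrt(2) * w * L)
beta  = Q * D / J
tau   = math.sqrt(alpha**2 + alpha * beta * L / D + beta**2)  # shear stress
P     = 0.61423e6 * d * h**3 / 6 * (1 - d * math.sqrt(30 / 48) / 28)

print(round(f, 6), round(tau, 1), round(sigma, 1), round(P, 1))
# tau ~ 13,600, sigma ~ 30,000 and P ~ 6,000: g3, g4 and g7 are all active
```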

4.7 Pressure Vessel Design

Pressure vessels are literally everywhere; some examples are champagne bottles and gas tanks. For a given volume and working pressure, the basic aim of designing a cylindrical vessel is to minimize the total cost. Typically, the design variables are the thickness d 1 of the head, the thickness d 2 of the body, the inner radius r and the length L of the cylindrical section [4, 38]. This is a well-known test problem for optimization and it can be written as

$$ \textrm{minimize}\quad f(\boldsymbol{x} ) = 0.6224 d_1 r L + 1.7781 d_2 r^2 + 3.1661 d_1^2 L + 19.84 d_1^2 r, $$
(51)

subject to the following constraints:

$$ \begin{array}{l} g_1(\boldsymbol{x} ) = -d_1 + 0.0193 r \le0, \\ g_2(\boldsymbol{x} ) = -d_2 + 0.00954 r \le0, \\ g_3(\boldsymbol{x} ) = - \pi r^2 L -\displaystyle\frac{4 \pi}{3} r^3 + 1296000 \le0, \\ g_4(\boldsymbol{x} ) =L -240 \le0. \end{array} $$
(52)

The simple bounds are

$$ 0.0625 \le d_1, d_2 \le99 \times0.0625, $$
(53)

and

$$ 10.0 \le r, \ L \le200.0. $$
(54)

We have used all three algorithms (FA, CS, BA) to solve this problem, and they all found the same solution f min≈6,059.714 at

$$ \boldsymbol{x} _* \approx(0.8125, \; 0.4375, \; 42.0984, \; 176.6366), $$
(55)

which is the same as the one obtained by Cagnina et al. [4]. This means that the lowest cost is about $6,059.71.
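The solution can be verified against Eq. (51) and the constraints in Eq. (52); with the four rounded digits given above, the volume constraint g 3 is active only to within rounding:

```python
import math

d1, d2, r, L = 0.8125, 0.4375, 42.0984, 176.6366

f  = (0.6224 * d1 * r * L + 1.7781 * d2 * r**2
      + 3.1661 * d1**2 * L + 19.84 * d1**2 * r)               # Eq. (51)
g1 = -d1 + 0.0193 * r                                         # head thickness
g2 = -d2 + 0.00954 * r                                        # body thickness
g3 = -math.pi * r**2 * L - 4 * math.pi / 3 * r**3 + 1296000   # volume
g4 = L - 240                                                  # length limit

print(round(f, 3), g1, g2, g3, g4)
# f ~ 6059.71; g1 is active, and g3 is active to within the rounding of x
```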

4.8 Gearbox Design

Another important benchmark is the design of a speed reducer which is commonly used in many mechanisms such as a gearbox [14]. This problem involves the optimization of seven variables, including the face width, the number of teeth, the diameter of the shaft and others. All variables are continuous within some limits, except x 3 which only takes integer values. We have

$$\begin{aligned} f(\boldsymbol{x} ) =& 0.7854 x_1 x_2^2 \bigl(3.3333 x_3^2+14.9334 x_3-43.0934\bigr) \\&{}-1.508 x_1 \bigl(x_6^2+x_7^2 \bigr)+7.4777 \bigl(x_6^3+x_7^3 \bigr) +0.7854 \bigl(x_4 x_6^2+x_5 x_7^2\bigr),\qquad \end{aligned}$$
(56)

subject to

$$\begin{aligned} &g_1(\boldsymbol{x} ) = \frac{27}{x_1 x_2^2 x_3}-1 \le0, \end{aligned}$$
(57)
$$\begin{aligned} &g_2(\boldsymbol{x} ) = \frac{397.5}{x_1 x_2^2 x_3^2}-1 \le0, \end{aligned}$$
(58)
$$\begin{aligned} &g_3(\boldsymbol{x} )=\frac{1.93 x_4^3}{x_2 x_3 x_6^4} - 1 \le0, \end{aligned}$$
(59)
$$\begin{aligned} &g_4(\boldsymbol{x} )=\frac{1.93 x_5^3}{x_2 x_3 x_7^4} - 1 \le0, \end{aligned}$$
(60)
$$\begin{aligned} &g_5(\boldsymbol{x} ) =\frac{1.0}{110 x_6^3} \sqrt{ \biggl( \frac{745.0 x_4}{x_2 x_3} \biggr)^2+16.9 \times10^6} -1 \le0, \end{aligned}$$
(61)
$$\begin{aligned} &g_6(\boldsymbol{x} ) =\frac{1.0}{85 x_7^3} \sqrt{ \biggl( \frac{745.0 x_5}{x_2 x_3} \biggr)^2+157.5 \times10^6} -1 \le0, \end{aligned}$$
(62)
$$\begin{aligned} &g_7(\boldsymbol{x} ) = \frac{x_2 x_3}{40} -1 \le0, \end{aligned}$$
(63)
$$\begin{aligned} &g_8(\boldsymbol{x} ) = \frac{5 x_2}{x_1} -1 \le 0, \end{aligned}$$
(64)
$$\begin{aligned} &g_9(\boldsymbol{x} )=\frac{x_1}{12 x_2} -1 \le 0, \end{aligned}$$
(65)
$$\begin{aligned} &g_{10}(\boldsymbol{x} ) =\frac{1.5 x_6+1.9}{x_4}-1 \le0, \end{aligned}$$
(66)
$$\begin{aligned} &g_{11}(\boldsymbol{x} ) =\frac{1.1 x_7+1.9}{x_5}-1 \le0, \end{aligned}$$
(67)

where the simple bounds are $2.6 \le x_1 \le 3.6$, $0.7 \le x_2 \le 0.8$, $17 \le x_3 \le 28$, $7.3 \le x_4 \le 8.3$, $7.8 \le x_5 \le 8.4$, $2.9 \le x_6 \le 3.9$ and $5.0 \le x_7 \le 5.5$.

The best result in the literature [4] is

$$ \boldsymbol{x} _*=(3.5, \; 0.7, \; 17, \; 7.3, \; 7.8, \; 3.350214, \; 5.286683), $$
(68)

with $f_{\min}=2996.348165$.

By using FA and BA as well as CS, we have obtained a new best result:

$$ \boldsymbol{x} _*=(3.5, \; 0.7, \; 17, \; 7.3, \; 7.8, \; 3.34336449, \; 5.285351) $$
(69)

with the best objective $f_{\min}=2993.7495888$.
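Both reported objective values can be reproduced by evaluating Eq. (56) at the two solution vectors; a minimal sketch is given below. (For any candidate solution, the full set of constraints (57)–(67) should of course also be checked, which this fragment does not do.)

```python
def reducer_cost(x):
    """Speed reducer objective of Eq. (56); x = (x1, ..., x7)."""
    x1, x2, x3, x4, x5, x6, x7 = x
    return (0.7854 * x1 * x2**2 * (3.3333 * x3**2 + 14.9334 * x3 - 43.0934)
            - 1.508 * x1 * (x6**2 + x7**2)
            + 7.4777 * (x6**3 + x7**3)
            + 0.7854 * (x4 * x6**2 + x5 * x7**2))

x_lit = (3.5, 0.7, 17, 7.3, 7.8, 3.350214, 5.286683)    # Eq. (68)
x_new = (3.5, 0.7, 17, 7.3, 7.8, 3.34336449, 5.285351)  # Eq. (69)
print(reducer_cost(x_lit), reducer_cost(x_new))
```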

4.9 Simulation-Driven Shape Optimization

Heat management, and thus heat transfer modelling, is very important for many electronic applications, especially those using large-scale integrated circuits. In fact, nanoscale heat transfer is a challenging area, and topological optimization for designing nanoscale devices is even more challenging [8, 28, 43]. For example, Evgrafov et al. proposed a topology optimization benchmark for a nanoscale heat-conducting system with a size of 150 nm by 150 nm [8]. Heat transfer can occur at many different scales, though smaller scales may be more difficult to control. Here we extend this benchmark to a unit area of 1 mm by 1 mm, and the aim is to distribute two different materials so as to maximize the temperature difference $|T_A - T_B|$ between two points A and B under the boundary conditions given in Fig. 1, where the top and bottom boundaries are symmetric. Obviously, if there is only one type of material, then $T_A = T_B$ can be expected at the steady state, due to the symmetry of the system configuration. However, using two different materials turns this into a tough shape optimization problem.

Fig. 1
figure 1

The topology optimization benchmark to maximize $|T_A - T_B|$

The two materials used in the design of the unit area have heat diffusivities of $K_1$ and $K_2$, respectively, with $K_1 \gg K_2$. For example, for Si and Mg$_2$Si, $K_1/K_2 \approx 10$. The domain is continuous, and the objective is to distribute the two materials such that the difference $|T_A - T_B|$ is as large as possible.

By dividing the domain into a 40×40 grid and using CS to search for possible design solutions, an optimal shape and distribution of materials is obtained, as shown in Fig. 2, where Si is shown in light blue and Mg$_2$Si is shown in red.

Fig. 2
figure 2

Optimal topology and distribution of two different materials

For each configuration generated during the search process, the temperature distribution is computed with the finite difference method by solving the heat conduction equation with spatially varying material conductivities; the resulting temperature difference at the two fixed points is the objective to be maximized.

5 Challenges and Further Research Topics

Despite the huge success of optimization and its extensive applications, many challenging issues must be addressed in the near future. As an optimization process typically involves an optimization algorithm and a simulator, two main issues naturally arise: the efficiency of the algorithm, and the efficiency and accuracy of the numerical simulator. Obviously, we try to use the most efficient algorithms available, but the actual efficiency of an algorithm depends on many factors, such as its inner workings, the available information (such as objective functions and their derivatives) and implementation details. A related issue is how to match the right algorithm to a given problem, which is itself not easy. In fact, for some highly nonlinear problems, there may not be any efficient algorithm at all. One well-known case is the travelling salesman problem, which is hard in the non-deterministic polynomial-time (NP) sense, that is, NP-hard; no efficient algorithm is known for such problems.

The efficiency of a simulator or solver is even more complicated, depending on the actual numerical methods used and the complexity of the problem of interest. Straightforward optimization of a given objective function is not always practical. If the objective function comes from a computer simulation, it may be computationally expensive, noisy or non-differentiable. In such cases, a surrogate-based optimization algorithm can be a very useful alternative [18]. The surrogate model is typically constructed from sampled data of the original objective function; it should ideally be cheap, smooth and easy to optimize, yet accurate enough to produce a good prediction of the function’s optimum [17]. A challenging issue is how to construct surrogate models that have good fidelity and yet yield substantial computational savings.
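As a minimal illustration of the idea (and not of any particular surrogate method discussed in this book), the sketch below fits a multiquadric radial basis function interpolant to a handful of samples of an "expensive" function and then optimizes the cheap surrogate on a dense grid; the test function, the sample plan and the kernel width are arbitrary choices made for the example.

```python
import numpy as np

def expensive(x):
    """Stand-in for a costly simulation (arbitrary test function)."""
    return (x - 2.0)**2 + 0.5 * np.sin(3.0 * x)

# Sample the expensive function at only a few points.
xs = np.linspace(0.0, 4.0, 9)
ys = expensive(xs)

# Multiquadric RBF surrogate: s(x) = sum_i w_i * sqrt((x - x_i)^2 + c^2),
# with the weights chosen so the surrogate interpolates the samples exactly.
c = 0.5
Phi = np.sqrt((xs[:, None] - xs[None, :])**2 + c**2)
w = np.linalg.solve(Phi, ys)

def surrogate(x):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.sqrt((x[:, None] - xs[None, :])**2 + c**2) @ w

# Optimize the cheap surrogate instead of the expensive function:
# one dense-grid search costs 2001 surrogate calls but only 9 expensive calls.
grid = np.linspace(0.0, 4.0, 2001)
x_best = grid[np.argmin(surrogate(grid))]
print(x_best)
```

In a real surrogate-based loop, the surrogate's minimizer would next be evaluated with the expensive model, the new sample added, and the surrogate refitted, iterating until the budget is exhausted.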

On the other hand, it may be no exaggeration to say that metaheuristics have had great success in solving various tough optimization problems. However, there are many important questions which remain unanswered.

First, an important issue to be addressed in any metaheuristic algorithm is how to provide a good balance between local intensification and global diversification [29, 30]. At present, different algorithms use various techniques and mechanisms with various parameters to control this balance, which may be far from optimal. Is there an optimal way to achieve this balance? If yes, how? If not, what is the best we can achieve?

Second, it is still only partly understood how the different components of heuristics and metaheuristics interact in a coherent and balanced way to produce efficient algorithms that converge under the given conditions. For example, why does a balanced combination of randomization and a deterministic component lead to a much more efficient algorithm than a purely deterministic or purely random one? How can we measure or test whether such a balance has been reached? How can we prove that the use of memory can significantly increase the search efficiency of an algorithm, and under what conditions?

Finally, most applications in the current literature have been tested against toy problems or small-scale benchmarks with a few design variables or at most problems with several dozen variables. In real-world applications, many design problems in engineering, business and industry may involve thousands or even millions of variables. We have not seen case studies for such large-scale problems in the literature. A crucial issue is that there is no indication that the methodology that works for such toy benchmarks will work equally well for large-scale problems. Apart from the difference in the problem size, there may be other fundamental differences for large-scale problems, and thus the methodology could be very different [33].

Such challenges remain unresolved, in both theory and practice. These important issues also provide golden opportunities for researchers to rethink existing methodologies and approaches, perhaps more fundamentally. We can expect that significant progress will be made in the next ten years.