1 Introduction

During the past four decades, many heuristic and other derivative-free optimisers (DFOs) [1, 2] have been developed. They have been successfully applied to a wide range of real-world problems and are made available in MATLAB®, which is used daily by a significant number of practising engineers and scientists. With the development of an increasing number of DFOs beyond conventional optimisation means, it has become challenging to choose the most suitable one for an engineer's application at hand. To provide a practical review of these tools, this paper conducts and reports benchmarking tests of them. Further, a scoring system for benchmarking DFOs/EAs is developed, such that a MATLAB user can easily apply it, without needing to learn the DFOs in depth, when selecting one of MATLAB's DFOs for their application.

At present, there exist the following DFO algorithms [1,2,3]:

Heuristics (a posteriori):

  (1) Random search;
  (2) Bayesian optimization;
  (3) Data-based Online Nonlinear Extremum-seeker (DONE);
  (4) Simulated annealing (SA), a special case being heuristic hill-climbing;
  (5) Particle swarm optimization (PSO);
  (6) Genetic algorithm (GA);
  (7) Cuckoo search;
  (8) Other evolutionary algorithms (such as Evolution Strategy, Evolutionary Programming, Ant Colony Optimisation, etc.).

Direct Search (a priori):

  (9) Nelder-Mead Simplex method (Simplex);
  (10) Pattern search (PS);
  (11) Powell's conjugate search (COBYLA, UOBYQA, NEWUOA, BOBYQA and LINCOA);
  (12) Coordinate descent and adaptive coordinate descent;
  (13) Multilevel Coordinate Search (MCS) algorithm.

Among these, algorithms (4)–(6) and (9)–(10) are available as built-in functions in MATLAB, and (11) is available as an open-source .m file [4]. We shall test these six algorithms, together with a special version of the GA, its elitist variant GAe. The benchmarks used in this paper are consistent with, but exceed, those used in measuring conventional optimisation algorithms reported in [5, 6].
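For readers' reference, these built-in optimisers map onto the following MATLAB calls. This is a minimal sketch only: the objective, starting point and bounds are illustrative placeholders, the options discussed in Sect. 3.2 are omitted, and Powell (11) comes from the open-source .m file of [4] rather than a built-in function, so it is not shown.

  % Minimal calls to the MATLAB DFOs compared in this paper.
  fun = @(x) sum(x.^2);                 % placeholder objective to be minimised
  n   = 10;                             % number of decision variables
  x0  = rand(1, n);                     % random starting point
  lb  = -ones(1, n);  ub = ones(1, n);  % box bounds

  [xSA,  fSA ] = simulannealbnd(fun, x0, lb, ub);                 % (4) SA
  [xPSO, fPSO] = particleswarm(fun, n, lb, ub);                   % (5) PSO
  [xGA,  fGA ] = ga(fun, n, [], [], [], [], lb, ub);              % (6) GA
  [xNM,  fNM ] = fminsearch(fun, x0);                             % (9) Nelder-Mead Simplex
  [xPS,  fPS ] = patternsearch(fun, x0, [], [], [], [], lb, ub);  % (10) Pattern search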

This paper is not about DFO algorithms or their comparison studies per se. It is about helping engineers who wish to use MATLAB's DFOs to choose one for their application at hand, and about providing an amalgamated selection criterion for this purpose. Section 2 of this paper presents the aforementioned benchmarks. Section 3 presents the test functions adopted. Benchmarking results are reported and analysed in Sect. 4. Conclusions are drawn in Sect. 5.

2 Uniform Benchmarks for Testing

For an optimisation problem with a single objective, suppose that its objective function (performance index, cost function for minimisation, or fitness function for maximisation) is:

$$ \varvec{f}\left( \varvec{x} \right):\varvec{X} \to \varvec{F} $$
(1)

where \( \varvec{X} \subseteq R^{n} \) spans the entire search (possible solution) space in n dimensions, \( \varvec{x} \in \varvec{X} \) represents the collection of the n variables or parameters to be optimised, \( \varvec{F} \subseteq R^{m} \) represents the m-dimensional space of all possible values of the m objectives, and \( \varvec{f} \in \varvec{F} \).

We prefer benchmark test functions that have known theoretical solutions, including the minimum or maximum value of the objective function:

$$ \varvec{f}_{0 } = Max\,or\,Min\{ \varvec{f}(\varvec{x})|\varvec{x} \in \varvec{X}\} \in \varvec{F} $$
(2)

where the theoretical solution \( \varvec{x}_{0} \in \varvec{X} \) satisfies:

$$ \varvec{f}(\varvec{x}_{0} ) = \varvec{f}_{0} $$
(3)

Normalised benchmarks used in this paper are defined in [5]. We choose five benchmarks, four of which have been used in [4], as follows.

2.1 Optimality

Optimality represents the relative figure of merit of the objective value found, \( \hat{\varvec{f}}_{0} \) [5]:

$$ \left. {\text{Optimality}} \right|_{a} = 1 - \frac{{\left\| {\varvec{f}_{0} - \hat{\varvec{f}}_{0} } \right\|_{a} }}{{\left\| {\bar{\varvec{f}} - \underline{\varvec{f}} } \right\|_{a} }} \in \left[ {0,\,1} \right] $$
(4)

where \( \bar{\varvec{f}} \) and \( \underline{\varvec{f}} \) are the upper and lower bounds of \( \varvec{f} \), respectively, which are known for proper test functions.
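For illustration, (4) may be computed by a small helper such as the following sketch, assuming the theoretical optimum and the objective bounds of the test function are known (the function and argument names are ours, not MATLAB's):

  function opt = optimality(fHat, f0, fLower, fUpper)
  % OPTIMALITY  Relative figure of merit of a found objective value, Eq. (4).
  %   fHat           - objective value found by the optimiser
  %   f0             - known theoretical optimum of the test function
  %   fLower, fUpper - known lower/upper bounds of f over the search space
  opt = 1 - norm(f0 - fHat) / norm(fUpper - fLower);
  end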

2.2 Accuracy

As the highest optimality reached could correspond to a local optimum, the benchmark Accuracy is also needed. It represents the relative closeness of the solution found, \( \hat{\varvec{x}}_{0} \), to the theoretical solution \( \varvec{x}_{0} \). This is particularly useful if the solution space is noisy, multiple optima exist or 'niching' is used. It is defined as [5]:

$$ {\text{Accuracy}} = 1 - \frac{{\left\| {\varvec{x}_{0} - \hat{\varvec{x}}_{0} } \right\|}}{{\left\| {\bar{\varvec{x}} - \underline{\varvec{x}} } \right\|}} \in \left[ {0,1} \right] $$
(5)

where \( \underline{\varvec{x}} \) and \( \bar{\varvec{x}} \) are the lower and upper bounds of \( \varvec{x} \), respectively; [\( \underline{\varvec{x}} \), \( \bar{\varvec{x}} \)] represents the search range, all of which should be known for a proper test function.
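Similarly, (5) maps onto a one-line helper (again a sketch with our own naming):

  function acc = accuracy(xHat, x0, xLower, xUpper)
  % ACCURACY  Relative closeness of the found solution to the theoretical one, Eq. (5).
  %   xHat           - solution vector found by the optimiser
  %   x0             - known theoretical solution of the test function
  %   xLower, xUpper - lower/upper bounds of the search range
  acc = 1 - norm(x0 - xHat) / norm(xUpper - xLower);
  end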

2.3 Convergence

Reach Time. The most frequently used measure of convergence is the number of function evaluations. The stopping condition is usually set to when there is little change in the objective, e.g. less than 1e-6. However, an optimiser could converge very quickly to a local extremum. Hence optimality is also taken into account when measuring convergence, through the reach time [5]:

$$ {\text{Reach time}}|_{b} = C^{b} $$
(6)

where \( C^{b} \) represents the total number of function evaluations conducted by which the optimality of the best individual first reaches b ∈ [0, 1].
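As an illustration, the reach time can be read off a per-evaluation record of the best optimality found so far. The sketch below assumes such a trace, optTrace, is recorded during a run (our own helper, not part of MATLAB):

  function Cb = reachTime(optTrace, b)
  % REACHTIME  Number of function evaluations at which the best optimality
  % first reaches level b, Eq. (6).  optTrace(k) is the optimality of the
  % best solution found after k function evaluations.
  idx = find(optTrace >= b, 1, 'first');
  if isempty(idx)
      Cb = Inf;        % level b never reached within the recorded budget
  else
      Cb = idx;
  end
  end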

NP Time. To estimate the polynomial order of the reach time, \( C^{0.999999} \) may be plotted against the number of parameters being optimized, n:

$$ NP - {\text{time}}\left( n \right) = C^{0.999999} \left( n \right) $$
(7)

Total Number of Evaluations. This is limited to:

$$ N = \hbox{min} \left\{ C^{0.999999} \left( n \right),\; 400mn^{2} \right\} $$
(8)

which implies that a benchmark test should terminate either when the goal has been reached or when 20n generations with a population size of 20mn have been evolved, i.e., the theoretical maximum number of function evaluations will not exceed \( 400mn^{2} \). Without significant loss of generality, only single-objective tests are considered in this paper, i.e., m = 1.
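For the single-objective case m = 1 used in this paper, the evaluation cap in (8) works out as in this small illustration:

  % Evaluation budget cap of Eq. (8) for the single-objective case m = 1.
  m = 1;
  for n = [10 30]
      Nmax = 400 * m * n^2;   % 40000 evaluations for n = 10, 360000 for n = 30
      fprintf('n = %2d:  budget cap = %d function evaluations\n', n, Nmax);
  end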

2.4 Optimizer Overhead

As an alternative or in addition to the 'total number of evaluations', the 'total CPU time' may be used in a benchmark test. More quantitatively, the optimizer overhead may be calculated as [1]:

$$ {\text{Optimiser}}\,{\text{overhead}} = \frac{{{\text{Total}}\,{\text{time}}\,{\text{taken}} - T_{\text{PFE}} }}{{T_{\text{PFE}} }} $$
(9)

where \( T_{\text{PFE}} \) is the time taken for pure function evaluations.
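A direct transcription of (9), assuming the total wall-clock time of a run and \( T_{\text{PFE}} \) have been measured (the function name is ours):

  function oh = optimiserOverhead(totalTime, Tpfe)
  % OPTIMISEROVERHEAD  Relative overhead of the optimiser, Eq. (9).
  %   totalTime - wall-clock time of the whole optimisation run
  %   Tpfe      - time spent on pure function evaluations alone
  oh = (totalTime - Tpfe) / Tpfe;
  end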

2.5 Sensitivity

When the values of the optimal parameters found are perturbed, the optimality may well change. This affects the robustness of the solution and hence can impact an engineering design. To measure how much a 'small' relative change in the designed parameters (the solution found) leads to a relative change in quality (the objective value found), sensitivity is defined as the ratio between these changes [5]:

$$ {\text{Sensitivity}} = \left. {\mathop {\lim }\limits_{{\left\| {\Delta \varvec{x}} \right\| \to 0}} \frac{{\left\| {\Delta \varvec{f}} \right\|/\left\| {\bar{\varvec{f}} - \underline{\varvec{f}} } \right\|}}{{\left\| {\Delta \varvec{x}} \right\|/\left\| {\bar{\varvec{x}} - \underline{\varvec{x}} } \right\|}}} \right|_{{\varvec{x} = \hat{\varvec{x}}_{0} }} = \left. {\mathop {\lim }\limits_{{\left\| {\Delta {\mathbf{x}}} \right\| \to 0}} \frac{{\left\| {\Delta \varvec{f}} \right\|}}{{\left\| {\Delta \varvec{x}} \right\|}}} \right|_{{\varvec{x} = \hat{\varvec{x}}_{0} }} \frac{{\left\| {\bar{\varvec{x}} - \underline{\varvec{x}} } \right\|}}{{\left\| {\bar{\varvec{f}} - \underline{\varvec{f}} } \right\|}} $$
(10)
$$ \approx \frac{{1 - {\text{Optimality}}}}{{1 - {\text{Accuracy}}}} $$
(11)
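In practice, the approximation (11) can be computed directly from the two benchmarks defined above, as in this small sketch (our own helper):

  function s = sensitivityApprox(opt, acc)
  % SENSITIVITYAPPROX  Approximate sensitivity of a found solution, Eq. (11).
  %   opt - Optimality of the solution found, Eq. (4)
  %   acc - Accuracy of the solution found, Eq. (5)
  s = (1 - opt) / (1 - acc);    % note: undefined when acc == 1 exactly
  end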

3 Benchmark Test Functions

3.1 Experimental Setup

In the experiments, we investigate the performance of the MATLAB (R2016b) heuristics SA, PSO, GA, and GAe, and direct-search optimisers Simplex, PS, and Powell.

Each experiment is repeated 10 times to obtain the mean value of each benchmark. All starting points of the search are randomly generated; for SA, Simplex, PS and Powell, the different runs start from the same randomly generated initial point, while for PSO, GA and GAe, the different runs start from the same initial population. In all runs, we keep the same stopping criterion, given by (8), for a fair comparison.

In order to obtain \( T_{\text{PFE}} \) for each test function, all the benchmark functions (as in Table 1) are run 400n^2 times in MATLAB, and the time is taken as the average over 10 runs.
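A sketch of how \( T_{\text{PFE}} \) may be measured, averaging tic/toc timings over 10 runs; the test function handle, dimension and sampling range below are placeholders:

  % Estimate T_PFE: time the test function alone over the full evaluation
  % budget, averaged across 10 timing runs.
  fun   = @(x) sum(x.^2);       % placeholder test function
  n     = 10;                   % problem dimension
  nEval = 400 * n^2;            % evaluation budget, Eq. (8) with m = 1
  t = zeros(1, 10);
  for r = 1:10
      X = rand(nEval, n);       % random evaluation points (placeholder range)
      tic;
      for k = 1:nEval
          fun(X(k, :));         % pure function evaluations only
      end
      t(r) = toc;
  end
  Tpfe = mean(t);               % averaged pure-function-evaluation time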

3.2 Algorithmic Settings

The algorithm parameters are not manually tuned but take the default settings in MATLAB unless otherwise stated, because the aim is not to ask users to tune or test the MATLAB functions, but to help them solve their own practical problems. For the Simplex, the reflection coefficient is 1, the expansion coefficient 2, the contraction coefficient 0.5 and the shrink coefficient 0.5. For the GA, the population size is set to 10n, where n is the dimension, and the maximum number of generations is 20n, so that the number of function evaluations is bounded as described in Sect. 2 for the reach time. The crossover function is 'scattered', which creates a random binary vector and selects the genes from the two parents according to that vector; the crossover fraction is 0.8 and the mutation rate is 0.01. For the GA with elitism (GAe), 5 out of every 100 individuals are guaranteed to survive into the next generation. For SA, the initial temperature is set to 100 and the temperature is lowered following \( T = T_{0} \times 0.95^{k} \), where k is the annealing iteration. For PSO, the swarm size is 10n and the maximum number of iterations is 40n, so that the maximum number of function evaluations is 400n^2.
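These settings map onto MATLAB option objects roughly as sketched below. The use of the uniform mutation operator to realise the stated 0.01 rate, and an elite count of zero for the plain (non-elitist) GA, are our assumptions rather than statements from the text:

  n = 10;   % problem dimension (the 30-D tests use n = 30)

  % GA: population 10n, at most 20n generations, scattered crossover with
  % fraction 0.8, mutation rate 0.01 (assumed uniform mutation here); the
  % plain GA is assumed to keep no elites, GAe keeps 5 per 100 individuals.
  optGA  = optimoptions('ga', 'PopulationSize', 10*n, 'MaxGenerations', 20*n, ...
      'CrossoverFcn', @crossoverscattered, 'CrossoverFraction', 0.8, ...
      'MutationFcn', {@mutationuniform, 0.01}, 'EliteCount', 0);
  optGAe = optimoptions(optGA, 'EliteCount', ceil(0.05 * 10*n));   % elitist GA (GAe)

  % SA: initial temperature 100, exponential cooling T = T0 * 0.95^k.
  optSA  = optimoptions('simulannealbnd', 'InitialTemperature', 100, ...
      'TemperatureFcn', @temperatureexp);

  % PSO: swarm size 10n, at most 40n iterations (400n^2 evaluations in total).
  optPSO = optimoptions('particleswarm', 'SwarmSize', 10*n, 'MaxIterations', 40*n);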

3.3 Benchmark Functions

For comparison, four non-linear functions used in [7, 8] are adopted here, as listed in Table 1. Benchmark functions, rather than benchmark data sets, are what is normally used in benchmarking EAs/DFOs. Moreover, benchmarking through function evaluation is closer to the real world that a MATLAB user or a design engineer faces, where simulation results (which function evaluation mimics), instead of data sets, are used. All the functions are tested in 10 and 30 dimensions. Different types of test function are chosen for a comprehensive comparison. The first is the Quadric function, which is unimodal; the remaining three are multimodal. The n-D Varying Landscape problem, the only maximisation problem among the four, was introduced by Michalewicz [9] and further studied by Renders and Bersini [10].

Table 1. Tested functions.

The objective function f2(x) is, in effect, decoupled in every dimension, with each dimension represented by \( f_{i}\left( x_{i} \right) = \sin\left( x_{i} \right)\sin^{2m}\left( \frac{ix_{i}^{2}}{\pi} \right) \). Every such member function is independent. The larger the product mn is, the sharper the landscape becomes. There are n! local maxima within the search space [0, π]^n. The theoretical benchmark solution to this n-dimensional optimization problem may therefore be obtained by maximizing n independent one-dimensional functions \( f_{i}\left( x_{i} \right) \), a fact which is, however, unknown to the optimization algorithm being tested.
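A direct transcription of this member-function form gives the following sketch of f2; the sharpness parameter m is not stated in this paper, so the value below is an assumption:

  function f = varyingLandscape(x, m)
  % VARYINGLANDSCAPE  n-D Varying Landscape (Michalewicz) function f2, to be
  % maximised over [0, pi]^n.  Each dimension contributes the independent term
  % f_i(x_i) = sin(x_i) * sin(i*x_i^2/pi)^(2m); a larger m*n gives a sharper landscape.
  if nargin < 2, m = 10; end    % sharpness parameter (assumed value, not from the paper)
  x = x(:).';                   % treat x as a row vector
  i = 1:numel(x);
  f = sum(sin(x) .* sin(i .* x.^2 / pi).^(2 * m));
  end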

The third is the Scaffer F6 function, whose minimum is zero. The last is a composition of Schwefel's function, Rastrigin's function and the High Conditioned Elliptic function, with composition weights of 0.3, 0.3 and 0.4, respectively.

4 Benchmarking Results and Analysis

4.1 Comparison Among Algorithms

The results reported in this section are summarised in Tables 2, 3, 4, 5, 6, 7, 8 and 9, grouped by test function. The theoretical values are given at the bottom of each table for comparison. Performance scores are obtained by ranking every algorithm's results on each indicator; for example, if the mean optimality of PSO is the best among all 7 algorithms, PSO is given a score of 7 on Optimality. The five benchmark scores of each algorithm are then summed in the last column of each results table.
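The scoring scheme can be sketched as follows, assuming a results matrix with one row per algorithm and one column per benchmark indicator, oriented so that larger values are better (indicators such as overhead and sensitivity would be negated first); the function is ours, for illustration only:

  function [scores, total] = scoreAlgorithms(results)
  % SCOREALGORITHMS  Rank-based scoring sketch.  results(a, b) is the mean
  % value of benchmark b for algorithm a, oriented so that larger is better.
  % The best algorithm on a benchmark scores nAlg (7 here); the worst scores 1.
  [nAlg, nBench] = size(results);
  scores = zeros(nAlg, nBench);
  for b = 1:nBench
      [~, order] = sort(results(:, b), 'ascend');  % worst first
      scores(order, b) = (1:nAlg).';               % worst -> 1, ..., best -> nAlg
  end
  total = sum(scores, 2);                          % amalgamated score per algorithm
  end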

Table 2. Powell best on the unimodal problem (10-D Quadric)
Table 3. Powell best on the unimodal problem (30-D Quadric)
Table 4. GAe best on the low-dimensional multimodal Varying Landscape Problem (10-D)
Table 5. GAe best on the high-dimensional multimodal Varying Landscape Problem (30-D)
Table 6. PSO best on the low-dimensional multimodal Scaffer Problem (10-D)
Table 7. GAe best on the high-dimensional multimodal Scaffer Problem (30-D)
Table 8. GAe best on the low-dimensional multimodal Composition Problem (10-D)
Table 9. GAe best on the high-dimensional multimodal Composition Problem (30-D)

Quadric function f1. The first test function is unimodal, with only one minimum point in the search area [−1.28, 1.28]^D. The search results are shown in Tables 2 and 3 and Fig. 1. The algorithms are run in 10 and 30 dimensions. All seven tested algorithms reached an optimality of 0.99 in 10-D. In the lower dimension of 10, PSO and Powell [4] showed better performance, with lower optimizer overhead and better accuracy. In the 30-dimensional problem, Powell had the best score. SA had the highest optimizer overhead in both the 10-D and 30-D tests. Figure 1 shows the convergence trace of each algorithm; for f1, all the algorithms converge quickly and at a relatively uniform speed.

Fig. 1. Convergence traces of tested algorithms on unimodal f1 minimisation

n-D Varying Landscape function f2. The second test function has multiple extrema in [0, π]^D. The objective is to maximise the function value, unlike for the other three test functions.

From Tables 4 and 5, the performance of NM Simplex and SA is not satisfactory: they stalled at local maxima. GA and GAe, on the other hand, showed steady performance, though they incurred much more optimizer overhead than NM Simplex in approaching the maximum. None of the tested algorithms reached an optimality of 0.9999 within N function evaluations; GA and GAe provided relatively better results than the others. Figure 2 shows the convergence trace of each algorithm; for f2, GA, GAe and PSO get closer to the global extremum as the iterations increase.

Fig. 2. Convergence traces of tested algorithms on multimodal f2 maximisation

Scaffer's F6 function f3. As with the first and second functions, GA, PS and PSO show consistently better performance than the other algorithms. Figure 3 shows the convergence trace of each algorithm; for f3, PSO has a slower convergence speed than GAe but reaches the same level of extremum.

Fig. 3. Convergence traces of tested algorithms on multimodal f3 minimisation

However, all the algorithms show high overhead on this test function, which suggests that optimizer overhead is related not only to the algorithm itself but also to the fitness function.

Composition function f 4 . The composition function is complex which should be expected more optimizer overhead. However, except for SA, the overhead of other six algorithms are less than 20. The optimality all can reach to 0.99, except Simplex, while the accuracy in Simplex is 65.83% in 30-D. Overall, all the seven algorithms can locate the global extreme in the last tested function (Fig. 4).

Fig. 4. Convergence traces of tested algorithms on multimodal f4 minimisation

4.2 Comparison Among Functions

Because f1 is a unimodal function, all the algorithms perform well; in 30-D, the optimizer overhead of Powell is only 3.8695. For the multimodal functions, the GAe, PSO and PS methods have overall better performance than the others.

In terms of the theoretical time for the fixed number of pure function evaluations, f4 took the longest and f3 the shortest. However, the optimizer overhead of all the algorithms is largest on f3, while it is small on f1 and f4. Moreover, for NM Simplex the overhead is relatively stable in both 10-D and 30-D, whereas for the other methods the optimizer overhead decreased in 30-D. This makes sense because GA and PSO imitate group behaviour in nature, which relies on a large population to maintain diversity; when the dimension increases, the population size also increases.

4.3 Evaluations of Benchmarks

The mean optimality and mean accuracy represent relative closeness to the theoretical values. They lose meaning when the ranges of x and f become very large, since the relative values become too small to show the differences between the compared algorithms.

For example, the minimum value of the composition function is 0, whereas its maximum value increases exponentially with the dimension n. In this case, the optimality benchmark cannot reflect the distance between the search result and the theoretical value: when the distance between fmax and fmin is too large, the optimality can be very close to 1 even when the absolute distance between \( \hat{f}_{0} \) and \( f_{0} \) is large, as shown in Tables 8 and 9, where all algorithms but the Simplex reach an optimality of 0.99.

Though the reach time, i.e. the number of function evaluations N, also has disadvantages, it gives a measure of convergence speed that reflects both convergence and optimality. Because of the large range of f in f4, however, PSO, GA, Powell and PS stop before reaching their best solutions. Although the optimizer overhead in this test is very small, it is not the actual overhead because the algorithms stop early. For example, in Table 9 the number of function evaluations of Powell is only 1856, compared with N = 360000; the results could be better if the search continued. PSO, likewise, stops its search based on the reach-time setting and thus compromises its performance to fit the testing standard.

The benchmarks of optimality, accuracy, convergence, optimizer overhead and sensitivity have shown advantages as a uniform standard for comparison, though their application still has limits (Table 10).

Table 10. Amalgamated rankings of heuristic and direct search DFOs

5 Conclusions

Using four commonly adopted benchmark test problems, we have compared seven DFO functions available to MATLAB users. PS and Powell have shown good performance on unimodal problems, but have not delivered satisfactory performance on multimodal and high-dimensional problems. GA and GAe have shown consistently good performance on all benchmark problems, with the second highest robustness (second lowest sensitivity). PSO has offered the highest convergence speed and relatively low optimizer overhead, though its optimality and accuracy were not as good as those of GA and GAe.

Further, the direct-search numerical optimiser Powell is seen to be the best on the Quadric Problem, the heuristic numerical PSO algorithm on the lower-dimensional Scaffer Problem, the direct-search numerical optimiser PS on the lower-dimensional Composition Problem, and the extra-numerical genotype evolutionary algorithm GAe the best on the Varying Landscape Problem and on the other two higher-dimensional problems. Overall, the GAe offers the highest performance, followed by PSO and Powell's optimisation methods. It is surprising to see that, while it has reached the top accuracy (implying the global optimum), the numerical PS with floating-point resolution has not exceeded the optimality delivered by the GAe, which has only the resolution of its genotype encoding. This confirms the advantage of a population-based optimiser.

Five benchmarks and a scoring system for benchmarking DFOs/EAs have been developed in this paper, providing a selection criterion for engineers who wish to easily choose the MATLAB DFO most suitable for their application at hand. Meanwhile, the benchmarking technique developed here is not restricted by an engineer's background and does not require learning these algorithms as a prerequisite.