One goal of this book is to empirically answer the question of how efficient ES are in a setting with few function evaluations, with a focus on the modern ES from Sect. 2.2.2. This chapter addresses the experiments conducted and is organized as follows. Section 4.1 introduces two measures for evaluating the efficiency of ES, the fixed-cost error (FCE) and the expected runtime (ERT). Section 4.2 describes the technical setup of the experiments and how their results are analyzed. The results are presented and discussed in Sect. 4.3.

4.1 Measuring Efficiency

An ES is considered efficient in this book if it approaches the optimum \(f^{\ast}\) (see Eq. 2.4) quickly, i.e., using as few function evaluations as possible. In order to compare different ES, a measure of efficiency for the convergence properties of an ES is needed. Figure 4.1 shows a sample convergence plot for five optimization runsFootnote 1 of an ES. The x-axis of the convergence plot represents the number of function evaluations. The y-axis shows the base-ten logarithm of the difference between the currently best function value and the optimum \(f^{\ast}\). This difference will be called \(\Delta f^{\ast}\) in the following. For a plus-strategy the graph is monotonically decreasing. To achieve monotonicity for a comma-strategy as well, the best individual evaluated so far is used for the calculation of \(\Delta f^{\ast}\) instead of the best individual of the current generation.

Fig. 4.1 Example of a convergence plot
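Such a monotone convergence trace can be computed directly from the raw per-evaluation fitness values. The following is a minimal sketch, assuming the fitness history of a run and the optimum \(f^{\ast}\) of the test function are available; the function and variable names are illustrative and not part of the BBOB interface.

```python
import numpy as np

def best_so_far_delta_f(fitness_history, f_opt):
    """Monotone convergence trace: best-so-far Delta f* after each evaluation.

    fitness_history : fitness of each evaluated individual, in evaluation order
    f_opt           : known optimum f* of the test function
    """
    delta_f = np.asarray(fitness_history, dtype=float) - f_opt
    # Taking the running minimum makes the trace monotonically decreasing,
    # so comma-strategies are treated like plus-strategies in the plot.
    return np.minimum.accumulate(delta_f)
```

Plotting \(\log_{10}\) of these values over the number of function evaluations yields graphs as in Fig. 4.1.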

Appendix D.3 in [34] describes two opposing approaches for deriving an efficiency measure from these convergence plots: the fixed-cost view and the fixed-target view. The fixed-cost view operates with a fixed number of function evaluations and yields the fixed-cost error (FCE), covered in detail in Sect. 4.1.1. The fixed-target view leads to the expected runtime (ERT) measure, which is used in the BBOB benchmarking framework and is described in Sect. 4.1.2.

4.1.1 The FCE Measure

FCE measures \(\Delta f^{\ast}\) after a fixed number of function evaluations. In the convergence plot in Fig. 4.1 this approach corresponds to drawing a vertical line: the FCE values are the intersections between the convergence graphs and this vertical line. FCE is relevant for industrial applications that impose a maximal run-time and hence a fixed number of function evaluations. In [34] the relevance of FCE is acknowledged, but its use is rejected because it does not allow for a quantitative interpretation: the ratio between the \(\Delta f^{\ast}\) values of two algorithms cannot quantify how much better one algorithm is than the other. Nevertheless, a qualitative interpretation is possible. On the basis of FCE one can answer the question of which algorithm yields a smaller FCE and is therefore better. Since optimization runs with ES are influenced by random effects, both during initialization and during the run, the FCE of an algorithm has to be measured over many independent optimization runs. The FCE values of different algorithms can then be analyzed with statistical techniques to find significant differences in quality. This analysis is described in Sect. 4.2.3.

4.1.2 The ERT Measure

BBOB uses ERT as the measure for benchmarking algorithms. It was introduced in [49] as the expected number of function evaluations per success and further analyzed under the name success performance (SP) [4]. ERT is the expected number of function evaluations needed to reach a given \(\Delta f^{\ast}\). Graphically, ERT corresponds to the intersection between a convergence graph as shown in Fig. 4.1 and a horizontal line representing a fixed \(\Delta f^{\ast}\). With this approach there might be optimization runs which do not reach the given \(\Delta f^{\ast}\) within a finite number of function evaluations. These runs are considered unsuccessful and are rated with the run-time \(r_{us}\). A successful run is rated with the number of function evaluations needed to reach \(\Delta f^{\ast}\), i.e., the run-time \(r_{s}\). The ratio of successful runs to all runs yields the success probability \(p_{s}\). Let \(R_{s}\) and \(R_{us}\) be the mean values of the \(r_{s}\) and \(r_{us}\) over the different runs; then, for \(p_{s} > 0\), ERT is defined as:

$$\displaystyle{ \mathrm{ERT} = R_{s} + \frac{1 - p_{s}}{p_{s}}\,R_{us} }$$

If there are unsuccessful runs, i.e., \(p_{s} < 1\), then ERT strongly depends on the termination criteria of the algorithm and on the value of \(r_{us}\). Considering optimization runs with very few function evaluations easily leads to \(p_{s} = 0\) when common values for \(\Delta f^{\ast}\) are used. So, to use ERT in this scenario, appropriate values for \(\Delta f^{\ast}\) have to be found first.
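As an illustration, the following is a minimal sketch of the ERT computation from a set of runs, assuming that for every run the consumed number of function evaluations and a success flag are recorded; the function and variable names are illustrative.

```python
import numpy as np

def expected_runtime(run_lengths, successful):
    """ERT = R_s + (1 - p_s) / p_s * R_us for one target Delta f*.

    run_lengths : function evaluations used by each run (r_s for successful
                  runs, r_us for unsuccessful ones)
    successful  : boolean flag per run, True if the target Delta f* was reached
    """
    run_lengths = np.asarray(run_lengths, dtype=float)
    successful = np.asarray(successful, dtype=bool)
    p_s = successful.mean()                       # ratio of successful runs
    if p_s == 0.0:
        return np.inf                             # ERT is undefined without successes
    R_s = run_lengths[successful].mean()          # mean run-time of successful runs
    if p_s == 1.0:
        return R_s                                # no unsuccessful runs to account for
    R_us = run_lengths[~successful].mean()        # mean run-time of unsuccessful runs
    return R_s + (1.0 - p_s) / p_s * R_us
```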

4.2 Experiments

4.2.1 Selection of Algorithms

Not all modern ES algorithms covered in Sect. 2.2.2 were subjected to the empirical analysis. Since the focus of this book is on optimization runs with very few function evaluations, the restart heuristics were omitted; they are better analyzed with long-running optimizations. Interesting results from such runs, conducted by the authors of the algorithms, are summarized in Sect. 3.2.2. In addition, five algorithms developed before 1996 and described in Sect. 2.2.1 are included in the experiments. The complete list of algorithms used in the experiments is shown in Table 4.3 in Sect. 4.2.2.2.

4.2.2 Technical Aspects

4.2.2.1 Framework

The experiments are performed using the framework BBOB 10.2 [34]. It provides standardized test functions and interfaces to the programming languages C, C++, Java and Matlab/Octave. Having an implementation of an algorithm in one of these languages allows us to conduct experiments with minimal organizational effort on a set of test functions F, a set of function instances FI and a set of dimensions D for the input space. The set FI controls the number of independent runs per test function. Let \(R = F \times D \times FI\); then |R| runs are conducted in total. A run, i.e., an element of R, yields a table indexed by the number of function evaluations used, containing information about the optimization run: the difference between the current noise-free fitness and the optimum and the difference between the best measuredFootnote 2 noise-free fitness and the optimum. For small dimensionality the input values x yielding the current fitness are recorded as well. BBOB provides Python scripts for post-processing these tables.
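Conceptually, the set R corresponds to a nested loop over functions, dimensions and instances. The sketch below only illustrates this bookkeeping; run_algorithm is a hypothetical placeholder for an ES implementation coupled to BBOB, not an actual BBOB API call.

```python
from itertools import product

F = range(1, 25)              # BBOB test function ids f1 ... f24
D = (2, 5, 10, 20, 40, 100)   # search space dimensions used in this book
FI = range(1, 101)            # function instances -> independent runs per function

# |R| = |F| * |D| * |FI| runs in total; each run produces a table of
# (#fe, Delta f*) records that is post-processed afterwards.
for f_id, n, instance in product(F, D, FI):
    pass  # run_algorithm(f_id, n, instance) would go here (hypothetical placeholder)
```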

Runs are conducted on all 24 noise-free test functions. A detailed description of the test functions can be found on the BBOB web page.Footnote 3 The global optima of all test functions are located inside the hyperbox \({\left[-5,5\right]}^{n}\). The test functions can be characterized by different features. A test function can be uni- or multimodal, i.e., have only one or multiple (local) optima. Multimodal functions allow the global optimization capabilities of an algorithm to be benchmarked. Furthermore, a test function can be symmetric, i.e., invariant under rotations of the coordinate system. The condition of a function can be interpreted as a reciprocal measure of its symmetry and depends on the condition of an optimal covariance matrix (see Sect. 2.2.2.2). A more vivid description is that a function with a high condition has a fitness landscape with very steep valleys. Table 4.1 provides a summary of all 24 test functions with their commonly used names and some of their features.

Table 4.1 BBOB test functions

Based on these features, the test functions can be grouped into classes. Of special interest is the distinction between separable and non-separable functions on the one hand and between unimodal and multimodal functions on the other. Table 4.2 shows the distribution of the test functions across these four classes.

Table 4.2 Classification of test functions

Unimodal test functions have a unique optimum, which makes them suitable for testing the convergence properties of an algorithm without interference from stagnation in local optima. Multimodal test functions are especially useful for testing restart heuristics or algorithms designed to escape a local optimum. Since real fitness functions are usually multimodal, multimodal functions comply better with real-world optimization scenarios than unimodal functions.

Separable functions allow the optimization run to be split into n independent one-dimensional optimizations. In contrast to this, non-separable functions cannot be optimized this way and for them it is advantageous to apply an ES using correlated mutations. In general, non-separable multimodal functions are far more difficult to solve and hence serve better as an idealization of real-world problems.

According to [34], 15 runs are sufficient to observe significant differences when comparing two algorithms. For the analysis based on the FCE measure a best-of-k approach (described in Sect. 4.2.3) is used. In order to observe significant differences with this approach as well, the number of runs per test function, defined by the function instances in BBOB, is increased to 100. BBOB recommends a maximum number of function evaluations of \(10^{6} \cdot n\). Since the focus is on optimization tasks allowing only very few function evaluations, a drastically decreased maximum number of function evaluations of 500 ⋅ n is chosen. The experiments are conducted with dimensions \(n \in \{2,5,10,20,40,100\}\). For the dimensions n = 40 and n = 100 the maximum number of function evaluations is reduced to \(10^{4}\). The initial search point is drawn uniformly from the hyperbox \([-5,5]^{n}\).
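The budget and initialization rules above can be summarized in a few lines; this is a sketch of the settings described in the text, with illustrative function names.

```python
import numpy as np

def run_budget(n):
    """Maximum number of function evaluations used for dimension n."""
    return 10_000 if n >= 40 else 500 * n    # reduced budget for n = 40 and n = 100

def initial_search_point(n, rng=None):
    """Initial point drawn uniformly from the hyperbox [-5, 5]^n."""
    rng = rng or np.random.default_rng()
    return rng.uniform(-5.0, 5.0, size=n)
```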

4.2.2.2 Software for ES Algorithms

The BBOB framework is used with its interface to the Matlab/Octave programming language. Publicly available implementationsFootnote 4 by the authors of the ES algorithms are used where they exist. For most of the ES an original implementation was created. Table 4.3 provides an overview of the implementations used.

Table 4.3 Summary of ES implementations

All original implementationsFootnote 5 realize the pseudocode of the algorithms listed in Chap. 2 in Octave [23]. Furthermore, these implementations are capable of constraining the search space to a hyperbox (see Eq. 2.5). For this purpose a transformation as described in [43] is applied individually to each coordinate of a search point. The transformed value x′ of \(x \in \mathbb{R}\) subject to the lower bound l and the upper bound u is calculated as follows:

$$\displaystyle{ x' = l + (u - l)\,\frac{2}{\pi}\arcsin\!\left(\left\vert \sin\!\left(\frac{\pi (x - l)}{2(u - l)}\right)\right\vert\right) }$$

Simply speaking, the transformation performs a reflection at the bounds. An optimization run is terminated if either the maximum number of function evaluations is reached or the fitness falls below a given target value. These two values can be configured by parameters in all the original implementations. The exogenous parameters of the different ES algorithms are configured with their default settings as described in Sect. 2.2.
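The following is a minimal sketch of this coordinate-wise box-constraint transformation, assuming NumPy; it is not taken from the original Octave code.

```python
import numpy as np

def reflect_into_box(x, lower=-5.0, upper=5.0):
    """Map each coordinate back into [lower, upper] by reflection at the bounds."""
    x = np.asarray(x, dtype=float)
    t = (x - lower) / (upper - lower)
    # |sin| combined with arcsin produces a triangular wave in t, which is
    # exactly a repeated reflection at the bounds; points inside stay unchanged.
    t_reflected = (2.0 / np.pi) * np.arcsin(np.abs(np.sin(np.pi * t / 2.0)))
    return lower + (upper - lower) * t_reflected

# Example: reflect_into_box(6.0) == 4.0, i.e., a point 1 beyond the upper
# bound 5 is mapped 1 inside the bound.
```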

4.2.3 Analysis

In the following, the procedure for evaluating the empirical test results is outlined.

4.2.3.1 Calculating FCE from Empirical Results

The basis for the calculation of FCE is the tables described in Sect. 4.2.2.1, which are called BBOB data in the following. The BBOB data contains tuples \((\#fe,\Delta f^{\ast})\), consisting of \(\#fe\), the number of function evaluations performed, and \(\Delta f^{\ast}\), the so-far bestFootnote 6 difference from the optimum \(f^{\ast}\). There is not necessarily a tuple for every \(\#fe \in \{1,\ldots,\#fe_{max}\}\) in the BBOB data. Let \(I \subset \{1,\ldots,\#fe_{max}\}\) be the subset of \(\#fe\) values present in the BBOB data and let \(C_{t}\) denote the target costs; then FCE is calculated as follows:

$$\displaystyle{ \mbox{FCE}(C_{t}) = \Delta f^{\ast}\ \mbox{from the tuple}\ (\#fe,\Delta f^{\ast})\ \mbox{with}\ \#fe = \sup\{e \mid e \in I \wedge e \leq C_{t}\} }$$

Simply speaking, the FCE for a given \(C_{t}\) is taken from the tuple with the largest number of function evaluations in the BBOB data that is smaller than or equal to the desired \(C_{t}\). In this way, the performance of an algorithm might be underestimated but is never overestimated.
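A minimal sketch of this lookup, assuming the BBOB data of one run is available as a list of \((\#fe, \Delta f^{\ast})\) tuples; the names are illustrative.

```python
def fce(bbob_data, target_costs):
    """FCE(C_t): the Delta f* recorded at the largest #fe <= C_t.

    bbob_data : iterable of (num_evals, delta_f_best_so_far) tuples; not every
                evaluation count needs to be present
    """
    candidates = [(fe, df) for fe, df in bbob_data if fe <= target_costs]
    if not candidates:
        return float("inf")      # no record at or below C_t
    return max(candidates)[1]    # value from the tuple with the largest #fe
```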

4.2.3.2 Calculating Rankings

Conducting m = 100 runs for each test function f and each dimension n yields a set \(E(f,n,C_{t})\) containing m FCE\((C_{t})\) values. For each algorithm a, the sets \(E(f,n,C_{t})_{a}\) can be analyzed pairwise with statistical tests [36] to find significant differences in their FCE. We use unpaired Welch t-tests [69] to decide whether one algorithm is better than another.Footnote 7 The difference between the means of \(E(f,n,C_{t})_{1}\) and \(E(f,n,C_{t})_{2}\) is considered significant for a p-value < 0.05; the algorithm with the better mean FCE is declared the winner and gets a point. Doing so pairwise for all algorithms, the algorithms are ranked according to the number of points obtained.
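A sketch of this pairwise procedure using SciPy's Welch t-test (ttest_ind with equal_var=False); the dictionary layout and the function name are illustrative.

```python
from itertools import combinations
import numpy as np
from scipy.stats import ttest_ind

def points_by_fce(fce_samples, alpha=0.05):
    """Pairwise Welch t-tests on FCE samples; one point per significant win.

    fce_samples : dict mapping algorithm name -> array of FCE(C_t) values,
                  i.e., the sets E(f, n, C_t)_a
    """
    points = {name: 0 for name in fce_samples}
    for a, b in combinations(fce_samples, 2):
        _, p_value = ttest_ind(fce_samples[a], fce_samples[b], equal_var=False)
        if p_value < alpha:                       # significant difference
            better = a if np.mean(fce_samples[a]) < np.mean(fce_samples[b]) else b
            points[better] += 1                   # smaller mean FCE wins the point
    return points
```

Sorting the algorithms by these point totals then yields the ranking described above.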

In [24] two relevant optimization scenarios are described. In the first one, the user has the opportunity to choose the best run out of several runs. For this purpose an algorithm with a good peak performance is appropriate, i.e., an algorithm which sometimes performs very well but whose overall performance varies strongly. In the second scenario only one optimization run is done. This requires an algorithm to perform well without much variation; this kind of performance is called the average performance of an algorithm. To reflect these two scenarios in our analysis, we use a best-of-k approach: instead of using all m runs to create the set \(E(f,n,C_{t})\), only the best out of every k runs is used. This reduces the cardinality of \(E(f,n,C_{t})\) to \(\lfloor \frac{m}{k}\rfloor\). The analysis regarding the average performance of an algorithm is done with k = 1. For the peak performance an appropriate k has to be chosen: the resulting set \(E(f,n,C_{t})\) must not be too small, so that statistical testing for significant differences remains possible. We choose a best-of-k approach with k = 5 to rank the algorithms regarding their peak performance.
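The best-of-k reduction itself is a one-liner with NumPy; a sketch under the assumption that the m FCE values of one algorithm are given in run order.

```python
import numpy as np

def best_of_k(fce_values, k):
    """Reduce m FCE values to floor(m/k) values by keeping the best (smallest)
    FCE of each group of k runs; k = 1 gives the average-performance view,
    k = 5 the peak-performance view used in this chapter."""
    fce_values = np.asarray(fce_values, dtype=float)
    m = (len(fce_values) // k) * k           # drop an incomplete last group
    return fce_values[:m].reshape(-1, k).min(axis=1)
```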

4.2.3.3 Selection of Test Functions

So far, the sets E have referred to a single test function. In order to calculate a rank aggregation for a set of test functions, the points won by an algorithm on each test function within the set are accumulated before the aggregated ranking is determined. Aggregated rankings are calculated for the classes of test functions as assigned in Table 4.2.
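Continuing the sketch from Sect. 4.2.3.2, aggregation simply sums the points per algorithm over the functions of a class; again, the data layout is illustrative.

```python
def aggregate_ranking(points_per_function):
    """Accumulate the points each algorithm won on the individual test
    functions of a class and rank by the total (more points = better rank).

    points_per_function : iterable of dicts, algorithm name -> points won
    """
    total = {}
    for points in points_per_function:
        for name, won in points.items():
            total[name] = total.get(name, 0) + won
    return sorted(total, key=total.get, reverse=True)
```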

4.2.3.4 Choice of Target Costs C t

Following the motivation of this work, small values for the target costs \(C_{t}\) are chosen. \(C_{t}\) should depend on the dimension n to facilitate the interpretation of results for different dimensions. BBOB recommends \(10^{6} \cdot n\) for long runs, thus establishing a linear dependency. We instead analyze results for \(C_{t} = \alpha \cdot n\) with \(\alpha \in \{25,50,100\}\), i.e., our focus is on much smaller values for \(C_{t}\).

4.3 Results

4.3.1 Ranks by FCE

The following figures show rankings aggregated for the four function classes as described in Sect. 4.2.2.1. Each ranking is displayed for all dimensions. Instead of using the rank, the number of significant wins over other algorithms divided by the number of test functions per class is shown on the y-axis. This kind of normalization allows the plots for different function classes to be compared. With 14 algorithms tested an algorithm can achieve at most 13 significant wins. This representation also has the advantage of showing how clearly an algorithm wins or loses against others. The aggregated ranking over all 24 test functions is given in Table 4.4.

4.3.2 Discussion of Results

Based on the results we are able to answer two questions regarding optimization scenarios with very few function evaluations. The first one is: Are there significant differences in the convergence properties of Evolution Strategies with few function evaluations? In general this question can be answered positively. Even 25 ⋅ n function evaluations are sufficient to observe significant differences. As can be seen in Figs. 4.2–4.7, there are hardly any significant differences in algorithm performance for non-separable, multimodal test functions with dimension n = 2. An explanation for this behaviour is that the variance of the Euclidean distance between the initial search point and the global optimum decreases with the dimensionality.Footnote 8 That means that for n = 2 the variance of these distances is relatively high, and the initialization of the search point influences the results too strongly for us to be able to see more significant differences in the convergence behaviour of the algorithms tested.

Table 4.4 Aggregated rankings over all 24 test functions for \(C_{t} = 100 \cdot n\). Columns p show ranks for the peak performance (best-of-5) and columns a show ranks for the average performance (best-of-1)
Fig. 4.2 Rankings for \(C_{t} = 100 \cdot n\) with best-of-1 approach. 1: (1 + 1)-ES, 2: \((\mu,\lambda)\)-MSC-ES, 3: \((\mu_{W},\lambda)\)-CMA-ES, 4: Active-CMA-ES, 5: \((1,4_{m}^{s})\)-CMA-ES, 6: xNES, 7: DR1, 8: DR2, 9: DR3, A: \((\mu,\lambda)\)-CMSA-ES, B: (1 + 1)-Cholesky-CMA-ES, C: LS-CMA-ES, D: (1 + 1)-Active-CMA-ES, E: sep-CMA-ES

Fig. 4.3 Rankings for \(C_{t} = 100 \cdot n\) with best-of-5 approach; algorithm legend as in Fig. 4.2

Fig. 4.4 Rankings for \(C_{t} = 50 \cdot n\) with best-of-1 approach; algorithm legend as in Fig. 4.2

Fig. 4.5 Rankings for \(C_{t} = 50 \cdot n\) with best-of-5 approach; algorithm legend as in Fig. 4.2

Fig. 4.6 Rankings for \(C_{t} = 25 \cdot n\) with best-of-1 approach; algorithm legend as in Fig. 4.2

Fig. 4.7 Rankings for \(C_{t} = 25 \cdot n\) with best-of-5 approach; algorithm legend as in Fig. 4.2

According to the ranking aggregated over all 24 test functions, shown in Table 4.4, the Active-CMA-ES is clearly the best evolution strategy for optimization scenarios with few function evaluations, followed by the \((\mu_{W},\lambda)\)-CMA-ES in second place. This result holds regardless of whether we analyze the peak or the average performance. The sep-CMA-ES clearly ranked last in these experiments.

The second question, whether there are Evolution Strategies which are better given many function evaluations but are beaten given few function evaluations, can also be answered positively in some cases. For target costs \(C_{t} = 100 \cdot n\) the Active-CMA-ES or the \((\mu_{W},\lambda)\)-CMA-ES usually rank best. Decreasing the target costs to 25 ⋅ n or 50 ⋅ n results in several (1 + 1)-strategiesFootnote 9 achieving good rankings, especially for unimodal functions. With the peak-performance approach the (1 + 1)-Cholesky-CMA-ES and the (1 + 1)-ES rank first, sometimes even for multimodal functions. The CMA-ES variants catch up with more function evaluations, which can be explained by the time needed to successfully adapt the covariance matrix.

Despite using only anisotropic mutations with local step sizes, i.e., without correlated mutations, the DR2 algorithm performs quite well. It often ranks directly behind the successful CMA-ES variants. Thus, it offers a better alternative to the sep-CMA-ES when the runtime of the algorithm cannot be neglected relative to the time for a function evaluation, which might be the case for very high-dimensional search spaces.

4.4 Further Analysis for n = 100

As the last section illustrates, several (1 + 1)-ES algorithms outperform CMA-ES variants considered state of the art when it comes to very few function evaluations. In industrial optimization scenarios, where function evaluations are extremely time consuming, we are interested in quick progress rather than finding the exact global optimum, or even converging to a local optimum.

A more thorough analysis for search space dimension n = 100, reflecting these scenarios, was also conducted. The experiments described in the last section used the performance measure FCE, based on the distance to the global optimum \(\Delta f^{\ast}\), to quantify the progress of an algorithm. In order to reflect the scenario of quick progress, we measure the progress made w.r.t. the initial search point instead of using \(\Delta f^{\ast}\) directly. So the \(\Delta f_{\mathrm{init}}^{\ast}\) of the function evaluation of the initial search point is used to normalize the \(\Delta f^{\ast}\) of later iterations, yielding monotonically decreasing progress values.Footnote 10 Based on these values we can state by which order of magnitude an algorithm decreases the initial fitness value for a given test function after a given number of function evaluations. In order to decrease the influence of the initial search point, the number of runs is increased from the 100 used in the previous section to 1,000 for each of the 14 algorithms and each of the 24 test functions. As an example, Fig. 4.8 shows the resulting convergence plot for test function \(f_{1}\).
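The normalization and the quantile bands of the convergence plots can be sketched as follows, assuming each run is summarized by its best-so-far \(\Delta f^{\ast}\) values at 100, 200, …, 1,000 evaluations; the names are illustrative.

```python
import numpy as np

def normalized_progress(best_so_far_delta_f, delta_f_init):
    """Order of magnitude of the progress relative to the initial search point:
    log10(Delta f* / Delta f*_init), a monotonically decreasing sequence."""
    return np.log10(np.asarray(best_so_far_delta_f, dtype=float) / delta_f_init)

def quantile_band(progress_runs, q_low=0.2, q_high=0.8):
    """20%/80% quantiles and mean over all runs at each checkpoint, as used for
    the error bars and the solid line in the convergence plots."""
    runs = np.asarray(progress_runs, dtype=float)   # shape: (runs, checkpoints)
    return (np.quantile(runs, q_low, axis=0),
            np.quantile(runs, q_high, axis=0),
            runs.mean(axis=0))
```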

Fig. 4.8 Convergence plot for test function \(f_{1}\) (sphere function) showing the order of magnitude of the fitness value normalized w.r.t. the fitness of the initial search point for the numbers of function evaluations \(\{100,200,\ldots,1{,}000\}\). Error bars reflect the 20% and 80% quantiles of the 1,000 conducted runs

As in the analysis in Sect. 4.3.1, we used unpaired Welch t-tests to find significant differences between the algorithms tested. Based on these significant differences we are able to rank the algorithms for the four test function classes shown in Table 4.2.

Table 4.5 Aggregated rankings over the separable unimodal test functions for target costs \(C_{t} = \{100,200,\ldots,1{,}000\}\)
Table 4.6 Aggregated rankings over the non-separable unimodal test functions for target costs \(C_{t} = \{100,200,\ldots,1{,}000\}\)
Table 4.7 Aggregated rankings over the separable multimodal test functions for target costs \(C_{t} = \{100,200,\ldots,1{,}000\}\)
Table 4.8 Aggregated rankings over the non-separable multimodal test functions for target costs \(C_{t} = \{100,200,\ldots,1{,}000\}\)

The results of this additional test are summarized in Tables 4.5–4.8 for the four different classes of objective functions. In addition, the corresponding convergence plots for all objective functions are provided in Figs. 4.8–4.31. The following observations can be made when analyzing the results:

  • As the rankings show, the (1 + 1)-Active-CMA-ES most often ranks first, regardless of the function class (with the exception of separable multimodal functions and large values of \(C_{t}\), for which DR2 is the best algorithm). In general, the (1 + 1)-algorithms, even including the simple (1 + 1)-ES, perform quite well. It seems that, in the beginning, adapting the endogenous search parameters more frequently with less information is better than adapting them less frequently with more information, as is the case in population-based strategies.

  • On non-separable, multimodal test functions, the (1 + 1)-Active-CMA-ES is the clear winner, followed by the (1 + 1)-ES. Similar performance can be observed for the other function classes.

  • The convergence plots for the different functions indicate that, for the more complicated functions (e.g., \(f_{21}\), \(f_{22}\)), progress in the beginning is very slow and accelerates later on. In contrast, on easier unimodal functions such as \(f_{1}\) the algorithms generally converge much faster (up to three orders of magnitude improvement after 1,000 function evaluations), and the progress rate is already high during the first 100 evaluations.

In conclusion, the (1 + 1)-Active-CMA-ES is a good recommendation for a small function evaluation budget (i.e., up to 10 ⋅ n) and for high-dimensional problems in general. Especially for non-separable, multimodal test functions it consistently shows the best performance, and on the unimodal functions it fails to win in only two cases, for \(C_{t} = 100\).

Fig. 4.9 Convergence plot for test function \(f_{2}\) showing the order of magnitude of the fitness value normalized w.r.t. the fitness of the initial search point for the numbers of function evaluations \(\{100,200,\ldots,1{,}000\}\). Error bars reflect the 20% and 80% quantiles of the 1,000 conducted runs and the solid line represents their mean

Fig. 4.10 Convergence plot for test function \(f_{3}\); presentation as in Fig. 4.9

Fig. 4.11 Convergence plot for test function \(f_{4}\); presentation as in Fig. 4.9

Fig. 4.12 Convergence plot for test function \(f_{5}\); presentation as in Fig. 4.9

Fig. 4.13 Convergence plot for test function \(f_{6}\); presentation as in Fig. 4.9

Fig. 4.14 Convergence plot for test function \(f_{7}\); presentation as in Fig. 4.9

Fig. 4.15 Convergence plot for test function \(f_{8}\); presentation as in Fig. 4.9

Fig. 4.16 Convergence plot for test function \(f_{9}\); presentation as in Fig. 4.9

Fig. 4.17 Convergence plot for test function \(f_{10}\); presentation as in Fig. 4.9

Fig. 4.18 Convergence plot for test function \(f_{11}\); presentation as in Fig. 4.9

Fig. 4.19 Convergence plot for test function \(f_{12}\); presentation as in Fig. 4.9

Fig. 4.20 Convergence plot for test function \(f_{13}\); presentation as in Fig. 4.9

Fig. 4.21 Convergence plot for test function \(f_{14}\); presentation as in Fig. 4.9

Fig. 4.22 Convergence plot for test function \(f_{15}\); presentation as in Fig. 4.9

Fig. 4.23 Convergence plot for test function \(f_{16}\); presentation as in Fig. 4.9

Fig. 4.24 Convergence plot for test function \(f_{17}\); presentation as in Fig. 4.9

Fig. 4.25 Convergence plot for test function \(f_{18}\); presentation as in Fig. 4.9

Fig. 4.26 Convergence plot for test function \(f_{19}\); presentation as in Fig. 4.9

Fig. 4.27 Convergence plot for test function \(f_{20}\); presentation as in Fig. 4.9

Fig. 4.28 Convergence plot for test function \(f_{21}\); presentation as in Fig. 4.9

Fig. 4.29 Convergence plot for test function \(f_{22}\); presentation as in Fig. 4.9

Fig. 4.30 Convergence plot for test function \(f_{23}\); presentation as in Fig. 4.9

Fig. 4.31 Convergence plot for test function \(f_{24}\); presentation as in Fig. 4.9

The very simple (1 + 1)-ES performs surprisingly well, especially on unimodal functions. The simple DR2 strategy also performs reasonably well on multimodal test functions, but not on unimodal ones. Overall, the (1 + 1)-Active-CMA-ES is clearly recommendable due to its consistent performance across all functions tested.