
1 Introduction

Hybridization has developed into an effective strategy in algorithm design. Hybrid algorithms can be more efficient and more effective than their native counterparts. This observation holds for many problem classes, in particular in optimization, where hybrids of meta-heuristics and local search are often called hybrid meta-heuristics. In this chapter, we show how Powell’s conjugate gradient search, a fast and powerful black-box optimization strategy for convex problems, can be integrated into an ES [1]. Further, we show how to employ a specialized step size adaptation technique that guides the optimization process and allows the search to escape from the local optima that Powell’s method successively finds.

2 Iterated Local Search

Iterated local search (ILS) is based on a simple but successful idea. Instead of repeatedly restarting local search from new initial solutions, as restart approaches do, ILS begins with a solution \({\mathbf {x}}\) and successively applies local search and perturbation of the locally optimal solution \(\hat{\mathbf {x}}\). This procedure is repeated until a termination condition is fulfilled. Algorithm 1 shows the pseudocode of the ILS approach; a minimal implementation sketch is given below. Initial solutions should use as much information as possible to provide a good starting point for local search. Most local search operators are deterministic, so the perturbation mechanism should introduce non-deterministic components to explore the solution space. In effect, the perturbation mechanism performs a global random search in the space of local optima that are approximated by the local search method. Blum and Roli [2] point out that the balance of the perturbation mechanism is quite important: perturbation must be strong enough to allow escape from basins of attraction, but weak enough to exploit knowledge from previous iterations; otherwise, ILS degenerates to a simple restart strategy. The acceptance criterion in Line 6 may vary from always accepting the new solution to accepting it only in case of improvement; schemes like simulated annealing may also be adopted.

[Algorithm 1: Iterated local search (ILS)]
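Since Algorithm 1 appears here only as pseudocode, the following minimal Python sketch illustrates the ILS loop. The function names (local_search, perturb, accept) and the two example operators are illustrative placeholders, not part of the original algorithm.

```python
import random

def iterated_local_search(f, x0, local_search, perturb, accept, max_iters=100):
    # Start with an initial solution and optimize it locally.
    x_best = local_search(f, x0)
    for _ in range(max_iters):
        # Perturb the current local optimum and apply local search again.
        x_cand = local_search(f, perturb(x_best))
        # Acceptance criterion: keep the candidate or stay with the incumbent.
        if accept(f, x_cand, x_best):
            x_best = x_cand
    return x_best

# Illustrative choices: accept only improvements, perturb with Gaussian noise.
def accept_if_better(f, x_new, x_old):
    return f(x_new) <= f(x_old)

def gaussian_perturbation(x, sigma=1.0):
    return [xi + random.gauss(0.0, sigma) for xi in x]
```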

There are many examples in the literature of successful applications of ILS variants to combinatorial optimization problems. A survey of ILS techniques has been presented by Lourenco et al. [3], who also provide a comprehensive introduction [4] to ILS. A famous combinatorial problem for which many ILS methods have been developed is the traveling salesperson problem (TSP). Stützle and Hoos [5] introduced an approach that combines restarts with a specific acceptance criterion to maintain diversity for the TSP, while Katayama and Narihisa [6] use a perturbation mechanism that combines the 4-opt heuristic with a greedy method. Stützle [7] uses an ILS hybrid to solve the quadratic assignment problem; the technique is enhanced by acceptance criteria that allow moves to worse local optima, and population-based extensions are introduced. Duarte et al. [8] introduce an ILS heuristic, based on greedy search, for assigning referees to scheduled games in sports; our perturbation mechanism is related to their approach. Preliminary work on adapting the perturbation mechanism has been done by Mladenovic et al. [9] for variable neighborhood search and by Glover et al. [10] for tabu search.

3 Powell’s Conjugate Gradient Method

The hybrid ILS variant introduced in this chapter is based on Powell’s optimization method. Preliminary experiments revealed the efficiency of Powell’s method in comparison to continuous evolutionary search methods. However, in the experimental section, we will observe that Powell’s method can get stuck in local optima in multimodal solution spaces. An idea similar to the hybridization of local search has been presented by Griewank [11], who combines a gradient descent method with a deterministic perturbation term.

Powell’s method belongs to the class of direct search methods, i.e., no first- or second-order derivatives are required. It is based on conjugate directions and is similar to line search. The idea of line search is to start from a search point \({\mathbf {x}} \in {\mathbb {R}}^N\) and move along a direction \({\mathbf {d}} \in {\mathbb {R}}^N\) such that \(f({\mathbf {x}} + \lambda _t {\mathbf {d}})\) is minimized for some \(\lambda _t \in {\mathbb {R}}^+\). Powell’s method [12, 13] adapts the directions according to gradient-like information from the search.

[Algorithm 2: Conjugate gradient method with line search]

It is based on the assumption of a quadratic convex objective function \(f({\mathbf {x}})\)

$$\begin{aligned} f({\mathbf {x}}) = \frac{1}{2}{\mathbf {x}}^T {\mathbf {H}} {\mathbf {x}} + {\mathbf {b}}^T{\mathbf {x}} + c, \end{aligned}$$
(5.1)

with Hessian matrix \({\mathbf {H}}\). Two directions \({\mathbf {d}}_i, {\mathbf {d}}_j \in {\mathbb {R}}^N\), \(i \ne j\), are mutually conjugate if

$$\begin{aligned} {\mathbf {d}}_i^T {\mathbf {H}} {\mathbf {d}}_j =0 \end{aligned}$$
(5.2)

holds. A set of \(N\) mutually conjugate directions constitutes a basis of the solution space \({\mathbb {R}}^N\). Let \({\mathbf {x}}_0\) be the initial guess of a minimum of function \(f\!\). In iteration \(t\), we require an estimate of the gradient \({\mathbf {g}}_t = {\mathbf {g}}({\mathbf {x}}_t)\). For \(t=1\), let \({\mathbf {d}}_t=-{\mathbf {g}}_t\) be the steepest descent direction. For \(t>1\), Powell applies the update

$$\begin{aligned} {\mathbf {d}}_t = - {\mathbf {g}}_t + \beta _t {\mathbf {d}}_{t-1}, \end{aligned}$$
(5.3)

with the Euclidean vector norms

$$\begin{aligned} \beta _t = \frac{\Vert {\mathbf {g}}_t \Vert ^2 }{\Vert {\mathbf {g}}_{t-1} \Vert ^2 }. \end{aligned}$$
(5.4)

The main idea of the conjugate direction method is to search for the minimal value of \(f({\mathbf {x}})\) along direction \({\mathbf {d}}_t\) to obtain the next solution \({\mathbf {x}}_{t+1}\), i.e., to find the \(\lambda _t\) that minimizes

$$\begin{aligned} f({\mathbf {x}}_t + \lambda _t {\mathbf {d}}_t). \end{aligned}$$
(5.5)

For a minimizing \(\lambda _t\), set

$$\begin{aligned} {\mathbf {x}}_{t+1} = {\mathbf {x}}_t + \lambda _t {\mathbf {d}}_t. \end{aligned}$$
(5.6)

Algorithm 2 shows the pseudocode of the conjugate gradient method that is the basis of Powell’s strategy. In our implementation, \(\lambda _t\) is determined by line search. For a more detailed introduction, we refer to the depictions of Powell [12] and Schwefel [14].
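As a complement to Algorithm 2, the following Python sketch implements the conjugate direction update of Eqs. (5.3)–(5.6) under simplifying assumptions: the gradient \({\mathbf {g}}_t\) is estimated by finite differences and the line search uses SciPy’s scalar minimizer. The implementation used in this chapter may differ in these details.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def numerical_gradient(f, x, eps=1e-6):
    # Finite-difference estimate of the gradient g_t (only an estimate is required).
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return g

def conjugate_gradient(f, x0, max_iter=200, tol=1e-10):
    # Conjugate directions with the beta of Eq. (5.4), cf. Algorithm 2.
    x = np.asarray(x0, dtype=float)
    g = numerical_gradient(f, x)
    d = -g                                           # steepest descent direction for t = 1
    for _ in range(max_iter):
        # Line search: find lambda_t minimizing f(x_t + lambda_t d_t), Eq. (5.5).
        lam = minimize_scalar(lambda a: f(x + a * d)).x
        x_new = x + lam * d                          # Eq. (5.6)
        if abs(f(x_new) - f(x)) < tol:               # stop on negligible improvement
            return x_new
        g_new = numerical_gradient(f, x_new)
        beta = np.dot(g_new, g_new) / np.dot(g, g)   # Eq. (5.4)
        d = -g_new + beta * d                        # Eq. (5.3)
        x, g = x_new, g_new
    return x
```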

First, we analyze Powell’s method on the optimization test suite (cf. Appendix A). Solutions are randomly initialized in the interval \([-100,100]^N\). Each experiment is repeated \(30\) times. Powell’s method terminates if the improvement from one iteration to the next is smaller than \(\phi = 10^{-10}\), or if the optimum is found with accuracy \(f_\mathrm{stop }= 10^{-10}\). As Powell’s method is a convex optimization technique, we expect that only the unimodal problems can be solved. Table 5.1 confirms these expectations. On unimodal functions, Powell’s method is exceedingly fast. On the Sphere problem with \(N=10\), an average budget of only \(101.7\) fitness function evaluations is sufficient to approximate the optimum. These fast approximation capabilities can also be observed on the problems Doublesum and Rosenbrock, also for higher dimensions, i.e., \(N=30\).
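A run of this kind can be reproduced in spirit with SciPy’s derivative-free Powell routine as a stand-in for the implementation used here; the exact evaluation counts will differ, and the tolerances below merely play the role of the improvement threshold \(\phi = 10^{-10}\).

```python
import numpy as np
from scipy.optimize import minimize

def sphere(x):
    # Sphere test function: f(x) = sum of squared components.
    return float(np.dot(x, x))

rng = np.random.default_rng(0)
x0 = rng.uniform(-100.0, 100.0, size=10)   # random initialization in [-100, 100]^N

res = minimize(sphere, x0, method="Powell",
               options={"ftol": 1e-10, "xtol": 1e-10, "maxfev": 200_000})
print(res.fun, res.nfev)                   # final fitness and number of evaluations
```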

Table 5.1 Experimental comparison of Powell’s method on the test problems with \(N=10\) and \(N=30\) dimensions

The results also show that Powell’s method is not able to approximate the optima of the multimodal function Rastrigin. On the easier multimodal function Griewank, the random initializations allow the optimum to be found in some of the \(30\) runs. The fast convergence behavior on convex parts of a function motivates the use of local search as an operator within a global evolutionary optimization framework. This is the basis of the Powell ES that we analyze in the following.

4 Powell Evolution Strategy

The Powell ES [1] presented in this section is based on four key concepts, each addressing a typical problem in real-valued solution spaces. Powell’s method is a fast direct optimization method, particularly appropriate for unimodal fitness landscapes. It is integrated into the optimization process via ILS, in order to prevent Powell’s method from getting stuck in local optima. The ILS approach is based on the successive repetition of Powell’s conjugate gradient method as local search technique and a perturbation mechanism. A population of candidate solutions is employed for exploration, similar to evolution strategies. The strength of the ILS perturbation is controlled by means of an adaptive control mechanism: in case of stagnation, the mutation strength is increased in order to leave local optima, and decreased otherwise.

Algorithm 3 shows the pseudocode of the Powell ES. At the beginning, \(\mu \) solutions \({\mathbf {x}}_1, \ldots , {\mathbf {x}}_{\mu } \in {\mathbb {R}}^N\) are randomly initialized and optimized with Powell’s method. In an iterative loop, \(\lambda \) offspring solutions \({\mathbf {x}}_1', \ldots , {\mathbf {x}}_{\lambda }'\) are produced by means of Gaussian mutations with the global mutation strength \(\sigma \) via

$$\begin{aligned} {\mathbf {x}}_j'={\mathbf {x}}_j+{\mathbf {z}}_j, \end{aligned}$$
(5.7)

with

$$\begin{aligned} {\mathbf {z}}_j \sim (\sigma _1{\mathcal {N}}(0,1),\ldots ,\sigma _{N}{\mathcal {N}}(0,1))^T. \end{aligned}$$
(5.8)

Afterwards, each solution \({\mathbf {x}}_j'\) is locally optimized with Powell’s method, leading to \({\hat{\mathbf{x}}}_{j}'\) for \(j=1,\ldots ,\lambda \). After \(\lambda \) solutions have been produced this way, the \(\mu \) best are selected according to their fitness with comma selection. Then, we apply global recombination, i.e., the arithmetic mean \(\langle {\hat{\mathbf{x}}} \rangle _t\) at generation \(t\) of all selected solutions \(\hat{\mathbf {x}}_1, \ldots , \hat{\mathbf {x}}_{\mu }\) is computed. The fitness of this arithmetic mean is evaluated and compared to the fitness of the arithmetic mean of the previous generation \(t-1\). If the search stagnates, i.e., if the condition

$$\begin{aligned} |f(\langle {\hat{\mathbf{x}}} \rangle _t) - f(\langle {\hat{\mathbf{x}}} \rangle _{t-1})| < \theta \end{aligned}$$
(5.9)

becomes true, the mutation strength is increased via

$$\begin{aligned} \sigma = \sigma \cdot \tau \end{aligned}$$
(5.10)

with \(\tau >1\). Otherwise, the mutation strength \(\sigma \) is decreased by multiplication with \(1/\tau \).

[Algorithm 3: Powell ES]
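Because Algorithm 3 is only given as pseudocode, the sketch below shows one possible Python rendering of the outer loop. SciPy’s Powell routine stands in for the local search, a single global \(\sigma \) replaces the per-coordinate \(\sigma _i\) of Eq. (5.8), and the default value of \(\theta \) is an assumption for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

def powell_es(f, n, mu=2, lam=8, sigma=1.0, tau=2.0, theta=1e-10,
              max_generations=200, seed=0):
    # Illustrative sketch of the Powell ES outer loop (Algorithm 3).
    rng = np.random.default_rng(seed)

    def local(x):
        # SciPy's Powell routine as a stand-in for the local search step.
        return minimize(f, x, method="Powell",
                        options={"ftol": 1e-10, "xtol": 1e-10}).x

    # Randomly initialize mu parents in [-100, 100]^n and optimize them locally.
    parents = [local(rng.uniform(-100.0, 100.0, n)) for _ in range(mu)]
    best = min(parents, key=f)
    mean_fit_prev = f(np.mean(parents, axis=0))

    for _ in range(max_generations):
        # lambda offspring: Gaussian mutation (Eqs. 5.7/5.8, one global sigma)
        # followed by Powell local search.
        offspring = [local(parents[j % mu] + sigma * rng.standard_normal(n))
                     for j in range(lam)]
        # Comma selection of the mu best offspring.
        offspring.sort(key=f)
        parents = offspring[:mu]
        best = min(best, parents[0], key=f)   # archive the best-so-far solution
        # Global recombination and stagnation test (Eq. 5.9).
        mean_fit = f(np.mean(parents, axis=0))
        if abs(mean_fit - mean_fit_prev) < theta:
            sigma *= tau                      # stagnation: increase sigma (Eq. 5.10)
        else:
            sigma /= tau                      # progress: decrease sigma
        mean_fit_prev = mean_fit
    return best
```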

An increasing mutation strength \(\sigma \) allows the search to leave local optima. Powell’s method drives the search into local optima, and the outer ILS performs a search in the space of local optima by controlling the perturbation strength \(\sigma \). A decrease of the step size \(\sigma \) lets the algorithm converge to the local optimum within a range defined by \(\sigma \). This technique seems to stand in contrast to the \(1/5\)th success rule by Rechenberg [15]. Running a simple (\(1+1\))-ES with isotropic Gaussian mutations and a constant mutation step size \(\sigma \), the optimization process becomes very slow after a few generations. Rechenberg’s rule adapts the mutation strength in the opposite way: if the ratio \(g/G\) of successful generations \(g\) among the last \(G\) generations is larger than \(1/5\), the step size should be increased. The increase is reasonable because bigger steps towards the optimum are possible, while small steps would be a waste of time. If the success ratio is less than \(1/5\), the step size should be decreased. This rule is applied every \(G\) generations. The goal of Rechenberg’s approach is to stay in the evolution window that guarantees nearly optimal progress. Optimal progress is problem-dependent and can be derived theoretically for artificial functions [16].
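For contrast with the adaptation rule of the Powell ES, here is a minimal sketch of Rechenberg’s \(1/5\)th success rule; the adjustment factor 1.22 is a common textbook choice and not taken from this chapter.

```python
def rechenberg_one_fifth(sigma, successes, G, factor=1.22):
    # Increase sigma if more than one fifth of the last G generations were
    # successful, decrease it if fewer were, keep it otherwise.
    ratio = successes / G
    if ratio > 0.2:
        return sigma * factor
    if ratio < 0.2:
        return sigma / factor
    return sigma
```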

However, in our approach it is Powell’s method that approximates local optima, not the evolution strategy. The step size control of the Powell ES has another task: leaving local optima when the search stagnates. Basins of attraction can be left because of the increasing step size; hence, the probability of finding the global optimum is larger than \(0\). With this mechanism, the global optimum may also be left again. But once the vicinity of the optimum has been reached, it is probable that it will be reached again. The risk of leaving the global optimum unnoticed can be compensated by archiving the best solution found in the course of the optimization process.

5 Experimental Analysis

In the following, we experimentally analyze the Powell ES on a set of test problems, cf. Appendix A. Again, initial solutions are generated in the interval \([-100,100]^N\), and the initial step size is set to \(\sigma _\mathrm{init }=1.0\). Each experiment is repeated \(30\) times. For the Powell ES, we employ the settings \(\lambda =8\) and \(\mu =2\). Each solution is mutated and locally optimized with Powell’s method. Again, Powell’s method terminates if the improvement from one iteration to the next is smaller than \(\phi = 10^{-10}\), or if the optimum is found with accuracy \(f_\mathrm{stop }= 10^{-10}\). If the search on the ILS level stagnates, i.e., if the achieved improvement is smaller than \(\theta \), the mutation strength is increased with mutation parameter \(\tau = 2.0\). We allow a maximal budget of ffe\(_{\max }= 2.0 \times 10^{6}\) fitness function evaluations.
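Mapped onto the illustrative powell_es sketch shown after Algorithm 3, these settings would read roughly as follows; the sketch uses a generation limit instead of an explicit evaluation budget, and the \(\theta \) value is an assumption.

```python
# Settings from this section applied to the illustrative powell_es sketch
# (mu = 2, lambda = 8, sigma_init = 1.0, tau = 2.0); theta is an assumed value.
best = powell_es(sphere, n=10, mu=2, lam=8, sigma=1.0, tau=2.0, theta=1e-10)
```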

Table 5.2 shows the results of the analysis of the Powell ES on the test problems with \(N=10\) and \(N=30\) dimensions. The previous results have shown that Powell’s method is very fast on unimodal problems. Of course, the Powell ES shows the same capabilities and approximates the optimum in the first Powell run on the Sphere and Doublesum problems. We have already observed that Powell’s method gets stuck in local optima of multimodal problems (e.g., Rastrigin). The Powell ES perturbs a solution when it gets stuck and applies Powell’s method again, using the perturbation mechanism of Eq. (5.10). The results show that the iterated application of Powell’s method in each generation allows the global optimum to be approximated, also on Rastrigin. In contrast to its counterpart without ILS, the Powell ES is able to approximate the optimum.

Table 5.2 Experimental analysis of the Powell ES on the test problems with \(N=10\) and \(N=30\) dimensions

It converges significantly faster than the CMSA-ES (cf. Chap. 2). A statistically significant superiority of the Powell ES can also be observed on Griewank. On Rosenbrock, no superiority of either algorithm can be reported. Although the worst runs of the Powell ES deteriorate the mean fitness, the best runs are still much faster than the best runs of the CMSA-ES. The CMSA-ES is more robust, with smaller standard deviations, but does not offer the potential to find the optimal solution as fast. A similar behavior can be observed on the test problems with \(N=30\) dimensions, see the lower part of Table 5.2. The CMSA-ES takes about 17 times more evaluations; this also holds for the other unimodal test problems, where the Powell ES is superior. On the multimodal test problems in higher dimensions, similar results as for \(N=10\) can be observed. The Powell ES is statistically better on Rastrigin, while the CMSA-ES’s mean and median are better on Rosenbrock and Griewank. On Kursawe, the optimum has been found in every run for \(N=10\), but in no run for \(N=30\).

Fig. 5.1 Development of fitness and step sizes on the multimodal problem Kursawe with \(N=10\). When the search gets stuck in local optima, the perturbation mechanism increases \(\sigma \) and lets the Powell ES escape from basins of attraction [1]

Figure 5.1 shows fitness curves and step sizes of typical runs on the multimodal problem Kursawe with \(N=10\). It can be observed that the search repeatedly gets stuck, but the perturbation mechanism always allows it to leave the local optima again. When the search gets stuck in a local optimum, the strategy increases \(\sigma \) until the local optimum is successfully left and a better local optimum is found. The approach moves from one local optimum to another by controlling \(\sigma \), until the global optimum is found. The fitness development reveals that the search has to accept worse solutions to approximate the optimal solution. The figures confirm the basic idea of the algorithm: ILS controls the global search, while Powell’s method drives the search into local optima. Frequently, the hybrid is only able to leave local optima by controlling the strength \(\sigma \) of the Gaussian perturbation mechanism. In effect, ILS conducts a search in the space of local optima.

6 Perturbation Mechanism and Population Sizes

For deeper insights into the perturbation mechanism and its interaction with the population sizes, we conduct further experiments on the multimodal problem Rastrigin with \(N=30\), where the Powell ES has shown successful results. The strength of the perturbation mechanism plays an essential role for the ILS. In case of stagnation, the step size is increased as described in Eq. (5.10) with \(\tau >1\) to let the search escape from local optima. Frequently, repeated increases of the perturbation strength are necessary to overcome stagnation. In case of an improvement, the step size is decreased by multiplication with \(1/\tau \). The idea of the step size reduction is to prevent the search process from jumping over promising regions of the solution space. In the following, we analyze the perturbation mechanism and the population sizes on Rastrigin, and try to determine useful settings for \(\tau \) and for the population parameters \(\mu \) and \(\lambda \).

Table 5.3 Analysis of the Powell ES perturbation parameter \(\tau \) and the population sizes on Rastrigin with \(N= 30\), using the same initial settings, performance measure, and termination condition as in the previous experiments

Table 5.3 shows the corresponding results. The best result has been achieved with \(\tau = 2.0\) and population sizes \((1,4)\). The best median has also been achieved with this setting, while the second best has been achieved with \(\tau = 1.5\) and population sizes \((1,4)\). With parameter setting \(\tau = 10.0\), the Powell ES achieves a satisfying best solution, but the variance of the results is high and the worst solution is comparably bad. In general, the results for \(\tau = 10.0\) are quite weak; for \((1,4)\), the algorithm does not converge within reasonable time. For low mutation strengths, the best results can be observed for small population sizes. In turn, for higher mutation strengths, i.e., \(\tau = 5.0\), larger population sizes are necessary to compensate for the explorative effect. Further experiments on other problems led to the decision that a \((2,8)\)-Powell ES is a good compromise between exploration and efficiency, while a \((4,16)\)-Powell ES is a rather conservative but stable choice with reliable results.

7 Conclusions

Combining the world of local search with the world of global evolutionary optimization is a promising undertaking, and it reflects the original idea of evolutionary computation. If we do not know anything about the problem, evolutionary algorithms are an appropriate choice. In multimodal fitness landscapes, we typically know nothing about the landscape of local optima. The Powell ES only assumes that attractive local optima lie close together. Hence, the search may jump from one basin of attraction to a neighboring one. For moving into local optima, Powell’s method turns out to be fairly successful. Furthermore, the adaptation of the perturbation strength is a natural enhancement in real-valued solution spaces. A population-based implementation makes it possible to run multiple Powell searches in parallel and to achieve a considerable speedup in distributed computing environments.