Abstract
Evolutionary algorithms (EAs) have given rise to many parallel variants, fuelled by the rapidly increasing number of CPU cores and the ready availability of computation power through GPUs and cloud computing. A very popular approach is to parallelize evolution in island models, or coarse-grained EAs, by evolving different populations on different processors. These populations run independently most of the time, but they periodically communicate genetic information to coordinate search. Many applications have shown that island models can speed up computation significantly, and that parallel populations can further increase solution diversity.

The aim of this book chapter is to give a gentle introduction to the design and analysis of parallel evolutionary algorithms, in order to understand how parallel EAs work, and to explain when and how speedups over sequential EAs can be obtained. Understanding how parallel EAs work is a challenging goal as they represent interacting stochastic processes, whose dynamics are determined by several parameters and design choices. This chapter uses a theory-guided perspective to explain how key parameters affect performance, based on recent advances in the theory of parallel EAs. The presented results give insight into the fundamental working principles of parallel EAs, assess the impact of parameters and design choices on performance, and contribute to an informed design of effective parallel EAs.
Recent years have witnessed the emergence of a huge number of parallel computer architectures. Almost every desktop or notebook PC, and even mobile phones, come with several CPU cores built in. GPUs, too, have been discovered as a source of massive computation power at no extra cost. Commercial IT solutions often use clusters with hundreds or thousands of CPU cores, and cloud computing has become an affordable and convenient way of gaining CPU power on demand.
With these resources readily available, it has become more important than ever to design algorithms that can be implemented effectively on a parallel architecture. Evolutionary algorithms (EAs) are popular general-purpose metaheuristics inspired by the natural evolution of species. By using operators like mutation, recombination, and selection, a multiset of solutions – the population – is evolved over time. The hope is that this artificial evolution will explore vast regions of the search space and yet use the principle of survival of the fittest to generate good solutions for the problem at hand. Countless applications as well as theoretical results have demonstrated that these algorithms are effective on many hard optimization problems.

One of many advantages of EAs is that they are easy to parallelize. The process of artificial evolution can be implemented on parallel hardware in various ways. It is possible to parallelize specific operations, or to parallelize the evolutionary process itself. The latter approach has led to a variety of search algorithms called island models or cellular evolutionary algorithms. They differ from a sequential implementation in that evolution happens in a spatially structured network. Subpopulations evolve on different processors and good solutions are communicated between processors. The spread of information can be tuned easily via key parameters of the algorithm. A slow spread of information can lead to a larger diversity in the system, hence increasing exploration.

Many applications have shown that parallel EAs can speed up computation and find better solutions, compared to a sequential EA. This book chapter reviews the most common forms of parallel EAs. We highlight what distinguishes parallel EAs from sequential EAs, and we make an effort to understand the search dynamics of parallel EAs. This addresses a very hot topic since, as of today, even the impact of the most basic parameters of a parallel evolutionary algorithm is not well understood.

The chapter has a particular emphasis on theoretical results. This includes runtime analysis, or computational complexity analysis. The goal is to estimate the expected time until an EA finds a satisfactory solution for a particular problem, or problem class, by rigorous mathematical studies. This area has led to very fruitful results for general EAs in the last decade [1, 2]. Only recently have researchers turned to investigating parallel evolutionary algorithms from this perspective [3, 4, 5, 6, 7]. The results help to get insight into the search behavior of parallel EAs and how parameters and design choices affect performance. The presentation of these results is kept informal in order to make it accessible to a broad audience. Instead of presenting theorems and complete formal proofs, we focus on key ideas and insights that can be drawn from these results.

The outline of this chapter is as follows. In Sect. 46.1 we first introduce parallel models of evolutionary algorithms, along with a discussion of key design choices and parameters. Section 46.2 considers performance measures for parallel EAs, particularly notions of speedup of a parallel EA when compared to sequential EAs. Section 46.3 deals with the spread of information in parallel EAs. We review various models used to describe how the number of good solutions increases in a parallel EA. This also gives insight into the time until the whole system is taken over by good solutions, the so-called takeover time. In Sect. 46.4 we present selected examples where parallel EAs were shown to outperform sequential evolutionary algorithms. Drastic speedups were shown on illustrative example functions. This holds for various forms of parallelization, from independent runs to offspring populations and island models. Section 46.5 finally reviews a general method for estimating the expected running time of parallel EAs. This method can be used to transfer bounds for a sequential EA to a corresponding parallel EA in an automated fashion. We go into a bit more detail here, in order to enable the reader to apply this method by her-/himself. Illustrative example applications are given that also include problems from combinatorial optimization. The chapter finishes with conclusions in Sect. 46.6 and pointers to further literature on parallel evolutionary algorithms.
1 Parallel Models
1.1 Master–Slave Models
There are many ways to use parallel machines. A simple way of using parallelization is to execute operations on separate processors. This can concern variation operators like mutation and recombination as well as function evaluations. In fact, it makes most sense for function evaluations, as these operations can be performed independently and are often among the most expensive operations. This kind of architecture is known as a master–slave model. One machine, the master, distributes the workload for executing operations to several other machines called slaves. It is well suited for the creation of offspring populations, as offspring can be created and evaluated independently after suitable parents have been selected.
The system is typically synchronized: the master waits until all slaves have completed their operations before moving on. However, it is possible to use asynchronous systems where the master does not wait for slaves that take too long.
The behavior of synchronized master–slave models is not different from their sequential counterparts. The implementation is different, but the algorithm – and therefore search behavior – is the same.
1.2 Independent Runs
Parallel machines can also be used to simulate different, independent runs of the same algorithm in parallel. Such a system is very easy to set up as no communication during the runtime is required. Only after all runs have been stopped, do the results need to be collected and the best solution (or a selection of different high-quality solutions) is output.
Alternatively, all machines can periodically communicate their current best solutions so that the system can be stopped as soon as a satisfactory solution has been found. As for master–slave models, this prevents us from having to wait until the longest run has finished.
Despite its simplicity, the strategy of independent runs can be quite effective. Consider a setting where a single run of an algorithm has a particular success probability, i. e., a probability of finding a satisfactory solution within a given time frame. Let this probability be denoted by p. By using several independent runs, this success probability can be increased significantly. This approach is commonly known as probability amplification.
The probability that in λ independent runs no run is successful is (1 − p)^λ. The probability that there is at least one successful run among these is, therefore, 1 − (1 − p)^λ.
Figure 46.1 illustrates this amplified success probability for various choices of λ and p.
We can see that for a small number of processors the success probability increases almost linearly. If the number of processors is large, a saturation effect occurs. The benefit of using ever more processors decreases with the number of processors used. The point where saturation happens depends crucially on p; for smaller success probabilities saturation happens only with a fairly large number of processors.
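The amplified success probability can be computed directly; a minimal sketch (the function name and the chosen values of p and λ are mine, for illustration only):

```python
# Amplified success probability of lam independent runs,
# each succeeding with probability p: 1 - (1 - p)^lam.
def amplified_success_probability(p: float, lam: int) -> float:
    return 1.0 - (1.0 - p) ** lam

# Saturation effect: doubling the number of runs helps less and less.
for lam in (1, 2, 4, 8, 16, 32):
    print(lam, round(amplified_success_probability(0.1, lam), 3))
```

For small λ the probability grows almost linearly in λ (roughly λ · p for λ · p ≪ 1); for large λ it saturates towards 1, matching the behavior shown in Fig. 46.1.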
Furthermore, independent runs can be set up with different initial conditions or different parameters. This is useful to effectively explore the parameter space and to find good parameter settings in a short time.
1.3 Island Models
Independent runs suffer from obvious drawbacks: once a run reaches a situation where its population has become stuck in a difficult local optimum, it will most likely remain stuck forever. This is unfortunate since other runs might reach more promising regions of the search space at the same time. It makes more sense to establish some form of communication between the different runs to coordinate search, so that runs that have reached low-quality solutions can join in on the search in more promising regions.
In island models, also called distributed EAs, coarse-grained models, or multi-deme models, the population of each run is regarded as an island. One often speaks of islands as subpopulations that together form the population of the whole island model. For most of the time, islands evolve independently as in the independent run model. However, periodically solutions are exchanged between islands in a process called migration.

The idea is to have a migration topology, a directed graph with islands as its nodes and directed edges connecting two islands. At certain points of time selected individuals from each island are sent off to neighbored islands, i. e., islands that can be reached by a directed edge in the topology. These individuals are called migrants, and they are included in the target island after a further selection process. This way, islands can communicate and compete with one another. Islands that get stuck in low-fitness regions of the search space can be taken over by individuals from more successful islands. This helps to coordinate search, focus on the most promising regions of the search space, and use the available resources effectively. An example of an island model is given in Fig. 46.2. Algorithm 46.1 shows the general scheme of a basic island model.
Algorithm 46.1 Scheme of an island model with migration interval τ
1: Initialize a population made up of m subpopulations or islands, P(0) = (P_1(0), …, P_m(0)).
2: Let t := 0.
3: loop
4: for each island i do
5: if t mod τ = 0 then
6: Send selected individuals from island P_i(t) to selected neighbored islands.
7: Receive immigrants I_i(t) from islands for which island i is a neighbor.
8: Replace P_i(t) by a subpopulation resulting from a selection among P_i(t) and I_i(t).
9: end if
10: Produce P_i(t+1) by applying reproduction operators and selection to P_i(t).
11: end for
12: Let t := t + 1.
13: end loop
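The scheme above can be sketched in a few lines of code. The following is a hypothetical toy instantiation, not a definitive implementation: each island holds a single bitstring and runs a (1+1) EA on the illustrative OneMax function (counting ones in a bitstring), with pollination along a unidirectional ring; all names and parameter values are my own.

```python
import random

def onemax(x):
    return sum(x)  # example fitness: number of ones in the bitstring

def mutate(x, rate):
    # standard bit mutation: flip each bit independently with probability rate
    return [b ^ (random.random() < rate) for b in x]

def island_model(m=4, n=20, tau=5, generations=50, seed=0):
    """Toy island model following the scheme of Algorithm 46.1.

    Every tau generations each island sends a copy of its individual to
    its ring neighbor (pollination); the receiving island keeps the
    better of its own individual and the immigrant.
    """
    random.seed(seed)
    islands = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]
    for t in range(generations):
        if t % tau == 0:  # migration along a unidirectional ring
            migrants = [islands[(i - 1) % m] for i in range(m)]
            islands = [max(pair, key=onemax) for pair in zip(islands, migrants)]
        for i in range(m):  # intra-island evolution: (1+1) EA step
            child = mutate(islands[i], 1.0 / n)
            if onemax(child) >= onemax(islands[i]):
                islands[i] = child
    return max(onemax(isl) for isl in islands)

print(island_model())  # best OneMax value found across all islands
```

Setting tau larger slows the spread of information and preserves more diversity; tau = 1 makes every generation a migration step.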
There are many design choices that affect the behavior of such an island model:
-
Emigration policy. When migrants are sent, they can be removed from the sending island. Alternatively, copies of selected individuals can be emigrated; the latter is often called pollination. The selection of migrants is also important: one might select the best, worst, or random individuals.
-
Immigration policy. Immigrants can replace the worst individuals in the target population, random individuals, or be subjected to the same kind of selection used within the islands for parent selection or selection for replacement. Crowding mechanisms can be used, such as replacing the most similar individuals. In addition, immigrants can be recombined with individuals present on the island before selection.
-
Migration interval. The time interval τ between migrations determines the speed at which information is spread throughout an island model. Its reciprocal is often called the migration frequency. Frequent migrations imply a rapid spread of information, while rare migrations allow for more exploration. Note that a migration interval of τ = ∞ yields independent runs as a special case.
-
Number of migrants. The number of migrants, also called migration size , is another parameter that determines how quickly an island can be taken over by immigrants.
-
Migration topology. The choice of the migration topology also impacts search behavior. The topology can be a directed or undirected graph – after all, undirected graphs can be seen as special cases of directed graphs. Common topologies include unidirectional rings (a ring with directed edges only in one direction), bidirectional rings, torus or grid graphs, hypercubes, scale-free graphs [8], random graphs [9], and complete graphs. Figure 46.3 sketches some of these topologies. An important characteristic of a topology is its diameter: the maximum number of edges on any shortest path between two vertices. Formally, diam(G) = max_{u,v ∈ V} dist(u, v), where dist(u, v) is the graph distance, the number of edges on a shortest path from u to v. The diameter gives a good indication of the time needed to propagate information throughout the topology. Rings and torus graphs have large diameters, while hypercubes, complete graphs, and many scale-free graphs have small diameters.
Island models with non-complete topologies are also called stepping stone models . The impact of these design choices will be discussed in more detail in Sect. 46.3.
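Since the diameter indicates how long information needs to spread, it is instructive to compute it for some common topologies. A small sketch using breadth-first search (function names are mine):

```python
from collections import deque

def diameter(adj):
    """Graph diameter: the maximum BFS eccentricity over all vertices."""
    def ecc(s):
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        return max(dist.values())
    return max(ecc(v) for v in adj)

def ring(m):        # bidirectional ring on m vertices
    return {i: [(i - 1) % m, (i + 1) % m] for i in range(m)}

def complete(m):    # complete graph on m vertices
    return {i: [j for j in range(m) if j != i] for i in range(m)}

def hypercube(d):   # 2^d vertices; neighbors differ in exactly one bit
    return {v: [v ^ (1 << k) for k in range(d)] for v in range(1 << d)}

print(diameter(ring(16)), diameter(hypercube(4)), diameter(complete(16)))
# 8 4 1 — the ring has a large diameter, the complete graph the smallest
```

On 16 islands the bidirectional ring needs up to 8 hops to propagate information, the 4-dimensional hypercube only 4, and the complete graph a single hop.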
If all islands run the same algorithm under identical conditions, we speak of a homogeneous island model. Heterogeneous island models contain islands with different characteristics: different algorithms might be used, or different representations, objective functions, or parameters. Using heterogeneous islands might be useful if one is not sure what the best algorithm for a particular problem is. It also makes sense in the context of multiobjective optimization, or when a diverse set of solutions is sought, as the islands can reflect different objective functions, or variations of the same objective function, with an emphasis on different objectives.
Skolicki [10] proposed a two-level view of search dynamics in island models. The term intra-island evolution describes the evolutionary process that takes place within each island. On a higher level, inter-island evolution describes the interaction between different islands. He argues that islands can be regarded as individuals in a higher-level evolution: islands compete with one another, and islands can take over other islands, just like individuals can replace other individuals in a regular population. One conclusion is that, from this perspective, an island model looks more like a compact population, with islands playing the role of individuals.
The two levels of evolution obviously interact with one another. Which level is more important is determined by the migration interval and the other parameters of the system that affect the spread of information.
1.4 Cellular EAs
Cellular EAs represent a special case of island models with a more fine-grained form of parallelization. As in the island model, we have islands connected by a fixed topology; rings and two-dimensional torus graphs are the most common choices. The most striking characteristic is that each island only contains a single individual. Islands are often called cells in this context, which explains the term cellular EA. Each individual is only allowed to mate with its neighbors in the topology. This kind of interaction happens in every generation, which corresponds to a migration interval of 1 in the context of island models. Figure 46.4 shows a sketch of a cellular EA. A scheme of a cellular EA is given in Algorithm 46.2.

Algorithm 46.2 Scheme of a cellular EA
1: Initialize all cells to form a population P(0) = (P_1(0), …, P_m(0)). Let t := 0.
2: loop
3: for each cell i do
4: Select a set S_i of individuals out of all cells neighbored to cell i.
5: Create a set R_i by applying reproduction operators to S_i.
6: Create P_i(t+1) by selecting an individual from R_i.
7: end for
8: Let t := t + 1.
9: end loop
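A hypothetical toy instantiation of this scheme, assuming a ring topology, a single bitstring per cell, and the illustrative OneMax fitness (counting ones); all names and parameter values are my own:

```python
import random

def onemax(x):
    return sum(x)  # example fitness: number of ones in the bitstring

def synchronous_cellular_ea(m=16, n=10, generations=30, seed=0):
    """Toy synchronous cellular EA on a ring (scheme of Algorithm 46.2).

    Each cell holds one bitstring. In every generation a cell selects
    the best individual among itself and its two ring neighbors (S_i),
    mutates it (R_i), and keeps the better of parent and offspring.
    All cells read from the current population and write into a
    temporary one, which then replaces it (synchronous update).
    """
    random.seed(seed)
    cells = [[random.randint(0, 1) for _ in range(n)] for _ in range(m)]
    for _ in range(generations):
        nxt = []
        for i in range(m):
            neigh = [cells[(i - 1) % m], cells[i], cells[(i + 1) % m]]
            parent = max(neigh, key=onemax)
            child = [b ^ (random.random() < 1.0 / n) for b in parent]
            nxt.append(max(parent, child, key=onemax))
        cells = nxt  # replace current population by the temporary one
    return max(onemax(c) for c in cells)

print(synchronous_cellular_ea())  # best OneMax value in the grid
```

Replacing the temporary population by in-place sequential updates would turn this into an asynchronous cellular EA, as discussed below.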
Cellular EAs yield a much more fine-grained system; they have therefore been called fine-grained models, neighborhood models, or diffusion models. The difference to island models is that no evolution takes place on the cell itself, i. e., there is no intra-island evolution. Improvements can only be obtained by cells interacting with one another. It is, however, possible that an island can interact with itself.

In terms of the two-level view on island models, in cellular EAs the intra-island dynamics have effectively been removed; after all, each island only contains a single individual. Fine-grained models are well suited for investigations of inter-island dynamics. In fact, the first runtime analyses considered fine-grained island models, where each island contains a single individual [4, 5]. Other studies dealt with fine-grained systems that use a migration interval larger than 1 [3, 6, 7].

For replacing individuals, the same strategies as listed for island models can be used. All cells can be updated synchronously, in which case we speak of a synchronous cellular EA. A common way of implementing this is to create a new, temporary population: all parents are taken from the current population, and new individuals are written into the temporary population. At the end of the process, the current population is replaced by the temporary population.

Alternatively, cells can be updated sequentially, resulting in an asynchronous cellular EA. This is likely to result in a different search behavior, as individuals can mate with offspring of their neighbors. Alba et al. [11] define the following update strategies. The terms are tailored towards two-dimensional grid or torus graphs as they are inspired by cellular automata. It is, however, easy to adapt these strategies to arbitrary topologies:
-
Uniform choice: the next cell to be updated is chosen uniformly at random.
-
Fixed line sweep: the cells are updated sequentially, line by line in a grid/torus topology.
-
Fixed random sweep: the cells are updated sequentially, according to some fixed order. This order is determined by a permutation of all cells. This permutation is created uniformly at random during initialization and kept throughout the whole run.
-
New random sweep: this strategy is like fixed random sweep, but after each sweep is completed a new permutation is created uniformly at random.
A time step or generation is defined as the time needed to update m cells, m being the number of cells in the grid. The last three strategies ensure that within each time step each cell is updated exactly once. This yields a much more balanced treatment of all cells. With the uniform choice model, it is likely that some cells must wait for a long time before being updated. In the limit, the waiting time for updates follows a Poisson distribution. Consider the random number of updates until the last cell has been updated at least once. This random process is known as the coupon collector problem [12, page 32], as it resembles the process of collecting coupons that are drawn uniformly at random. A simple analysis shows that the expected number of updates until the last cell has been updated in the uniform choice model (or all coupons have been collected) equals m · H_m = m ln m + Θ(m), where H_m denotes the m-th harmonic number. This is equivalent to ln m + Θ(1) time steps, which can be significantly larger than 1, the time for completing a sweep in any given order.
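The coupon collector argument can be checked empirically with a small simulation of the uniform choice strategy (function and variable names are mine):

```python
import random

def updates_until_all_updated(m, seed=None):
    """Uniform choice: draw cells uniformly at random until every cell
    has been updated at least once (coupon collector process)."""
    rng = random.Random(seed)
    seen, updates = set(), 0
    while len(seen) < m:
        seen.add(rng.randrange(m))
        updates += 1
    return updates

m = 100
harmonic = sum(1.0 / k for k in range(1, m + 1))
expected = m * harmonic  # m * H_m, approximately m ln m + 0.577 m
avg = sum(updates_until_all_updated(m, s) for s in range(200)) / 200
print(round(expected, 1), round(avg, 1))  # both around 519 for m = 100
```

For m = 100 the expectation m · H_m is roughly 519 updates, i. e., about 5.2 time steps, whereas any sweep strategy covers all cells in a single time step.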
Cellular EAs are often compared to cellular automata. In the context of the latter, it is common practice to consider a two-dimensional grid and different neighborhoods. The neighborhood in Fig. 46.2 is called the von Neumann neighborhood or Linear 5. It includes the cell itself and its four neighbors along the directions north, south, west, and east. The Moore neighborhood or Compact 9 in addition also contains the four cells to the north west, north east, south west, and south east. Larger neighborhoods are also common, containing cells that are further away from the center. Note that using a large neighborhood on a two-dimensional grid is equivalent to considering a graph where, starting with a torus graph, edges to nearby vertices have been added for each vertex. We will, therefore, in the remainder of this chapter stick to the common notion of neighbors in a graph (i. e., vertices connected by an edge), unless there is a good reason not to.
1.5 A Unified Hypergraph Model for Population Structures
Sprave [13] proposed a unified model for population structures. It is based on hypergraphs, an extension of graphs where edges can connect more than two vertices. We present an informal definition to focus on the ideas; for formal definitions we refer to [13]. A hypergraph contains a set of vertices and a collection of hyperedges. Each hyperedge is a non-empty set of vertices. Two vertices are neighbored in the hypergraph if there is a hyperedge that contains both vertices. Note that the special case where each hyperedge contains two different vertices results in an undirected graph.
In Sprave’s model each vertex represents an individual. Hyperedges represent the set of possible parents for each individual. The model unifies various common population models:
-
Panmictic populations: for panmictic populations we have a set of vertices V, and there is a single hyperedge that equals the whole vertex set. This reflects the fact that in a panmictic population each individual has all individuals as potential parents.
-
Island models with migration: if migration is understood in the sense that individuals are removed, the set of potential parents for an individual contains all potential immigrants as well as all individuals from its own island, except for those that are being emigrated.
-
Island models with pollination: if pollination is used, the set of potential parents contains all immigrants and all individuals on its own island.
-
Cellular EAs: for each individual, the potential parents are its neighbors in the topology.
In the case of coarse-grained models, the hypergraph may depend on time. More precisely, we have different sets of potential parents when migration is used, compared to generations without migration. Sprave accounts for this by defining a dynamic population structure: instead of considering a single, fixed hypergraph, we consider a sequence of hypergraphs over time.
1.6 Hybrid Models
It is also possible to combine several of the above approaches. For instance, one can imagine an island model where each island runs a cellular EA to further promote diversity. Or one can think of hierarchical island models where islands are island models themselves. In such a system it makes sense for the inner-layer island models to use more frequent migrations than the outer-layer island model. Island models and cellular EAs can also be implemented as master–slave models to achieve a better speedup.

2 Effects of Parallelization
An obvious effect of parallelization is that the computation time can be reduced by using multiple processors. This section describes performance measures that can be used to define this speedup. We also consider beneficial effects of using parallel EAs that can lead to superlinear speedups.

2.1 Performance Measures for Parallel EAs
The computation time of a parallel EA can be defined in various ways. It makes sense to use wall-clock time as the performance measure, as this accounts for the overhead of parallelization. Under certain conditions, it is also possible to use the number of generations or function evaluations. This is feasible if these measures reflect the real running time in an adequate way, for instance if the execution of a generation (or a function evaluation) dominates the computational effort, including the effort for coordinating different machines. It is also feasible if one can estimate the overhead or the communication costs separately.

We consider settings where an EA is run until a certain goal is fulfilled. Goals can be reaching a global or local optimum or reaching a certain minimum fitness. In such a setting the goal is fixed and the running time of the EA can vary. This is in contrast to setups where the running time is fixed to a predetermined number of generations and then the quality or accuracy of the obtained solutions is compared. As Alba pointed out [14], performance comparisons of parallel and sequential EAs only make sense if they reach the same accuracy. In the following, we focus on the former setting where the same goal is used.

Still, defining speedup formally is far from trivial. It is not at all clear against which algorithm a parallel algorithm should be compared. However, this decision is essential to clarify the meaning of speedup. Not clarifying it, or using the wrong comparison, can easily yield misleading results and false claims. We present a taxonomy inspired by Alba [14], restricted to cases where a fixed goal is given:
-
Strong speedup: the parallel run time of a parallel algorithm is compared against the sequential run time of the best known sequential algorithm. It was called absolute speedup by Barr and Hickman [15]. This measure captures to what extent parallelization can improve upon the best known algorithms. However, it is often difficult to determine the best sequential algorithm. Most researchers, therefore, do not use strong speedup [14].
-
Weak speedup: the parallel run time of an algorithm is compared against its own sequential run time. This gives rise to two subcases, in which the notion of its own sequential run time is made precise:
-
Single machine/panmixia: the parallel EA is compared against a canonical, panmictic version of it, running on a single machine. For instance, we might compare an island model with m islands against an EA running a single island. Thereby, the EA run on all islands is the same in both cases.
-
Orthodox: the parallel EA running on m machines is compared against the same parallel EA running on a single machine. This kind of speedup was called relative speedup by Barr and Hickman [15].
In the light of these differences, it is essential for researchers to clarify their notion of speedup.
Having clarified the comparison, we can now define the speedup and other measures. Let T_m denote the time for m machines to reach the goal, and let T_1 denote the time for a single machine, where the algorithm is chosen according to one of the definitions of speedup given above.

The idea is to consider the ratio of T_1 and T_m as speedup. However, as we are dealing with randomized algorithms, T_1 and T_m are random variables, and so their ratio is a random variable as well. It makes more sense to define the speedup as the ratio of expected times for the sequential and the parallel algorithm: s_m = E(T_1) / E(T_m).
Note that T 1 and T m might have very dissimilar probability distributions. Even when both are re-scaled appropriately to obtain the best possible match between the two, they might still have different shapes and different variances. In some cases it might make sense to consider the median or other statistics instead of the expectation.
According to the speedup s_m, we distinguish the following cases:
-
Sublinear speedup: if s_m < m, we speak of a sublinear speedup. This implies that the total computation time across all machines is larger than the total computation time of the single machine (assuming no idle times in the parallel algorithm).
-
Linear speedup: the case s_m = m is known as a linear speedup. Here, the parallel and the sequential algorithm have the same total computation time. This outcome is very desirable, as it means that parallelization does not come at a cost: there is no noticeable overhead in the parallel algorithm.
-
Superlinear speedup: if s_m > m, we have a superlinear speedup. The total computation time of the parallel algorithm is then even smaller than that of the single machine. This case is considered in more detail in the following section.
Speedup is the best known measure, but not the only one used regularly. For the sake of completeness, we mention other measures. The efficiency is a normalization of the speedup: e_m = s_m / m. Obviously, e_m = 1 is equivalent to a linear speedup. Lower efficiencies correspond to sublinear speedups, higher ones to superlinear speedups.
Another measure is called the incremental efficiency, and it measures the speedup when moving from m − 1 processors to m processors: ie_m = ((m − 1) · E(T_{m−1})) / (m · E(T_m)). There is also a generalized form where m − 1 is replaced by n in the above formula. This reflects the speedup when going from n processors to m processors.
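The three measures can be summarized in a short sketch; the running times below are hypothetical values, chosen only to illustrate a perfectly linear speedup:

```python
def speedup(e_t1, e_tm):
    """s_m = E(T_1) / E(T_m): ratio of expected times to reach the goal."""
    return e_t1 / e_tm

def efficiency(e_t1, e_tm, m):
    """e_m = s_m / m; e_m = 1 corresponds to a linear speedup."""
    return speedup(e_t1, e_tm) / m

def incremental_efficiency(e_t_prev, e_tm, m):
    """ie_m = ((m - 1) * E(T_{m-1})) / (m * E(T_m))."""
    return ((m - 1) * e_t_prev) / (m * e_tm)

# hypothetical measured expected running times: 1000 on one machine,
# 125 on eight machines
print(speedup(1000, 125), efficiency(1000, 125, 8))  # 8.0 1.0 (linear)
```

Values of e_m below 1 indicate parallelization overhead; values above 1 indicate a superlinear speedup.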
2.2 Superlinear Speedups
At first glance, superlinear speedups seem astonishing. How can a parallel algorithm have a smaller total computation time than a sequential counterpart? After all, parallelization usually comes with significant overhead that slows down the algorithm. The existence of superlinear speedups has been discussed controversially in the literature. However, there are convincing reasons why a superlinear speedup might occur.
Alba [14] mentions physical sources as one possible reason. A parallel machine might have more resources in terms of memory or caches. When moving from a single machine to a parallel one, the algorithm might – purposely or not – make use of these additional resources. Also, each machine might only have to deal with smaller data packages. It might be that the smaller data fits into the cache while this was not the case for the single machine. This can make a significant performance difference.
When comparing a single panmictic population against smaller subpopulations, it might be easier to deal with the subpopulations. This holds even when the total population sizes of both systems are the same. In particular, a parallel system has an advantage if operations need time which grows faster than linearly with the size of the (sub)population.
We give two illustrative examples. Compare a single panmictic population of size μ with m subpopulations of size μ/m each. Some selection mechanisms, like ranking selection, might have to sort the individuals in the population according to their fitness. In a straightforward implementation one might use well-known sorting algorithms such as (randomized) QuickSort, MergeSort, or HeapSort. All of these are known to take time Θ(n log n) for sorting n elements, on average. Let us disregard the hidden constant and the randomness of randomized QuickSort and assume that the time is precisely n log n.

Now the effort of sorting the panmictic population is μ log μ. The total effort for sorting m populations of size μ/m each is m · (μ/m) log(μ/m) = μ log(μ/m). So, the parallel system executes this operation faster, with a difference of μ log μ − μ log(μ/m) = μ log m time steps in terms of the total computation time.
This effect becomes more pronounced the more expensive the operations are (with respect to the population size). Assume that some selection mechanism or diversity mechanism is used which compares every individual against every other one. Then the effort for the panmictic population is roughly μ² time steps. However, for the parallel EA and its subpopulations the total effort would only be

m · (μ/m)² = μ²/m .

This is faster than the panmictic EA by a factor of m.

The above two growth curves are actually very typical running times for operations that take more than linear time. A table with time bounds for common selection mechanisms can be found in Goldberg and Deb [16]. Figure 46.5 shows plots of the total effort in both scenarios for a fixed population size. One can see that even with a small number of processors the total effort decreases quite significantly. To put this into perspective, most operations require only linear time. Also, the overhead of parallelization was not accounted for. Nevertheless, the discussion gives some hints as to why the execution time for smaller subpopulations can decrease significantly in practice.
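To make the arithmetic above concrete, the following sketch tabulates both idealized cost models, μ log μ for sorting-based selection and μ² for all-pairs comparisons, for a panmictic population and for m equally sized islands. Function names and the population size are our own, chosen for illustration.

```python
import math

def sort_cost(mu: int) -> float:
    # idealized cost mu * log2(mu) of sorting a population of size mu
    return mu * math.log2(mu)

def pairwise_cost(mu: int) -> int:
    # idealized cost mu^2 of comparing every individual with every other
    return mu ** 2

mu = 1024  # total population size (arbitrary example value)
for m in (1, 2, 4, 8):        # number of islands
    sub = mu // m             # subpopulation size mu/m
    total_sort = m * sort_cost(sub)      # = mu*log2(mu) - mu*log2(m)
    total_pair = m * pairwise_cost(sub)  # = mu^2 / m
    print(f"m={m}: sorting {total_sort:.0f}, pairwise {total_pair}")
```

The printed totals shrink with m exactly as derived above: the sorting effort drops by μ log₂ m and the quadratic effort by a factor of m.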
3 On the Spread of Information in Parallel EAs
In order to understand how parallel EAs work, it is vital to get an idea of how quickly information is propagated. The spread of information is the most distinguishing aspect of parallel EAs, particularly distributed EAs. This includes island models and cellular EAs. Many design choices can tune the speed at which information is transmitted: the topology, the migration interval, the number of migrants, and the policies for emigration and immigration.

3.1 Logistic Models for Growth Curves
Many researchers have turned to investigating the selection pressure in distributed EAs in a simplified model. Assume that in the whole system we only have two types of solutions: current best individuals and worse solutions. No variation is used, i.e., we consider EAs using neither mutation nor crossover. The question is the following. Using only selection and migration, how long does it take for the best solutions to take over the whole system? This time, starting from a single best solution, is referred to as the takeover time. It is strongly related to the study of growth curves: how the number of best solutions increases over time. The takeover time is the first point in time at which the number of best solutions has grown to the whole population.
Growth curves are determined by both inter-island dynamics and intra-island dynamics: how quickly current best solutions spread in one island’s population, and how quickly they populate neighbored islands, until the whole topology is taken over. Both dynamics are linked: intra-island dynamics can have a direct impact on inter-island dynamics as the fraction of best individuals can decide how many (if any) best individuals emigrate.
For intra-island dynamics one can consider results on panmictic EAs. Logistic curves have been proposed and found to fit simulations of takeover times very well for common selection schemes [16]. These curves are defined by the following equation. If P(t) is the proportion of best individuals in the population at time t, then

P(t) = 1 / (1 + (1/P(0) − 1) · e^(−at)) ,

where a is called the growth coefficient. One can see that the proportion of best individuals increases exponentially at first, but then the curve saturates as the proportion approaches 1.
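Such a logistic growth curve is easy to evaluate directly. The sketch below uses the standard logistic parametrization with growth coefficient a; the concrete parameter values are arbitrary illustrations, not taken from the cited experiments.

```python
import math

def logistic_proportion(p0: float, a: float, t: float) -> float:
    # P(t) = 1 / (1 + (1/P(0) - 1) * exp(-a*t)): logistic growth of the
    # proportion of best individuals, with growth coefficient a
    return 1.0 / (1.0 + (1.0 / p0 - 1.0) * math.exp(-a * t))

# starting from a single best individual in a population of 100
p0 = 0.01
curve = [logistic_proportion(p0, a=1.0, t=t) for t in range(12)]
```

The curve starts at P(0) = p0, rises steeply while best individuals are rare, and saturates as the proportion approaches 1.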
Sarma and De Jong [17] considered growth curves in cellular EAs. They presented a detailed empirical study of the effects of the neighborhood size and the shape of the neighborhood for different selection schemes. They showed that logistic curves as defined above can model the growth curves in cellular EAs reasonably well.

Alba and Luque [18] proposed a logistic model called LOG tailored towards distributed EAs with periodic migration. If τ denotes the migration interval and m is the number of islands, the model describes the growth curve as a combination of m logistic terms, shifted in time by multiples of τ to reflect the periodic migrations; a and b are adjustable parameters. The model counts subsequent increases of the proportion of best individuals during migrations. However, it does not include any information about the topology, and the authors admit that it only works appropriately on the ring topology [19, Section 4.2]. They, therefore, present an even more detailed model called TOP, which includes the diameter of the topology T. Simulations show that this model yields very accurate fits for ring, star, and complete topologies [19, Section 4.3].
Luque and Alba [19, Section 4.3] proceed by analyzing the effect of the migration interval and the number of migrants. With a large migration interval, the growth curves tend to make jumps during migration and flatten out quickly to form plateaus during periods without migration. The resulting curves look like step functions, and the size of these steps varies with the migration interval.
Varying the number of migrants changes the slope of these steps. A large number of migrants has a better chance of transmitting best individuals than a small number of migrants. However, the influence of the number of migrants was found to be less drastic than the impact of the migration interval. When a medium or large migration frequency is used, the impact of the number of migrants is negligible [19, Section 4.5]. The same conclusion was made earlier by Skolicki and De Jong [20].
Luque and Alba also presented experiments with a model based on Sprave's hypergraph formulation of distributed EAs [13]. This model gave a better fit than the simple logistic model LOG, but it was less accurate than the model TOP that included the diameter. For the sake of completeness, we also mention that Giacobini et al. [21] proposed an improved model for asynchronous cellular EAs, which is not based on logistic curves.

3.2 Rigorous Takeover Times
Rudolph [22, 23] rigorously analyzed takeover times in panmictic populations for various selection schemes. In [22] he also dealt with the probability that the best solution takes over the whole population; this is not evident for non-elitist algorithms. In [23] Rudolph considered selection schemes made elitist by undoing the last selection in case the best solution would become extinct otherwise. Under this scheme the expected takeover time in a population of size μ is Θ(μ log μ).
In [24] Rudolph considered spatially structured populations in a fine-grained model. Each population has size 1; therefore, vertices in the migration topology can be identified with individuals. Migration happens in every generation. Assume that initially only one vertex i in the topology is a best individual. If in every generation each non-best vertex is taken over by the best individual in its neighborhood, then the takeover time from vertex i equals

max_{j ∈ V} dist(i, j) ,

where V is the set of vertices and dist(i, j) denotes the graph distance, the number of edges on a shortest path from i to j.
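For a concrete topology, max_{j ∈ V} dist(i, j) (the eccentricity of vertex i) can be computed by breadth-first search. The following sketch, with helper names of our own, checks it on a unidirectional ring, where the takeover time from any vertex is m − 1 under the deterministic propagation rule above.

```python
from collections import deque

def eccentricity(adj: dict, i) -> int:
    # max_j dist(i, j), computed via breadth-first search from vertex i
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

m = 8
# unidirectional ring: vertex i only sends migrants to (i+1) mod m
uni_ring = {i: [(i + 1) % m] for i in range(m)}
# bidirectional ring: migrants travel in both directions
bi_ring = {i: [(i + 1) % m, (i - 1) % m] for i in range(m)}
```

With deterministic propagation the takeover time from i equals this eccentricity, so the bidirectional ring takes over in roughly half the time of the unidirectional one.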
Rudolph defines the takeover time in a setting where the initial best solution has the same chance of evolving at every vertex. Then

(1/|V|) · Σ_{i ∈ V} max_{j ∈ V} dist(i, j)

is the expected takeover time if, as above, best solutions are always propagated to their neighbors with probability 1. If this probability is lower, the expected takeover time might be higher; the above formula then still represents a lower bound. Note that in non-elitist EAs it is possible that all best solutions get lost, leading to a positive extinction probability [24].

Note that max_{j ∈ V} dist(i, j) is bounded from above by the diameter of the topology and attains it for a worst-case starting vertex. The diameter hence yields a trivial lower bound on worst-case takeover times. Rudolph [24] conjectures that the diameter is more important than the selection mechanism used in the distributed EA.

In [25] the author generalizes the above arguments to coarse-grained models. Islands can contain larger populations and migration happens with a fixed frequency. In his model the author assumes that in each island new best individuals can only be generated by immigration. Migration always communicates best individuals. Hence, the takeover time boils down to a deterministic time until the last island has been reached, plus a random component for the time until all islands have been taken over completely.
Rudolph [25] gives tight bounds for unidirectional rings, based on the fact that each island with a best individual will send one such individual to each neighbored island. Hence, on the latter island the number of best individuals increases by 1, unless the island has been taken over completely. For more dense topologies he gives a general upper bound, which may not be tight for all graphs. If there is an island that receives best individuals from k > 1 other islands, the number of best individuals increases by k. (The number k could even increase over time.) It was left as an open problem to derive tighter bounds for interesting topologies other than unidirectional rings.
Other researchers followed up on Rudolph's seminal work. Giacobini et al. [26] presented theoretical and empirical results for the selection pressure on ring topologies, or linear cellular EAs. Giacobini et al. [27] did the same for toroidal cellular EAs. In particular, they considered takeover times for asynchronous cellular EAs under various common update schemes. Finally, Giacobini et al. investigated growth curves for small-world graphs [9].

The assumption from Rudolph's model that only immigration can create new best individuals is not always realistic. If standard mutation operators are used, there is a constant probability of creating a clone of a selected parent simply by not flipping any bits: with mutation rate 1/n this happens with probability (1 − 1/n)^n ≈ 1/e. This can lead to a rapid increase in the number of high-fitness individuals.
This argument on the takeover of good solutions in panmictic populations has been studied as part of rigorous runtime analyses of population-based EAs. Witt [28] considered a simple (μ+1) EA with uniform parent selection, standard bit mutations, no crossover, and cut selection at the end of the generation. From his work it follows that good solutions take over the population in expected time O(μ log μ). More precisely, if currently there is at least one individual with current best fitness i, then after O(μ log μ) generations all individuals in the population will have fitness at least i. Sudholt [29, Lemma 2] extended these arguments to a (μ+λ) EA and proved a corresponding upper bound. Note that, in contrast to other studies of takeover times, both results apply to real EAs that actually use mutation. Extending these arguments to distributed EAs is an interesting topic for future work.

3.3 Maximum Growth Curves
Now, we consider inter-island dynamics in more detail. Assume for simplicity that intra-island takeover happens quickly: after each migration transmitting at least one best solution, the target island is completely taken over by best solutions before the next migration. We start with only one island containing a best solution, assuming that all individuals on this island are best solutions. We call such an island an optimal island. If migrants are not subject to variation while emigrating or immigrating, we will always select best solutions for migration and, hence, successfully transmit best solutions.
These assumptions give rise to a deterministic spread of best solutions: after each migration, each optimal island will turn all neighbored islands into optimal islands. This is very similar to Rudolph’s model [25], but it also accounts for a rapid intra-island takeover in between migrations.
We consider growth curves on various graph classes: unidirectional and bidirectional rings, square torus graphs, hypercubes, and complete graphs. Figure 46.6 shows these curves for all these graphs with the same number of vertices m. The torus graph has side lengths √m × √m. The hypercube has dimension d = log₂ m. Each vertex has a label made of d bits, and all possible values for this bit string are present in the graph. Two vertices are neighbored if their labels differ in exactly one bit.
For the unidirectional ring, after i migrations we have exactly i + 1 optimal islands, as long as i + 1 ≤ m. The growth curve is, therefore, linear. For the bidirectional ring, information spreads twice as fast as it can travel in two directions: after i migrations we have 2i + 1 optimal islands if 2i + 1 ≤ m.
The torus allows communication in two dimensions. After one migration there are 5 optimal islands. After two migrations this number is 13, and after three migrations it is 25. In general, after i migrations we have

2i² + 2i + 1

optimal islands, as long as the optimal islands can freely spread out in all four directions: north, south, west, and east. At some point the ends of the region of optimal islands will meet, i.e., the northern tip meets the southern one, and the same goes for west and east. Afterwards, we observe regions of non-optimal islands that constantly shrink, until all islands are optimal. The growth curve for the torus is hence quadratic at first, and then it starts to saturate. The deterministic growth on torus graphs was also considered in [30].
For the hypercube, we can without loss of generality assume that the initial optimal island has a label containing only zeros. After one migration, all islands whose label contains a single one become optimal. After two migrations the same holds for all islands with two ones, and so on. The number of optimal islands after i migrations in a d-dimensional hypercube (i.e., m = 2^d) is hence the sum of binomial coefficients C(d, 0) + C(d, 1) + … + C(d, i). This number grows roughly like d^i during the first migrations and then at some point starts to saturate. The complete graph is the simplest one to analyze here, as it will be completely optimal after one migration.
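These counts are easy to verify programmatically. The sketch below, with function names of our own, implements the closed-form counts and cross-checks the torus formula against a direct simulation of the deterministic spread on a torus large enough that the wavefronts do not yet meet.

```python
from math import comb

def torus_optimal(i: int) -> int:
    # optimal islands after i migrations on a large torus: a diamond of
    # radius i in the grid contains 2*i^2 + 2*i + 1 vertices
    return 2 * i * i + 2 * i + 1

def hypercube_optimal(d: int, i: int) -> int:
    # sum_{j=0}^{i} C(d, j): vertices within Hamming distance i of 0^d
    return sum(comb(d, j) for j in range(i + 1))

def simulate_torus(side: int, steps: int) -> int:
    # deterministic spread from the centre of a side x side torus:
    # every optimal island turns all four neighbours optimal per migration
    optimal = {(side // 2, side // 2)}
    for _ in range(steps):
        grown = set(optimal)
        for (x, y) in optimal:
            grown |= {((x + 1) % side, y), ((x - 1) % side, y),
                      (x, (y + 1) % side), (x, (y - 1) % side)}
        optimal = grown
    return len(optimal)
```

On a 101 × 101 torus, three migrations yield 25 optimal islands, matching 2·3² + 2·3 + 1; summing all binomial coefficients recovers the full hypercube, as expected.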
These arguments and Fig. 46.6 show that the growth curves can depend tremendously on the migration topology. For sparse topologies like rings or torus graphs, in the beginning the growth is linear or quadratic, respectively. This is much slower than the exponential growth observed in logistic curves. Furthermore, for the ring there is no saturation; linear curves are quite dissimilar to logistic curves.
This suggests that logistic curves might not be the best models for growth curves across all topologies. The plots by Luque and Alba [19, Section 4.3] show a remarkably good overall fit for their TOP model. However, this might be due to the optimal choice of the parameters a and b and the fact that logistic curves are easily adaptable to various curves of roughly similar shape. We believe that it is possible to derive even more accurate models for common topologies, based on results by Giacobini et al. [26, 27, 9]. This is an interesting challenge for future work.
3.4 Propagation
So far, we have only considered models where migration always successfully transmits best individuals. For non-trivial selection of emigrants, this is not always given. Also if crossover is used during migration, due to disruptive effects migration is not always successful. If we consider randomized migration processes, things become more interesting.
Rowe et al. [31] considered a model of propagation in networks. Consider a network where vertices are either informed or not. In each round, each informed vertex tries to inform each of its neighbors. Every such trial is successful with a given probability p, and then the target vertex becomes informed. These decisions are made independently. Note that an uninformed vertex might have a probability larger than p of becoming informed, in case several informed vertices try to inform it. The model is inspired by models from epidemiology; it can, for instance, be used to model the spread of a disease.
The model of propagation of information directly applies to our previous setting where the network is the migration topology and p describes the probability of successfully migrating a current best solution. Note that when looking for estimations of growth curves and upper bounds on the takeover time, we can assume that p is a lower bound on the actual probability of a successful transmission. Then the model becomes applicable to a broader range of settings, where islands can have different transmission probabilities.
On some graphs like unidirectional rings, we can just multiply our growth curves by p to reflect the expected number of optimal islands after a certain time. It then follows that the expected time for taking over all m islands is by a factor of 1/p larger than in the previous, deterministic model.
However, this reasoning does not hold in general. Multiplying the takeover time in the deterministic setting by 1/p does not always give the expected takeover time in the random model. Consider a star graph (or hub), where initially only the center vertex is informed. In the deterministic case p = 1, the takeover time is clearly 1. However, if 0 < p < 1, the time until the last vertex is informed is given by the maximum of m − 1 independent geometric distributions with parameter p. For constant p, this time is of order Θ(log m), i.e., the time until the last vertex is informed is much larger than the expected time for any specific vertex to be informed.
Rowe et al. [31] presented a detailed analysis of hubs. They also show how to obtain a general upper bound that holds for all graphs: for every graph G with n vertices and diameter diam(G), the expected takeover time is bounded by

O((diam(G) + log n) / p) .

Both terms make sense. The diameter diam(G) describes what distance needs to be overcome in order to inform all vertices in the network. The factor 1/p gives the expected time until a next vertex is informed, assuming that it has only one informed neighbor. We also obtain diam(G) (without a factor 1/p) as a lower bound on the takeover time. The additive log n term is necessary to account for a potentially large variance, as seen in the example of star graphs. If the diameter of the graph is at least log n, we can drop the log n term in the asymptotic bound, leading to an upper bound of O(diam(G)/p).
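The randomized propagation model is easy to simulate. The following Monte Carlo sketch, with a helper of our own and arbitrary parameters, estimates the expected takeover time on a unidirectional ring; there the information must cross m − 1 edges, each taking 1/p rounds in expectation, so the expected takeover time is (m − 1)/p.

```python
import random

def takeover_time(adj: dict, start, p: float, rng: random.Random) -> int:
    # each informed vertex informs each neighbour independently with
    # probability p per round; returns the number of rounds needed
    informed = {start}
    rounds = 0
    while len(informed) < len(adj):
        rounds += 1
        newly = set(informed)
        for u in informed:
            for v in adj[u]:
                if rng.random() < p:
                    newly.add(v)
        informed = newly
    return rounds

m = 16
ring = {i: [(i + 1) % m] for i in range(m)}  # unidirectional ring
rng = random.Random(1)
avg = sum(takeover_time(ring, 0, 0.5, rng) for _ in range(200)) / 200
# for p = 0.5 the expectation is (m - 1) / p = 30 rounds
```

With p = 1 the simulation degenerates to the deterministic model and needs exactly m − 1 rounds.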
Interestingly, the concept of propagation also appears in other contexts. When solving shortest path problems in graphs, metaheuristics like evolutionary algorithms [32, 33, 34] and ant colony optimization (ACO) [35, 36] tend to propagate shortest paths through the graph. In the single-source shortest paths problem (SSSP), one is looking for shortest paths from a source vertex to all other vertices of the graph. The EAs and ACO algorithms tend to find shortest paths first for vertices that are close to the source, in the sense that their shortest paths only contain few edges. Once these shortest paths are found, the algorithm can find shortest paths for vertices that are further away. When a shortest path to vertex u is found and there is an edge {u, v} in the graph, it is easy to find a shortest path for v. In the case of evolutionary algorithms, an EA only needs to assign u as the predecessor of v on the shortest path in a lucky mutation in order to find a shortest path to v. In the case of ACO, pheromones enable an ant to follow the path between the source and u, and it then only has to decide to travel between u and v to find a shortest path to v, with good probability.

Doerr et al. [34, Lemma 3] used tail bounds to prove that the time for propagating shortest paths with an EA is highly concentrated: with high probability all shortest paths are found within a time bound that grows with the diameter of the graph plus a logarithmic term, scaled by the propagation probability in this setting. This result is similar to the one obtained by Rowe et al. [31]; asymptotically, both bounds are equal. However, the result by Doerr et al. [33] also allows for conclusions about growth curves.

Lässig and Sudholt [6, Theorem 3] introduced yet another argument for the analysis of propagation times. They considered layers of vertices: the i-th layer contains all vertices that have shortest paths of at most i edges and that are not in any smaller layer. They bound the time until information is propagated throughout all vertices of a layer. This is feasible since every vertex in layer i is informed with probability at least p in each round once all vertices in layers 1, …, i − 1 are informed. If n_i is the number of vertices in layer i, the expected time until the last vertex in this layer is informed is O((1 + log n_i)/p). This gives a bound on the total takeover time of O(Σ_i (1 + log n_i)/p). For small or large diameters, we get the same asymptotic bound as before. For other values it is slightly worse.
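The layered bound is straightforward to evaluate for a concrete topology. The sketch below, with helper names of our own, computes the layer sizes n_i by breadth-first search and sums a term of order (1 + ln n_i)/p per layer, disregarding hidden constants.

```python
import math
from collections import deque

def layer_sizes(adj: dict, source) -> list:
    # n_i = number of vertices whose shortest path from source has i edges
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    diameter = max(dist.values())
    return [sum(1 for d in dist.values() if d == i)
            for i in range(1, diameter + 1)]

def layered_bound(adj: dict, source, p: float) -> float:
    # sum over layers of (1 + ln n_i) / p, an upper bound (up to constants)
    # on the expected takeover time in the propagation model
    return sum((1 + math.log(n_i)) / p for n_i in layer_sizes(adj, source))

m = 8
uni_ring = {i: [(i + 1) % m] for i in range(m)}
```

On the unidirectional ring every layer has size 1, so the bound collapses to (m − 1)/p, matching the direct analysis of rings.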
However, the layering of vertices allows for the inclusion of intra-island effects. Assume that the transmission probability p only applies once islands have been taken over (to a significantly large degree) by best individuals. This is a realistic setting, as with only a single best individual the probability of selecting it for emigration (or pollination, to be precise) might be very small. If all islands need time at most T in order to reach this stage after the first best individual has arrived, we obtain an upper bound of order

Σ_i (1 + log n_i)/p + diam · T

for the takeover time.
4 Examples Where Parallel EAs Excel
Parallel EAs have been applied to a very broad range of problems, including many NP-hard problems from combinatorial optimization. The present literature is immense; already early surveys like the one by Alba and Troya [37] present long lists of applications of parallel EAs. Further applications can be found in [38, 39, 40]. Research on and applications of parallel metaheuristics have increased in recent years, due to the emergence of parallel computer architectures.

Crainic and Hail [41] review applications of parallel metaheuristics, with a focus on graph coloring, partitioning problems, covering problems, Steiner tree problems, satisfiability problems, location and network design, as well as the quadratic assignment problem with its famous special cases: the traveling salesman problem and vehicle routing problems. Luque and Alba [19] present selected applications for natural language tagging, the design of combinatorial logic circuits, the workforce planning problem, and the bioinformatics problem of assembling DNA fragments.

The literature is too vast to be reviewed in this section. Also, for many hard practical problems it is often hard to determine the effect that parallelization has on search dynamics. The reasons behind the success of parallel models often remain elusive. We follow a different route and describe theoretical studies of evolutionary algorithms where parallelization was proven to be helpful. This concerns illustrative toy functions as well as problems from combinatorial optimization. All following settings are well understood and allow us to gain insights into the effect of parallelization. We consider parallel variants of the most simple evolutionary algorithm, called the (1+1) evolutionary algorithm, shortly (1+1) EA. It is described in Algorithm 46.3; it only uses mutation and selection in a population containing just one current search point. We are interested in the optimization time, defined as the number of generations until the algorithm first finds a global optimum. Unless noted otherwise, we consider pseudo-Boolean optimization: the search space contains all bit strings of length n and the task is to maximize a function f: {0,1}^n → R. We use the common notation {0,1}^n for the set of such bit strings.

Algorithm 46.3 (1+1) EA for maximizing f: {0,1}^n → R
1: Initialize x ∈ {0,1}^n uniformly at random.
2: loop
3:   Create x′ by copying x and flipping each bit independently with probability 1/n.
4:   if f(x′) ≥ f(x) then x := x′.
5: end loop
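A minimal Python version of Algorithm 46.3 might look as follows; the test function OneMax and all names are our own, and a real implementation would use a proper stopping criterion instead of a fixed generation budget.

```python
import random

def one_plus_one_ea(f, n: int, max_gens: int, rng: random.Random):
    # (1+1) EA: keep a single search point, flip each bit independently
    # with probability 1/n, accept the offspring if it is at least as good
    x = [rng.randrange(2) for _ in range(n)]
    for _ in range(max_gens):
        y = [bit ^ (rng.random() < 1.0 / n) for bit in x]  # standard bit mutation
        if f(y) >= f(x):                                   # elitist selection
            x = y
    return x

def onemax(x):
    # classic test function: number of ones in the bit string
    return sum(x)

rng = random.Random(0)
best = one_plus_one_ea(onemax, n=16, max_gens=3000, rng=rng)
```

With n = 16 and a budget of 3000 generations, the run is far longer than the expected O(n log n) optimization time on OneMax, so the returned string is (almost) all ones with overwhelming probability.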
The presentation in this section is kept informal. For theorems with precise results, including all preconditions, we refer to the respective papers.
4.1 Independent Runs
Independent runs prove useful if the running time has a large variance. The reason is that the optimization time equals the time until the fastest run has found a global optimum.
The variance can be particularly large when the objective function exhibits local optima that are very hard to overcome. Bimodal functions contain two local optima, and typically only one of them is a global optimum. One such example was already analyzed theoretically in the seminal runtime analysis paper by Droste et al. [42].
We review the analysis of a similar function that allows for a simpler analysis. The function TwoMax was considered by Friedrich et al. [43] in the context of diversity mechanisms. It is a function of unitation: the fitness only depends on the number of bits set to 1. The function contains two symmetric slopes that increase linearly with the distance to n/2 ones. Only one of these slopes leads to a global optimum. Formally, the function is defined as the maximum of OneMax(x) = Σ_i x_i and its symmetric cousin ZeroMax(x) = n − OneMax(x), with an additional fitness bonus for the all-ones bit string 1^n. See Fig. 46.7 for a sketch.
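Following the definition by Friedrich et al. [43], TwoMax can be sketched in a few lines; the implementation details are our own.

```python
def twomax(x) -> int:
    # max(#ones, #zeros) plus a bonus of 1 for the all-ones string, making
    # 1^n the unique global optimum and 0^n the second, local optimum
    n = len(x)
    ones = sum(x)
    bonus = 1 if ones == n else 0  # corresponds to the product of all bits
    return max(ones, n - ones) + bonus
```

Away from the two optima the function is perfectly symmetric, which is what makes the probability of climbing the right slope exactly 1/2.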
The (1+1) EA reaches either a local optimum or a global optimum in expected time O(n log n). Due to the perfect symmetry of the function on the remainder of the search space, the probability that this is the global optimum is exactly 1/2. If the local optimum 0^n is reached, the (1+1) EA has to flip all n bits in one mutation in order to reach the global optimum. The probability of this event is exactly n^(−n).

The authors consider deterministic crowding [43] in a population of size μ as a diversity mechanism. It has the same search behavior as μ independent runs of the (1+1) EA, except that the running time is counted in a different way. Their result directly transfers to this parallel model. The only assumption is that the number of independent runs is polynomially bounded in n.

The probability of finding a global optimum within O(n log n) generations of the parallel system is amplified to 1 − 2^(−m). This means that only with probability 2^(−m) do we arrive at a situation where the parallel EA needs to escape from a local optimum. When all m islands are in this situation, the probability that at least one island makes this jump in one generation is at most

1 − (1 − n^(−n))^m = Θ(m · n^(−n)) ,

where the last equality holds since m is asymptotically smaller than n^n. This implies that the expected number of generations of a parallel system with m independent runs is

O(n log n + 2^(−m) · n^n / m) .

We can see from this formula that the number of runs m has an immense impact on the expected running time. Increasing the number of runs by 1 decreases the second summand by more than a factor of 2. The speedup is, therefore, exponential, up to a point where the running time is dominated by the first term O(n log n). Note in particular that roughly n log₂ n processors are sufficient to decrease the expected running time to O(n log n).
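Since n^n overflows any numeric type for interesting n, the exponential decay of the second summand is best inspected in logarithms. The sketch below, with names and values of our own, assumes the formula above and shows where the escape term becomes negligible.

```python
import math

def log2_second_term(n: int, m: int) -> float:
    # log2 of (2^-m * n^n / m): the (log of the) expected extra cost of
    # escaping the local optimum when running m independent runs
    return -m + n * math.log2(n) - math.log2(m)

n = 100
# around m = n * log2(n) runs, the term drops below zero (i.e. below one
# generation) and the total time is dominated by the O(n log n) term
threshold_m = math.ceil(n * math.log2(n))
```

Each extra run subtracts slightly more than 1 from this logarithm, i.e., shrinks the term by more than a factor of 2, which is the exponential speedup discussed above.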
This is a very simple example of a superlinear speedup, with regard to the optimization time.
The observed effects also occur in combinatorial optimization. Witt [44] analyzed the (1+1) EA on the NP-hard PARTITION problem. The task can be regarded as scheduling on two machines: given a sequence of jobs, each with a specific effort, the goal is to distribute the jobs on two machines so that the largest execution time (the makespan) is minimized.

On worst-case instances the (1+1) EA has a constant probability of getting stuck in a bad local optimum. The expected time to find a solution with a makespan of less than (4/3 + ε) · OPT is polynomial in n, where ε > 0 is an arbitrary constant and OPT is the value of the optimal solution.

However, if the (1+1) EA is lucky, it can, indeed, achieve a good approximation of the global optimum. Assume we are aiming at a solution with a makespan of at most (1 + ε) · OPT, for some ε > 0 we can choose. Witt's analysis shows that a number of parallel runs that depends only on ε then outputs a solution of this quality with probability at least 3/4. (This probability can be further amplified quite easily by using more runs.) Each run takes time polynomial in n. The parallel model represents what is known as a polynomial-time randomized approximation scheme (PRAS): the desired approximation quality can be specified, and if ε is fixed, the total computation time is bounded by a polynomial in n. This was the first example that parallel runs of a randomized search heuristic constitute a PRAS for an NP-hard problem.

4.2 Offspring Populations
Using offspring populations in a master–slave architecture can decrease the parallel running time and lead to a speedup. We will discuss this issue further in Sect. 46.5 as offspring populations are very similar to island models on complete topologies. For now, we present one example where offspring populations decrease the optimization time very drastically.
Jansen et al. [45] compared the (1+1) EA against a (1+λ) EA variant that creates λ offspring in parallel and compares the current search point against the best offspring. They constructed a function SufSamp where offspring populations have a significant advantage. We refrain from giving a formal definition and instead describe the main ideas. The vast majority of all search points tend to lead an EA towards the start of a path through the search space. The points on this path have increasing fitness, thus encouraging an EA to follow it. All points outside the path are worse, so the EA will stay on the path.

The path leads to a local optimum at its end. However, the function also includes a number of smaller paths that branch off the main path; see Fig. 46.8. All these paths lead to global optima, but they are difficult to discover. This makes a difference between the (1+1) EA and the (1+λ) EA for sufficiently large λ. The (1+1) EA typically follows the main path without discovering the smaller paths branching off. At the end of the main path it thus becomes stuck in a local optimum. The analysis in [45] shows that the (1+1) EA needs superpolynomial time, with high probability.

Contrarily, the (1+λ) EA performs a more thorough search as it progresses on the main path. The many offspring tend to discover at least one of the smaller branches. The fitness on the smaller branches is larger than the fitness of the main path, so the (1+λ) EA will move away from the main path and follow a smaller path. It then finds a global optimum in polynomial time, with high probability.

Interestingly, this construction can easily be adapted to show an opposite result. We replace the local optimum at the end of the main path by a global optimum and replace all global optima at the end of the smaller branches by local optima. This yields another function, a mirrored variant of SufSamp, also shown in Fig. 46.8. By the same reasoning as above, the (1+λ) EA will become stuck and the (1+1) EA will find a global optimum in polynomial time, with high probability.

While the example is clearly constructed and artificial, it can be seen as a cautionary tale. The reader might be tempted to think that using offspring populations instead of creating a single offspring can never increase the number of generations needed to find the optimum. After all, evolutionary search with offspring populations is more intense, and improvements can be found more easily. As we focus on the number of generations (and do not count the effort for creating λ offspring), it is tempting to claim that offspring populations are never disadvantageous.
The second example shows that this claim – however obvious it may seem – does not hold for general problem classes. Note that this statement is also implied by the well-known no free lunch theorems [46], but the above results are much stronger and more concrete.
4.3 Island Models
The examples so far have shown that a more thorough search – by independent runs or increased sampling of offspring – can lead to more efficient running times. Lässig and Sudholt [3] presented a first example where communication makes the difference between exponential and polynomial running times, in a typical run. They constructed a family of problems called LOLZ where a simple island model finds the optimum in polynomial time, with high probability. This holds for a proper choice of the migration interval and any migration topology that is not too sparse. The islands run (1+1) EAs; hence, the island model resembles a fine-grained model.

Contrarily, both a panmictic population as well as independent islands need exponential time, with high probability. This shows that the weak speedup versus panmixia is superlinear, even exponential (when considering speedups with respect to the typical running time instead of the expected running time). Unlike previous examples, it also shows that more sophisticated means of parallelization can be better than independent runs.
The basic idea of this construction is as follows. An EA can increase the fitness of its current solutions by gathering a prefix of bits with the same value. Generally, a prefix of i leading ones yields the same fitness as a prefix of i leading zeros. The EA has to make a decision whether to collect leading ones (LOs) or leading zeros (LZs). This not only holds for the (1+1) EA but also for a (not too large) panmictic population, as genetic drift will lead the whole population to either leading ones or leading zeros.

In the beginning, both decisions are symmetric. However, after a significant prefix has been gathered, symmetry is broken: once the prefix has reached a length of z, z being a parameter of the function, only leading ones lead to a further fitness increase. If the EA has gone for leading zeros, it becomes stuck in a local optimum. The parameter z determines the difficulty of escaping from this local optimum.

This construction is repeated on several blocks of the bit string that need to be optimized one by one. Each block has a fixed length. Only if the right decision towards the leading ones is made on the first block can the block be filled with further leading ones. Once the first block contains only leading ones, the fitness depends on the prefix in the second block, and a further decision between leading ones and leading zeros needs to be made. Figure 46.1 illustrates the problem definition.
So, the problem requires an EA to make several decisions in succession. The number of blocks, b, is another parameter that determines how many decisions need to be made. Panmictic populations will sooner or later make a wrong decision and become stuck in some local optimum. If b is not too small, the same holds for independent runs.

However, an island model can effectively communicate the right decisions on blocks to other islands. Islands that have become stuck in a local optimum can be taken over by other islands that have made the correct decision. These dynamics make up the success of the island model, as it can be shown to find global optima with high probability. A requirement is, though, that the migration interval is carefully tuned so that migration only transmits the right information. If migration happens before the symmetry between leading ones and leading zeros is broken, it might be that islands with leading zeros take over islands with leading ones. Lässig and Sudholt [3] give sufficient conditions under which this does not happen, with high probability.
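The dynamics described above, islands evolving independently and periodically overwriting worse solutions with better migrants, can be sketched in a few lines. The following is a minimal illustration, not the algorithm analyzed in [3]: every island runs a (1+1)-style EA on bit strings, and all names and parameters (the fitness function, the topology, the migration interval) are placeholders chosen for the demo.

```python
import random

def island_model(fitness, n_bits, n_islands, migration_interval,
                 neighbors, generations, seed=None):
    """Minimal island model: each island runs a (1+1)-style EA on bit
    strings; every `migration_interval` generations, island i adopts the
    best incoming migrant from the islands listed by neighbors(i)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        # independent evolution: standard bit mutation, elitist replacement
        for i in range(n_islands):
            child = [b ^ (rng.random() < 1.0 / n_bits) for b in pop[i]]
            if fitness(child) >= fitness(pop[i]):
                pop[i] = child
        # periodic migration of (copies of) the islands' current solutions
        if gen % migration_interval == 0:
            migrants = [list(ind) for ind in pop]
            for i in range(n_islands):
                for j in neighbors(i):
                    if fitness(migrants[j]) > fitness(pop[i]):
                        pop[i] = list(migrants[j])
    return max(pop, key=fitness)

# demo: 4 islands on a unidirectional ring, maximizing OneMax (sum of bits)
best = island_model(sum, 20, 4, 5, lambda i: [(i + 1) % 4], 500, seed=1)
```

The topology is given as a function mapping an island to the islands it receives migrants from, so rings, tori, or complete graphs only differ in that one argument.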
An interesting finding is also how islands can regain independence. During migration, genetic information about future blocks is transmitted. Hence, after migration all islands contain the same genotype on future blocks. This is a real threat as this dependence might imply that all islands make the same decision after moving on to the next block. Then all diversity would be lost.
However, under the conditions given in [3] there is a period of independent evolution following migration, before any island moves on to a new block. During this period of independence, the genotypes of future blocks are subjected to random mutations, independently for each island. The reader might think of moving particles in some space. Initially, all bits are in the same position. However, then particles start moving around randomly. Naturally, they will spread out and separate from one another. After some time the distribution of particles will resemble a uniform distribution. In particular, an observer would not be able to distinguish whether the positions of particles were obtained by this random process or by simply drawing them from a uniform distribution.
The same effect occurs with bits of future blocks; after some time all bits of a future block will be indistinguishable from a random bit string. This shows that independence can not only be gained by independent runs, but also by periods of independent evolution. One could say that the island model combines the advantages of two worlds: independent evolution and selection pressure through migration. The island model is only successful because it can use both migration and periods of independent evolution.
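This mixing effect is easy to observe empirically. The sketch below is an illustration, not the analysis from [3]: a block of bits that is identical on all islands after migration is subjected only to independent standard bit mutations (selection is ignored because the block does not yet affect fitness); after a while the average number of ones approaches one half, as for uniformly random bit strings.

```python
import random

def mutate(bits, p, rng):
    """Standard bit mutation: flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(0)
n = 100
islands = [[0] * n for _ in range(50)]   # identical genotypes after migration
for _ in range(200):                     # generations of independent evolution
    islands = [mutate(block, 1.0 / n, rng) for block in islands]

# the blocks have drifted towards uniformly random bit strings
avg_ones = sum(sum(block) for block in islands) / (50 * n)
```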
The theoretical results [3] were complemented by experiments in [47]. The aim was to look at what impact the choice of the migration topology and the choice of the migration interval have on performance, regarding the function LOLZ. The theoretical results made a statement about a broad class of dense topologies, but required a very precise migration interval. The experiments showed that the island model is far more robust with respect to the migration interval than suggested by theory.
Depending on the migration interval, some topologies were better than others. The topologies involved were a bidirectional ring, a torus with edges wrapping around, a hypercube graph, and the complete graph. We considered the success rate of the island model, stopping it as soon as all islands had reached local or global optima. We then performed statistical tests comparing these success rates. For small migration intervals, i. e., frequent migrations, sparse topologies were better than dense ones. For large migration intervals, i. e., rare migrations, the effect was the opposite. This effect was expected; however, we also found that the torus was generally better than the hypercube. This is surprising, as both have a similar density. Table 46.2 shows the ranking obtained for commonly used topologies.
Superlinear speedups with island models also occur in simpler settings. Lässig and Sudholt [6] also considered island models for the Eulerian cycle problem. Given an undirected Eulerian graph, the task is to find an Eulerian cycle, i. e., a traversal of the graph on which each edge is traversed exactly once. This problem can be solved efficiently by tailored algorithms, but it has served as an excellent test bed for studying the performance of evolutionary algorithms [48, 49, 50, 51].

Instead of bit strings, the problem representation by Neumann [48] is based on permutations of the edges of the graph. Each such permutation gives rise to a walk: starting with the first edge, the walk is the longest sequence of edges such that two subsequent edges in the permutation share a common vertex. The walk encoded by the permutation ends when the next edge does not share a vertex with the current one. A walk that contains all edges represents an Eulerian cycle. The length of the walk gives the fitness of the current solution.
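A possible reading of this representation in code is the following simplified sketch (Neumann's exact definition handles edge orientations more carefully; here the first edge is simply tried in both directions):

```python
def walk_length(perm):
    """Fitness of an edge permutation: length of the walk that starts with
    the first edge and is extended while the next edge in the permutation
    contains the walk's current endpoint. The first edge may be read in
    either direction; we take the better of the two."""
    if not perm:
        return 0
    a, b = perm[0]
    best = 1
    for cur in (b, a):                 # endpoint after the first edge
        length = 1
        for u, v in perm[1:]:
            if cur == u:
                cur = v
            elif cur == v:
                cur = u
            else:
                break
            length += 1
        best = max(best, length)
    return best

# two triangles sharing vertex 0 (the shape of the instance in Fig. 46.9);
# this ordering of all six edges encodes an Eulerian cycle of length 6
eulerian = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]
```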
Neumann [48] considered a simple instance that consists of two cycles of equal size, connected by one common vertex (Fig. 46.9). The instance is interesting as it represents a worst case for the time until an improvement is found, with respect to randomized local search (RLS) working on this representation. RLS works like the (1+1) EA, but it only uses local mutations. As the mutation operator it uses jumps: an edge is selected uniformly at random and then moved to a (different) target position chosen uniformly at random. All edges in between the two positions are shifted accordingly.

On the considered instance, RLS typically starts constructing a walk within one of these cycles, either by appending edges to the end of the walk or by prepending edges to the start of the walk. When the walk reaches the common vertex for the first time, a decision needs to be made. RLS can either extend the walk into the opposite cycle (Fig. 46.9); in this case, RLS can simply extend both ends of the walk until an Eulerian cycle is formed. The expected time until this happens is polynomial in m, where m denotes the number of edges. However, if another edge in the same cycle is added at the common vertex, the walk will evolve into one of the two cycles that make up the instance. It is then not possible to add further edges to the current walk, unless the current walk starts and ends in the common vertex. However, the walk can be rotated so that its start and end move to a neighboring vertex. Such an operation takes polynomial expected time, and the fitness after a rotation is the same as before. Rotations that take the start and end closer to the common vertex are as likely as rotations that move them away from it. The start and end of the walk hence perform a fair random walk, and Θ(m²) rotations are needed on average in order to reach the common vertex. The total expected time for rotating the cycle is correspondingly larger.
Summarizing, if RLS makes the right decision, then a smaller expected time suffices in total; if rotations become necessary, the expected time increases by a polynomial factor. Now consider an island model with m islands running RLS. If islands evolve independently for sufficiently many generations, all mentioned decisions are made independently, with high probability. The probability of making a wrong decision is 1/3, hence with m islands the probability that all islands make the wrong decision is (1/3)^m. The expected time of the island model can be bounded by combining these two cases. A suitable choice of the number of islands yields an expectation of the order of the good case, and a whole range of choices leads to a superlinear speedup, asymptotically speaking. Technically, the speedup is even exponential.
Interestingly, this good performance only holds if migration is used rarely, or if independent runs are used. If migration is used too frequently, the island model rapidly loses diversity. Let T be any strongly connected topology with diameter diam(T). If migration happens too frequently relative to diam(T), then there is a constant probability that the island that first arrives at a decision at the common vertex propagates this solution throughout the whole island model before any other island can make an improvement. This results in an expected running time that is almost as large as that of a single run, even for very large numbers of islands. The speedup is, therefore, logarithmic at best, or even worse. This natural example shows that the choice of the migration interval can make a difference between exponential and logarithmic speedups.
4.4 Crossover Between Islands
It has long been known that island models can also be useful in the context of crossover. Crossover usually requires a good diversity in the population to work properly. Due to the higher diversity between different islands, compared to panmixia, recombining individuals from different islands is promising.
Watson and Jansen [52] presented and analyzed a royal road function for crossover: a function where crossover drastically outperforms mutation-based evolutionary algorithms. In contrast to previous theoretically studied examples [53, 54, 55, 56, 57], their goal was to construct a function with a clear building-block structure. In order to prove that a GA was able to assemble all building blocks, they resorted to an island model with a very particular migration topology. In their single-receiver model all islands except one evolve independently. Each island sends its migrants to a designated island called the receiver (Fig. 46.10). This way, all sending islands are able to evolve the right building blocks, and the receiver is used to assemble all these building blocks to obtain the optimum.

This idea was picked up later by Neumann et al. [7] in a more detailed study of crossover in island models. We describe parts of their results, as their problem is more illustrative than the one by Watson and Jansen. Neumann et al. considered instances of the NP-hard Vertex cover problem. Given an undirected graph, the goal is to select a subset of vertices such that each edge is incident to a selected vertex. We say that edges are covered if this property holds for them. The objective is to minimize the number of selected vertices. The problem has a simple and natural binary representation where each bit indicates whether a corresponding vertex is selected or not.
Prior work by Oliveto et al. [58] showed that evolutionary algorithms with panmictic populations fail even on simply structured instance classes like copies of bipartite graphs. An example is shown in Fig. 46.11. Consider a single bipartite graph, i. e., two sets of vertices such that each vertex in one set is connected to every vertex in the other set. If both sets have different sizes, the smaller set is an optimal Vertex cover. The larger set is another Vertex cover. It is, in fact, a non-optimal local optimum which is hard to overcome: the majority of bits have to flip in order to escape. If the instance consists of several independent copies of bipartite graphs, it is very likely that a panmictic EA will evolve a locally optimal configuration on at least one of the bipartite graphs. Then the algorithm fails to find a global optimum.

Island models perform better. Assume the topology is the single-receiver model. In each migration a 2-point crossover is performed between migrants and the individual on the target island. All islands have population size 1 for simplicity. We also assume that the bipartite subgraphs are encoded in such a way that each subgraph forms one block in the bit string. This is a natural assumption as all subgraphs can be clearly identified as building blocks. In addition, Jansen et al. [59] presented an automated way of encoding graphs in a crossover-friendly way, based on the degrees of vertices.
The analysis in [7] shows the following. Assume that the migration interval is sufficiently large. This choice implies that all islands will evolve to configurations where all bipartite graphs are either locally optimal or globally optimal. With high probability, for each bipartite graph at least a constant fraction of all sender islands will have the globally optimal configuration.

All that is left to do for the receiver island is to rely on crossover combining all present good building blocks. As two-point crossover can select one block from an immigrant and take the remainder from the current solution on the receiver island, all good building blocks have a good chance to be obtained. The island model finds a global optimum within a polynomial number of generations, with high probability.
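The role of two-point crossover here is easy to demonstrate. The sketch below uses two hypothetical 4-bit blocks, where all-zeros stands for the globally optimal configuration of a subgraph; cuts that fall exactly around the first block combine the good blocks of the receiver and the immigrant.

```python
import random

def two_point_crossover(receiver, immigrant, rng):
    """Cut at two random points; take the middle segment from the immigrant."""
    i, j = sorted(rng.sample(range(len(receiver) + 1), 2))
    return receiver[:i] + immigrant[i:j] + receiver[j:]

# two hypothetical blocks of 4 bits each; all-zeros = globally optimal block
receiver  = [1, 1, 1, 1, 0, 0, 0, 0]   # locally optimal on block 1
immigrant = [0, 0, 0, 0, 1, 1, 1, 1]   # globally optimal on block 1 only

rng = random.Random(3)
found = any(two_point_crossover(receiver, immigrant, rng) == [0] * 8
            for _ in range(500))        # cuts at the block boundary succeed
```

The cut pair (0, 4) produces the all-good offspring, so repeated migrations give the receiver a good chance of assembling all blocks.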
5 Speedups by Parallelization
5.1 A General Method for Analyzing Parallel EAs
We now finally discuss a method for estimating the speedup gained by parallelization. Assume that, instead of running a single EA, we run an island model where each island runs the same EA. The question is by how much the expected optimization time (i. e., the number of generations until a global optimum is found) decreases, compared to the single, panmictic EA. Recall that this speedup is called weak orthodox speedup [14].

In the following we sometimes speak of the expected parallel optimization time to emphasize that we are dealing with a parallel system. If the number of islands and the population size on each island are fixed, we can simply multiply this time by a fixed factor to obtain the expected number of function evaluations.
Lässig and Sudholt [4] presented a method for estimating the expected optimization time of island models. It combines growth curves with a well-known method for the analysis of evolutionary algorithms. The fitness-level method, or method of f-based partitions [60], is a simple, yet powerful technique. The idea is to partition the search space into non-empty sets A 1, A 2, …, A m such that the following holds:
- for each i < m, each search point in A i has a strictly worse fitness than each search point in A i+1, and
- A m contains all global optima.

The described ordering with respect to the fitness f is often denoted A 1 <_f A 2 <_f … <_f A m. Note that A m can also be redefined towards containing all search points of some desired quality if the goal is not global optimization.
We say that a population-based algorithm (including populations of size 1) is in A i, or on fitness level i, if the best search point in the population is in A i. Now, assume that we know that s i is a lower bound on the probability that the algorithm finds a solution in A i+1 ∪ … ∪ A m if it is currently in A i. Then the reciprocal 1/s i is an upper bound on the expected time until this event happens. If the algorithm is elitist (i. e., it never loses the current best solution), then it will never decrease its current fitness level. A sufficient condition for finding an optimal solution is that all sets A 1, …, A m−1 are left in the described manner at least once. This implies the following bound on the expected optimization time.
Theorem 46.1 Wegener [60]
Consider an elitist EA and assume a fitness-level partition A 1, …, A m where A m is the set of global optima. Let s i be a lower bound for the probability that in one generation the EA finds a search point in A i+1 ∪ … ∪ A m if the best individual in the parent population is in A i. Then the expected optimization time is bounded by

E(T) ≤ ∑_{i=1}^{m−1} 1/s_i .

The above bound applies to all elitist algorithms. It is generally applicable and often quite versatile, as we can freely choose the partition A 1, …, A m. The challenge is to find such a partition and to find corresponding probability bounds for finding improvements. Many papers have shown that this method – applied explicitly or implicitly – yields tight bounds on the expected optimization time of EAs for various problems [32, 42, 48]. It can also be used as part of a more general analysis [61, 62].

We are being pessimistic in assuming that every fitness level has to be left. In reality, several fitness levels might be skipped. The fitness-level method often yields good bounds if not too many levels are skipped, and if the probability bounds s i are good estimates for the real probabilities of finding a better fitness-level set. Note that the lower bound s i must apply regardless of the precise search point(s) in A i present in the population, hence we need to consider the worst-case probability of escaping from A i.
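As a quick illustration, the fitness-level sum can be evaluated numerically for given level probabilities. The OneMax values below use the routine estimate s_i ≥ (n − i)/(en) (a standard calculation, not taken from this chapter's tables):

```python
import math

def fitness_level_bound(s):
    """Theorem 46.1: sum of 1/s_i over all non-optimal fitness levels."""
    return sum(1.0 / si for si in s)

# OneMax with n bits: on level i (i ones) an improvement flips one of the
# n - i zero-bits and no other bit, hence s_i >= (n - i)/(e*n)
n = 100
s = [(n - i) / (math.e * n) for i in range(n)]
bound = fitness_level_bound(s)           # = e * n * H_n = O(n log n)
```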
Nevertheless, the fitness-level method often yields tight bounds. Sudholt [63] recently developed a lower-bound method based on fitness levels, which can be used to show that such upper bounds are tight in many cases. Also, Lehre [64] recently presented an extension of the method to non-elitist algorithms. Asymptotically, the same bound as in Theorem 46.1 applies if some additional conditions on the selection pressure and the population size are fulfilled. For the sake of simplicity, we focus on elitist algorithms in the following.
If s i denotes the probability of a single offspring finding an improvement, this probability can be increased by using λ offspring in parallel. We have already seen in Sect. 46.1 how λ independent trials can increase or amplify a success probability p to 1 − (1 − p)^λ. The same reasoning applies to the probability s i of finding an improvement on the current best level. Figure 46.1 has shown how this probability increases with the number of trials. Figure 46.12 shows how the expected time until a success decreases with the number of offspring. In fact, the curves in Fig. 46.12 are just reciprocals of those in the previous Fig. 46.1.

Figure 46.12 shows that the speedup can be close to linear (in a strict, non-asymptotic sense), especially for low success probabilities. As the probability of increasing the current fitness level i is at least 1 − (1 − s i)^λ, we obtain the following.
Theorem 46.2
Consider an elitist EA creating λ offspring independently in each generation. Assume a fitness-level partition A 1, …, A m where A m is the set of global optima. Let s i be a lower bound for the probability that in one generation a single offspring finds a search point in A i+1 ∪ … ∪ A m if the best individual in the parent population is in A i. Then the expected optimization time is bounded by

E(T) ≤ ∑_{i=1}^{m−1} 1/(1 − (1 − s_i)^λ) ≤ (m − 1) + ∑_{i=1}^{m−1} 1/(λ s_i) .

Note that the first bound for λ = 1 reproduces the previous upper bound from Theorem 46.1. For the second bound we used

1/(1 − (1 − s_i)^λ) ≤ 1 + 1/(λ s_i) ,

where the inequality was proposed by Jon Rowe (personal communication, 2011); it can be proven by a simple induction.
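The inequality 1/(1 − (1 − s)^λ) ≤ 1 + 1/(λs), which is the form consistent with the surrounding derivation, can be checked numerically:

```python
def amplified_level_time(s, lam):
    """Expected generations to leave a level when each of lam offspring
    independently succeeds with probability s."""
    return 1.0 / (1.0 - (1.0 - s) ** lam)

# Rowe's inequality: 1/(1-(1-s)^lam) <= 1 + 1/(lam*s)
ok = all(amplified_level_time(s, lam) <= 1.0 + 1.0 / (lam * s) + 1e-9
         for s in (0.001, 0.01, 0.1, 0.5, 0.9)
         for lam in (1, 2, 4, 16, 128))
```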
Our estimate of the probability of an improvement increases with the number of islands on the current best fitness level. In a spatially structured EA these growth curves are non-trivial. Especially with a sparse migration topology, information about the current best fitness level is typically propagated quite slowly. The increased exploration slows down exploitation. Still, even sparse topologies lead to drastically improved upper bounds when compared to the simple bound for a sequential EA from Theorem 46.1. The precise bounds crucially depend on the particular topology.

We first consider a setting where migration always transmits the current best fitness level and migration occurs in every generation. It is possible to adapt the results to account for larger migration intervals. One way of doing this is to redefine s i to represent a lower bound on the probability of finding an improvement in a time period between migrations. Then we obtain an upper bound on the expected number of migrations. For the sake of simplicity, we only consider migration in every generation in the following.
The following theorem was presented in Lässig and Sudholt [6]; it is a refined special case of previous results [4]. The main proof idea is to combine the investigation of growth curves with the consideration of amplified success probabilities.
Theorem 46.3 Lässig and Sudholt [6]
Consider an island model with μ islands where each island runs an elitist EA. In every iteration each island sends copies of its best individual to all neighbored islands. Each island incorporates the best out of its own individuals and its immigrants. For every partition A 1, …, A m, if s i is a lower bound for the probability that in one generation an island in A i finds a search point in A i+1 ∪ … ∪ A m, then the expected parallel optimization time is bounded by:

1. O(∑_{i=1}^{m−1} 1/s_i^{1/2} + (1/μ) ∑_{i=1}^{m−1} 1/s_i) for every unidirectional ring (a ring with edges in one direction) or any other strongly connected topology,
2. O(∑_{i=1}^{m−1} 1/s_i^{1/3} + (1/μ) ∑_{i=1}^{m−1} 1/s_i) for every undirected grid or torus graph with side lengths at least √μ,
3. O(m + (1/μ) ∑_{i=1}^{m−1} 1/s_i) for the complete topology K_μ.
Note that the bound for the complete topology matches the upper bound for offspring populations in Theorem 46.2, with λ = μ. This makes sense, as an island model with a complete topology propagates the current best fitness level in the same way as an offspring population.

All bounds in Theorem 46.3 consist of two additive terms. The second term, (1/μ) ∑_{i=1}^{m−1} 1/s_i, represents a perfect linear speedup compared to the upper bound from Theorem 46.1. The larger we choose the number of islands μ, the smaller this term becomes. The first additive term is related to the growth curves of the current best fitness level in the island model. The denser the topology, the faster information is spread, and the smaller this term becomes. Note that it is independent of μ. It can be regarded as the term limiting the degree of parallelizability: we can increase the number of islands in order to decrease the second term, but we cannot decrease the first term by changing μ.
This allows for immediate conclusions about cases where we obtain an asymptotic linear speedup over a single-island EA. For all choices of μ where the second term is asymptotically no smaller than the first term, the upper bound is smaller than the upper bound from Theorem 46.1 by a factor of order μ. This is an asymptotic linear speedup if the upper bound from Theorem 46.1 is asymptotically tight. (If it is not, we can only compare upper bounds for a sequential and a parallel EA.)

We illustrate this with a simple and well-known test function from pseudo-Boolean optimization. The algorithm considered is an island model where each island runs a (1+1) EA; the island model is also called parallel (1+1) EA. The function LO (LeadingOnes) counts the number of leading ones in the bit string. We choose the canonical partition where A i contains all search points with fitness i, i. e., i leading ones. For any set A i, we use the following lower bound on the probability of an improvement.
An improvement occurs if the first 0-bit is flipped from 0 to 1 and no other bit flips. The probability of flipping the mentioned 0-bit is 1/n, as each bit is flipped independently with probability 1/n. The probability of not flipping any other bit is (1 − 1/n)^{n−1}. We use the common estimate (1 − 1/n)^{n−1} ≥ 1/e, where e denotes Euler's number, so the probability of an improvement is at least 1/(en) for all i. Plugging this into Theorem 46.3, the second term is O(n²/μ) for all bounds. The first terms are O(n^{3/2}) for the ring, O(n^{4/3}) for the torus, and n for the complete graph, respectively.
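Up to the constant factors hidden in the O-terms, these bounds can be tabulated numerically. The helper below mirrors the three bounds of Theorem 46.3 with all constants set to 1, so the absolute numbers are only indicative; the ordering of the topologies is what matters.

```python
import math

def parallel_bounds(s, mu):
    """Theorem 46.3 with all hidden constants set to 1: first term
    (information spread) plus (1/mu) * sum(1/s_i) (linear-speedup part)."""
    seq = sum(1.0 / si for si in s)
    return {
        "ring":     sum(si ** -0.5 for si in s) + seq / mu,
        "torus":    sum(si ** (-1.0 / 3.0) for si in s) + seq / mu,
        "complete": len(s) + seq / mu,
    }

n = 64
s = [1.0 / (math.e * n)] * n            # LeadingOnes: s_i >= 1/(en)
b = parallel_bounds(s, mu=int(n ** 0.5))
```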
For the ring, choosing μ = O(n^{1/2}) islands results in an expected parallel time of O(n^{3/2}), as the second term is then asymptotically no smaller than the first one. This is asymptotically smaller by a factor of n^{1/2} than the expected optimization time Θ(n²) of a single (1+1) EA [42]. Hence, each choice of μ up to order n^{1/2} gives a linear speedup. For the torus we obtain a linear speedup for μ = O(n^{2/3}) in the same fashion. For the complete graph this even holds for μ = O(n). One can see here that the island model can decrease the expected parallel running time by significant polynomial factors.

Table 46.3 lists expected parallel optimization time bounds for several well-known pseudo-Boolean functions. The above analysis for LO generalizes to all unimodal functions. A function is called unimodal here if every non-optimal search point has a better Hamming neighbor, i. e., a better search point can be reached by flipping exactly one specific bit. OneMax counts the number of ones, hence modeling a simple hill climbing task. Finally, Jump_k [42] is a multimodal function of tunable difficulty. An EA typically has to make a jump by flipping k bits simultaneously. The (1+1) EA has an expected optimization time of Θ(n^k) for k ≥ 2, hence growing rapidly with increasing k.

One can see that the island model leads to drastically reduced parallel optimization times. This particularly holds for problems where improvements are hard to find.
We remark that Lässig and Sudholt [4] also considered parallel EAs where migration is not always successful in transmitting information about the current best fitness level. This includes the case where crossover is used during migration and crossover has a certain probability of being disruptive. We do obtain upper bounds on the expected optimization time if we know a lower bound p on the probability of a successful transmission. The bounds depend on p; the degree of this dependence is determined by the topology. For simplicity we only focus on the deterministic case.

5.2 Speedups in Combinatorial Optimization
The techniques are also applicable in combinatorial optimization. We review two examples here, presented in [6]. Scharnow et al. [32] considered the classical sorting problem as an optimization problem: given a sequence of n distinct elements from a totally ordered set, sorting is the problem of maximizing the sortedness of the sequence. Without loss of generality the elements are 1, …, n; then the aim is to find the permutation that arranges them into the sorted sequence 1, 2, …, n.

The search space is the set of all permutations π on 1, …, n. Two different operators are used for mutation. An exchange chooses two indices i and j uniformly at random and exchanges the entries at positions i and j. A jump chooses two indices in the same fashion. The entry at i is put at position j and all entries in between are shifted accordingly. For instance, a jump with i = 2 and j = 5 would turn (1, 2, 3, 4, 5) into (1, 3, 4, 5, 2).
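Both mutation operators are straightforward to implement; the sketch below uses 1-indexed positions to match the text:

```python
def exchange(perm, i, j):
    """Swap the entries at positions i and j (1-indexed, as in the text)."""
    p = list(perm)
    p[i - 1], p[j - 1] = p[j - 1], p[i - 1]
    return p

def jump(perm, i, j):
    """Move the entry at position i to position j; entries in between are
    shifted accordingly (1-indexed)."""
    p = list(perm)
    p.insert(j - 1, p.pop(i - 1))
    return p
```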
The (1+1) EA draws S according to a Poisson distribution with parameter 1 and then performs S + 1 elementary operations. Each operation is either an exchange or a jump, where the decision is made independently and uniformly for each elementary operation. The resulting offspring replaces its parent if its fitness is not worse. The fitness function describes the sortedness of the current permutation π. As in [32], we consider the following measures of sortedness:

- INV measures the number of pairs i < j such that π(i) < π(j) (pairs in correct order),
- HAM measures the number of indices i such that π(i) = i (elements at the correct position),
- LAS equals the largest k such that π contains an ascending subsequence of length k (length of the longest ascending subsequence),
- EXC equals the minimal number of exchanges needed to sort the sequence, leading to a minimization problem.
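These four measures can be computed directly. EXC uses the standard fact that the minimal number of exchanges equals n minus the number of cycles of the permutation:

```python
from bisect import bisect_left
from itertools import combinations

def inv(p):   # pairs in correct relative order
    return sum(p[i] < p[j] for i, j in combinations(range(len(p)), 2))

def ham(p):   # elements at their correct position (p is a permutation of 1..n)
    return sum(p[i] == i + 1 for i in range(len(p)))

def las(p):   # length of the longest ascending subsequence (patience sorting)
    tails = []
    for x in p:
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
        else:
            tails[k] = x
    return len(tails)

def exc(p):   # minimal number of exchanges = n minus the number of cycles
    n, seen, cycles = len(p), [False] * len(p), 0
    for i in range(n):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j] - 1
    return n - cycles
```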
The expected optimization time of the (1+1) EA is O(n² log n) for all of these fitness functions. The upper bound is tight for LAS, and it is believed to be tight for INV, HAM, and EXC as well [32]. Theorem 46.3 yields the results shown in Table 46.4. For INV, all topologies guarantee a linear speedup only for small numbers of islands (assuming the bound for the (1+1) EA is tight). The other functions allow for linear speedups up to considerably larger numbers of islands (again assuming tightness), with admissible ranges for the ring, the torus, and the complete topology K_μ that improve in this order. Note how the results improve with the density of the topology. HAM, LAS, and EXC yield much better guarantees for the island model than INV. This is surprising as there is no visible performance difference for a single (1+1) EA.

An explanation is that INV leads to non-optimal fitness levels that are quite easy to overcome, whereas HAM, LAS, and EXC have only n non-optimal fitness levels that are more difficult. For a single EA both settings are equally difficult, leading to asymptotically equal expected times (assuming all upper bounds are tight). However, the latter setting is easier to parallelize than the former, as it is easier to amplify small success probabilities.

We also consider parallel variants of the (1+1) EA for the single source shortest path problem (SSSP) [32]. An SSSP instance is given by an undirected connected graph with n vertices and a distance matrix D = (d_ij), where d_ij defines the length of the edge from node i to node j. We are searching for shortest paths from a source node s to each other node.

A candidate solution is represented as a shortest path tree, a tree rooted at s with directed paths to all other vertices. We define a search point x as a vector of predecessors, where position i describes the predecessor node x i of node i in the tree. Note that infeasible solutions are possible if the predecessors do not encode a tree. An elementary mutation chooses a vertex i uniformly at random and replaces its predecessor x i by a vertex chosen uniformly at random among the other vertices. We call this a vertex-based mutation. Doerr et al. [65] proposed an edge-based mutation operator: an edge is chosen uniformly at random, and this edge is made the predecessor edge for its endpoint.
The (1+1) EA uses either vertex-based mutations or edge-based ones. It creates an offspring using S + 1 elementary mutations, where S is chosen according to a Poisson distribution with parameter 1. The offspring is accepted if no distance to any vertex has gotten worse. Applying Theorem 46.3 along with a layering argument, as described at the end of Sect. 46.3.4, yields the bounds on the expected parallel optimization time shown in Table 46.5.
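A sketch of the representation and the vertex-based mutation follows. It is simplified in one respect: the predecessor vector below has length n with the entry for s ignored, rather than length n − 1; all instance data are toy values.

```python
import math
import random

def distances(pred, dist, s=0):
    """Distances from s implied by a predecessor vector (pred[v] is the
    predecessor of node v; the entry for s itself is ignored). Nodes whose
    predecessor chain runs into a cycle get distance infinity (infeasible)."""
    def d(v, seen):
        if v == s:
            return 0.0
        if v in seen:
            return math.inf
        seen.add(v)
        return d(pred[v], seen) + dist[pred[v]][v]
    return [d(v, set()) for v in range(len(pred))]

def vertex_mutation(pred, rng, s=0):
    """Pick a node v != s uniformly at random and replace its predecessor
    by a node chosen uniformly at random among the other nodes."""
    child = list(pred)
    n = len(pred)
    v = rng.choice([u for u in range(n) if u != s])
    child[v] = rng.choice([u for u in range(n) if u != v])
    return child

# toy instance: three nodes, cheap path 0-1-2, expensive direct edge 0-2
dist = [[0, 1, 5],
        [1, 0, 1],
        [5, 1, 0]]
tree = [0, 0, 1]        # predecessors encoding the shortest path tree
```

Accepting a mutated vector only if no entry of `distances` gets worse mirrors the acceptance rule described above.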
The upper bounds for the island models with constant μ match the expected time of the (1+1) EA for extreme values of the maximum number of edges on any shortest path; in the remaining cases, the upper bounds are off by at most a small factor. Table 46.5 also shows a range of μ-values for which the speedup is linear or almost linear, that is, linear when disregarding a small factor.

Note how the possible speedups significantly increase with the density of the topology. The speedups also depend on the graph instance and on the maximum number of edges on any shortest path. For a single (1+1) EA, edge-based mutations are more effective than vertex-based mutations [65]. Island models with edge-based mutations cannot be parallelized as effectively as those with vertex-based mutations if the graph is sparse, i. e., if it contains few edges. Then the number of islands that guarantees a linear speedup is smaller for edge-based mutations than for vertex-based mutations. The reason is that with a more efficient mutation operator there is less potential for further speedups with a parallel EA.

5.3 Adaptive Numbers of Islands
Theorem 46.3presents a powerful tool for determining the number of islands that give an asymptotic linear speedup. However, it would be even more desirable to have an adaptive system that automatically finds the ideal number of islands throughout the
In [5], Lässig and Sudholt proposed and analyzed two simple adaptive schemes for choosing the number of islands. Both schemes check whether, in the current generation, some island has found an improvement over the current best fitness in the system. If no island has found an improvement, the number of islands is doubled. This can be implemented, for instance, by copying each island; new processors can be allocated to host these islands in large clusters or by using cloud computing.
If some island has found an improvement, the number of islands is reduced by removing selected islands from the system and de-allocating resources. The two schemes differ in how they decrease the number of islands. The first scheme, simply called Scheme A, keeps only one island containing a current best solution. Scheme B halves the number of islands. Both schemes use complete topologies, so all remaining islands contain current best individuals.
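The doubling and shrinking logic is easy to sketch. The following toy simulation is a construction for illustration only, not the authors' setup: it runs an island model of (1+1) EA-style islands on ONEMAX with a complete topology, doubling the number of islands after each unimproving generation and shrinking it after improvements according to Scheme A or B.

```python
import random

def one_max(x):
    return sum(x)

def run_adaptive_island_model(n=50, scheme="B", seed=0):
    """Adaptive island model on ONEMAX with a complete topology.

    Returns (parallel time, sequential time) measured as
    (generations, function evaluations)."""
    rng = random.Random(seed)

    def mutate(x):
        # Standard bit mutation: flip each bit independently with prob. 1/n.
        return [b ^ (rng.random() < 1.0 / n) for b in x]

    best = [rng.randint(0, 1) for _ in range(n)]
    mu = 1                       # current number of islands
    generations = evaluations = 0
    while one_max(best) < n:
        generations += 1
        evaluations += mu
        # Complete topology: every island starts from the current best.
        offspring = [mutate(best) for _ in range(mu)]
        champion = max(offspring, key=one_max)
        if one_max(champion) > one_max(best):
            best = champion
            # Scheme A resets to a single island, Scheme B halves.
            mu = 1 if scheme == "A" else max(1, mu // 2)
        else:
            mu *= 2              # no improvement: double the islands
    return generations, evaluations
```

Since the island count doubles only after an unsuccessful generation, it tracks the difficulty of the current fitness level, which matches the intuition behind the analysis in [5] that far too many islands are rarely allocated.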
Both mechanisms lead to optimal speedups in many cases. Doubling the number of islands may seem aggressive, but the analysis shows that the probability of allocating far more islands than necessary is very small. The authors considered the expected sequential optimization time, defined as the number of function evaluations, to measure the total effort over time. With both schemes, it is guaranteed that the expected sequential time asymptotically does not exceed the simple bound for a sequential EA from Theorem 46.1. The expected parallel times on each fitness level can, roughly speaking, be replaced by their logarithms. The following is a slight simplification of the results in [5].
Theorem 46.4 Lässig and Sudholt [5]
Given an f-based partition A_1, …, A_m and lower bounds s_1, …, s_{m-1} on the probability of a single island finding an improvement, the expected sequential times for island models using a complete topology and either Scheme A or Scheme B are bounded by O(∑_{i=1}^{m-1} 1/s_i).
If each set A_i contains only a single fitness value, then also the expected parallel time is bounded by O(∑_{i=1}^{m-1} (1 + log(1/s_i))).
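To get a feeling for the gap between the sequential fitness-level bound and its logarithmic parallel counterpart, the following sketch evaluates both sums for ONEMAX, using the standard success probabilities s_i = (n − i)/(en). The exact constants in the theorem differ; this only illustrates the asymptotic gap between Θ(n log n) and Θ(n).

```python
import math

def fitness_level_bounds(success_probs):
    """Fitness-level sums: sequential bound sum(1/s_i) versus a
    logarithmic parallel-style bound sum(1 + ln(1/s_i))."""
    sequential = sum(1.0 / s for s in success_probs)
    parallel = sum(1.0 + math.log(1.0 / s) for s in success_probs)
    return sequential, parallel

# ONEMAX on n bits: from a search point with i ones, a mutation creating
# a strictly better search point has probability at least (n - i)/(e n).
n = 100
s = [(n - i) / (math.e * n) for i in range(n)]
seq, par = fitness_level_bounds(s)
# seq grows like e * n * ln(n)   -- Theta(n log n)
# par grows like 3n              -- Theta(n)
```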
Actually, for Scheme A one can obtain slightly better constants than those stated in Theorem 46.4. However, a more detailed analysis shows that Scheme B can perform much better than Scheme A. Lässig and Sudholt's work [5] contains a more refined upper bound for Scheme B. We only show a special case where the fitness levels become increasingly harder. Then it makes sense to only halve the number of islands when an improvement is found, instead of resetting their number to 1.
Theorem 46.5 Lässig and Sudholt [5]
Given an f-based partition A_1, …, A_m, where each set A_i contains only a single fitness value and the probability bounds satisfy s_1 ≥ s_2 ≥ ⋯ ≥ s_{m-1}. Then the expected parallel running time for an island model using a complete topology and Scheme B is bounded by O(m + log(1/s_{m-1})).
Example applications for a parallel EA in Table 46.6 show that Scheme B can automatically lead to the same speedups as an optimally chosen fixed number of islands. This holds for ONEMAX, LO, and the general bound for unimodal functions, and it also holds for the remaining example in the most relevant parameter regimes. We conclude that simply doubling or halving the number of islands represents a simple and effective mechanism for finding optimal parameters.
6 Conclusions
Parallel evolutionary algorithms can effectively reduce computation time and, at the same time, lead to increased exploration and better diversity compared to sequential evolutionary algorithms.
We have surveyed various forms of parallel EAs, from independent runs to island models and cellular EAs. Different lines of research have been discussed that give insight into the working principles behind parallel EAs, including the spread of information, growth curves for current best solutions, and takeover times.

A recurring theme was the possible speedup that can be achieved with parallel EAs. We have elaborated on the reasons why superlinear speedups are possible in practice. Rigorous runtime analysis has given examples where parallel EAs excel over sequential algorithms with regard to the number of generations or the number of function evaluations until a global optimum is found. The final section has covered a method for estimating the expected parallel optimization time of island models. The method is easy to apply, as existing analyses for sequential EAs transfer automatically to their parallel counterparts. Examples have been given for pseudo-Boolean optimization and combinatorial optimization. The results have also led to the discovery of a simple, yet surprisingly powerful adaptive scheme for choosing the number of islands.

There are many possible avenues for future work. In light of the developments in computer architecture, it is important to develop parallel EAs that can run effectively on many cores. It also remains a crucial issue to increase our understanding of how design choices and parameters affect the performance of parallel EAs. Rigorous runtime analysis has recently emerged as a new line of research that can give novel insights in this respect and open new roads. The present results should be extended towards further algorithms, further problems, and more detailed cost models that reflect the costs of communication in parallel architectures. It would also be interesting to derive further rigorous results on takeover times in settings where propagation through migration is probabilistic. Finally, it is important to bring theory and practice together in order to create synergies between the two.

6.1 Further Reading
This book chapter does not claim to be comprehensive. In fact, parallel evolutionary algorithms represent a vast research area with a long history: early variants were developed, studied, and applied more than 20 years ago. We therefore point the reader to references that may complement this chapter. Cantú-Paz [66] presented a review of early literature and the history of parallel EAs. The survey by Alba and Troya [37] contains detailed overviews of parallel EAs and their characteristics.

This chapter does not cover implementation details of parallel evolutionary algorithms; for these we refer to the excellent survey by Alba and Tomassini [38]. That survey also includes an overview of the theory of parallel EAs; its emphasis differs from this chapter's, so it can serve as a complement.

Tomassini's textbook [67] describes various forms of parallel EAs such as island models, cellular EAs, and coevolution. It also presents many mathematical and experimental results that help in understanding how parallel EAs work. Furthermore, it contains an appendix dealing with the implementation of parallel EAs.

The book edited by Alba et al. [39] takes a broader scope on parallel models, also covering parallel evolutionary multiobjective optimization and parallel variants of swarm intelligence algorithms such as particle swarm optimization and ant colony optimization. The book contains a part on parallel hardware as well as a number of applications of parallel metaheuristics.

Alba's edited book on parallel metaheuristics [40] has an even broader scope. It covers parallel variants of many common metaheuristics such as genetic algorithms, genetic programming, evolution strategies, ant colony optimization, estimation-of-distribution algorithms, scatter search, variable-neighborhood search, simulated annealing, tabu search, greedy randomized adaptive search procedures (GRASPs), hybrid metaheuristics, multiobjective optimization, and heterogeneous metaheuristics.

The most recent textbook was written by Luque and Alba [19]. It provides an excellent introduction to the field, with hands-on advice on how to present results for parallel EAs. Theoretical models of selection pressure in distributed GAs are presented, and a large part of the book reviews selected applications of parallel GAs.

Abbreviations
- ACO: ant colony optimization
- CPU: central processing unit
- DNA: deoxyribonucleic acid
- EA: evolutionary algorithm
- GA: genetic algorithm
- GPU: graphics processing unit
- GRASP: greedy randomized adaptive search procedure
- LO: leading ones
- LZ: leading zeros
- PRAS: polynomial-time randomized approximation scheme
- RLS: randomized local search
- SSSP: single-source shortest path problem
References
P.S. Oliveto, J. He, X. Yao: Time complexity of evolutionary algorithms for combinatorial optimization: A decade of results, Int. J. Autom. Comput. 4(3), 281–293 (2007)
F. Neumann, C. Witt: Bioinspired Computation in Combinatorial Optimization – Algorithms and Their Computational Complexity (Springer, Berlin, Heidelberg 2010)
J. Lässig, D. Sudholt: The benefit of migration in parallel evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO 2010) (ACM, New York 2010) pp. 1105–1112
J. Lässig, D. Sudholt: General scheme for analyzing running times of parallel evolutionary algorithms, 11th Int. Conf. Parallel Probl. Solving Nat. (PPSN 2010) (Springer, Berlin, Heidelberg 2010) pp. 234–243
J. Lässig, D. Sudholt: Adaptive population models for offspring populations and parallel evolutionary algorithms, Proc. 11th Workshop Found. Genet. Algorithms (FOGA 2011) (ACM, Berlin, Heidelberg 2011) pp. 181–192
J. Lässig, D. Sudholt: Analysis of speedups in parallel evolutionary algorithms for combinatorial optimization, 22nd Int. Symp. Algorithms Comput. (ISAAC '11) (Springer, Berlin, Heidelberg 2011) pp. 405–414
F. Neumann, P.S. Oliveto, G. Rudolph, D. Sudholt: On the effectiveness of crossover for migration in parallel evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO 2011) (ACM, New York 2011) pp. 1587–1594
M. De Felice, S. Meloni, S. Panzieri: Effect of topology on diversity of spatially-structured evolutionary algorithms, Proc. 13th Annu. Genet. Evol. Comput. Conf. (GECCO '11) (2011) pp. 1579–1586
M. Giacobini, M. Tomassini, A. Tettamanzi: Takeover time curves in random and small-world structured populations, Proc. Genet. Evol. Comput. Conf. (GECCO '05) (ACM, New York 2005) pp. 1333–1340
Z. Skolicki: An Analysis of Island Models in Evolutionary Computation, Ph.D. Thesis (George Mason University, Fairfax 2000)
E. Alba, M. Giacobini, M. Tomassini, S. Romero: Comparing Synchronous and Asynchronous Cellular Genetic Algorithms, Parallel Problem Solving from Nature VII (Springer, Berlin, Heidelberg 2002) pp. 601–610
M. Mitzenmacher, E. Upfal: Probability and Computing (Cambridge Univ. Press, Cambridge 2005)
J. Sprave: A unified model of non-panmictic population structures in evolutionary algorithms, Proc. 1999 Congr. Evol. Comput. (IEEE, Bellingham 1999) pp. 1384–1391
E. Alba: Parallel evolutionary algorithms can achieve super-linear performance, Inf. Process. Lett. 82(1), 7–13 (2002)
R.S. Barr, B.L. Hickman: Reporting computational experiments with parallel algorithms: Issues, measures, and experts' opinion, ORSA J. Comput. 5(1), 2–18 (1993)
D.E. Goldberg, K. Deb: A comparative analysis of selection schemes used in genetic algorithms. In: Foundations of Genetic Algorithms, ed. by G.J.E. Rawlins (Morgan Kaufmann, Burlington 1991) pp. 69–93
J. Sarma, K. De Jong: An analysis of local selection algorithms in a spatially structured evolutionary algorithm, Proc. 7th Int. Conf. Genet. Algorithms (Morgan Kaufmann, Burlington 1997) pp. 181–186
E. Alba, G. Luque: Growth curves and takeover time in distributed evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (Springer, Berlin, Heidelberg 2004) pp. 864–876
G. Luque, E. Alba: Parallel Genetic Algorithms – Theory and Real World Applications, Studies in Computational Intelligence, Vol. 367 (Springer, Berlin, Heidelberg 2011)
Z. Skolicki, K.A. De Jong: The influence of migration sizes and intervals on island models, Proc. Genet. Evol. Comput. Conf. (GECCO '05) (ACM, New York 2005) pp. 1295–1302
M. Giacobini, E. Alba, M. Tomassini: Selection intensity in asynchronous cellular evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO '03) (Springer, Berlin, Heidelberg 2003) pp. 955–966
G. Rudolph: Takeover times and probabilities of non-generational selection rules, Proc. Genet. Evol. Comput. Conf. (GECCO '00) (Morgan Kaufmann, Burlington 2000) pp. 903–910
G. Rudolph: Takeover times of noisy non-generational selection rules that undo extinction, Proc. 5th Int. Conf. Artif. Neural Nets Genet. Algorithms (ICANNGA 2001) (Springer, Berlin, Heidelberg 2001) pp. 268–271
G. Rudolph: On takeover times in spatially structured populations: Array and ring, Proc. 2nd Asia-Pac. Conf. Genet. Algorithms Appl. (Global-Link Publishing, Hong Kong 2000) pp. 144–151
G. Rudolph: Takeover time in parallel populations with migration, Proc. 2nd Int. Conf. Bioinspired Optim. Methods Appl. (BIOMA 2006), ed. by B. Filipic, J. Silc (2006) pp. 63–72
M. Giacobini, M. Tomassini, A. Tettamanzi: Modelling selection intensity for linear cellular evolutionary algorithms, Proc. 6th Int. Conf. Artif. Evol., Evol. Artif. (Springer, Berlin, Heidelberg 2003) pp. 345–356
M. Giacobini, E. Alba, A. Tettamanzi, M. Tomassini: Selection intensity in cellular evolutionary algorithms for regular lattices, IEEE Trans. Evol. Comput. 9, 489–505 (2005)
C. Witt: Runtime analysis of the $(\mu+1)$EA on simple pseudo-Boolean functions, Evol. Comput. 14(1), 65–86 (2006)
D. Sudholt: The impact of parametrization in memetic evolutionary algorithms, Theor. Comput. Sci. 410(26), 2511–2528 (2009)
M. Giacobini, E. Alba, A. Tettamanzi, M. Tomassini: Modeling selection intensity for toroidal cellular evolutionary algorithms, Proc. Genet. Evol. Comput. Conf. (GECCO '04) (Springer, Berlin, Heidelberg 2004) pp. 1138–1149
J. Rowe, B. Mitavskiy, C. Cannings: Propagation time in stochastic communication networks, 2nd IEEE Int. Conf. Digit. Ecosyst. Technol. (2008) pp. 426–431
J. Scharnow, K. Tinnefeld, I. Wegener: The analysis of evolutionary algorithms on sorting and shortest paths problems, J. Math. Model. Algorithms 3(4), 349–366 (2004)
B. Doerr, E. Happ, C. Klein: Crossover can provably be useful in evolutionary computation, Theor. Comput. Sci. 425, 17–33 (2012)
B. Doerr, E. Happ, C. Klein: A tight analysis of the $(1+1)$-EA for the single source shortest path problem, Proc. IEEE Congr. Evol. Comput. (CEC '07) (IEEE, Bellingham 2007) pp. 1890–1895
C. Horoba, D. Sudholt: Ant colony optimization for stochastic shortest path problems, Proc. Genet. Evol. Comput. Conf. (GECCO 2010) (ACM, New York 2010) pp. 1465–1472
D. Sudholt, C. Thyssen: Running time analysis of ant colony optimization for shortest path problems, J. Discret. Algorithms 10, 165–180 (2012)
E. Alba, J.M. Troya: A survey of parallel distributed genetic algorithms, Complexity 4, 31–52 (1999)
E. Alba, M. Tomassini: Parallelism and evolutionary algorithms, IEEE Trans. Evol. Comput. 6, 443–462 (2002)
E. Alba, N. Nedjah, L. de Macedo Mourelle: Parallel Evolutionary Computations (Springer, Berlin, Heidelberg 2006)
E. Alba: Parallel Metaheuristics: A New Class of Algorithms (Wiley-Interscience, New York 2005)
T.G. Crainic, N. Hail: Parallel metaheuristics applications. In: Parallel Metaheuristics: A New Class of Algorithms (Wiley-Interscience, New York 2005)
S. Droste, T. Jansen, I. Wegener: On the analysis of the $(1+1)$ evolutionary algorithm, Theor. Comput. Sci. 276, 51–81 (2002)
T. Friedrich, P.S. Oliveto, D. Sudholt, C. Witt: Analysis of diversity-preserving mechanisms for global exploration, Evol. Comput. 17(4), 455–476 (2009)
C. Witt: Worst-case and average-case approximations by simple randomized search heuristics, Proc. 22nd Symp. Theor. Asp. Comput. Sci. (STACS '05) (Springer, Berlin, Heidelberg 2005) pp. 44–56
T. Jansen, K.A. De Jong, I. Wegener: On the choice of the offspring population size in evolutionary algorithms, Evol. Comput. 13, 413–440 (2005)
C. Igel, M. Toussaint: A no-free-lunch theorem for non-uniform distributions of target functions, J. Math. Model. Algorithms 3(4), 313–322 (2004)
J. Lässig, D. Sudholt: Experimental supplements to the theoretical analysis of migration in the island model, 11th Int. Conf. Parallel Probl. Solving Nat. (PPSN 2010) (Springer, Berlin, Heidelberg 2010) pp. 224–233
F. Neumann: Expected runtimes of evolutionary algorithms for the Eulerian cycle problem, Comput. Oper. Res. 35(9), 2750–2759 (2008)
B. Doerr, N. Hebbinghaus, F. Neumann: Speeding up evolutionary algorithms through asymmetric mutation operators, Evol. Comput. 15, 401–410 (2007)
B. Doerr, D. Johannsen: Adjacency list matchings – An ideal genotype for cycle covers, Proc. Genet. Evol. Comput. Conf. (GECCO '07) (ACM, New York 2007) pp. 1203–1210
B. Doerr, C. Klein, T. Storch: Faster evolutionary algorithms by superior graph representation, 1st IEEE Symp. Found. Comput. Intell. (FOCI '07) (2007) pp. 245–250
R.A. Watson, T. Jansen: A building-block royal road where crossover is provably essential, Proc. Genet. Evol. Comput. Conf. (GECCO '07) (ACM, New York 2007) pp. 1452–1459
T. Jansen, I. Wegener: On the analysis of evolutionary algorithms – A proof that crossover really can help, Algorithmica 34(1), 47–66 (2002)
T. Jansen, I. Wegener: Real royal road functions – Where crossover provably is essential, Discret. Appl. Math. 149, 111–125 (2005)
T. Storch, I. Wegener: Real royal road functions for constant population size, Theor. Comput. Sci. 320, 123–134 (2004)
S. Fischer, I. Wegener: The one-dimensional ising model: Mutation versus recombination, Theor. Comput. Sci. 344(2/3), 208–225 (2005)
D. Sudholt: Crossover is provably essential for the ising model on trees, Proc. Genet. Evol. Comput. Conf. (GECCO '05) (ACM, New York 2005) pp. 1161–1167
P.S. Oliveto, J. He, X. Yao: Analysis of the $(1+1)$-EA for finding approximate solutions to vertex cover problems, IEEE Trans. Evol. Comput. 13(5), 1006–1029 (2009)
T. Jansen, P.S. Oliveto, C. Zarges: On the analysis of the immune-inspired B-cell algorithm for the vertex cover problem, Proc. 10th Int. Conf. Artif. Immune Syst. (ICARIS 2011) (Springer, Berlin, Heidelberg 2011) pp. 117–131
I. Wegener: Methods for the analysis of evolutionary algorithms on pseudo-Boolean functions. In: Evolutionary Optimization, ed. by R. Sarker, X. Yao, M. Mohammadian (Kluwer, Dordrecht 2002) pp. 349–369
F. Neumann, I. Wegener: Randomized local search, evolutionary algorithms, and the minimum spanning tree problem, Theor. Comput. Sci. 378(1), 32–40 (2007)
D. Sudholt, C. Zarges: Analysis of an iterated local search algorithm for vertex coloring, 21st Int. Symp. Algorithms Comput. (ISAAC 2010) (Springer, Berlin, Heidelberg 2010) pp. 340–352
D. Sudholt: General lower bounds for the running time of evolutionary algorithms, 11th Int. Conf. Parallel Probl. Solving Nat. (PPSN 2010) (Springer, Berlin, Heidelberg 2010) pp. 124–133
P.K. Lehre: Fitness-levels for non-elitist populations, Proc. 13th Annu. Genet. Evol. Comput. Conf. (GECCO '11) (ACM, New York 2011) pp. 2075–2082
B. Doerr, D. Johannsen, C. Winzen: Drift analysis and linear functions revisited, IEEE Congr. Evol. Comput. (CEC '10) (2010) pp. 1967–1974
E. Cantú-Paz: A survey of parallel genetic algorithms, Tech. Rep. (Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana 1997)
M. Tomassini: Spatially Structured Evolutionary Algorithms: Artificial Evolution in Space and Time (Springer, Berlin, Heidelberg 2005)
© 2015 Springer-Verlag Berlin Heidelberg

Sudholt, D. (2015). Parallel Evolutionary Algorithms. In: Kacprzyk, J., Pedrycz, W. (eds) Springer Handbook of Computational Intelligence. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43505-2_46

Print ISBN: 978-3-662-43504-5
Online ISBN: 978-3-662-43505-2