9.1 Introduction and Historical Notes

The generic denomination of ‘Memetic Algorithms’ (MAs) [135] is used to encompass a broad class of metaheuristics, understanding the latter as high-level templates that orchestrate the functioning of low-level rules and heuristics. The method, which is based on a population of agents, has had practical success in a variety of problem domains, in particular in the heuristic resolution of NP-hard optimization problems.

Unlike traditional evolutionary computation (EC) methods, MAs are intrinsically concerned with exploiting all available knowledge about the problem under study. The incorporation of problem domain knowledge is not an optional mechanism, but a fundamental feature that characterizes MAs. This functioning philosophy is perfectly illustrated by the term “memetic”. Coined by Dawkins [40], the word ‘meme’ denotes the analog of the gene in the context of cultural evolution [116]. In Dawkins’ words:

Examples of memes are tunes, ideas, catch-phrases, clothes fashions, ways of making pots or of building arches. Just as genes propagate themselves in the gene pool by leaping from body to body via sperms or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain via a process which, in the broad sense, can be called imitation.

This characterization of a meme suggests that in cultural evolution processes, information is not simply transmitted unaltered between individuals. Rather, it is processed and enhanced by the communicating parties. This enhancement is accomplished in MAs by incorporating heuristics, approximation algorithms, local search techniques, specialized recombination operators, truncated exact methods, etc. In essence, most MAs can be interpreted as search strategies in which a population of optimizing agents cooperate and compete [144]. The success of MAs is probably a direct consequence of the synergy among the different search approaches they incorporate.

The most crucial and distinctive feature of MAs, the inclusion of problem knowledge, is also supported by strong theoretical results. As Hart and Belew [68] initially stated and Wolpert and Macready [190] later popularized in the so-called No-Free-Lunch Theorem, search algorithms perform strictly in accordance with the amount and quality of the problem knowledge they incorporate. More precisely, the theorem establishes that the performance of any search algorithm is indistinguishable on average from that of any other one when all possible problems are considered, a scenario that captures the lack of knowledge about the target problem (this very broad assumption can be challenged [45]; this said, similar results can be found for more restricted scenarios [78, 165]). The quest for universal solvers is thus futile [36]: using and exploiting problem knowledge is a requirement for attaining efficient problem solvers [116]. Given that the term hybridization is often used to denote the process of incorporating problem knowledge (because it is accomplished by combining or hybridizing concepts from different resolution algorithms [39]), it is not surprising that MAs are sometimes called ‘Hybrid Evolutionary Algorithms’ (hybrid EAs) as well.

One of the first algorithms to which the MA label was assigned dates back to 1988 [144], and was regarded by many as a hybrid of traditional Genetic Algorithms (GAs) and Simulated Annealing (SA). Part of the initial motivation was to find a way out of the limitations of both techniques on a well-studied combinatorial optimization problem, the Min Euclidean Traveling Salesman Problem (Min ETSP); the reader interested in the historical circumstances of the initial developments in this field is directed to a personal and very detailed account in [119]. According to the authors, the original inspiration came from computer game tournaments [72] used to study “the evolution of cooperation” [4, 130]. That approach had several features which anticipated many algorithms in practice today. The competitive phase of the algorithm was based on the new allocation of search points in the configuration space, a process involving a “battle” for survival followed by the so-called “cloning”, which has a strong similarity with ‘go with the winners’ algorithms [1, 150]. The cooperative phase followed by local search may be better named “go-with-the-local-winners”, since the topological arrangement of the optimizing agents was a two-dimensional toroidal lattice. After initial computer experiments, an insight was derived on the particular relevance of the “spatial” organization, when coupled with an appropriate set of rules, for the overall performance of population search processes. A few months later, Moscato and Norman discovered that they shared similar views with other researchers [61, 126], including authors proposing “island models” for GAs. Spatialization is now recognized as the “catalyzer” responsible for a variety of phenomena [129, 130]. This is an important research issue, currently understood only in a rather heuristic way. However, some proper undecidability results have been obtained for related problems [63], giving some hope for a more formal treatment.

Less than a year later, in 1989, Moscato and Norman identified several authors who were also pioneering the introduction of heuristics to improve solutions before recombining them [60, 127] (see other references and the discussion in [116]). Coming particularly from the GA field, several authors were introducing problem-domain knowledge in a variety of ways. In [116] the denomination of ‘memetic algorithms’ was introduced for the first time. It was also suggested that cultural evolution could be a better working metaphor for these metaheuristics, avoiding the “biologically constrained” thinking that was restricting progress at that time.

Thirty years later, albeit unfortunately under different names, MAs have become an important optimization approach, with several successes in a variety of classical NP-hard optimization problems. We aim to provide an updated and self-contained introduction to MAs, focusing on their technical innards and formal features, but without losing sight of their practical applications and open research issues.

9.2 Memetic Algorithms

Before proceeding to the description of MAs, it is necessary to provide some basic concepts and definitions. Several notions introduced in the first subsection are strongly related to the field of computational complexity. Nevertheless, we approach them in a slightly different way, more oriented toward the subsequent developments in the chapter. These basic concepts will give rise to the notions of local search and population-based search, upon which MAs are founded. This latter class of search sets the scenario for recombination, a crucial mechanism in the functioning of MAs that will be studied in some depth. Finally, a basic algorithmic template and some guidelines for designing MAs will be presented.

9.2.1 Basic Concepts

An algorithm is a detailed step-by-step procedure for solving a computational problem. A computational problem P denotes a class of algorithmically-doable tasks, and it has an input domain set of instances denoted \(I_{P}\). For each instance \(x \in I_{P}\), there is an associated set \(sol_{P}(x)\) which denotes the feasible solutions for problem P given instance x. The set \(sol_{P}(x)\) is also known as the set of acceptable or valid solutions.

We are expected to deliver an algorithm that solves problem P; this means that our algorithm, given instance \(x \in I_{P}\), must return at least one element y from a set of answers \(ans_{P}(x)\) (also called given solutions) that satisfies the requirements of the problem. This is the first design issue to face. To be precise, depending on the kind of answers expected, computational problems can be classified into different categories; for example:

  • finding all solutions in \(sol_{P}(x)\), i.e., enumeration problems.

  • counting how many solutions exist in \(sol_{P}(x)\), i.e., counting problems.

  • determining whether the set \(sol_{P}(x)\) is empty or not, i.e., decision problems.

  • finding a solution in \(sol_{P}(x)\) maximizing or minimizing a given function, i.e., optimization problems.

In this chapter, we will focus on the last possibility, that is, a problem will be considered solved by finding a feasible solution \(y \in sol_{P}(x)\) which is optimal or by giving an indication that no such feasible solution exists. It is thus convenient in many situations to define a Boolean feasibility function \(feasible_{P}(x,y)\) in order to identify whether a given solution \(y \in ans_{P}(x)\) is acceptable for an instance \(x \in I_{P}\) of a computational problem P, i.e., to check whether \(y \in sol_{P}(x)\).

An algorithm is said to solve problem P if it can fulfill this condition for any given instance \(x \in I_{P}\). This definition is certainly too broad, so a more restrictive characterization for our problems of interest is necessary. This characterization is provided by restricting ourselves to the so-called combinatorial optimization problems. These constitute a special subclass of computational problems in which for each instance \(x \in I_{P}\):

  • the cardinality of \(sol_{P}(x)\) is finite.

  • each solution \(y \in sol_{P}(x)\) has an integer goodness value \(m_{P}(y,x)\), obtained by means of an associated objective function \(m_{P}\).

  • a partial order \(\prec _{P}\) is defined over the set of goodness values returned by the objective function, thus allowing one to determine which of two goodness values is preferable.

An instance \(x \in I_{P}\) of a combinatorial optimization problem P is solved by finding the best solution \(y^{*} \in sol_{P}(x)\), i.e., finding a solution \(y^{*}\) such that no other solution \(y \prec _{P} y^{*}\) exists if \(sol_{P}(x)\) is not empty. It is very common to have \(\prec _{P}\) define a total order. In this case, the best solution is the one that maximizes (or minimizes) the objective function.

As an example of a combinatorial optimization problem, consider the 0–1 Multiple Knapsack Problem (0–1 MKP). Each instance x of this problem is defined by a vector of profits \(V =\{ v_{0},\cdots \,,v_{n-1}\}\), a vector of capacities \(C =\{ c_{0},\cdots \,,c_{m-1}\}\), and a matrix of capacity constraint coefficients \(M =\{ m_{ij}: 0\leqslant i <m,\ 0\leqslant j <n\}\). Intuitively, the problem consists of selecting a set of objects so as to maximize the profit of this set without violating the capacity constraints. If the objects are indexed with the elements of the set \(\mathbb{N}_{n} =\{ 0,1,\cdots \,,n - 1\}\), the answer set \(ans_{P}(x)\) for an instance x is simply the power set of \(\mathbb{N}_{n}\), that is, each subset of \(\mathbb{N}_{n}\) is a possible answer. Furthermore, the set of feasible answers \(sol_{P}(x)\) is composed of those subsets whose incidence vector B verifies \(M \cdot B\leqslant C\). Finally, the objective function is defined as \(m_{P}(y,x) =\sum _{i\in y}v_{i}\), i.e., the sum of profits for all selected objects, the goal being to maximize this value.
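
To make these definitions concrete, the following Python sketch encodes a small 0–1 MKP instance together with the feasibility test \(M \cdot B\leqslant C\) and the objective function defined above; the instance data and all identifiers are illustrative.

```python
# A minimal sketch of the 0-1 MKP formalization above; instance data illustrative.
def feasible(y, M, C):
    """Check the capacity constraints M*B <= C, with B the incidence vector of y."""
    return all(sum(M[i][j] for j in y) <= C[i] for i in range(len(C)))

def objective(y, V):
    """m_P(y, x): the sum of profits over the selected objects."""
    return sum(V[j] for j in y)

V = [10, 7, 12, 4]          # profits, one per object
C = [5, 6]                  # knapsack capacities
M = [[2, 3, 4, 1],          # capacity-constraint coefficients m_ij
     [3, 1, 3, 2]]
y = {0, 3}                  # a candidate answer: a subset of N_4 = {0, 1, 2, 3}
assert feasible(y, M, C) and objective(y, V) == 14
```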

Notice that a decision version can be associated with a combinatorial optimization problem. To formulate the decision problem, an integer goodness value K is considered and, instead of trying to find the best solution of instance x, we ask whether x has a solution whose goodness is equal to or better than K. In the above example, we could ask whether a feasible solution y exists whose associated profit is equal to or better than K.

9.2.2 Search Landscapes

As mentioned above, once the concept of combinatorial optimization problem has been defined, the goal is to find at least one of the optimal solutions for a given instance. For this purpose, a search algorithm must be used. Before discussing search algorithms, three entities must be introduced: the search space, the neighborhood relation, and the guiding function. It is important to note that, for any given computational problem, these three entities can be instantiated in several ways, giving rise to different optimization tasks.

Let us start by defining the concept of search space for a combinatorial problem P. To do so, we consider a set \(\mathcal{S}_{P}(x)\), whose elements must satisfy the following requirements:

  • Each element \(s \in \mathcal{S}_{P}(x)\) represents at least one answer in \(ans_{P}(x)\).

  • For decision problems: at least one element of \(sol_{P}(x)\) that stands for a ‘Yes’ answer must be represented by one element in \(\mathcal{S}_{P}(x)\).

  • For optimization problems: at least one optimal element y of \(sol_{P}(x)\) must be represented by one element in \(\mathcal{S}_{P}(x)\).

Each element of \(\mathcal{S}_{P}(x)\) is called a configuration. It is related to an answer in \(ans_{P}(x)\) by a growth function \(g: \mathcal{S}_{P}(x) \rightarrow ans_{P}(x)\). Note that the first requirement refers to \(ans_{P}(x)\) and not to \(sol_{P}(x)\), i.e., some configurations in the search space may correspond to infeasible solutions. Thus, the search algorithm may need to be prepared to deal with this fact. If these requirements are fulfilled, we say that we have a valid representation or valid formulation of the problem. For simplicity, we will just write \(\mathcal{S}\) to refer to \(\mathcal{S}_{P}(x)\) when x and P are clear from the context. People using biologically-inspired metaphors like to call \(\mathcal{S}_{P}(x)\) the genotype space and \(ans_{P}(x)\) the phenotype space, so we appropriately refer to g as the growth function.

To illustrate this notion of search space, consider again the case of the 0–1 MKP. Since solutions in \(ans_{P}(x)\) are subsets of \(\mathbb{N}_{n}\), we can define the search space as the set of n-dimensional binary vectors. Each vector will represent the incidence vector of a certain subset, i.e., the growth function g is defined as \(g(s) = g(b_{0}b_{1}\cdots b_{n-1}) =\{ i\,\vert \,b_{i} = 1\}\). As mentioned above, many binary vectors may correspond to infeasible sets of objects. Another possibility is to define the search space as the set of permutations of the elements of \(\mathbb{N}_{n}\) [62]. In this case, the growth function may consist of applying a greedy construction algorithm that considers objects in the order provided by the permutation. Unlike the binary search space previously mentioned, all configurations represent feasible solutions in this case.
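
Both growth functions can be sketched as follows (a hedged illustration; M and C denote the constraint matrix and capacity vector of the instance, as in the earlier sketch):

```python
# Sketches of the two growth functions g: S -> ans_P(x) discussed above.
def grow_binary(s):
    """Binary representation: g(b_0 b_1 ... b_{n-1}) = {i | b_i = 1}."""
    return {i for i, b in enumerate(s) if b == 1}

def grow_permutation(perm, M, C):
    """Permutation representation: greedily add objects in the order given by
    the permutation, skipping any object that would violate a constraint."""
    y = set()
    for j in perm:
        if all(sum(M[i][k] for k in y | {j}) <= C[i] for i in range(len(C))):
            y.add(j)
    return y
```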

The role of the search space is to provide a “ground” where the search algorithm will act. Important properties of the search space that affect the dynamics of the search algorithm are related to the accessibility relationships between the configurations. These relationships depend on a neighborhood function \(\mathcal{N}: \mathcal{S} \rightarrow 2^{\mathcal{S}}\). This function assigns to each element \(s \in \mathcal{S}\) a set \(\mathcal{N}(s) \subseteq \mathcal{S}\) of neighboring configurations of s. The set \(\mathcal{N}(s)\) is called the neighborhood of s and each member \(s' \in \mathcal{N}(s)\) is called a neighbor of s.

It must be noted that the neighborhood depends on the instance, so the notation \(\mathcal{N}(s)\) is a simplified form of \(\mathcal{N}_{P}(s,x)\), used since the instance is clear from the context. The elements of \(\mathcal{N}(s)\) need not be listed explicitly. In fact, it is very common to define them implicitly by referring to a set of possible moves, which define transitions between configurations. Moves are usually defined as “local” modifications of some part of s, where “locality” refers to the fact that the move is done on a single solution to obtain another single solution. This “locality” is one of the key ingredients of local search, and it has actually given its name to the whole search paradigm.

As examples of concrete neighborhood definitions, consider the two representations of solutions for the 0–1 MKP presented above. In the first case (binary representation), moves can be defined as changing the values of a number of bits. If just one bit is modified at a time, the resulting neighborhood structure is the n-dimensional binary hypercube. In the second case (permutation representation), moves can be defined as the interchange of two positions in the permutation. Thus, two configurations are neighboring if, and only if, they differ in exactly two positions.
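
A hedged sketch of these two move definitions (identifiers illustrative):

```python
import itertools

# Sketches of the two neighborhoods just described.
def bitflip_neighbors(s):
    """Binary representation: all vectors at Hamming distance 1 from s."""
    for i in range(len(s)):
        t = list(s)
        t[i] = 1 - t[i]
        yield t

def swap_neighbors(perm):
    """Permutation representation: interchange the contents of two positions."""
    for i, j in itertools.combinations(range(len(perm)), 2):
        t = list(perm)
        t[i], t[j] = t[j], t[i]
        yield t
```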

The definition of locality presented above is not necessarily related to “closeness” under some kind of distance relationship between configurations (except in the tautological situation in which the distance between two configurations s and s′ is defined as the number of moves needed to reach s′ from s). As a matter of fact, it is possible to give common examples of very complex neighborhood definitions unrelated to intuitive distance measures.

An important feature that must be considered when selecting the class of moves to be used in the search algorithm is its “ergodicity”, that is, the ability, given any \(s \in \mathcal{S}\), to find a sequence of moves that can reach any other configuration \(s' \in \mathcal{S}\). In many situations this property is self-evident and no explicit demonstration is required. It is important because, even with a valid representation (recall the definition above), it is necessary to guarantee a priori that at least one optimal solution is reachable from any given initial solution. Again, consider the binary representation of solutions for a 0–1 MKP instance. If moves are defined as single bit-flips, it is easily seen that any configuration s′ can be reached from another configuration s in exactly h moves, where h is the Hamming distance between these configurations. This is not always the case though.

The last entity that must be defined is the guiding function. To do so, we require a set \(\mathcal{F}\) whose elements are termed fitness values (typically \(\mathcal{F} \equiv \mathbb{R}\)), and a partial order \(\prec _{\mathcal{F}}\) on \(\mathcal{F}\) (typically, but not always, \(\prec _{\mathcal{F}} \equiv <\)). The guiding function is defined as a function \(F_{g}: \mathcal{S} \rightarrow \mathcal{F}\) that associates to each configuration \(s \in \mathcal{S}\) a value \(F_{g}(s)\) that assesses the quality of the solution. The behavior of the search algorithm will be “controlled” by these fitness values.

Notice that for optimization problems there is an obvious direct connection between the guiding function \(F_{g}\) and the objective function \(m_{P}\) (and hence between the partial orders \(\prec _{P}\) and \(\prec _{\mathcal{F}}\)). As a matter of fact, it is very common to enforce this relationship to the point that both terms are usually considered equivalent. However, this equivalence is not necessary and, in many situations, not even desirable. For decision problems, since a solution is a ‘Yes’ or ‘No’ answer, associated guiding functions usually take the form of a distance to satisfiability.

A typical example is the Boolean Satisfiability Problem, i.e., determining whether a Boolean expression in conjunctive normal form is satisfiable. In this case, solutions are assignments of Boolean values to variables, and the objective function \(m_{P}\) is a binary function returning 1 if the solution satisfies the Boolean expression, and 0 otherwise. This objective function could be used as the guiding function. However, a much more typical choice is to use the number of satisfied clauses in the current configuration as the guiding function, i.e., \(F_{g}(s) =\sum _{i}f_{i}(s)\), where \(f_{i}(s) = 0\) if the i-th clause is unsatisfied by s, and \(f_{i}(s) = 1\) if it is satisfied. Hence, the goal is to maximize this number. Notice that the guiding function in this case is the objective function of the associated NP-hard optimization problem called Max SAT.
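
The following sketch implements this guiding function; the clause encoding (signed, 1-based literals) is an assumption made for illustration:

```python
# Sketch of the Max SAT guiding function F_g(s) = sum_i f_i(s).
def guiding_function(assignment, clauses):
    """Count satisfied clauses; e.g., the clause [1, -3] reads x1 OR NOT x3."""
    def f(clause):
        return int(any((lit > 0) == assignment[abs(lit) - 1] for lit in clause))
    return sum(f(clause) for clause in clauses)

clauses = [[1, -2], [2, 3], [-1, -3]]
assert guiding_function([True, True, False], clauses) == 3
```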

The above differentiation between objective function and guiding function is also very important in the context of constrained optimization problems, i.e., problems for which, in general, \(sol_{P}(x)\) is chosen to be a proper subset of \(ans_{P}(x)\). Since the growth function establishes a mapping from \(\mathcal{S}\) to \(ans_{P}(x)\), the search algorithm may need to process both feasible solutions (whose goodness values are well-defined) and infeasible solutions (whose goodness values are ill-defined in general). In many implementations of MAs for these problems, a guiding function is defined as a weighted sum of the value of the objective function and the distance to feasibility (which accounts for the constraints). Typically, a higher weight is assigned to the constraints, so as to give preference to feasibility over optimality. Several other remedies to this problem abound, including resorting to multi-objective techniques.
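
A minimal sketch of such a penalty-based guiding function for the 0–1 MKP, assuming maximization; the weight w and the violation measure are illustrative design choices, not prescribed by the text:

```python
# Penalty-based guiding function: objective value minus weighted infeasibility.
def penalized_guiding_function(y, V, M, C, w=100.0):
    profit = sum(V[j] for j in y)
    violation = sum(max(0, sum(M[i][j] for j in y) - C[i]) for i in range(len(C)))
    return profit - w * violation   # large w gives preference to feasibility
```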

The combination of a certain problem instance and the three entities defined above induces a so-called fitness landscape [87]. Essentially, a fitness landscape can be defined as a weighted digraph, in which the vertices are configurations of the search space \(\mathcal{S}\), and the arcs connect neighboring configurations. The weights are the differences between the guiding function values of the two endpoint configurations. The search can thus be seen as the process of “navigating” the fitness landscape using the information provided by the guiding function. This is a very powerful metaphor; it allows interpretations in terms of well-known topographical objects such as peaks, valleys, mesas, etc., which is of great utility to visualize the search progress, and to grasp factors affecting the performance of the process. In particular, the important notion of local optimum is associated with this definition of fitness landscape. To be precise, a local optimum is a vertex of the fitness landscape whose guiding function value is better than the values of all its neighbors. Notice that different moves define different neighborhoods and hence different fitness landscapes, even when the same problem instance is considered. For this reason, the notion of local optimum is not intrinsic to a problem instance as it is, sometimes, erroneously considered.

The notion of fitness landscape is not only useful for conceptual or visualization purposes. It also serves as a very useful instrument in order to analyze the properties of the search space as regarded by a certain search algorithm (via the moves used by the latter). Thus, analytical tools such as random-walk correlation or fitness distance correlation can be used to assess the difficulty perceived by the optimizer, and other statistical tools can be utilized to guide the design/parameterization of the search algorithm—see [111].

9.2.3 Local vs. Population-Based Search

The definitions presented in the previous subsection naturally lead to the notion of local search algorithm. A local search algorithm starts from a configuration \(s_{0} \in \mathcal{S}\), generated at random or constructed by some other algorithm. Subsequently, it iterates, at each step making a transition based on the neighborhood of the current configuration. Transitions leading to preferable (according to the partial order \(\prec _{\mathcal{F}}\)) configurations are accepted, i.e., the newly generated configuration becomes the current configuration in the next step. Otherwise, the current configuration is kept. This process is repeated until a certain termination criterion is met. Typical criteria are completing a pre-specified number of iterations, not having found any improvement in the last m iterations, or even more complex mechanisms based on estimating the probability of being at a local optimum [29]. Due to these characteristics, the approach is metaphorically called “hill climbing”. The whole process is sketched in Algorithm 1.

The selection of the particular type of moves to use (also known as mutations in the context of GAs) certainly depends on the specific characteristics of the problem and the representation chosen. There is no general advice for this, since it depends on the available computing time for the whole process as well as on other algorithmic decisions, such as ease of coding. In some cases certain moves are obvious choices, for example changing the value of one single variable or swapping the values of two different variables. Sometimes the “step” may also be composed of a chain of transitions. For instance, in relation with MAs, Radcliffe and Surry introduced the concept of Binomial Minimal Mutation, where the number of mutations to perform is selected according to a certain binomial distribution [159]. In the context of fitness landscapes, this is equivalent to a redefinition of the neighborhood relation, considering two configurations as neighbors when there exists a chain of transitions connecting them.

Algorithm 1 A local search algorithm
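
Since the pseudocode of Algorithm 1 is not reproduced here, the following Python fragment gives one possible instantiation consistent with the description above; `neighbors` and `F_g` stand for the neighborhood and guiding functions of Sect. 9.2.2, maximization is assumed, and the idle-iteration limit is one of the termination criteria just mentioned.

```python
import random

# A minimal sketch of a hill-climbing local search loop.
def local_search(s0, neighbors, F_g, max_idle=100):
    current, idle = s0, 0
    while idle < max_idle:                      # termination criterion
        candidate = random.choice(list(neighbors(current)))
        if F_g(candidate) > F_g(current):       # preferable -> accept the transition
            current, idle = candidate, 0
        else:                                   # otherwise keep the current one
            idle += 1
    return current
```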

Local search algorithms are thus characterized by keeping a single configuration at a time. The immediate generalization of this behavior is the simultaneous maintenance of k \((k\geqslant 2)\) configurations. The term population-based search algorithms has been coined to denote search techniques behaving this way.

The availability of several configurations at a time allows the use of new powerful mechanisms for traversing the fitness landscape in addition to the standard mutation operator. The most popular of these mechanisms, the recombination operator, will be studied in more depth in the next section. In any case, notice that the general functioning of population-based search techniques is very similar to the pseudocode depicted in Algorithm 1. As a matter of fact, a population-based algorithm can be seen as a procedure in which we sequentially visit vertices of a hypergraph. Each vertex of the hypergraph represents a set of configurations in \(\mathcal{S}_{P}(x)\), i.e., a population. The next vertex to be visited, i.e., the new population, can be established according to the composition of the neighborhoods of the different transition mechanisms used in the population algorithm. Despite the analogy with local search, it is widely accepted in the scientific literature to apply the denomination ‘local’ just to one-configuration-at-a-time search algorithms. For this reason, the term ‘local’ will be used with this interpretation in the remainder of the chapter.

9.2.4 Recombination

As mentioned in the previous section, local search is based on the application of a mutation operator to a single configuration. Despite the apparent simplicity of this mechanism, “mutation-based” local search has proven to be a very powerful mechanism for obtaining good quality solutions for NP-hard problems. For this reason, some researchers have tried to provide a more theoretically solid background for this class of search. In this line, it is worth mentioning the definition of the Polynomial Local Search class (PLS) by Johnson et al. [86]. Basically, this complexity class comprises a problem and an associated search landscape such that for any given point in the search landscape we can decide in polynomial time whether it is a local optimum or not, and in the latter case find an improved solution in its neighborhood. Unfortunately, this does not mean that we can find a local optimum in polynomial time (in fact, it may generally take an exponential number of steps to do so). This fact has justified the quest for additional search mechanisms to be used as stand-alone operators or as complements to standard mutation.

In this line, recall that population-based search allowed the definition of generalized move operators termed recombination operators. In essence, recombination can be defined as a process in which a set \(\mathcal{S}_{par}\) of n configurations (informally referred to as “parents”) is taken into consideration to create a set \(\mathcal{S}_{desc} \subseteq sol_{P}(x)\) of m new configurations (informally termed “descendants”). The creation of these descendants involves the identification and combination of features extracted from the parents.

At this point, it is possible to consider properties of interest that can be exhibited by recombination operators [159]. The first property, respect, represents the exploitation side of recombination. A recombination operator is said to be respectful, regarding a particular type of features of the configurations, if, and only if, it generates descendants carrying all basic features common to all parents (where the term ‘basic’ refers to features being used to represent solutions, hence constituting a representation basis in an algebraic sense). Notice that, if all parent configurations are identical, a respectful recombination operator is forced to return the same configuration as a descendant. This property is termed purity, and can be achieved even when the recombination operator is not generally respectful.

On the other hand, assortment represents the exploratory side of recombination. A recombination operator is said to be properly assorting if, and only if, it can generate descendants carrying any combination of compatible features taken from the parents. The assortment is said to be weak if this cannot be accomplished in a single recombination event, and further applications of the recombination operator on the offspring are required.

Finally, transmission is a very important property that captures the intuitive role of recombination. An operator is said to be transmitting if every feature exhibited by the offspring is present in at least one of the parents. Thus, a transmitting recombination operator combines the information present in the parents but does not introduce new information. This latter task is usually left to the mutation operator. For this reason, a non-transmitting recombination operator is said to introduce implicit mutation.

The three properties above suffice to describe the abstract input/output behavior of a recombination operator with regard to some particular features. They provide a characterization of the possible descendants that the operator can produce. Nevertheless, there exist other aspects of the functioning of recombination that must be studied. In particular, it is interesting to consider how the construction of \(\mathcal{S}_{desc}\) is approached.

First of all, a recombination operator is said to be blind if it has no other input than \(\mathcal{S}_{par}\), i.e., it does not use any information from the problem instance. This definition is certainly very restrictive, and hence is sometimes relaxed to allow the recombination operator to use information regarding the problem constraints (so as to construct feasible descendants), and possibly the fitness values of configurations \(y \in \mathcal{S}_{par}\) (so as to bias the generation of descendants toward the best parents). A typical example of a blind recombination operator is the classical Uniform crossover [180]. This operator is defined on search spaces \(\mathcal{S} \equiv \varSigma ^{n}\), i.e., strings of n symbols taken from an alphabet Σ. The construction of the descendant is done by randomly selecting at each position one of the symbols appearing in that position in any of the parents. This random selection can be totally uniform or can be biased according to the fitness values of the parents as mentioned before. Furthermore, the selection can be done so as to enforce feasibility (e.g., consider the binary representation of solutions in the 0–1 MKP). Notice that, in this case, the resulting operator is neither respectful nor transmitting in general.
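
A minimal sketch of this operator, generalized to an arbitrary number of parents; note that this plain, unbiased version is transmitting, whereas the fitness-biased and feasibility-enforcing variants just mentioned need not be:

```python
import random

# Blind uniform crossover on strings over an arbitrary alphabet.
def uniform_crossover(parents):
    n = len(parents[0])
    # At each position, pick one of the symbols the parents carry there.
    return [random.choice([p[i] for p in parents]) for i in range(n)]
```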

The use of blind recombination operators has usually been justified on the grounds of not introducing excessive bias in the search algorithm, thus preventing extremely fast convergence to suboptimal solutions. This is questionable though. First, notice that the behavior of the algorithm is in fact biased by the choice of representation and the mechanics of the particular operators. Second, there exist widely known mechanisms (e.g., spatial isolation) to hinder these problems. Finally, it can be better to quickly obtain a suboptimal solution and restart the algorithm than to use blind operators for a long time in pursuit of an asymptotically optimal behavior (not even guaranteed in most cases).

Recombination operators that use problem knowledge are commonly termed heuristic or hybrid. In these operators, problem information is utilized to guide the process of constructing the descendants. This can be done in a plethora of ways for each problem, so it is difficult to provide a taxonomy of heuristic recombination operators. Nevertheless, there exist two main aspects in which problem knowledge can be injected: the selection of the parental features that will be transmitted to the descendant, and the selection of non-parental features that will be added to it. A heuristic recombination operator can focus on one of these aspects, or on both of them simultaneously.

As an example of a heuristic recombination operator focusing on the first aspect, Dynastically Optimal Recombination (DOR) [27] can be mentioned. This operator explores the dynastic potential (i.e., the set of possible children) of the configurations being recombined, so as to find the best member of this set (notice that, since configurations in the dynastic potential are entirely composed of features taken from any of the parents, this is a transmitting operator). This exploration is done using a subordinate complete algorithm, and its goal is thus to find the best combination of parental features giving rise to a feasible child. This can be accomplished using techniques such as branch and bound (BnB) or dynamic programming (see, e.g. [57]). This operator is monotonic in the sense that any child generated is at least as good as the best parent.

With regard to heuristic operators concentrating on the selection of non-parental features, one can cite the patching-by-forma-completion operators proposed by Radcliffe and Surry [158]. These operators are based on generating an incomplete child using a non-heuristic procedure (e.g., the RARω operator [157]), and then completing the child either using a local hill climbing procedure restricted to non-specified features (locally optimal forma completion) or a global search procedure that finds the globally best solution carrying the specified features (globally optimal forma completion). Notice the similarity of this latter approach with DOR.

Finally, there exist some operators trying to exploit knowledge in both of the above aspects. A distinguished example is the Edge Assembly Crossover (EAX) [128]. EAX is a specialized operator for the TSP (both for symmetric and asymmetric instances) in which the construction of the child comprises two phases: the first involves the generation of an incomplete child via the so-called E-sets (subtours composed of alternating edges from each parent); subsequently, these subtours are merged into a single feasible tour using a greedy repair algorithm. The authors of this operator reported impressive results in terms of accuracy and speed. It has some similarities with the recombination operator proposed in [117]. We can also mention the use of path relinking [59], a method based on creating a trajectory in the search space between the solutions being “recombined” and picking the best solution along that path.

A final comment must be made in relation to the computational complexity of recombination. It is clear that combining the features of several solutions is in general computationally more expensive than modifying a single solution (i.e., a mutation). Furthermore, the recombination operation will usually be invoked a large number of times. For this reason, it is convenient (and in many situations mandatory) to keep it at a low computational cost. A reasonable guideline is to consider an \(O(N\log N)\) upper bound for its complexity, where N is the size of the input (the set \(\mathcal{S}_{par}\) and the problem instance x). Such a limit is easily affordable for blind recombination operators, often simply termed crossover. However, this limit can be rather stringent in the case of heuristic recombination, mainly when epistasis (non-additive inter-feature influence on the fitness value) is involved. This admits several solutions depending upon the particular heuristic used. For example, DOR has exponential worst-case behavior, but it can be made affordable by picking larger pieces of information from each parent (the larger these pieces of information, the fewer of them needed to complete the child) [26]. In any case, heuristic recombination operators provide better solutions than blind recombination operators, and hence they need not be invoked the same number of times.

9.2.5 A Memetic Algorithm Template

In light of the above considerations, it is possible to provide a general template for a memetic algorithm. As mentioned in Sect. 9.2.3, this template is very similar to that of a local search procedure acting on a set of \(\vert pop\vert \geqslant 2\) configurations. This is shown in Algorithm 2.

Algorithm 2 A population-based search algorithm
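
Since the pseudocode of Algorithm 2 is not reproduced here, the following hedged sketch outlines the template it describes; every procedure name mirrors the text and is a placeholder to be instantiated for the problem at hand:

```python
# Sketch of the population-based search template of Algorithm 2.
def population_based_search(generate_initial_population, termination_criterion,
                            generate_new_population, update_population,
                            converged, restart_population):
    pop = generate_initial_population()
    while not termination_criterion(pop):
        newpop = generate_new_population(pop)   # pipelined variation operators
        pop = update_population(pop, newpop)    # plus or comma strategy
        if converged(pop):                      # population in a degenerate state?
            pop = restart_population(pop)
    return pop
```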

This template requires some explanation. First of all, the GenerateInitialPopulation procedure is responsible for creating the initial set of | pop | configurations. This can be done by simply generating | pop | random configurations or by using a more sophisticated seeding mechanism (for instance, some constructive heuristic), by means of which high-quality configurations are injected into the initial population [179]. Another possibility is to use the Local-Search-Engine presented in Sect. 9.2.3 (as shown in Algorithm 3) or any other randomized constructive algorithm for that matter. For example, a Greedy Randomized Adaptive Search Procedure (GRASP) [161, 162] mechanism was used in [51], and Beam Search [189] was used in [56].

Algorithm 3 Injecting high-quality solutions in the initial population
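
A one-line sketch of this seeding mechanism (consistent with what Algorithm 3 describes), reusing the placeholders introduced above:

```python
# Seed the initial population with locally optimized random configurations.
def generate_initial_population(popsize, random_configuration, local_search):
    return [local_search(random_configuration()) for _ in range(popsize)]
```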

As for the TerminationCriterion function, it can be defined much as for local search, e.g., setting a limit on the total number of iterations, reaching a maximum number of iterations without improvement, performing a certain number of population restarts, etc.

The GenerateNewPopulation procedure is at the core of memetic algorithms. Essentially, this procedure can be seen as a pipelined process comprising \(n_{op}\) stages. Each of these stages consists of applying a variation (or reproductive) operator \(op_{j}\), taking \(arity_{in}^{j}\) configurations from the previous stage to produce \(arity_{out}^{j}\) new configurations. This pipeline is restricted to have \(arity_{in}^{1} = popsize\). The whole process is sketched in Algorithm 4.

Algorithm 4 The pipelined GenerateNewPopulation procedure
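
A hedged sketch of this pipeline (in the spirit of Algorithm 4); how many times each stage fires, here until newpop_size outputs accumulate, is an illustrative choice:

```python
import random

# Each stage is a triple (op_j, arity_in_j, arity_out_j), where op_j maps a list
# of arity_in_j configurations to a list of arity_out_j new configurations.
def generate_new_population(pop, stages, newpop_size):
    buffer = list(pop)              # the first stage takes arity_in^1 = popsize inputs
    for op, arity_in, arity_out in stages:
        out = []
        while len(out) < newpop_size:
            group = random.sample(buffer, min(arity_in, len(buffer)))
            out.extend(op(group))   # yields arity_out new configurations
        buffer = out[:newpop_size]
    return buffer
```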

This template for the GenerateNewPopulation procedure is typically instantiated in GAs by letting \(n_{op} = 3\), using a selection, a recombination, and a mutation operator. Traditionally, mutation is applied after recombination, i.e., on each child generated by the recombination operator. However, if a heuristic recombination operator is being used, it may be more convenient to apply mutation before recombination. Since the purpose of mutation is simply to introduce new features into the configuration pool, using it in advance is possible in this case. Furthermore, the smart feature combination performed by the heuristic operator is not disturbed this way.

This situation is slightly different in MAs. In this case, it is very common to let \(n_{op} = 4\), inserting a Local-Search-Engine right after applying \(op_{2}\) and \(op_{3}\) (respectively recombination and mutation). Due to the local optimization performed after mutation, the combined effect of mutation plus local search cannot be regarded as a simple disruption of a computationally-demanding recombination. Note also that the interplay between mutation and local search requires the former to use a different neighborhood structure than the latter; otherwise mutations could be readily reverted by the local search, and their usefulness would be negligible.

The UpdatePopulation procedure is used to reconstruct the current population using the old population pop and the newly generated population newpop. Borrowing the terminology of the evolution strategy [160, 166] community, there exist two main possibilities to carry out this reconstruction: the plus strategy and the comma strategy. In the former, the current population is constructed by taking the best popsize configurations from \(pop \cup newpop\). In the latter, the best popsize configurations are taken just from newpop. In this case, it is required to have | newpop | > popsize, so as to put some selective pressure on the process (the bigger the | newpop | ∕popsize ratio, the stronger the pressure). Otherwise, the search would reduce to a random wandering through \(\mathcal{S}\).
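
Both strategies can be sketched as follows, assuming maximization of the guiding function F_g:

```python
# Plus strategy: keep the best popsize configurations from pop and newpop together.
def update_plus(pop, newpop, popsize, F_g):
    return sorted(pop + newpop, key=F_g, reverse=True)[:popsize]

# Comma strategy: keep the best popsize configurations from newpop only.
def update_comma(pop, newpop, popsize, F_g):
    assert len(newpop) > popsize    # required to exert selective pressure
    return sorted(newpop, key=F_g, reverse=True)[:popsize]
```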

There are a number of studies regarding appropriate choices for the UpdatePopulation procedure (see e.g., [6]). As a general guideline, the comma strategy is usually regarded as less prone to stagnation, with the ratio | newpop | ∕popsize ≃ 6 being a common choice [7]. Nevertheless, this option can be somewhat computationally expensive if the guiding function is complex and time-consuming. Another common alternative is to use a plus strategy with a low value of | newpop |, analogous to the so-called steady-state replacement strategy in GAs [187]. This option usually provides a faster convergence to high-quality solutions. However, care has to be taken with premature convergence to suboptimal regions of the search space, i.e., all configurations in the population being very similar to each other, hence hindering the exploration of other regions of \(\mathcal{S}\).

The above consideration about premature convergence leads to the last component of the template shown in Algorithm 2, the restarting procedure. First of all, it must be decided whether the population has degenerated or not. To do so, it is possible to use some measure of information diversity in the population, such as Shannon’s entropy [38]. If this measure falls below a predefined threshold, the population is considered to be in a degenerate state. This threshold depends upon the representation (number of values per variable, constraints, etc.) and hence must be determined in an ad-hoc fashion. A different possibility is using a probabilistic approach to determine with a desired confidence that the population has converged. For example, in [77] a Bayesian approach is presented for this purpose.
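
A hedged sketch of such an entropy-based test, assuming string-like configurations; averaging the per-position Shannon entropy and the threshold value are illustrative choices that must be tuned to the representation:

```python
import math
from collections import Counter

# Declare the population degenerate when its average per-position entropy is low.
def degenerate(pop, threshold=0.1):
    n, size = len(pop[0]), len(pop)
    total = 0.0
    for i in range(n):
        counts = Counter(s[i] for s in pop)
        total += -sum((c / size) * math.log2(c / size) for c in counts.values())
    return total / n < threshold
```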

Once the population is considered to be in a degenerate state, the restart procedure is invoked. Again, this can be implemented in a number of ways. A very typical strategy is to keep a fraction of the current population (this fraction can be as small as one solution, the current best), substituting the remaining configurations with newly generated (from scratch) solutions, as shown in Algorithm 5.

Algorithm 5 The RestartPopulation procedure
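
A sketch of this restart strategy (consistent with what Algorithm 5 describes); the preserved fraction and the local optimization of the fresh solutions are illustrative choices:

```python
# Keep a fraction of the best configurations; rebuild the rest from scratch.
def restart_population(pop, random_configuration, local_search, F_g, keep=0.1):
    k = max(1, int(keep * len(pop)))
    preserved = sorted(pop, key=F_g, reverse=True)[:k]
    immigrants = [local_search(random_configuration()) for _ in range(len(pop) - k)]
    return preserved + immigrants
```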

The procedure shown in Algorithm 5 is also known as the random-immigrant strategy [20]. Another possibility is using the previous search history [178] or activating a strong (heavy) mutation operator in order to drive the population away from its current location in the search space. Both options have their advantages and disadvantages. For example, when using the random-immigrant strategy, one has to take some caution to prevent the preserved configurations from taking over the population (this can be achieved by applying a low selective pressure, at least in the first iterations after a restart). As to the heavy mutation strategy, one has to strike a tradeoff between an excessively strong mutation, which would destroy any information contained in the current population, and a too-weak mutation, which would cause the population to converge again in a few iterations.

9.2.6 Designing an Effective Memetic Algorithm

The general template of MAs depicted in the previous section must be instantiated with precise components in order to be used for solving a specific problem. This instantiation has to be done carefully so as to obtain an effective optimization tool. We will address some design issues in this section.

A first obvious remark is that there exists no general approach for the design of effective MAs. This observation rests on different proofs depending on the precise definition of ‘effective’ in the previous statement. Such proofs may involve classical complexity results and conjectures if ‘effective’ is understood as ‘polynomial-time’, the NFL Theorem if we consider a more general set of performance measures, or even Computability Theory if we relax the definition to arbitrary decision problems. For these reasons, we can only provide several design heuristics that will likely result in good-performing MAs, but without explicit guarantees.

This said, MAs are commonly implemented as evolutionary algorithms endowed with an independent search component, sometimes provided by a local search mechanism (recall the previous section), and as such can benefit from the theoretical corpus available for EAs. This is particularly applicable to some basic aspects such as the representation of solutions in terms of meaningful information units [37, 158]. Focusing now on more specific aspects of MAs, the first consideration to take into account is the interplay between the local search component and the remaining operators, mostly with respect to the characteristics of the search landscape. A good example of this issue can be found in the work of Merz and Freisleben on the TSP [49]. They consider the use of a highly intensive local search procedure—the Lin-Kernighan heuristic [104]—and note that the average distance between local optima is similar to the average distance between a local optimum and the global optimum. For this reason, they introduce a distance-preserving crossover (DPX) operator that generates offspring whose distance from the parents equals the distance between the parents themselves. Such an operator would likely be less effective with a less powerful local improvement method, e.g., 2-opt, which induces a different distribution of local optima.

Another important choice refers to the learning model used. The most common option is to use a Lamarckian model, whereby an improved solution is sought via local search and the corresponding genotypic changes are retained in the solution. However, there also exists the possibility of using a Baldwinian model, in which the improved solution is only used for the purposes of fitness computation, but the actual solution is not changed at all. This might be useful in order to avoid local optima while converging to the global optimum [58, 89, 192]; see also [188] for a classical analysis of these two strategies in optimization.

In addition to the particular choice (or choices) of local search operator, there remains the issue of determining an adequate parameterization for the procedure, namely, how much effort must be spent on each local search, how often the local search must be applied, and—were it not applied to every new solution generated—how to select the solutions that will undergo local improvement. Regarding the first two items, there exists theoretical evidence [99, 175] that an inadequate parameter setting can turn a problem from being easily solvable to non-polynomially solvable. Besides, there are obvious practical limitations in situations where the local search and/or the fitness function is computationally expensive. This fact admits different solutions. On the one hand, the use of surrogates (i.e., fast approximate models of the true function) to accelerate evolution is an increasingly popular option in such highly demanding problems [64, 102, 185, 186, 194]. On the other hand, partial Lamarckism [23, 74, 149], where not every individual is subject to local search, is commonly used as well. The precise value of the local search application probability (or multiple values when more than one local search procedure is available) largely depends on the problem under consideration [81], and its determination is in many cases an art. For this reason, adaptive and self-adaptive mechanisms have been defined in order to let the algorithm learn the most appropriate setting (see Sect. 9.3.4). The interested reader is referred to [176, 177] for a more in-depth analysis of the balance between the local and global (i.e., population-based) components of the memetic algorithm.

As to the selection of individuals that will undergo local search, the most common options are random selection and fitness-based selection, where only the best individuals are subject to local improvement. Nguyen et al. [136] also consider a ‘stratified’ approach, in which the population is sorted and divided into k levels (k being the number of local search applications), and one individual per level is randomly selected. Their experimentation on some continuous functions indicates that this strategy and improve-the-best (i.e., applying local search to the best individuals) provide better results than random selection. Such strategies can be readily deployed on a structured MA as defined by Moscato et al. [10, 15, 48, 110, 125], where good solutions flow upwards within a tree-structured population, and layers are explicitly available. Other population management strategies are possible as well, see [14, 153, 154, 173].

9.3 Algorithmic Extensions of Memetic Algorithms

The algorithmic template and design guidelines described in the previous section can characterize most basic incarnations of MAs, namely population-based algorithms endowed with static local search for single-objective optimization. However, more sophisticated approaches can be conceived, and are certainly required in certain applications. This section is aimed at providing an overview of more advanced algorithmic extensions used in the MA realm.

9.3.1 Multiobjective Memetic Algorithms

Multiobjective problems are frequent in real-world applications. Rather than having a single objective to be optimized, the solver is faced with multiple, partially conflicting objectives. As a result, there is no single optimal solution a priori, but rather a collection of optimal solutions providing different trade-offs among the objectives considered. In this scenario, the notion of Pareto dominance is essential: given two solutions \(s,s' \in sol_{P}(x)\), s is said to dominate s′ if it is better than s′ in at least one of the objectives and no worse in the remaining ones. This clearly induces a partial order \(\prec _{P}\), since given two solutions it may be the case that neither dominates the other. The collection of optimal solutions is termed the optimal Pareto front, or the optimal non-dominated front.
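
The dominance test can be sketched as follows, assuming all objectives are to be maximized and that `f` maps a solution to its tuple of objective values (both assumptions for illustration):

```python
# Pareto-dominance test: s dominates s_prime iff it is no worse everywhere
# and strictly better in at least one objective.
def dominates(s, s_prime, f):
    fs, fsp = f(s), f(s_prime)
    return all(a >= b for a, b in zip(fs, fsp)) and \
           any(a > b for a, b in zip(fs, fsp))
```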

Population-based search techniques, in particular evolutionary algorithms (EAs), are naturally fit to deal with multiobjective problems, due to the availability of a population of solutions which can approach the optimal Pareto front from different directions. There is an extensive literature on the deployment of EAs in multiobjective settings, and the reader is referred to [21, 22, 42, 195], among others, for more information on this topic. MAs can obviously benefit from this corpus of knowledge. However, MAs typically incorporate a local search mechanism, and it has to be adapted to the multiobjective setting as well. This can be done in different ways [94], which can be roughly classified into two major classes: scalarizing approaches, and Pareto-based approaches. Scalarizing approaches are based on the use of some aggregation mechanism to combine the multiple objectives into a single scalar value. This is usually done using a linear combination of the objective values, with weights that are either fixed (at random or otherwise) for the whole execution of the local search procedure [182], or adapted as the local search progresses [66]. With regard to Pareto-based approaches, the notion of Pareto dominance is considered for deciding transitions among neighboring solutions, typically coupled with the use of some measure of crowding to spread the search, e.g., [91].

A full-fledged multiobjective MA (MOMA) is obtained by appropriately combining population-based and local search-based components for multiobjective optimization. Again, the strategy used in the local search mechanism can be used to classify most MOMAs. On the one hand, we have aggregation approaches. Thus, two proposals due to Ishibuchi and Murata [79, 80] and Jaszkiewicz [83, 84] are based on the use of random scalarization each time a local search is to be used. Alternatively, a single-objective local search can be used to optimize individual objectives [82]. Ad hoc mating strategies based on the particular weights chosen at each local search invocation (whereby the solutions to be recombined are picked according to these weights) are used as well. A related approach—including the on-line adjustment of scalarizing weights—is followed by Guo et al. [65–67]. On the other hand, we have Pareto-based approaches. In this line, a MA based on PAES (Pareto Archived Evolution Strategy) was defined by Knowles and Corne [92, 93]. More recently, a MOMA based on particle swarm optimization (PSO) has been defined by Liu et al. [101, 108]. In this algorithm, an archive of non-dominated solutions is maintained and randomly sampled to obtain reference points for particles. A different approach is used by Schuetze et al. [164] for numerical-optimization problems. The continuous nature of solution variables allows using their values for computing search directions. This fact is exploited in their local search procedure (HCS, for Hill Climber with Sidestep) to direct the search toward specific regions (e.g., along the Pareto front) when required. We refer to [85] for a more in-depth discussion on multiobjective MAs.

9.3.2 Continuous Optimization

Continuous optimization problems are defined on a dense search space [183], typically some subset of the n-fold Cartesian product \(\mathbb{R}^{n}\). Many problems have decision variables of this continuous nature, and hence continuous optimization is a realm of paramount importance. Throughout previous sections, MAs were admittedly described with a discrete background in mind. Indeed, discrete optimization problems put to the test the skills of the algorithmic designer, in the sense that the difficulty of solving a particular problem and the effectiveness of the solver depend on the precise instantiation of notions such as the neighborhood relation. This said, most of the ideas and concepts sketched before for discrete optimization are also applicable to continuous optimization. Of course, in this realm there is a natural notion of neighborhood of a point s, given by open balls \(B_{d}(s) =\{ s': \vert \vert s - s'\vert \vert < d\}\), i.e., those points located within distance d of s, for a suitable distance metric (typically, but not necessarily, the Euclidean distance—see [44]).

The different components of a classic MA, namely the population-based engine and the local search technique, must be adapted to deal with this new domain of solutions. Regarding the former, there is plenty of literature on how to adapt the variation operators to tackle continuous optimization [70, 71, 109, 191], and indeed some evolutionary computation families lend themselves naturally to this kind of optimization [13, 174]. Typical options with regard to the recombination operator are the following (sketched in code after the list; for the sake of notation, assume parental solutions \(s =\langle s_{1},\ldots,s_{n}\rangle\) and \(s' =\langle s'_{1},\ldots,s'_{n}\rangle\) are being recombined to obtain \(u =\langle u_{1},\ldots,u_{n}\rangle\)):

  • use a discrete approach and create the offspring by using the precise values the decision variables have in the parental solutions, i.e., \(u_{i} \in \{ s_{i},s'_{i}\}\).

  • use some arithmetical operation to combine the values of homologous variables in the parental solutions, e.g., compute an average: \(u_{i} = (s_{i} + s'_{i})/2\).

  • use some sampling procedure within some hyperrectangle, hyperellipse, or other suitable hypersurface defined by the parental solutions, e.g., \(u_{i} \in [m_{i},M_{i}]\) where \(m_{i} =\min (s_{i},s'_{i}) -\alpha d_{i}\), \(M_{i} =\max (s_{i},s'_{i}) +\alpha d_{i}\), \(d_{i} = \vert s_{i} - s'_{i}\vert \), \(\alpha \geqslant 0\).
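
A sketch of the three options for two parents (identifiers illustrative; the third variant is in the style of BLX-α recombination):

```python
import random

def discrete_recombination(s, s2):
    return [random.choice(pair) for pair in zip(s, s2)]    # u_i in {s_i, s'_i}

def arithmetic_recombination(s, s2):
    return [(a + b) / 2 for a, b in zip(s, s2)]            # per-variable average

def interval_recombination(s, s2, alpha=0.5):              # sample in [m_i, M_i]
    u = []
    for a, b in zip(s, s2):
        d = abs(a - b)
        u.append(random.uniform(min(a, b) - alpha * d, max(a, b) + alpha * d))
    return u
```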

The situation is more flexible when multiparent recombination is used. In this case, other possibilities exist in addition to the previous methods, such as utilizing some subset of the parental solutions to create a hypersurface and using some projection technique to create the offspring, much like it is done in the Nelder-Mead method [131]. Mutation is typically accomplished by some additive or multiplicative perturbation of the variable values, obtained by means of some predefined distribution such as the uniform, Gaussian, or Cauchy distributions, just to cite some of the most common examples. The extent of the perturbation is a parameter that can be subject to adaptation during the run (cf. Sect. 9.3.4).

Regarding the local search component, there are many techniques that can be used for this purpose, just by adapting the definition of neighborhood as mentioned before and using some kind of gradient ascent, possibly modulated with some mechanism to escape from local optima (as is done in, e.g., simulated annealing); see [41] for a more detailed discussion of these. One particular issue worth mentioning in connection with local search is the following: unlike many typical discrete optimization scenarios, in which the objective function can be decomposed in order to isolate the effect caused by the modification of a certain decision variable (i.e., the fitness value of the modified solution is \(f(u) = f(s) + \Delta(s, u)\) for some function \(\Delta(s, u)\) that is cheaper to compute than f(u)), continuous optimization problems usually exhibit many couplings and non-linearities that preclude or at least limit such approaches. This affects the cost of the local search component, which in turn may influence the optimal balance between local and global search in the algorithm. Some authors [115] have proposed storing the state of the local search along with each solution it is applied to, so that further applications of the local improvement routine resume from this state. We refer to [25] for a more detailed discussion of design issues in MAs for continuous optimization.
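
The decomposition idea is easiest to see in a toy separable objective, as in the following sketch (all names hypothetical); real discrete examples, such as evaluating a 2-opt move in the TSP by looking only at the four affected edges, follow the same pattern.

    def full_fitness(s, terms):
        # f(s) = sum of per-variable terms: a fully separable toy objective.
        return sum(t(si) for t, si in zip(terms, s))

    def delta_fitness(f_s, s, i, new_value, terms):
        # Incremental evaluation f(u) = f(s) + Delta(s, u): only the term of
        # the modified variable i is recomputed, at O(1) instead of O(n) cost.
        return f_s + terms[i](new_value) - terms[i](s[i])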

9.3.3 Memetic Computing Approaches

Memes were introduced in Sect. 9.1 as units of imitation. In a computational context (and more precisely with regard to memetic algorithms), they acquire a new meaning, though. In this sense, a first interpretation is to view a meme as a high-level, non-genetic pattern of information, that is, the carrier particle of individual learning. From the standpoint of classical MAs, this role is played by local improvement procedures. Thus, the particular choice of a local search procedure (a simple heuristic rule, hill climbing, simulated annealing, etc.) plus the corresponding parameterization can be regarded as the implicit definition of a fixed meme. However, earlier works already anticipated that these memes need not be static, but may change dynamically during the search. Quoting [118]:

It may be possible that a future generation of MAs will work in at least two levels and two timescales. In the short-timescale, a set of agents would be searching in the search space associated to the problem while the long-time scale adapts the heuristics associated with the agents.

The first steps in this direction were taken in [96, 168] by including an explicit representation of memes alongside solutions, and having them evolve. This has given rise to the notion of memetic computing, which can be defined as a broad discipline that focuses on the use of dynamic complex computational structures composed of interacting modules (the memes) which are harmoniously coordinated to solve particular problems [132]; see also [19, 146].

There are obvious connections here with the notion of adaptive hyperheuristics [16, 18, 35, 88], particularly in the context of Meta-Lamarckian learning [137, 145], in which a collection of memes is available and some mechanism is used to decide which one to apply and when (be it using information on the previous applications of each meme or gathering population statistics [134]); a sketch of such a mechanism is given below. Other possibilities exist, though. As mentioned above, memes can be explicitly represented (this can range from a simple parameterization of a generic template—e.g., the neighborhood definition of a local search procedure, the pivot rule, etc.—to the full definition of the local improver using mechanisms akin to genetic programming) and self-adapt during the execution of the algorithm, either as a part of solutions [97, 98, 140, 171] or in a separate population [169]. Furthermore, it is possible to aggregate simple memes into larger compounds or memeplexes [19] in order to attain synergistic cooperation and improved search efficiency.
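
One simple way to realize Meta-Lamarckian-style meme selection is to reward each meme according to the fitness gains it has recently produced. The sketch below assumes minimization and uses illustrative bookkeeping (an exponential moving average of improvements plus epsilon-greedy exploration) rather than any specific published scheme.

    import random

    class MemeSelector:
        # Adaptive choice among several local searchers, rewarding those
        # that recently improved solutions (illustrative bookkeeping).
        def __init__(self, memes, epsilon=0.1):
            self.memes = memes                  # list of local-search callables
            self.reward = [1.0] * len(memes)    # optimistic initial credit
            self.epsilon = epsilon              # exploration probability

        def apply(self, solution, fitness):
            if random.random() < self.epsilon:
                idx = random.randrange(len(self.memes))
            else:
                idx = max(range(len(self.memes)), key=lambda i: self.reward[i])
            improved = self.memes[idx](solution)
            gain = max(0.0, fitness(solution) - fitness(improved))  # minimization
            self.reward[idx] = 0.9 * self.reward[idx] + 0.1 * gain  # moving average
            return improved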

9.3.4 Self-⋆ Memetic Algorithms

When some design guidelines were given in Sect. 9.2.6, it was stressed that these were heuristics ultimately relying on the available problem knowledge. This is not a particular feature of MAs, but affects the field of metaheuristics as a whole. Indeed, one of the keystones in practical metaheuristic problem-solving is the necessity of customizing the solver for the problem at hand [30]. Therefore, it is not surprising that attempts to transfer a part of this tuning effort to the metaheuristic technique itself have been common. Such attempts can take place at different levels, or can affect different components of the algorithm. The first—and most intuitive—one is the parametric level, involving the numerical values of parameters such as the operator application rates. Examples of this can be found in early EAs; see for example [3, 12, 39, 167]. An overview of these approaches (actually broader in scope, covering more advanced topics than parameter adaptation) can be found in [170]. Focusing specifically on MAs, this kind of adaptation has been applied in [8, 71, 113, 114, 147].
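A classical example of such parametric adaptation is Rechenberg's 1/5-success rule for mutation step sizes, sketched below; the constant c is conventionally taken around 0.85, and the function name is illustrative.

    def adapt_step_size(sigma, success_rate, c=0.85):
        # Rechenberg's 1/5-success rule: enlarge the mutation step when more
        # than 1/5 of recent mutations improved the solution, shrink otherwise.
        return sigma / c if success_rate > 0.2 else sigma * c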

The explicit processing of memes described in the previous section is actually a further step in the direction of promoting the autonomous functioning of the algorithm. Indeed, from a very general point of view this connects with the idea of autonomic computing [73], which tries to transfer to the computing realm the idea of the autonomic nervous system carrying out essential functions without conscious control. In this line, the umbrella term self-⋆ properties [5] is used to describe the capacity for self-management in complex computational systems [76]. The self-parameterization attempts mentioned previously fall within the scope of self-⋆ properties, and so does the explicit handling of memes described in the previous section, which can be considered a case of self-generating search strategies. As a matter of fact, both approaches constitute examples of self-optimization [9], because they aim at improving the capabilities of the algorithm for carrying out its function (which is, in turn, solving the target problem).

Self-⋆ properties can encompass other advanced capabilities beyond self-optimization, such as self-scaling or self-healing. The former refers to the ability of the system to react efficiently to changes in its scale parameters, be it in the scale of the problem being solved, in the scale of the computational resources available, or in other circumstances of the computation. Such capability may involve some form of self-configuration in order to accomplish the objective of the computation in the most effective way in light of the change of scale. An example can be found in the domain of island-based MAs [138] deployed in unstable distributed environments [32]: if the computational substrate is composed of processing nodes whose availability fluctuates, the algorithm may face uncontrollable reductions or increments of the computational resources (i.e., some islands may appear, other islands may disappear). As a reaction, the algorithm may attempt to resize the islands and balance them out, so that the population size is affected as little as possible [141]. The second property, namely self-healing, is also relevant in this context: it aims to maintain and restore system attributes that may have been affected by internal or external actions, i.e., to heal externally inflicted damage. In the volatile computational scenario depicted, such damage is caused by the loss of information and the disruptions in connectivity that follow the disappearance of islands [142]. To tackle these issues, the algorithm may use self-sampling (using a probabilistic model of the population—much as is done in estimation of distribution algorithms [100, 151]—in order to enlarge it in a sensible way when required) and self-rewiring in order to create new connectivity links and prevent the network from becoming disconnected. It must also be noted, as an aside, that a very traditional technique commonly used when metaheuristics face constrained problems, namely using a repair function to restore the feasibility of solutions [112], can also fall within the scope of self-repairing approaches.

9.3.5 Memetic Algorithms and Complete Techniques

The combination of exact techniques with metaheuristics is an increasingly popular approach. Focusing on local search techniques, Dumitrescu and Stützle [46] have provided a classification of methods in which exact algorithms are used to strengthen local search, i.e., to explore large neighborhoods, to solve some subproblems exactly, to provide bounds and problem relaxations to guide the search, etc. Some of these combinations can also be found in the literature on population-based methods. For example, exact techniques—such as branch and bound (BnB) [27] or dynamic programming [54], among others—have been used to perform recombination (recall Sect. 9.2.4), and approaches in which exact techniques solve subproblems provided by EAs date back to 1995 [28]; see also [47] for a large list of references regarding local search/exact hybrids.
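
As an illustration of exact recombination, the sketch below (assuming minimization over fixed-length discrete genotypes) exhaustively evaluates every offspring that inherits each differing gene from one parent or the other, returning the best. This brute-force version is only meant to convey the idea; proposals in the literature rely on BnB or dynamic programming to search this space efficiently.

    from itertools import product

    def exhaustive_recombination(s, t, fitness):
        # Positions where the parents differ; the offspring space has size
        # 2^|diff|, so this is tractable only for small Hamming distances.
        diff = [i for i, (a, b) in enumerate(zip(s, t)) if a != b]
        best, best_fit = list(s), fitness(s)
        for choice in product((0, 1), repeat=len(diff)):
            child = list(s)
            for bit, i in zip(choice, diff):
                if bit:
                    child[i] = t[i]
            f = fitness(child)
            if f < best_fit:                    # minimization assumed
                best, best_fit = child, f
        return best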

Puchinger and Raidl [155] have provided a classification of this kind of hybrid technique, in which algorithmic combinations are either collaborative (sequential or intertwined execution of the combined algorithms) or integrative (one technique works inside the other one, as a subordinate). Some of the exact/metaheuristic hybrid approaches mentioned before are clearly integrative—e.g., using an exact technique to explore neighborhoods. Further examples are the use of BnB in the decoding process [156] of a genetic algorithm (i.e., an exact method within a metaheuristic technique), or the use of evolutionary techniques for the strategic guidance of BnB [95] (a metaheuristic approach within an exact method).

With regard to collaborative combinations, a sequential approach in which the execution of a MA is followed by a branch-and-cut method can be found in [90]. Intertwined approaches are also popular. For example, Denzinger and Offermann [43] combine genetic algorithms and BnB within a parallel multi-agent system. These two algorithms also cooperate in [28, 52], with the exact technique providing promising partial solutions and the metaheuristic returning improved bounds. A related approach involving beam search and full-fledged MAs can be found in [53, 55, 56]; see also [31] for a broader overview of this kind of combination.

It must be noted that most hybrid algorithms defined so far that involve exact techniques and metaheuristics are not complete, in the sense that they do not guarantee finding an optimal solution (an exception is the proposal of French et al. [50], combining an integer-programming BnB approach with GAs for MAX-SAT). Thus, the term ‘complete MA’ may not be fully appropriate. Nevertheless, many of these hybrids can be readily adapted for completeness purposes, although time and/or space requirements will then, in general, grow faster than polynomially.

9.4 Applications of Memetic Algorithms

Applications are the “raison d’être” of memetic algorithms. Their functioning philosophy, namely incorporating and exploiting knowledge of the problem being solved, presumes they are designed with a target problem in mind. This section will provide an overview of the numerous applications of MAs. The reader may actually be convinced of the breadth of these applications by noting the existence of a number of domain-specific reviews of MAs. As a matter of fact, we have organized this section as a meta-review of applications, providing pointers to these compilations rather than to individual specific applications. This is done in Table 9.1.

Table 9.1 Application surveys of memetic algorithms

None of the reviews mentioned is exhaustive, since new applications are being developed continuously. They are intended, however, to illustrate the practical impact of these optimization techniques, pointing out some selected compilations from well-known application areas. For further information about MA applications, we suggest querying bibliographical databases or web browsers for the keywords ‘memetic algorithms’ and ‘hybrid genetic algorithms’.

9.5 Conclusions

We believe that the future looks good for MAs. This belief is based on the following. First of all, MAs have shown a great record of efficient implementations, providing very good results for practical problems, as the reader may have noted in Sect. 9.4. We also have reasons to believe that we are close to some major leaps forward in our theoretical understanding of these techniques, including, for example, the worst-case and average-case computational complexity of recombination procedures. Moreover, the ubiquitous nature of distributed systems is likely to boost the deployment of MAs on large-scale, computationally demanding optimization problems.

We also see the systematic development of other particular optimization strategies as a healthy sign. If any of the simpler metaheuristics (SA, TS, VNS, GRASP, etc.) performs the same as a more complex method (GAs, MAs, Ant Colonies, etc.), an ‘elegant design’ principle should prevail: we must resort to the simpler method, or to the one with fewer free parameters, or to the one that is easier to implement. Such a fact should challenge us to adapt complex methodologies so that they beat simpler heuristics, or to check whether that is possible at all.

Fig. 9.1 Number of publications obtained by querying the Web of Science and Scopus with the term “memetic algorithm” (1998–2016)

An unhealthy sign of current research, however, is the attempt to encapsulate metaheuristics in separate compartments. Fortunately, such attempts are becoming increasingly less frequent. Indeed, combinations of MAs with other metaheuristics such as differential evolution [133, 143, 163, 181], estimation of distribution algorithms [2, 139, 184], particle swarm optimization [75, 101, 105–108, 148, 152, 172, 193], or ant-colony optimization [103] are not unusual nowadays. Furthermore, there is a clear ascending trend in the number of publications related to MAs, as shown in Fig. 9.1. Thus, as stated before, the future looks promising for MAs.