
1 Evolutionary Search for Executable Programs

There have been many attempts to artificially emulate human intelligence, from symbolic artificial intelligence (AI) [1] to connectionism [2, 3], to subcognitive approaches like behavioral AI [4] and statistical machine learning (ML) [5], and domain-specific achievements like web search [6] and self-driving cars [7]. Darwinian evolution [8] has a type of distributed intelligence distinct from all of these. It has created lifeforms and ecosystems of amazing diversity, complexity, beauty, facility, and efficiency. It has even created forms of intelligence very different from itself, including our own.

The principles of evolution – fitness-biased selection and inheritance with variation – serve as inspiration for the field of evolutionary computation (EC) [9], an adaptive learning and search approach which is general-purpose, applicable even with black-box performance feedback, and highly parallel. EC is a trial-and-error method: individual solutions are evaluated for fitness, good ones are selected as parents, and new ones are created by inheritance with variation (Fig. 43.1).

Fig. 43.1 The fundamental loop of EC
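The loop of Fig. 43.1 can be sketched in a few lines of Python. This is a minimal illustrative example, not a production EC system: the genome is a bitstring, fitness is the number of ones (the classic OneMax toy problem), and all names and parameter values are assumptions chosen for brevity.

```python
import random

def fitness(genome):
    # OneMax toy fitness: count the ones (illustrative only)
    return sum(genome)

def mutate(genome, rate=0.1):
    # inheritance with variation: flip each bit with small probability
    return [1 - g if random.random() < rate else g for g in genome]

def evolve(pop_size=20, length=16, generations=50):
    pop = [[random.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # fitness-biased selection: binary tournament
        def select():
            a, b = random.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        new_pop = []
        for _ in range(pop_size):
            p1, p2 = select(), select()
            cut = random.randrange(1, length)           # one-point crossover
            new_pop.append(mutate(p1[:cut] + p2[cut:]))
        pop = new_pop
    return max(pop, key=fitness)

best = evolve()
```

The same evaluate–select–vary skeleton underlies every GP variant discussed below; only the representation and operators change.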

Genetic programming (GP) is the subset of EC in which the aim is to create executable programs. The search space is a set of programs, such as the space of all possible Lisp programs within a subset of built-in functions and functions composed by a programmer, or the space of numerical C functions. The program representation is an encoding of such a search space, for example an abstract syntax tree or a list of instructions. A program's fitness is evaluated by executing it to see what it does. New programs are created by inheritance and variation of material from parent programs, with constraints to ensure syntactic correctness.

We define a program as a data structure capable of being executed directly by a computer, of being compiled to a directly executable form by a compiler, or of being interpreted, leading to execution of low-level code, by an interpreter. A key feature of some programming languages, such as Lisp, is homoiconicity: program code can be viewed as data. This is essential in GP, since when the algorithm operates on existing programs to make new ones, it regards them as data; but when they are executed in order to determine what they do, they are regarded as program code. This double meaning echoes that of DNA (deoxyribonucleic acid), which is both data and code in the same sense.

GP exists in many different forms which differ (among other ways) in their executable representation. As in programming by hand, GP usually considers and composes programs of varying length. Programs are also generally hierarchical in some sense, with nesting of statements or control. These representation properties (variable length and hierarchical structure) raise a very different set of technical challenges for GP compared to typical EC.

GP is very promising, because programs are so general. A program can define and operate on any data structure, including numbers, strings, lists, dictionaries, sets, permutations, trees, and graphs [10, 11, 12]. Via Turing completeness, a program can emulate any model of computation, including Turing machines, cellular automata, neural networks, grammars, and finite-state machines [13, 14, 15, 16, 17, 18].

A program can be a data regression model [19] or a probability distribution. It can express the growth process of a plant [20], the gait of a horse [21], or the attack strategy of a group of lions [22]; it can model behavior in the Prisoner's Dilemma [23] or play chess [24], Pacman [25], or a car-racing game [26]. A program can generate designs for physical objects, like a space-going antenna [27], or plans for the organization of objects, like the layout of a manufacturing facility [28]. A program can implement a rule-based expert system for medicine [29], a scheduling strategy for a factory [30], or an exam timetable for a university [31]. A program can recognize speech [32], filter a digital signal [33], or process the raw output of a brain-computer interface [34]. It can generate a piece of abstract art [35], a 3-D (three-dimensional) architectural model [36], or a piece of piano music [37].

A program can interface with natural or man-made sensors and actuators in the real world, so it can both act and react [38]. It can interact with a user or with remote sites over the network [39]. It can also introspect and copy or modify itself [40]. A program can be nondeterministic [41]. If true AI is possible, then a program can be intelligent [42].

2 History

GP has a surprisingly long history, dating back to very shortly after von Neumann's 1945 description of the stored-program architecture [43] and the 1946 creation of ENIAC [44], sometimes regarded as the first general-purpose computer. In 1948, Turing stated the aim of machine intelligence and recognized that evolution might have something to teach us in this regard [45]:

Further research into intelligence of machinery will probably be very greatly concerned with searches. […] There is the genetical or evolutionary search by which a combination of genes is looked for, the criterion being survival value. The remarkable success of this search confirms to some extent the idea that intellectual activity consists mainly of various kinds of search.

However, Turing also went a step further. In 1950, he more explicitly stated the aim of automatic programming (AP) and a mapping between biological evolution and program search [46]:

We have […] divided our problem [automatic programming] into two parts. The child-program [Turing machine] and the education process. These two remain very closely connected. We cannot expect to find a good child-machine at the first attempt. One must experiment with teaching one such machine and see how well it learns. One can then try another and see if it is better or worse. There is an obvious connection between this process and evolution, by the identifications:

  • Structure of the child machine = Hereditary material

  • Changes = Mutations

  • Natural selection = Judgment of the experimenter.

This is an unmistakeable, if abstract, description of GP (though a computational fitness function is not envisaged).

Several other authors expanded on the aims and vision of AP and machine intelligence. In 1959, Samuel wrote that the aim was to be able to tell the computer what to do, not how to do it [47]. An important early attempt at an implementation of AP was the 1958 learning machine of Friedberg [48].

In 1963, McCarthy summarized [1] several representations with which machine intelligence might be attempted: neural networks, Turing machines, and calculator programs. With the latter, McCarthy was referring to Friedberg's work. McCarthy was prescient in identifying important issues such as representations, operator behavior, density of good programs in the search space, sufficiency of the search space, appropriate fitness evaluation, and self-organized modularity. Many of these remain open issues in GP [49].

Fogel et al.'s 1960s evolutionary programming may be the first successful implementation of GP [50]. It used a finite-state machine representation for programs, with specialized operators to ensure syntactic correctness of offspring. A detailed history is available in Fogel's 2006 book [51].

In the 1980s, inspired by the success of genetic algorithms (GAs) and learning classifier systems (LCSs), several authors experimented with hierarchically structured and program-like representations. Smith [52] proposed a representation of a variable-length list of rules which could be used for program-like behavior such as maze navigation and poker. Cramer [53] was the first to use a tree-structured representation and appropriate operators; in a simple proof of concept, it successfully evolved a multiplication function in a simple custom language. Schmidhuber [54] describes a GP system with the possibility of Turing completeness, though the focus is on meta-learning aspects. Fujiki and Dickinson [55] generated Lisp code for the prisoner's dilemma. Bickel and Bickel [56] used a GA to create variable-length lists of rules, each of which had a tree structure. An artificial life approach using machine-code genomes was used by Ray [57]. All of these would likely be regarded as on-topic in a modern GP conference.

However, the founding of the modern field of GP, and the invention of what is now called standard GP, are credited to Koza [19]. In addition to the abstract syntax tree notation (Sect. 43.3.3), the key innovations were subtree crossover (Sect. 43.3.3) and the description and set-up of many test problems. In this and later research [10, 58, 59], symbolic regression of synthetic data and real-world time series, Boolean problems, and simple robot control problems such as the lawnmower problem and the artificial ant with the Santa Fe trail were introduced as benchmarks and solved successfully for the first time, demonstrating that GP was a potentially powerful and general-purpose method capable of solving machine learning-style problems, albeit conventional academic versions of them. Mutation was minimized in order to make it clear that GP was different from random search. GP took on its modern form in the years following Koza's 1992 book: many researchers took up work in the field, new types of GP were developed (Sect. 43.3), successful applications appeared (Sect. 43.4), key research topics were identified (Sect. 43.5), further books were written, and conferences and journals were established (Sect. 43.6).

Another important milestone in the history of GP was the 2004 establishment of the Humies, the awards for human-competitive results produced by EC methods. The entries are judged for matching or exceeding human-produced solutions to the same or similar problems, and for criteria such as patentability and publishability. The impressive list of human-competitive results [60] again helps to demonstrate to researchers and clients outside the field of GP that it is powerful and general purpose.

3 Taxonomy of AI and GP

In this section, we present a taxonomy which firstly places GP in the context of the broader fields of EC, ML, and AI. It then classifies GP techniques according to their representations and their population models (Fig. 43.3).

3.1 Placing GP in an AI Context

GP is a type of EC, which is a type of ML, which is itself a subset of the broader field of AI (Fig. 43.3). Carbonell et al. [61] classify ML techniques according to the underlying learning strategy, which may be rote learning, learning from instruction, learning by analogy, learning from examples, or learning from observation and discovery. In this classification, EC and GP fit in the learning from examples category, in that an (individual, fitness) pair is an example drawn from the search space together with its evaluation.

Fig. 43.3 A taxonomy of AI, EC, and GP

It is also useful to see GP as a subset of another field, AP. The term automatic programming seems to have had different meanings at different times, from automated card punching, to compilation, to template-driven source generation, then generation techniques such as the unified modeling language (UML), to the ambitious aim of creating software directly from a natural-language English specification [62]. We interpret AP to mean creating software by specifying what to do rather than how to do it [47]. GP clearly fits into this category. Other nonevolutionary techniques also do so, for example inductive programming (IP). The main difference between GP and IP is that IP typically works only with programs which are known to be correct, achieving this using inductive methods over the specifications [63]. In contrast, GP is concerned mostly with programs which are syntactically correct, but behaviorally suboptimal.

3.2 Taxonomy of GP

It is traditional to divide EC into four main subfields: evolution strategies (ES) [64, 65], evolutionary programming (EP) [50], GAs [66], and GP. In this view, ES is chiefly characterized by real-valued optimization and self-adaptation of algorithm parameters; EP by a finite-state machine representation (later generalized) and the absence of crossover; GA by the bitstring representation; and GP by the abstract syntax tree representation. While historically useful, this classification is not exhaustive: in particular, it does not provide a home for the many alternative GP representations which now exist. It also separates EP and GP, though they are both concerned with evolving programs. We prefer to use the term GP in a general sense to refer to all types of EC which evolve programs, and the term standard GP (StdGP) to mean Koza-style GP with a tree representation. With this view, StdGP and EP are types of GP, as are several others discussed below. In the following, we classify GP algorithms according to their representation and according to their population model.

3.3 Representations

Throughout EC, it is useful to contrast direct and indirect representations. Standard GP is direct, in that the genome (the object created and modified by the genetic operators) serves directly as an executable program. Some other GP representations are indirect, meaning that the genome must be decoded or translated in some way to give an executable program. An example is grammatical evolution (GE, see below), where the genome is an integer array which is used to generate a program. Indirect representations have the advantage that they may allow an easier definition of the genetic operators, since they may allow the genome to exist in a rather simpler space than that of executable programs. Indirect representations also imitate somewhat more closely the mechanism found in nature: a mapping from DNA (deoxyribonucleic acid) to RNA (ribonucleic acid) to mRNA (messenger RNA) to codons to proteins and finally to cells. The choice between direct and indirect representations also affects the structure of the fitness landscape (Sect. 43.5.2). In the following, we present a nonexhaustive selection of the main representations used in GP, in each case describing initialization and the two key operators: mutation and crossover.

3.3.1 Standard GP

In standard GP (StdGP), the representation is an abstract syntax tree, which can be seen as a Lisp-style S-expression. All nodes are functions and all arguments are of the same type. A function accepts zero or more arguments and returns a single value. Trees can be initialized by recursive random growth starting from a null node. StdGP uses parameterized initialization methods that diversify the size and structure of initial trees. Figure 43.2a shows a tree in the process of initialization.

Fig. 43.2 a–c The StdGP representation is an abstract syntax tree. The expression that will be evaluated in the second tree from left is, in inorder notation, (x * y) - (x + 2). In preorder, or the notation of Lisp-style S-expressions, it is (- (* x y) (+ x 2)). GP presumes that the variables x and y will already be bound to some value in the execution environment when the expression is evaluated. It also presumes that the operations * and -, etc., are defined. Note that all interior tree nodes are effectively operators in some computational language. In standard GP parlance, these operators are called functions, and the leaf tree nodes, which accept no arguments and typically represent variables bound to data values from the problem domain, are referred to as terminals

Trees can be crossed over by cutting and swapping the subtrees rooted at randomly chosen nodes, as shown in Fig. 43.2b. They can be mutated by cutting and regrowing the subtrees of randomly chosen nodes, as shown in Fig. 43.2c. Another mutation operator, HVL-Prime, is shown later in Fig. 43.11. Note that crossover or mutation creates an offspring of potentially different size and structure, but the offspring remains syntactically valid for evaluation. With these variations, a tree could theoretically grow to infinite size or height. To circumvent this, as a practicality, a hard parameterized threshold on size or height (or some other measure) is used; violations of the threshold are typically rejected. Bias may also be applied in the randomized selection of crossed-over subtree roots. A common variation of StdGP is strongly typed GP (STGP) [67, 68], which supports functions accepting arguments and returning values of specific types, by means of specialized mutation and crossover operations that respect these types.
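The tree operators just described can be sketched as follows. This is a minimal illustrative Python version, not any particular GP system: trees are nested lists [op, child, child], the function and terminal sets are toy choices, and crossover is shown producing a single offspring.

```python
import random

FUNCTIONS = ['+', '-', '*']     # toy function set (assumed)
TERMINALS = ['x', 'y', 1, 2]    # toy terminal set (assumed)

def random_tree(depth=3):
    # recursive random growth, stopping at terminals
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(FUNCTIONS)
    return [op, random_tree(depth - 1), random_tree(depth - 1)]

def nodes(tree, path=()):
    """Yield (path, subtree) pairs for every node in the tree."""
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace_at(tree, path, new):
    # return a copy of tree with the subtree at path replaced by new
    if not path:
        return new
    copy = list(tree)
    copy[path[0]] = replace_at(copy[path[0]], path[1:], new)
    return copy

def crossover(t1, t2):
    # graft a random subtree of t2 at a random node of t1
    p1, _ = random.choice(list(nodes(t1)))
    _, sub = random.choice(list(nodes(t2)))
    return replace_at(t1, p1, sub)

def mutate(tree):
    # regrow a random subtree
    p, _ = random.choice(list(nodes(tree)))
    return replace_at(tree, p, random_tree(depth=2))
```

Because every graft point is a whole subtree, every offspring is again a syntactically valid tree, as the text notes.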

3.3.2 Executable Graph Representations

A natural generalization of the executable tree representation of StdGP is the executable graph. Neural networks can be seen as executable graphs in which each node calculates a weighted sum of its inputs and outputs the result after a fixed shaping function such as tanh(). Parallel and distributed GP (PDGP) [69] is more closely akin to StdGP in that nodes calculate different functions, depending on their labels, and do not perform a weighted sum. It also allows the topology of the graph to vary, unlike the typical neural network. Cartesian GP (CGP) [70] uses an integer-array genome and a mapping process to produce the graph. Each block of three integer genes codes for a single node in the graph, specifying the indices of its inputs and the function to be executed by the node (Fig. 43.4).

Fig. 43.4 Cartesian GP. An integer-array genome is divided into blocks: in each block the last integer specifies a function (top-left). Then one node is created for each input variable (x, y, z) and for each genome block. Nodes are arranged in a grid and outputs are indexed sequentially (bottom-left). The first elements in each block specify the indices of the incoming links. The final graph is created by connecting each node input to the node output with the same integer label (right). Dataflow in the graph is bottom to top. Multiple outputs can be read from the topmost layer of nodes. In this example node 6 outputs xy - z + y, node 7 outputs x + z + y, and node 8 outputs xy / xy
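The CGP decoding step can be sketched as follows. This is an illustrative assumption of one possible encoding (three genes per block: two input indices and a function index), not the exact layout of Fig. 43.4; the function table and genome are made up for the example.

```python
# hypothetical function table: index -> binary function
FUNCS = {0: lambda a, b: a + b,
         1: lambda a, b: a - b,
         2: lambda a, b: a * b}

def eval_cgp(genome, inputs):
    # outputs[0..len(inputs)-1] are the input variables themselves;
    # each three-gene block (in1, in2, func) appends one node output
    outputs = list(inputs)
    for i in range(0, len(genome), 3):
        in1, in2, func = genome[i:i + 3]
        outputs.append(FUNCS[func](outputs[in1], outputs[in2]))
    return outputs[-1]   # read the program output from the last node

# genome encoding: node 3 = x*y, node 4 = node3 - z
genome = [0, 1, 2,   3, 2, 1]
result = eval_cgp(genome, inputs=[2, 3, 4])   # x=2, y=3, z=4
```

Because inputs can only refer to earlier outputs, the decoded structure is always a valid feed-forward graph, which is why simple GA-style array operators suffice.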

Neuro-evolution of augmenting topologies (NEAT) [71] again allows the topology to vary, and allows nodes to be labelled by the functions they perform, but in this case each node does perform a weighted sum of its inputs. Each of these representations uses different operators. For example, CGP uses simple array-oriented (GA-style) initialization, crossover, and mutation operators (subject to some customizations).

3.3.3 Finite-State Machine Representations

Some GP representations use graphs in a different way: the model of computation is the finite-state machine rather than the executable functional graph (Fig. 43.5). The original incarnation of evolutionary programming (EP) [72] is an example. In a typical implementation [72], five types of mutation are used: adding and deleting states, changing the initial state, changing the output symbol attached to edges, and changing the edges themselves. In this implementation, crossover is not used.

Fig. 43.5 EP representation: finite-state machine. In this example, a mutation changes a state transition

3.3.4 Grammatical GP

In grammatical GP [73], the context-free grammar (CFG) is the defining component of the representation. In the most common approach, search takes place in the space defined by a fixed nondeterministic CFG. The aim is to find a good program in that space. Often the CFG defines a useful subset of a programming language such as Lisp, C, or Python. Programs derived from the CFG can then be compiled or interpreted using either standard or special-purpose software. There are several advantages to using a CFG. It allows convenient definition of multiple data-types which are automatically respected by the crossover and mutation operators. It can introduce domain knowledge into the problem representation. For example, if it is known that good programs will consist of a conditional statement inside a loop, it is easy to express this knowledge using a grammar. The grammar can restrict the ways in which program expressions are combined, for example making the system aware of physical units in dimensionally aware GP [74, 75]. A grammatical GP system can conveniently be applied to new domains, or can incorporate new domain knowledge, through updates to the grammar rather than large-scale reprogramming.

In one early system [76], the derivation tree is used as the genome: initial individuals' genomes are randomly generated according to the rules of the grammar. Mutation works by randomly generating a new subtree starting from a randomly chosen internal node in the derivation tree. Crossover is constrained to exchange subtrees whose roots are identical. In this way, new individuals are guaranteed to be valid derivation trees. The executable program is then created from the genome by reading the leaves left to right. A later system, grammatical evolution (GE) [77], instead uses an integer-array genome. Initialization, mutation, and crossover are defined as simple GA-style array operations. The genome is mapped to an output program by using the successive integers of the genome to choose among the applicable production choices at each step of the derivation process. Figure 43.6 shows a simple grammar, integer genome, derivation process, and derivation tree. At each step of the derivation process, the left-most nonterminal in the derivation is rewritten. The next integer gene is used to determine, using the mod rule, which of the possible productions is chosen. The output program is the final step of the derivation.

Fig. 43.6 GE representation. The grammar (top-left) consists of several rules. The genome (center-left) is a variable-length list of integers. At each step of the derivation process (bottom-left), the left-most nonterminal is rewritten as specified by a gene. The resulting derivation tree is shown on the right: reading just the leaves gives the derived program
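The mod-rule mapping just described can be sketched in a few lines. The grammar and genomes below are illustrative, not those of Fig. 43.6, and the failure handling (returning None when genes run out) is one simple choice among several used in practice.

```python
# toy grammar: nonterminal -> list of productions (each a symbol list)
GRAMMAR = {
    '<expr>': [['<expr>', '<op>', '<expr>'], ['<var>']],
    '<op>':   [['+'], ['-'], ['*']],
    '<var>':  [['x'], ['y']],
}

def ge_map(genome):
    derivation = ['<expr>']
    g = 0
    while any(sym in GRAMMAR for sym in derivation):
        if g >= len(genome):        # ran out of genes: mapping fails
            return None             # (real systems may wrap the genome)
        # rewrite the left-most nonterminal
        i = next(i for i, s in enumerate(derivation) if s in GRAMMAR)
        choices = GRAMMAR[derivation[i]]
        choice = choices[genome[g] % len(choices)]   # the mod rule
        derivation[i:i + 1] = choice
        g += 1
    return ' '.join(derivation)
```

For example, the genome [0, 1, 0, 2, 1, 1] derives the program x * y under this grammar, each gene selecting one production for the current left-most nonterminal.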

Although successful and widely used, GE has also been criticized for the disruptive effects of its operators with respect to preserving the modular functionality of parents. Another system, tree adjoining grammar-guided genetic programming (TAG3P), has also been used successfully [78]. Instead of a string-rewriting CFG, TAG3P uses tree-rewriting tree adjoining grammars (TAGs). The representation has the advantage, relative to GE, that individuals are valid programs at every step of the derivation process. TAGs also have some context-sensitive properties [78]. However, it is a more complex representation.

Another common alternative approach, surveyed by Shan et al. [79], uses probabilistic models over grammar-defined spaces, rather than direct evolutionary search.

3.3.5 Linear GP

In linear GP (LGP), the program is a list of instructions to be interpreted sequentially. In order to achieve complex functionality, a set of registers acting as state or memory is used. Instructions can read from or write to the registers. Several registers, which may be read-only, are initialized with the values of the input variables. One register is designated as the output: its value at the end of the program is taken as the result of the program. Since a register can be read multiple times after writing, an LGP program can be seen as having a graph structure. A typical implementation is that of [80]. It uses instructions of three registers each, which typically calculate a new value as an arithmetic function of some registers and/or constants, and assign it to a register (Fig. 43.7).

Fig. 43.7 Linear GP representation. This implementation has four registers in total (top). The representation is a list of register-oriented instructions (bottom). In this example program of three instructions, r0 is the output register, and the formula 4(x0 + x1)^2 is calculated

It also allows conditional statements and looping. It explicitly recognizes the possibility of nonfunctioning code, or introns. Since there are no syntactic constraints on how multiple instructions may be composed together, initialization can be as simple as the random generation of a list of valid instructions. Mutation can change a single instruction to a newly generated instruction, or change just a single element of an instruction. Crossover can be performed over the two parents’ list structures, respecting instruction boundaries.
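A minimal interpreter for such register-oriented programs might look as follows. The instruction format (dest, op, src1, src2) and register layout are illustrative assumptions, chosen so the example program reproduces the figure's formula 4(x0 + x1)^2; they are not the exact encoding of [80].

```python
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def run_lgp(program, x0, x1, n_regs=4):
    # r2 and r3 hold the inputs; r0 is the designated output register
    regs = [0.0] * n_regs
    regs[2], regs[3] = x0, x1
    for dest, op, s1, s2 in program:
        regs[dest] = OPS[op](regs[s1], regs[s2])
    return regs[0]

# three instructions computing 4*(x0 + x1)^2:
prog = [(1, '+', 2, 3),   # r1 = x0 + x1
        (0, '+', 1, 1),   # r0 = 2*(x0 + x1)
        (0, '*', 0, 0)]   # r0 = 4*(x0 + x1)^2
```

Note that any list of well-formed instructions is a valid program, which is why initialization and variation can be simple list operations, as the text describes.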

3.3.6 Stack-Based GP

A variant of linear GP avoids the need for registers by adding a stack. The program is again a list of instructions, each now represented by a single label. In a simple arithmetic implementation, the label may be one of the input variables (x_i), a numerical constant, or a function (*, +, etc.). If it is a variable or constant, the instruction is executed by pushing the value onto the stack. If a function, it is executed by popping the required number of operands from the stack, executing the function on them, and pushing the result back on. The result of the program is the value at the top of the stack after all instructions have been executed. With the stipulation that stack-popping instructions become no-ops when the stack is empty, one can again implement initialization, mutation, and crossover as simple list-based operations [81]. One can also constrain the operations to work on what are effectively subtrees, so that stack-based GP becomes effectively equivalent to a reverse Polish notation implementation of standard GP [82]. A more sophisticated type of stack-based GP is PushGP [83], in which multiple stacks are used. Each stack is used for values of a different type, such as integer, boolean, and float. When a function requires multiple operands of different types, they are taken as required from the appropriate stacks. With the addition of an exec stack which stores the program code itself, and a code stack which stores items of code, both of which may be read and written, PushGP gains the ability to evolve programs with self-modification, modularity, control structures, and even self-reproduction.
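The simple arithmetic stack machine described above can be sketched directly, including the no-op rule for an under-full stack. The label conventions (strings for variables, numbers for constants) and the default result for an empty stack are illustrative assumptions.

```python
OPS = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}

def run_stack(program, env):
    stack = []
    for label in program:
        if label in OPS:
            if len(stack) < 2:         # under-full stack: treat as a no-op
                continue
            b, a = stack.pop(), stack.pop()
            stack.append(OPS[label](a, b))
        elif isinstance(label, str):
            stack.append(env[label])   # input variable: push its value
        else:
            stack.append(label)        # numeric constant: push it
    return stack[-1] if stack else 0.0

# reverse Polish form of (x * y) + 2:
result = run_stack(['x', 'y', '*', 2, '+'], {'x': 3, 'y': 4})
```

With the no-op rule, every list of labels executes without error, so genetic operators can splice programs freely.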

3.3.7 Low-Level Programming

Finally, several authors have evolved programs directly in real-world low-level programming languages. Schulte et al. [84] automatically repaired programs written in Java byte code and in x86 assembly. Orlov and Sipper [85] evolved programs for tasks such as trail navigation and image classification de novo in Java byte code. This work made use of a specialized crossover operator which performed automated checks for compatibility of the parent programs' stack and control-flow state. Nordin [86] proposed a machine-code representation for GP. Programs consist of lists of low-level register-oriented instructions which execute directly, rather than in a virtual machine or interpreter. The result is a massive speed-up in execution.

3.4 Population Models

It is also useful to classify GP methods according to their population models. In general, the population model and the representation can vary independently; in fact, all of the following population models can be applied with any EC representation, including bitstrings and real-valued vectors, as well as with GP representations.

The simplest possible model, hill-climbing, uses just one individual at a time [87]. At each iteration, offspring are created until one of them is more highly fit than the current individual, which it then replaces. If at any iteration it becomes impossible to find an improvement, the algorithm has climbed the hill, i. e., reached a local optimum, and stops. It is common to use a random restart in this case. The hill-climbing model can be used in combination with any representation. Note that it does not use crossover. Variants include ES-style (μ, λ) or (μ + λ) schemes, in which multiple parents each give rise to multiple offspring by mutation.
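The hill-climbing model can be sketched as follows, here on a bitstring with point mutation. The improvement budget per step and the lack of restarts are simplifying assumptions for illustration.

```python
import random

def hill_climb(fitness, length=16, max_tries_per_step=100):
    current = [random.randint(0, 1) for _ in range(length)]
    while True:
        for _ in range(max_tries_per_step):
            child = list(current)
            i = random.randrange(length)
            child[i] = 1 - child[i]          # point mutation
            if fitness(child) > fitness(current):
                current = child              # accept the improvement
                break
        else:
            # no improvement found within the budget: local optimum
            return current

best = hill_climb(sum)   # OneMax toy fitness: number of ones
```

In practice this single-individual loop would be wrapped in random restarts, as the text notes; with any GP representation the point mutation is simply replaced by that representation's mutation operator.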

The most common model is an evolving population. Here a large number of individuals (from tens to many thousands) exist in parallel, with new generations being created by crossover and mutation among selected individuals. Variants include the steady-state and the generational models. They differ only in that the steady-state model generates one or a few new individuals at a time, adds them to the existing population and removes some old or weak individuals; whereas the generational model generates an entirely new population all at once and discards the old one.

The island model is a further addition, in which multiple populations all evolve in parallel, with infrequent migration between them [88].

In coevolutionary models, the fitness of an individual cannot be calculated in an endogenous way. Instead it depends on the individual’s relationship to other individuals in the population. A typical example is in game-playing applications such as checkers, where the best way to evaluate an individual is to allow it to play against other individuals. Coevolution can also use fitness defined in terms of an individual’s relationship to individuals in a population of a different type. A good example is the work of [89], which uses a type of predator–prey relationship between populations of programs and populations of test cases. The test cases (predators) evolve to find bugs in the programs; the programs (prey) evolve to fix the bugs being tested for by the test suites.

Another group of highly biologically inspired population models is that of swarm intelligence. Here the primary method of learning is not the creation of new individuals by inheritance. Instead, each individual generally lives for the length of the run, but moves about in the search space with reference to other individuals and their current fitness values. For example, in particle swarm optimization (PSO), individuals tend to move toward the global best and toward the best point in their own history, but tend to avoid moving too close to other individuals. Although PSO and related methods such as differential evolution (DE) are best applied in real-valued optimization, their population models and operators can be abstracted and applied in GP methods also [90, 91].
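
The movement rule just described can be sketched as the standard PSO velocity update; the inertia and attraction coefficients below are common textbook defaults, not values taken from this chapter:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.4, c2=1.4):
    """One particle update: keep some inertia (w), then accelerate toward
    the particle's personal best and the global best, each scaled by a
    random factor."""
    r1, r2 = random.random(), random.random()
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

With pbest and gbest both above x, the particle reliably moves upward; repeated application concentrates the swarm around good regions of the space.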

Finally, we come to estimation of distribution algorithms (EDAs). Here the idea is to create a population, select a subsample of the best individuals, model that subsample using a distribution, and then create a new population by sampling the distribution. This approach is particularly common in grammar-based GP [73], though it is also used with other representations [92, 93, 94]. The modeling-sampling process could be regarded as a type of whole-population crossover. Alternatively, one can view EDAs as being quite far from the biological inspiration of most EC; in a sense they bridge the gap between EC and statistical ML.
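
The model-then-sample loop can be sketched in its simplest continuous form, a univariate Gaussian EDA; grammar-based GP EDAs model production probabilities rather than real variables, but the loop structure is the same (all names here are illustrative):

```python
import random
import statistics

def eda_generation(pop, fitness, elite_frac=0.3):
    """One EDA step: select the best individuals, fit one Gaussian per
    variable to that elite subsample, then sample a new population."""
    ranked = sorted(pop, key=fitness, reverse=True)
    elite = ranked[: max(2, int(elite_frac * len(pop)))]
    dims = len(elite[0])
    mus = [statistics.mean(ind[d] for ind in elite) for d in range(dims)]
    # Small floor keeps sampling possible if the elite converges.
    sigmas = [statistics.stdev(ind[d] for ind in elite) + 1e-9
              for d in range(dims)]
    return [[random.gauss(mus[d], sigmas[d]) for d in range(dims)]
            for _ in range(len(pop))]
```

Iterating this step moves the distribution, and hence the population, toward high-fitness regions.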

4 Uses of GP

Our introduction (Sect. 43.1) has touched on a wide array of domains in which GP has been applied. In this section, we give more detail on just a few of these.

4.1 Symbolic Regression

Symbolic regression is one of the most common tasks for which GP is used [19, 95, 96]. It is used as a component in techniques like data modeling, clustering, and classification, for example in the modeling application outlined in Sect. 43.4.2. It is named after techniques such as linear or quadratic regression, and can be seen as a generalization of them; unlike those techniques, however, it does not require a priori specification of the model's form. The goal is to find a function in symbolic form which models a data set. A typical symbolic regression run proceeds as follows.

It begins with a dataset which is to be regressed, in the form of a numerical matrix (Fig. 43.8, left). Each row i is a data point consisting of input (explanatory) variables x_i and an output (response) variable y_i to be modeled. The goal is to produce a function f(x) which models the relationship between x and y as closely as possible. Figure 43.8 (right) plots the existing data and one possible function f.

Fig. 43.8
figure 8figure 8

Symbolic regression: a matrix of data (left) is to be modeled by a function. It is plotted as dots in the figure on the right. A candidate function f (solid line) can be plotted, and its errors (dotted lines) can be visualized

Typically StdGP is used, with a numerical language which includes arithmetic operators, functions like sinusoids and exponentials, numerical constants, and the input variables of the dataset. The internal nodes of each StdGP abstract syntax tree will be operators and functions, and the leaf nodes will be constants and variables.

To calculate the fitness of each model, the explanatory variables of the model are bound to their values at each of the training points x_i in turn. The model is executed, and its output f(x_i) is the model's predicted response ŷ_i. This value is then compared to the response of the training point, y_i. The error can be visualized as the dotted lines in Fig. 43.8 (right). Fitness is usually defined as the root-mean-square error of the model's outputs versus the training data; in this formulation, therefore, fitness is to be minimized:

$$\mathrm{fitness}(f) = \sqrt{\frac{\sum_{i=1}^{n}\left(f(x_i) - y_i\right)^2}{n}}\,.$$

Over the course of evolution, the population moves toward better and better models f of the training data. After the run, a testing data set is used to confirm that the model is capable of generalization to unseen data.
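
The fitness computation described above is straightforward to sketch; here a candidate model is represented as a plain Python callable rather than an evolved tree, purely for illustration:

```python
import math

def rmse_fitness(f, xs, ys):
    """Root-mean-square error of model f on the training points;
    lower is better, so this fitness is minimized."""
    n = len(xs)
    return math.sqrt(sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / n)

# Toy dataset generated by y = 2x + 1; the perfect model has zero error.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2 * x + 1 for x in xs]
```

In a real run, f would be the result of interpreting an evolved abstract syntax tree over the input variables.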

4.2 Machine Learning

Like other ML methods, GP is successful in quantitative domains where data is available for learning and where both approximate solutions and incremental improvements are valued. In modeling or supervised learning, GP is preferable to other ML methods in circumstances where the form of the solution model is unknown a priori, because it is capable of searching among possible forms for the model. Symbolic regression can be used as an approach to classification, regression modeling, and clustering. It can also be used to automatically extract influential features, since it is able to pare down the feature set it is given at initialization. GP-derived classifiers have been integrated into ensemble learning approaches, and GP has been used in reinforcement learning (RL) contexts. Figure 43.9 situates GP as an ML method, able to address problems such as planning, forecasting, pattern recognition, and modeling.

Fig. 43.9
figure 9figure 9

GP as a component in ML. Symbolic regression can be used as an approach to many ML tasks, and integrated with other ML techniques

For the sensory evaluation problem described in [97], the authors use GP as the anchor of an ML framework (Fig. 43.10). A panel of assessors provides liking scores for many different flavors, where each flavor consists of a mixture of ingredients in different proportions. The goals are to discover the dependency of a liking score on the concentration levels of a flavor's ingredients, to identify ingredients that drive liking, to segment the panel into groups with similar liking preferences, and to optimize flavors to maximize liking per group. The framework uses symbolic regression and ensemble methods to generate multiple diverse explanations of liking scores, with confidence information. It uses statistical techniques to extrapolate from the genetically evolved model ensembles to unobserved regions of the flavor space. It also segments the assessors into groups which either have the same propensity to like flavors, or whose liking is driven by the same ingredients.

Sensory evaluation data is very sparse, and there is large variation among the responses of different assessors. A Pareto-GP algorithm (which uses multiobjective techniques to maximize model accuracy and minimize model complexity [98]) was therefore used to evolve an ensemble of models for each assessor and to use this ensemble as a source of robust variable-importance estimation. The frequency of variable occurrences in the models of the ensemble was interpreted as information about the ingredients that drive the liking of an assessor. Model ensembles with the same dominance of variable occurrences, and which demonstrate similar effects when the important variables are varied, were grouped together to identify assessors who are driven by the same ingredient set and in the same direction. Varying the input values of the important variables, while using the model ensembles of these panel segments, provided a means of conducting focused sensitivity analysis. Subsequently, the clustered model ensembles constitute the black box used by an evolutionary algorithm in its optimization of flavors that are well liked by assessors who are driven by the same ingredients.

Fig. 43.10
figure 10figure 10

GP symbolic regression is unique and useful as an ML technique because it obviates the need to define the structure of a model prior to training. Here, it is used to form a personalized ensemble model for each assessor in a flavor evaluation panel

4.3 Software Engineering

At least three areas of software engineering have been tackled with remarkable success by GP: bug fixing [99], parallelization [100, 101], and optimization [102, 103, 104]. These three areas are very different in their aims, scope, and methods; however, they all must deal with two key problems in this domain: the very large and unconstrained search space, and the problem of program correctness. They therefore do not aim to evolve new functionality from scratch, but instead use existing code as material to be transformed in some way; and they either guarantee correctness of the evolved programs through their representations, or rely on existing test suites to provide strong evidence of correctness.

Le Goues et al. [99] show that automatically fixing software bugs is a problem within the reach of GP. They describe a system called GenProg, which operates on C source code taken from open-source projects. It works by forming an abstract syntax tree from the original source code. The initial population is seeded with variations of the original. Mutations and crossover are constrained to copy or delete complete lines of code, rather than editing subexpressions, and to alter only lines which are exercised by the failing test cases; this helps to reduce the size of the search space. The original test suites are used to give confidence that the program variations have not lost their original functionality. Fixes for several real-world bugs are produced quickly and with high certainty of success, including bugs in HTTP servers, Unix utilities, and a media player. The fixes can be automatically processed to produce minimal patches. Best of all, the fixes are demonstrated to be rather robust: some even generalize to fixing related bugs which were not explicitly encoded in the test suite.

Ryan [100] describes a system, Paragen, which automatically rewrites serial Fortran programs into parallel versions. In Paragen I, the programs are directly varied by the genetic operators, and automated tests are used to reward the preservation of the program's original semantics. The work of Williams [101] was in some ways similar to Paragen I. In Paragen II, correctness of the new programs is instead guaranteed, using a different approach: the programs to be evolved are sequences of transformations defined over the original serial code, and each transformation is known to preserve semantics. Some transformations directly transform serial operations into parallel ones, while others merely enable the first type.

A third goal of software engineering is optimization of existing code. White et al. [104] tackle this task using a multiobjective optimization method. Again, an existing program is used as a starting point, and the aim is to evolve a semantically equivalent one with improved characteristics, such as reduced memory usage, execution time, or power consumption. The system is capable of finding nonobvious optimizations, i. e., ones which cannot be found by optimizing compilers. A population of test cases is coevolved with the population of programs. Stephenson et al. [102, 103], in the Meta Optimization project, improve program execution speed by using GP to refine priority functions within the compiler. The compiler then generates better code, which executes faster across the input range of one program and across the program range of a benchmark set.

A survey of the broader field of search-based software engineering is given by Harman [105].

4.4 Design

GP has been successfully used in several areas of design. This includes both engineering design, where the aim is to design some hardware or software system to carry out a well-defined task, and aesthetic design, where the aim is to produce art objects with subjective qualities.

4.4.1 Engineering Design

One of the first examples of GP design was the synthesis of analog electrical circuits by Koza et al. [106]. This work addressed the problem of automatically creating circuits to perform tasks such as filtering or amplification. Eight types of circuit were automatically created, each having certain requirements, such as outputting an amplified copy of the input with low distortion; these requirements were used to define fitness. A complex GP representation was used, with both STGP (Sect. 43.3.3) and ADFs (Sect. 43.5.3). Execution of the evolved program began with a trivial embryonic circuit. GP program nodes, when executed, performed actions such as altering the circuit topology or creating a new component. These nodes were parameterized with numerical values, also under GP control, which could be created by more typical arithmetic GP subtrees. The evolved circuits solved significant problems to a human-competitive standard, though they were not fabricated.

Another significant success story was the space-going antenna evolved by Hornby et al. [27] for the NASA (National Aeronautics and Space Administration) Space Technology 5 spacecraft. The task was to design an antenna with certain beamwidth and bandwidth requirements, which could be tested in simulation (thus providing a natural fitness function). GP was used to reduce reliance on human labor, to avoid human limitations on complexity, and to explore areas of the search space which human designers would reject as not worthy of exploration. Both a GA and a GP representation were used, producing quite similar results. The GP representation was in some ways similar to a 3-D turtle-graphics system: commands included forward, which moved the turtle forward while creating a wire component, and rotate-x, which changed its orientation. Branching of the antenna arms was allowed with special markers similar to those used in turtle-graphics programs. The program composed of these primitives, when run, created a wire structure, which was rotated and copied four times to produce a symmetric result for simulation and evaluation.

4.4.2 Aesthetic Design

There have also been successes in the fields of graphical art, 3-D aesthetic design, and music. Given the aesthetic nature of these fields, GP fitness is often replaced by an interactive approach in which the user performs direct selection on the population. This approach dates back to Dawkins' seminal Biomorphs [107] and has been used in other forms of EC also [108]. Early successes were those of Todd and Latham [109], who created pseudo-organic forms, and Sims [35], who created abstract art. An overview of evolutionary art is provided by Lewis [110].

A key aim throughout aesthetic design is to avoid the many random-seeming designs which tend to be created by typical representations. For example, a naive representation for music might encode each quarter-note as an integer in a genome whose length is the length of the eventual piece. Such a representation will be capable of representing some good pieces of music, but it will have several significant problems. The vast majority of pieces will be very poor and random sounding. Small mutations will tend to gradually degrade pieces, rather than causing large-scale and semantically sensible transformations [111].

As a result, many authors have tried to use representations which take advantage of forms of reuse. Although reuse is also an aim in nonaesthetic GP (Sect. 43.5.3), the hypothesis that good solutions will tend to involve reuse, even on new, unknown problems, is more easily motivated in the context of aesthetic design.

In one strand of research, the time or space to be occupied by the work is predefined and divided into a grid of 1, 2, or 3 dimensions. A GP function of 1, 2, or 3 arguments is then evolved and applied to each point in the grid, with the coordinates of the point passed as arguments to the function. The result is that the function is reused many times, and all parts of the work are felt to be coherent. The earliest example of such work was that of Sims [35], who created fascinating graphical art (a 2-D grid) and some animations (a 3-D grid of two spatial dimensions and one time dimension). The paradigm was later brought to a high degree of artistry by Hart [112]. The same generative idea, now with a 1-D grid representing time, was used by Hoover et al. [113], Shao et al. [114], and McDermott and O'Reilly [115] to produce music as a function of time, and with a 3-D grid by Clune and Lipson [116] to produce 3-D sculptures.
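
The grid idea can be sketched in a few lines: a single evolved function of two arguments is applied at every pixel, so the same expression is reused across the whole image (the expression below is a hand-written stand-in for an evolved one):

```python
import math

def render(f, width, height):
    """Apply a 2-argument function at every grid point; reusing one
    function across the grid gives the whole image a coherent texture."""
    return [[f(x / width, y / height) for x in range(width)]
            for y in range(height)]

# A plausible stand-in for an evolved expression over (x, y):
# smooth and patterned rather than random-looking.
f = lambda x, y: math.sin(6 * x) * math.cos(6 * y)
image = render(f, 8, 8)
```

For music, the same scheme uses a function of a single time argument; for sculpture, a function of three spatial arguments.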

Other successful work has used different approaches to reuse. L-systems are grammars in which symbols are recursively expanded in parallel: after several expansions (a growth process), the string will be highly patterned, with multiple copies of some substrings. Interpreting this string as a program can then yield highly patterned graphics [117], artificial creatures [118], and music [119]. Grammars have also been used in 3-D and architectural design, both in a modified L-system form [36] and in the standard GE form [120]. The Ossia system of Dahlstedt [37] uses GP trees with recursive pointers to impose reuse and a natural, gestural quality on short pieces of art music.
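
Parallel rewriting is easy to sketch; the rules below are Lindenmayer's classic algae system, used here only to show how repeated substrings emerge from a simple grammar:

```python
def expand(axiom, rules, steps):
    """Parallel rewriting: every symbol is replaced simultaneously at each
    step, so repeated substrings (and hence pattern) emerge in the result."""
    s = axiom
    for _ in range(steps):
        s = "".join(rules.get(c, c) for c in s)
    return s

# Lindenmayer's algae system: A -> AB, B -> A.
rules = {"A": "AB", "B": "A"}
```

In graphical applications, the expanded string is then interpreted as drawing commands (turtle moves, branch markers, and so on).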

5 Research Topics

Many research topics of interest to GP practitioners are also of broader interest. For example, the self-adaptation of algorithm parameters is a topic of interest throughout EC. We have chosen to focus on four research topics of specific interest in GP: bloat, GP theory, modularity, and open-ended evolution.

5.1 Bloat

Most GP-type problems naturally require variable-length representations. It might be expected that selection pressure would effectively guide the population toward program sizes appropriate to the problem, and indeed this is sometimes the case. However, it has been observed for many different representations [121] and problems that programs grow over time without apparent fitness improvements. This phenomenon is called bloat. Since the time complexity of evaluating a GP program is generally proportional to its size, bloat greatly slows the GP run down. There are also other drawbacks. The eventual solution may be so large and complex that it is unreadable, negating a key advantage of symbolic methods like GP. Overly large programs tend to generalize less well than parsimonious ones. Bloat may also negatively impact the rate of fitness improvement. Since bloat is a significant obstacle to successful GP, it is an important topic of research, with differing viewpoints on both the causes of bloat and the best solutions.

The competing theories of the causes of bloat are summarized by Luke and Panait [122] and Silva et al. [123]. A fundamental idea is that adding material to a GP tree is more likely to lead to a fitness improvement than removing material. The hitchhiking theory holds that noneffective code is carried along by virtue of being attached to useful code. The defense-against-crossover theory suggests that large amounts of noneffective code give a selection advantage late in GP runs, when crossover is likely to be highly destructive of good, fragile programs. Removal bias is the idea that it is harder for GP operators to remove exactly the right (i. e., noneffective) code than it is to add more. The fitness-causes-bloat theory suggests that fitness-neutral changes tend to increase program size simply because there are many more programs with the same functionality at larger sizes than at smaller ones [124]. The modification point depth theory suggests that children formed by tree crossover at deep crossover points are likely to have fitness similar to their parents, and are thus more likely to survive than the more radically different children formed at shallow crossover points; because larger trees have more very deep potential crossover points, there is a selection pressure toward growth. Finally, the crossover bias theory [125] suggests that after many crossovers, a population will tend toward a limiting distribution of tree sizes [126] in which small trees are more common than large ones – note that this is the opposite of the effect that might be expected as the basis of a theory of bloat. However, once selection is considered, the majority of the small programs cannot compete with the larger ones, and so the distribution becomes skewed in favor of larger programs.

Many different solutions to the problem of bloat have been proposed, many with some success. One simple method is depth limiting, imposing a fixed limit on the tree depth that can be produced by the variation operators [19].

Another simple but effective method is Tarpeian bloat control [127]. Individuals which are larger than average receive, with a certain probability, a constant punitively bad fitness. The advantage is that these individuals are not evaluated, and so a huge amount of time can be saved and devoted to running more generations (as in [122]). The Tarpeian method does allow the population to grow beyond its initial size, since the punishment is only applied to a proportion of individuals – typically around 1 in 3. This value can also be set adaptively [127].

The parsimony pressure method evaluates all individuals, but imposes a fitness penalty on overly large individuals. This assumes that fitness is commensurable with size: the magnitude of the punishment establishes a de facto exchange rate between the two. Luke and Panait [122] found that parsimony pressure was effective across problems and across a wide range of exchange rates.
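
For a minimized fitness, the method amounts to a one-line penalty; the coefficient is exactly the exchange rate discussed above (the value 0.01 is arbitrary, chosen here only for illustration):

```python
def penalized_fitness(error, size, c=0.01):
    """Parsimony pressure for a minimized fitness: raw error plus a size
    penalty. The coefficient c is the de facto 'exchange rate' between
    error and program size."""
    return error + c * size
```

At equal error, the smaller program always wins selection; increasing c trades accuracy for compactness more aggressively.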

The choice of an exchange rate can be avoided using multiobjective methods, such as Pareto-GP [128], where one of the objectives is fitness and the other is program length or complexity. The correct definition of complexity in this context is itself an interesting research topic [129, 96]. Alternatively, the pressure against bloat can be moved from the fitness evaluation phase to the selection phase of the algorithm, using the double tournament method [122]: individuals must compete in one fitness-based tournament and one size-based tournament. Another approach is to incorporate tree size directly into fitness evaluation using a minimum description length principle [130].
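
The two-stage structure of the double tournament can be sketched as follows; note that the published method uses a probabilistic size tournament, whereas this deterministic version only conveys the shape of the idea:

```python
import random

def double_tournament(pop, error, size, k=2):
    """Double tournament (simplified): the finalists are winners of k
    error-based tournaments, and the overall winner is the smallest
    finalist. Both error and size are minimized."""
    def error_tourney():
        return min(random.sample(pop, k), key=error)
    finalists = [error_tourney() for _ in range(k)]
    return min(finalists, key=size)
```

Because size only arbitrates among already-fit finalists, the pressure against bloat never overrides the pressure toward fitness entirely.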

Another technique is called operator length equalization. A histogram of program sizes is maintained throughout the run and is used to set the population’s capacity for programs of different sizes. A newly created program which would cause the population’s capacity to be exceeded is rejected, unless exceptionally fit. A mutation-based variation of the method instead mutates the overly large individuals using directed mutation to become smaller or larger as needed.

Some authors have argued that the choice of GP representation can avoid the issue of bloat [131]. Others aim to sidestep the problem by speeding up fitness evaluation [132, 82] or by avoiding wasted effort in evaluation [133, 134]. Sometimes GP techniques are introduced with other motivations but have the side effect of reducing bloat [135].

In summary, researchers including Luke and Panait [122], Poli et al. [127], Miller [131], and Silva et al. [123] have effectively declared victory in the fight against bloat. However, their techniques have not yet become standard in new GP research and benchmark experiments.

5.2 GP Theory

Theoretical research in GP seeks to answer a variety of questions, for example: What are the drivers of population fitness convergence? How does the behavior of an operator influence the progress of the algorithm? How does the combination of different algorithmic mechanisms steer GP toward fitter solutions? What mechanisms cause bloat to arise? What problems are difficult for GP? How diverse is a GP population? Theoretical methodologies are based in mathematics and exploit formalisms, theorems, and proofs for rigor. While GP may appear simple, beyond the stochastic nature it shares with all other evolutionary algorithms, each of its variety of representations imposes specific requirements for theoretical treatment. All GP representations share two common traits which greatly contribute to the difficulty GP poses for theoretical analysis. First, the representations have no fixed size, implying a complex search space. Second, GP representations do not ensure that parents are equal in size and shape. While crossover accommodates this lack of synchronization, it generally allows the exchange of content from anywhere in one parent's tree to anywhere in the other's. This implies combinatorial outcomes, with likes not switching with likes, and contributes to complicated algorithmic behavior which is challenging to analyze.

Here, we select several influential methods of theoretical analysis and very briefly describe them and their results: schema-based analysis, Markov chain modeling, runtime complexity, and problem difficulty. We also introduce the No Free Lunch theorem and describe its implications for GP.

5.2.1 Schema-Based Analysis

In schema-based analysis, the search space is conceptually partitioned into hyperplanes (also known as schemas) which represent sets of partial solutions. There are numerous ways to do this and, as a consequence, multiple schema definitions have been proposed [136, 137, 138, 139]. Given a population, the fitness of a schema is estimated as the average fitness of all programs in the population's sample of its hyperplane. The processes of fitness-based selection and crossover are formalized in a recurrence equation which describes the expected number of programs sampling a schema from the current population to the next. Exact formulations have been derived for most types of crossover [140, 141]; these depend on making explicit the effects and mechanisms of schema creation. This leads to insight; however, tracking schema equations in actual GP population dynamics is infeasible. Also, while schema theorems predict changes from one generation to the next, they cannot predict further into the future, and so they miss the long-term dynamics that GP practitioners care about.

Fig. 43.11 a–c
figure 11figure 11

HVL-prime mutation: substitution and deletion (a) Original parse tree, (b) Result of substitution (c) Result of deletion

5.2.2 Markov Chain Analysis

Markov chain models are one means of describing such long-term GP dynamics. They take advantage of the Markov property of a GP algorithm: the composition of one generation's population depends only upon that of the previous generation. Markov chains describe the probabilistic movement from a particular population (state) to others using a transition matrix. In evolutionary algorithms, the transition matrix must express the effects of the selection and variation operators. The transition matrix, when multiplied by itself k times, indicates which new populations can be reached in k generations. This, in principle, allows a calculation of the probability that a population containing a solution can be reached. To date, a Markov chain for a simplified GP crossover operator has been derived; see [142]. Another interesting Markov chain-based result has revealed that the distribution of functionality of non-Turing-complete programs approaches a limit as program length increases. Markov chain analysis has also been the means of describing what happens with GP semantics rather than syntax: the influence of subtree crossover is studied in a semantic building-block analysis by [143]. Markov chains, unfortunately, explode combinatorially with even simple extensions of the algorithm dynamics or, in GP's case, with its theoretically infinite search space. Thus, while they can support further analysis, ultimately this complexity is unwieldy to work with.

5.2.3 Runtime Complexity

Due to stochasticity, it is arguably impossible in most cases to make formal guarantees about the number of fitness evaluations needed for a GP algorithm to find an optimal solution. However, initial steps in the runtime complexity analysis of genetic programming have been made in [144]. The authors study the runtime of hill-climbing GP algorithms which use a mutation operator called HVL-Prime (Figs. 43.11 and 43.12). Several of these simplified GP algorithms were analyzed on two separable model problems, Order and Majority, introduced in [145]. Order and Majority each have an independent, additive fitness structure. They each admit multiple solutions under their objective functions, so they exhibit a key property of all real GP problems, and they each capture a different relevant facet of typical GP problems. Order represents problems, such as classification problems, where the operators include conditional functions such as IF-THEN-ELSE. These functions give rise to conditional execution paths, which have implications for evolvability and the effectiveness of crossover. Majority is a GP equivalent of the GA OneMax problem [146]. It reflects a general (and thus weak) property required of GP solutions: a solution must have correct functionality (evolved by aggregating subsolutions) and no incorrect functionality. The analyses highlighted, in particular, the impact of accepting or rejecting neutral moves and the importance of a local mutation operator. A similar finding regarding mutation [147] arose from the analysis of hillclimbing on the Max problem [148]. For a search process bounded by a maximally sized tree of n nodes, the time complexity of the simple GP mutation-based hill-climbing algorithms using HVL-Prime, over the entire range of MAX variants, is O(n log² n) when one mutation operation precedes each fitness evaluation. When multiple mutations are successively applied before each fitness evaluation, the time complexity is O(n²). This can be reduced to O(n log n) if the mutations are biased to replace a random leaf at depth d from the root with probability 2^-d.

Fig. 43.12 a,b
figure 12figure 12

HVL-prime mutation: insertion (a) Original parse tree, (b) Result of insertion

Runtime analyses have also considered parsimony pressure and multiobjective GP algorithms for generalizations of Order and Majority [149].

GP algorithms have also been studied in the PAC (probably approximately correct) learning framework [150].

5.2.4 Problem Difficulty

Problem difficulty is the study of the differences between algorithms and problems which lead to differences in performance. Stated simply, the goal is to understand why some problems are easy and some are hard, and why some algorithms perform well on certain problems while others do not. Problem difficulty work in the field of GP has much in common with similar work in the broader field of EC. Problem difficulty is naturally related to the size of the search space; smaller spaces are easier to search, as are spaces in which the solution is over-represented [151]. Difficulty is also related to the fitness landscape [152], which in turn depends on both the problem and the algorithm and representation chosen to solve it. Landscapes with few local optima (visualized in the fitness landscape as peaks which are not as high as the global optimum) are easier to search. Locality, that is, the property that small changes to a program lead to small changes in fitness, implies a smooth, easily searchable landscape [151, 153].

However, more precise statements concerning problem difficulty are usually desired. One important line of research was carried out by Vanneschi et al. [154, 155, 156]. This involved calculating various measures of the correlation of the fitness landscape, that is, the relationship between distance in the landscape and fitness difference. The measures include the fitness distance correlation and the negative slope coefficient. These measures require the definition of a distance measure on the search space, which in the case of standard GP means a distance between pairs of trees. Various tree distance measures have been proposed and used for this purpose [157, 158, 159, 160]. However, the reliable prediction of performance based purely on landscape analysis remains a distant goal in GP, as it does in the broader field of EC.
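To make the fitness distance correlation concrete, the following toy sketch computes it on a simple bitstring landscape (OneMax) with Hamming distance standing in for a tree distance. This is an illustrative example only, not a reproduction of any of the cited GP studies; on this perfectly smooth landscape the correlation is exactly -1.

```python
# Hypothetical sketch: fitness distance correlation (FDC) on a toy
# bitstring landscape. In GP, dist() would be a tree distance instead.
import itertools
import statistics

def fitness(s):           # OneMax: count of 1-bits (maximized)
    return sum(s)

def dist(s, optimum):     # Hamming distance to the known optimum
    return sum(a != b for a, b in zip(s, optimum))

n = 6
optimum = (1,) * n
points = list(itertools.product((0, 1), repeat=n))
f = [fitness(p) for p in points]
d = [dist(p, optimum) for p in points]

# Pearson correlation between fitness and distance to the optimum:
# strongly negative values suggest an easy (smooth) landscape.
mf, md = statistics.mean(f), statistics.mean(d)
cov = sum((fi - mf) * (di - md) for fi, di in zip(f, d))
fdc = cov / (statistics.pstdev(f) * statistics.pstdev(d) * len(f))
print(round(fdc, 3))  # -1.0: fitness rises exactly as distance falls
```

Real GP landscapes sit between this extreme and an uncorrelated one, which is why the measure is informative but not, on its own, predictive.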

5.2.5 No Free Lunch

In a nutshell, the No Free Lunch (NFL) Theorem [161] proves that, averaged over all problem instances, no algorithm outperforms another. Follow-up NFL analysis [162, 163] yields a similar result for problems where the set of fitness functions is closed under permutation. One question is whether the NFL theorem applies to GP algorithms: for some problem class, is it worth developing a better GP algorithm, or will this effort offer no extra value when all instances of the problem are considered? Research has revealed two conditions under which the NFL breaks down for GP because the set of fitness functions is not closed under permutation. First, GP has a many-to-one mapping from syntax trees to program outputs: many different programs have the same functionality, and program functionality is not uniformly distributed across syntax trees. Second, a geometric argument has shown [164] that many realistic situations exist where a set of GP problems is provably not closed under permutation. The implication of a contradiction to the No Free Lunch theorem is that it is worthwhile investing effort in improving a GP algorithm for a class of problems.
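The closure condition itself is easy to state operationally. In this toy check (the fitness tables are invented for illustration and do not come from the cited analyses), a set of fitness functions over a small search space is closed under permutation only if permuting the assignment of fitness values to points always yields another member of the set:

```python
# Toy illustration of the NFL closure condition: a set F of fitness
# tables over a search space X is closed under permutation iff, for
# every f in F and every permutation p of X, f composed with p is in F.
import itertools

X = (0, 1, 2)
F = {(0, 1, 2), (2, 1, 0)}   # only two of the six possible orderings

def closed_under_permutation(F, X):
    for f in F:
        for p in itertools.permutations(range(len(X))):
            if tuple(f[i] for i in p) not in F:
                return False
    return True

print(closed_under_permutation(F, X))  # False: NFL need not apply
```

When the check fails, as here, the NFL average over the class no longer binds, and a specialized algorithm can legitimately outperform others on it.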

5.3 Modularity

Modularity in GP is the ability of a representation to evolve good building blocks and then encapsulate and reuse them. This can be expected to make complex programs far easier to find, since good building blocks needed in multiple places in the program need not be laboriously re-evolved each time. One of the best-known approaches to modularity is automatically defined functions (ADFs), where the building blocks are implemented as functions which are defined in one part of the evolving program and then invoked from another part [58]. This work was followed by automatically defined macros, which are more powerful than ADFs and allow control of program flow [165]; automatically defined iteration, recursion, and memory stores [10]; modularity in other representations [166]; and demonstrations of the power of reuse [167].
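The ADF idea can be sketched in miniature. This Python fragment is only an analogy (Koza's ADFs are evolved branches of a Lisp program, not hand-written functions): one branch defines a building block, and the result-producing branch calls it in several places instead of containing two independently evolved copies of the same subtree.

```python
# Illustrative analogy of an ADF-bearing individual: the building block
# is defined once (ADF0) and reused by the result-producing branch.
def make_individual():
    # ADF0 branch: an evolved building block, here (x + y) * (x - y)
    def adf0(x, y):
        return (x + y) * (x - y)

    # Result-producing branch: invokes ADF0 twice, so the block need
    # not be re-evolved for each call site.
    def program(a, b, c):
        return adf0(a, b) + adf0(b, c)

    return program

prog = make_individual()
print(prog(3, 2, 1))  # (9 - 4) + (4 - 1) = 8
```

Variation operators then act on the ADF body and the calling branch separately, so an improvement to the block benefits every call site at once.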

5.4 Open-Ended Evolution and GP

Biological evolution is a long-running exploration of the enormously varied and indefinitely sized DNA search space. There is no hint that a limit on new areas of the space to be explored will ever be reached. In contrast, EC algorithms often operate in search spaces which are finite and highly simplified in comparison to biology. Although GP itself can be used for a wide variety of tasks (Sect. 43.1), each specific instance of the GP algorithm is capable of solving only a very narrow problem. Some researchers, however, see biological evolution as pointing the way to a more ambitious vision of the possibilities for GP [168]. In this vision, an evolutionary run would continue for an indefinite length of time, always exploring new areas of an indefinitely sized search space; always responding to changes in the environment; and always reshaping the search space itself. This vision is particularly well suited to GP, as opposed to GAs and similar algorithms, because GP already works in search spaces which are infinite in theory, if not in practice.

To make this type of GP possible, it is necessary to prevent convergence of the population on a narrow area of the search space. Diversity preservation [169], periodic injection of new random material [170], and island-structured population models [88] can help in this regard.
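One common diversity-preservation scheme, fitness sharing, can be sketched as follows. The function names, the triangular sharing kernel, and the toy one-dimensional genotypes are all illustrative assumptions, not taken from any of the cited systems:

```python
# Fitness sharing: each individual's raw fitness is divided by a niche
# count, penalizing crowded regions of the search space and so slowing
# convergence. sigma sets the niche radius (illustrative value).
def shared_fitness(pop, raw_fitness, distance, sigma=1.0):
    adjusted = []
    for a in pop:
        # Niche count: how crowded is the neighborhood of a?
        niche = sum(max(0.0, 1.0 - distance(a, b) / sigma) for b in pop)
        adjusted.append(raw_fitness(a) / niche)
    return adjusted

# Toy genotypes on a number line; the two clustered individuals must
# share credit, while the isolated one keeps its full fitness.
pop = [0.0, 0.0, 5.0]
adjusted = shared_fitness(pop,
                          raw_fitness=lambda x: 1.0,
                          distance=lambda a, b: abs(a - b))
print(adjusted)  # [0.5, 0.5, 1.0]
```

In GP the distance function would be a tree or behavioral distance, but the mechanism is the same: crowding is taxed, so the population spreads out.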

Open-ended evolution would also be facilitated by complexity and nonstationarity in the algorithm’s evolutionary ecosystem. If fitness criteria are dynamic or coevolutionary [171, 172, 173], there may be no natural end-point to evolution, and so continued exploration under different criteria can lead to unlimited new results.

6 Practicalities

6.1 Conferences and Journals

Several conferences provide venues for the publication of new GP research results. The ACM Genetic and Evolutionary Computation Conference (GECCO) alternates annually between North America and the rest of the world and includes a GP track. EuroGP is held annually in Europe as the main event of Evo*, and focuses only on GP. The IEEE Congress on Evolutionary Computation is a larger event with broad coverage of EC in general. Genetic Programming Theory and Practice is held annually in Ann Arbor, MI, USA and provides a focused forum for GP discussion. Parallel Problem Solving from Nature is one of the older, general EC conferences, held biennially in Europe. It alternates with the Evolution Artificielle conference. Finally, Foundations of Genetic Algorithms is a smaller, theory-focused conference.

The journal most specialized to the field is probably Genetic Programming and Evolvable Machines (published by Springer). Its September 2010 ten-year anniversary issue included several review articles on GP. Evolutionary Computation (MIT Press) and the IEEE Transactions on Evolutionary Computation also publish important GP material. Other on-topic journals with a broader focus include Applied Soft Computing and Natural Computing.

6.2 Software

A great variety of GP software is available. We will mention only a few packages – further options can be found online.

One of the well-known Java systems is ECJ [174, 175]. It is a general-purpose system with support for many representations, problems, and methods, both within GP and in the wider field of EC. It has a helpful mailing list. Watchmaker [176] is another general-purpose system with excellent out-of-the-box examples. GEVA [177, 178] is another Java-based package, this time with support only for GE.

For users of C++ there are also several options. Some popular packages include Evolutionary Objects [179], μGP [180, 181, 182], and OpenBeagle [183, 184]. Matlab users may be interested in GPLab [185], which implements standard GP, while DEAP [186] provides implementations of several algorithms in Python. PushGP [187] is available in many languages.

Two more systems are worth mentioning for their deliberate focus on simplicity and understandability. TinyGP [188] and PonyGE [189] implement standard GP and GE respectively, each in a single, readable source file.

Moving on from open source, Michael Schmidt and Hod Lipson’s Eureqa [190] is a free-to-use tool with a focus on symbolic regression of numerical data and the built-in ability to use cloud resources.

Finally, the authors are aware of two commercially available GP tools, each fast and industrial-strength. They have more automation and more "it just works" functionality, relative to most free and open-source tools. Free trials are available. DataModeler (Evolved Analytics LLC) [191] is a notebook in Mathematica. It employs the ParetoGP method [128], which gives the ability to trade program fitness off against complexity, and to form ensembles of programs. It also exploits complex population archiving and archive-based selection. It offers means of dealing with ill-conditioned data and extracting information on variable importance from evolved models. Discipulus (Register Machine Learning Technologies, Inc.) [192] evolves machine code based on the ideas of Nordin et al. [193]. It runs on Windows only. The machine code representation allows very fast fitness evaluation and low memory usage, and hence large populations. In addition to typical GP features, it can: use an ES to optimise numerical constants; automatically construct ensembles; preprocess data; extract variable importance after runs; automatically simplify results; and save them to high-level languages.

6.3 Resources and Further Reading

Another useful resource for GP research is the GP Bibliography [194]. In addition to its huge, regularly updated collection of BibTeX-formatted citations, it has lists of researchers' homepages [195] and co-authorship graphs. The GP mailing list [196] is one well-known forum for discussion.

Many of the traditional GP benchmark problems have been criticized for being unrealistic in various ways. The lack of standardization of benchmark problems also allows the possibility of cherry-picking of benchmarks. Effort is underway to bring some standardization to the choice of GP benchmarks [197, 198].

Those wishing to read further have many good options. The Field Guide to GP is a good introduction, walking the reader through simple examples, scanning large amounts of the literature, and offering practical advice [199]. Luke's Essentials of Metaheuristics [200] also has an introductory style, but is broader in scope. Both are free to download. Other broad and introductory books include those by Fogel [51] and Banzhaf et al. [201]. More specialized books include those by Langdon and Poli [202] (coverage of theoretical topics), Langdon [11] (narrower coverage of GP with data structures), O'Neill and Ryan [77] (GE), Iba et al. [203] (GP-style ML), and Sipper [204] (games). Advances in Genetic Programming, a series of four volumes, contains important foundational work from the 1990s.