NeuroEvolution: Evolving Heterogeneous Artificial Neural Networks

Turner, Andrew James; Miller, Julian Francis

doi:10.1007/s12065-014-0115-5


Special Issue
Published: 08 November 2014

Volume 7, pages 135–154, (2014)

Evolutionary Intelligence


Abstract

NeuroEvolution is the application of Evolutionary Algorithms to the training of Artificial Neural Networks. Currently the vast majority of NeuroEvolutionary methods create homogeneous networks of user defined transfer functions. This is despite NeuroEvolution being capable of creating heterogeneous networks where each neuron’s transfer function is not chosen by the user, but selected or optimised during evolution. This paper demonstrates how NeuroEvolution can be used to select or optimise each neuron’s transfer function and empirically shows that doing so significantly aids training. This result is important as the majority of NeuroEvolutionary methods are capable of creating heterogeneous networks using the methods described.


1 Introduction

NeuroEvolution (NE) is the application of Evolutionary Algorithms (EA) to the training of Artificial Neural Networks (ANN) [11, 48]. NE’s history began by evolving the connection weights of fixed topology ANNs [30, 46]. This method brought many advantages over the still popular gradient based methods, such as simple back propagation [31]. These advantages include: being able to natively escape local optima, being less sensitive to the initial connection weights, being suited to deep ANNs and not requiring that each neuron’s Transfer Function (TF) be differentiable [49]. NE is also suited to reinforcement learning as well as supervised learning, whereas back propagation is only suited to supervised learning. Other ANN training methods such as restricted Boltzmann machines are also suited to unsupervised learning [34].

A significant advantage of NE is its ability to evolve the topology of ANNs as well as the connection weights. Topology evolving NE methods include: GNARL [1], NEAT [35], SAGA [7] and CGPANN [14, 39]. This ability to automatically create suitable topologies is significant as topology has been shown to strongly influence the effectiveness of back propagation [16] and weight only evolving NE [40]. Evolving the topology of ANNs has even been shown to be more important to training than evolving connection weights [40]. Although some non-evolutionary ANN training methods do adapt topology, they typically achieve this by iteratively adding or removing neurons during training. This approach is akin to a local search of topologies, and is consequently likely to become trapped in locally sub-optimal topologies [1]. It has been shown that using simple back propagation with hand crafted topologies produces results as good as NE [4]. This result demonstrates the benefit of topology optimising NE: the topology is also optimised and does not have to be hand crafted by trial and error. Finally, gradient descent methods struggle to train deep ANNs [12, 16], whereas the depth of the network has no impact on NE algorithms. This, coupled with the fact that deep neural networks are thought to be more efficient in terms of the number of neurons required to solve a task [3], is another benefit of topology optimising NE.

Interestingly, NE can also be used to optimise the TF of each neuron within heterogeneous ANNs^{Footnote 1}. However, this capability of NE has been widely overlooked in recent research. Indeed, at the turn of the twenty-first century many ANN publications stated that more research was required concerning the optimisation of TFs: “Relatively little has been done on the evolution of node transfer functions, let alone the simultaneous evolution of both topological structure and node transfer functions” [49], “The current emphasis in neural network research is on learning algorithms and architectures, neglecting the importance of transfer functions” [8] and “Selection and/or optimisation of transfer functions performed by artificial neurons have been so far little explored ways to improve performance of neural networks in complex problems” [9]. However, a search of the literature reveals that there has been little active research in this area. This paper intends to help fill this gap by showing how NE can easily optimise neuron TFs during evolution and that doing so produces strongly beneficial results.

The remainder of this paper is structured as follows. Section 2 discusses research literature relating to the application of NE to the evolution of the TFs of ANNs. Section 3 describes the investigations which were undertaken using NE to evolve heterogeneous ANNs, leading to the results given in Sect. 4. Finally, Sect. 5 discusses the overall findings, with closing conclusions given in Sect. 6.

2 Background

There are many ANN TFs found in the literature [9]. However, the majority of NE implementations only evolve homogeneous ANNs of logistic or Gaussian functions, which have both been shown capable of universal approximation ([13] and [26] respectively). Of those which do evolve heterogeneous ANNs, there are two main methods.

The first method selects the TF of each neuron from a predetermined list of TFs. Training methods which use this approach include General Neural Networks (GNN) [17], which randomly adds or removes logistic or Gaussian TFs using an evolutionary programming method. GNN is also a hybrid approach which makes use of back propagation during training. Other NE methods which select specific TFs for each neuron include Parallel Distributed Genetic Programming (PDGP) [28], a modified Hierarchical Co-evolutionary Genetic Algorithm ($\hbox {HCGA}_2$) [45] and Cartesian Genetic Programming of Artificial Neural Networks (CGPANN) [14, 39]. These methods use genes to encode which TF is used by each neuron. These genes are then subject to mutation and/or crossover during evolution.

The second method by which NE can optimise neuron TFs is to use TFs which are described by a number of parameters [9]. The training methods then optimise these parameters for each individual neuron. A simple version of this technique has been used by CGPANN [19], where the widths of Gaussian functions were optimised for each neuron. Again the parameter(s) associated with each neuron’s TF were encoded in the chromosome by the inclusion of additional gene(s). A more complex version of this method was used in [2], where each neuron’s TF was itself an evolved Genetic Program. This method allowed for an almost limitless variation of TFs. Another example where each neuron is described by a number of genes is state-enhanced neural networks [25], where the dynamics of each neuron are evolved. These state-enhanced neural networks exhibit memory which can be utilised on partially observable Markov decision tasks.

Until now, however, there has been little research which empirically and rigorously investigates whether the ability of NE to evolve heterogeneous ANNs actually provides any benefit. This is important research because, if it is shown to be beneficial, it could easily be adopted by other NE methods, as the described methods just require additional genes for each neuron. As discussed, there are two ways in which NE can evolve TFs: (1) by choosing the TF of each neuron from a predetermined list or (2) by optimising parameters associated with each individual neuron. Additionally, these two methods can be combined by allowing evolution to both select the TF for each neuron and optimise the parameters associated with the TF. Here both of these methods are investigated along with a combination of the two. The investigation uses two NE strategies and compares the results to evolving regular homogeneous ANNs.

3 Investigation

The investigation presented on evolving heterogeneous ANNs using NE has four parts. The first identifies whether the choice of TF impacts on the effectiveness of NE for evolving homogeneous ANNs. The second investigates whether evolving heterogeneous ANNs, by allowing evolution to select each neuron’s TF from a predetermined list, outperforms evolving homogeneous ANNs. The third investigates whether using NE to optimise parameters associated with each neuron’s TF outperforms evolving homogeneous ANNs. The fourth investigates using NE to both select each neuron’s TF from a predetermined list and optimise parameters associated with that TF.

The remainder of this section introduces the NE methods employed by the investigation, the TFs made available and the benchmarks used.

3.1 NeuroEvolutionary Strategies

In order to undertake the described experiments, two NE methods were used; this is to ensure that any conclusions are not specific to a particular type of NE. The chosen NE methods are Conventional NeuroEvolution (CNE) and Cartesian Genetic Programming of Artificial Neural Networks (CGPANN). CNE is the simplest (and oldest) form of NE and evolves the connection weights of fixed topology ANNs. CGPANN is a more complex NE method which evolves both the connection weights and topology of ANNs. These two NE methods represent the two main types of NE; those which evolve only connection weights and those which evolve connection weights and topology^{Footnote 2}.

3.1.1 Conventional NeuroEvolution

Conventional NeuroEvolution [30] operates by storing the connection weights of a fixed topology ANN as an array of floating point numbers. Each of these arrays represents a chromosome. Mutation is implemented by selecting a new random weight value for each gene (weight) with a given probability. CNE is extended here to be capable of evolving each neuron’s TF by the inclusion of an additional gene per neuron. These additional TF genes can either be used as an index in a look-up-table of TFs, or as a parameter value to be used by each neuron’s TF. As CNE uses fixed topologies, this topology must be selected in advance by the user.
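The extended CNE chromosome described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `TF_TABLE`, `random_chromosome` and `mutate`, the dictionary layout and the weight range of $[-1, 1]$ are all assumptions made for the example. It shows the look-up-table variant, in which each neuron carries one extra integer gene.

```python
import random

# Hypothetical look-up table of TF names; a TF gene stores an index into it.
TF_TABLE = ["step", "gaussian", "logistic"]

def random_chromosome(num_weights, num_neurons, rng, weight_range=1.0):
    """Build a chromosome: one float gene per weight, one TF index gene per neuron.

    weight_range is an assumed bound; the text does not state one.
    """
    weights = [rng.uniform(-weight_range, weight_range) for _ in range(num_weights)]
    tf_genes = [rng.randrange(len(TF_TABLE)) for _ in range(num_neurons)]
    return {"weights": weights, "tf_genes": tf_genes}

def mutate(chrom, rate, rng, weight_range=1.0):
    """Return a mutated copy; each gene is replaced by a new random value with probability `rate`."""
    weights = [rng.uniform(-weight_range, weight_range) if rng.random() < rate else w
               for w in chrom["weights"]]
    tf_genes = [rng.randrange(len(TF_TABLE)) if rng.random() < rate else g
                for g in chrom["tf_genes"]]
    return {"weights": weights, "tf_genes": tf_genes}

rng = random.Random(0)
parent = random_chromosome(num_weights=6, num_neurons=3, rng=rng)
child = mutate(parent, rate=0.1, rng=rng)
```

For the parameter variant, the integer TF gene would instead be a floating point gene (for example a $\sigma$ value) mutated in the same way as a weight.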

3.1.2 Cartesian Genetic Programming of Artificial Neural Networks

CGPANN [14, 39] is the application of Cartesian Genetic Programming (CGP) to the evolution of ANNs. CGP [22, 24] is a form of Genetic Programming (GP) which represents computational structures as directed graphs of nodes indexed by their Cartesian coordinates. CGP does not suffer from bloat [21, 41]. Bloat refers to the condition where, over evolutionary time, the size of evolved computational structures grows without limit. Many other forms of GP do suffer from bloat and there has been intensive research to address the issue [33]; interestingly, other graph based encoding schemes have also been shown to suffer less from bloat than standard GP [32]. CGP chromosomes also contain non-functioning genes enabling neutral genetic drift during evolution [44, 50]. CGP typically evolves acyclic networks but can also be easily adapted to evolve cyclic or recurrent networks [42]. CGP typically uses point or probabilistic mutation and no crossover.^{Footnote 3}

Other forms of graph based GP have also been proposed, including Parallel Algorithm Discovery and Orchestration (PADO) [36] and Parallel Distributed Genetic Programming (PDGP) [27]. PDGP has also been applied to NE [27]. PDGP is graph based and similar in structure to CGP, but uses different mutation and crossover operations. CGP also places fewer constraints on the structure of the generated graphs [22]. Research relating to PDGP appears to have been largely abandoned.

Each CGP chromosome is comprised of a number of gene types: function genes ($F_i$), connection genes ($C_{i,j}$) and output genes ($O_i$). The function genes represent indexes in a function look-up-table and describe the functionality of each node. The connection genes define from where each node gathers its inputs. For regular acyclic CGP, connection genes may connect a given node to any previous node in the program, or any of the program inputs. The output genes address any program input or internal node and define which nodes are used as program outputs.

Originally CGP programs were organized with nodes arranged in rows (nodes per layer) and columns (layers), with each node indexed by its row and column. However, this is an unnecessary constraint, as any configuration possible using a given number of rows and columns is also possible using one row with many columns, provided the total number of nodes remains constant. This is due to CGP being capable of evolving where each node's inputs connect. Consequently, here the chromosomes are defined with one row and $n$ columns, with each node only indexed by its column. A generic (one row) CGP chromosome is given in Eq. 1, where $\alpha$ is the arity of each node, $n$ is the number of nodes and $m$ is the number of program outputs.

An example CGP program is given in Fig. 1 along with its corresponding chromosome. As can be seen, all nodes are connected to previous nodes or program inputs. Not all program inputs have to be used, enabling evolution to decide which inputs are significant. An advantage of CGP over tree-based GP, again seen in Fig. 1, is that node outputs can be reused multiple times, rather than requiring the same value to be recalculated if it is needed again. Finally, not all nodes contribute to the final program output, these represent the inactive nodes which enable neutral genetic drift and make variable length phenotypes possible.

$$\begin{aligned} F_0 C_{0,0} \ldots C_{0,\alpha } F_1 C_{1,0} \ldots C_{1,\alpha } \ldots \ldots F_{n}C_{n,0} \ldots C_{n,\alpha } O_0 \ldots O_m \end{aligned}$$

(1)
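A minimal Python sketch of how a one-row CGP chromosome of this form might be decoded is given below. The addressing convention (addresses `0..num_inputs-1` are program inputs, address `num_inputs + i` is node `i`), the tuple layout and the names `active_nodes` and `evaluate` are assumptions made for illustration, not the authors' code. The sketch shows how inactive nodes are skipped and how a node output can be reused.

```python
import operator

def active_nodes(num_inputs, nodes, outputs):
    """Return the sorted indices of nodes that contribute to any program output."""
    active = set()
    stack = [addr for addr in outputs if addr >= num_inputs]
    while stack:
        idx = stack.pop() - num_inputs
        if idx in active:
            continue
        active.add(idx)
        _, conns = nodes[idx]
        # Only follow connections that point at other nodes, not program inputs.
        stack.extend(c for c in conns if c >= num_inputs)
    return sorted(active)

def evaluate(num_inputs, nodes, outputs, functions, inputs):
    """Evaluate an acyclic CGP program, computing only the active nodes."""
    values = dict(enumerate(inputs))
    # Ascending index order is safe because nodes only connect to earlier addresses.
    for idx in active_nodes(num_inputs, nodes, outputs):
        f_gene, conns = nodes[idx]
        values[num_inputs + idx] = functions[f_gene](*(values[c] for c in conns))
    return [values[o] for o in outputs]

FUNCTIONS = [operator.add, operator.mul]   # illustrative function look-up-table
# Each node is (function gene, connection genes); two program inputs, arity two.
NODES = [(0, [0, 1]),   # node 0 (address 2): x0 + x1
         (1, [0, 2]),   # node 1 (address 3): x0 * node 0
         (0, [1, 1])]   # node 2 (address 4): x1 + x1, inactive
OUTPUTS = [3]

print(active_nodes(2, NODES, OUTPUTS))                 # [0, 1]
print(evaluate(2, NODES, OUTPUTS, FUNCTIONS, [2, 3]))  # [10]
```

Node 2 is never evaluated because no output depends on it; a mutation to its genes would leave the program's behaviour unchanged, which is the neutral drift discussed above.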

Cartesian Genetic Programming is easily applied to ANNs [14, 39] by the inclusion of connection weight genes ($W_{i,j}$) for each node input and by using TFs suited to ANNs. CGPANN exhibits all of the benefits of CGP and is a NE training method which can evolve the weights, topology [40] and TFs of ANNs. Although CGP evolves topology, it is required that the user specifies a maximum network size. This could be considered a drawback, but overestimating the required number of nodes has been shown to be highly beneficial for CGP [23]. Similarly, a maximum neuron arity must be specified, however, the arity of each neuron can be lower than this maximum [39]. This occurs when the chromosome describes two neurons being connected by two or more connections. In this case, multiple connections between two neurons are equivalent to one connection; with the connection weight value being the sum of the individual weights.
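The equivalence between repeated connections and a single connection with summed weight can be checked with a short Python sketch. The helper names `merge_connections` and `neuron_output`, and the choice of a logistic TF, are assumptions made for this illustration only.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def merge_connections(conns, weights):
    """Combine connections that share a source address by summing their weights."""
    merged = {}
    for c, w in zip(conns, weights):
        merged[c] = merged.get(c, 0.0) + w
    return merged

def neuron_output(conns, weights, values, tf=logistic):
    """Weighted sum of the addressed values passed through the transfer function."""
    return tf(sum(w * values[c] for c, w in zip(conns, weights)))

values = [2.0, 1.0]              # outputs of two earlier neurons (or inputs)
conns = [0, 0, 1]                # maximum arity 3, but address 0 is used twice
weights = [0.25, 0.5, -1.0]

merged = merge_connections(conns, weights)
print(len(merged))               # effective arity: 2
raw = neuron_output(conns, weights, values)
reduced = neuron_output(list(merged), list(merged.values()), values)
print(math.isclose(raw, reduced))  # True
```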

It is important to note that the types of ANN created using CGPANN are unconventional and often cannot be described in terms of layers and nodes per layer. Figure 2 gives an example of the type of ANN which can be created using CGPANN. It can be seen that each neuron’s input is highly unconstrained; they can connect to any previous neuron in the network including input neurons. It can also be seen that the arity of each neuron can vary. Additionally any neuron can be used as an output; including the input neurons. Figure 2 demonstrates that by allowing NE to optimise topology, evolution is capable of discovering topologies which would be unlikely to be considered by a human designer.

3.2 Transfer functions

The TFs used in this investigation are the Heaviside step function, Eq. 2, the Gaussian function, Eq. 3, and the logistic sigmoid function^{Footnote 4}, Eq. 4. Each of these TFs is shown graphically in Fig. 3. These particular TFs were selected as they are the most commonly used by ANNs.

As can be seen in Eqs. 3 and 4, the Gaussian and logistic functions have been given in a form which contains a $\sigma$ variable, where $\sigma = 1$ gives the typical form of these TFs. When using NE to evolve parameters associated with each neuron’s TF, the $\sigma$ value can be evolved or optimised. Figures 4 and 5 show the Gaussian and logistic functions respectively for a range of $\sigma$ values.

$$\begin{aligned}&f(x)= {\left\{ \begin{array}{ll} 1,&{} \text {if }\,x\ge 0\\ 0,&{} \text {otherwise} \end{array}\right. } \end{aligned}$$

(2)

$$\begin{aligned}&f(x)= \exp \left( -\frac{x^2}{2 \sigma ^2} \right) \end{aligned}$$

(3)

$$\begin{aligned}&f(x)= \frac{1}{1 + \exp (-\sigma x)} \end{aligned}$$

(4)
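Equations 2 to 4 translate directly into code. The Python sketch below is illustrative; the function names, the `TF_TABLE` list and the `apply_tf` helper are assumptions rather than part of the original work. The logistic function is written in a numerically stable form that is algebraically identical to Eq. 4.

```python
import math

def step(x):
    """Heaviside step function, Eq. 2."""
    return 1.0 if x >= 0 else 0.0

def gaussian(x, sigma=1.0):
    """Gaussian function, Eq. 3."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

def logistic(x, sigma=1.0):
    """Logistic sigmoid, Eq. 4, rearranged to avoid overflow for large |sigma * x|."""
    z = sigma * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical look-up table: a TF gene selects an entry, a sigma gene tunes it.
TF_TABLE = [step, gaussian, logistic]

def apply_tf(tf_gene, sigma, x):
    """Apply the TF chosen by tf_gene; the step function has no sigma parameter."""
    tf = TF_TABLE[tf_gene]
    return tf(x) if tf is step else tf(x, sigma)
```

With `sigma` fixed at 1 and `tf_gene` evolved, this corresponds to selecting TFs from a list; with `tf_gene` fixed and `sigma` evolved, it corresponds to optimising a TF parameter per neuron.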

3.3 Benchmarks

In order to draw strong conclusions regarding whether it is beneficial to evolve TFs, it is necessary to examine its effectiveness on a wide range of benchmarks. In this paper five benchmarks are employed. The chosen benchmarks mainly include supervised learning classification tasks, a common application of ANNs, but also include a reinforcement learning control task.

Despite many of the described benchmarks being classification tasks, they each use their own type of fitness function. Although this adds complexity, the fitness functions used here are those typically used with these benchmarks. This is done to standardise the use of these benchmarks, which is important when comparing machine learning methods.

3.3.1 Ball Throwing

The ball throwing benchmark [15] is a reinforcement learning control task. The task is to design a controller for a driven arm so as to throw a ball a distance of $\ge 9.5$ m. A depiction of the task is given in Fig. 6, with the equations describing the dynamics of the arm given in Eqs. 5 and 6; symbol definitions are given in Table 1. The model is simulated using Euler integration with a time step of 0.01 s for 3,000 time steps. The control system has two inputs, $\theta$ and $\omega$, and two outputs: the torque $T$ and whether or not to release the ball. The inputs to the controller are linearly scaled from $\pm \pi /2$ rad and $\pm 5$ rad/s to a [0,1] range for $\theta$ and $\omega$ respectively. The first output of the controller sets the torque applied to the arm and is linearly mapped to a $[-5,5]$ N m range. The ball is released if the second output exceeds a threshold of 0.5. Once the ball is released, Newtonian mechanics are used to calculate the distance the ball is thrown ($d$), which is then used as the fitness value.

$$\begin{aligned}&\left( \dot{\theta },\dot{\omega }\right) = \left( \omega , -c\cdot \omega + \frac{g\cdot \sin (\theta )}{l} + \frac{T}{m\cdot l^2} \right) \end{aligned}$$

(5)

$$\begin{aligned}&\omega = 0 \quad \text{ if } \vert \theta \vert \ge \pi /2 \end{aligned}$$

(6)
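The arm dynamics of Eqs. 5 and 6 can be simulated with explicit Euler integration as in the following Python sketch. Several details are assumptions and should be checked against Table 1 and the benchmark's original description [15]: the default values of `c`, `g`, `l` and `m` are placeholders, the initial state is taken to be $\theta = \omega = 0$, the controller outputs are assumed to lie in [0,1], and $\theta$ is not clamped when Eq. 6 applies. All function names are hypothetical, and the projectile calculation of the thrown distance $d$ is omitted.

```python
import math

DT = 0.01          # time step in seconds, from the text
MAX_STEPS = 3000   # number of simulated time steps, from the text

def euler_step(theta, omega, torque, c=0.3, g=9.81, l=2.0, m=0.1, dt=DT):
    """One explicit Euler step of Eq. 5, followed by the limit rule of Eq. 6.

    The default c, g, l, m are placeholder assumptions, not values from Table 1.
    """
    theta_dot = omega
    omega_dot = -c * omega + g * math.sin(theta) / l + torque / (m * l * l)
    theta += dt * theta_dot
    omega += dt * omega_dot
    if abs(theta) >= math.pi / 2:   # Eq. 6: the arm stops at its limits
        omega = 0.0
    return theta, omega

def scale_inputs(theta, omega):
    """Map theta in [-pi/2, pi/2] and omega in [-5, 5] linearly to [0, 1]."""
    return (theta + math.pi / 2) / math.pi, (omega + 5.0) / 10.0

def decode_outputs(out_torque, out_release):
    """Map the first output (assumed in [0, 1]) to [-5, 5]; threshold the second at 0.5."""
    return -5.0 + 10.0 * out_torque, out_release > 0.5

def rollout(controller, theta=0.0, omega=0.0):
    """Run the controller until it releases the ball or the step budget is spent."""
    for step in range(MAX_STEPS):
        torque, release = decode_outputs(*controller(*scale_inputs(theta, omega)))
        if release:
            return theta, omega, step
        theta, omega = euler_step(theta, omega, torque)
    return theta, omega, MAX_STEPS
```

Here `controller` is any callable taking the two scaled inputs and returning two outputs, for example an evolved ANN; the state returned at release would then be passed to the distance calculation to obtain the fitness.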

Table 1 Ball throwing symbol definitions with commonly used values
