1 Introduction

Reinforcement Learning (RL) is an important subdiscipline of Artificial Intelligence and a very active field in research [1] as well as in practical applications. Artificial neural networks, or neural networks for short, are often applied in the context of Reinforcement Learning, where they act as controllers for agents or robots. Here, Neuroevolution plays an important role, since it allows learning both the structure and the weights of neural networks, see e.g. [2]. In many cases, however, the intended behaviour does not depend on a single criterion only, e.g. balancing two poles on a cart, but on multiple criteria, e.g. balancing two poles on a cart while using as little energy as possible. Most real-world problems are based on multiple (possibly) conflicting objectives, so an algorithm must be able to evolve neural networks that consider all objectives in order to be applicable in practice. Today’s quasi-standard in Neuroevolution, Stanley’s NeuroEvolution of Augmenting Topologies (NEAT), evolves neural networks that are adapted to only a single criterion of behaviour [2], which makes it inapplicable to multi-objective tasks without further knowledge and abstraction of the problem.

Research on this topic is sparse: only a few papers could be identified. Schrum and Miikkulainen [3] were the first to consider a combination of neural networks and evolutionary multi-objective algorithms: they used NEAT and NSGA-II for multi-objective Neuroevolution. Their algorithm combines a reduced-feature version of NEAT with a modified version of NSGA-II [3]. In addition, Schrum and Miikkulainen [5] introduced the Modular Multiobjective NEAT (MM-NEAT), which evolves modular neural networks (each module describes a behaviour) using NSGA-II and NEAT; MM-NEAT uses all features of NEAT’s framework in combination with the procedure of NSGA-II [5]. Van Willigen et al. [6] introduced NEAT-PS, a combination of NEAT and SPEA2. The Pareto Strength approach used in NEAT-PS computes a single fitness value over all fitness functions for each individual, which makes it easy to apply within the environment of the original NEAT [6].

We were unable to find approaches that consider the more recent evolutionary multi-objective algorithms, which make use of quality indicators such as the Hypervolume and thereby allow a good approximation of the Pareto front combined with a sufficient spread of the solutions, even for many objectives.

In the first part of this paper, Sect. 2, we provide the foundations of multi-objective optimization and evolutionary algorithms and give a brief overview of the NEAT algorithm. In the second part, we introduce a novel extension of NEAT, called mNEAT, which is potentially able to evolve neural networks for high-dimensional multi-objective tasks (Sect. 3). Furthermore, we present and investigate novel combinations of NEAT and the SMS-EMOA/R2-EMOA for the first time (Sect. 4). In Sect. 5 we define a multi-objective version of the Double Pole Balancing problem and provide an experimental analysis of the new algorithms. Finally, we give a summary and an outlook on future work in Sect. 6.

2 Foundations

This paper considers multi-objective optimization problems (MOPs) consisting of K objectives \(f_k:\mathbb {R}^N \rightarrow \mathbb {R}\) which have to be optimized simultaneously although they may be conflicting [7, p. 8]. The term optimization may refer to the maximization or minimization of an objective function; in this paper, we focus on minimization problems. MOPs are hard to solve, because in many cases the K different objectives are conflicting. Here, evolutionary algorithms (EAs) are often applied. They were developed to address tasks for which, for example, no efficient algorithm is known or can be developed due to time constraints. Thanks to the evolutionary approach, which mimics natural evolution, EAs are broadly applicable and relatively easy to adapt to a specific problem. The field of evolutionary algorithms is also referred to as Evolutionary Computing [8, p. 14].

2.1 Quality Indicators

To determine the quality of a solution in a MOP, multiple dimensions need to be considered. Therefore, we introduce the Pareto Dominance:

Definition 1

(Pareto Dominance). [7, p. 11] A decision vector \(u = (u_1, \ldots , u_n)\) dominates another decision vector \(v = (v_1, \ldots , v_n)\), read as \(u \prec v\), iff \(\forall k \in \left\{ 1, \ldots , K\right\} : f_k(u) \le f_k(v) \wedge \exists k \in \left\{ 1, \ldots , K\right\} : f_k(u) < f_k(v)\).
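As an illustration, the dominance test of Definition 1 can be sketched in a few lines of Python (a minimal sketch for the minimization case, operating on precomputed objective vectors):

```python
from typing import Sequence

def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True iff u Pareto-dominates v (Definition 1, minimization):
    u is no worse in every objective and strictly better in at least one."""
    return (all(fu <= fv for fu, fv in zip(u, v))
            and any(fu < fv for fu, fv in zip(u, v)))
```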

The goal in MOPs is to identify the set of non-dominated solutions. The concept of Pareto Dominance represents a partial order. Therefore, decision vectors may exist that are not comparable. To make the solutions of a set A comparable in this case, quality indicators are necessary. They are defined as follows:

Definition 2

(Quality Indicator). [7, p. 251] Let A be a vector of H sets \(A_1, \ldots , A_H\). An H-ary quality indicator is a function \(\mathcal {I}: \varOmega ^H \rightarrow \mathbb {R}\), which assigns the vector \(A = \left( A_1, \ldots , A_H\right) \) a real value \(\mathcal {I}\left( A_1, \ldots , A_H\right) \).

We say a set A has a higher quality than another set B if A at least weakly dominates B, i.e., every \(b \in B\) is weakly dominated by some \(a \in A\) (\(\forall k: f_k(a) \le f_k(b)\)). If a quality indicator \(\mathcal {I}\) assigns a value \(\mathcal {I}(A) \ge \mathcal {I}(B)\) whenever \(A \preceq B\) (or \(\mathcal {I}(A) \le \mathcal {I}(B)\) for indicators that are to be minimized), \(\mathcal {I}\) is called Pareto Compliant [7, p. 253].

Fig. 1. Example for the quality indicators Nondominated Ranking (left), R2-Indicator (center) and Hypervolume (right): a two-dimensional MOP and a set A consisting of ten individuals \(i_1\) to \(i_{10}\). The R2-Indicator and the Hypervolume focus only on \(i_1\) and \(i_2\) in this example. The reference point is denoted by \(z^*\).

In the following, we provide a brief overview of three important quality indicators in the context of this paper. Figure 1 gives an example for a set \(A = \left\{ i_1, \ldots , i_{10}\right\} \) for each of the following quality indicators:

 

Nondominated Ranking.

The Nondominated Ranking [4] orders the solutions of a set A using the Pareto Dominance: all solutions are assigned to ranks \(R_1\) to \(R_H\) such that, for \(1 \le i < j \le H\), \(\forall b \in R_j\,\exists a \in R_i:\, a \prec b\). Solutions of equal rank are incomparable with respect to Pareto Dominance. In Fig. 1 the individuals \(i_1\) and \(i_2\), which dominate all remaining individuals, are assigned to \(R_1\). The individuals \(i_3\) to \(i_6\), which are each dominated by \(i_1\) and \(i_2\) but dominate the individuals \(i_7\) to \(i_{10}\), are assigned to \(R_2\), and so on.
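The ranking itself can be sketched as follows (a naive Python version reusing `dominates` from above; the fast nondominated sorting of NSGA-II [4] computes the same partition more efficiently in \(O(K \mu ^2)\)):

```python
def nondominated_ranking(points):
    """Partition objective vectors into ranks R_1, R_2, ... by repeatedly
    extracting the currently nondominated solutions (minimization)."""
    remaining = list(points)
    ranks = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        ranks.append(front)
        remaining = [p for p in remaining if p not in front]
    return ranks
```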

R2-Indicator.

The R2-Indicator is based on the standard weighted Tchebycheff utility function, where z is a solution, \(z^*\) an (ideal) reference point and \(\lambda = \left( \lambda _1, \ldots , \lambda _K\right) \in \varLambda \) is a weight vector [9]:

$$\begin{aligned} u\left( z\right) = -\max _{i \in \left\{ 1, \ldots ,K\right\} }\left\{ \lambda _i \left| z^{*}_i - z_i\right| \right\} \end{aligned}$$
(1)

\(\varLambda \) is a set of (usually) uniformly distributed weight vectors over the weight space [10]. The R2-Indicator averages, over all weight vectors \(\lambda \in \varLambda \), the minimal weighted Tchebycheff distance between the reference point and any solution \(a \in A\):

$$\begin{aligned} R2(A) = R2(A, \varLambda , z^*) = \dfrac{1}{\left| \varLambda \right| } \sum \limits _{\lambda \in \varLambda } \min \limits _{a \in A}{\left\{ \max \limits _{i \in \left\{ 1, \ldots , K\right\} }{\left\{ \lambda _i \left| z^*_i - a_i\right| \right\} }\right\} } \end{aligned}$$
(2)

A lower R2 value means that the set’s solutions are located closer to the reference point. The contribution of a single solution \(a \in A\) to the R2 value is computed as \(R2(A \setminus \left\{ a\right\} ) - R2(A)\) [10]. In Fig. 1 there are three weight vectors \(w_1 = \left( \frac{3}{8}, \frac{5}{8}\right) \), \(w_2 = \left( \frac{1}{2}, \frac{1}{2}\right) \) and \(w_3 = \left( \frac{5}{8}, \frac{3}{8}\right) \), each assigning a certain importance to each dimension. For each weight vector, the contribution of the individual that is nearest to the reference point \(z^*\) with respect to the vector’s weights is increased. Here, the following individuals are selected: \(i_1\) for \(w_2\), and \(i_2\) for \(w_1\) and \(w_3\). Considering only \(i_1\) and \(i_2\), their total R2 contributions are 0.062 and 0.007, respectively. Thus \(i_1\) is evaluated as better than \(i_2\) with respect to the R2-Indicator.
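A direct Python sketch of Eq. (2) and of the contribution computation might look as follows (assuming minimization and an ideal reference point \(z^*\)):

```python
import numpy as np

def r2_indicator(A, Lambda, z_star):
    """R2(A, Lambda, z*) as in Eq. (2): average over all weight vectors of
    the minimal weighted Tchebycheff distance between z* and any a in A."""
    A, z_star = np.asarray(A, float), np.asarray(z_star, float)
    return np.mean([min(np.max(lam * np.abs(z_star - a)) for a in A)
                    for lam in np.asarray(Lambda, float)])

def r2_contribution(A, idx, Lambda, z_star):
    """Contribution of A[idx] to the R2 value: R2(A \\ {a}) - R2(A)."""
    reduced = [a for i, a in enumerate(A) if i != idx]
    return r2_indicator(reduced, Lambda, z_star) - r2_indicator(A, Lambda, z_star)
```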

Hypervolume.

The Hypervolume (HV) measures the volume of the objective space that is covered by the solutions of a set A with respect to a reference point \(z^*\). The Hypervolume is defined as follows:

$$\begin{aligned} HV(A) = HV(A, z^*) = vol\left( \bigcup _{a \in A} \left[ a, z^*\right] \right) \end{aligned}$$
(3)

where \(\left[ a, z^*\right] \) denotes the hyperrectangle bounded by a and \(z^*\), and vol its volume [7, p. 260]. The larger the value of HV(A), the more space is covered by A and hence the closer the solutions in A are to the ideal point (provided that \(z^*\) is the nadir point, e.g. \(z^*=\left( 1, \ldots , 1\right) \) for normalized minimization problems) [11]. The contribution of a single solution \(a \in A\) to the HV value (the space covered exclusively by a) is computed as \(HV(A) - HV(A \setminus \left\{ a\right\} )\) [12]. In Fig. 1 the area dominated exclusively by \(i_1\) or \(i_2\) is shaded in light gray; the dark gray area is dominated by both individuals. In this example the area exclusively dominated by \(i_1\) is clearly larger than that of \(i_2\), therefore \(i_1\) is to be preferred over \(i_2\) with respect to the dominated Hypervolume.
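For the two-dimensional case of Fig. 1, the Hypervolume and the exclusive contribution can be computed with a simple sweep (a sketch assuming minimization and a set of mutually nondominated points that all dominate \(z^*\)):

```python
def hypervolume_2d(A, z_star):
    """HV of mutually nondominated 2-D points (minimization) w.r.t. a
    reference point z* dominated by every point in A."""
    hv, prev_f2 = 0.0, z_star[1]
    for f1, f2 in sorted(A):          # ascending f1 => descending f2 on a front
        hv += (z_star[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

def hv_contribution(A, idx, z_star):
    """Exclusive contribution of A[idx]: HV(A) - HV(A \\ {A[idx]})."""
    reduced = [p for i, p in enumerate(A) if i != idx]
    return hypervolume_2d(A, z_star) - hypervolume_2d(reduced, z_star)
```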

 

2.2 Evolutionary Multi-objective Algorithms

To address MOPs, evolutionary multi-objective algorithms (EMOAs) can be used; the class of EMOAs is also referred to as multi-objective evolutionary algorithms (MOEAs). Well-known examples include the NSGA-II [4] and the SMS-EMOA [13]. Additionally, Trautmann et al. [14] introduced the R2-EMOA. We selected the SMS-EMOA and the R2-EMOA for further investigation, because both have been shown to perform well while requiring the user to set only the population size \(\mu \). The SMS-EMOA can deal with an arbitrarily large number of fitness functions K and uses two criteria for sorting a population’s individuals: first, all individuals are sorted into ranks using the fast nondominated sorting of the NSGA-II [4]. If the (worst) rank \(R_H\) contains more than one individual, the Hypervolume contribution of each individual in \(R_H\) is computed to make these individuals comparable (survivor selection). Note that there exist variants of SMS-EMOA which use the dominance count as the primary selection criterion [13]. The R2-EMOA works like the SMS-EMOA, except that it relies on the R2-Indicator instead of the Hypervolume as the secondary criterion [14].
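To make the survivor selection concrete, here is a sketch of one selection step, reusing the helpers from Sect. 2.1 (two-dimensional case; swapping `hv_contribution` for `r2_contribution` yields the R2-EMOA variant):

```python
def select_victim(obj_vectors, z_star):
    """Return the index of the individual to discard: the member of the
    worst nondominated rank with the smallest indicator contribution."""
    ranks = nondominated_ranking(obj_vectors)
    worst = ranks[-1]
    if len(worst) > 1:
        victim = min(worst,
                     key=lambda p: hv_contribution(worst, worst.index(p), z_star))
    else:
        victim = worst[0]
    return obj_vectors.index(victim)
```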

As already stated, both the R2-EMOA and the SMS-EMOA will be considered to derive multi-objective variants of the NEAT approach. But first, the next section provides the details of the original method.

2.3 NeuroEvolution of Augmenting Topologies

The NeuroEvolution of Augmenting Topologies algorithm (NEAT) was introduced in [2]. It addresses the evolution of neural networks for single-objective tasks. To this end, it relies on a direct encoding in which a neural network is described by a list of link genes, each containing information about the nodes (or neurons) it connects [2, p. 34f.]. NEAT relies on three basic principles:

Historical Markings. To enable the crossover of two networks, their structure has to be considered. NEAT addresses this by tracking each structural mutation (a new link or neuron) with an Innovation ID; if the same mutation occurs more than once, it is always assigned the same Innovation ID. Thereby two networks can be combined without any structural analysis: the matching parts of two networks can be identified by their Innovation IDs, as can their non-matching parts [2, pp. 36–38].
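How the historical markings enable structural crossover can be illustrated with a small sketch (hypothetical genome representation: a dict mapping Innovation IDs to link genes; real NEAT additionally distinguishes disjoint and excess genes and handles equal-fitness parents differently):

```python
import random

def crossover(fitter_parent, other_parent):
    """Combine two genomes by aligning their link genes via Innovation IDs.
    Matching genes are inherited randomly from either parent; non-matching
    genes are taken from the fitter parent (a simplification of NEAT)."""
    child = {}
    for inno, gene in fitter_parent.items():
        if inno in other_parent and random.random() < 0.5:
            child[inno] = other_parent[inno]
        else:
            child[inno] = gene
    return child
```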

Speciation. To protect innovation, NEAT sorts the networks of a population into niches of similar networks, called species. Thus networks only have to compete against the other members of their species and may evolve a competitive structure. Additionally, speciation keeps the genomes as small as possible: as long as the fitness of smaller networks is comparable to that of larger networks, the smaller networks remain in the population and are not unnecessarily replaced by larger ones [2, pp. 38–40].

Complexification. Every network in NEAT starts with the same minimal topology: each input neuron is directly connected to every output neuron (with random link weights); there are no hidden neurons, only an optional bias neuron. Through structural mutation the networks grow incrementally, and only well-performing ones survive. Thereby NEAT searches a minimal number of weight dimensions and is able to find well-performing networks very quickly; additionally, this prevents the networks’ structures from growing unnecessarily large [2, p. 40f.].

NEAT is a powerful algorithm that has outperformed other algorithms for evolving neural networks, such as Cellular Encoding [15], Symbiotic Adaptive Neuroevolution [16] and Enforced Subpopulations [17], in several experiments [2, pp. 44–49]. As has been shown, NEAT is capable of evolving neural networks for single-objective tasks, but how can Neuroevolution be used to address multi-objective problems?

3 Multi-Objective NEAT: A First Approach

In the real world, there exist many tasks that are suitable to be controlled by neural networks but depend on more than one criterion. The NEAT approach allows evolving neural networks considering a single objective; for problems with more than one, NEAT has to fall back on a weighted fitness function (scalarisation). The drawback of weighted fitness functions is that the user has to determine the weights a priori and thus needs to know the importance of each goal before optimization [8, p. 196]. To avoid this and other disadvantages of scalarisation, we introduce a novel multi-objective version of NEAT, called Multi-Objective NEAT (mNEAT), as the foundation of our research. The mNEAT is an indicator-based algorithm and utilizes the R2-Indicator for quality assessment. Its main procedures are described in the following.

3.1 Procedure of mNEAT

The novel approach starts with an initially random population \(\mathcal {P}_t\) (at time \(t = 0\)) consisting of \(\mu \) minimal networks. The mNEAT is based on speciation: the user defines the target number of species, and mNEAT automatically adjusts the speciation threshold (the maximum difference between two networks of the same species) according to the networks’ differences. This step is executed at the beginning of every epoch. Note that mNEAT provides a general selection and sorting procedure, that is, the outer loop of an EMOA. It assumes that the neural networks in the population (\(p \in \mathcal {P}_t\)) have been evaluated and assigned a fitness \(f(p) = \left( f_1(p), \ldots , f_K(p)\right) \) before each epoch.

The networks in \(\mathcal {P}_t\) are sorted by their R2 contribution with respect to the population \(\mathcal {P}_t\). All networks are assigned to their corresponding species s, which results in a set of species S; already existing network-to-species mappings are removed in advance. The first network assigned to a species is chosen as its representative. After speciation, the fitness \(f(s) = \left( f_1(s), \ldots , f_K(s)\right) \) of each species \(s \in S\) is computed from its members’ fitness: the approach uses fitness sharing to keep species as small as possible and to maximize diversity. The species’ fitness is computed by \(\forall k \in \left\{ 1, \ldots , K\right\} : f_k(s) = \sum _{p \in s}{f_k(p)}\). In minimization problems, large species are thereby assigned a comparatively large (i.e., bad) fitness; a large species thus only has a chance to compete with smaller species if it contains considerably better networks. In a maximization problem, the fitness could, for example, be raised to the power of \(-1\) in order to penalize larger species. The species’ fitness is normalized to \(\left[ 0, 1\right] \) for each fitness function, which makes the fitness functions comparable to each other. Finally, the R2 contribution of each species \(s \in S\) with respect to S is computed, and the number of offspring a species s is allowed to create, offspring(s), is proportional to its R2 contribution:

$$\begin{aligned} \text {offspring}(s) = \frac{R2(s, S, \varLambda , z^*)}{\sum _{s' \in S}{R2(s', S, \varLambda , z^*)}} \left( \mu - \left| S\right| \right) .\end{aligned}$$
(4)

In every epoch, \(\mu - \left| S\right| \) offspring networks are created by crossover and mutation. Each species internally performs parent selection using Stochastic Universal Sampling. The mNEAT approach employs the variation operators defined in NEAT [2]. All newly created networks enter the next generation’s population \(\mathcal {P}_{t + 1}\); additionally, the representative (i.e., best network) of each species \(s \in S\) becomes part of \(\mathcal {P}_{t + 1}\). All other networks are discarded.
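The allocation of Eq. (4) could be sketched as follows (assuming each species is represented by its normalized, shared fitness vector and reusing `r2_contribution` from Sect. 2.1; exact rounding of the offspring budget is omitted):

```python
def allocate_offspring(species_fitness, Lambda, z_star, mu):
    """Distribute the mu - |S| offspring among the species proportionally
    to their R2 contributions w.r.t. the set of all species (Eq. 4)."""
    contribs = [r2_contribution(species_fitness, i, Lambda, z_star)
                for i in range(len(species_fitness))]
    budget = mu - len(species_fitness)
    total = sum(contribs)
    return [round(budget * c / total) for c in contribs]
```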

3.2 Variations of mNEAT

First experiments with mNEAT have shown that it finds good solutions fast, but that the final Pareto front is typically only sparsely populated with good solutions, because many of them are discarded during evolution. To improve the performance of mNEAT, we introduced and investigated several variations: (1) an archive of the best individuals [7, p. 14], (2) a steady-state population model [8, p. 80] and (3) a reduced replacement rate per epoch (not discarding all networks except the species’ leaders, but preserving a part of the other (mutated) networks for the next generation).

Our initial idea of how these variations cooperate was as follows: The archive (Variation 1) records the progress of the algorithm and keeps a list of the best networks found during optimization. This allows mNEAT to explore arbitrary new networks without losing the progress already made. The steady-state population model (Variation 2) always takes over the best networks of \(\mathcal {P}_t\) and the offspring into \(\mathcal {P}_{t + 1}\); without this variation, mNEAT would keep only the species’ representatives and discard all other networks. With Variation 2, mNEAT no longer discards promising networks merely because their fitness has not yet reached its maximum at timestep t due to lack of evolution. Finally, the reduced replacement rate (Variation 3) allows younger networks to evolve and maximize their fitness: fewer networks are newly generated and more existing networks are mutated, so exploitation plays a bigger role than a pronounced utilization of exploration [8, p. 41f.]. Preliminary experiments (the multi-objective Double Pole Balancing experiment with all eight possible combinations of these variations) have shown that Variations 2 and 3 have only a small influence on the algorithm’s performance. Using mNEAT with Variation 1 (archive) alone provides the best results with respect to the Pareto front. This behaviour is caused by the archive, which always saves the best known solutions and returns them as the finally known Pareto front. Due to space restrictions we do not provide details on this experiment and its results in this paper. In future work, further variations of the mNEAT will be investigated to create an even better performing algorithm.

4 NEAT as the Foundation of Evolutionary Multi-objective Algorithms

We already provided a brief description of the two efficient evolutionary multi-objective algorithms SMS-EMOA and R2-EMOA with indicator-based selection. Since these algorithms are capable of optimizing solutions in multiple dimensions simultaneously, they should be suitable for adapting the behaviour of neural networks. As both algorithms are designed as general frameworks and not specifically for neural networks, we introduce the following extensions, similar to Schrum and Miikkulainen [3, 5] and van Willigen et al. [6]:

To give the SMS- and R2-EMOA the ability to evolve neural networks, we base them on the framework that NEAT provides: we use NEAT’s genetic encoding of neural networks, its Innovation IDs and its variation (crossover and mutation) operators, and implement those for an SMS- and R2-EMOA. Following this approach, we combine the power and efficiency of the SMS-EMOA and R2-EMOA for multi-objective optimization with NEAT’s ability to evolve neural networks. This results in an algorithm called Multi-Objective NEAT-Indicator Based (mNEAT-IB for short) that is capable of evolving neural networks for multi-objective tasks quite efficiently using different selection components. Besides the selection mechanisms used in the R2-EMOA and SMS-EMOA, we apply and investigate two purely indicator-based selection components. Table 1 gives an overview of these variations and additionally provides their computational complexities.

Table 1. Variations of the mNEAT-IB using different combinations of sorting criteria. K denotes the number of objectives and \(\mu \) the population size. \(\varLambda \) defines the set of weight vectors used for the R2-indicator. The computational complexities are based on the following values: NDR: \(O\left( K \mu ^2\right) \) [4], HV (contribution of individual): \(O\left( \mu \log \mu + \mu ^{\frac{K}{2} + 1}\right) \) [11], R2 (contribution of individual): \(O\left( K \mu \left| \varLambda \right| \right) \) [18].

In the following, we provide the details of the novel neuroevolutionary approach: The mNEAT-IB creates an initial population of \(\mu \) minimal networks (with randomized weights). The population is sorted by the first and, optionally, the second sorting criterion. Then \(\lambda \) new networks (instead of strictly one as in the SMS-EMOA [13]) are created using NEAT’s variation operators and added to \(\mathcal {P}_t\). The parameter \(\lambda \) needs to be defined by the user: a large number of offspring \(\lambda \) may lead to a faster convergence of the population towards the Pareto front, but may also cause the algorithm to miss potentially good networks by replacing them too quickly. To give younger networks a chance to evolve, mNEAT-IB provides a small fitness boost to younger networks, while it slightly penalizes the fitness of older ones. The \(2 \lambda \) parents for crossover are selected from the sorted population \(\mathcal {P}_t\) using Stochastic Universal Sampling. The resulting intermediate population has a size of \(\mu + \lambda \) and has to be reduced by \(\lambda \) networks. Because the \(\lambda \) new networks require fitness values before the reduction, the epoch ends here and the new networks are evaluated. At the beginning of the next epoch, the population \(\mathcal {P}_t\) is sorted as described above and the worst \(\lambda \) networks are removed, which decreases its size to \(\mu \). The remaining networks are already sorted, and mNEAT-IB continues with the selection of \(2 \lambda \) parents for the next generation. The algorithm terminates when the stopping criterion is met.
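Put together, one epoch of mNEAT-IB could be sketched as follows (reusing `select_victim` from Sect. 2.2; `sus` and `breed` stand for Stochastic Universal Sampling and NEAT’s variation operators and are assumed, not shown; the age-based fitness adjustment is likewise omitted):

```python
def mneat_ib_epoch(population, mu, lam, z_star, sus, breed):
    """One mNEAT-IB epoch (sketch). `population` holds mu + lam networks
    whose `objectives` vectors were evaluated at the end of the last epoch."""
    # Survivor selection: discard the lam worst networks one at a time,
    # based on nondominated ranking plus indicator contribution.
    while len(population) > mu:
        objs = [net.objectives for net in population]
        population.pop(select_victim(objs, z_star))
    # Parent selection and variation: 2*lam parents yield lam offspring.
    parents = sus(population, 2 * lam)
    offspring = [breed(parents[2 * i], parents[2 * i + 1]) for i in range(lam)]
    # The offspring are evaluated before the next epoch starts.
    return population + offspring
```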

Due to the exponential growth of the Hypervolume computation’s runtime in the number of objectives, the R2-Indicator is to be preferred for a larger number of objectives (\(K \ge 4\)). Its runtime depends heavily on the number of weight vectors applied; Brockhoff et al. [10] discuss the optimal number of weight vectors for \(K = 2\), while further investigations are necessary to give a recommendation for \(K > 2\). We use a default of 100 weight vectors for the experiments described in Sect. 5.

5 Experimental Analysis

To compare the performance of our novel algorithms and variations, we conducted a series of experiments using a variant of the Double Pole Balancing problem. This section provides the experimental set-up and discusses the findings: first we introduce a multi-objective version of the Double Pole Balancing problem, then we describe the design of the experiments. The last part of the section summarizes and discusses the results.

5.1 A Multi-objective Double Pole Balancing Problem

The Double Pole Balancing problem describes the following task: a cart c with mass \(c_m\) (= 1 kg) carries two poles \(p_1\) and \(p_2\), each mounted with a hinge. Each pole p has a length \(p_l\) (= 1 m/0.1 m) and a mass \(p_m\) (= 0.1 kg/0.01 kg). At time \(t = 0\) both poles are poised, inclined at angles \(p_{1_\alpha }\) (= \(0^{\circ }\)) and \(p_{2_\alpha }\) (= \(1^{\circ }\)). Left to themselves, the two poles would fall to the left or right side over time. If the inclination of one or both poles exceeds a given maximum angle \(\gamma \) (= \(36^{\circ }\)), the experiment has failed and the time t is stopped. The cart is positioned at the centre of a track with length L (= 4.8 m). The goal is to keep the two poles balanced for a given amount of time T (= 10,000 timesteps) by moving the cart to the left or right without leaving the track. The cart is controlled by a neural network, which has to be evolved using our proposed algorithms. The networks receive the following inputs: the position \(c_{pos}\) and velocity \(c_v\) of the cart, and the angles and angular velocities of the poles \(p_1\) and \(p_2\). The networks’ output is a value that determines the force applied to the cart (direction and strength) in the next timestep. The Double Pole Balancing problem was used by Stanley [2] to compare the performance of the single-criterion NEAT to other neuroevolutionary algorithms and to ablations of the original NEAT. It is an enhanced version of the Pole Balancing problem [16], which has become too easy for modern algorithms to serve as a meaningful basis for comparison [2]. The Double Pole Balancing problem is typically used to evaluate the performance of Reinforcement Learning algorithms.

We transform this problem into a multi-objective optimization problem using the following fitness functions: (1) The first fitness function \(f_1\) describes how long the controller x is capable of keeping the poles \(p_1\) and \(p_2\) balanced. (2) Additionally, the number of directional changes (per time unit) r of the cart c is counted. The reason is that for each change of direction the cart has to be slowed down to \(c_v = 0\) and then accelerated in the other direction – this consumes much more energy than movement at constant speed [19]. The number of directional changes is therefore treated as proportional to the energy consumption of the cart. (3) Finally, the cart’s position \(c_{pos}\) relative to the centre of the track L/2 is considered by computing the average distance from the centre per timestep, \(f_3\). A larger value of \(f_3\) means that the cart stays farther away from the centre on average, which implies longer travel distances and thus a higher energy consumption; additionally, it leaves less distance to one end of the track, which increases the probability of the cart leaving the track. We introduced \(f_3\) because a controller can achieve a good fitness value for \(f_2\) with the following behaviour: if the controller moves the cart continuously from the very left to the very right end of the track, the cart executes very few directional changes but travels a distance of nearly L between each of them. This would increase the energy consumption, which makes \(f_3\) necessary.

The three fitness functions read as follows, where the parameter \(c_{pos_i}\) describes the position of the cart on the track at timestep i:

$$\begin{aligned} \begin{array}{lcl} f_1(x)=T-t &{} &{} \forall x \in \varOmega : 0 \le f_1(x) \le T \\ f_2(x)=\frac{r}{t} &{} &{} \forall x \in \varOmega : 0 \le f_2(x) \le 1 \\ f_3(x) = \frac{1}{t} {\sum \nolimits _{i=1}^{t}}{\left| \frac{L}{2} - c_{pos_i}\right| } &{} &{} \forall x \in \varOmega : 0 \le f_3(x) \le \frac{L}{2} \end{array} \end{aligned}$$
(5)
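Given a simulation log, the three objectives can be computed directly from Eq. (5) (a sketch; the physics simulation itself is not shown, and the cart position is assumed to be measured from the left end of the track):

```python
def pole_balancing_objectives(t, r, positions, T=10_000, L=4.8):
    """Objectives of Eq. (5). t: timesteps the poles stayed balanced,
    r: number of directional changes, positions: cart position at each
    of the t timesteps (all three objectives are to be minimized)."""
    f1 = T - t                                        # remaining time; 0 is best
    f2 = r / t                                        # directional changes per timestep
    f3 = sum(abs(L / 2 - p) for p in positions) / t   # mean distance from track centre
    return (f1, f2, f3)
```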

5.2 Experiments and Statistical Analysis

All algorithms under consideration have control parameters that strongly influence their performance. Therefore, we first conducted preliminary experiments with a meta-EA in order to identify suitable parameter settings. The best settings were investigated more closely in the analysis described in this paper. Due to space restrictions we do not show the configurations here, but will provide them in a technical report. We conduct two different types of experiments, each repeated 120 times:

 

Average Number of Evaluations.

In this experiment we test an algorithm’s ability to find a solution of predefined minimum fitness within a given maximum number of evaluated networks. For each repetition, we record the number of evaluations needed to find an individual of the desired fitness. At the end of the experiment, we compare the average number of evaluations and the success rate of each algorithm. Finally, we test the algorithms’ results for statistically significant differences.

Mean Fitness.

The second experiment tests an algorithm’s ability to find a “good” Pareto front. To this end, each algorithm is executed for a predefined number of evaluated networks, and the final Pareto front (last generation or archive) is stored. All solutions of all fronts (one Pareto front per repetition) are combined and used to compute the following quality indicators: \(\epsilon \), Spacing, Generational Distance, Inverted Generational Distance, Inverted Generational Distance Plus and Hypervolume. This gives us the ability to evaluate the quality of the Pareto fronts found by the algorithms. We show the averaged results of each algorithm for each performance measure and finally analyse the results for statistically significant differences.

 

For the statistical comparison of the algorithms (with the predefined configurations) we use the procedure described by Calvo and Santafé [20] together with the R library scmamp. First we determine whether there is a statistically significant difference between any two algorithms (Friedman test [21]). If a difference has been ascertained, we determine the pairwise differences between the algorithms (Friedman Aligned Ranks test [22]). The determined p-values are adjusted (Shaffer’s algorithm [23]) and then evaluated.

Average Number of Evaluations. Table 2 shows the results of the Double Pole Balancing experiment with velocities. We investigate the average number of evaluated networks until an algorithm finds a network \(x^*\) for which \(f_1(x^*) = 0\), \(f_2(x^*) \le \frac{1}{20}\) and \(f_3(x^*) \le \frac{1}{5}\). This calls for a neural network that balances the poles on the cart for at least T timesteps and performs a change of direction at most every \(20^{\text {th}}\) timestep. Additionally, the cart must not stay away from the centre of the track by more than 20 cm on average. In other words, we were looking for a safe and energy-efficient controller for the cart.

Table 2. Number of evaluated networks until the algorithm found a network of the desired fitness in the Double Pole Balancing experiment with velocities (Average Number of Evaluations), averaged over 120 repetitions. A repetition was considered unsuccessful if no network \(x^*\) was found within 25,000 evaluated networks.

The results in Table 2 show that the mNEAT was able to find a suitable network \(x^*\) in 85% of all repetitions. Note that the mNEAT (without variation) and the mNEAT with archive behave identically in this type of experiment – this can be observed in both variants’ results, which are nearly equal. The mNEAT-IB (R2) shows the best results with respect to the number of evaluations (8,272) and the success rate (92%). On the other hand, the mNEAT-IB (NDR + HV) evaluated 14,150 networks until it found a network \(x^*\) and thus performed worst (a statistically significant difference at \(\alpha = 0.05\)). It remains to be investigated whether the quality indicators in mNEAT-IB (NDR + HV) are unsuitable for evolving neural networks or whether this variation was not configured properly by the meta-EA. To summarize, all algorithms were capable of finding an energy-efficient controller for the cart in at least 70% of all repetitions within a maximum of 25,000 evaluations; most of the algorithms even exceeded a success rate of 85%.

Table 3. Quality indicator values of the algorithms and variations using the previously determined parameter settings for the Double Pole Balancing experiment with velocities (Mean Fitness). (first row = mean value, second row = standard deviation)

Mean Fitness. The results of the Double Pole Balancing experiment with velocities (Mean Fitness) are shown in Table 3. We examined the commonly used quality indicators \(\epsilon \) (the smallest amount \(\epsilon \) necessary to translate one set A into another set B) [24, 25], Spacing (S for short – the spread of the solutions of a set A), Generational Distance (GD – the average distance from a set A to the Pareto front) [25], Inverted Generational Distance (IGD – the average distance from the Pareto front to a set A, Pareto Noncompliant), Inverted Generational Distance Plus (IGD+ – the average distance from the Pareto front to a set A, weakly Pareto Compliant) [26] and the Hypervolume [24, 25]. See [7, pp. 256–262] for more details on \(\epsilon \), S and GD, and [26] for IGD and IGD+. The Hypervolume (HV) has been described in Sect. 2.1.

Table 3 shows that the mNEAT-IB (R2) achieves the best results in \(\epsilon \), GD, IGD+ (alongside mNEAT (Archive)) and HV, and the second best in IGD. The mNEAT (Archive) performs best with respect to IGD and IGD+ and second best regarding \(\epsilon \), GD and HV. To summarize: all examined quality indicators, except Spacing, are dominated by mNEAT-IB (R2) and mNEAT (Archive). Concerning Spacing, the original mNEAT performs best, followed by mNEAT-IB (NDR + HV), although even these results are not good in absolute terms: the solutions are concentrated around the most promising areas of the objective space instead of being spread equidistantly. The mNEAT performs worst regarding all quality indicators except Spacing (where the mNEAT-IB (R2) gives the worst results) and GD (where the mNEAT-IB (HV) is worst). With respect to the quality indicators shown in Table 3, we find that the mNEAT-IB (NDR + R2), the mNEAT-IB (R2) and the mNEAT (Archive) show statistically significantly better results than the mNEAT-IB (NDR + HV), the mNEAT-IB (HV) and the mNEAT in many cases. This indicates that the former group is to be preferred for further investigations. In contrast, mNEAT shows the best results in Spacing, which could be caused by its behaviour of replacing large parts of the population with new individuals; we assume that mNEAT rather explores the objective space instead of exploiting promising areas. Whether the observed differences between the algorithms and variations depend on the selected parameter settings or on the problem instance still has to be investigated; therefore, further experiments will be carried out in future research.

6 Conclusions and Future Work

This paper focused on multi-objective Neuroevolution. We followed two main research directions, both based on the well-known neuroevolutionary approach NEAT, which was designed for single-objective tasks. The first focused on developing multi-objective variants of NEAT itself; here, we introduced a novel indicator-based algorithm. The second direction considered the combination of NEAT with efficient evolutionary multi-objective algorithms developed for a large number of objectives. In this case, the multi-objective algorithms provide the framework into which the main principles of NEAT are integrated; we derived and tested four different variants. This is the first approach that uses these modern multi-objective algorithms with indicator-based selection in the context of Neuroevolution: previous research utilized the SPEA2 with the original NEAT (the Pareto Strength approach maps K objectives to a single objective, making it applicable to NEAT without any modifications) or the NSGA-II, which is considered less performant when the number of objectives is relatively large.

All in all, this paper is intended as a proof of concept showing that our algorithms are suitable for multi-objective Neuroevolution, and the first experimental results are very promising. Our experimental analysis shows that the novel algorithms are capable of finding an energy-efficient cart controller for a multi-objective version of the well-known Double Pole Balancing problem within very few evaluations. Concerning the Mean Fitness experiment, which addresses the quality of the final Pareto front, we find that all algorithms are capable of evolving good controllers within a predefined maximum number of evaluations; however, statistically significant differences between the algorithms exist. The research will be continued in several directions. First of all, we are currently investigating more difficult variants of the Pole Balancing problem.

Despite the promising first results, further experiments on truly high-dimensional MOPs are necessary to test the algorithms’ ability to optimize more than three fitness functions. Currently we are investigating the Fighting Game AI Competition [27], for which we apply our algorithms to create neural networks as controllers for an AI player (with four and five objectives). Preliminary results show that our algorithms are capable of beating already established AI opponents after very few evaluations.

To gain more insights into the algorithms’ behaviour, especially with respect to robustness, we have to compare a larger number of different configurations in future research. For exploring [8, p. 41f.] the search space of parameter configurations, we will follow the guidelines provided by the design and analysis of simulation experiments (DASE) [28]. Additionally, the newly proposed algorithms have to be compared to existing algorithms like NEAT-PS or MM-NEAT to assess their relative performance.

In the future, further domains, such as computer games or maze navigation, will be considered in order to investigate the range of applicability of our algorithms. This will provide insights into the question whether, and if so which, of the novel algorithms/variations emerges as preferable in the area of Reinforcement Learning. Additionally, further variations of the algorithms should be investigated to create an even more powerful algorithm for multi-objective Neuroevolution. Another interesting aspect is to what extent the parameters of the algorithms can be configured automatically during execution: reducing the number of parameters that have to be predefined by the user reduces the complexity of the search space (for parameter configurations) and increases the usability of the algorithms. Therefore, it represents an important point of future research.