1 Introduction

Recent advances in artificial intelligence and deep learning have opened up opportunities for disruptive solutions in various areas. However, the effectiveness of these solutions depends on the neural network design, which has to be tweaked over time to suit the application. Neuroevolution (NE) provides an automated solution to this problem: a neural network can be trained without explicit training data by evolving an initial population of randomized neural networks over several generations, assessing each individual's fitness [4]. Neuroevolution approaches like NEAT [16] and its variants have been designed to address several gaps perceived in fixed-topology NE. Games have been the primary testbed for developing and refining these approaches, since they offer complexity in decision making, the ability to introduce uncertainty, and platforms with varying levels of difficulty.

One of the primary challenges in designing neural networks using neuroevolution is understanding the effect of the fitness function on the effectiveness of the evolved network. While several studies have focused on the neuroevolution approach itself, the design of the fitness function is rarely made explicit. It is therefore important to analyze the design of fitness functions and their impact on the resultant network's performance. In this paper, we explore an incremental design of fitness functions for a snake game to analyze the performance of neuroevolution. The snake game was chosen for its simple implementation and its target-finding and obstacle-avoidance features. An effective neuroevolution strategy for such a game points to solutions for real-time target finding and navigation applications.

Interestingly, the challenges posed by the objective-based approach predominant in neuroevolution have led to Quality Diversity, a recent family of evolutionary algorithms that seeks a balance between convergence and diversity during the search. These include novelty-based search [8], the behavioral diversity approach [11], and Curiosity Search [17]. These algorithms emerged from a new perspective on the fitness function, i.e., a focus on the novelty of solutions rather than merely their quality. Hence, involving these algorithms in our fitness function design experiments is justified. Consequently, this paper also provides a comprehensive insight into the implementation of the novelty-based approach, including an analysis of its many flavors and augmentations and the drawbacks present in each. We then formulate an augmented Novelty Search of our own, which performs comparably to the highest-performing existing methods.

The primary contributions of this paper are as follows:

  • An incremental design of experiments and a comprehensive analysis (especially of the fitness function) while employing neuroevolution for games.

  • Adapting Novelty Search and exploring its variants for the snake game, and evaluating their performance against the proposed fitness functions.

The rest of the paper is organized as follows: Sect. 2 reviews the state of the art in neuroevolution with an emphasis on games, while Sect. 3 explains our architecture and the design of the fitness functions. Section 4 analyzes the results of our simulations and incremental experiments. Further, in Sect. 5 we discuss various novelty-based approaches with their implementation, analysis, and fine-tuning. Finally, we conclude in Sect. 6 with our observations.

2 Related Work

Previous work on playing the Snake game has utilized various methods such as Q-learning [10, 19] and genetic programming [2]. Our implementation builds upon these insights, with an emphasis on recent advances in evolutionary algorithms such as neuroevolution [4], NEAT [16], and Novelty Search [8].

Neuroevolution aims to find an efficient neural network for a given task through the evolution of neural network topology. The synaptic weights and the connections between neurons are encoded as genetic traits. These traits are then subjected to evolutionary operators such as selection, crossover, and mutation. Gaps in the formulation of neuroevolution include the assorted encoding schemes on offer and their varying efficiency [13], the domination of mature features over newly evolved features in the population, and the need to maintain a simple network topology throughout the evolution process [16].

The NeuroEvolution of Augmenting Topologies (NEAT) algorithm advocates starting the evolutionary process with a minimal structure with no hidden nodes and then evolving the neural network structure along with its connection weights [16]. Stanley et al. went on to implement a real-time version of the NEAT algorithm to play the NERO video game, showcasing the diverse behavior and adaptability of game agents evolved through NEAT [15]. Games have proven to be excellent testbeds for neuroevolution, arguably more suitable than robotics, given the performance achieved and the numerous means of implementation they offer [12].

Recent insights into evolutionary algorithms have called for a search driven by novel behavior, replacing the traditional objective-based measure of fitness [8]. This Novelty Search approach exhibits adaptive behavior in deceptive environments and avoids local optima in the learning process, as evidenced by its success in the maze-navigation experiment. However, in complex environments where the behavioral space is vast, the performance of Novelty Search is lacking [7]. Studies have tweaked parameters of Novelty Search such as the archive size and the value of k [5], but there remains a gap in the understanding of Novelty Search, especially of the modifications that lead to better results or are better suited to larger behavioral spaces.

While several studies have focused on the neuroevolution approach itself, the design of the fitness function is rarely made clear. This paper attempts to fill that gap by providing insights into designing fitness functions and analyzing their performance in game environments. We test these objective-based functions in increasingly complex scenarios to assess their adaptability, and we then analyze modifications to Novelty Search that perform better in large behavioral spaces under resource constraints. The merits and demerits of each approach are then presented.

3 Architecture and Design of Experiments

This paper uses the classic snake game as the testbed for the neuroevolution experiments. Given that the focus of the paper is the incremental design of experiments (especially of the fitness function) while employing neuroevolution for games, the choice of a simple game as a testbed is well justified. As Fig. 3 shows, the goal of the game is to make a snake moving in a 2D space eat as many food blocks as possible without biting itself, both in the presence and absence of obstacles. Figure 1 shows the neuroevolution workflow adopted for evolving a neural network controller for the snake game. The first step creates a population of randomized neural networks that will subsequently be evolved for the task at hand. The fitness function evaluates each neural network by simulating a game with that network as the snake's brain: it rewards good moves (like eating a food block) and penalizes bad ones (like biting itself or colliding with the wall) throughout the snake's lifetime. Snakes that show no progress over a period of 1600 steps are killed, so as not to pollute the gene pool.

Once the fitness evaluation is complete, the top-performing candidates are deemed "elites" and pass directly into the next generation (after mutation), while the rest of the population is subjected to crossover proportionate to fitness and then randomly mutated (with fixed probabilities for mutation selection and mutation rate), potentially yielding a new generation of better neural networks. Each neural network of the new generation is put through the gaming simulation and assessed, and the evolution process repeats until a fitness criterion is met. The evolution engine, i.e., the neuroevolution algorithm adopted in this workflow, is NEAT, with the parameters given in Fig. 2.
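For concreteness, the sketch below shows how this workflow could be wired up with the open-source neat-python library. The configuration file name and the simulate_snake_game helper are placeholders of ours, not part of the library, and the parameter values live in the config file (cf. Fig. 2).

```python
import neat

def eval_genomes(genomes, config):
    """Assign a fitness to every candidate controller in the generation."""
    for genome_id, genome in genomes:
        net = neat.nn.FeedForwardNetwork.create(genome, config)
        # simulate_snake_game is a hypothetical helper that plays one game
        # with `net` as the snake's brain, kills snakes idle for 1600 steps,
        # and returns the accumulated reward.
        genome.fitness = simulate_snake_game(net, max_idle_steps=1600)

# Population size, elitism, mutation rates, etc. are read from the config
# file; 'neat-config.txt' is a placeholder name.
config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                     neat.DefaultSpeciesSet, neat.DefaultStagnation,
                     'neat-config.txt')
population = neat.Population(config)
best_genome = population.run(eval_genomes, 200)  # evolve for 200 generations
```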

Fig. 1. Our proposed neuroevolution architecture

Fig. 2. Parameter settings for the NEAT workflow

Fig. 3. The Snake game environment, with the addition of dynamic obstacles

Each candidate neural network is represented by a direct encoding scheme and begins with no hidden nodes and fully connected input and output nodes. A hidden-layer architecture, as shown in Fig. 4, is formed through mutations that add hidden nodes, while the connections and their weights are evolved across subsequent generations under the guidance of the fitness function we have been designing; this is what gives the controller its intelligence in playing the snake game effectively. Crossover of candidate neural networks is performed by tracking historical markers and by the introduction of speciation [16], whereby innovations are protected and differing network topologies can be merged.
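As an illustration of direct encoding and historical markers, a genome can be kept as a flat list of connection genes, each carrying the innovation number that aligns genes during crossover. This is a minimal sketch under our assumptions, not the full NEAT genome of [16]:

```python
import random
from dataclasses import dataclass, field

@dataclass
class ConnectionGene:
    in_node: int
    out_node: int
    weight: float
    enabled: bool = True
    innovation: int = 0   # historical marker: genes with the same number in
                          # two parents descend from the same structural change

@dataclass
class Genome:
    # Direct encoding: one gene per synapse. At generation 0 there are no
    # hidden nodes, so the gene list fully connects inputs to outputs.
    connections: list = field(default_factory=list)

def crossover(fitter: Genome, other: Genome) -> Genome:
    """Align genes by innovation number, as in NEAT [16]: matching genes are
    inherited from either parent at random; disjoint and excess genes are
    taken from the fitter parent."""
    by_innovation = {g.innovation: g for g in other.connections}
    child = Genome()
    for gene in fitter.connections:
        match = by_innovation.get(gene.innovation)
        child.connections.append(random.choice([gene, match]) if match else gene)
    return child
```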

3.1 Choice of Inputs and Output

In game-playing environments, the input given to the neural network is often raw image data, and previous studies have used screenshots of the snake game environment as input [10]. However, NEAT with directly encoded topologies performs well on pre-processed representations of the game state [6]. In our implementation, the coordinates of every element involved, such as the position of the food object and the coordinates of the snake's own body, are known at every tick of the game clock and can be provided as input features.

Figure 4 shows the features given as input to the neural network. Initially, the distance in units to the food particle and the distances to the nearest obstacle in three directions (in front of the snake, to its left, and to its right) were used. The directional distances were then replaced with an approach proposed by [2] in which binary inputs indicate only the presence of food or an obstacle in a particular direction. This modification helped evolve neural networks with simpler topologies and less overhead.

Interestingly, humans perform complex maneuvers based on their knowledge (or memory). In the snake game, knowledge of previously made moves informs the next one; the performance of Q-learning in the snake game can be attributed to the previous-state information embedded in the Markov decision process [10]. To emulate this behavior, we propose a memory unit that contains the history of moves the snake has previously made. It is implemented as a set of inputs to the neural network that encodes the snake's previous turns, the size of which was varied from one to ten. These inputs are binary states specifying whether each previous turn was a left or a right turn.

The outputs of each candidate neural network are defined as the two actions the snake can take, viz. turning left or turning right, again represented as binary values. The snake can also decide to do neither and continue forward (when neither of the neural network's outputs is activated).
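A sketch of this input and output encoding follows; the game accessors are hypothetical stand-ins for our environment's API, not a published interface.

```python
def build_inputs(game, turn_memory, memory_size=3):
    """Binary sensor vector: food/obstacle presence ahead, left, and right
    of the snake's heading, plus a short history of previous turns
    (1.0 = left turn, 0.0 = right turn)."""
    inputs = []
    for direction in ('front', 'left', 'right'):
        inputs.append(1.0 if game.food_visible(direction) else 0.0)      # hypothetical
        inputs.append(1.0 if game.obstacle_adjacent(direction) else 0.0)  # hypothetical
    inputs.extend(turn_memory[-memory_size:])  # memory unit (Sect. 3.1)
    return inputs

def decode_outputs(left_out, right_out, threshold=0.5):
    """Two binary outputs; if neither activates, the snake continues forward."""
    if left_out > threshold and left_out >= right_out:
        return 'left'
    if right_out > threshold:
        return 'right'
    return 'straight'
```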

3.2 Incremental Fitness Function Design

In our snake game, the objective is to have the maximum score when the snake dies. The theoretical maximum score in the snake game is fixed, and we award the same number of points for eating food at any point in the game. In general, it is seen in neuroevolution, and confirmed in our experiments, that NEAT works rather well when given general instructions and left to develop trends on its own [16].

Initially, the fitness function was designed as a greedy strategy to maximize the number of food blocks eaten by the snake (henceforth, "snake" also denotes the candidate neural network controller). The flip side of such a greedy approach is that the fitness function wrongly encourages candidate networks to always maneuver the snake toward food, even at the cost of biting itself and ending the game. Consequently, the fitness function should reward candidate solutions that let the snake live longer while devouring more food blocks. Lockhart [9] proposes a reward for staying alive. Accordingly, the fitness function was redesigned to reward controllers that keep the snake alive longer, with the relative lifespan of each snake computed from the ticks of the game clock. In addition, the rewards and penalties for steering the snake toward and away from the food are weighted to smoothen the greedy pursuit of food blocks. Figure 4 shows the template of the neural network controller with the activation of inputs and outputs for the scenario in Fig. 3; in this scenario, neither output node is activated, so the agent continues in its current direction.

Fig. 4. Template of the neural network controller for the snake, with a sample activation

4 Simulation Results and Analysis

4.1 Objective Fitness Function Designs

The game is played on a \(40\times 40\) grid, and the learning process took place over 200 generations, by which time the agents' performance had generally saturated. The greedy fitness function rewarded candidate solutions with 1 point for moving the snake toward the food, −1 for moving away from it, and 2 points for eating the food, analogous to the distance reward in Wei et al. [19]. With this fitness function (labelled version a in Fig. 5), snakes learnt to move toward food as early as the 2nd generation, by virtue of the penalty for moving away from it. By the 10th generation, some snakes obtained a good score of 66 food particles eaten while swiftly avoiding walls, and a high score of 75 was attained by a snake in the 13th generation. Later generations saturated with no improvement. The snakes always killed themselves around the time their length reached twice the size of the board.

Fig. 5. Observations of controller scores with varying fitness metrics

We then reduced the penalty for moving away from the food from −1 to −0.2, hoping the snakes would learn not to kill themselves in pursuit of food. With this weighted greedy fitness function (version b), snakes learnt to go after food within about 2 generations, and by 10 generations they learnt to properly avoid walls. Subsequent generations did not show any improvement in strategy. A high score of 69 was attained by a version-b snake in the 20th generation. Under both fitness functions, almost all snakes died by biting themselves. However, in the weighted greedy version, we observed that the snakes tried to stick to the game-area boundaries, keeping the area predominantly empty to facilitate eating food.

Since the very objective of the game is to keep the snake alive while eating as much food as possible, the next fitness function (version c) was designed to additionally reward snakes 0.01 points for each game tick survived. With this fitness function, a high score of 66 was obtained within 15 generations. As observed earlier, within as few as 3 generations the snakes learnt to avoid walls and go after food. However, the evolution stagnated after 24 generations with no new strategies. Interestingly, as the fitness function now rewards the snakes for longevity, they learnt to choose the longest (often zig-zag) paths, thereby delaying their end. This greedy strategy helps snakes maximize fitness, but it does not help them learn to avoid eating themselves.

The reward for survival led snakes to learn counterintuitive tactics and strategies, and the greedy penalty for moving away from food also hindered their progress. The non-greedy fitness function (version d) therefore awards 1 point for moving toward food and 2 points for eating it, with no penalties. With this fitness function, the simulation reached a high score of 178 within 50 generations. As there is no negative reinforcement when the snake moves away from the food, it takes a while (around 8 generations in our observations) before it learns to go after it. By virtue of the non-greedy approach, snakes try various maneuvers instead of going only for food, contributing to diverse strategies. By generation 21, one snake had learned the strategy of following its own tail to stay alive. With this non-greedy version, snakes do not stick to the wall, prefer to take the shortest path to food when possible, and learn to avoid spiraling deaths even after 20 generations.
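The four reward schedules can be summarized in a single parametrized evaluator, sketched below with the constants reported above. The game.step interface is a hypothetical placeholder, and we assume version c builds on version b's weights, as described.

```python
# Per-event rewards for each fitness version:
# toward/away from food (per move), food eaten, survival bonus (per tick).
REWARDS = {
    'a': dict(toward=1.0, away=-1.0, eat=2.0, alive=0.0),   # greedy
    'b': dict(toward=1.0, away=-0.2, eat=2.0, alive=0.0),   # weighted greedy
    'c': dict(toward=1.0, away=-0.2, eat=2.0, alive=0.01),  # + survival reward
    'd': dict(toward=1.0, away=0.0,  eat=2.0, alive=0.0),   # non-greedy
}

def evaluate(net, game, version='d'):
    r = REWARDS[version]
    fitness = 0.0
    while game.alive and game.idle_steps < 1600:  # hypothetical environment API
        moved_toward, ate = game.step(net)
        fitness += r['alive']
        if ate:
            fitness += r['eat']
        else:
            fitness += r['toward'] if moved_toward else r['away']
    return fitness
```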

To achieve human-like maneuvers, version e adds a feedback loop to the neural network architecture that enables the snake to keep track of its previous moves, while retaining the non-greedy fitness function. The simulation with this feedback controller achieved a high score of 211 within 50 generations. After 20 generations, knowledge of its own history of moves resulted in highly deliberate maneuvers approaching human performance. The optimal size of the memory (i.e., the number of previous moves) was found to be three; beyond this, there was no noticeable increase in the snake's performance.

4.2 Snake Game Environment with Obstacles

Fig. 6. Performance of the non-greedy approach in different cases

This section investigates the potential of the designed non-greedy fitness function in a custom version of the snake game that reflects real-world complexity. Accordingly, the environment was modified to dynamically place and remove random obstacle blocks, as shown in Fig. 3. This modification is highly relevant to real-world settings where an autonomous agent must navigate through space toward a target while constantly avoiding randomly moving obstacles. The snake's performance was promising: it learned to treat the new obstacles like walls and stayed clear of them, showcasing the generalization ability of NEAT powered by our fitness function design. Figure 6 shows the performance of the non-greedy fitness function for the snake game with and without obstacles, and with a memory-supported network. While the memory-supported network performs best in the obstacle-free case, the performance with obstacles improves as the generations increase, demonstrating the robustness of the fitness function.

5 Augmenting Fitness Using Novelty Search

The foundation of novelty search (NS) is the definition of a behavioral space. In the snake game environment, we choose the behavior to be evaluated as the path traversed by the snake during its lifetime. However, using this behavior as the novelty metric poses a challenge. Consider a game instance \(G_{1}\) in which food spawns at positional coordinates \((x_{1}, y_{1}), (x_{2}, y_{2}), \ldots, (x_{n}, y_{n})\), denoted \(F_{1}\), and let \(P_{1}\) be the path required for the snake to consume this food perfectly.

Suppose instead that this snake takes a path \(P_{2}\) slightly different from the perfect path \(P_{1}\) during the game. In the next generation, consider a game instance \(G_{2}\) in which food spawns at positions \(F_{2}\) and the perfect path is \(P_{2}\), which the agent ends up traversing. Since the path \(P_{2}\) was previously traversed by a past instance, the novelty score given to the agent of game \(G_{2}\) will be low, even though it took the perfect path. To address this issue, during the learning stage of novelty search we constrain the positions of the food spawns to be static across all generations.
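One simple way to impose this constraint, as a sketch under our assumptions, is to drive food placement with a fixed-seed random generator so that every game in every generation sees the same spawn sequence:

```python
import random

FOOD_SEED = 1234  # arbitrary but fixed seed

def food_sequence(grid=10):
    """Deterministic food spawns, identical across all generations
    during the novelty-search learning stage."""
    rng = random.Random(FOOD_SEED)
    while True:
        yield (rng.randrange(grid), rng.randrange(grid))
```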

A new agent exhibiting high dissimilarity to previous agents can be deemed novel. This creates the need for a function that measures the similarity between two paths. We propose using the discrete Fréchet distance [3] between the paths traversed by two agents as the similarity function. The Fréchet distance between two polygonal curves P and Q is the minimum leash length required to join any two points on P and Q while moving forward along both curves. These curves, which exist in metric space, are analogous to the path traversed by an agent in its lifetime. The inputs to the distance function are the coordinates of the ordered points of the curves.
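The standard dynamic-programming formulation of the discrete Fréchet distance is sketched below, with paths given as lists of (x, y) grid coordinates:

```python
import math

def discrete_frechet(p, q):
    """Discrete Fréchet distance between paths p and q, each a list of
    (x, y) points, computed in O(len(p) * len(q)) time and space."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])  # Euclidean distance between points
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                ca[i][j] = max(min(ca[i - 1][j],
                                   ca[i - 1][j - 1],
                                   ca[i][j - 1]), d)
    return ca[n - 1][m - 1]
```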

The Fréchet distance from an agent's path to every other agent's path in the current generation and in the novelty archive is calculated, and the distances to the k-nearest neighbors (\(k=3\)) are averaged to derive a novelty score. This novelty metric then replaces the fitness function in our experiment. Since computing the snake paths and their pairwise similarities in a large game environment is expensive, these experiments were run on a \(10\times 10\) grid with a theoretical maximum score of 100 and a 3000-generation limit. The earlier experiments were also rerun under the new game conditions for comparison.
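Given the distance function, the novelty score is simply the mean distance to the k nearest behavioral neighbors; a minimal sketch:

```python
def novelty_score(path, reference_paths, k=3):
    """Average discrete Fréchet distance to the k most similar paths among
    the current generation (and the novelty archive, when one is kept)."""
    distances = sorted(discrete_frechet(path, other) for other in reference_paths)
    return sum(distances[:k]) / k
```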

Table 1. Comparison of the different algorithms in terms of food eaten

An assortment of distinct flavors of the NS algorithm has been tested over the years. In our complex problem space, an approach similar to [18], in which the novelty archive is omitted and behavioral differences are calculated only among the k-nearest neighbors (kNN), is a plausible enhancement. This approach mixes classic NS with a behavioral-diversity-based implementation [11], formulating a modified NS that seeks out behavior in a search space between those of traditional NS and Curiosity Search [17]. This implementation is analogous to an intra-generational novelty search, as it calculates the novelty of agents within the current generation alone, forgoing the novelty archive found in [8]. It has been shown that striking a balance between the exploratory nature of NS and the goal-directedness of objective fitness functions leads to good performance on deceptive tasks [1]. Consequently, the modified NS has been augmented with the non-greedy fitness function as follows:

$$\begin{aligned} f(x) = \text{modifiedNovelty}(x) + 0.5 \cdot \text{modifiedNovelty}(x) \cdot \text{nongreedyFitness}(x) \end{aligned}$$
(1)

This augmented novelty metric helps guide the NEAT algorithm to escape local optima. While this function is comprehensive, the complexity of the guiding objective is questionable, and a simpler metric guiding the search may well be sufficient. We therefore propose a guiding metric analogous to a fitness metric but loosely defined from the general goal of the game, i.e., maximizing the food eaten. Thus, we modify the previous function to use a guiding metric (GM) as follows:

$$\begin{aligned} f(x) = \text{modifiedNovelty}(x) + 0.5 \cdot \text{modifiedNovelty}(x) \cdot \text{GM}(x) \end{aligned}$$
(2)
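Both augmentations share the same shape and differ only in the guiding term, as the sketch below makes explicit; modified_novelty and the two guide functions stand for the quantities defined above.

```python
def augmented_score(x, guide):
    """Eq. (1) with guide = nongreedy_fitness; Eq. (2) with guide = guiding_metric.
    The novelty term both adds to and scales the guiding objective, so an
    agent with zero novelty receives no guidance bonus."""
    nov = modified_novelty(x)
    return nov + 0.5 * nov * guide(x)
```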

Table 1 shows the performance of the proposed greedy and non-greedy fitness functions in the constrained environment, along with the modified NS functions (n = 15). Using an agent's fitness as a parameter in the novelty score increases the maximum score over the plain NS approach. It is also observed that the simple guiding metric yields maximum and average scores comparable to those of the full-fledged fitness function, as reported in Table 1, although its reliability needs improvement. The fitness-based approaches are less computationally intensive while providing comparable performance, but their demerit is that the manual, iterative design of the fitness function is laborious. A novelty-based approach, on the other hand, involves more computing and memory overhead (especially when a novelty archive is used), but its implementation, particularly with a simple guiding metric, is significantly less complex.

The constraint of static food positions used in training the novelty algorithms raises the question of whether these agents can maintain their performance once the restriction is removed. High-performing agents trained by each strategy were therefore evaluated on game instances with dynamic food positions, as given in Table 1 (n = 100). The novelty algorithms perform well in dynamic environments even though they were trained under environmental constraints, validating the adaptability of NEAT and the novelty-based approaches.

6 Conclusion and Future Work

In this paper, we have provided an in-depth look at the implementation of NEAT using the Snake game as a testbed. Both greedy and non-greedy fitness functions were designed, and the performance of the latter showed promise: it also performs well in the presence of obstacles, and adding memory to the neural network controller led to the evolution of better strategies and higher scores. Augmenting this fitness function with novelty search drastically improved the game scores obtained. By simplifying the fitness component of novelty search using a guiding metric, we demonstrate that a comparable result is achievable with less overhead. By proposing an incremental approach to fitness function design for a goal-finding game, this paper has demonstrated the impact of both the quality and the novelty components of the fitness function. This opens up opportunities to evolve similar strategies using recurrent neural networks and to adapt them to dynamic environments using neuromodulation.

Recent explorations have been made into automating the design of the ideal fitness function, particularly through the use of a neural network as a fitness function. A similarly trained neural network that approximates the novelty score might prove advantageous in NS, as it would eliminate the storage and computational costs associated with the novelty archive, a major limiting factor in our experiments. Our results indicate that the large behavioral spaces of complex game environments can be reduced, with constraints, to simpler problems for the implementation of NS. This method can be applied to other complex environments for further research.