1 Introduction

Designing control software for a robot swarm is a challenging task, as the desired global behavior usually emerges from the interactions of the robots with each other and with the environment [10, 37]. Manual software design therefore often relies on trial and error [4], and a general methodology for designing control software for robot swarms is still missing [12].

Automatic design offers a promising alternative by transforming the design problem into an optimization problem. Instead of writing control software for a specific mission by hand, the designer optimizes a target architecture with regard to a mission-dependent objective function. A popular automatic design approach is neuro-evolutionary swarm robotics, which uses evolutionary algorithms to design artificial neural networks. While this approach has been applied successfully to many missions [8, 11, 21, 33, 35, 36], multiple challenges remain to be solved [5, 31, 34]. The most important is the weak transferability of the generated control software, which results in performance drops when it is deployed in reality. This drop in performance is often attributed to the reality gap: the inherent differences between the design context of the simulation and the real world.

Francesca et al. [14] see in this phenomenon a resemblance to the problem of over-fitting in machine learning. By analogy with the bias-variance trade-off [9, 17], they propose to introduce a bias into the automatic design process. The bias they propose restricts the space of possible control software by defining a control architecture that is composed from predefined modules. As a proof of concept, Francesca et al. implemented AutoMoDe-Vanilla, an automatic modular design approach that generates finite-state machines with up to four states. The generated finite-state machines are composed of states, each of which executes an associated behavior as long as it is active, and transitions, each of which has an associated probabilistic condition that can trigger the move from one state to another. Vanilla uses F-race [2] to assemble the finite-state machines from a set of predefined modules (behaviors and conditions) and to fine-tune their parameters.
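The structure of such a probabilistic finite-state machine can be illustrated with a small data structure. The following Python sketch is illustrative only and is not taken from the AutoMoDe implementation: the class names, the `step` function, and the rule that the first transition whose condition fires is taken are assumptions, and each condition is reduced to a fixed firing probability per control cycle.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Transition:
    target: int          # index of the state this transition leads to
    probability: float   # assumed chance that the condition fires in one cycle


@dataclass
class State:
    behavior: str                                    # e.g. "exploration" or "stop"
    transitions: list = field(default_factory=list)  # outgoing transitions


def step(states, active, rng):
    """Advance the machine by one control cycle.

    Returns the behavior executed in this cycle and the index of the state
    that is active in the next cycle.
    """
    state = states[active]
    behavior = state.behavior          # the active state's behavior is executed
    for tr in state.transitions:       # assumption: first firing transition wins
        if rng.random() < tr.probability:
            return behavior, tr.target
    return behavior, active            # no condition fired: stay in this state


# Two-state example: explore, then switch to "stop" and remain there.
machine = [State("exploration", [Transition(target=1, probability=1.0)]),
           State("stop")]
```

Calling `step(machine, 0, random.Random(0))` executes the exploration behavior and moves to the second state, which has no outgoing transitions and therefore stays active.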

With AutoMoDe-Chocolate [13], Francesca et al. implemented a variant of Vanilla that differs only in the optimization algorithm employed: Chocolate uses Iterated F-race [3] instead of F-race. The results of their experiments show that Chocolate performs significantly better than Vanilla on many missions. Given that the only difference between the two methods is the optimization algorithm, it seems apparent that the optimization algorithm is an important part of the automatic modular design approach and can have a great influence on the performance of the generated control software. Following up on this observation, we create IcePop, another instance of AutoMoDe. It is functionally similar to Chocolate and Vanilla, but it uses simulated annealing as its optimization algorithm. We choose simulated annealing because it is a well-studied algorithm [6, 19, 26, 29, 32] that has found many applications (for surveys see, for example, [1] and [32]).

Simulated annealing is a metaheuristic inspired by the thermodynamic process of annealing [23]. At higher temperatures, the particles in a crystal are more excited and can move more freely than at lower temperatures. Similarly, the simulated annealing algorithm has a “temperature” parameter. When the temperature is high, the algorithm has a chance to accept worsening solutions, mimicking the free movement of the particles. At lower temperatures, the algorithm is less likely to accept worsening solutions, which constrains the movement of the solution candidate. Simulated annealing has properties that are desirable for the automatic design of control software. It has been shown to traverse the search space effectively and to converge quickly towards promising solutions [22], which allows an efficient use of the allocated budget. Furthermore, simulated annealing contains mechanisms to escape local optima, for example by accepting worsening moves at higher temperatures. Without any a priori knowledge of the shape of the search space, this is an important property, as it reduces the risk of premature convergence to suboptimal solutions.

The rest of this paper is structured as follows. In Sect. 2 we present the experimental setup that we used: the robotic platform, the design methods and the experimental protocol. In Sect. 3 we present four experiments and their results. In Sect. 4 we summarize our findings and give an outlook on future work.

2 Experimental Setup

In this section we describe the experimental setup and protocol that was used to obtain the results described in Sect. 3.

2.1 Robotic Platform

IcePop designs control software for a swarm of modified e-puck robots [16, 30]. The e-puck robots are equipped with two wheels, whose velocity can be adjusted independently, three ground sensors that can perceive the greyscale color value of the floor, and eight IR transceivers, spaced equally around the robot, that can perceive proximity and light values. The robot is also equipped with a range-and-bearing board [18] that comprises twelve IR emitters and twelve receivers, equally distributed along the perimeter of the board and pointing radially outwards on the horizontal plane. The range-and-bearing board allows the e-puck to send and receive messages within a range of 0.7 m. In order to abstract from the actual sensor configuration, we use a reference model [20]. Specifically, we use RM1.1 (see Table 1), the reference model that was used to define the modules of Chocolate.

In this reference model, each robot has eight light and proximity sensors that return floating point values between 0 and 1. \(prox_i\) and \(light_i\) denote the proximity and light readings of the \(i\)-th sensor, respectively. Three ground sensors (\(ground_i\)) return one of three values, indicating whether the ground underneath them is black, gray or white. The reference model uses the range-and-bearing board to count the number of neighbors in communication range (n) and to calculate an attraction vector (\(V_d\)) towards the center of mass of all perceived robots. Additionally, the robot has two wheels, whose velocity can be adjusted independently (\(v_l\) and \(v_r\) for the velocity of the left wheel and the right wheel, respectively).

2.2 Automatic Design Methods

We compare two automatic modular design methods: Chocolate and IcePop. Chocolate [13] generates probabilistic finite-state machines with up to four states. To this end, it uses a set of six behaviors and six conditions that are defined on top of RM1.1 [20]. The six behaviors are: exploration, stop, phototaxis, anti-phototaxis, attraction and repulsion. The six conditions are: black-floor, gray-floor, white-floor, neighbor-count, inverted-neighbor-count and fixed-probability. For a detailed description of the modules, we refer the reader to their original definition [14]. The optimization algorithm used by Chocolate is Iterated F-race [27].

In this paper, we propose IcePop. It is based on Chocolate, as it uses the same modules and target architecture. The difference between the two methods is that IcePop adopts the component-based simulated annealing algorithm (see Algorithm 1) as the optimization algorithm. Franzin and Stützle proposed this component-based algorithm in an effort to unify many variants of the simulated annealing algorithm [15]. We choose to adopt this algorithm because it provides the flexibility to easily change components should the need arise.
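To illustrate what a component-based formulation offers, the sketch below passes the exploration criterion, the acceptance criterion and the cooling scheme to a generic loop as functions, so that any of them can be swapped without touching the loop. It is a minimal sketch and not a reproduction of Algorithm 1 or of the IcePop code: the function name `anneal`, its parameters, and the convention that the initial evaluation counts towards the budget are assumptions.

```python
import random


def anneal(initial, explore, evaluate, accept, cool, t0, budget, cost_per_eval, rng):
    """Generic annealing loop with pluggable components (quality is maximized).

    explore(solution, rng)          -> candidate solution
    evaluate(solution)              -> estimated quality
    accept(e_cur, e_new, temp, rng) -> True if the candidate replaces the current one
    cool(temp)                      -> temperature for the next step
    """
    current, e_current = initial, evaluate(initial)
    used = cost_per_eval                      # assumption: initial evaluation is paid for
    temperature = t0
    while used + cost_per_eval <= budget:     # stopping criterion: simulation budget
        candidate = explore(current, rng)     # exploration criterion
        e_candidate = evaluate(candidate)
        used += cost_per_eval
        if accept(e_current, e_candidate, temperature, rng):  # acceptance criterion
            current, e_current = candidate, e_candidate
        temperature = cool(temperature)       # cooling scheme, applied every step
    return current, e_current                 # final solution, as in the text


# Toy usage: maximize x over the integers with a greedy acceptance component.
best, quality = anneal(
    initial=0,
    explore=lambda x, rng: x + 1,
    evaluate=lambda x: x,
    accept=lambda e, e_new, t, rng: e_new >= e,
    cool=lambda t: 0.9782 * t,
    t0=125.0, budget=10, cost_per_eval=1, rng=random.Random(0),
)
```

In the toy run, a budget of 10 evaluations leaves 9 steps after the initial evaluation, each of which is accepted. Replacing the `accept` argument is all that is needed to try a different acceptance criterion.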

Table 1. Reference model RM1.1 [20]. Sensors and actuators of the e-puck robot. The period of the control cycle is 100 ms.
[Algorithm 1: component-based simulated annealing]
Table 2. Configuration of the simulated annealing algorithm.

The component-based simulated annealing algorithm contains placeholders for commonly used components. In Table 2, we present the components that we chose for the implementation of simulated annealing in IcePop. The initial solution supplied to the algorithm is a minimal valid instance of control software. In our case, this is a finite-state machine with exactly one state that executes the stop behavior. The neighborhood function is implicitly defined through the application of a random valid perturbation operator. In IcePop, we have defined eleven perturbation operators: adding a state, removing a state, adding a transition, removing a transition, changing the initial state, changing the starting point of a transition, changing the end point of a transition, changing the behavior associated with a state, changing the condition associated with a transition, changing the parameters of a behavior, and changing the parameters of a condition. The initial temperature is set to 125.0. The stopping criterion is defined as a maximum budget of simulation runs. That is, after the allocated budget of simulation runs is exhausted, the algorithm returns the final instance of control software. The exploration criterion selects a random valid perturbation operator and applies it to the incumbent solution. The acceptance criterion is the Metropolis condition [23, 28], which accepts or rejects new solutions based on their performance. The Metropolis condition always accepts an improving solution, and accepts a worsening solution with probability \(\exp (-(e - e')/T)\), where T is the current temperature, e is the quality of the currently best solution and \(e'\) is the quality of the perturbed solution. Because the performance of each instance of control software is stochastic, e and \(e'\) are computed as the mean of a sample of 10 runs of the respective instance of control software.
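The Metropolis condition for a maximized objective can be written in a few lines. In the sketch below, the function names and the `run_simulation` callback are hypothetical, and `e_current` stands for the quality e of the solution that the perturbed solution is compared against.

```python
import math
import random
from statistics import mean


def metropolis_accept(e_current, e_candidate, temperature, rng):
    """Return True if the perturbed solution should be accepted.

    Quality is maximized, so a candidate is worsening when e_candidate < e_current.
    """
    if e_candidate >= e_current:
        return True                                   # improving (or equal): accept
    # Worsening: accept with probability exp(-(e - e') / T).
    return rng.random() < math.exp(-(e_current - e_candidate) / temperature)


def estimate_quality(run_simulation, solution, sample_size=10):
    """Mean score over sample_size stochastic simulation runs of one solution."""
    return mean(run_simulation(solution) for _ in range(sample_size))
```

At the initial temperature of 125.0, a candidate that is worse by 125 is accepted with probability exp(-1), about 0.37, whereas at a temperature close to zero the same candidate is practically never accepted.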
The temperature length determines the number of steps between two consecutive temperature updates. We set the value to 1, so that cooling happens in every step. The cooling scheme that is then applied is geometric cooling [23]. In geometric cooling, the updated temperature is computed as \(T \cdot \alpha \), where T is the current temperature and \(\alpha \) is the cooling coefficient, which we set to \(\alpha = 0.9782\). Additionally, the temperature resets to the initial value every 5000 simulations.
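The resulting temperature schedule can be made concrete with a short calculation. The sketch below assumes that every step of the algorithm consumes exactly 10 simulations (one sample for the perturbed solution), so that 5000 simulations correspond to 500 cooling steps; the function name and parameters are hypothetical.

```python
def temperature_at(simulations_used, t0=125.0, alpha=0.9782,
                   sample_size=10, restart_every=5000):
    """Temperature after a given number of simulation runs.

    Assumes one cooling step per sample_size simulations and a reset to t0
    whenever a multiple of restart_every simulations has been consumed.
    """
    since_restart = simulations_used % restart_every   # position within the cycle
    steps = since_restart // sample_size               # cooling steps since reset
    return t0 * alpha ** steps


# Temperature at the start of a cycle and just before the next reset.
start = temperature_at(0)
end = temperature_at(4990)
```

Under these assumptions, the temperature falls from 125.0 to roughly 0.002 within one cycle of 5000 simulations, so the search is close to greedy by the time the temperature is reset.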

The source code of our implementation of IcePop is available at: https://github.com/keua/design-of-robot-swarms-by-optimization

2.3 Missions

All experiments were conducted with 20 robots on two missions: Aggregation with Ambient Cues (AAC) and Foraging.

AAC. The arena contains two circles, one black and one white. A light source is placed on the side of the arena that contains the black circle (Fig. 1, left). The robots are tasked to aggregate on the black circle. The objective function is \(F_{\textsc {AAC}} = \sum _{t=0}^T N_t\), where \(N_t\) is the number of robots on the black circle at time step \(t\).

Fig. 1. The two missions: AAC (left) and Foraging (right).

Foraging. The arena contains two source areas in the form of black circles and a nest in the form of a white area. A light source is placed behind the nest to help the robots navigate (Fig. 1, right). As the robots have no gripping capabilities, we consider an idealized version of foraging, in which a robot is deemed to have retrieved an object when it enters a source and then the nest. The goal of the swarm is to retrieve as many objects as possible. The objective function is \(F_{\textsc {f}} = N_i\), where \(N_i\) is the number of retrieved objects.
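Both objective functions are simple to compute from a log of a simulation run. The following sketch assumes a hypothetical log format: for AAC, a list with the number of robots on the black circle at each time step; for Foraging, one sequence per robot with the floor color under it at each time step. The function names are illustrative.

```python
def aac_score(robots_on_black):
    """F_AAC: sum over all time steps of the number of robots on the black circle."""
    return sum(robots_on_black)


def foraging_score(ground_traces):
    """F_F: number of objects retrieved by the whole swarm.

    ground_traces holds one sequence per robot with the floor color under it
    at each time step: "black" (source), "white" (nest) or "gray" (elsewhere).
    """
    retrieved = 0
    for trace in ground_traces:
        carrying = False
        for color in trace:
            if color == "black":
                carrying = True        # entering a source picks up an object
            elif color == "white" and carrying:
                retrieved += 1         # entering the nest afterwards delivers it
                carrying = False
    return retrieved
```

A robot that visits the nest without having entered a source first does not score, which matches the idealized definition of retrieval given above.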

2.4 Protocol

As each design process is stochastic, we run 20 independent designs for each design method, resulting in 20 instances of control software. Each of the resulting instances is then assessed 10 times in the design context (which we call simulation) and another 10 times in a different simulation setting (which we call pseudo-reality). Pseudo-reality is a concept for evaluating the transferability of control software [25]. Instead of assessing the performance directly in reality, a different simulation context is used. Research has shown that control software that transfers well into reality also transfers well into pseudo-reality, while control software that transfers badly into reality also transfers badly into pseudo-reality.

The results are presented in notched box-and-whisker plots, which give a visual representation of the samples. In such a plot, the thick horizontal line denotes the median of the sample. The lower and upper sides of the box are called the lower and upper hinges and represent the 25th and 75th percentiles of the observations, respectively. The upper whisker extends either up to the largest observation or up to 1.5 times the difference between upper hinge and median, whichever is smaller. The lower whisker is defined analogously. Small circles represent outliers (if any), that is, observations that fall beyond the whiskers. Notches extend to \(\pm 1.58\,\mathrm{IQR}/\sqrt{n}\) around the median, where IQR is the interquartile range and n = 20 is the number of observations. Notches indicate the 95% confidence interval on the position of the median. If the notches of two boxes do not overlap, the observed difference between the respective medians is significant [7].
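The notch test described above can be reproduced numerically. The sketch below uses the inclusive quartiles from Python's `statistics` module as a stand-in for the hinges, which is an assumption: plotting tools may compute hinges slightly differently. The function names are hypothetical.

```python
from math import sqrt
from statistics import median, quantiles


def notch_interval(sample):
    """Return the (lower, upper) notch limits around the median of a sample."""
    q1, _, q3 = quantiles(sample, n=4, method="inclusive")  # 25th and 75th percentile
    iqr = q3 - q1                                            # interquartile range
    half_width = 1.58 * iqr / sqrt(len(sample))
    m = median(sample)
    return m - half_width, m + half_width


def notches_overlap(sample_a, sample_b):
    """True if the notch intervals of the two samples share at least one point."""
    lo_a, hi_a = notch_interval(sample_a)
    lo_b, hi_b = notch_interval(sample_b)
    return lo_a <= hi_b and lo_b <= hi_a
```

If `notches_overlap` returns False for the scores of two design methods, the difference between their medians would be read as significant in the sense used here.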

3 Results

In this section we describe four experiments we conducted and the results we obtained. The instances of control software produced, the details of their performances, and videos of their execution on the robots are available as online supplementary material [24]. We also discuss possible reasons for the results.

3.1 Influence of the Budget

We conduct a first experiment to investigate the influence of the budget on the performance of the generated control software. Designs with a smaller budget need less time to finish but usually produce control software that performs less well in simulation. The larger the budget, the better the performance in simulation usually is, but an overdesigning effect might be observed, where the improvement in simulation does not carry over to reality. We tested IcePop with five different budgets (5000, 10000, 25000, 50000 and 100000 simulations).

Fig. 2. Performance of control software created by IcePop for different budgets.

The results displayed in Fig. 2 show the influence of the budget on the performance of the control software generated by IcePop. One trend that is apparent from the data is that, as expected, a larger design budget leads to control software that performs better in simulation. However, the relative improvement diminishes, and the performance seems to reach a peak around a budget of 50000 simulations.

Furthermore, the performance in pseudo-reality initially improves with an increased budget. Here, however, the performance levels off once a budget of 25000 simulations is reached and does not improve any further. This could be an indicator that the design has reached the peak performance that is still transferable. Larger budgets might improve the performance in simulation, but the transferability would suffer in return.

3.2 Influence of the Sample Size

We chose the Metropolis condition as the acceptance criterion in the component-based simulated annealing for IcePop. In its original formulation, it compares two single performance values. As the evaluation of the performance of an instance of control software is stochastic, we sample several simulation runs. The mean of this sample is then supplied to the Metropolis condition.

In a second experiment, we investigate the influence of the sample size on the performance of the generated control software. Smaller sample sizes use less of the budget to evaluate one solution, allowing more solution candidates to be investigated. On the other hand, outliers have a greater impact on the mean of the sample and thus on the perceived performance. Larger sample sizes have the inverse effect: fewer solution candidates are investigated in total, but the estimated performance of each individual solution candidate is more robust to outliers. We study the influence of the sample size by evaluating the performance in simulation and in pseudo-reality for three sample sizes: 5, 10, and 15. Additionally, we test every variant on the three budgets that showed peak performance in the previous experiment (25000, 50000, and 100000 simulations).
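The trade-off can be quantified under two simplifying assumptions: every solution candidate costs exactly one sample of simulations, and the runs within a sample are independent, so that the standard error of the sample mean shrinks with the square root of the sample size. The function names below are hypothetical.

```python
from math import sqrt


def candidates_evaluated(budget, sample_size):
    """Number of solution candidates that fit into the simulation budget."""
    return budget // sample_size


def relative_standard_error(sample_size):
    """Standard error of the sample mean relative to a single run."""
    return 1.0 / sqrt(sample_size)


# Candidates per (budget, sample size) pair studied in this experiment.
table = {
    (budget, n): candidates_evaluated(budget, n)
    for budget in (25000, 50000, 100000)
    for n in (5, 10, 15)
}
```

For a budget of 25000 simulations, for instance, a sample size of 5 allows 5000 candidates to be evaluated, while a sample size of 15 allows only 1666, each with a noise level reduced by a factor of about 1.7 relative to the sample size of 5.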

Fig. 3. Influence of the sample size.

Fig. 4. Influence of the restart mechanism.

Fig. 5. Comparison between Chocolate and IcePop.

Figure 3 shows the results for the three sample sizes over the three investigated budgets. For a budget of 25000 simulations, all variants perform similarly and no differences can be seen, either in simulation or in pseudo-reality. For a budget of 50000 simulations, the variant with a sample size of 10 performs slightly better than the other two variants in the mission Foraging when assessed in simulation. In pseudo-reality, however, this difference is no longer present. It could therefore very well be that this is simply a statistical artifact of the stochastic design process. For 100000 simulation runs, the three variants again achieve comparable performance and only minor differences can be observed. All in all, the three sample sizes that we compared show no noticeable differences.

3.3 Influence of the Restarting Mechanism

We conduct a third experiment to investigate the influence of the restarting mechanism. Restarting resets the temperature to a higher value, allowing the design process to make bigger movements in the search space again. We investigate four different restarting mechanisms: fixed length (the temperature restarts after a fixed number of simulations, in this case every 5000 simulations), no restart (the temperature cools over the whole design process and is never restarted), reheat (the temperature is reset every 5000 simulations, and the new temperature is set to the one that generated the biggest improvement so far), and restart once (the temperature resets after half of the budget is exhausted). We test all restarting mechanisms on budgets of 25000, 50000 and 100000 simulations.
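The four variants differ only in how the temperature is updated after each step, which the following sketch makes explicit. It rests on several assumptions: the mode names and the function signature are hypothetical, "restart once" is taken to reset to the initial temperature, the temperature of the biggest improvement is tracked elsewhere and passed in, and the simulation counter is assumed to hit multiples of 5000 and half of the budget exactly (as it does with 10 simulations per step).

```python
def next_temperature(mode, temperature, sims_used, budget, reheat_temperature,
                     t0=125.0, alpha=0.9782, period=5000):
    """Temperature for the next step under one of the four restart variants.

    sims_used counts the simulations consumed so far, including the step that
    just finished. reheat_temperature is the temperature at which the biggest
    improvement so far was observed; only the "reheat" variant uses it.
    """
    if mode == "fixed_length" and sims_used % period == 0:
        return t0                      # periodic reset to the initial temperature
    if mode == "reheat" and sims_used % period == 0:
        return reheat_temperature      # periodic reset to the most productive one
    if mode == "restart_once" and sims_used == budget // 2:
        return t0                      # single reset at half of the budget
    return temperature * alpha         # "no_restart" and all ordinary steps
```

All variants fall through to the same geometric cooling step, so the only thing that changes between them is when, and to which value, the temperature is raised again.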

Figure 4 shows the results for the different restarting mechanisms. The results for a budget of 25000 simulation runs show no difference between the four variants. For a budget of 50000 simulation runs, all variants perform similarly in the mission AAC. In the mission Foraging, the restarting mechanism that restarts every 5000 simulation runs performs worse than the other three variants. For a budget of 100000 simulation runs, all four variants again perform similarly. In the mission Foraging, however, the results of the fixed length restarting mechanism (the default) show a wider spread than those of the other three variants.

In conclusion, the four different variants fail to produce noticeable differences in the performance of the generated control software.

3.4 Comparison with Chocolate

In the last experiment, we compare the performance of IcePop with Chocolate across three different budgets (25000, 50000 and 100000 simulations).

Figure 5 shows the results of the comparison between IcePop and Chocolate for budgets of 25000, 50000, and 100000 simulations. Across all three budgets, IcePop performs better in simulation than Chocolate in both missions. In the mission AAC, the difference in performance is statistically significant.

Unfortunately, the drop in performance when assessed in pseudo-reality is slightly larger for IcePop than for Chocolate. This could indicate that control software generated by IcePop might transfer less well to real robots than control software generated by Chocolate. Despite the larger performance drop, IcePop still performs better in pseudo-reality, and in AAC this difference in performance is also statistically significant.

Additionally, we have taken the best-performing instance of control software of IcePop and of Chocolate (with a design budget of 100000 simulations) for each mission and directly applied it to a swarm of twenty real e-pucks. Videos of the performance of the control software on real robots can be found online [24].

4 Conclusions

In this work we have investigated a default configuration for simulated annealing in the context of automatic modular design. The results indicate that simulated annealing can be a viable candidate for the automatic modular design of robot swarms. Additionally, we have investigated the influence of some obvious variations to the simulated annealing on the performance of the automatic modular design. The component-based simulated annealing approach allowed us to easily implement these variants.

Simulated annealing is a well-studied optimization algorithm with many proposed extensions, improvements and variants. A next step could be to find a configuration of components that best satisfies the demands of automatic modular design. It would also be interesting to apply IcePop to a broader range of missions.