1 Introduction

Swarm robotics is a particular type of multi-robot system in which each robot has its own controller, perception is local, and communication is based on spatial proximity (Dorigo & Şahin, 2004). The swarm’s designer operates at the individual level by providing each robot with the mechanisms that generate its behaviour. The group-level or swarm response emerges, through a self-organisation process, from the interactions between the robots and their social and physical environment. Due to the distributed and stochastic nature of this self-organisation process, it is notoriously difficult for the designer to predict which set of individual actions leads to the emergence of the desired collective response (Brambilla et al., 2013). Different approaches have been developed and are commonly used to overcome this design problem (Hamann, 2018). One of these approaches is Evolutionary Robotics (ER), which concerns the use of evolutionary computation techniques to synthesise artificial neural network controllers (Nolfi & Floreano, 2000). ER, through a process that mimics natural evolution, automatically generates progressively better group strategies for tasks requiring collaboration among the robots (Trianni & Nolfi, 2011). The mechanisms underpinning the individual actions are automatically assembled and selected by using a behaviour-based (instead of a mechanism-based) evaluation metric. Consequently, they are less exposed to a designer’s a priori assumptions about which operational principles each robot has to follow to contribute to the group-level response (Vargas et al., 2014). The ER approach has already been successfully used to generate different collective behaviours in robotic swarms, such as self-organising synchronisation (Trianni & Nolfi, 2009), self-assembly (Tuci et al., 2008), and cooperative object transport (Alkilabi et al., 2017).
The objective of this study is to show that the ER approach can be successfully used to design mechanisms for a swarm of robots engaged in a particular type of collective decision-making problem.

Collective decision-making refers to a decision problem in which natural or artificial agents collectively make a choice among two or more alternative options in a way that, when the decision is made, it is no longer attributable to any single individual agent (Valentini et al., 2017). In nature, collective decision-making processes can be observed in social insects, which collectively choose foraging or nesting sites without any agent knowing the quality of all available options (Britton et al., 2002). Similar processes have been observed and studied in social sciences (Lim & Chan, 2016), in statistical physics (Bialek et al., 2012; Vicsek et al., 1995; Cavagna et al., 2018), in behavioural economics (Bose et al., 2017), and in swarm robotics (Strobel et al., 2020; De Masi et al., 2021a). In all these examples, the collective decision emerges from a non-linear feedback process generated by local interactions among the agents (Camazine et al., 2001; Halloy et al., 2007).

Swarm robotics studies have mainly investigated collective decision-making processes in systems in which the opinion selection at the individual level is based on specific hand-coded mechanisms. That is, the robots update their opinion following the rules of the voter model (Valentini et al., 2014, 2016; Scheidler et al., 2016), or of the majority model (De Masi et al., 2020, 2021b), or of alternative models generated by slightly varying the implementation details of these two models (Divband Soorati et al., 2019; Talamali et al., 2021). The rules that robots use to update their opinion are generally integrated into a finite state machine type of controller, in which the different states form the behavioural repertoire needed by the robots to react to physical and social input. Besides the advantage of being completely interpretable, finite state machine controllers have been demonstrated to be effective in supporting the collective decision-making process in a variety of swarm robotics scenarios (Valentini et al., 2015, 2016; Scheidler et al., 2016). However, in these studies, strong assumptions have to be made by the designer concerning the way in which the robots deal with and respond to sensory stimulation. For example, in scenarios characterised by multiple environmental cues (e.g., physical and social cues), the designer has to choose how to combine the cues and/or which type of cue to prioritise in case of contradictory information signalling the options’ quality. With respect to this complex choice, hand-coding the opinion selection mechanisms may limit the robots’ ability to exploit subtle environmental structures and regularities that could support the development of alternative and potentially more effective solutions.
The experimental work in swarm robotics has already pointed to some of the weaknesses of the most frequently used hand-coded opinion selection mechanisms, especially those concerning the adaptability of these mechanisms to cope with dynamic environments where the best option varies over time (Talamali et al., 2021; Divband Soorati et al., 2019; De Masi & Ferrante, 2020; Prasetyo et al., 2019).

Following a research direction initiated by Almansoori et al. (2021), in this paper we provide new data and a deeper evaluation in support of an alternative design process, in which the ER approach is employed to generate individual decision-making mechanisms unbiased by the designer’s assumptions on how each robot should form its opinion concerning the most favourable option. We test the ER approach on a collective perceptual discrimination task originally described in Valentini et al. (2016). In particular, we illustrate a design method in which we replace the hand-coded parts of the robot controller (i.e., the finite state machine) in charge of the opinion selection with an artificial neural network synthesised using evolutionary computation techniques. The new design process comes with other important modifications to the robot’s behavioural repertoire, which loses the classic structural organisation in modules/states based on the distinction between exploration and dissemination behaviour. With a series of comparative evaluations, we provide clear evidence that swarms in which robots make decisions using neural mechanisms synthesised using artificial evolution outperform swarms in which robots make decisions using classic hand-coded opinion selection mechanisms on metrics related to the robustness, adaptability, and scalability of the group response. We also port the best evolved neural model onto physical e-puck2 robots to ecologically validate the neural network-based decision-making mechanisms on a physical robotic system. The results of our original study contribute to strengthening the significance of the ER approach for the design of progressively more resilient swarm robotics systems capable of operating in complex natural environments.

2 State of the art

The swarm robotics literature on collective decision-making is already relatively vast. A comprehensive review of this research area can be found in Valentini (2017). In this section, we review papers in swarm robotics exclusively focussed on collective perceptual discrimination tasks, considered here as instances of the larger class of collective decision-making problems.

To the best of our knowledge, the problem of collective perceptual discrimination in swarm robotics has been studied with reference to a single specific scenario where the options concern the colour of the arena floor. The swarm is generally required to reach a consensus on which colour, between black and white, covers the majority of the arena floor. This scenario was originally investigated by Valentini et al. (2016), who tested the effectiveness of three different hand-coded mechanisms for option selection (i.e., a weighted voter model, the majority model, and direct comparison) both with simulated and physical robots.

Subsequent studies have exploited this scenario not only to test the effectiveness of alternative forms of opinion formation mechanisms, but also to explore different issues (e.g., security) related to the process of collective decision-making. Almost all of the works we briefly review below have explored the design of decision-making mechanisms using different forms of hand-coded control algorithms. For example, Strobel et al. (2018) replicated the study described by Valentini et al. (2016) to test the effectiveness of a blockchain-based smart contract to protect the collective decision-making process from agents (i.e., Byzantine robots) acting in order to disrupt the decision process. The blockchain-based layer operates on top of the decision-making mechanisms based on the majority rule. The authors demonstrated the capability of the blockchain-based approach to establish secure swarm coordination mechanisms and to identify and exclude Byzantine swarm members.

In Ebert et al. (2018), the collective perception scenario has been transformed into a multi-feature collective decision-making problem, in which simulated and physical robots are used to prove the effectiveness of a new algorithm designed to achieve consensus in a finite time. The proposed algorithm is based on a modified version of the majority model presented in Valentini et al. (2016) and is supported by the concepts of concentration and of lock-in of the final decision, which are inspired by quorum sensing in natural collective systems. The authors show the efficacy of the proposed algorithm in correctly classifying the features even when presented with an initially unequal distribution of opinions within the swarm.

Ebert et al. (2020) introduced a distributed Bayesian algorithm for robot swarms to classify a spatially distributed feature of the collective perception scenario. Each robot operates as a Bayesian estimator while sharing and integrating the information of neighbouring robots. The authors show that a well-balanced combination of prior and decision thresholds allows faster decisions at a small cost of accuracy and that making fewer, less-correlated observations can increase decision-making accuracy.

Bartashevich and Mostaghim (2019) demonstrated that, in the collective perceptual discrimination scenario, the distribution and clustering levels of environmental features indicate the actual task difficulty more than the quantity ratio of those features. To evaluate the generalizability of existing collective decision-making strategies, the authors proposed nine different visual patterns containing specific structural information derived from the studies on matrix visualisation. New metrics for measuring cluster density in patterns and estimating cluster connectivity are proposed to evaluate the provided benchmarks.

Bartashevich and Mostaghim (2021) investigated eight belief combination operators from the evidence theory (i.e., a flexible mathematical framework for dealing with imperfect and uncertain information) to test the robustness of fusing environment and agent data against spatial correlations in an unknown environment. While the authors demonstrated via simulation that at least one of the tested fusion operators led the robotic swarms to reach consensus on all considered benchmarks effectively, they found that this strategy slows down the convergence time and does not apply to swarms larger than 20 robots.

Divband Soorati et al. (2019) and Pfister and Hamann (2022) have examined a dynamic version of the collective perceptual discrimination scenario in which the two alternative options undergo gradual qualitative changes. Divband Soorati et al. (2019) tested the proposed method to deal with dynamic environments using simulation and physical robots. If the ratio of the two available features is high (1:3), more than two-thirds of the swarm members can adapt to the environmental changes. Pfister and Hamann (2022) extended the Bayesian approach originally proposed in Ebert et al. (2020) to a dynamic version of the collective perceptual discrimination scenario.

The only research works that investigated the collective perceptual discrimination scenario using non-hand-coded decision-making mechanisms are the work by Morlino et al. (2012) and the work by Kaiser et al. (2023). However, in Morlino et al. (2012), the experimental setup refers to a slightly modified instance of the collective perceptual discrimination scenario, in which the main goal of the swarm is not the achievement of a group consensus. It rather concerns a slightly different problem related to mapping the variation of the proportion of black on the arena floor into variations of the frequency of sound signals that the robots have to emit in a synchronised way. The robots’ task is characterised by the fact that communication is global (i.e., a sound signal can be perceived in any location in the arena). In the work by Kaiser et al. (2023), the Evolutionary Robotics approach is used to design a specific neural module which is integrated into a hand-coded finite state machine controller. The module generates the robots’ opinion, within a control structure and within a task organisation that remain largely similar to the ones originally proposed by Valentini et al. (2016). For example, in Kaiser et al. (2023), features such as the distinction between the exploration and dissemination phases as well as the relationship between the time a single robot spends in the dissemination state and the quality of the currently selected option are kept as in Valentini et al. (2016).

Thus, we claim that the work illustrated in this paper is the first study demonstrating that the ER approach can be successfully used to synthesise the robot’s decision-making mechanisms in a collective perceptual discrimination task in which the group consensus has to be reached with partial individual knowledge and communication limited to a single spatially proximal neighbour.

3 Methods

This section describes the task, the simulation environment, the robots’ controller, and the evolutionary algorithm used to synthesise it.

3.1 The task and the simulation model

We study a collective decision-making scenario originally described by Valentini et al. (2016), where a swarm of 20 simulated robots explores a closed arena of \(2\times 2\) m. The arena floor is made of square \(10\times 10\) cm black and white tiles, which are randomly distributed on the arena floor (see Fig. 1a). During the design phase, the swarm experiences two types of environment: the black-dominant environment in which 55% of the tiles are black and 45% are white, and the white-dominant environment in which 55% of the tiles are white and 45% are black. As in Valentini et al. (2016), the task of the swarm is to reach a consensus on the best quality option, which corresponds to the colour that covers the largest portion of the arena floor (i.e., the white colour in white-dominant environments, and the black colour in the black-dominant environments).

Our simulation models the hardware specifications of the e-puck2 robot (see Fig. 1b), a 70 mm diameter, wheeled, cylindrical robot equipped with a variety of sensors and whose mobility is provided by a differential drive system (see Mondada et al., 2009, for details). Our simulated e-pucks are equipped with infrared sensors positioned all around the robot’s body and a floor sensor positioned beneath the robot chassis. The infrared sensors return a signal whose intensity depends on the robot’s distance to obstacles. In this scenario, obstacles can be other robots or the arena walls. When the centre of the robot base is on a black tile, the reading of the floor sensor is 0. When it is on a white tile, the reading is 1. For communication, we simply assume that whenever two robots are less than 50 cm from each other, a 1-bit signal transmitted by one agent is received by the spatially proximal agent. Each robot signals its opinion for the entire duration of its life. The signal sender communicates its current opinion, which can be either black for black-dominant environments (the signal sent is 0) or white for white-dominant environments (the signal sent is 1). This type of communication can be reliably implemented on the physical e-puck2 with the range & bearing board. Concerning the function that updates the position of the robots within the environment, we employed the differential drive kinematics equations, as illustrated by Dudek and Jenkin (2000). To compensate for the simulation-reality gap, uniform noise is added to all sensor readings, motor outputs, and the positions of the robots (see Ligot & Birattari, 2020, for a similar approach).
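The differential drive kinematics used to update the robots’ positions can be sketched as follows (a minimal illustration; the function name, the axle-width parameter, and the single-step Euler form are our own assumptions, not the paper’s code):

```python
import math

def update_pose(x, y, theta, v_left, v_right, axle, dt):
    """Advance the pose (x, y, theta) of a differential-drive robot by one
    step of length dt, given the left and right wheel speeds (standard
    kinematics, as in Dudek & Jenkin, 2000)."""
    v = 0.5 * (v_left + v_right)       # linear speed of the chassis centre
    omega = (v_right - v_left) / axle  # angular speed (axle = wheel distance)
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta = (theta + omega * dt) % (2.0 * math.pi)
    return x, y, theta
```

Equal wheel speeds yield pure translation, opposite wheel speeds yield rotation in place, matching the straight-then-turn motion pattern described below.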

Fig. 1

a The simulated arena with the robots engaged in the perceptual discrimination task. b Image of a physical e-puck2 robot. c The architecture of the dynamic neural network that underpins the opinion selection in each robot

3.2 The controllers

In order to reduce the complexity of the control design process, we opted for a modular approach in which a hand-coded algorithm allows each robot to develop a pseudo-random walk within the arena while avoiding obstacles. A dynamic neural network, synthesised using evolutionary computation techniques, is responsible for the selection of each robot’s opinion. The robot movements correspond to an isotropic random walk based on straight motion and random rotation. The robots move straight for 5s at a speed of 20 \({\textrm{cm}}/{\textrm{s}}\), and turn with turning angles taken from a wrapped Cauchy distribution (Kato & Jones, 2013). The probability density function is the following:

$$\begin{aligned} f_{\omega }(\theta , \mu , \rho )=\frac{1}{2 \pi } \frac{1-\rho ^{2}}{1+\rho ^{2}-2 \rho \cos (\theta -\mu )}, \quad 0<\rho <1, \end{aligned}$$
(1)

where \(\mu\) is the mean direction of the distribution and \(\rho\) its concentration parameter. With \(\rho = 0\), the wrapped Cauchy distribution becomes uniform and there is no correlation between the movement directions before and after a turn. With \(\rho = 1\), we have a Dirac distribution and the robot follows a straight line. Here, we take \(\rho =0.5\). During this behaviour, when the proximity sensors detect an obstacle (the wall or other robots), the robot stops and turns by an angle chosen uniformly in the interval \([-\pi ,\pi ]\). After turning, if there is no obstruction ahead, the robot resumes its random walk; otherwise, it repeats the manoeuvre.
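Sampling a turning angle from the wrapped Cauchy distribution in Eq. (1) can be sketched as follows (a minimal illustration; we exploit the fact that a wrapped Cauchy variate can be obtained by wrapping an ordinary Cauchy variate with scale \(\gamma = -\ln \rho\) onto the circle; this sampling route is our choice, not necessarily the authors’ implementation):

```python
import math
import random

def wrapped_cauchy_angle(mu=0.0, rho=0.5, rng=random):
    """Draw a turning angle from a wrapped Cauchy distribution with mean
    direction mu and concentration rho (0 < rho < 1), by wrapping an
    ordinary Cauchy variate with scale gamma = -ln(rho) onto the circle."""
    gamma = -math.log(rho)  # rho = exp(-gamma)
    x = mu + gamma * math.tan(math.pi * (rng.random() - 0.5))
    return (x + math.pi) % (2.0 * math.pi) - math.pi  # wrap to [-pi, pi)
```

As a sanity check, for a wrapped Cauchy distribution \(E[\cos (\theta -\mu )]=\rho\), so samples drawn with \(\rho =0.5\) should have a mean cosine close to 0.5.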

The robot’s neuro-controller, in charge of the opinion selection, is a continuous time recurrent neural network (CTRNN) (Beer, 1995) with a multi-layer topology, as shown in Fig. 1c. The input neurons \(N_{I,1}\) and \(N_{I,2}\) take the readings from the robot’s floor sensor and the communication signal, if any (1 for white-dominant, 0 for black-dominant, and 0.5 whenever there are no other robots at less than 50 cm from the robot receiver). The output neuron \(N_{O,1}\) is used to set the robot’s opinion, and the hidden neurons \(N_{H,1}\) and \(N_{H,2}\) form a fully recurrent continuous time hidden layer. The input neurons are simple relay units, while the activation of the output neuron (O) is governed by the following equation:

$$\begin{aligned} O = \sigma \left( \left( \sum _{i=1}^2 W^{O}_{i}\; \sigma (H_i+\beta ^{H}_i)\right) + \beta ^{O}\right) , \end{aligned}$$
(2)

with \(\sigma (z) =(1+e^{-z})^{-1}\). Using terms derived from an analogy with real neurons, O and \(H_i\) are the cell potentials of the output neuron and of hidden neuron i, respectively, \(\beta ^{O}\) and \(\beta ^{H}_i\) are bias terms, and \(W^{O}_{i}\) is the strength of the synaptic connection from hidden neuron i to the output neuron. The hidden units are governed by the following equation:

$$\begin{aligned} \tau _{j}{\dot{H}}_j = -H_{j} + \sum _{i=1}^{2} W^{H}_{ij}\sigma (H_i+\beta ^{H}_i) + \sum _{i=1}^{2} W^{I}_{ij} I_i, \end{aligned}$$
(3)

where \(\tau _j\) is the decay constant, \(W^{H}_{ij}\) is the strength of the synaptic connection from hidden neuron i to hidden neuron j, \(W^{I}_{ij}\) is the strength of the connection from input neuron i to hidden neuron j, and \(I_i\) is the intensity of the sensory perturbation on input neuron i. The weights of the connections between neurons, the bias terms, and the decay constants are genetically encoded parameters. Cell potentials are set to 0 each time a network is initialised or reset. State equations are integrated using the forward Euler method with an integration step size of 0.1 s. The activation (O) of the output neuron \(N_{O,1}\) is used to set the robot’s opinion (o), which corresponds to \(o=1\) (i.e., white-dominant) when the activation is above the threshold 0.5, and \(o=0\) (i.e., black-dominant) otherwise. For this perceptual discrimination task, we chose a CTRNN as the robots’ controller because this type of network can approximate any dynamical system, as shown by Funahashi and Nakamura (1993).
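A forward Euler update of Eqs. (2) and (3) can be sketched as follows (a minimal illustration using plain Python lists; the parameter layout is our own choice, and the evolved parameter values are not reproduced here):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ctrnn_step(H, I, tau, W_I, W_H, W_O, beta_H, beta_O, dt=0.1):
    """One forward-Euler step of Eqs. (2)-(3) for the two-hidden-neuron
    CTRNN: update the hidden cell potentials H from the inputs I, then
    compute the output activation O and the resulting 0/1 opinion."""
    fired = [sigmoid(H[i] + beta_H[i]) for i in range(2)]
    new_H = [H[j] + (dt / tau[j]) * (-H[j]
             + sum(W_H[i][j] * fired[i] for i in range(2))
             + sum(W_I[i][j] * I[i] for i in range(2)))
             for j in range(2)]
    O = sigmoid(sum(W_O[i] * sigmoid(new_H[i] + beta_H[i]) for i in range(2))
                + beta_O)
    opinion = 1 if O > 0.5 else 0  # 1: white-dominant, 0: black-dominant
    return new_H, O, opinion
```

With all potentials at 0 and all parameters at 0, the output sits exactly at the 0.5 threshold, which is the “default” initial state mentioned in Sect. 3.3.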

3.3 The evolutionary algorithm and the fitness function

A simple evolutionary algorithm using linear ranking is employed to set the parameters of the networks. The population contains 64 genotypes. Generations after the first are produced by a combination of elitist selection and mutation. For each new generation, the highest scoring individual (“the elite”) from the previous generation is retained unchanged. The remainder of the new population is generated by binary tournament selection from the 70 best individuals of the old population. Each genotype is a vector comprising 15 real values (10 connection weights, 2 decay constants, 3 bias terms). Initially, a random population of vectors is generated by initialising each component of each genotype to a value chosen uniformly at random from the range [0, 1]. New genotypes, except “the elite”, are produced by applying mutation, which entails a random Gaussian offset applied to each real-valued vector component encoded in the genotype, with a probability of 0.03. The mean of the Gaussian is 0, and its standard deviation is 0.1. During evolution, all vector component values are constrained to remain within the range [0, 1].
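One generation of this scheme can be sketched as follows (a minimal illustration; the function names and the explicit pool_size parameter are our own assumptions, not the authors’ code):

```python
import random

def next_generation(pop, fit, pool_size, p_mut=0.03, sd=0.1, rng=random):
    """One generation of the evolutionary algorithm sketched above:
    retain the elite unchanged, then fill the population via binary
    tournaments over the pool of top-ranked genotypes, mutating each
    gene with probability p_mut by a Gaussian offset (mean 0, sd 0.1)
    and clamping the result to [0, 1]."""
    ranked = sorted(pop, key=fit, reverse=True)
    pool = ranked[:pool_size]
    new_pop = [list(ranked[0])]  # the elite, copied unchanged
    while len(new_pop) < len(pop):
        a, b = rng.choice(pool), rng.choice(pool)
        parent = a if fit(a) >= fit(b) else b
        child = [min(1.0, max(0.0, g + rng.gauss(0.0, sd)))
                 if rng.random() < p_mut else g
                 for g in parent]
        new_pop.append(child)
    return new_pop
```

Here `fit` maps a genotype to its fitness score; in the experiments this score is the trial average of Eq. (4).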

At the beginning of each evaluation trial, each genotype is decoded into a neuro-controller. Then, the controller is cloned on each of the \(R=20\) robots forming the swarm (i.e., we use homogeneous swarms). The robots are randomly placed in the arena with a randomly chosen orientation. Each trial differs from the others in the initialisation of the random number generator, which influences the robots’ initial positions and orientations and the noise added to motors and sensors. Within a trial, the swarm life-span is 200s (\(T=2000\) simulation cycles). The fitness of a genotype is the average swarm evaluation score after it has been assessed twice in each type of environment (i.e., twice in the black-dominant, and twice in the white-dominant environment) for a total of 4 trials. In each trial e, the swarm is rewarded by an evaluation function \(F_{e}\), computed as follows:

$$\begin{aligned} F_{e}= {\left\{ \begin{array}{ll} \frac{2}{T}\sum _{t = T/2}^{T} \sum _{r = 1}^{R} o^{r}_{t} &{} \text {if the swarm is located in a white-dominant env.,}\\ \frac{2}{T}\sum _{t = T/2}^{T}\sum _{r = 1}^{R}(1 - o^{r}_{t}) &{} \text {if the swarm is located in a black-dominant env.,} \end{array}\right. } \end{aligned}$$
(4)

where \(o_t^r\) is the opinion of robot r at time t. Note that, within each trial, the fitness score is computed from \(t=T/2=1000\) to the end of the trial. This is to exclude from the fitness score the effects of the inevitable fluctuations in the agents’ opinions that are observed at the beginning of each trial, when each agent’s controller is in the “default” initial state.
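Equation (4) can be read as the robot-time spent holding the correct opinion during the second half of the trial, normalised so that a swarm at full correct consensus over that half scores \(R\). A minimal sketch (the layout of the opinion log is our assumption):

```python
def trial_fitness(opinions, white_dominant, T=2000):
    """Evaluate Eq. (4) on a trial log, where opinions[t][r] is the 0/1
    opinion of robot r at cycle t. Only cycles from T/2 onwards count,
    so a swarm at full correct consensus over the whole second half
    scores R (the number of robots)."""
    score = 0.0
    for t in range(T // 2, T):
        for o in opinions[t]:
            score += o if white_dominant else 1 - o
    return 2.0 * score / T
```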

Note that the evolutionary algorithm illustrated above is quite simple. Its working principles are relatively similar to those of the algorithm referred to as EvoStick by Hasselmann et al. (2021). When applied to the design of controllers for robotic swarms, this family of algorithms has been proven to be as efficient as other more complex and more modern algorithms developed by the evolutionary computation community (Hasselmann et al., 2021). In view of this evidence, in this study, we opted for a simple and inexpensive design method in lieu of a more modern, more complex, and potentially computationally more expensive design algorithm.

4 Results

We have performed 20 differently seeded evolutionary simulation runs. Each run lasted 500 generations. At the end of the evolutionary phase, for each evolutionary run, we re-evaluated the best groups of the last 100 generations. In these initial re-evaluation tests, each selected group undergoes a set of 100 simulation trials (i.e., 50 in a black-dominant, and 50 in a white-dominant environment) in which the performance is computed using equation (4). The group that performed best in these tests was chosen to demonstrate that the neural-network based decision-making mechanisms allow a group of simulated robots to reach consensus in both types of environment. Moreover, we show that the collective response of the best evolved group: i) scales to larger groups with only minimal degradation of the performance; ii) is robust enough to deal with different types of environmental variability and modifications of the robot-robot communication system; iii) can be successfully ported on physical robots. Even though we only discuss a single group’s performance in the remainder of this section, multiple different groups generated by different evolutionary runs produced similarly successful results in both robustness and scalability. A more comprehensive set of results, including movies and data not shown in the paper, can be found in Almansoori et al. (2023).

The tests related to robustness and scalability are shown in a comparative framework, in which the performance of the neural-network based decision-making mechanisms (hereafter, NNM) is shown alongside that of groups in which robots use the voter model (hereafter, VM) and the majority model (hereafter, MM), both implemented following the description given by Valentini et al. (2016). Below, we provide a brief description of both the VM and the MM. For a detailed illustration, we refer the reader to Valentini et al. (2016). In all the post-evaluation tests, unless otherwise indicated, the group size is 20, the most represented colour covers 55% of the arena floor (the other 45% is left to the opposite colour), and randomly seeded trials last 400s, during which the robots move randomly as illustrated in Sect. 3.2. Moreover, the communication range is assumed to be 50 cm for all post-evaluation tests, except where stated otherwise. We also remind the reader that, regardless of the nature of the mechanisms used to update the individual opinions, the objective of the group in all the post-evaluation tests is to reach consensus; that is, all robots have to share the same correct opinion concerning the most represented colour of the arena floor, and this state has to last for at least 10s.

Fig. 2

Box plot showing the number of robots with the correct opinion in the white-dominant environment (see white boxes) and in the black-dominant environment (see grey boxes) at regular time intervals of 10s until 200s of simulation, and then every 20s until the trial end. Each box is made of 50 points (corresponding to 50 differently seeded trials). Boxes represent the inter-quartile range of the data, while horizontal bars inside the boxes mark the median value. The whiskers extend to the most extreme data points within 1.5 times the inter-quartile range from the box

4.1 Quality of performance of the best evolved group

The first post-evaluation test refers to the dynamics of the collective behaviour with respect to the perceptual discrimination task. In particular, we show how the number of simulated robots holding the correct opinion evolves from trial start to trial end, for our best group in which robots use NNM to select their individual opinions. The results of this initial test are shown in Fig. 2, where white boxes refer to the number of robots with the correct opinion in the white-dominant environment, and grey boxes to the number of robots with the correct opinion in the black-dominant environment. Each box is made of 50 points generated by 50 randomly seeded trials. First, we notice that, in both types of environment, at the trial start, all robots hold the same opinion for white. This is due to a genetic bias quite frequently observed in binary collective and single-robot decision problems in which the robots’ opinion is generated by an artificial neural network (see Tuci, Quinn, & Harvey, 2002, for example). This bias does not influence the overall accuracy of the collective decision since, in both types of environment, the group repeatedly reaches the correct consensus (see Fig. 2 grey and white boxes from 200s). However, clear differences emerge between the decision dynamics characterising the group response in each type of environment. In particular, in the black-dominant environment (see Fig. 2 grey boxes), the group swiftly moves from all robots holding opinion white to a state in which the majority of the robots hold opinion black. This state persists within the group from 10s to the trial end. In the white-dominant environment (see Fig. 2 white boxes), the group moves to a state in which roughly half of the robots hold opinion white and half opinion black. The number of robots holding opinion white progressively increases but the state of consensus to the correct opinion (in this case, white) takes longer to emerge than in the black-dominant environment. 
Moreover, in the white-dominant environment, the variability in the number of robots holding the correct opinion persists slightly longer than in the black-dominant environment (see Fig. 2 between 180 and 200 s).

Fig. 3

Box plot showing the number of robots with the correct opinion in the white-dominant environment (see white boxes) and in the black-dominant environment (see grey boxes) at regular time intervals of 10s until 200s of simulation, and then every 20s until the trial end. Each box is made of 50 points (corresponding to 50 differently seeded trials). In this test, the communication signals are randomly generated

In order to rule out the hypothesis that consensus is simply the result of the combination of individual decisions rather than the outcome of a collective process based on communication, we have repeated the above-described post-evaluation test by replacing the content of the communication signals generated by the robots of the very best group with randomly generated signals. In this test, the group never manages to reach a consensus in either environment (see Fig. 3). Based on this evidence, we conclude that the strategies of the NNM are genuinely collective and cooperative.

4.2 Robustness and scalability of the best evolved group

In this section, we show a series of post-evaluation tests, in simulation, aimed at evaluating the robustness of the best evolved collective strategy under conditions that differ from those that the group experienced during the design phase. We also show the results of scalability tests with groups made of up to 500 robots. All these tests are illustrated in a comparative framework, since the evaluations carried out on the group in which robots are equipped with the best evolved neural-network based decision-making mechanisms (NNM) are repeated on groups in which robots change opinions based on the hand-coded rules of the Voter Model (VM) and the Majority Model (MM). As shown in Sect. 2, several research works have recently examined issues concerning collective decision-making in swarm robotics. Most of these works are based on the use of the Voter and/or the Majority model. Different studies implement these models in slightly different ways. Here, we use the implementation illustrated by Valentini et al. (2016) as a reference.

In both the VM and the MM, the development of individual opinions is regulated by a finite state machine with two states for each available option (i.e., exploration for black \({\mathcal {E}}_{black}\), exploration for white \({\mathcal {E}}_{white}\), dissemination for black \({\mathcal {D}}_{black}\), and dissemination for white \({\mathcal {D}}_{white}\)). As in Valentini et al. (2016), at the start of a trial, half of the robots hold opinion white and are in state \({\mathcal {E}}_{white}\), and half hold opinion black and are in state \({\mathcal {E}}_{black}\). During exploration, each robot samples the floor's colour using its floor sensors. The exploration state lasts for an amount of time randomly drawn from an exponential distribution with mean \(\sigma = 10\) s. At the end of the exploration state, each robot computes its quality estimate \(p_i\) (with \(i \in \{black, white\}\)), corresponding to the proportion of the exploration time during which the floor colour matched its current opinion. It then transitions to the dissemination state corresponding to its current opinion (i.e., \({\mathcal {D}}_{black}\) or \({\mathcal {D}}_{white}\)). In the dissemination state, each robot disseminates its current opinion to its neighbours for a period randomly drawn from an exponential distribution with mean \(\sigma = 10\) s multiplied by its previously computed quality estimate \(p_i\). In the last two seconds of the dissemination state, each robot collects the opinions of its neighbours within communication distance and updates its opinion using either the VM or the MM rule: in the VM, each robot randomly adopts one of the last two received opinions; in the MM, each robot adopts the opinion held by the majority of its neighbours. Finally, each robot moves to the exploration state corresponding to its new opinion (i.e., either \({\mathcal {E}}_{black}\) or \({\mathcal {E}}_{white}\)).
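The opinion-update rules and the quality-scaled dissemination time described above can be sketched as follows. This is a minimal sketch, not the reference implementation: the function names are ours, and the tie-breaking choice in the majority rule (counting the robot's own opinion) is an assumption, since implementations differ on this point.

```python
import random
from collections import Counter

def update_opinion_vm(own_opinion, received_opinions):
    """Voter Model: randomly adopt one of the last two received opinions."""
    if not received_opinions:
        return own_opinion  # no social input: keep the current opinion
    return random.choice(received_opinions[-2:])

def update_opinion_mm(own_opinion, received_opinions):
    """Majority Model: adopt the opinion held by the majority of neighbours.
    The robot's own opinion is counted in to break ties (an assumption;
    implementations differ on this point)."""
    counts = Counter(received_opinions + [own_opinion])
    return counts.most_common(1)[0][0]

def dissemination_time(quality_estimate, mean_s=10.0, rng=random):
    """Dissemination lasts Exp(mean = sigma = 10 s) scaled by the robot's
    quality estimate p_i, so better-supported opinions are advertised longer."""
    return rng.expovariate(1.0 / mean_s) * quality_estimate
```

Scaling the dissemination time by \(p_i\) is what biases the swarm towards the better option: opinions matching the dominant colour are, on average, broadcast for longer.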

The first robustness test concerns the quality of the performance of groups controlled by the VM, the MM, and the NNM in three testing conditions that differ in the proportion of the black and white areas of the floor. In condition A, the most represented colour covers 55% of the floor, in B 60%, and in C 66%. The collective perception task becomes progressively easier for the swarm from condition A, in which the difference between the percentages of the floor surface covered by the two colours is at its minimum, to condition C, in which this difference is at its maximum. For each testing condition, the swarm completes 50 trials. A trial starts with the 20 robots randomly positioned in the arena and terminates after 400 s, during which the robots perform the random walk illustrated in Sect. 3.2. For the NNM controlled swarm, the trials in each condition are repeated with black and with white as the dominant colour. For the VM and MM controlled swarms, the trials are executed only with black as the dominant colour, since the decision-making mechanisms characterising these models are functionally symmetric with respect to colour. The quality of the performance is expressed in terms of accuracy (i.e., the number of trials in which the group reached a consensus on the correct opinion for at least 10 s, see Fig. 4a) and time to convergence to consensus (see Fig. 4b).
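The two performance metrics can be made concrete with a short sketch. The data layout (a list of per-timestep opinion vectors, one per robot) and the function names are ours, for illustration only:

```python
def time_to_consensus(opinions_over_time, correct, hold_s=10.0, dt=1.0):
    """Return the time (s) at which all robots first share the correct
    opinion and keep it for at least `hold_s` seconds, or None if they
    never do. `opinions_over_time` is a list of per-timestep opinion lists,
    sampled every `dt` seconds."""
    needed = int(hold_s / dt)  # consecutive unanimous steps required
    run = 0
    for step, opinions in enumerate(opinions_over_time):
        if all(o == correct for o in opinions):
            run += 1
            if run >= needed:
                # consensus started `needed` steps ago
                return (step - needed + 1) * dt
        else:
            run = 0
    return None

def accuracy(trials, correct, **kw):
    """Fraction of trials that reached the correct consensus at least once."""
    hits = sum(time_to_consensus(t, correct, **kw) is not None for t in trials)
    return hits / len(trials)
```

With `hold_s=10.0` and the trial logs of the 50 trials per condition, `accuracy` corresponds to the metric plotted in Fig. 4a and `time_to_consensus` to the one plotted in Fig. 4b.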

Fig. 4

Results of the first robustness test in three testing conditions differing in the proportion of the black and white areas of the floor. In A, the most represented colour covers 55% of the floor, in B 60%, and in C 66%. a Bar plots showing accuracy, that is, the number of trials (over 50 trials) in which the group reached the consensus state for at least 10 s. b Box plots showing time to convergence to consensus, calculated over successful trials only. The asterisks in b refer to the p-values at the pairwise Mann–Whitney U test, with \(*\) for \(p < 0.05\), \(**\) for \(p < 0.01\), \(***\) for \(p < 0.001\), and \(****\) for \(p < 0.0001\). In both graphs, light grey and dark grey refer to the VM and MM controlled swarms in the black-dominant environment, while black and white refer to the NNM controlled swarm in the black-dominant and in the white-dominant environment, respectively

Looking at Fig. 4a, we note that, for all three types of decision-making mechanisms, the accuracy progressively increases from condition A to condition C, as expected. The most interesting result is that the NNM controlled swarm is more accurate than the VM and the MM controlled swarms in all conditions. Moreover, the NNM controlled swarm is equally effective in the black-dominant and in the white-dominant environment. For each condition, we ran the \({\tilde{\chi }}^2\) test of independence to evaluate the relationship between the categorical variables “type of mechanism” (i.e., hand-coded mechanisms, that is VM plus MM, versus neural-network based mechanisms, that is NNM) and “performance at the perceptual discrimination task” (i.e., number of successes, number of failures). Based on the results of the test, in all conditions we reject the hypothesis that the two variables are independent, with \(p < 0.001\). In terms of time to convergence to consensus (see Fig. 4b), the results indicate that: (i) in all conditions, the NNM controlled swarm requires significantly less time than the VM and the MM controlled swarms to converge to consensus (i.e., the NNM converges significantly faster than the VM and the MM at the pairwise Mann–Whitney U test; see p-values in Fig. 4b); (ii) the time to converge to consensus tends to decrease for all models (except for the VM and the MM in C) as the task becomes progressively easier; (iii) the NNM controlled swarm, in all conditions, takes more time to reach consensus in the white-dominant than in the black-dominant environment. This is due to the genetic bias discussed in Sect. 4.1, whose beneficial effects on the time to convergence to consensus in a black-dominant environment become progressively larger as the collective discrimination task becomes progressively easier.
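The test of independence above contrasts a 2×2 contingency table of mechanism type against task outcome. A stdlib-only sketch of the uncorrected Pearson statistic is given below; note that the paper's \({\tilde{\chi }}^2\) may include a continuity correction, and the counts used here are illustrative, not the paper's data:

```python
from math import erfc, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table
    [[a, b], [c, d]]; returns (chi2 statistic, p-value) with 1 dof.
    For 1 dof, the survival function is P(X > x) = erfc(sqrt(x / 2))."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2.0))
    return chi2, p

# Illustrative counts (NOT the paper's data): rows are the pooled hand-coded
# models (VM + MM) and the NNM; columns are successes and failures.
chi2, p = chi2_2x2(55, 45, 95, 5)
```

If `p < 0.001`, the hypothesis that mechanism type and task performance are independent is rejected, which is the conclusion drawn in the text for all three conditions.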

Fig. 5

Graphs showing a accuracy (i.e., the number of trials in which the group reaches the consensus state for at least 10 s), and b time to convergence to consensus over 50 trials in tests in which we vary the maximum distance for robot-robot communication from 10 cm to 70 cm. The time to convergence to consensus is calculated over successful trials only. The asterisks in b refer to the p-values at the pairwise Mann–Whitney U test, with \(*\) for \(p < 0.05\), \(**\) for \(p < 0.01\), \(***\) for \(p < 0.001\), and \(****\) for \(p < 0.0001\). In all graphs, light grey and dark grey bars/boxes refer to the performances of the VM and MM controlled swarms in the black-dominant environment, respectively. Black and white bars/boxes refer to the performances of the NNM controlled swarm in the black-dominant and in the white-dominant environment, respectively

The second robustness test concerns the robot-robot communication distance. For swarms controlled by the NNM, the VM, and the MM, we recorded the accuracy (i.e., the number of trials, over 50 trials, in which the group reaches the consensus state for at least 10 s) and the time to convergence to consensus in seven experimental setups, which differ in the maximum robot-robot distance beyond which communication is not possible. In particular, we varied the communication distance from 10 cm to 70 cm at 10 cm intervals. In this test, the most represented colour covers 55% of the arena floor. In terms of accuracy (see Fig. 5a), for all three models (the VM, MM, and NNM controlled swarms), the general trend is progressively better accuracy as the robot-robot communication distance increases. In other words, all models rely on communication signals to reach a consensus. When the robot-robot communication distance is shortened with respect to the design phase (where it was set to 50 cm), and the total number of communication events within a trial progressively decreases, the valuable contribution of social influence to the development of individual opinions tends to diminish, and the swarm becomes progressively less effective. However, the NNM controlled swarm is less affected by this disruption than the VM and MM controlled swarms. Note that, for the NNM, increasing the communication distance from 10 cm to 20 cm already raises the number of communication events each robot experiences to a point at which social interactions largely improve the group accuracy in both types of environment. For the VM and the MM, the accuracy falls below 60% and below 40%, respectively, already at a 40 cm communication distance (see Fig. 5a, light grey and dark grey bars). Note also that the accuracy metric is subject to inherent variability due to the robots' initial random positions and the random noise applied to the robots' sensors and actuators, as explained in Sect. 3.
This can explain why the MM does not improve its performance (i.e., its accuracy) beyond the 50 cm robot-robot communication distance (see Fig. 5a, dark grey bars). Another reason could be that the MM is known to be relatively fast in generating a consensus compared to the VM, but less accurate. For communication distances longer than 10 cm, the \({\tilde{\chi }}^2\) test of independence allows us to exclude, with \(p < 0.001\), that the categorical variables type of mechanism (i.e., the hand-coded VM plus MM, and the neural-network based NNM) and performance at the perceptual discrimination task (i.e., number of successes, number of failures) are independent. Concerning the time to convergence for the VM and the MM, the shorter the maximum communication distance, the longer the time required to converge to consensus (see box plots in Fig. 5b, light grey and dark grey boxes). This clear performance difference between the NNM and the hand-coded decision-making rules (i.e., the VM and the MM) is determined by the fact that, in the latter models, the robots' opinions can only be changed through social influence (i.e., through communication), since the perceptual evidence generated by the floor sensor is exclusively used to set the dissemination time of a robot's own opinion. For robots using the NNM, both the perceptual experience generated by the floor sensors and the social influence generated by robot-robot communication contribute to the development of each robot's opinion. The analysis of the operational principles underpinning the development of the individual decision is beyond the scope of this paper, which primarily focuses on estimating the quality of the NNM controlled swarm's performance, and on the robustness and scalability of the evolved collective responses.
Nevertheless, we have collected some evidence indicating that individual robots equipped with the NNM do not comply with the simplest possible decision rule, that is, the voter model (VM), by which individuals assume the opinion of a randomly selected group mate. The robots using the NNM integrate social evidence (i.e., communication signals) and individual experience (i.e., the readings of the floor sensor) over time in a rather complex way. The results of this analysis can be found in Almansoori et al. (2023). A more detailed investigation into the nature of the neural network based decision-making mechanisms is left to future work.
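The pairwise comparisons of convergence times reported in Figs. 4b and 5b rely on the Mann–Whitney U test. A minimal sketch is given below, using the pair-counting definition of U and a normal approximation for the two-sided p-value; it applies no tie correction and no exact small-sample computation, so it is suitable for illustration only:

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test. U counts the pairs (xi, yj) with
    xi < yj (ties count 0.5); the p-value uses the normal approximation
    of U under the null hypothesis, without tie correction."""
    u = sum((xi < yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    nx, ny = len(x), len(y)
    mean = nx * ny / 2.0
    sd = sqrt(nx * ny * (nx + ny + 1) / 12.0)
    z = (u - mean) / sd
    p = erfc(abs(z) / sqrt(2.0))  # two-sided tail of the standard normal
    return u, p
```

Applied to the per-trial convergence times of two models (e.g., NNM vs. VM over the successful trials of one condition), a small p-value indicates that one model converges systematically faster than the other.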

Concerning the failed trials in the first and second robustness tests illustrated above, we observed that, while for the NNM all failures are due to the swarm not reaching a consensus, for the VM and the MM failures result from a mix of failing to reach a consensus and converging to the wrong consensus. We also observed that, for the VM and the MM in the second robustness test, at communication distances greater than 50 cm the failed trials almost entirely resulted from convergence to the wrong consensus.

Fig. 6

Bar plots showing the results of the adaptivity tests for the VM controlled group in the black-dominant followed by a white-dominant environment (see light grey bars), for the MM controlled group in the black-dominant followed by a white-dominant environment (see dark grey bars), for the NNM controlled group in the black-dominant followed by a white-dominant environment (see black bars), and for the NNM controlled group in the white-dominant followed by a black-dominant environment (see white bars). For each group, the first bar refers to the number of trials (over 50 trials) in which the group reaches the correct consensus (i.e., 10 s with all robots sharing the correct opinion) at least once within the first 400 s of simulation; the second bar refers to the number of trials (over 50 trials) in which the group reaches, at least once, the correct consensus during the second half of the trial (i.e., from \(t= 400\) s to \(t= 800\) s). For each type of group, the evaluations are repeated in three testing conditions that differ in the proportion of the black and white area of the floor. In A, the most represented colour covers 55% of the floor, in B 60%, and in C 66%

Figure 6 shows the results of the adaptivity test, in which the type of environment is changed abruptly halfway through the trial (i.e., at \(t = 400\) s). While the VM and the MM controlled swarms are tested only on the change from the black-dominant to the white-dominant environment (see Fig. 6, light grey and dark grey bars for the VM and MM controlled swarms, respectively), the NNM controlled swarm is tested on both possible changes: from the black-dominant to the white-dominant environment (see Fig. 6, black bars), and from the white-dominant to the black-dominant environment (see Fig. 6, white bars). For each group, the first bar refers to the number of trials (over 50 trials) in which the group reaches the correct consensus (i.e., 10 s with all robots sharing the correct opinion) at least once within the first 400 s of the trial; the second bar refers to the number of trials (over 50 trials) in which the group reaches the correct consensus at least once during the second half of the trial (i.e., from \(t= 400\) s to \(t= 800\) s). For each type of group, the evaluations are repeated in three testing conditions that differ in the proportion of the black and white areas of the floor. In condition A, the most represented colour covers 55% of the floor, in B 60%, and in C 66%. The graph clearly indicates that the NNM controlled swarm can easily cope with the environmental change, switching from a consensus on one colour at \(t = 400\) s to a consensus on the opposite colour within the following 400 s. Contrary to the NNM controlled swarm, the VM and the MM controlled swarms cannot adapt to the environmental change: once consensus is attained on the dominant colour of the first environment encountered, in none of the trials do the VM and the MM controlled swarms manage to change their opinion following the change of environment.
This is another limitation of these implementations of the VM and the MM, caused by the fact that the robots' opinion-changing process is exclusively triggered by social influence (see also Divband Soorati et al., 2019; Prasetyo et al., 2019; Talamali et al., 2021, for more on this issue).

Table 1 Arena dimensions for each swarm size
Fig. 7

Graphs showing a the number of trials (over 50 trials) in which the group reaches the correct consensus (i.e., 10 s with all robots sharing the correct opinion, that is, the accuracy metric), and b time to convergence to consensus. In both graphs, the swarm size is indicated on the x-axis. The time to convergence to consensus is calculated over successful trials only. The asterisks in b refer to the p-values at the pairwise Mann–Whitney U test, with \(*\) for \(p < 0.05\), \(**\) for \(p < 0.01\), \(***\) for \(p < 0.001\), and \(****\) for \(p < 0.0001\). In all graphs, light grey and dark grey bars/boxes refer to the VM and MM controlled swarms in the black-dominant environment, respectively. Black and white bars/boxes refer to the NNM controlled swarm in the black-dominant and in the white-dominant environment, respectively

Finally, we ran a set of scalability tests to evaluate the accuracy and the time to convergence to consensus of the different decision-making models, varying the swarm size from 20 robots (i.e., the swarm cardinality during the design phase) to 500 robots. The results are shown in Fig. 7, where, in both graphs, light grey and dark grey bars/boxes refer to the VM and MM controlled swarms in the black-dominant environment, while black and white bars/boxes refer to the NNM controlled swarm in the black-dominant and in the white-dominant environment, respectively. Note that, in these tests, we modified the arena size to keep the density of robots within the arena constant (5 robots/m\(^2\)). The dimensions of the environment for each swarm size are listed in Table 1. The graph in Fig. 7a, illustrating the accuracy of the group performance, shows a clear difference between the NNM and the VM and MM controlled swarms. While the NNM controlled swarm keeps its accuracy roughly stable (greater than or equal to 80% in both types of environment) even with the largest swarm of 500 robots, the VM and MM controlled swarms suffer a sharp drop in performance as the swarm cardinality increases. For all group sizes, the \({\tilde{\chi }}^2\) test of independence allows us to exclude, with \(p < 0.001\), that the categorical variables type of mechanism (i.e., the hand-coded VM plus MM, and the neural-network based NNM) and performance at the perceptual discrimination task (i.e., number of successes, number of failures) are independent. The NNM controlled swarm attains this accuracy with only a slight increase in the time to convergence to consensus for progressively larger groups (see Fig. 7b). Based on this evidence, we conclude that, at least for the conditions considered in this scalability test, there is no need to train the NNM independently for different group sizes to keep the accuracy above the 80% success-rate threshold.
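The constant-density resizing of the arena can be sketched as follows, assuming a square arena for simplicity (the actual dimensions used are those listed in Table 1, which need not be exactly square):

```python
from math import sqrt

def arena_side_cm(n_robots, density_per_m2=5.0):
    """Side length (cm) of a square arena holding `n_robots` at a fixed
    density of `density_per_m2` robots per square metre."""
    area_m2 = n_robots / density_per_m2  # area required at fixed density
    return sqrt(area_m2) * 100.0         # side of the square, in cm
```

For example, 20 robots at 5 robots/m\(^2\) require 4 m\(^2\), i.e., a 200 cm side, while 10 robots require 2 m\(^2\), i.e., a side of about 141 cm, which matches the \(140\times 140\) cm arena used with the 10 physical e-pucks in Sect. 4.3.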

4.3 Experiments with physical e-puck robots

In order to ecologically validate the neural network based decision-making mechanisms synthesised using artificial evolution, we ported one of the evolved solutions onto physical e-puck2 robots. Since we have only 10 e-puck2 robots at our disposal, the arena has been resized to \(140\times 140\) cm to keep the density at 5 robots/m\(^2\), as in simulation. The physical robots communicate using the range & bearing board, calibrated to process only signals sent from robots at less than 50 cm from the receiver. The e-pucks move at a maximum speed of 12 cm/s according to the random walk illustrated in Sect. 3.2. We performed 30 trials in the black-dominant and 30 trials in the white-dominant environment. As in the design phase, each trial lasted 200 s. For the initialisation of the e-puck2s' positions within the arena, in all 60 trials (i.e., 30 trials in the black-dominant and 30 in the white-dominant environment), 5 robots are placed in the centre of randomly chosen black tiles and 5 robots in the centre of randomly chosen white tiles.

Fig. 8

Box plot showing the number of physical robots with the correct opinion in the white-dominant environment (see white boxes) and in the black-dominant environment (see grey boxes) at regular time intervals of 10 s until the trial end at 200 s. Each box comprises 30 points (corresponding to 30 differently seeded trials)

The results of the test with physical e-pucks are shown in Fig. 8, where the white boxes refer to the number of robots holding the correct opinion in the white-dominant environment, and the grey boxes to the number of robots holding the correct opinion in the black-dominant environment. In both environments, the group manages to reach consensus in a shorter time than that observed with the simulated robots in Fig. 2. This is due to the swarm cardinality: smaller groups reach consensus more quickly, as also shown in the scalability test (see Fig. 7). Moreover, the genetic bias has a less prominent effect on the development of the group dynamics than in the simulated group. This could also be a side effect of the group cardinality; however, further evaluation tests are needed to investigate how group cardinality interacts with the development of the group dynamics. Based on this evidence, we claim that the neural-network based decision-making mechanisms are robust enough to successfully cope with the simulation-reality gap by overcoming the noise inherent in the physical setup.

5 Conclusion

We have described a swarm robotics study focussed on the design of the robots' individual mechanisms allowing a swarm to reach a consensus in a collective perceptual discrimination task. The main contribution of this study is the demonstration that a relatively small dynamic neural network, consisting of only five neurons, is sufficient to generate the mechanisms required by the robots of a homogeneous swarm to collectively decide which of two colours is predominant on the arena floor, despite the fact that the robots' knowledge of the environment is partial (i.e., each robot can only explore a relatively small portion of the arena floor within the time allowed for exploration) and communication is local (i.e., social interactions are only possible between robots less than 50 cm apart).

We have extensively compared the performance of swarms in which robots update their opinion with the neural-network based decision-making mechanism (NNM) with that of swarms in which robots update their opinion using hand-coded opinion selection models, namely the voter model (VM) and the majority model (MM). The comparative tests have shown that the neural model is more effective than the classic hand-coded models in a set of environmental conditions generated by varying the level of difficulty of the perceptual discrimination task, by varying the maximum distance for robot-robot communication, and by dynamically varying the quality of the options after the swarm has already reached a consensus. We have also shown that the performance of a swarm controlled by the neural model is less affected by variations in the swarm size than that of a swarm controlled by the hand-coded models. By successfully porting one of the evolved solutions to the physical e-puck2 robots, we have demonstrated that the neural network based decision-making mechanisms are robust enough to successfully cope with the simulation-reality gap by overcoming the noise inherent in physical systems.

We have accounted for the performance differences between the NNM model and the other two models by highlighting a substantial difference in the way in which the perceptual evidence and the social influence contribute to the generation of the individual robots’ opinions in each model. In the NNM, the decision-making mechanisms are such that the individual perceptual evidence and the effects of the social influence are combined and integrated over time to develop the robots’ opinions. In VM and MM, the decision-making mechanisms are designed in a way that only the social influence has a direct effect on the way in which the robots’ opinion is formed and updated over time.

Although limited to this specific collective perceptual discrimination task, the design method employed in this study has been successful in synthesising individual opinion selection mechanisms without requiring strong a priori assumptions from the swarm designer on the nature of the robots' operational principles. At the same time, we have demonstrated that this alternative design approach enhances the robustness, adaptability, and scalability of the swarm dynamics. In future work, we will investigate the possibility of integrating within a single neural structure all the mechanisms underpinning the behavioural repertoire of the robots. In particular, we aim to synthesise neural-based controllers that underpin both the movement and the opinion selection process. This could potentially improve the adaptability of the individual and group responses to the characteristics of the environment, since, by directly controlling its movement, each robot can exert control over the flux of sensory stimulation and consequently over the process of estimating the quality of the alternative options. At the same time, we plan to evaluate the potential of neural-network based decision-making mechanisms in generating consensus in a broader and more complex set of decision-making scenarios, selected by increasing the number of options and by systematically varying the symmetrical/asymmetrical relationships between the cost and the quality of the different options, as discussed in Valentini et al. (2017).