1 Introduction

When a distributed system is composed of a large number of relatively incapable and poorly informed components, which is generally the case for robot swarms [2], the limitations of single individuals can be overcome by aggregating and processing information collectively, that is, by making collective decisions [1, 5, 9]. In addition to its accuracy and the time it takes to make a decision [14], the success of a collective decision-making strategy can also be measured by the extent to which it can be generalized across different problem scenarios. The generality of a strategy allows the designer to reuse its high-level control logic in different problem scenarios and, while doing so, to focus only on the implementation of domain-specific, low-level control routines.

In this paper, we propose a novel decision-making scenario referred to as the collective perception problem and use this scenario to investigate the generality of two previously proposed strategies: the Direct Modulation of Majority-based Decisions (DMMD) [13, 14] and the Direct Modulation of Voter-based Decisions (DMVD) [15]. In the collective perception scenario, a swarm of robots is required to explore an environment and evaluate the frequency of certain features that are scattered across it (e.g., the availability of precious metals, the presence of pollutants or cancer cells) with the objective of determining which feature is the most frequent. In our performance comparison, we also consider a third, non-self-organizing decision-making strategy that we call Direct Comparison (DC). In this strategy, we allow robots to share a larger amount of information (i.e., quality estimates) and, based on this information, to modify their opinions by comparing their quality estimates with those of their neighbors. The DC strategy is representative of a class of more informed strategies that are generally expected to outperform self-organized approaches. Our aim is to use the DC strategy as a reference to highlight the benefits of self-organized approaches.

Previous studies focused on providing robots with the means to determine features of individual objects or specific regions in the environment. In [6], the authors develop a strategy that allows robots to individually and locally evaluate the shape of an object using their IR sensors and then to perform distributed sensor fusion with the aim of achieving collective perception. Schmickl et al. [11] propose two strategies, a hop-count strategy and a Trophallaxis-inspired strategy, that allow a swarm of robots to collectively perceive which area in the environment is the largest. Tarapore et al. [12] propose instead a control strategy that is inspired by the adaptive immune system of vertebrates; this strategy allows a swarm of agents to collectively discriminate between dangerous and non-dangerous objects and to adopt appropriate actions (e.g., tolerate or clear out the objects). Finally, Mermoud et al. [7] develop an aggregation-based strategy that allows robots to collectively perceive the type of a spot (i.e., good or bad) and to destroy the spots that have been perceived by the swarm as bad.

The DMMD and DMVD strategies were originally proposed for a site-selection scenario, in which the goal of the swarm is to select the best location in the environment in which to nest [14, 15]. In this paper, we support the generality of these strategies by implementing them for the collective perception scenario and comparing their performance by means of robot experiments and physics-based simulations. We use a swarm of \(N=20\) e-pucks [8] and study the performance of each strategy over two different problem setups representing a simple and a difficult decision-making problem. After successfully demonstrating the generality of the DMMD and DMVD strategies through robot experiments, we deepen our analysis using physics-based simulations. We implement the collective perception scenario in the ARGoS simulator [10] and use this setup to show that the self-organized mechanisms of the DMMD and DMVD strategies allow them to sustain high levels of noise that would instead prevent the use of the more informed DC strategy.

Fig. 1. Illustration of the robotic platform and the experimental setup: (a) the e-puck robot with details of its hardware; (b) top-view picture of the collective perception scenario with two features represented by the black and white colors.

2 Robotic Platform and Experimental Setup

We performed experiments using the e-puck robotic platform [8]. The e-puck, shown in Fig. 1a, is a popular robotic platform within the swarm robotics community and has been adopted in a large number of experimental studies. It is a commercially available robot designed for research and education with a diameter of 7 cm and a battery autonomy of up to 45 min. The e-puck is a wheeled robot that can move with a maximum speed of 16 cm/s. In its basic configuration, the robot is equipped with RGB LEDs, a low-resolution camera, an accelerometer, a sound sensor, and 8 proximity sensors. Figure 1a shows the e-puck configuration used in our experiments, where the robot is extended with the range & bearing IR communication module [4], the ground sensor, the Overo Gumstix module, and the omnidirectional turret (not used in these experiments). In our experiments, the robots use the range & bearing module to share their information locally with their neighbors (e.g., internal state, quality estimate). This module consists of 12 IR transceivers positioned around the circumference of the robot that allow it to send and receive messages up to a distance of approximately 70 cm. Additionally, the e-puck mounts 8 IR proximity sensors that are used to detect the presence and measure the distance of nearby obstacles. The e-puck has 3 ground sensors that allow it to measure gray-scale values of the surface. Finally, the Overo Gumstix module provides the e-puck with the capability to run Linux and with a Wi-Fi connection.

We consider a collective perception scenario characterized by an environment with \(n=2\) features (see Fig. 1b). The robots are positioned in a square arena with a total area of \(200\times 200\) cm\(^2\); the environment is approximately three orders of magnitude larger than the footprint of a single robot. It is bounded by four walls that can be detected by the proximity sensors of the robots. The surface of the environment is divided into a grid of \(10 \times 10\) cm\(^2\) cells. The color of each cell is used as an abstraction to represent a particular feature of the environment. Robots always have an opinion about which feature they currently believe to be the most frequent. In particular, the color black represents the feature of the environment associated with opinion a while the color white represents the feature associated with opinion b. Without loss of generality, the black feature is always the most frequent in the environment and, as a consequence, the goal of the swarm is to make a collective decision favoring opinion a. Each robot of the swarm uses its LEDs to show its current opinion: LEDs are lit up in red when the robot favors opinion a and in blue when the robot favors opinion b. The robots use their ground sensors to perceive the brightness of the underlying surface, determine its color, and estimate the quality of the corresponding option.
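As an illustration of this setup, the following sketch generates such a grid in simulation with a prescribed fraction of black cells. It is a minimal sketch under our own assumptions: the function name, the use of NumPy, and the integer encoding of colors are not part of the original experimental code.

```python
import numpy as np

def make_arena(black_fraction, cells_per_side=20, seed=None):
    """Build a cells_per_side x cells_per_side grid of cell colors.

    1 encodes black (feature a), 0 encodes white (feature b). The number of
    black cells is the closest integer to black_fraction * total cells, and
    the black cells are scattered uniformly at random over the arena.
    """
    rng = np.random.default_rng(seed)
    n_cells = cells_per_side ** 2
    colors = np.zeros(n_cells, dtype=int)
    colors[:round(black_fraction * n_cells)] = 1
    rng.shuffle(colors)
    return colors.reshape(cells_per_side, cells_per_side)

# Simple setup: 66% black cells vs. 34% white cells
arena = make_arena(0.66, seed=1)
print(arena.mean())  # fraction of black cells, 0.66
```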

3 Robot Control Algorithm

In this section, we describe the three collective decision-making strategies used in our performance comparison (i.e., the DMMD, DMVD, and DC strategies). All three strategies rely on common low-level control routines (i.e., random walk, obstacle avoidance, and quality estimation) that are described in Sect. 3.1. In Sect. 3.2, we describe the DMMD and DMVD strategies. Section 3.3 describes the DC strategy.

3.1 Low-Level Motion Routines

We implemented the random walk routine as follows. A robot performing random walk alternates between straight motion and rotation on the spot. The robot moves straight for a random period of time sampled from an exponential distribution with a mean duration of 40 s. After this period of time, the robot turns on the spot for a random period of time that is uniformly distributed between 0 s and 4.5 s. The turning direction is also chosen randomly: with equal probability, the robot turns clockwise or counterclockwise. Once turning is completed, the robot resumes straight motion.
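A minimal sketch of this sampling scheme is given below; the constants are taken from the description above, while the function name and its return format are our own hypothetical choices.

```python
import random

MEAN_STRAIGHT_S = 40.0   # mean duration of the exponential straight-motion period
MAX_TURN_S = 4.5         # upper bound of the uniform turning period

def sample_random_walk_step():
    """Sample one step of the random walk: (straight_s, turn_s, turn_direction)."""
    straight_s = random.expovariate(1.0 / MEAN_STRAIGHT_S)
    turn_s = random.uniform(0.0, MAX_TURN_S)
    turn_direction = random.choice(("clockwise", "counterclockwise"))
    return straight_s, turn_s, turn_direction
```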

The detection by a robot of one or more nearby obstacles (i.e., a wall or a neighboring robot at a distance less than approximately 30 cm) causes the execution of the random walk to be paused and triggers the obstacle avoidance routine. We implemented the obstacle avoidance routine as follows. The robot uses its proximity sensors to detect the distance and the bearing of each perceived obstacle. It then uses this information to compute a new direction of motion that is opposite to the obstacles. Depending on the computed direction, the robot turns on the spot either clockwise or counterclockwise until its orientation corresponds to the computed one. Then, the robot resumes its random walk.
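One way to obtain such an escape heading is to sum unit vectors pointing toward the perceived obstacles, weighted by proximity, and reverse the result. The sketch below follows this idea, but the weighting scheme and the function signature are our own assumptions rather than the routine actually deployed on the e-pucks.

```python
import math

def escape_heading(obstacles):
    """Compute a heading (in radians, robot frame) pointing away from obstacles.

    `obstacles` is a list of (distance_cm, bearing_rad) readings; closer
    obstacles contribute more strongly to the repulsive direction.
    """
    x = sum(math.cos(b) / max(d, 1e-3) for d, b in obstacles)
    y = sum(math.sin(b) / max(d, 1e-3) for d, b in obstacles)
    return math.atan2(-y, -x)  # direction opposite to the weighted obstacle sum

# Example: an obstacle ahead (bearing 0) and one to the left (bearing pi/2)
# yield a heading pointing backward and to the right.
print(escape_heading([(10.0, 0.0), (25.0, math.pi / 2)]))
```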

We implemented the following quality estimation routine to let a robot estimate the quality \(\rho _i\) of the feature associated with its opinion \(i \in \{a,b\}\). When executing the quality estimation routine, the robot uses its ground sensors to sample the color of the surface while moving randomly in the environment. During the entire execution of the quality estimation routine, the robot keeps track of the amount of time \(\tau _i\) during which it perceived the color associated with its current opinion i. Finally, the robot computes a quality estimate \(\hat{\rho }_i\) as the ratio of \(\tau _i\) to the overall duration of the quality estimation routine.
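In other words, \(\hat{\rho}_i = \tau_i / \tau\), where \(\tau\) is the overall duration of the routine. Below is a minimal sketch of this estimator, assuming the ground sensor is sampled at a fixed period; the sampling assumption and the function signature are ours.

```python
def estimate_quality(ground_samples, opinion_color, sample_period_s):
    """Return rho_hat_i: time spent over the opinion's color / total duration.

    `ground_samples` is the sequence of colors ('black' or 'white') read by
    the ground sensors at regular intervals of `sample_period_s` seconds.
    """
    tau_i = sample_period_s * sum(1 for c in ground_samples if c == opinion_color)
    total = sample_period_s * len(ground_samples)
    return tau_i / total if total > 0 else 0.0

# A robot with opinion a (black) that perceived black in 6 of 10 samples
print(estimate_quality(['black'] * 6 + ['white'] * 4, 'black', 0.1))  # 0.6
```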

3.2 DMMD and DMVD Strategies

The DMMD strategy and the DMVD strategy are characterized by a common structure of the robot controller, implemented as a probabilistic finite-state machine (PFSM), and differ only in the individual decision-making mechanism used by robots to reconsider their current opinions. The DMMD strategy uses the majority rule, whereby a robot adopts the opinion that is favored by the majority of its neighbors (including its own current opinion). The DMVD strategy uses the voter model, whereby a robot adopts the opinion of a random neighbor. The PFSM of these strategies consists of two control states, the dissemination state \(\mathsf {D}_i\) and the exploration state \(\mathsf {E}_i\), that are replicated for both options of the decision-making problem (i.e., a total of four control states).

In the exploration states \(\mathsf {E}_i\), \(i \in \{a,b\}\), a robot with opinion i explores the environment by performing the random walk routine and, when necessary, the obstacle avoidance routine. Meanwhile, the robot samples the environment locally and estimates the option quality \(\rho _i\) by executing the quality estimation routine. The duration of the exploration state is random and exponentially distributed with a mean duration of \(\sigma \) s (see [14] for details). After this period of time, the robot switches to the dissemination state \(\mathsf {D}_i\).

In the dissemination states \(\mathsf {D}_i\), \(i \in \{a,b\}\), a robot with opinion i broadcasts its opinion locally to its neighbors. Meanwhile, the robot performs the same random walk and obstacle avoidance routines as in the exploration states. The aim of this motion pattern, however, is not to explore the environment but to mix the positions of robots with different opinions, which eases the decision-making process. The robot uses its current quality estimate \(\hat{\rho }_i\) to amplify or inhibit the duration of the dissemination state \(\mathsf {D}_i\) such that this duration is proportional to the estimated quality. This modulation promotes the spread of the best opinion. To do so, the duration of the dissemination state is exponentially distributed with mean \(\hat{\rho }_i g\) s, where g is a design parameter that defines the unbiased dissemination time. This modulation allows robots with better opinions (i.e., with higher quality estimates \(\hat{\rho }_i\)) to increase their chances of influencing other robots. Finally, the robot collects the opinions broadcast by its neighbors and applies the individual decision-making mechanism (either the majority rule in the DMMD strategy or the voter model in the DMVD strategy) to determine its new opinion \(j \in \{a,b\}\). Then, the robot switches to the exploration state \(\mathsf {E}_j\) to collect a new estimate \(\hat{\rho }_j\) of the option quality.
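The sketch below illustrates the modulated dissemination duration and the two individual decision-making mechanisms. It is a simplified rendering under our own assumptions: the function names are hypothetical, and the majority rule is shown breaking ties in favor of the robot's current opinion, a detail the text above does not specify.

```python
import random
from collections import Counter

def dissemination_duration(quality_estimate, g=10.0):
    """Sample the dissemination duration, exponentially distributed with mean rho_hat_i * g."""
    return random.expovariate(1.0 / (max(quality_estimate, 1e-6) * g))

def majority_rule(own_opinion, neighbor_opinions):
    """DMMD: adopt the opinion favored by the majority, own opinion included."""
    counts = Counter(neighbor_opinions + [own_opinion])
    best = max(counts.values())
    winners = [op for op, c in counts.items() if c == best]
    return own_opinion if own_opinion in winners else winners[0]

def voter_model(own_opinion, neighbor_opinions):
    """DMVD: adopt the opinion of a random neighbor (keep own opinion if alone)."""
    return random.choice(neighbor_opinions) if neighbor_opinions else own_opinion
```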

3.3 Direct Comparison of Option Quality

We define a third decision-making strategy, the direct comparison (DC) of option quality, by using the same PFSM as the DMMD and DMVD strategies but letting robots compare their quality estimates directly to modify their opinions. When executing the DC strategy, robots alternate periods of exploration with periods of dissemination. In contrast to the DMVD and DMMD strategies, the DC strategy does not make use of a mechanism for the modulation of positive feedback: the mean duration of the dissemination state \(\mathsf {D}_i\), \(i \in \{a,b\}\), is g s, independently of the option quality \(\rho _i\). During the dissemination period, the robot broadcasts its current estimate \(\hat{\rho }_i\) in addition to its opinion i. This additional information is used by robots to modify their opinions. At the end of the dissemination state, a robot with opinion i compares its opinion with that of a random neighbor with opinion \(j \in \{a,b\}\). If the neighbor's estimate \(\hat{\rho }_j\) is greater than the considered robot's estimate \(\hat{\rho }_i\), then the robot modifies its current opinion to j. Next, the robot switches to the exploration state \(\mathsf {E}_j\), which is implemented identically to that of the DMMD and DMVD strategies. Our aim is to use the DC strategy to show the benefits of a self-organized approach. Indeed, the ability of the DMMD and DMVD strategies to discriminate between different options is based on the self-organized processing of a multitude of individual quality estimates by the swarm [3]. These quality estimates are processed by modulating positive feedback in combination with a decision-making mechanism that operates on the opinions of neighbors only.
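A minimal sketch of this comparison rule follows; the function signature and the representation of neighbor messages as (opinion, estimate) pairs are our own assumptions.

```python
import random

def direct_comparison(own_opinion, own_estimate, neighbor_messages):
    """DC: compare the own quality estimate with that of one random neighbor.

    `neighbor_messages` is a list of (opinion, quality_estimate) pairs collected
    during dissemination. The robot switches opinion only if the sampled
    neighbor reports a strictly higher quality estimate.
    """
    if not neighbor_messages:
        return own_opinion
    neighbor_opinion, neighbor_estimate = random.choice(neighbor_messages)
    return neighbor_opinion if neighbor_estimate > own_estimate else own_opinion
```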

4 Experiments

Given the collective perception scenario described in Sect. 2, we perform experiments using the DMVD, DMMD, and DC strategies. We look at the number \(D_i\), \(i\in \{a,b\}\), of robots in the dissemination state \(\mathsf {D}_i\) and at the number \(E_i\), \(i\in \{a,b\}\), of robots in the exploration state \(\mathsf {E}_i\) and define consensus as \(D_i+E_i=N\) for any \(i\in \{a,b\}\). We measure a strategy's speed using the average time \(T_N\) necessary for the swarm to reach consensus on any opinion, and its accuracy using the exit probability \(E_N\), that is, the probability of making the best decision, computed as the proportion of runs that converge to consensus on opinion a. We first perform experiments using a swarm of \(N=20\) e-pucks in two different setups representing a simple and a more difficult decision-making problem. Then, we deepen our experimental analysis by means of physics-based simulations implemented using the ARGoS simulator [10]. In both robot experiments and physics-based simulations, robots start the execution of their controllers in the exploration state. Additionally, we set the mean duration of the exploration state to \(\sigma =10\) s and the unbiased duration of the dissemination state to \(g=10\) s.
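These two performance metrics can be computed from a set of repeated runs as sketched below; the data format (one (consensus opinion, consensus time) pair per run) and the example numbers are hypothetical.

```python
def exit_probability_and_consensus_time(runs):
    """Compute E_N and T_N from a list of (consensus_opinion, consensus_time_s) pairs.

    E_N is the fraction of runs reaching consensus on the best opinion a;
    T_N is the mean time to reach consensus on any opinion.
    """
    e_n = sum(1 for opinion, _ in runs if opinion == 'a') / len(runs)
    t_n = sum(t for _, t in runs) / len(runs)
    return e_n, t_n

# 15 hypothetical repetitions, 14 of which converge on opinion a
runs = [('a', 130.0)] * 14 + [('b', 250.0)]
print(exit_probability_and_consensus_time(runs))  # (0.933..., 138.0)
```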

4.1 Robot Experiments

We considered two different experimental setups for the collective perception problem. The first setup represents a simple decision-making problem where the proportion of resource a (i.e., color black) in the environment is approximately twice that of resource b (i.e., color white). Specifically, the surface of the environment is \(\rho _a=66\,\%\) black and \(\rho _b=34\,\%\) white and the problem difficulty is defined by the normalized option qualities \(\rho ^\star _a=1\) and \(\rho ^\star _b=\rho _b/\rho _a=0.515\). The second setup consists of a more difficult collective perception problem where the surface of the environment is \(\rho _a=52\,\%\) black and \(\rho _b=48\,\%\) white (i.e., \(\rho ^\star _a=1\) and \(\rho ^\star _b=0.923\)). For each combination of the problem setup and strategy, we performed 15 repetitions of the robot experiment. In all experiments, the swarm is initially unbiased with 10 robots in state \(\mathsf {E}_a\) and 10 robots in state \(\mathsf {E}_b\).
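For reference, the normalized option qualities follow directly from the black and white surface fractions: \(\rho^\star_a = \rho_a/\rho_a = 1\) in both setups, while \(\rho^\star_b = \rho_b/\rho_a = 0.34/0.66 \approx 0.515\) in the simple setup and \(\rho^\star_b = 0.48/0.52 \approx 0.923\) in the difficult one. The closer \(\rho^\star_b\) is to 1, the harder it is to discriminate between the two options.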

Fig. 2. The figure shows the evolution over time of the number of robots with opinion a (i.e., \(D_a+E_a\)) for the DMMD strategy (top), the DMVD strategy (middle), and the DC strategy (bottom). Gray and white box-plots show the distribution of the experimental runs converging to consensus on opinion a and consensus on opinion b. When white box-plots are not plotted, all runs converged on opinion a. The vertical lines show the average time necessary to reach consensus on any opinion. Parameters: \(\rho ^\star _a=1\), \(\rho _b^\star =0.515\).

Figure 2 shows the results of the robot experiments for the simple collective perception scenario using the DMMD strategy (top), the DMVD strategy (middle), and the DC strategy (bottom). The box-plots provide the evolution over time of the number \(D_a+E_a\) of robots with opinion a. The vertical lines indicate the average time \(T_N\) to reach consensus. When executing the DMMD strategy (see Fig. 2, top), the swarm of e-pucks requires on average \(T_N = 138\) s to converge on a consensus decision and has an accuracy of \(E_N = 0.933\) (i.e., 1 out of 15 repetitions converges to a wrong consensus on opinion b). In contrast, when executing the DMVD strategy or the DC strategy, the swarm of e-pucks is always able to identify the most frequent feature in the environment correctly (i.e., decision accuracy \(E_N=1.0\)). However, the DMVD strategy converges to consensus after \(T_N = 179.3\) s while the DC strategy is faster and requires only \(T_N=76\) s. In agreement with the results in [14], we observe that the DMMD strategy is faster but also less accurate than the DMVD strategy. For the simple experimental setup, the DC strategy benefits from using more information in the form of robots exchanging their quality estimates; it is faster than the DMMD and DMVD strategies and has the same accuracy as the DMVD strategy.

Fig. 3. The figure shows the evolution over time of the number of robots with opinion a (i.e., \(D_a+E_a\)) for the DMMD strategy (top), the DMVD strategy (middle), and the DC strategy (bottom). Gray and white box-plots show the distribution of the experimental runs converging to consensus on opinion a and consensus on opinion b. The vertical lines show the average time necessary to reach consensus on any opinion. Parameters: \(\rho ^\star _a=1\), \(\rho ^\star _b=0.923\).

Figure 3 shows the results of the robot experiments in the difficult collective perception scenario, in which the normalized option qualities are given by \(\rho ^\star _a=1\) and \(\rho ^\star _b=0.923\). The increased difficulty of the decision-making problem overturns the results obtained with the simple experimental setup. The DMMD strategy based on the majority rule is the fastest strategy in the comparison, with an average consensus time of \(T_N=184\) s. The DMVD strategy based on the voter model is still the slowest strategy, with an average consensus time of \(T_N=362\) s, while the DC strategy has a consensus time of \(T_N=303.3\) s. In contrast, the DMMD strategy has the lowest accuracy, \(E_N=0.667\), reaching consensus on opinion a 10 times out of 15 repetitions (cf. [14]). The DMVD strategy, with a decision accuracy of \(E_N=0.933\), performs similarly to the DC strategy, whose decision accuracy is still maximal, \(E_N=1.0\). In this more difficult collective perception scenario, no strategy outperforms all others in both speed and accuracy.

The communication overhead of the DC strategy seems to provide stronger benefits than the modulation of positive feedback used by the DMMD and DMVD strategies. However, a comparison of the results obtained with the simple and difficult experimental setups reveals an interesting performance trend. The increase in the difficulty of the decision-making problem resulted in a relatively small slowdown of the DMMD strategy, which is 1.33 times slower than in the simple experimental setup; the DMVD strategy is 2.02 times slower, while the DC strategy has a more pronounced slowdown of 3.99 times. The DMMD strategy, with an accuracy \(28.5\,\%\) lower than in the simple setup, is preferable when consensus time is the most critical constraint. The DMVD strategy loses only \(6.7\,\%\) of its accuracy and its consensus time increases much less than that of the DC strategy. This trend suggests that the DMVD strategy could be the reference choice for the designer when the accuracy of the collective decision is favored. The results of our robot experiments provide us with useful indications; however, since such experiments are particularly time-consuming, we could collect only a limited amount of data (i.e., 15 repetitions for each parameter configuration). In the next section, we deepen our analysis by means of physics-based simulations.
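These figures follow directly from the consensus times and accuracies reported above: \(184/138 \approx 1.33\) for the DMMD strategy, \(362/179.3 \approx 2.02\) for the DMVD strategy, and \(303.3/76 \approx 3.99\) for the DC strategy; the accuracy loss of the DMMD strategy is \((0.933-0.667)/0.933 \approx 28.5\,\%\) and that of the DMVD strategy is \((1.0-0.933)/1.0 \approx 6.7\,\%\).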

4.2 Physics-Based Simulations

We performed physics-based simulations using the ARGoS simulator [10] and compared the performance of the DMMD, DMVD, and DC strategies over a wider region of the parameter space than in the robot experiments. We varied the initial number \(\mathsf {E}_a(0)\) of robots favoring opinion a, the difficulty \(\rho ^\star _b\) of the collective perception scenario, and the swarm size N. For each parameter configuration, we performed 1000 repetitions of the simulated experiment.

Fig. 4. Illustration of (a) the exit probability and (b) the consensus time as a function of the initial number \(\mathsf {E}_a(0)\) of robots with opinion a for the simple experimental setup. Figures (c) and (d) show the same metrics for the difficult experimental setup. Parameters: \(\rho ^\star _a=1.0\), \(\rho ^\star _b \in \{0.515, 0.923\}\), \(N=20\).

We set \(N=20\) and study the simple and difficult scenarios defined above as a function of the initial number \(\mathsf {E}_a(0)\) of robots with opinion a (see Fig. 4). For the simple scenario, the exit probability \(E_N\) of the three strategies corresponds to that obtained in the robot experiments (cf. \(E_a(0)=10\) in Fig. 4a). For all strategies, \(E_N\) increases with increasing values of the initial number \(E_a(0)\) of robots with opinion a; the DC strategy has the highest accuracy while the DMMD strategy has the lowest. However, for all three decision-making strategies, the consensus time \(T_N\) shown in Fig. 4b is considerably shorter than that obtained with robot experiments. Additionally, the DMMD strategy is now the fastest strategy and it outperforms the DC strategy for all initial conditions \(E_a(0)\). For the difficult scenario, we observe similar differences in the speed and accuracy of the three decision-making strategies. Both the DMVD and DC strategies are considerably less accurate than in the robot experiments (see Fig. 4c). In addition, the decision accuracy of the DMMD strategy decreases more slowly than that of the DMVD and DC strategies when decreasing the value of \(E_a(0)\). As for the simple scenario, all strategies converge faster to consensus (see Fig. 4d). The DC strategy is the slowest strategy in the comparison.

The results of the physics-based simulations only partially reproduce the performance obtained in the robot experiments. We conjecture that the observed discrepancies are a result of differences in the level of noise between simulation and reality. For example, a few robots used during the experiments have particularly noisy proximity sensors; as a result, they often collide with other robots or with the walls. Additionally, the uneven surface of the experimental arena caused robots to remain temporarily stuck over the same cell, resulting in erratic quality estimates. The influence of these factors is amplified by the limited number of runs performed with real robots, as shown by the high variance of the consensus time characterizing the results in Figs. 2 and 3. Nonetheless, the physics-based simulations confirm the poor scalability of the DC strategy previously shown by the robot experiments.

Fig. 5. Illustration of the consensus time \(T_N\) as a function of the difficulty of the decision-making problem (i.e., option quality \(\rho _b \rightarrow \rho _a\)) for a swarm of (a) \(N=20\) robots and (b) \(N=100\) robots. Parameters: \(\rho ^\star _a=1.0\), \(\rho ^\star _b \in [0.515, 0.923]\), \(N \in \{20,100\}\), \(E_a(0)=N/2\), \(E_b(0)=N/2\). The vertical axis is in log-scale.

We deepen our comparison by analyzing the speed of the DMMD, DMVD, and DC strategies when varying the swarm size N and the difficulty \(\rho ^\star _b\) of the collective perception scenario (see Fig. 5). For swarms of size \(N=20\) and \(N=100\), we observe that the DC strategy suffers the largest loss of performance as the problem difficulty increases. Additionally, by comparing Fig. 5a with Fig. 5b, we also observe that the consensus time of the DC strategy increases much faster than that of the other strategies when the size of the swarm is \(N=100\); therefore, the DC strategy does not scale with the swarm size. In contrast, the DMMD and DMVD strategies are only slightly affected by the larger swarm size.

5 Conclusion

We considered a novel decision-making scenario, collective perception, that requires a swarm of robots to explore a certain environment, perceive the presence of certain features, and determine which feature is the most frequent. We investigated the generality of the direct modulation of majority-based decisions (DMMD) strategy and that of the direct modulation of voter-based decisions (DMVD) strategy, two collective decision-making strategies previously proposed for the site-selection problem [14, 15]. DMMD and DMVD are modular strategies that combine a direct modulation mechanism of positive feedback (i.e., modulation of opinion dissemination) with an individual decision-making mechanism (respectively, the majority rule and the voter model). We investigated the benefit of these modules by considering a third strategy, the direct comparison of option quality (DC), that has no modulation mechanism and whose decision mechanism relies on a larger amount of information (i.e., quality estimates). Using both robot experiments and physics-based simulations, we performed an extensive comparison of the DMMD, DMVD, and DC strategies under realistic working conditions. Our results are twofold. On the one hand, we have shown that the DMMD and DMVD strategies provided us with off-the-shelf solutions to a decision-making scenario different from their original context of site selection. These design solutions maintained their speed and accuracy and showed a promising level of generality. On the other hand, we have shown the benefits of a self-organized approach by highlighting the scalability problems of the consensus time of the DC strategy. The self-organized approach is more robust to noise, although it relies on less information. Clearly, there are problem setups for which the DC strategy outperforms both the DMMD and DMVD strategies. However, the communication overhead of this strategy requires sufficiently capable robots.