1 Introduction

Collective decision-making is a central cornerstone in most natural and artificial collective systems. In the context of artificial systems, collective decision-making is one of the most important building blocks for swarm robotics systems [3]. Many swarm robotics problems such as deciding a common direction for coordinated motion [8], or a common area in the environment to aggregate to [6], can be seen as instances of collective decision-making [28]. When a swarm needs to make a collective decision by choosing among a set of discrete alternatives, the resulting problem is called the best-of-n problem in a robot swarm, and has been thoroughly reviewed in [28].

In this paper, we consider a version of the best-of-n problem whereby a swarm of robots with minimal capabilities has to achieve consensus to one among n options, and in particular to the one with the best quality, while interacting only locally in an environment that is symmetric with respect to the distribution of the n options (that is, all options can be evaluated on average in the same amount of time). The robots do not communicate the perceived option quality. On the contrary, they can only advertise one option at the time, the one corresponding to their current opinion, and they use a decision mechanism to change their current opinion after observing their neighbors in local proximity. Many decision mechanisms have been utilized in the past, the most common ones being the voter model [2, 30] and the majority rule [14]. Consensus is built over time using a mechanism called positive feedback modulation [11], whereby fluctuations in robot’s opinions distributions will eventually produce a bias towards one of the two options, which will make that option more likely to be observed and henceforth reinforcing this bias, until consensus is reached. To date, the literature on the best-of-n for a mobile robot swarm has mainly focused to the static environment case, whereby both environment and option quality do not change over time, with only few exceptions [28].

In this paper, we consider the best-of-n problem in dynamic environments. More precisely, we consider the case where the environment is static and symmetric with respect to option distribution, but the quality is asymmetric and furthermore abruptly changes over time. The goal of the swarm is collectively select the option corresponding to the best quality, and at the same time to adapt this decision and shift its consensus state to another option if this becomes the best one. We consider the voter model as the main decision mechanism, and the positive feedback modulation mechanism first proposed in [30], which consists in robots advertising each option to their neighbors for a time that is proportional, on average, to the quality of each option, an idea that is inspired by the waggle dance behavior exhibited by honeybees [25].

We perform experiments using multi-agent computer simulations where the spatial dimension is taken into account, and each robot is abstracted by an agent. We show that the vanilla application of the voter model does not allow the swarm to adapt to a dynamic change of the options quality. To solve this problem, we introduce the notion of stubborn agents, that is, agents that do not apply any decision mechanism and therefore never change their opinion. Stubborn agents can be seen as scouts, constantly exploring their favorite opinion, irrespective of the opinion of others and of the consensus state of the swarm. We perform simulated experiments where we analyze the above idea by studying the effect of the key parameters: swarm size, proportion of stubborn individuals, and ratio of the option quality. We also perform preliminary experiment with another decision mechanism, the majority rule, showing that in this case the dynamics are completely different.

The rest of the paper is organized as follows. In Sect. 2, we analyze the literature in collective decision making and we relate our work considering also the few cases where the environment can be considered dynamic. In Sect. 3, we define the dynamic best-of-n problem, the collective decision-making mechanism, and the idea of having the stubborn individuals. In Sect. 4, we present our experimental setup by explaining the specific task and the parameters that have been studied. In Sect. 5 we present the results, while in Sect. 6 we conclude and we discuss the other possible directions in which this work can be extended.

2 Related Work

The direct biological inspiration of the best-of-n problem and in particular of the scenario chosen here comes from the collective behaviors of social insects such as ants [9] and more specifically bees [13, 25]. The literature on the best-of-n related to swarm robotics will be reviewed using two categories introduced in [28]. We conclude this section by analyzing the few works in the best-of-n that can be considered conducted in a dynamic environment.

In the first category, we find work whereby robots cannot measure directly the quality of the different options. Instead, there are asymmetries in the environment that bias the collective decision towards one of the n options. For example, in [5, 10], a classical aggregation task inspired by cockroaches is presented. Here, the asymmetry in the environment is represented by the different size of a number of shelters, that are perceived by the robots in the exact same way. Thanks to this environmental effect, robots are able to select the right shelter to aggregate in. Another example of environmental asymmetry is shown in [14, 27], whereby the environment is represented by a classical double bridge [7] and robots have to find the shortest path between two bridges connecting the nest to the food source. Differently from our work, here the asymmetries between the two paths induces agents selecting the shortest path to appear more frequently in the nest, therefore biasing the process towards that path. In this work, the majority rule was used as decision mechanism. In a subsequent work [23] performed on the same scenario, another mechanism called the k-unanimity rule (the agent switches opinion only after observing the same option k times in a row) was used.

In the second category, we find works in which the quality can be directly measured, as per our case, but are conducted in a static environment. The baseline studies on direct modulation of positive feedback through quality was performed in [29,30,31], whereby the authors thoroughly analyzed the voter model and the majority rule through real robot experiments, simulations, ordinary differential equations, and chemical reaction network models, and studied the speed versus accuracy trade-off. The authors in [19,20,21] developed a decision-making strategy that, differently from our work, includes also an uncommitted opinion (neither of the n alternative), a recruitment mechanism, and an inhibition mechanism, as in honeybees [26]. In a recent follow-up study [18], they have shown how this model can be general to encompass not only decision-making in social insects but also in the human brain [13]. Finally, in [15], the authors considered the best-of-n problem in an aggregation task. Here, agents use a direct recruitment mechanism and are able to commit by using a quorum-based mechanism that makes the swarm aware of the consensus level reached.

In the context of dynamic environment, relatively little effort has been put, however the idea of having the swarm not converging to a full consensus is not new in this paper. For example, biological studies have found that having only a large majority committing to an option rather than the unanimity allow fish schools to swiftly adapt to perturbations [4]. Back to artificial systems, in [16], the authors considered a task-sequencing problem that can be seen as a best-of-2 with two options: task complete and task incomplete. These have dynamic qualities because the task completion level, corresponding to the size of the cleared area, changes over time. The authors of [1] studied a dynamic version of aggregation. Here, each shelter emits a different sound that varies over time, and the swarm has to aggregate in the shelter with the loudest sound. The method is based on a fuzzy version of the original BEECLUST algorithm [12, 24]. In the original BEECLUST, after a waiting period, each agent chooses a new direction of motion at random, while in [1] a fuzzy controller that maps the loudness and the bearing of the sound determines the new direction of motion. Differently from all this work that focused on specific application scenarios, in this paper we perform a systematic study of a minimal model of the dynamic best-of-n problem, in order to understand better the effect of the most important key parameters.

3 The Model

In this section, we define the dynamic best-of-n problem (Sect. 3.1) and the collective decision-making model introduced (Sect. 3.2).

3.1 The Dynamic Best-of-n Problem

The best-of-n problem requires a swarm of agents to make a collective decision among n possible alternatives towards the choice that has the best quality. A typical example is the choice of best location for honeybees’ swarm foraging. Each of the n options has an intrinsic quality \(\rho _i\) with \( i \in {1, \ldots , n}\). Qualities \(\rho _i\) are defined in [1.05, 1.5, 3]. A best-of-n problem reaches the optimal solution when the collective decision of the swarm is for the option with maximum quality. That means that a large majority \(M \le N(1- \delta ) \) of agents agrees on the same option, where \(\delta \) is a small number chosen by the experimenter. In the case where \(\delta =0\) there is perfect consensus.

In this paper, as for the majority of the studies [29], we restrict n to 2 options, labeled a and b, having intrinsic quality \(\rho _a\) and \(\rho _b\). Without loss of generality, one option quality \(\rho _a\) is set to 1 while \(\rho _b > 1\). No cost is included in the current model, which means that the time needed to explore and assess the quality of both options is symmetric [28]. Each agent can measure the quality of different options, but cannot communicate it but rather can only advertise the option itself using local communication (see Sect. 3.2). In dynamic environments as introduced here, qualities can change over time: \(\rho _a = \rho _a(t)\) and \(\rho _b = \rho _b(t)\). In this study, we only consider qualities that are piece-wise constant: at a given time \(T_C\), the two qualities are swapped. Namely \(\rho _a(t)\) and \(\rho _b(t)\) remains constant for \(t<T_C\), they are swapped at \(T_C\) (\(\rho _a(T_C) = \rho _b(T_C - 1)\), \(\rho _b(T_C) = \rho _a(T_C - 1)\)), and again remain constant afterwards.

3.2 The Decision Mechanism and the Stubborn Agents

We consider two kinds of agents: normal and stubborn. Each agent has an initial opinion, which consists in one of the two options a or b. Normal agents are able to change their opinion by applying a decision mechanism that relies on the observation of other agents in local proximity. Stubborn agents instead never change their opinion and keep the one they have at the very beginning, either a or b.

Initially agents are positioned inside the nest. Then, they move toward the region corresponding to their opinion. They spend there an exponentially-distributed amount of time (sampled independently per agent) that does not depend on the option, during which they measure the quality of that site. Then they go back to the nest, each at a different time, and they start disseminating their opinion. Agents within the nest needs to be well-mixed in order to avoid agents with same opinion clustering near each other. A random walk is implemented in order to meet this well-mixed assumption as much as possible.

The agents controller is represented by the finite state machine in Fig. 1a. Accordingly, agents can have one of the following 4 possible states: dissemination state of opinion a (\(D_a\)), dissemination state of opinion b (\(D_b\)), exploration state of opinion a (\(E_a\)), exploration state of opinion b (\(E_b\)). In the figure, solid lines represent deterministic transitions, while dotted lines stochastic transitions. The symbol VR indicates that the voter model is used at the end of the dissemination state (in the case where the majority rule is used, this will be mentioned). In the dissemination state, the agent disseminates his opinion continuously to other agents he meets that are also in the dissemination state. The time spent by the agent disseminating its opinion is randomly sampled from an exponential distribution characterized by a parameter proportional to the quality of the region. As a consequence it is more probable to meet neighbors with the best opinion than meeting those with the worst one. This mechanism is called modulation of positive feedback and it is the driving mechanism to make the group converge on the option with the best quality. At the end of dissemination, each agent can change its opinion based on the opinions of other agents and using either the voter model or the majority rule. Both the voter model and the majority rule depend on the opinion of neighbors, that is the agents within a specified spatial radius (in our experiments set to 10 units). In the voter model, the agent switches its opinion to the one of a random neighbors. In the majority model, the agent changes its opinion to the one held by the majority of its neighbors.

Fig. 1.
figure 1

Panel a: Probabilistic finite state machine. \(D_a\), \(D_b\), \(E_a\) and \(E_b\) represent the dissemination and exploration state. Solid lines represent deterministic transitions, while dotted lines stochastic transitions. The symbol VR indicates that the voter model is used at the end of the dissemination state. Panel b: Screenshot of the simulation arena. This image is taken from NetLogo software.

4 Experimental Setup

The experiments have been conducted first on NetLogo simulator for fast prototyping. Then, the systematic simulations have been run using the simulator developed in [29].

Agents move on a 2-dimensional arena of size 200 (width) \(\times \) 100 (height) units (see Fig. 1 for a screenshot within NetLogo). In the binary model, the arena comprises a central region called the nest, where initially all agents are and where they meet to perform the decision-making process. The two external areas are the sites and represent the two options: option a on the left and option b on the right.

In order to test the robustness of the model, the most important parameters have been studied. For the voter model, as evident from Table 1, the total number of agents has three different values: 40, 100, 500. Without loss of generality, the interplay between \(\rho _a\) and \(\rho _b\) can be studied simply by keeping one of them fixed (\(\rho _a\) before the environment changes, and \(\rho _b\) after it changes) to a value of 1 and by changing the other one. The values of the second quality studied are: 1.05, 1.5, 3, indicating small, medium, and large difference in quality, respectively. The proportion of stubborn individuals have been studied in the range \(\{5\%,10\%,20\%\}\), equally distributed between the two opinions.

Table 1. Model parameters used in experiments

For the majority model only one set of parameters has been run (see Table 1). As initial conditions of each run, \(50\%\) of agents have opinion a and \(50\%\) of agents have opinion b.

The dissemination time is exponentially distributed with parameter \(\tau _D=g \rho \) with \(g=100\). The time of exploration is also exponentially distributed, with parameter set to \(\tau _E = 10\), therefore independent of the site.

In the dynamic environment considered in this paper, a new time parameter \(T_C\) is introduced: the time when the value of \(\rho _a\) and \(\rho _b\) are abruptly changed by swapping their values. In this study \(T_C=2500\), a value empirically chosen as a compromise to allow both consensus to the best option prior to change and reasonably short runs. For each configuration of parameters, an ensemble of simulation has been realized, consisting of \(R=50\) runs.

5 Results

The different configuration sets are compared in terms of temporal evolution of opinions. In particular only the proportion of agents with opinion a (\(p_a\)) is monitored, as the percentage of agents with opinion b (\(p_b\)) is simply given by \(p_b=1-p_a\). The plots report all the trajectories of this quantity over time (in simulated seconds, sampled every \(\varDelta t = 0.1\) steps) for all runs. In the paper are reported only the most relevant plots to discuss the results. All the plots of the full study, together with example videos, are available as Supplementary Material [17]. Table 1 reports in bold the parameters whose plots are included in the main text.

5.1 The Vanilla Voter Model

Figure 2 shows the results of runs of voter model without stubborn (also called vanilla voter model) for two different values of quality ratio: 1.05 (low) and 3 (high). It is interesting to note that for a low value of quality ratio the convergence is never reached, while for high value of quality ratio the convergence is reached but there is no adaptation to the environmental change. We will see next how stubborn individuals will play a driving role to get convergence and adaptation in the dynamic best-of-n problem.

Fig. 2.
figure 2

Opinion evolution for a voter model with no stubborn, for two different values of quality ratio: 1.05 (a) and 3 (b). For low quality ratio there is no convergence. For high quality ratio the convergence to one option is reached but there is no adaptation to the change of opinion quality

5.2 Effect of Quality Ratio and of Proportion of Stubborn Individuals

Figure 3 reports the results of runs for four different cases of systems of 100 agents: across rows, we vary the ratio \(\rho _a /\rho _b\) from very low (1.05) to very high (3). Across columns, we vary the stubborn percentage from \(5\%\) to \(20\%\). It is evident the large role played by quality ratio value: whatever the values of stubborn presence, in the case of low quality ratio there is no convergence of opinions, neither adaptation. On the other hand, for large quality ratio value (3) there is good convergence and adaptation, irrespective of the proportion of stubborn individuals (which only affects the final value of the consensus state in a decreasingly proportional way). Interestingly, while the presence of stubborn individuals has been shown to be fundamental to have convergence and adaptation, its percentage does not seem to significantly contribute in terms of time nor in terms of variance of number of agents following the opinion.

Fig. 3.
figure 3

Different cases of systems of \(N=100\) agents. (a) \(S=5\%\) and \(\rho _a/\rho _b=1.05\), (b) \(S=20\%\) and \(\rho _a/\rho _b=1.05\), (c) \(S=5\%\) and \(\rho _a / \rho _b=3\), and (d) \(S=20\%\) and \(\rho _a / \rho _b=3\). It shows that quality ratio has a higher effect than the percentage of stubborn.

5.3 Effect of Swarm Size Versus Proportion of Stubborn Individuals

Keeping constant the percentage of stubborn individuals, a big role of the swarm size is disclosed by Fig. 4 (the quality ratio varies across rows, while the swarm size across columns). Increasing the population size decreases the variance of fraction of agents following a certain opinion (here a), while the convergence or non-convergence are determined by the value of the quality ratio. In the case of low quality ratio, for small swarm size there is no convergence; increasing the size of the swarm show a certain tendency to convergence. Interestingly, in the case of high quality ratio, increasing the swarm size reduces the variance of adaptation time.

Since the number of stubborn individuals increases with the swarm size, a doubt arises if convergence is observed only for an absolute number of stubborn agents larger than a critical mass. This is denied by the evidence of other configurations. For instance, in the swarm with \(N=100\) agents and stubborn percentage \(20\%\) the number of stubborn individuals is 20, which is comparable to the number of stubborns of the swarm with \(N=500\) agents and stubborn percentage \(5\%\), that is 24 stubborns. The first case does not converge while the second case does converge: therefore it can be concluded that there is an intrinsic role of the size of the swarm itself, not related to any critical mass of stubborns. The only thing that could be further argued is that, rather than being an effect of the swarm size, it may be an effect of increased density instead. This will be confirmed or denied in future work.

Fig. 4.
figure 4

The effect of the swarm size 40 and 500 for the two quality differences 1.05 and 3: (a) \(N=40\) and \(\rho _a / \rho _b=1.05\), (b) \(N=500\) and \(\rho _a / \rho _b=1.05\), (c) \(N=40\) and \(\rho _a / \rho _b=3\), and (d) \(N=500\) and \(\rho _a / \rho _b=3\). In the case of low quality ratio, for small swarm size there is no convergence; increasing the size of the swarm show a certain tendency to convergence. In the other case (high quality ratio), increasing the swarm size reduces the variance of adaptation time.

5.4 The Majority Rule with Stubborn Individuals

Also a majority decision model has been implemented to compare with voter model results. Results show that the majority model never works, but does a sort of spontaneous symmetry breaking (more often biased to the option that is best at the beginning of the experiment, b) and is not sensitive to the presence of stubborn individuals. We will speculate more about this in the conclusions.

Fig. 5.
figure 5

Majority model for a system of \(N=100\) agents, \(\rho _a / \rho _b=3\), without stubborn (a) and with \(10\%\) of stubborns (b). Whether there are stubborn individuals or not the majority model never converge and a symmetry breaking is observed.

6 Conclusion, Discussion, and Future Work

In this work, we have introduced the dynamic best-of-n problem, and we have proposed a solution to this problem when the environment is asymmetric with respect to the option qualities that can be assessed by the robots and symmetric with respect to the time needed to assess each option [28]. The proposed solution consists in a combination of direct modulation of positive feedback coupled with the voter model and with the introduction of stubborn agents, that is, agents that do not change their opinion and stay committed to their initial option.

Through simulation experiments, we have shown that the voter model alone (i.e. without the stubborn agents) cannot make the swarm adapt to abrupt changes in the option qualities. After introducing stubborn agents, we have studied the effect of key parameters: the ratio of the two qualities, the proportion of stubborn agents, and the swarm size. Firstly, we reported that, as expected and reported in previous studies [14], the difference in site quality place a crucial role, whereby the ability to adapt to the environmental changes is strongly linked to the system accuracy, and higher level of accuracy and adaptability are observed with increasing ratio between the qualities. Secondly, contrarily to initial expectations, we found that increasing the ratio of stubborn individuals does not have an effect on neither the accuracy nor the adaptability. Finally, and surprisingly, we found that increasing swarm size has a beneficial effect on both consensus and adaptability: when the quality ratio is high (easier problems), the swarm is able to react faster with smaller variations on the reaction times; when the quality ratio is low (harder problems), a small swarm is not able to achieve consensus at all, while a larger swarm shows sign of approaching consensus and at the same time of adaptability to the change in the environment. This trend further confirms our speculation that, at least in this system, consensus-reaching and adaptability are strongly interlinked, as in previous study it was already found that accuracy increases with increasing swarm sizes as this will eventually approach a continuum model that, when studied using ordinary differential equations (ODEs), predicts that the swarm always achieves consensus to the best quality [30]. We concluded the experimental analysis with a preliminary study of the majority rule model, by showing that this model is ineffective in reaching consensus to the right option and at adapting to environmental changes. The latter is due to the effect of spatiality, as stubborn individuals committed to the same options are very unlikely to appear next to each other (Fig. 5).

This present study has revealed new insights but at the same time has raised new questions that we plan to investigate in future studies. Firstly, we would like to study this system using theoretical models such as ODEs and chemical reaction network modeling, as in [29], because this allows to study more broadly the effect of parameters and to have a deeper understanding of the dynamics. To do so, stubborn agents may be replaced with a spontaneous transition rate of all agents to a random opinion, as this can be more easily modeled. However, the equivalence of the two models needs to be tested. Secondly, we plan to systematically study the majority-rule to determine whether there is a variant that can make this mechanism adaptive as well. This study will start by systematically varying the proportion of stubborn agents, because we suspect that there could be a higher value which will cancel out the effects of spatiality and make also this model effective, thus we expect to see a sort of phase transition with respect to this parameter. Finally, provided enough resources, we plan to perform the experiments on real robots, likely kilobots [22], in order to have a proof of concept in the real world and potentially discover new factors influencing adaptability.