
1 Introduction

Swarm robotics (SR) [1] is a field in which many homogeneous robots coordinate their behavior to accomplish a given task without any form of global control. The robots are relatively simple compared with the task they are dealing with: their communication is usually local and their sensory capabilities are limited. The emergence of complex collective behaviors can therefore be regarded as a result of local interactions among the robots, and between the robots and the environment. A swarm robotic system (SRS) exhibits three advantages: (i) robustness: the system is inherently fault-tolerant, so even if some robots stop working during the task, the system remains functional; (ii) flexibility: the solutions generated by an SRS are flexible enough to deal with similar tasks; (iii) scalability: the system can operate with a wide range of group sizes.

An SRS is mainly inspired by social insects. Social insects such as ants, bees and wasps exhibit collective behaviors to accomplish tasks beyond the capability of a single individual. The highly intelligent behavior implicitly shown by these social insects is sometimes called swarm intelligence [2]. Among these intelligent collective behaviors, task allocation is the one in which individuals engage in different sub-tasks while solving a complex task; the allocation is not fixed but may change dynamically based on the individuals' local observations. Since task allocation can be widely observed in almost all social species, similar behavior can benefit an SRS in the same way.

The design methods for SRSs can mainly be divided into two categories [3]. The first is behavior-based design, in which individual-level behaviors are developed, tested and improved manually following a typical trial-and-error process until the required collective-level behaviors are obtained [4, 5]. However, this approach requires expert knowledge of the task at hand, and the performance of the system is completely reliant on the human designer. The second approach is automatic design, in which an automatic design method is adopted to reduce the effort of human developers. One promising automatic design approach is evolutionary robotics (ER) [6, 7], in which the robot controller is developed iteratively following the Darwinian principle. Typically, robot controllers in ER are represented by artificial neural networks (ANNs) [8, 9] whose parameters are obtained through evolutionary algorithms (EAs).

Currently, task allocation behaviors in SRSs are mainly obtained through behavior-based design methods. The difficulty that prevents ER from being applied to generating autonomous task allocation behaviors is that, for complex tasks requiring proper task allocation, the design objective lies far beyond the primitive capabilities of the initial controllers. This gap results in all individuals of the first generations performing equally poorly, so the evolutionary process cannot start and no valid controller can be found [10]. This problem is referred to as the bootstrap problem.

One methodology for overcoming the bootstrap problem is to assist the evolutionary process with human knowledge. In this context, three approaches have been widely adopted: incremental evolution, behavioral decomposition and human-in-the-loop [10]. In incremental evolution [11], the robot controller is first trained on a simplified or partial task, and the difficulty of the task is then gradually increased until the original goal is achieved. In behavioral decomposition [12, 13], the original task is decomposed into several relatively simple sub-tasks, and the robot controller is likewise divided into sub-controllers, each dealing with the corresponding sub-task. In contrast, the key idea of human-in-the-loop is to let developers indicate intermediate states directly in order to avoid local optima [14]. Considering that complex tasks requiring task allocation behavior are inherently composed of sub-tasks, we believe that behavioral decomposition is the most advantageous of these approaches.

In this study, we utilize a behavioral decomposition based evolutionary robotics approach to generate autonomous task allocation behavior for a robotic swarm in order to accomplish a complex task. The remainder of this study is organized as follows: related work is introduced in Sect. 2. Section 3 describes behavioral decomposition. Experimental settings are explained in Sect. 4, and Sect. 4.5 discusses the results. Finally, we conclude this study in Sect. 5.

2 Related Work

In most existing work, task allocation behaviors have been generated for scenarios in which robots need to search for objects in the environment and then operate on them (e.g. foraging).

In [5], the authors described a foraging scenario in which robots have to decide whether to perform a foraging task or rest in the nest, according to the density of food in the environment, in order to maximize the net swarm energy.

Pini et al. [15] devised a task in which robots need to transport objects from a source to the nest, which can be achieved in two ways: transporting the objects directly along a long path, or taking a shortcut through a cache on which only a few robots can operate simultaneously. The robots therefore need to choose which route to use based on the cost involved.

Agassounon et al. [16] considered a scenario in which robots gather randomly located objects in the field and group them into a cluster. Since the probability of finding scattered objects decreases as they are clustered, redundant robots can rest in order to increase the efficiency of the swarm.

Conventionally, task allocation behavior is obtained through either probabilistic or threshold-based methods.

Liu et al. [5] proposed a mathematical model to generate task allocation behavior for a foraging task. In their approach, each robot keeps two thresholds tracking the permissible maximum time for searching and resting. The two thresholds are updated adaptively over time based on local interactions. In [17], Castello et al. utilized an adaptive response threshold to describe the sensitivity of a robot to the “need” to perform a task, and the relation between those “needs” and the response is updated by a probabilistic method.

On the other hand, good results have been reported by several works that utilized behavioral decomposition to address the bootstrap problem in ER. Lee [12] developed a hierarchical controller for a scenario in which a robot had to find a box and transport it to a light source. In that study, the original task was divided into three sub-tasks: “circle the box”, “transport the box” and “explore for the box”. In [13], Duarte et al. studied a complex task in which a robot had to rescue a teammate in a double T-maze. The rescue task is composed of three parts: exit the initial room, solve the double T-maze, and bring the teammate back to the initial room. A controller based on a hierarchical strategy was synthesized and achieved a 92% success rate.

3 Behavioral Decomposition

The key idea of behavioral decomposition is divide and conquer: if a task is too complex to be solved directly, one can decompose it into relatively simple sub-tasks and solve them separately. The implementations of behavioral decomposition proposed so far share many similarities [12, 13]. Here, we describe a generalized version as follows:

First, the original task is manually decomposed into relatively simple sub-tasks in a hierarchical and recursive way until all sub-tasks are solvable. The original task can then be represented by a tree-like graph: leaf nodes represent the simplest sub-tasks after decomposition, internal nodes are harder sub-tasks consisting of their children sub-tasks, and the root node is the original task to be solved. Importantly, the decomposition is based on an analysis of the task specification rather than on the abilities of individual robots, which allows the designer to devise the task structure in a top-down manner. In addition, if several sub-tasks place the same requirements on the robots, they can be solved by the same sub-controller.

To accomplish these sub-tasks, the robot controller is organized hierarchically in a manner similar to the task, with two types of sub-controllers: behavior primitives and arbitrators [12]. A behavior primitive solves the simplest sub-tasks that share the same requirements, and an arbitrator combines several primitives or lower-level arbitrators to achieve relatively harder sub-tasks. Primitives and arbitrators are usually represented by artificial neural networks, each receiving a part or all of the robot's sensory inputs. The outputs of primitives directly control the robot, while the outputs of arbitrators activate or restrain their children dynamically. An example of such a controller architecture is shown in Fig. 1.

Fig. 1 Behavioral decomposition based controller
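
To make the architecture concrete, the following minimal sketch (in Python, with illustrative class and method names that are our own assumptions, not taken from the original implementations) shows how an arbitrator selects among its children at each control step, while primitives map their sensory inputs directly to motor commands. Deeper hierarchies are obtained simply by letting an arbitrator's children be other arbitrators.

```python
class Primitive:
    """Leaf sub-controller: maps its slice of the sensory input to motor commands."""

    def __init__(self, policy, input_keys):
        self.policy = policy          # e.g. a small neural network (any callable)
        self.input_keys = input_keys  # which sensor readings this primitive uses

    def act(self, sensors):
        x = [sensors[k] for k in self.input_keys]
        return self.policy(x)         # (left_speed, right_speed)


class Arbitrator:
    """Internal node: activates one of its children at each control step."""

    def __init__(self, selector, children, input_keys):
        self.selector = selector      # callable producing one score per child
        self.children = children      # primitives or lower-level arbitrators
        self.input_keys = input_keys

    def act(self, sensors):
        x = [sensors[k] for k in self.input_keys]
        scores = self.selector(x)
        best = max(range(len(self.children)), key=lambda i: scores[i])
        return self.children[best].act(sensors)
```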

The development of the robot controller is a hierarchical process following a bottom-up procedure: lower-level sub-controllers are developed first and are then combined through the development of their upper arbitrator. This process is repeated until all sub-controllers are developed.

4 Experiments

4.1 A Complex Task

To demonstrate the effectiveness of the approach, we apply it to a complex variation of the classic foraging task, as shown in Fig. 4a.

The experiment field is composed of three parts: the nest (green area, with barriers at its entrance), the central area (grey area) and the split-up area (pink area). Thirty robots are located in the nest at the beginning of the experiment. The central area contains ten small foods (S-foods, henceforth) and five large foods (L-foods, henceforth), each L-food being composed of seven S-foods. An S-food can be transported by a single robot, whereas an L-food is too heavy, so robots have to cooperate to move it. In addition, L-foods are automatically separated into S-foods when moved to the split-up area.

The goal task for the robots is to transport all foods back to the nest. However, because of the barriers at the entrance of the nest, L-foods must first be moved to the split-up area for decomposition. Furthermore, the robots' sight is limited, so they need to explore the field to find foods.

As discussed above, the goal task is actually composed of three sub-tasks, described as follows:

  • Explore

    Since the robots' sight is limited, they have to explore the experiment field for foods.

  • Transport L-foods to the split-up area

    L-foods cannot be transported to the nest directly because of the barriers. Instead, robots have to move them to the split-up area to decompose them into S-foods.

  • Transport S-foods to the nest

    S-foods must be transported to the nest in order to accomplish the task.

4.2 Specification of Robots

As shown in Fig. 2, each robot is equipped with eight IR sensors, an omni-camera, a gripper, two motors and a behavioral decomposition based ANN controller (not visible in the figure). These components are described as follows:

Fig. 2 Specification of the robots

IR sensors Arranged in eight directions to detect the distance between the robot and the nearest object. If the distance exceeds the sensors' maximum range, it is reported as the maximum range.

Omni-camera The main sensory component, with a limited range of sight; it gathers the following information:

  • The numbers of robots, S-foods and L-foods in sight, counted separately

  • The angles of the nest, the nearest S-food, the nearest L-food, the central area and the split-up area (relative to the robot, represented by \(\sin \theta \) and \(\cos \theta \))

Gripper A pre-programmed component at the front of the robot. It catches an S-food automatically when two conditions are met simultaneously: the robot is performing the “transport S-foods to the nest” sub-task and the gripper is touching an S-food; otherwise it drops the caught S-food, if any.

Motors Each accepts a target speed and accelerates until the target speed is reached.
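
As a rough illustration of the sensing and gripper rules described above, the sketch below (with hypothetical names and a normalized IR range that are assumptions, not values from the paper) shows the clamping of IR readings, the sin/cos encoding of relative angles, and the gripper's pre-programmed condition.

```python
import math

IR_MAX_RANGE = 1.0  # assumed normalized maximum range of the IR sensors


def read_ir(distance):
    """Distances beyond the sensor's maximum range are reported as the maximum range."""
    return min(distance, IR_MAX_RANGE)


def encode_angle(theta):
    """A relative angle theta is fed to the controller as (sin(theta), cos(theta)),
    which avoids the discontinuity at +/- pi."""
    return math.sin(theta), math.cos(theta)


def gripper_catches(active_subtask, touching_s_food):
    """The pre-programmed gripper grabs an S-food only while the 'transport S-foods
    to the nest' sub-task is active and the gripper is touching an S-food."""
    return active_subtask == "transport" and touching_s_food
```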

Behavioral decomposition based ANN controller Corresponding to the task decomposition, the controller is composed of three behavior primitives and one arbitrator, all represented by three-layered ANNs whose hidden layers are fully connected (recurrent). The number of input nodes and the information they receive differ among these ANNs according to the specific sub-tasks, while all of them have ten hidden nodes. Each primitive has two outputs that directly control the robot and deals with one of the three sub-tasks. The arbitrator has three outputs determining which sub-task to perform at each moment. Figure 3 shows the architecture of this controller; a simplified code sketch of this structure is given after the list below. Details of the arbitrator and the three primitives are as follows:

Fig. 3 Architecture of the robot controller

  • Explore Primitive

    When this sub-controller is activated, the robot moves around the central area. Its inputs consist of the angle of the central area and the eight IR sensor readings.

  • Decompose Primitive

    This sub-controller performs the “transport L-foods to the split-up area” sub-task. It takes the angles of the split-up area and of the nearest L-food as inputs. The gripper automatically catches an S-food in front of the robot, if one is available. The robot stops if there is no L-food in its sight while this primitive is active.

  • Transport Primitive

    When this sub-controller is active, the robot grabs the nearest S-food and transports it to the nest. Its inputs include the status of the gripper (catching an S-food or not), the angles of the nest and of the nearest S-food, and the eight IR sensor readings. If no S-food is in sight, the robot stops.

  • Task Allocation Arbitrator

    This arbitrator takes as inputs the numbers of robots, S-foods and L-foods in its sight, as well as its own outputs at the previous time step. The primitive corresponding to the highest output is activated.
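
The sketch below illustrates, under stated assumptions, how the four sub-controllers listed above could be realized as three-layered recurrent ANNs with ten hidden nodes, with the arbitrator activating the primitive corresponding to its highest output. The per-network input sizes are our reading of the text (angles encoded as sin/cos pairs) and the random weight initialization is arbitrary; the actual networks are obtained through the evolution described below, and the real wiring may differ.

```python
import numpy as np


class RecurrentNet:
    """Three-layered ANN with a fully connected (recurrent) hidden layer."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w_in = rng.uniform(-1.0, 1.0, (n_hidden, n_in))
        self.w_rec = rng.uniform(-1.0, 1.0, (n_hidden, n_hidden))
        self.w_out = rng.uniform(-1.0, 1.0, (n_out, n_hidden))
        self.h = np.zeros(n_hidden)

    def step(self, x):
        self.h = np.tanh(self.w_in @ np.asarray(x) + self.w_rec @ self.h)
        return np.tanh(self.w_out @ self.h)


rng = np.random.default_rng(0)
# Assumed input sizes: angles are (sin, cos) pairs, IR readings are 8 scalars.
explore    = RecurrentNet(n_in=10, n_hidden=10, n_out=2, rng=rng)  # central-area angle + 8 IR
decompose  = RecurrentNet(n_in=4,  n_hidden=10, n_out=2, rng=rng)  # split-up area + nearest L-food angles
transport  = RecurrentNet(n_in=13, n_hidden=10, n_out=2, rng=rng)  # gripper status + 2 angles + 8 IR
arbitrator = RecurrentNet(n_in=6,  n_hidden=10, n_out=3, rng=rng)  # 3 counts + previous 3 outputs


def control_step(arbitrator_inputs, primitive_inputs):
    scores = arbitrator.step(arbitrator_inputs)
    active = int(np.argmax(scores))                  # the highest output wins
    primitive = (explore, decompose, transport)[active]
    return primitive.step(primitive_inputs[active])  # (left_speed, right_speed)
```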

4.3 Experimental Settings

As discussed in Sect. 3, to obtain the final controller we first evolve the three primitives in three independent experiments, and then evolve the task allocation arbitrator on the goal task so that it exploits those primitives. In addition, to ensure accurate physics, all experiments are performed in simulation using the Box2D physics engine. Details of these experiments are described as follows:

Fig. 4 Experiment fields: a the overall complex task, b the explore primitive, c the decompose primitive, and d the transport primitive
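
The staged, bottom-up order in which these experiments are run can be summarized by the following sketch (placeholder function names, not the authors' code): each primitive is evolved on its own sub-task first, and only then is the arbitrator evolved on the goal task with the primitives fixed.

```python
def run_evolution(task_name, generations):
    """Placeholder for one (mu, lambda)-ES run on the given task (see Sect. 4.4);
    returns the best controller weights found."""
    best_weights = None
    # ... evolve on `task_name` for `generations` generations ...
    return best_weights


# Stage 1: evolve each primitive independently in its simplified sub-task (Sects. 4.3.1-4.3.3).
explore_weights   = run_evolution("explore_subtask",   generations=500)
decompose_weights = run_evolution("decompose_subtask", generations=500)
transport_weights = run_evolution("transport_subtask", generations=500)

# Stage 2: freeze the three primitives and evolve only the arbitrator on the goal task (Sect. 4.3.4).
arbitrator_weights = run_evolution("goal_task", generations=1000)
```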

4.3.1 Experiment for Explore Primitive

Since the robots' sight is limited, they have to explore the field for foods. To train this primitive, we devised the following experiment, shown in Fig. 4b. Five robots are placed in the nest and twenty-five obstacles in the central area. Five foods are located in the central area; once a robot detects a food, the robots receive a bonus and the food is marked as detected, meaning it cannot be detected again. Foods are refreshed every 10 s at new random positions. The initial positions of the robots, obstacles and foods are also randomly selected. Each experiment lasts 50 s.

During the artificial evolution, each candidate controller for this primitive is simulated 50 times, and the average fitness value is used as its final fitness. Table 1 describes the fitness function.

Table 1 Fitness function for explore primitive

4.3.2 Experiment for Decompose Primitive

L-foods cannot be moved to the nest directly because of the barriers. Instead, robots need to transport them to the split-up area, where they are decomposed into S-foods. In this experiment, shown in Fig. 4c, nine robots and one L-food are located in the central area, with randomly selected initial positions. Additionally, the range of the omni-camera is temporarily set to unlimited in this experiment. Each experiment lasts 50 s.

Each candidate controller for this primitive is simulated 50 times during the evolutionary process, and the average fitness value is used as its final fitness. The fitness function is shown in Table 2.

Table 2 Fitness function for decompose primitive

4.3.3 Experiment for Transport Primitive

The final objective of the robots is to transport all S-foods to the nest, so we perform this experiment to obtain the transport primitive. The experiment field is illustrated in Fig. 4d: sixteen obstacles, one robot and one S-food are placed in the central area. The initial positions are randomized at the beginning of each experiment, whose time limit is set to 50 s. As before, the range of the omni-camera is set to unlimited.

The fitness function is shown in Table 3. Each candidate controller is evaluated 50 times, and the average fitness value is used as its final fitness.

Table 3 Fitness function for transport primitive

4.3.4 Experiment for Task Allocation Arbitrator

The task allocation arbitrator is evolved on the original task shown in Fig. 4a, whose details are described at the beginning of this section. The fitness function is shown in Table 4; parts of it are derived from the primitives' fitness functions. Each candidate is evaluated for 120 s.

Table 4 Fitness function for task allocation arbitrator
Fig. 5 Representative fitness transitions of all experiments

4.4 Evolutionary Algorithm Settings

We adopted a (\(\mu \), \(\lambda \)) evolution strategy for all experiments, which share most of their parameters. The population size \(\lambda \) is set to 500 and the number of parents \(\mu \) to 75. Each ANN connection weight is constrained to \([-1.0, 1.0]\) and mutated with normally distributed noise, with the initial mutation step size set to 0.05. The maximum number of generations is 500 for the three primitive experiments and 1000 for the arbitrator. Each experiment is run for 10 independent trials.
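
A minimal sketch of a (\(\mu \), \(\lambda \)) evolution strategy with these settings is given below. The simulation and fitness functions are placeholders, weights are clipped to \([-1.0, 1.0]\), and step-size self-adaptation is omitted for brevity, so this illustrates the reported settings rather than the exact algorithm used in the experiments.

```python
import numpy as np

MU, LAMBDA = 75, 500   # number of parents, offspring per generation
SIGMA_INIT = 0.05      # initial mutation step size
N_EVALS = 50           # simulations averaged per candidate (Sect. 4.3)


def simulate(weights):
    """Placeholder for one Box2D episode; returns a scalar fitness."""
    return 0.0


def fitness(weights):
    """Average fitness over N_EVALS independent simulations."""
    return float(np.mean([simulate(weights) for _ in range(N_EVALS)]))


def evolve(n_weights, generations, rng=np.random.default_rng(0)):
    parents = rng.uniform(-1.0, 1.0, (MU, n_weights))
    sigma = SIGMA_INIT  # kept fixed here; a full ES would self-adapt the step size
    for _ in range(generations):
        # Each offspring is a Gaussian mutation of a randomly chosen parent.
        idx = rng.integers(0, MU, LAMBDA)
        offspring = np.clip(parents[idx] + rng.normal(0.0, sigma, (LAMBDA, n_weights)),
                            -1.0, 1.0)
        scores = np.array([fitness(w) for w in offspring])
        # (mu, lambda) selection: the next parents come from the offspring only.
        parents = offspring[np.argsort(scores)[-MU:]]
    return max(parents, key=fitness)
```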

Fig. 6 Snapshots of a successful trial, showing that robots are able to change their roles dynamically to achieve the task

4.5 Results

Figure 5 illustrates the most representative fitness transitions of the 10 trials of each experiment. The fitness of the explore primitive, the transport primitive and the arbitrator grew smoothly, converged, and reached its maximum in the final stage. In contrast, the fitness of the decompose primitive was still growing in the final stage, which indicates that the maximum number of generations was not sufficient.

Snapshots of the generated autonomous task allocation behavior are shown in Fig. 6. In the first snapshot, the robots started by performing the explore sub-task. After most robots had moved to the central area (snapshot 2), some robots switched to the “decompose L-foods” sub-task and some to the “transport S-foods” sub-task. In snapshot 3, they moved an L-food to the split-up area and started to transport the S-foods that appeared after decomposition; when they found another L-food, they soon switched to moving it (see the change between snapshots 3 and 4). Snapshot 5 shows that after transporting the S-foods to the nest, the robots returned to the explore sub-task. In snapshot 6, most foods had been transported to the nest and the experiment ended due to the time limit.

As these snapshots show, the proposed approach produced a behavioral decomposition based controller that successfully exhibited autonomous task allocation behavior and accomplished this complex task.

5 Conclusion

In this paper, we applied an evolutionary robotics approach to generate autonomous task allocation behavior for a robotic swarm accomplishing a complex task. Behavioral decomposition was adopted to overcome the bootstrap problem. By means of computer simulation, we confirmed the effectiveness of the proposed approach for complex tasks that require autonomous task allocation behavior.

As a next step, we plan to apply this approach to even more complex tasks in order to examine its scalability. We also plan to conduct quantitative experiments to further demonstrate the performance of the proposed approach.