1 Introduction

Multi-agent planning has become increasingly important as more systems, such as autonomous vehicles and service robots in buildings, involve many autonomous agents interacting in the same environment. In some multi-agent systems, each agent has its own goal and planning knowledge, and generates plans to achieve its goal while maximizing its own utility. Planning techniques such as symbolic planning allow an agent to compute a sequence of actions, by reasoning about action preconditions and effects, to bring about the state transitions needed to achieve a goal that is unreachable using individual actions. For instance, the action of going through a door into a room is preconditioned by the robot being beside the door and the door being open; the effect is the robot's position being changed to the new room. When action costs are further incorporated into this planning process, agents can compute optimal plans that maximize utility (or minimize cost).

When multiple robots share a physical environment (such as our Segway-based robots, BWIBots (Khandelwal et al. 2017), shown in Fig. 1), it is necessary to model how their plans interact with each other, to avoid conflicts and to construct action synergies whenever possible.

Fig. 1 Three of our Segway-based mobile robot platforms

The agents’ plans might interact such that their independently-computed optimal plans become suboptimal at runtime, due to constrained resources such as narrow corridors that allow at most one robot to pass. Figure 2 shows an example of a conflict, where two robots need to go from S1 to G1 and S2 to G2 respectively. If the robots act according to independently-computed optimal plans, they will block each other in a narrow corridor, causing a conflict.

Fig. 2 An example showing a conflict between the individually optimal plans of two robots

On the other hand, communication within a team of robots has the potential to leverage synergies in their plans through coordination. For instance, when a robot knows its teammate is going to take an expensive door-opening action, it makes sense for the robot to plan to follow its teammate through the door instead of opening it separately. This scenario is demonstrated in Fig. 3, which shows the occupancy-grid map where robots \(r_1\) and \(r_2\) are required to move from S1 to G1 and from S2 to G2 respectively. The floor is separated into two areas by two doors in the middle, D1 and D2. Without collaboration, \(r_1\) and \(r_2\) will plan to perform the expensive door-opening procedure independently of each other and go through D1 and D2 respectively. The globally optimal solution is that, while \(r_1\) is taking the expensive "call for open" action, \(r_2\) moves to D1 and waits until \(r_1\) "opens" D1 (with human help).

Fig. 3 Floor map showing a potential synergy between two robots going from S1 to G1 and S2 to G2 respectively. D1 and D2 are closed doors

One of the key challenges in planning for such resource sharing and synergy leveraging is the inherent temporal uncertainty in the duration of robots' actions at runtime, which is largely overlooked in existing research. While such conflicts can be resolved locally at runtime [e.g., two robots detecting a conflict at runtime can "negotiate" to decide who gives way to the other (Wong and Kress-Gazit 2015)], we argue that a better solution is to avoid such conflicts at planning time.

The contributions of this article include:

  • The formulation of the multi-robot planning with conflicts and synergies (mrpcs) problem;

  • A novel, Iterative Inter-Dependent Planning (iidp) framework for coordinating single-robot plans, which is the main contribution of this article;

  • Two algorithms that implement the iidp framework for solving mrpcs problems;

  • A model of temporal uncertainty in mrpcs problems;

  • Systematic evaluations of the framework and algorithms, including demonstrations on real robots.

iidp is generally applicable to mrpcs problems by leveraging single-robot planners; it is not restricted to specific planners, forms of noisy action durations, or application domains. The iidp framework provides a high-level coordination protocol for a team of planning agents to communicate their plans and produce plans with overall lower costs than independently generated plans, while efficiently using constrained resources.

In the general structure of an iidp algorithm, each robot iteratively computes and saves a conditionally optimal plan given the other robots' current plans. Each robot then exchanges plans with the others and re-plans accordingly. This iterative "negotiation" and re-planning process continues until the robots either reach an agreement or use up the maximum number of iterations.

While the general iidp framework can accommodate a diverse set of algorithms varying in the decision process they employ to re-plan, we introduce and focus on two specific iidp algorithms in this article. The proposed iidp algorithms have different trade-offs between computational complexity and plan quality. The first is the best alternative (BA) algorithm, which aims to iteratively find the agent with the best alternative plan and have that agent change its plan. The second is the increasing dependency (ID) algorithm, which increasingly considers the cost of interactions over a finite number of iterations. The agents start by completely ignoring interactions with other agents in the first iteration and end with full consideration of interactions in the last iteration. Intuitively, this is a multi-round negotiation process, where the agents proceed from independent planning to full consideration of conflict costs and synergy benefits.

The most important property of iidp is that it does not increase an mrpcs problem's dimensionality beyond that of a single agent and never considers the joint-agent state-action space, while still being able to produce good-quality solutions (Sect. 4). iidp is named "inter-dependent" because an important step in the implementation of iidp algorithms calls an external algorithm that computes an optimal plan for a robot under the condition of other robots' current plans (i.e., this planning process depends on existing plans). While this inter-dependent planning algorithm is an important component of any iidp realization, the iidp framework is general and is not restricted to any specific inter-dependent planning algorithm.

As a secondary contribution of this article, we model and solve a specific mrpcs problem, namely multi-robot navigation, using the iidp framework. The aims of this instantiation include illustrating the whole process of instantiating iidp, demonstrating the need for modeling temporal uncertainty in mrpcs problems, comparing the two new iidp algorithms (BA and ID), and evaluating the performance of iidp on a real-world problem.

In both the instantiated real-world problem and the abstract virtual domain, we observe that our iidp algorithms reduce the overall plan cost compared to baseline algorithms. In the experiments with real robots, the use of the iidp framework enables a team of robots to avoid going into the same corridor at planning time and leverages action synergy by sharing door-opening actions.

Although we primarily use a team of indoor mobile robots for demonstration and evaluation purposes in this article, the developed framework and algorithms are general enough to be applicable to a variety of other domains. For instance, one setting where multi-robot interaction and coordination are crucial is the autonomous vehicle planning domain (Dresner and Stone 2008). Here, conflicts can manifest as traffic jams, while synergies can be formed by platooning, which in turn can yield energy savings, lower congestion, and improved safety (Tsugawa and Kato 2010).

The remainder of this article is organized as follows. Section 2 discusses existing research on multi-agent planning, and provides context for our contribution. Section 3 defines the mrpcs problem. Section 4 presents our iidp framework as well as the two algorithms developed under this framework. Section 5 provides technical details of instantiating iidp using the specific domain of multi-robot navigation, including modeling temporal uncertainty in navigation actions. Section 6 details the experiment setup and results, including demonstrations on a real multi-robot system. Section 7 concludes this article, and provides a few open problems for future work.

2 Related work

iidp algorithms solve problems by iteratively applying a two-stage process: in the first stage, each agent employs a single-agent planner, while in the second stage the agents exchange plans and negotiate over dependencies. We begin the review of related work with single-agent planners, followed by a detailed discussion of multi-agent planners.

Planning was one of the earliest research areas in artificial intelligence, and aims at helping an agent compute a sequence of actions to accomplish complex tasks that are impossible using individual actions. The key to formalizing a planning problem is the description of actions. Action description languages (or action languages) have been developed for describing preconditions and effects of actions, and can be used for describing planning domains. The strips language (Fikes and Nilsson 1972), developed as part of the Shakey project (Nilsson 1984), is the earliest action language. We refer the interested reader to a journal article that reviewed early action languages (Gelfond and Lifschitz 1998), where the focus was mainly on developing languages that are more natural and concise in describing planning domains. Nowadays, pddl (Ghallab et al. 1998) is arguably the most widely used action language and is supported by a variety of planning algorithms and implementations (Helmert 2006; Hoffmann and Nebel 2001). pddl has been developed for and maintained by the International Planning Competition (ipc) community since 1998. bc is an action language that is particularly attractive for robotic applications because it can represent recursive fluents, indirect action effects, and defaults (Lee et al. 2013), and bc programs can be solved using systems such as Clingo (Gebser et al. 2014). While we use bc in this work, it should be noted that iidp, as an efficient multi-robot planning framework, is not restricted to any action language or planning system.

Multi-agent planning Multi-agent planning (map) is a variant of the automated planning domain where plans must be computed for multiple agents acting in a joint state space. A line of publications offered formal planning language descriptions for representing map, including a multi-agent extension of strips (Boutilier and Brafman 2001), the Multi-agent Planning Language (mapl) (Brenner 2003), Concurrent strips (cstrips) (Oglietti and Cesta 2004), ma-strips (Brafman and Domshlak 2008), and ma-pddl (Kovács et al. 2012). A competition of distributed and multi-agent planning took place in 2015 as part of the workshop on distributed and multi-agent planning. The problems in that competition were formulated in the ma-pddl language. Among the solvers that participated, the adp planner (Crosby et al. 2013) ranked first in the centralized track and the psm planner (Komenda 2016) ranked first in the distributed track. The problem domain for that competition was limited to scenarios with multiple agents that are not self-interested and work together to synthesize a joint plan that solves a common goal. All agents thereby aim for the goal to be reached at the end of task execution, and all share a common utility function. The work covered in this article, by contrast, considers self-interested agents, each with a specific and private goal. Moreover, the utility functions and action costs are not assumed to be common among the different agents. The agents' collaboration behaviors produced in this article result from the agents leveraging each other's actions and avoiding competing for constrained resources, neither of which is detrimental to the individual agents' own interests.

Multi-robot symbolic planning Action languages, including pddl, have been used for symbolic planning for teams of robots (Filippidis et al. 2012; Alur et al. 2013; Knepper et al. 2013; Buehler and Pagnucco 2014; Wong and Kress-Gazit 2015, 2016). However, these approaches either do not model possible runtime conflicts, assuming that the computed plans can be successfully executed to the end without any interruptions (Knepper et al. 2013; Buehler and Pagnucco 2014), or aim at resolving conflicts locally at runtime (Filippidis et al. 2012; Alur et al. 2013; Wong and Kress-Gazit 2015, 2016). As an example of locally resolving conflicts, two robots that compete for a narrow corridor can "negotiate" to make sure one robot gives way to the other (Wong and Kress-Gazit 2015). Such conflict-resolving actions can be very expensive in practice, yielding locally optimal solutions. Very recently, a time-bounded multi-robot planning approach was developed for a team of logistics robots (Schäpers et al. 2018). Like this article, they considered action durations via temporal planning. Unlike this article, they did not consider the uncertainty in action durations, and their planning system models a limited time horizon (of 3 min) to reduce the search space.

While iidp can be applied to multi-agent planning problems, it is particularly useful for multi-robot systems, where a team of physical agents frequently has to avoid conflicts (due to limited resources) and build action synergies when possible. In comparison to these existing methods, our proposed iidp algorithms compute high-quality joint plans while successfully avoiding computing plans in a joint space. We also model noisy action durations as one of the factors that affect runtime conflicts, and avoid such conflicts (in probability) at planning time.

Multi-robot task allocation (mrta) mrta aims at allocating a set of tasks to a set of robots. mrta includes a family of problems that can be classified based on the number of tasks a robot can execute at a time (single-task versus multi-task), the number of robots each task requires at most (single-robot versus multi-robot), and whether planning is needed (instantaneous versus time-extended) (Korsah et al. 2013). In this article, the robot team starts with each robot having a fixed task and no opportunity to transfer tasks to other robots, assuming the availability of a single-task, single-robot, time-extended mrta algorithm. More details on the assumptions made in this article are available in Sect. 3. It should be noted that, in practice, mrta algorithms by themselves cannot compute symbolic plans.

Multi-robot scheduling A multi-robot scheduling problem's input includes a set of robots and a set of tasks. The output is a schedule that allocates, for each task, one or more time intervals to one or more robots (Brucker 2007; Zhang and Parker 2013; Coltin and Veloso 2014). Recent work on multi-robot scheduling further considers temporal uncertainty (in a multi-robot navigation task) (Brooks et al. 2015). However, in scheduling algorithms, a predefined and fixed set of tasks is generally given and needs to be scheduled, whereas in planning algorithms, actions are generated to achieve a final goal. Hence, scheduling algorithms cannot be used for generating action sequences in complex domains that require reasoning about actions.

Multi-robot probabilistic planning Contemporaneously with symbolic planning, (po)mdp-based planning techniques have been extensively studied in the literature. Existing (po)mdp-based research has studied planning with concurrent actions (Mausam and Weld 2008; Smith and Weld 1999), planning under temporal uncertainty (Guo and Hernández-Lerma 2009; Younes and Simmons 2004), incorporating temporal logic into navigation task planning (Fentanes et al. 2015), and planning for multi-robot systems using a single (po)mdp (Khandelwal et al. 2015), multiple (po)mdps (Zhang et al. 2013), and dec-pomdps (Amato et al. 2015). Such algorithms are good at handling non-deterministic action outcomes using probabilities and at planning toward maximizing long-term reward.

In contrast, symbolic planning techniques, such as strips, pddl, and bc, fall into a very different planning paradigm, where the inputs are action preconditions and effects, non-deterministic action outcomes are handled by plan monitoring and replanning, and the output is a sequence of actions. Therefore, symbolic planning, in comparison to (po)mdp-based planning, is more suitable for problems where there are many potential goals and human-interpretable plans are required.

Multi-robot motion planning Existing work has investigated the problem of multi-robot concurrent task assignment and motion (trajectory) planning (Turpin et al. 2014; Ma and Koenig 2016). Given N robots and N goal locations, the algorithms aim to find a suitable assignment of robots to goals and to generate collision-free, time-parameterized trajectories for each robot. Although such motion planning algorithms are complementary to our multi-robot symbolic task planning algorithm, their methods are applicable to problems that require only navigation actions, and they do not model noisy action durations (assuming no runtime delays). Similar drawbacks exist for multi-robot path planning, which has received much attention lately (Sharon et al. 2015, 2013; Hönig et al. 2016). Moreover, such planners usually consider the joint-agent state/action space, which limits their scalability. A prominent example is presented in Alonso-Mora et al. (2018), where an ltl encoding of motion (Raman et al. 2013) is used as a centralized map system.

Distributed constraint optimization (dcop) The dcop framework (Modi et al. 2003) has been extensively used for modeling and solving distributed multi-agent coordination problems, ranging from virtual agents (Matsui et al. 2008), meeting scheduling (Hoang et al. 2016), emergency vehicles (Ferreira et al. 2009), and sensor networks (Lesser et al. 2012) to mobile robot teams (Jain et al. 2009; Yedidsion and Zivan 2016). In this body of work, the papers that applied dcop to real robots only considered one action at each iteration; to the best of our knowledge no previous work has incorporated dcop with planning so that agents reason about a sequence of actions at every iteration.

In this article, we develop the iidp framework and algorithms that compute a joint plan for a team of robots without computing plans in a joint search space, significantly improving planning efficiency compared to centralized planning techniques and baseline algorithms such as independent or sequential planning.

3 Definition of MRPCS problems

In this section we detail the assumptions made in representing the Multi-Robot Planning with Conflicts and Synergies (mrpcs) problem, and then formally define it.

Distributed system: We assume the robots are distributed, in the sense that each has its own goal, knowledge representation, and the ability to plan a sequence of actions to achieve that goal. We assume that each robot can work on at most one task at a time and that each task requires only one robot, which corresponds to "single-robot", "single-task" multi-robot planning problems (Gerkey and Matarić 2004). The tasks are not transferable among robots, and no robot's task has priority over another's. Agents' tasks are loosely coupled, such that one agent's action can affect the cost of another's, but no action strictly requires the collaboration of two or more agents.

Communication: We assume that agents are able to communicate with each other reliably, anytime, anywhere about resources required by their current plans (e.g., which corridors are needed and when), which can be realized by directly communicating about the agents’ plans.

Resource sharing and action synergies: Collaborations among robots are realized via resource sharing and constructing action synergies. We assume the agents share the same environment and constrained resources, e.g., corridors that allow one robot at a time. Agents are self-interested and are unwilling to sacrifice their utility to improve social welfare. However, they are indifferent to taking actions that help other agents without introducing extra costs in their own goal achievement, such as sharing door openings or leading a platoon.

Uncertainty: We assume that action outcomes are deterministic, but action durations and costs are uncertain. For instance, two robots running into each other in a narrow corridor (that probabilistically allows only one robot at a time) can still achieve their original navigation goals, but resolving this conflict causes both robots significant extra costs (in the form of extra time to complete navigation actions). This setting motivates the robots to plan to avoid such conflicts.

Homogeneity: We assume that the robots are homogeneous, i.e., all robots share the same set of actions, and action outcomes (deterministically or probabilistically) are the same while being executed on different robots.

Costs: The agents estimate the expected costs of their own plans according to the current plans of other agents. Each agent can model its own action costs and interaction costs (i.e., changes in action costs caused by conflicts and synergies). For example, the time for a robot to execute a navigation action at normal speed is an individual action cost, and the time it is blocked by another robot is an interaction cost. Interaction costs are asymmetric, i.e., a difference in cost for one agent in an interaction is not necessarily the same difference in cost for its counterpart. We model collaboration failures of constrained resources as finite costs (i.e., soft constraints).

Following the above assumptions, an mrpcs problem can be specified in the form of \(\langle {\mathcal {N}}, {\mathcal {D}}, {\mathcal {A}}, {\mathcal {S}}, {\mathcal {G}}, {\mathcal {T}} \rangle \):

  • \({\mathcal {N}}\) is a set of robots, with \(|{\mathcal {N}}|=N\).

  • \({\mathcal {D}}\) is a description of the environment, including objects, their properties, and their relations.

  • \({\mathcal {A}}\) is a set of robot actions, each \(a\in \mathcal {A}\) being described by its preconditions, effects, and cost \(C_A(a)\).

  • \({\mathcal {S}}\) is a set of states in which each is the current state for a robot: \(s_i\in {\mathcal {S}}\) is the state of the ith robot and \(|{\mathcal {S}}| = N\).

  • \({\mathcal {G}}\) is a set of goal states in which each corresponds to a robot: \(g_i\in {\mathcal {G}}\) is the goal state of the ith robot and \(|{\mathcal {G}}| = N\).

  • \({\mathcal {T}}\) is a set of interactions in which each is a set of actions \(\{a_0, a_1, ...\}\) that lead to conflicts or synergies when executed at the same time. The cost of an interaction \(t \in \mathcal {T}\) is \(C_T(t)\).

The domain description \({\mathcal {D}}\) includes the environmental information that does not change over time. For instance, two rooms being directly accessible to each other should be included in \({\mathcal {D}}\) (whereas through-door accessibility should not, because it can be changed by robot actions). The action description \({\mathcal {A}}\) focuses on robot capabilities of making changes in the domain, e.g., a door-opening action can change a door's property from closed to open. A robot's current state, \(s\in {\mathcal {S}}\), and goal, \(g\in {\mathcal {G}}\), are specified by values of domain properties. \({\mathcal {D}}\) and \({\mathcal {A}}\) correspond to the rigid and dynamic laws of action languages respectively (examples in Sect. 5.1).

The overall goal of solving an mrpcs problem is to maximize overall system utilities under the assumption that agents are self-interested, i.e., to find the user equilibrium with the best social welfare. The means of efficiently solving an mrpcs problem is by providing the agents with an algorithm that will determine when and what they communicate with each other, and when and how they plan and re-plan. The solution is a set of plans \(P = \{p_0, p_1,\ldots , p_{N-1}\}\), one for each robot, where the objective is to minimize C(P), the total costs of actions and the expected costs of interactions:

$$\begin{aligned} C(P) = \sum _{i = 0}^{N-1} \sum _{a \in p_i} C_A(a) + \sum _{t \in \mathcal {T} \cap P} C_T(t) \cdot Pr(t, P) \end{aligned}$$
(1)

where \(\mathcal {T} \cap P\) is the set of interactions that can be enabled by actions in P, and \(Pr(t, P)\) is the probability that t occurs if P is executed (with temporal uncertainties).
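
To make the objective concrete, the following is a minimal Python sketch of Eq. (1). The data structures (plans as action lists, interactions as tuples of actions, costs as dictionaries) are illustrative assumptions for the sketch, not the representation used in the rest of the article.

```python
from typing import Callable, Dict, List, Tuple

Action = str
Plan = List[Action]
Interaction = Tuple[Action, ...]   # a group of actions that conflict or synergize

def total_cost(plans: List[Plan],
               action_cost: Dict[Action, float],                              # C_A
               interaction_cost: Dict[Interaction, float],                    # C_T
               interaction_prob: Callable[[Interaction, List[Plan]], float],  # Pr(t, P)
               ) -> float:
    # Individual action costs, summed over all robots' plans.
    cost = sum(action_cost[a] for p in plans for a in p)
    # T ∩ P: interactions all of whose actions appear in some plan of P.
    enabled = [t for t in interaction_cost
               if all(any(a in p for p in plans) for a in t)]
    # Expected interaction costs.
    cost += sum(interaction_cost[t] * interaction_prob(t, plans) for t in enabled)
    return cost
```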

4 The IIDP framework and two IIDP algorithms

Optimally solving multi-robot planning problems requires modeling spaces of both joint states and joint actions. The exponentially increasing number of joint actions and possible inter-dependencies of concurrent actions make optimal multi-robot planning challenging. The complexity of multi-robot planning is analyzed in Wagner and Choset (2015). When we further consider noisy action costs, such as the temporal uncertainty in action durations, the mrpcs problem becomes extremely difficult, even if the number of robots and the length of individual plans are within a reasonable range. In this section, we aim to provide a general solution to Multi-Robot Planning with Conflicts and Synergies (mrpcs) problems that cannot be solved using existing methods.

Single-robot symbolic task planning methods typically abstract away lower-level planning, such as continuous motion planning, and focus on finding a sequence of higher-level actions required to achieve a goal task. Similar to that abstraction process, our Iterative Inter-Dependent Planning (iidp) framework for mrpcs problems abstracts away the process of single-robot planning and focuses on creating a mechanism that enables the agents to efficiently reason about conditional plans. We further define the inter-dependent cost function \(\mathcal {C}(p, P^M)\) to be a function that estimates the sum of the individual action costs of a plan p and the interaction costs with a set of other robots M and their current plans \(P^M\). The implementation of \(\mathcal {C}(p, P^M)\) requires modeling action preconditions, effects, and costs, which is highly domain-dependent, and hence its development is independent of iidp.

4.1 The general structure of an iidp algorithm


The general structure of the iidp algorithms is depicted in Algorithm 1.

It defines an iteration-based negotiation where, in each iteration, each agent computes a plan, sends it to the other agents and receives theirs, and then decides whether to switch to an alternative plan. The input of iidp includes robot \(r_i\)'s initial and goal states (\({s_i}\) and \({g_i}\)). The symbol \(s\xrightarrow {p}g\) denotes that plan p leads to state transitions from initial state s to goal g. At the end of the program, plan \(p_i\) is returned. The framework assumes that all the robots concurrently run the same algorithm. \(\mathcal {M}(r_i)\) is a function that returns the subset of robots that \(r_i\) should consider. The implementation of \(\mathcal {M}\) is orthogonal to the iidp framework and is purposely left flexible to allow domain-dependent optimization of planning time. For example, to apply the iidp framework to a building-wide team of robots, it is reasonable to assume interactions can only occur among spatially close robots, and to make \(\mathcal {M}\) return the robots that are currently on the same floor as \(r_i\). The notation \(P^M\) represents the set of plans of the robots in the subset M.
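
For illustration only, the negotiation loop run by each robot can be sketched as follows. This is not a reproduction of Algorithm 1; the callbacks plan, neighbors (the function \(\mathcal {M}\)), exchange, and should_switch are assumed interfaces that a concrete iidp algorithm (Sects. 4.2 and 4.3) would supply.

```python
def iidp(robot_i, s_i, g_i, plan, neighbors, exchange, should_switch, max_iterations=10):
    # Initial, independently computed plan achieving the transition from s_i to g_i.
    p_i = plan(robot_i, s_i, g_i, others=None)
    for _ in range(max_iterations):
        M = neighbors(robot_i)                       # teammates r_i should consider
        P_M = exchange(robot_i, p_i, M)              # send p_i and receive the plans of M
        p_hat = plan(robot_i, s_i, g_i, others=P_M)  # conditionally optimal plan given P_M
        if should_switch(robot_i, p_i, p_hat, P_M):  # algorithm-specific decision rule
            p_i = p_hat
        else:
            break                                    # no better alternative: agreement reached
    return p_i
```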

4.2 The best alternative algorithm


Algorithm 2 shows the Best Alternative algorithm.

In this algorithm, the agents reason about which of them has the best alternative plan, and the one with the best alternative switches to its alternative plan. The difference between the inter-dependent cost of the current plan and the inter-dependent cost of the best alternative plan for robot \(r_i\) is \(\delta _i=\mathcal {C}(p_i, P^M)-\mathcal {C}(\hat{p_i}, P^M)\).

The stopping condition for this algorithm is reached when no agent has a better alternative plan. Since reaching this condition cannot be guaranteed, we add a limit on the number of iterations to ensure termination.

It should be noted that in Line 5, the operation of \({{\,\mathrm{\arg \min }\,}}\) requires a symbolic task planner for computing a sequence of actions while minimizing the overall plan cost. The notion of distributed optimization through using individual best alternative was appropriated from the maximum gain message (MGM) algorithm (Maheswaran et al. 2004) and its adaptation to mobile agent teams (MGM_MST) with dynamically changing constraints (Zivan et al. 2015).
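
A simplified, centralized sketch of BA is given below for illustration; in the article each robot runs its part of this loop in a distributed fashion. The callbacks are assumptions: cost_fn implements the inter-dependent cost \(\mathcal {C}(p, P^M)\), plan_fn implements the arg-min planner call of Line 5, and neighbors_fn implements \(\mathcal {M}(r_i)\).

```python
def best_alternative(robots, initial_plans, cost_fn, plan_fn, neighbors_fn, max_iterations):
    plans = dict(initial_plans)                           # current plan p_i of every robot
    for _ in range(max_iterations):
        deltas, alternatives = {}, {}
        for r in robots:
            P_M = {m: plans[m] for m in neighbors_fn(r)}  # teammates' current plans
            alternatives[r] = plan_fn(r, P_M)             # best alternative plan given P_M
            deltas[r] = cost_fn(plans[r], P_M) - cost_fn(alternatives[r], P_M)
        best = max(robots, key=lambda r: deltas[r])
        if deltas[best] <= 0:                             # no robot has a better alternative
            break
        plans[best] = alternatives[best]                  # only the best-gaining robot switches
    return plans
```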

4.3 The increasing dependency algorithm


Algorithm 3 shows the increasing dependency algorithm.

Informally, it iteratively computes and saves the conditionally "optimal" plan for each robot given the other robots' current plans. In each iteration, the conflict penalty increases (from zero in the first iteration to its full cost in the last iteration) and the benefit of enabling a synergy increases (from zero to the full cost reduction).

\(\varTheta \) is an important parameter that represents how many rounds of negotiations the robots can perform before finalizing their plans, where a negotiation means a robot updates its plan based on plans of (not necessarily all) its teammates. The value of \(\varTheta \) influences the performance of increasing dependency in the following way. In Lines 7–12, we enter a for-loop that has \(\varTheta + 1\) iterations, where \(\alpha \) is a negotiation depth parameter that incrementally grows by \(1/\varTheta \) in each iteration (Line 10). The loop continues until \(\alpha \) reaches 1. Intuitively, the negotiation depth measures how much a robot considers its teammates: when \(\alpha = 0\), it totally "ignores" its teammates (conflicts have no cost and collaboration failures have high costs); when \(\alpha = 1\), it considers its teammates as important as itself (costs are not discounted). Inter-dependent planning is conducted in Line 11, where we compute the optimal conditional plan while minimizing the conditional plan cost given the current plans of its \({M^{\prec i}} \subseteq M\) dependent teammates that precede it in the planning order \({\mathcal {N}^{\prec }}_j\), which is a predefined (arbitrary) order over \(\mathcal {N}\).
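
The following is a simplified, centralized sketch of ID following the description above; it is an illustration, not Algorithm 3 itself. The assumed callback plan_fn(robot, preceding_plans, alpha) minimizes the conditional plan cost with interaction costs (conflict penalties and synergy benefits) scaled by the negotiation depth \(\alpha \), and ordered_robots encodes the predefined planning order.

```python
def increasing_dependency(ordered_robots, plan_fn, theta):
    # alpha = 0: every robot plans independently (interactions carry no discounted cost).
    plans = {r: plan_fn(r, preceding_plans={}, alpha=0.0) for r in ordered_robots}
    alpha = 0.0
    for _ in range(theta):                       # theta further rounds of negotiation
        alpha = min(1.0, alpha + 1.0 / theta)    # negotiation depth grows toward 1
        for i, r in enumerate(ordered_robots):   # robots re-plan in the fixed order
            preceding = {m: plans[m] for m in ordered_robots[:i]}  # plans of M^{<i}
            plans[r] = plan_fn(r, preceding_plans=preceding, alpha=alpha)
    return plans
```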

The difference between the increasing dependency algorithm and the best alternative algorithm is that in the best alternative algorithm the agents calculate alternative plans according to the full costs of plan dependencies. On the other hand, in the increasing dependency algorithm, agents explore the range of solutions between no dependency and full dependency to detect better solutions that lie in between these two extremes.

4.4 Illustrative examples

We make the following assumptions in the examples below:

  1. Robots conflict at a node if they arrive at the same time step (even when they are moving in the same direction).

  2. Costs of conflicts are cumulative, i.e., the cost of two conflicts is higher than the cost of one conflict. This allows the algorithm to minimize the probability of conflicts when it is not possible to avoid them completely.

  3. Robot i starts at \(s_i\) and plans to go to \(g_i\).

Fig. 4 Example of the different behavior of the two algorithms, where increasing dependency outperforms best alternative. With the best alternative algorithm, agents \(r_1\) and \(r_2\) conflict and the total cost is 105, while the increasing dependency algorithm results in no conflict and a solution cost of 10

An illustrative example is shown in Fig. 4 to reflect the difference between the algorithms' performance. In this example increasing dependency outperforms best alternative. Consider three robots (\(r_1, r_2\), and \(r_3\)) with their respective start and goal states (\(s_i \rightarrow g_i\)). Nodes A, B, and C represent constrained states, which cause conflicts if two agents reach them at the same time step. The edges represent actions that transition from one state to another, while the numbers next to each edge represent the cost of taking that action. Conflict costs are modeled by a very large number \(\gg 100\). In the best alternative algorithm, the agents would start by selecting their best independent plans:

  • \(r_1 : s_1 \rightarrow A \rightarrow C \rightarrow g_1\)

  • \(r_2 : s_2 \rightarrow A \rightarrow C \rightarrow g_2\)

  • \(r_3 : s_3 \rightarrow B \rightarrow g_3\)

This set of plans creates two conflicts for robots \(r_1\) and \(r_2\) in nodes A and C. Robot \(r_1\)’s best alternative is to take an expensive action (100) directly to its goal \(g_1\). Robot \(r_2\)’s best alternative is to go through node B and still have one conflict with robot \(r_3\). In this case only robot \(r_1\) will switch and the resulting solution cost is 105.

In the increasing dependency algorithm, the initial plans would be the same, but as the conflict costs increase to 2, agent \(r_2\) will switch to the plan that goes through node B, followed by agent \(r_3\) switching to a direct action to \(g_3\), resulting in no conflict and a solution cost of 10, which is also the optimal solution in this case.

Fig. 5 Example of the different behavior of the two algorithms, where best alternative outperforms increasing dependency

Figure 5 provides an example where best alternative outperforms increasing dependency. Consider two robots (\(r_1, r_2\)) with their respective start and end states (\(s_i \rightarrow g_i\)). Node A represents a constrained state which causes a conflict if two agents reach it at the same time step. The edges represent actions that transition from one state to another while the numbers next to each edge represent the cost of taking that action. Conflict costs are modeled by a cost of 100 for each robot. In the best alternative algorithm, the agents would start by selecting their best independent plans:

  • \(r_1 : s_1 \rightarrow A \rightarrow g_1\)

  • \(r_2 : s_2 \rightarrow A \rightarrow g_2\)

This set of plans creates a conflict for robots \(r_1\) and \(r_2\) at node A. Robot \(r_1\)'s best alternative is to take \(s_1 \rightarrow g_1\) directly to its goal with a cost of 4, resulting in \(\delta _1=101-4=97\). Robot \(r_2\)'s best alternative is to go directly to its goal with a cost of 2, resulting in \(\delta _2=100-2=98\). In this case only robot \(r_2\) will switch, and the resulting solution cost is \(1+2=3\), which is the optimal solution.

In the increasing dependency algorithm, the initial plans would be the same, but in the first round of negotiation, as the conflict costs increase to 5, agent \(r_1\) will switch to the plan that goes directly to \(g_1\), incurring a cost of 4. Agent \(r_2\), which is second in the planning order, would not switch its plan since it now has no conflict, resulting in a solution cost of 4, which is not the optimal solution in this case.

4.5 Baseline algorithms

A very crude baseline to compare inter-dependent planning algorithms against is simply having the agents greedily compute individual plans without considering those of other agents (which is the initial stage of the iidp algorithms' planning process). We refer to this baseline as independent planning.

A slightly more advanced baseline would be a single sequence of inter-dependent planning where each agent plans according to the plans of the agents in front of it in a set order. We call this algorithm single order.

In fact, we can see both of these baseline algorithms as variations of the increasing dependency algorithm. When \(\varTheta =0\), the robots compute plans independently, because we define \(\alpha =0\) which results in the independent planning algorithm. When \(\varTheta =1\), there is only one round of negotiation which results in the single order configuration. The performance of single order is sensitive to the order of the robots being planned for, because a given robot considers all robots in front of it (in plan queue \(Q^M\)) but none of the robots after it. An extreme case is that the Nth robot’s updated plan is not considered by any of its teammates. When \(\varTheta >1\), there are multiple rounds of negotiations and we get the intended increasing dependency algorithm.
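
In terms of the increasing_dependency sketch above, the two baselines and the full algorithm differ only in the assumed negotiation depth:

```python
# Hypothetical usage of the increasing_dependency sketch (helpers are assumptions).
plans_independent  = increasing_dependency(robots, plan_fn, theta=0)  # independent planning
plans_single_order = increasing_dependency(robots, plan_fn, theta=1)  # single order
plans_id           = increasing_dependency(robots, plan_fn, theta=4)  # increasing dependency
```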

Finding the best order requires going over all possible N! orderings. As another baseline, we propose the best order algorithm which provides the solution achieved by the best possible sequential planning order.


We present examples as evidence that neither the best order nor the increasing dependency algorithm always finds the optimal plan.

Fig. 6 Two examples of avoiding conflicts. The left diagram shows a case where increasing dependency outperforms best order; the right diagram shows a three-robot example where best order finds the optimal plan and increasing dependency fails. Each node in the graph represents a state of a single agent, and each edge is an action that can be taken from the connected node. There are no self edges (an agent can never stay in its current state), and no more than one agent can be in the same state

Figure 6 shows an example where increasing dependency outperforms best order (left) and an example where best order outperforms increasing dependency (right).

In the left diagram of Fig. 6, the individual plans can possibly conflict at nodes A and D. increasing dependency successfully avoids both conflicts by suggesting plan (\(s_1 \rightarrow A \rightarrow C \rightarrow g_1\), \(s_2 \rightarrow B \rightarrow D \rightarrow g_2\)), producing the optimal solution. best order will try two orderings: \(1 \rightarrow 2\) and \(2 \rightarrow 1\). single order with planning order \(1 \rightarrow 2\) generates the plan (\(s_1 \rightarrow A \rightarrow D \rightarrow g_1\), \(s_2 \rightarrow B \rightarrow D \rightarrow g_2\)), which results in a conflict at D. Similarly, single order with the planning order \(2 \rightarrow 1\) generates a plan that causes a conflict at A. In this example, increasing dependency outperforms best order.

The right diagram of Fig. 6 shows a three-robot example where best order outperforms increasing dependency. best order is able to find the optimal plan (with the ordering \(1 \rightarrow 2 \rightarrow 3\)): (\(s_1 \rightarrow A \rightarrow D \rightarrow g_1\), \(s_2 \rightarrow B \rightarrow E \rightarrow g_2\), \(s_3 \rightarrow C \rightarrow F \rightarrow g_3\)). increasing dependency produces a locally optimal solution (\(s_1 \rightarrow A \rightarrow D \rightarrow g_1\), \(s_2 \rightarrow A \rightarrow D \rightarrow g_2\), \(s_3 \rightarrow B \rightarrow E \rightarrow g_3\)). \(r_1\) and \(r_2\) cannot switch to conflict-free plans due to the plan of \(r_3\), while \(r_3\) has no incentive to change its plan.

4.6 Complexity analysis

The complexity of the different algorithms can be analyzed in two aspects, computation complexity and communication complexity. The presented analysis is a function of four variables: N which is the number of robots, C which denotes the complexity of a single inter-dependent planning operation (\(\mathcal {C}\) in Line 11), SR which is the complexity of a send/receive operation in which a single robot sends out its plan to the rest of the robots and receives their plans, and \(\varTheta \) which is the negotiation depth.

  1. The best alternative algorithm has \(O(\varTheta \cdot (C+N \cdot SR))\) complexity in the worst case, since the algorithm runs at most \(\varTheta \) times, and at each iteration all the robots compute their plans simultaneously (C). The send/receive operation is multiplied by N to account for systems where the communication is sequential.

  2. The increasing dependency algorithm requires that each robot performs one inter-dependent planning operation and one send/receive operation at each negotiation depth. The inter-dependent planning for each robot is done in a sequential manner (one robot at a time), summing up to a total complexity of \(O(\varTheta \cdot N \cdot (C+SR))\).

  3. The best order algorithm has \(O(N! \cdot N \cdot (C+SR))\) complexity because for every possible order (N!), every robot, in order (N), performs a single inter-dependent planning operation (C) and a send/receive operation (SR). The best order algorithm is therefore not applicable to large instances and will only be discussed in relation to small-scale examples.

Note that none of the above variants of the iidp framework produce a contingency plan. Uncertainty in action duration might lead to a plan’s failure during runtime. In such cases, the iidp framework is reactivated in order to compute a valid plan. In theory, a plan might fail at every time step resulting in a new call to the iidp planner. In such cases, the complexity analysis provided above should be multiplied by a factor of T, where T is the planning horizon.

5 An instantiation of IIDP

In this section, we instantiate iidp using a multi-robot navigation task. Symbolic task planning techniques are needed because robots need to reason about between-room accessibility and plan to open doors as needed. The single-robot version of this domain (without modeling noisy action durations) has been studied in existing research (Khandelwal et al. 2014; Zhang et al. 2015). In the following, we present our symbolic planner, shifted-Poisson distributions for accumulating temporal uncertainty over actions, and a novel algorithm for computing conditional plan cost under temporal uncertainty.

5.1 Single-robot symbolic planning using bc

We use the action language bc (Lee et al. 2013) for symbolic planning in this work because it can formalize defaults and recursively defined action effects (e.g., two rooms are accessible to each other if each of them is accessible to a third room). However, the algorithms developed in this article are not restricted to specific action languages or symbolic planners. We adapt existing formulations of bc-based, single-robot navigation tasks (Khandelwal et al. 2014; Zhang et al. 2015) for multi-robot settings. For instance, we use the following rules to define the ownership between rooms and doors:

$$\begin{aligned} \begin{array}{l} hasdoor(l1,d1).~hasdoor(l1,d2).~hasdoor(l2,d2).~\ldots \\ \mathbf{default}~~\lnot hasdoor(L,D). \end{array} \end{aligned}$$

where L and D represent a room and a door respectively. The last rule above is a default for reasoning with incomplete knowledge: it is believed that room L does not have door D unless there is evidence supporting the contrary.

Action description, \({\mathcal {A}}\), includes the rules that formalize the preconditions and effects of actions that can be executed on each robot. We use fluents open(D), facing(D), beside(D), and loc(L) to represent door D is open, the robot is facing door D, the robot is beside door D, and the robot is in room L. Robot identities are not included in the representation of a robot’s location, loc(L), because a robot’s state does not model the state of other robots.

Robot actions include approach(D), opendoor(D), cross(D), and waitforopen(D), where waitforopen(D) enables a robot to wait for another robot to open door D and is only useful in multi-robot systems. As an example of how actions are modeled, we arbitrarily select action cross(D) and present its definition below. Crossing door D changes the robot's location from \(L_1\) to \(L_2\), the room on the other side of door D. The last three rules below describe executability conditions, e.g., cross(D) cannot be executed if door D is not open.

$$\begin{aligned} \begin{array}{l} cross(D)~~\mathbf{causes}~~\lnot facing(D).\\ cross(D)~~\mathbf{causes}~~loc(L_2)~~\mathbf{if}~~loc(L_1),~acc(L_1,D,L_2).\\ \mathbf{nonexe}~~cross(D)~~\mathbf{if}~~loc(L),~\lnot hasdoor(L,D).\\ \mathbf{nonexe}~~cross(D)~~\mathbf{if}~~\lnot facing(D).\\ \mathbf{nonexe}~~cross(D)~~\mathbf{if}~~\lnot open(D). \end{array} \end{aligned}$$

Given a planning goal, a planner can find many solutions. We select the one that minimizes the overall cost. In implementation, to model the progress of navigation actions (approach, in our case), we discretize distance by representing each corridor using a set of grid cells. Accordingly, each approach action is replaced by a sequence of actions that lead the robot to follow waypoints. We use clingo4 for solving bc programs (Gebser et al. 2014).
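
As a purely illustrative sketch of this discretization (the names expand_approach and corridor_cells are hypothetical and not part of our planner), an approach(D) action can be expanded into per-cell waypoint actions:

```python
def expand_approach(door, corridor_cells):
    # corridor_cells[door]: ordered grid cells along the corridor toward the door.
    return [f"goto(cell_{c})" for c in corridor_cells[door]]  # one symbolic action per cell
```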

5.2 Modeling noisy action durations

In single-robot systems, following the plan generated by a symbolic planner, such as our bc-based planner, a robot can execute actions to optimally achieve the goal. When multiple robots share an environment, their plans might interact such that their independently-computed optimal plans become suboptimal at runtime. In order to leverage such interactions toward sharing resources and constructing action synergy, it is necessary to model the temporal uncertainty (in noisy action durations) that propagates over actions.

This subsection presents a novel model for representing and reasoning about temporal uncertainty in the noisy durations of navigation actions. This representation is used not only for modeling individual actions' noisy durations but also for accumulating the uncertainty over a sequence of actions. In this article, we consider only the temporal uncertainty from the navigation action approach(D). Deriving the probability density function (pdf) of the duration of approach(D) builds on the following assumptions:

  1. Unless explicitly delayed, the robots move at constant velocity v. Unless specified otherwise, \(v = 1\) in this article.

  2. A human obstacle appears within every unit distance at a known rate, and the appearances are independent of each other. We use \(\lambda \) to denote this rate.

  3. While taking action approach(D), each obstacle appearance causes a delay of a known amount of time, \(\delta \).

  4. Non-navigation actions do not introduce extra uncertainty at runtime, and navigation actions cannot be delayed more than once.

Following Assumptions 1 and 2, we can use a Poisson distribution to model the number of delays caused by human appearances in a unit time and its corresponding pdf is:

$$\begin{aligned} \hat{f}(k,\lambda )=\nicefrac {\lambda ^k e^{-\lambda }}{k!} \end{aligned}$$
(2)

where e is Euler’s number and k is the number of delays.

Proposition 1

If X and Y are two independent discrete random variables with a Poisson distribution: \(X \sim Poisson(\lambda _1)\) and \(Y \sim Poisson(\lambda _2)\), then their sum \(Z = X + Y\) follows another Poisson: \(Z \sim Poisson(\lambda _1 + \lambda _2)\) (Knill 1994).

According to Proposition 1, when a robot travels for time t (instead of unit time), the number of delays, \(k'\), accumulates over time and follows another Poisson distribution with a pdf of \(\hat{f}(k', \lambda ')\). Following Assumption 1, parameter \(\lambda '\) is a function of traveled distance d:

$$\begin{aligned} \lambda '(d)=\lambda \cdot t(d)=\lambda \cdot \nicefrac {d}{v} \end{aligned}$$
(3)

Since \(k'\) follows a Poisson distribution, we can compute the overall time needed for traveling a distance of d:

$$\begin{aligned} t = t^{act} + t^{del} =\nicefrac {d}{v} + k'\cdot \delta \end{aligned}$$
(4)

where \(t^{act} = d/v\), as a linear function of distance d, represents the acting time, and \(t^{del}=k'\cdot \delta \) is the delayed time.

Using Eqs. 3 and 4, we can see the overall navigation time t follows a shifted Poisson distribution with pdf:

$$\begin{aligned} f\big (t,\lambda '(d)\big )= \nicefrac {\big (\lambda '(d)\big )^{\frac{t-d/v}{\delta }}\cdot e^{-\lambda '(d)}} {\big (\frac{t-d/v}{\delta }\big )!} \end{aligned}$$
(5)

Figure 7a visualizes two example pdfs. For instance, it is most likely that traveling a distance of \(d = 50\) at velocity \(v = 1\) takes 60 time units when possible delays are modeled (instead of 50 in an obstacle-free domain). It also shows that a longer distance produces more uncertainty in completion time. We also collected navigation times using a real robot, and the results shown in Fig. 7b suggest that a shifted Poisson distribution (with properly set parameters) can represent the noisy durations of navigation actions well.
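
For concreteness, the shifted-Poisson pdf of Eq. (5) can be evaluated directly; the sketch below uses the parameters of Fig. 7a and is only an illustration of the model.

```python
from math import exp, factorial

def navigation_time_pdf(t, d, v=1.0, delta=5.0, lam_rate=0.05):
    """Probability that traveling distance d takes total time t, per Eq. (5)."""
    lam = lam_rate * d / v                 # lambda'(d) from Eq. (3): expected number of delays
    k = (t - d / v) / delta                # number of delays implied by completion time t
    if k < 0 or abs(k - round(k)) > 1e-9:  # t must equal d/v + k*delta for an integer k >= 0
        return 0.0
    k = int(round(k))
    return lam ** k * exp(-lam) / factorial(k)

# With d = 50, the expected number of delays is 2.5 and its mode is 2, so the most
# likely completion time is 50 + 2 * 5 = 60 time units, matching Fig. 7a.
```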

Fig. 7 a pdfs of two shifted Poisson distributions used for modeling the noisy durations of navigation actions (\(v=1\), \(\delta =5\), and \(\lambda =0.05\)); a longer distance brings more uncertainty. b A real robot navigated a reasonably busy corridor (28 m) 164 times; the shifted Poisson models the action's noisy durations well

We further drop the explicit dependence on d, substituting \(d/v\) with \(t^{act}\), and use \( Dist (t^{act},\lambda ')\) to represent a distribution over possible lengths of completion time. Modeling noisy action durations in this way paves the way to further investigating how uncertainty accumulates over plans that include a sequence of actions. For instance, \(t^{act} = 50\) and \(\lambda ' = 2.5\) correspond to the (blue) circle-mark curve in Fig. 7a. Since (we assume) non-navigation actions do not introduce extra uncertainty at runtime, Equation 4 can be directly applied to modeling the distribution over possible lengths of time consumed by a sequence of actions, including potentially both navigation and non-navigation actions. A plan of the form \(\langle a_0,a_1,\ldots \rangle \) can be represented as below to further model the distribution over possible lengths of completion time of each action. We call p an extended plan (or simply a plan).

$$\begin{aligned} p:~\langle ~(a_0,t^{act}_0,\lambda '_0), (a_1,t^{act}_1,\lambda '_1), \ldots ~\rangle \end{aligned}$$

According to Proposition 1, the time consumed by executing the first K actions in plan p follows a distribution of:

$$\begin{aligned} Dist \left( \sum _{k=0}^{K-1}{t_k}^{act}, \sum _{k=0}^{K-1}{\lambda '_k}\right) \end{aligned}$$

Therefore, \( Dist (t^{act},\lambda ')\) is a novel distribution that can model the temporal uncertainty that accumulates over a sequence of actions in the robot navigation problem. Note that other application domains may require very different representations (pdfs) for modeling their noisy action durations; this subsection, as an illustrative example and for the purpose of evaluating iidp, simply presents a concise pdf representation for navigation actions.
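
As a small illustration of this accumulation (the tuple layout of the extended plan below is an assumption made for the sketch), the parameters of the completion-time distribution of a plan prefix are simply the sums above:

```python
def prefix_duration_dist(extended_plan, K):
    """Parameters (t_act, lambda') of Dist(.,.) for the first K actions of an
    extended plan stored as tuples (action, t_act, lambda')."""
    t_act = sum(t for (_a, t, _l) in extended_plan[:K])
    lam = sum(l for (_a, _t, l) in extended_plan[:K])
    return t_act, lam   # the pdf is then Eq. (5) with d/v replaced by t_act

# Example with illustrative numbers: a 30-cell approach (t_act = 30, lambda' = 1.5)
# followed by opendoor (t_act = 20, lambda' = 0, per Assumption 4) gives Dist(50, 1.5).
```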

5.3 Computing inter-dependent plan cost: \(\mathcal {C}\)

In a two-robot system that includes robots r and \(r'\), p and \(p'\) are the robots' extended plans. The inter-dependent plan cost of \(p'\) given p is the estimate of the total cost robot \(r'\) will incur if r and \(r'\) simultaneously execute their plans, p and \(p'\), respectively. Unlike in single-robot planning, we have to consider possible conflicts and door-sharing behaviors (and any conflicts or synergies in general) in computing inter-dependent plan costs. We first compute the probability of robot \(r'\)'s navigation action \(a'\) overlapping in time with p's navigation action a (the parameter \(\lambda \) is omitted from the pdfs); overlap in space is handled by the symbolic task planner:

$$\begin{aligned} { Pr }^{ ovlp }(a,a')=1&- \int \limits _0^{\infty } \int \limits _{t_2}^{\infty } f^s_1(t_1)f^c_2(t_2)\,\mathrm{d}t_1\,\mathrm{d}t_2\nonumber \\&- \int \limits _0^{\infty } \int \limits _{t_1}^{\infty } f^c_1(t_1)f^s_2(t_2)\,\mathrm{d}t_2\,\mathrm{d}t_1 \end{aligned}$$
(6)

where \(f^s_1\) and \(f^c_1\) are the pdfs of starting and completion times of action a; \(f^s_2\) and \(f^c_2\) are the pdfs of starting and completion times of action \(a'\).

The first double integral computes the probability of the completion of \(a'\) being earlier than the start of a, and the second computes the probability of the start of \(a'\) being after the completion of a.
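
When the two double integrals are inconvenient to evaluate in closed form, each term of Eq. (6) can be estimated by sampling; the sketch below assumes samplers for the four marginal distributions (e.g., built from the shifted-Poisson model of Sect. 5.2).

```python
def pr_overlap(sample_start_a, sample_end_a, sample_start_b, sample_end_b, n=100_000):
    """Monte Carlo estimate of Eq. (6): probability that a and a' overlap in time."""
    ends_before = sum(sample_end_b() < sample_start_a() for _ in range(n)) / n   # a' done before a starts
    starts_after = sum(sample_start_b() > sample_end_a() for _ in range(n)) / n  # a' starts after a ends
    return 1.0 - ends_before - starts_after
```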

We use \(t^{wait}(a,a')\) to represent the expected time robot \(r'\) waits for r to open door D, where \(a'\) is \(r'\)'s waitforopen(D) action.

$$\begin{aligned} t^{\textit{wait}}(a,a')= \int \limits _0^{\infty } \int \limits _{t_2}^{\infty } (t_1 - t_2)f_1^c(t_1)f_2^s(t_2) \,\mathrm{d}t_1\,\mathrm{d}t_2 \end{aligned}$$
(7)

where \(f_1^c\) is the pdf of the completion time of action a; and \(f_2^s\) is the pdf of the start time of action \(a'\).

It is possible that robot r has finished the action of going through door D before robot \(r'\) arrives. In this case, robot \(r'\) may have bypassed closer doors and still has to reopen door D itself. We compute the probability of such failures:

$$\begin{aligned} \textit{Pr}^{\textit{fail}}(a,a')= \int \limits _0^{\infty } \int \limits _{t_1}^{\infty } f_1^c(t_1)f_2^s(t_2) \,\mathrm{d}t_2\,\mathrm{d}t_1 \end{aligned}$$
(8)

where \(f^c_1\) is the pdf of the time at which robot r completes action a (the action of opening door D), and \(f^s_2\) is the pdf of the time at which robot \(r'\) starts action waitforopen(D), i.e., \(a'\).
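
Both quantities can likewise be estimated by sampling when closed-form integration is awkward; the samplers below are assumed to draw from the completion-time distribution of a and the start-time distribution of \(a'\).

```python
def wait_time_and_fail_prob(sample_open_done, sample_wait_start, n=100_000):
    """Monte Carlo estimates of Eq. (7) (expected waiting time) and Eq. (8)
    (probability that the door-sharing synergy fails)."""
    wait_sum, fails = 0.0, 0
    for _ in range(n):
        t_open = sample_open_done()      # completion time of a (r opens door D)
        t_wait = sample_wait_start()     # time r' starts waiting at door D
        if t_wait > t_open:
            fails += 1                   # r' arrives too late and must open the door itself
        else:
            wait_sum += t_open - t_wait  # r' waits until r finishes opening
    return wait_sum / n, fails / n
```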


Algorithm 5 presents our algorithm for computing the inter-dependent plan cost (in our navigation domain). While computing the cost of waitforopen(D), we need to consider both the cases that have synergy and those that fail (Lines 6–7). Although the form of temporal uncertainty varies significantly over different robot actions, this approach can be easily applied to other domains for sharing limited resources and constructing "wait-for-action"-style synergies, as long as the pdfs of the actions' durations are available.
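
The overall computation can be pictured with the following high-level sketch, which is written in the spirit of Algorithm 5 rather than reproducing it; all helper names (action_cost, same_corridor, pr_overlap_of, is_waitforopen, opens_same_door, wait_and_fail_of) are assumed domain functions.

```python
def inter_dependent_cost(plan_b, plan_a, conflict_cost, door_open_cost):
    cost = sum(action_cost(a) for a in plan_b)                    # individual action costs
    for a_b in plan_b:
        for a_a in plan_a:
            if same_corridor(a_a, a_b):                           # constrained resource
                cost += conflict_cost * pr_overlap_of(a_a, a_b)   # expected conflict penalty, Eq. (6)
            if is_waitforopen(a_b) and opens_same_door(a_a, a_b): # door-sharing synergy
                t_wait, p_fail = wait_and_fail_of(a_a, a_b)       # Eqs. (7) and (8)
                cost += t_wait + p_fail * door_open_cost          # wait, or reopen on failure
    return cost
```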

6 Experiments

We have implemented iidp in two domains:

  1. An abstract domain that enables testing the algorithms' performance on large-scale instances of agents, state, action, and interaction spaces.

  2. An instantiated multi-robot navigation domain where we test both in simulation and on real robots. In this domain we integrate temporal uncertainty into the model. Simulation experiments in this domain were conducted using a realistic multi-robot simulation environment (gazebo; Koenig and Howard 2004) and a much faster abstract simulator that does not have an interface for visualization or a physics engine for simulating conflict consequences. Noisy action durations in this abstract simulator are sampled from predefined distributions.

Fig. 8 Total cost for different numbers of agents. increasing dependency and best alternative overlap

Fig. 9 Percentage of cost reduction from the independent planning algorithm cost

Experiments were conducted to investigate how different values of \(\varTheta \) (single order vs. increasing dependency) affect the performance in reducing overall cost, to evaluate the necessity of modeling noisy action durations using our probabilistic model, and to study the performance of iidp in systems that include varying numbers of robots.

6.1 Abstract domain simulations

In this section we test the iidp framework and algorithms on an abstract domain. The software simulator we developed to represent mrpcs problems was designed to enable running multiple experiments with varying numbers of agents and different sizes of state, action, and interaction spaces. The simulator represents the planning domain of each agent as a graph with states as nodes and actions as edges. An mrpcs problem can be generated by randomly selecting state transitions and agent interactions. We use the problem generator to compare the performance of the iidp algorithms with respect to the number of agents.
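
A hypothetical sketch of such a generator is shown below (it is not the simulator used in our experiments): each agent's planning domain is a random directed graph, and interactions are random pairs of actions across agents, each marked as a conflict (extra cost) or a synergy (cost reduction).

```python
import random

def generate_problem(num_agents, num_states=10, actions_per_state=4, interactions_per_agent=100):
    agents = []
    for _ in range(num_agents):
        actions = []
        for s in range(num_states):
            for t in random.sample(range(num_states), actions_per_state):
                if t != s:
                    actions.append((s, t))                    # a directed state transition
        start, goal = random.sample(range(num_states), 2)
        agents.append({"actions": actions, "start": start, "goal": goal})
    interactions = []
    for _ in range(interactions_per_agent * num_agents):      # |T| = N * 100 as in Sect. 6.1
        i, j = random.sample(range(num_agents), 2)
        a_i, a_j = random.choice(agents[i]["actions"]), random.choice(agents[j]["actions"])
        kind = random.choice(["conflict", "synergy"])
        interactions.append((i, a_i, j, a_j, kind))
    return agents, interactions
```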

Fig. 10 Number of conflicts for different numbers of agents

Fig. 11 Number of synergies for different numbers of agents

The parameter settings for our experiments were chosen as follows: the number of agents \(N \in \{2,\ldots ,50\}\), the number of states in each agent's planning domain \(S=10\), the number of actions \(A=4 \cdot S\), and the total number of possible interactions \(|\mathcal {T}|=N \cdot 100\). The average number of actions per agent from start state to goal is 5. Both the increasing dependency parameter and the best alternative parameter were set to \(\varTheta =2 \cdot A\). The cost of every action is 1, the cost of every conflict is an additional 1, and the benefit of a synergy is a reduction of 1 in the cost of the action. All costs are per action per agent.

The algorithms that were tested are: increasing dependency (ID), best alternative (BA), and independent planning (IP). For every number of agents we generated 100 random problems, where the random elements were the start states and end goals of the agents, the mapping of actions to states, and the pairs of actions that had conflicts or synergies. In total, 4900 different abstract domains were generated, and all three algorithms ran on all domains. The results of every run were recorded and averaged to compare the algorithms' performance, as displayed in Figs. 8–11.

Figure 8 displays the total cost incurred by all the agents as they perform their plans. Each point in the graph is an average over 100 experiments. Both ID and BA outperform IP, with \(5.7 \%\) and \(5.5 \%\) average cost reduction respectively. These differences are statistically significant with p-value \(< 0.05\), using a paired T-test. The difference between ID and BA, however, is not statistically significant, but we do observe that ID outperforms BA in 42 out of 49 scenarios in terms of average cost. To highlight this, Fig. 9 offers a closer look at the cost reduction each algorithm achieves relative to the IP baseline.
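For reference, the significance test used here is a standard paired T-test; a minimal sketch using SciPy, with purely illustrative cost arrays standing in for the per-problem averages behind Fig. 8, looks as follows.

```python
from scipy import stats

# illustrative per-problem average costs (one entry per generated problem);
# the real comparison uses the 100 problems behind each point in Fig. 8
costs_ip = [10.2, 11.0, 9.8, 10.5, 10.9, 10.1]
costs_id = [9.6, 10.4, 9.3, 9.9, 10.2, 9.5]
costs_ba = [9.7, 10.5, 9.4, 10.0, 10.3, 9.6]

print("ID vs IP:", stats.ttest_rel(costs_id, costs_ip).pvalue)
print("BA vs IP:", stats.ttest_rel(costs_ba, costs_ip).pvalue)
print("ID vs BA:", stats.ttest_rel(costs_id, costs_ba).pvalue)
```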

Figure 10 shows the average number of conflicts that the agents face. Here, ID and BA also outperform IP significantly, with 3.8 and 4.1 conflicts on average respectively, compared to 6 conflicts for IP. Although the difference in the average number of conflicts between ID and BA is not statistically significant, we do observe that ID outperforms BA in all 49 scenarios in terms of average number of conflicts.

Figure 11 shows the average number of synergies achieved by the agents. Again, ID and BA outperform IP significantly, with 8.2 and 8 synergies on average respectively, compared to 5.7 synergies for IP. ID outperforms BA in 42 out of 49 scenarios in terms of average number of synergies.

Our conclusion from this empirical evaluation is that the iidp algorithms are scalable and robust to the number of agents and the size of the domain. Both proposed algorithms significantly outperform independent planning, and ID has a small advantage over BA. To test the robustness of these results, we experimented with three times the number of interactions and found that the results were qualitatively similar. We therefore continue our analysis of a practical domain with the ID algorithm.

6.2 Multi-robot navigation domain

In this section, we evaluate the iidp algorithms in the multi-robot navigation domain.

Gazebo simulation (Independent Planning versus Increasing Dependency) Fig. 12 (left) shows our gazebo testing environment. We add human walkers (right) to simulate pedestrians that cause delays to the robots’ navigation actions. The floor map is divided into grid cells, and taking a symbolic action (in one of the four directions) moves the robot to an adjacent cell when there are no obstacles. Two robots need to navigate from their initial positions (green rectangles) to their goal positions (red ellipses). The two robots start at the same time, and we record the completion time for each of the robots. Performance is evaluated based on the robots’ overall completion time.

Fig. 12
figure 12

gazebo simulation environment (and a picture of a human walker blocking a robot)

Fig. 13
figure 13

Costs of robots R1 and R2 in 45 trials collected using gazebo simulation environment

Table 1 Average cost (completion time in seconds) and standard deviation from the gazebo simulation experiments (reported in Fig. 13)

Experiments in gazebo were conducted to visualize and validate the whole process of multi-robot plan generation and execution, and to compare increasing dependency to a baseline that computes plans for the robots independently (independent planning). The results, in the form of execution costs of the two robots, are shown in Fig. 13. The cluster of red circles in the bottom-left indicates that the increasing dependency algorithm reduces the overall plan cost. The blue squares on the left, for instance, correspond to trials where robot R2 avoids R1 by taking a big detour (a locally optimal solution). Table 1 shows the averages of the same set of results collected from gazebo. Considering both robots, increasing dependency (\(M = 1\), \(\varTheta = 2\)) significantly reduces the average completion time from more than 250 s to about 120 s.

Fig. 14
figure 14

Abstract simulation environment, where action durations are generated by sampling from pre-specified distributions. Doors are marked by the letter d and corridors by cor

Fig. 15
figure 15

Planning for a two-robot system (evaluating \(\varTheta \)): R1 and R2 need to move from d2 to d9 and from d7 to d4 respectively

Abstract simulation (Single Order vs. Increasing Dependency) In order to run many more trials, we use an abstract simulator: navigation actions’ noisy durations are sampled from a shifted Poisson distribution (Eq. 5); the conflict cost is 40; the waitforopen(D) failure cost is 12; and conflicts are possible only when both robots are taking navigation actions.Footnote 7 Figure 14 shows the domain map.
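As a small illustration of how the abstract simulator draws noisy durations, the sketch below samples from a shifted Poisson distribution, i.e., a fixed minimum duration plus a Poisson-distributed delay; the specific shift and rate values are placeholders, not the parameters of Eq. 5.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_navigation_duration(base_duration, delay_rate, size=None):
    """Shifted Poisson duration: a fixed minimum plus a Poisson delay."""
    return base_duration + rng.poisson(delay_rate, size=size)

# a navigation action that always takes at least 4 time units,
# with an expected extra delay of 1.5 units (placeholder parameters)
print(sample_navigation_duration(4, 1.5, size=10))
```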

The first set of experiments in abstract simulation was conducted to evaluate how the number of rounds of negotiations (\(\varTheta \) in Algorithm 3) affects the overall cost. Figure 15 reports the results: \(\varTheta = 0\) means no collaboration between robots (baseline); \(\varTheta = 1\) and \(\varTheta = 2\) correspond to single order and increasing dependency respectively. Each data point corresponds to results from 50 trials (the same holds for the following results unless stated otherwise). For instance, when robot R1 is delayed by 5 time units, increasing dependency reduces the overall cost from more than 80 to less than 50, and enables R1 and R2 to share door-opening actions. When one robot starts much earlier than the other (the two ends of these curves), the overall cost is about 45 regardless of whether collaboration is enabled, because the robots can hardly cause conflicts or share doors. Comparing the triangle and square curves in both subfigures, we find that increasing dependency enables more action synergies than single order (via sharing door-opening actions), even when the overall cost reduction introduced by such synergies is small.

Analysis of temporal uncertainty Our next set of experiments evaluates the need for modeling temporal uncertainty; the baseline does not model the noise in action durations (optimistically assuming no delays in navigation actions). R1 needs to move from d3 in corridor cor1 to door d4, so the best plan for R1 is to open and go through door d3 in any case. The head start of R1 varies in a relatively small range (\(\pm \,6\)).

Fig. 16
figure 16

Planning for a two-robot system (evaluating the need of modeling temporal uncertainty): R1 and R2 need to move from d3 (cor1 side) to d4 and from d7 to d5 respectively

Figure 16 reports the results of these experiments. When R1 has a head start of \(-1\), the overall cost of our increasing dependency algorithm is smaller than that of the baseline by more than ten (reduced from more than 50 to less than 40). Looking into this performance improvement, we find that robot R2 can either open the bottom door by itself or follow the first robot through the corridor door at the top. Without modeling temporal uncertainty (baseline), R2 is not aware of the risk of being delayed while moving upward. As a consequence, R2 will move toward door d3 (hoping to follow R1 through it), even though a single delay on the way can make it too late to catch up with R1’s door-opening action. The large variance in overall cost for the baseline reflects the fact that trials where robot R2 succeeds in following R1 through the door and trials where it fails produce very different overall costs. increasing dependency models the noisy action durations, enabling R2 to dynamically evaluate the uncertainty from its teammate and itself, and to balance the risk and potential benefit when selecting the best path.
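R2’s decision can be viewed as a comparison of expected costs under the duration model: open the bottom door alone, or head for d3 and risk missing R1’s opening. A minimal sketch of this trade-off, assuming a Poisson-distributed travel delay and placeholder costs (only the waitforopen failure cost of 12 matches the abstract simulator), is shown below.

```python
from scipy import stats

def follow_or_open(own_open_cost, follow_cost, failure_cost,
                   time_budget, base_travel, delay_rate):
    """Compare opening a door alone vs. waiting to follow a teammate.

    Travel time is base_travel plus a Poisson-distributed delay; following
    succeeds only if the robot reaches the door within time_budget.
    """
    slack = time_budget - base_travel
    p_in_time = stats.poisson.cdf(slack, delay_rate) if slack >= 0 else 0.0
    expected_follow = follow_cost + (1.0 - p_in_time) * failure_cost
    choice = "follow" if expected_follow < own_open_cost else "open own door"
    return choice, expected_follow, own_open_cost

# placeholder costs and timing, not the experiment's actual parameters
print(follow_or_open(own_open_cost=20, follow_cost=10, failure_cost=12,
                     time_budget=14, base_travel=12, delay_rate=2.0))
```

With these placeholder numbers, following is worthwhile despite the risk; with a smaller time budget or a larger delay rate, the same computation favors opening the bottom door, which is the behavior increasing dependency exhibits in Fig. 16.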

Fig. 17
figure 17

Planning for a three-robot system (overall cost under four configurations of increasing dependency): R1, R2 and R3 need to move from d1 to d9, from d6 to d4, and from d10 to d8 (cor1 side) respectively

Evaluation of the effect of \(\varTheta \) and M In a team that includes more than two robots, iidp algorithms have the option of considering only a subset of each robot’s teammates in conditional planning (specified by M in Algorithm 3). This set of experiments was conducted with a team of three robots to evaluate how the size of M affects the performance of iidp. Accordingly, we adjust the value of \(\varTheta \) (single order vs. increasing dependency) and the value of |M|. In this experiment, \(|M| = 1\) means M contains only the teammate right before \(r_i\) in the arbitrary order \({\mathcal {N}^{\prec }}_j\). Robots R2 and R3 have different head starts before R1’s plan execution. The subfigures of Fig. 17 report the results of nine different head start combinations. In each subfigure, the x-axis corresponds to one of the four increasing dependency configurations, and the y-axis corresponds to the overall cost. We do not see significant differences across the four increasing dependency configurations in most head start combinations. This matches our expectation that, when the robots’ plans do not overlap (much) in time, their plan executions do not affect each other, so conflicts and synergies are unlikely. In the middle-left and bottom-middle subfigures, we see that considering two other robots (instead of one) significantly reduces the cost of robot R3 and the overall cost.

Table 2 Mean and standard deviation values of the four configurations in Fig. 17. Given \(|M| = 2\), the average overall cost using increasing dependency is significantly different from that of single order (p-value = 0.0128)

Table 2 shows the performance of the four increasing dependency configurations. The reduction of average cost by considering the plans of all other robots is significant, regardless of \(\varTheta \)’s value: p-value = 0.03 when \(\varTheta = 1\), and p-value = 0.02 when \(\varTheta = 2\). Given that all other robots are considered in conditional planning, increasing dependency performs significantly better than single order (bold font in Table 2). However, when only one other robot is considered, we do not see a significant difference between increasing dependency and single order (the left two columns). Therefore, increasing dependency with \(|M|=2\) performs significantly better than all three other configurations.

Robot trial Collecting statistical results with multiple robots on navigation tasks can be difficult in practice: the robots sometimes take a very long time to finish a trial, especially when collaborations are not successful, and robot collisions can cause physical damage to the robot platforms and sometimes to the environment. However, to demonstrate that our methods can enable two real robots to collaborate by constructing an action synergy, we present an illustrative (successful) trial in the real world.

We have implemented the two configurations of increasing dependency and all actions formalized in action language \({\mathcal {BC}}\), including approach(D), opendoor(D), cross(D), and waitforopen(D), on a team of real robots.

Figure 18 shows a picture of R2 (robot on the right) waiting to follow R1 (robot on the left) through door D1, using increasing dependency.

Fig. 18
figure 18

Using increasing dependency, two robots construct action synergy by sharing a door-opening action: robot R1 asks for help from a human for opening the door and is executing the gothrough action, while robot R2 is waiting to follow R1 through the door

A video of this trial is available at: https://youtu.be/ADbH3sppLHQ.

7 Conclusions and future work

We introduce the multi-robot planning with conflicts and synergies (mrpcs) problem, and develop a novel, iterative inter-dependent planning (iidp) framework and algorithms. We propose two algorithms for iidp, increasing dependency and best alternative. We test the algorithms in an abstract domain which enables scaling to a large number of robots. We also instantiate iidp on a multi-robot navigation problem with temporal uncertainty, where we introduce a shifted Poisson distribution to represent robot navigation delays and present a novel algorithm for computing conditional plan cost. In experiments, we observe that the iidp algorithms bring significant improvements over baselines where robots do not coordinate their plans, or coordinate but do not model temporal uncertainty, and that increasing dependency enables more collaborative synergies and fewer conflicts compared to independent planning and single order. Finally, we implement iidp on real robots and present a demo where the robots share plans to realize a synergy.

There are a number of ways to improve on this line of research. For instance, Sect. 3 includes a list of assumptions, which might not hold in some domains. A relatively strong assumption made in this article is that the robots can reliably communicate among team members at zero cost. Future work can look into scenarios where robot communication is unreliable, delayed, or expensive. The single-robot, single-task assumption, as described in Sect. 3, can also be removed in future work, e.g., to consider tasks that require two robots to move a box that is too heavy for an individual robot. The real-world evaluation of our iidp framework primarily used a team of mobile robots conducting navigation tasks; a number of other multi-robot tasks could be used for evaluation, such as human guidance, dual-arm manipulation, and object delivery. Moreover, the developed iidp framework was evaluated only with a homogeneous team of robots; a natural follow-up is to apply it to heterogeneous teams or human-robot teams. The communication cost in human-robot teams will likely be much higher than in multi-robot systems. However, given the increasing presence of collaborative robots in everyday environments, the potential impact of applying iidp to human-robot collaboration domains is significant.