Introduction

In many real-world multi-robot systems, robots have to plan collision-free paths to different locations to execute different tasks. Today, thousands of warehouse robots already navigate fully autonomously to relocate inventory pods in automated fulfillment centers [1, 2] or deliver parcels in sortation centers [3]. In the coming years, autonomous aircraft-towing vehicles will tow aircraft from the runways to the terminal gates (and vice versa) at airports. Other examples include autonomous intersection management [4], forklift robot fleets [5, 6], game characters in video games [7], object-transportation robots [8], patrolling robots [9], service robots [10, 11], swarms of differential-drive robots and quadcopters [12,13,14], robots in formations [15], and other multi-robot systems [16].

Solving the path planning problem optimally for multiple robots is computationally challenging, especially for a large number of robots. However, the above real-world applications require computing high-quality collision-free paths for a large number of robots in a short computation time since shorter paths result in higher throughput or lower operating costs (since fewer robots are required to achieve the same throughput) of the systems.

Multi-Agent Path Finding

Many recent works in the artificial intelligence, robotics, and operations research communities have modeled the path planning problem for multiple robots as a combinatorial optimization problem on graphs, called multi-agent path finding (MAPF) [17, 18••]. MAPF has also been studied under the name of multi-robot path planning on graphs [19]. A MAPF problem instance consists of a connected undirected graph and a set of robots. The vertices of the given graph correspond to locations and the edges correspond to connections between locations that the robots can move along. Each robot occupies one vertex at each discrete time step and is given a start vertex and a goal vertex. Between two consecutive time steps, each robot takes an action to either move to an adjacent vertex or wait at its current vertex. Two robots collide if they move to the same vertex or traverse the same edge in opposite directions at the same time. The problem of MAPF is to find collision-free paths for the robots from their start vertices to their goal vertices. The objective is to minimize either the makespan, defined as the maximum of the arrival times of all robots at their goal vertices, or the flowtime, defined as the sum of the arrival times of all robots at their goal vertices.

Finding a solution to any MAPF problem instance or deciding its unsolvability can be done in polynomial time [20]. However, it is NP-hard (namely, unlikely that a polynomial-time algorithm exists) to find a solution with the minimum makespan [21] or the minimum flowtime [22•] to a MAPF problem instance, even if the given graph is a planar graph [23] or a 2D 4-neighbor grid [24]. In addition, it is NP-hard to compute an approximate solution within any constant factor less than 4/3 to a MAPF problem instance [25].

On one hand, recent advances in MAPF solving have resulted in powerful MAPF algorithms that can compute collision-free paths for a large number of robots in a short runtime, despite the complexity of solving MAPF optimally. These advances have resulted in a number of achievements, including a MAPF software [26] that recently won the Flatland Challenge [27], a train-scheduling competition at NeurIPS 2020 (one of the top machine learning conferences). The MAPF solver has been demonstrated to be capable of computing high-quality paths (namely with small makespan or flowtime) for up to 3000 robots in minutes of runtime on a simulator. On the other hand, there are key challenges [28] that must be addressed in order to apply MAPF algorithms to real-world applications of multi-robot systems, which requires techniques beyond MAPF solving. The following sections of this review survey latest advances that enhance MAPF solving and extensions to MAPF solving that tackle the research challenges in generalizing it to real-world scenarios.

MAPF Algorithms

Recent MAPF algorithms can be categorized into reduction-based, rule-based, and search-based algorithms. In the following, we survey their methodologies and highlight their properties in terms of completeness (complete for all MAPF problem instances, complete for MAPF problem instances on graphs with special properties, or incomplete) and optimality (optimal, bounded-suboptimal, or suboptimal with respect to different objectives). A MAPF algorithm is complete for a class of MAPF problem instances if it guarantees to return a solution for any solvable MAPF problem instance in the class or correctly decide that the given MAPF problem instance in the class is unsolvable in finite time.

Reduction-Based MAPF Algorithms

Reduction-based MAPF algorithms reduce MAPF to other well-studied combinatorial problems, such as Boolean Satisfiability [29], Integer Linear Programming [30], and Answer Set Programming [31, 32]. They are complete for all MAPF problem instances. They solve MAPF with the makespan objective optimally but can be modified to solve MAPF with other objectives optimally [30, 32, 33], bounded suboptimally (with the guarantee that the resulting solution is within a user-provided suboptimality factor from the optimal solution) [34], and suboptimally [35,36,37]. They perform well for MAPF problem instances on small-sized graphs with densely placed robots. For example, a state-of-the-art Integer Linear Programming–based MAPF solver can compute a solution with the minimum makespan for a MAPF problem instance on a \(24\times 18\) 2D 4-neighbor grid with 60 robots in less than 15 s of runtime [37].

Rule-Based MAPF algorithms

Rule-based algorithms solve MAPF using a set of primitive operations that specify the actions of the robots in different situations. They often guarantee completeness for only a restricted class of MAPF problem instances. Rule-based algorithms are often very efficient by simply following the predefined primitive operations but provide no guarantee on the solution quality (optimality). Push and Swap [38] and its extension [39] can compute a solution for 100 agents in less than 10 s of runtime but provide no completeness guarantee theoretically. One of their descendants, Push and Rotate [40], is complete for MAPF problem instances on graphs with at least two vertices that are unoccupied by robots. TASS [41] is complete for MAPF problem instances on “solvable” trees based on prior work on solving multi-robot motion planning on trees [42]. BIBOX [43] is complete for MAPF problem instances on bi-connected graphs with at least two vertices unoccupied by robots. Its descendant [44] works for strongly bi-connected directed graphs with at least two vertices unoccupied by robots. SAG [45] is complete for MAPF problem instances on grid-like “well-connected” graphs, runs in polynomial time, and provides a constant-factor approximation guarantee for minimizing the makespan on such graphs.

Search-Based MAPF Algorithms

Search-based MAPF algorithms [46] solve MAPF with heuristic search techniques. The main computational challenge of optimally solving MAPF with a search algorithm is that the number of possible states of a MAPF problem instance grows exponentially in the number of robots.

  • A*-based MAPF algorithms [47,48,49] plan paths with joint states but try to reduce the size of the state space they need to explore. They are complete for all MAPF problem instances and can be used for either makespan minimization or flowtime minimization.

  • Decoupled MAPF algorithms [50,51,52] plan paths for robots one at a time according to a predefined or a dynamic total ordering on the robots. Path planning for each robot uses an A* search in vertex and time dimensions that treats already planned paths of other robots as moving obstacles. Decoupled MAPF algorithms are often efficient but provide no optimality or even completeness guarantee. PIBT [53] develops a scheme to decide a partial ordering on the robots dynamically but only guarantees that all robots reach their goal vertices at least once (but possibly not at the same time) in finite time on biconnected graphs. MAPF-LNS [54] is another recent decoupled MAPF algorithm that uses Large Neighborhood Search [55], a local search algorithm, to improve a suboptimal MAPF solution by repeatedly replanning paths for a subset of robots. It is one of the core elements of the MAPF software [26] that won the Flatland Challenge. Its descendant MAPF-LNS2 [56] uses Large Neighborhood Search to improve a MAPF plan (paths of all robots) with collisions by repeatedly replanning paths for a subset of robots to reduce the number of collisions, until a collision-free MAPF plan (a MAPF solution) is obtained.

  • Hierarchical MAPF algorithms plan paths for robots individually on the low level and dynamically couple the resulting single-robot paths with a tree search on the high level. They are complete for all MAPF problem instances. Increasing Cost Tree Search [57] minimizes the flowtime. It performs a best-first tree search of all combinations of the arrival times of robots on the high level and checks whether collision-free paths exist for a combination of arrival times on the low level. Conflict-Based Search (CBS) [58••] is arguably the most popular optimal MAPF algorithm. It minimizes either the makespan or the flowtime. CBS first finds individually time-optimal paths for all robots (ignoring collisions). On the high level, it then performs a best-first search on a binary constraint tree. Each branching resolves one collision in the computed paths by imposing constraints on individual robots that forbid them from occupying a vertex or traversing an edge at a given time step. On the low level, CBS uses an A* search in vertex and time dimensions to replan for a robot that obeys the constraints. Many improvements to CBS have been proposed in recent years: Meta-Agent CBS [58••] dynamically groups multiple robots into a meta-agent on the high level and uses an A* search to plan paths for these robots with their joint states on the low level. ICBS [59] always first resolves collisions that result in child search nodes whose costs are larger than that of the current node, thus affording the high-level search of CBS pruning opportunities. CBSH [60] and its improvement [61] use an admissible heuristic to improve the high-level best-first search of CBS. Disjoint-Splitting CBS [62] expands each node in a way such that any solution is admitted by the subtree under only one but not both of its child nodes, thus reducing duplicate search effort of the high-level search of CBS. IDCBS [63] replaces the high-level best-first search of CBS with iterative-deepening depth-first searches. Symmetry-Breaking CBS [64,65,66•] and Mutex-Propagation CBS [67] add multiple constraints to a child node at a time to break symmetry in the high-level search of CBS. The best Symmetry-Breaking CBS variant has empirically been shown to compute optimal solutions for MAPF problem instances on a \(256\times 257\) 2D 4-neighbor grid with 100 robots in seconds of runtime [66•]. ECBS [68] and its improvements [69, 70] perform a bounded-suboptimal search on the constraint tree, making CBS bounded-suboptimal. Recent research [71] has also developed an anytime version of the bounded-suboptimal search on the constraint tree for CBS. Another line of recent research also uses machine learning to learn a good branching policy for the high-level search of CBS for both the optimal [72] and bounded-suboptimal [73] settings.

  • Hybrid MAPF algorithms combine several of the above search-based MAPF techniques or combine search-based MAPF techniques with reduction-based or rule-based MAPF techniques. SMT-CBS [74] replicates the high-level search of CBS with Satisfiability Modulo Theories, which is then solved by a Boolean Satisfiability solver, and minimizes either the makespan or the flowtime. Lazy CBS [75] replaces the high-level search of CBS with a Constraint Programming solver and minimizes the flowtime. BCP [76, 77] combines Branch-and-Cut-and-Price techniques for Mixed Integer Programming with symmetry-breaking techniques for MAPF and minimizes the flowtime. Priority-Based Search [78•] is a recent hierarchically decoupled that performs a depth-first search on a binary priority tree to explore all possible orderings on the robots. It is complete for only “well-formed” MAPF problem instances and has empirically been shown to compute close-to-optimal solutions for MAPF problem instances on a \(481\times 530\) 2D 4-neighbor grid with 600 robots in half a minute of runtime. Some algorithms combine both primitive operations (rule-based MAPF techniques) and search. MAPP [79] explore different ways of combining paths of individual robots and is complete for MAPF problem instances on “slidable” graphs [79]. There is also a MAPF algorithm that uses a combination of A* searches on a graph abstraction, primitive operations, and reductions to Constraint Satisfaction Problems [80].

Recent research [81, 82] has used machine learning to select a MAPF algorithm among multiple candidate MAPF algorithms for a given MAPF problem instance.

MAPF Extensions and Related Problems

Recent studies have also generalized the standard definition of MAPF to different real-world scenarios.

MAPF with Deadlines

MAPF with Deadlines [83, 84] aims to maximize the number of robots that reach their goal vertices within a given deadline. Its applications include robots that need to evacuate before a disaster and robots that need to finish tasks before a deadline.

MAPF with Delay Probabilities and Robust MAPF

MAPF with Delay Probabilities (MAPF-DP) [85] generalizes MAPF to the case where the uncertainty of robot motion has to be considered during planning to ensure a collision-free execution of the plan. In MAPF-DP, the uncertainty of each robot is characterized by a given delay probability with which the robot stays in its current vertex whenever it intends to traverse an outgoing edge of its current vertex. The problem of MAPF-DP is to find a plan that consists of a path for each robot and a plan-execution policy that controls with GO or STOP commands how each robot proceeds along its path such that no collisions occur during plan execution. MAPF-DP has also been studied under the name MAPF with Uncertainty [86], where the paths are planned in the belief space of the robots and the execution of the resulting plan is not guaranteed to be collision-free. K-Robust MAPF [87] extends CBS to enforce K time steps for which a vertex must be unoccupied after it has been occupied by a robot during planning, which reduces the possibility of collisions during plan execution without using plan-execution policies. Recent research [88] has generalized Symmetry-Breaking CBS [66•] to the K-Robust MAPF setting. Probabilistic Robust MAPF [89] bounds the probability that any collision occurs during plan execution.

MAPF with Continuous Time or Kinematic Constraints

MAPF with Continuous Time [90] extends CBS to planning paths on weighted graphs where the edge weights characterize the nonuniform traversal times of the edges. Other research [91, 92] has also studied MAPF on weighted graphs. MAPF for Large Agents [93] allows a robot to occupy more than one vertex at one time step according to its given shape and volume. Two robots collide if both of them occupy some vertex at the same time step. The resulting CBS-based algorithm has been applied to planning collision-free trajectories for quadcopters that take into account their ellipsoid shapes and downwash effects. MAPF-POST [94•, 95] is a polynomial-time algorithm that post-processes a MAPF solution to create a plan-execution schedule that works on non-holonomic robots, takes their kinematic constraints, such as the maximum and minimum translational and rotational velocity limits, into account, and provides a guaranteed safety distance between them, which avoids time-intensive replanning in many cases.

Reinforcement Learning for Distributed MAPF

PRIMAL [96•] and its descendant [97] model MAPF as a multi-agent reinforcement learning task, where all robots follow the same learned single-agent policy to decide their actions at each time step based on their local observations. Recent research [98,99,100] uses a graph neural network to allow robots to communicate and also precomputed shortest path distances [99] or an online shortest path computation [97, 101] to assist training. We note that classic optimization-based collision-avoidance approaches [102,103,104] use a similar distributed MAPF setting but can be applied to robots moving in continuous space.

MAPF with Target Assignment

Anonymous MAPF, also known as Permutation-Invariant MAPF or Unlabeled MAPF, does not assume predefined goal vertices for the robots and aims to find a one-to-one mapping from the given goal vertices to the robots and collision-free paths for the robots to their assigned goal vertices. Minimizing the makespan for Anonymous MAPF is polynomial-time solvable using a max-flow algorithm [105]. CBS-TA [106] searches all possible assignments of goal vertices to robots and minimizes the flowtime for Anonymous MAPF. Combined Target Assignment and Path Finding (TAPF) [107] partitions the robots into teams where the problem of each team is an Anonymous MAPF problem. CBM [107] combines the high-level search of CBS and a low-level min-cost max-flow algorithm to minimize the makespan for TAPF. Recent research [108, 109] has studied generalized TAPF problems where each robot needs to get assigned multiple goal vertices. MG-MAPF [110] assumes that each robot is preassigned multiple unordered goal vertices and aims to compute collision-free paths for the robots to visit all goal vertices. MG-TAPF [111] aims to find a one-to-one mapping from the given tasks that each consists of a sequence of ordered goal vertices and collision-free paths for the robots that visit the goal vertices of their assigned tasks in the specified order.

Online MAPF and Multi-Agent Pickup and Delivery

Online MAPF [112, 113] assumes that each robot is assigned a new goal vertex by a black box once it reaches its current goal vertex. Recent research has conducted a theoretical study [113] on the competitiveness of online MAPF algorithms (namely the performance gap between online and optimal offline MAPF algorithms). RHCR [114] generalizes (offline) MAPF algorithms to online MAPF by repeatedly replanning paths for the robots. One version of RHCR that uses Priority-Based Search [78•] has been shown to compute paths for 1000 robots on a \(37\times 77\) 4-neighbor grid in less than half a minute of runtime. Multi-Agent Pickup and Delivery (MAPD) [115•] is a combined multi-robot task-allocation and path-planning problem. MAPD has first been studied in an online setting where robots have to constantly get assigned a stream of incoming tasks that are added to the system at unknown release times and plan collision-free paths to the pickup and delivery vertices of the tasks. Online MAPD algorithms [115•] repeatedly apply task-assignment and MAPF algorithms to (re-)assign tasks to and (re-)plan paths for robots whenever a new task arrives or a robot becomes available for executing tasks. Recent research [116] has developed an online MAPD algorithm that considers kinematic constraints of robots directly during planning and shown experimentally that the algorithm can compute solutions for MAPD problem instances with 250 robots and 2000 tasks within a total runtime of 10 s. Offline MAPD [117] considers tasks that are known a priori. Recent research [3] has also considered an online TAPF/MAPD variant that aims to minimize the idle time of sorting stations—that is, when there are no warehouse robots servicing the sorting stations—in an automated sortation center. The resulting algorithm has been shown to compute solutions for 350 robots within 2 s of runtime on an industrial robot simulator.

Conclusions

Planning collision-free paths for multiple robots is a fundamental building block for many real-world applications of multi-robot systems. It has been studied as a graph-optimization problem under the name of MAPF by researchers from the artificial intelligence, robotics, and operations research communities. As the efficiency of MAPF algorithms improves, they will become increasingly viable for real-time path-planning operations of multi-robot systems. Current research has also addressed several challenges to adapt MAPF techniques to the requirements of real-world multi-robot systems. Future research directions include developing deeper theoretical understandings for MAPF variants such as Distributed MAPF and MAPD and combining MAPF techniques with considerations of complex real-world settings such as general temporal constraints and dependencies of tasks and high-order dynamic constraints of robots. Readers are referred to the MAPF information page [118] for a listing of MAPF researchers and links to their publications and software, tutorials, a mailing list, and other resources.