Keywords

1 Introduction

Multi-agent path finding (MAPF) [13, 21,22,23, 25, 28, 33] is the task of navigating agents from given starting positions to given individual goals. Usually MAPF is understood to be a discrete problem that takes place in undirected graph \(G=(V,E)\) where agents from set \(A=\{a_1,a_2,...,a_k\}\) are placed in its vertices. The constraint that there is at most one agent per vertex is followed in the basic variant. The initial configuration of agents in vertices of the graph can be written as an assignment \(\alpha _0: A \rightarrow V\) and similarly the goal configuration as \(\alpha _+: A \rightarrow V\). The task of navigating agents can be formally expressed as a task of transforming the initial configuration of agents \(\alpha _0: A \rightarrow V\) into the goal configuration \(\alpha _+: A \rightarrow V\).

In the standard MAPF, movements are instantaneous and are possible into vacant neighbors assuming no other agent is entering the same target vertexFootnote 1. We usually denote the configuration of agents at discrete time step t as \(\alpha _t: A \rightarrow V\). Non-conflicting movements transform configuration \(\alpha _t\) instantaneously into next configuration \(\alpha _{t+1}\) so we do not consider what happens between t and \(t+1\).

To reflect various aspects of real-life applications variants of MAPF have been introduced such as those considering kinematic constraints [10], large agents [15], or deadlines [17] - see [16] for more variants.

This work focuses on an extension of MAPF introduced only recently [1, 32] that considers continuous time and space and continuous movements of agents between predefined positions placed arbitrarily in the n-dimensional Euclidean space. The continuous version will be denoted as MAPF\(^\mathcal {R}\). It is natural in MAPF\(^\mathcal {R}\) to assume geometric agents of various shapes that occupy certain volume in the space - circles in the 2D space, polygons, spheres in the 3D space etc. In contrast to MAPF, where the collision is defined as the simultaneous occupation of a vertex by two agents, collisions are defined as any spatial overlap of agents’ bodies in MAPF\(^\mathcal {R}\) or a an occurrence that is too close to each other. Agents move along straight lines connecting predefined positions. Different shapes of agents’ bodies play a role. Hence for example a movement along two distinct lines that is collision free when done with small agents may turn into a collision if performed with large agents.

The motivation behind introducing MAPF\(^\mathcal {R}\) is the need to construct more realistic paths in many applications such as controlling fleets of robots or aerial drones [8, 11] where continuous reasoning is closer to the reality than the standard MAPF.

1.1 Contribution

The contribution of this paper consists in showing how to apply satisfiability modulo theory (SMT) reasoning [6, 18] in MAPF\(^\mathcal {R}\) solving. This is an extension of the conference paper [30] where the usage of SMT paradigm for the makespan optimal solving of MAPF\(^\mathcal {R}\) has been described. In this paper, we further improve the concept and adapt it for the sum-of-costs optimal solving.

The SMT paradigm constructs decision procedures for various complex logic theories by decomposing the decision problem into the propositional part having arbitrary Boolean structure and the theory part that is restricted on the conjunctive fragment. We introduce an SMT-based algorithm for finding makespan optimal solutions to MAPF\(^\mathcal {R}\). Extending the algorithm by nogood recording enables finding solutions that are sum-of-costs optimal.

1.2 Related Work and Organization

The original version of the SMT-based approach focuses on makespan optimal MAPF solving and builds on top of the Conflict-based Search (CBS) algorithm [22, 24]. Makespan optimal solutions minimize the overall time needed to relocate all agents into their goals.

CBS tries to solve MAPF lazily by adding conflict elimination constraints on demand. It starts with the empty set of constraints. The set of constraints is iteratively refined with new conflict elimination constraints after conflicts are found in solutions for the incomplete set of constraints. Since conflict elimination constraints are disjunctive (they forbid occurrence of one or the other agent in a vertex at a time) the refinement in CBS is carried out by branching in the search process.

CBS can be adapted for MAPF\(^\mathcal {R}\) by implementing conflict detection in continuous time and space while the high-level framework of the CBS algorithm remains the same as shown in [1]. In the SMT-based approach we are trying to build an incomplete propositional model so that if a given MAPF\(^\mathcal {R}\) \(\varSigma ^\mathcal {R}\) has a solution of a specified makespan then the model is solvable (but the opposite implication generally does not hold). This is similar to the previous SAT-based [5] MAPF solving [27, 31] where a complete propositional model has been constructed (that is, the given MAPF has a solution of a specified makespan if and only is the model is solvable).

The propositional model in the SMT-based approach in constructed lazily through conflict elimination refinements as done in CBS. The incompleteness of the model is inherited from CBS that adds constraints lazily. This is in contrast to SAT-based methods like MDD-SAT [31] where all constraints are added eagerly resulting in a complete model. We call our new algorithm SMT-CBS\(^\mathcal {R}\). The major difference of SMT-CBS\(^\mathcal {R}\) from CBS is that instead of branching the search we only add a disjunctive constraint to eliminate the conflict in SMT-CBS\(^\mathcal {R}\). Hence, SMT-CBS\(^\mathcal {R}\) does not branch the search at all at the high-level (the model is incrementally refined at the high-level instead).

Similarly as in the SAT-based MAPF solving we use decision propositional variables indexed by agent a, vertex v, and time t with the meaning that if the variable is \( TRUE \) agent a appears in v at time t. However the major technical difficulty with the continuous version of MAPF is that we do not know all necessary decision variables in advance due to continuous time. After a conflict is discovered we may need new decision variables to avoid that conflict. For this reason we introduce a special decision variable generation algorithm.

The paper is organized as follows: we first introduce MAPF\(^\mathcal {R}\) formally. Then we recall a variant of CBS for MAPF\(^\mathcal {R}\). Details of the novel SMT-based solving algorithm SMT-CBS\(^\mathcal {R}\) for finding makespan optimal solutions follow. Next, an experimental evaluation of SMT-CBS\(^\mathcal {R}\) against the continuous version of CBS is shown. We also show a brief comparison with the standard MAPF. Finally we introduce nogood recording into the SMT-CBS\(^\mathcal {R}\) to enable optimization with respect to the sum-of-costs objective.

1.3 MAPF with Continuous Time

We follow the definition of MAPF with continuous time denoted MAPF\(^\mathcal {R}\) from [1] and [32]. MAPF\(^\mathcal {R}\) shares several components with the standard MAPF: the underlying undirected graph \(G=(V,E)\), set of agents \(A=\{a_1,a_2,...,a_k\}\), and the initial and goal configuration of agents: \(\alpha _0: A \rightarrow V\) and \(\alpha _+: A \rightarrow V\).

Definition 1

(MAPF\(^\mathcal {R}\) ) Multi-agent path finding with continuous time (MAPF\(^\mathcal {R}\)) is a 5-tuple \(\varSigma ^\mathcal {R}=(G=(V,E), A, \alpha _0, \alpha _+, \rho )\) where G, A, \(\alpha _0\), \(\alpha _+\) are from the standard MAPF and \(\rho \) determines continuous extensions as follows:

  • \(\rho .x(v), \rho .y(v)\) for \(v \in V\) represent the position of vertex v in the 2D plane; to simplify notation we will use \(x_v\) for \(\rho .x(v)\) and \(y_v\) for \(\rho .x(v)\)

  • \(\rho .velocity(a)\) for \(a \in A\) determines constant velocity of agent a; simple notation \(v_a = \rho .velocity(a)\)

  • \(\rho .radius(a)\) for \(a \in A\) determines the radius of agent a; we assume that agents are circular discs with omni-directional ability of movements; simple notation \(r_a = \rho .radius(a)\)

Naturally we can define the distance between a pair of vertices u, v with \(\{u,v\} \in E\) as \(dist(u,v) = \sqrt{(x_v - x_u)^2+(y_v - y_u)^2}\). Next we assume that agents have constant speed, that is, they instantly accelerate to \(v_a\) from an idle state. The major difference from the standard MAPF where agents move instantly between vertices is that in MAPF\(^\mathcal {R}\) continuous movement of an agent between a pair of vertices (positions) along the straight line interconnecting them takes place. Hence we need to be aware of the presence of agents at some point in the 2D plane on the lines interconnecting vertices at any time.

Collisions may occur between agents due to their size which is another difference from the standard MAPF. In contrast to the standard MAPF, collisions in MAPF\(^\mathcal {R}\) may occur not only in a single vertex or edge but also on pairs of edges (on pairs of lines interconnecting vertices). If for example two edges are too close to each other and simultaneously traversed by large agents then such a condition may result in a collision. Agents collide whenever their bodies overlapFootnote 2.

We can further extend the set of continuous properties by introducing the direction of agents and the need to rotate agents towards the target vertex before they start to move towards the target (agents are no more omni-directional). The speed of rotation in such a case starts to play a role. Also agents can be of various shapes not only circular discs [15]. For simplicity we elaborate our solving concepts for the above basic continuous extension of MAPF with circular agents only. We however note that all developed concepts can be adapted for MAPF with more continuous extensions like directional agents which only adds another dimension to indices of propositional variables.

A solution to given MAPF\(^\mathcal {R}\,\varSigma ^\mathcal {R}\) is a collection of temporal plans for individual agents \(\pi = [\pi (a_1),\pi (a_2),...,\pi (a_k)]\) that are mutually collision-free. A temporal plan for agent \(a \in A\) is a sequence \(\pi (a) = [((\alpha _0(a),\alpha _1(a)),[t_0(a),t_1(a))); ((\alpha _1(a),\) \(\alpha _2(a)), [t_1(a),t_2(a))); ...; ((\alpha _{m(a)-1}(a),\alpha _{m(a)}(a)),(t_{m(a)-1}(a),t_{m(a)}(a)))]\) where m(a) is the length of individual temporal plan and each pair \((\alpha _i(a),\alpha _{i+1}(a)),\) \([t_i(a),t_{i+1}(a)))\) in the sequence corresponds to traversal event between a pair of vertices \(\alpha _i(a)\) and \(\alpha _{i+1}(a)\) starting at time \(t_i(a)\) and finished at \(t_{i+1}(a)\) (excluding).

It holds that \(t_i(a) < t_{i+1}(a)\) for \(i = 0,1,...,m(a)-1\). Moreover consecutive vertices must correspond to edge traversals or waiting actions, that is: \(\{\alpha _i(a),\) \(\alpha _{i+1}(a)\} \in E\) or \(\alpha _i(a) = \alpha _{i+1}(a)\); and times must reflect the speed of agents for non-wait actions, that is:

$$\begin{aligned} \alpha _i(a) \ne \alpha _{i+1}(a) \Rightarrow t_{i+1}(a) - t_i(a) = \frac{dist(\alpha _i(a), \alpha _{i+1}(a))}{v_a}. \end{aligned}$$

In addition to this, agents must not collide with each other. One possible formal definition of a geometric collision is as follows:

Definition 2

(Collision) A collision between individual temporal plans \(\pi (a) = \) \([((\alpha _i(a),\) \(\alpha _{i+1}(a)),\) \([t_i(a),t_{i+1}(a)))]_{i=0}^{m(a)}\) and \(\pi (b) = [((\alpha _i(b),\alpha _{i+1}(a)),\) \([t_i(b),\) \(t_{i+1}(b)))]_{i=0}^{m(b)}\) occurs if the following condition holds:

  • \(\exists i \in \{0,1,...,m(a)\}\) and \(\exists j \in \{0,1,...,m(b)\}\) such that:

    • \(dist([x_{\alpha _i(a)},\) \(y_{\alpha _i(a)};\) \(x_{\alpha _{i+1}(a)},\) \(y_{\alpha _{i+1}(a)}];\) \([x_{\alpha _j(b)},\) \(y_{\alpha _j(b)};\) \(x_{\alpha _{j+1}(b)},\) \(y_{\alpha _{j+1}(b)}])\) \(< r_a + r_b\)

    • \([t_i(a),t_{i+1}(a)) \cap \) \([t_j(b),\) \(t_{j+1}(b)) \ne \emptyset \)

  • (a vertex or an edge collision - two agents simultaneously occupy the same vertex or the same edge or traverse edges that are too close to each other).

The distance between two lines P and Q given by their endpoint coordinates \(P=[x_1,y_1;x_2,y_2]\) and \(Q=[x'_1,y'_1;x'_2,y'_2]\) denoted \(dist([x_1,y_1;x_2,y_2];[x'_1,y'_1; x'_2,y'_2])\) is defined as the minimum distance between any pair of points \(p \in P\) and \(q \in Q\): \(min\{dist(p,q)\;|\;p \in P \wedge q \in Q\}\). The definition covers degenerate cases where a line collapses into a single point. In such a case the definition of dist normally works as the distance between points and between a point and a line.

The definition among other types of collisions covers also a case when an agent waits in vertex v and another agent passes through a line that is too close to v. We note that situations classified as collisions according to the above definition may not always result in actual collisions where agents’ bodies overlap; the definition is overcautious in this sense.

Alternatively we can use more precise definition of collisions that reports collisions if and only if an actual overlap of agents’ bodies occurs. This however requires more complex equations or simulations and cannot be written as simple as above. The presented algorithmic framework is however applicable for any kind of complex definition of collision as the definition enters the process as an external parameter.

The duration of individual temporal plan \(\pi (a)\) is called an individual makespan; denoted \(\mu (\pi (a)) = t_{m(a)}\). The overall makespan of MAPF\(^\mathcal {R}\) solution \(\pi = [\pi (a_1),\pi (a_2),\) ...\(,\pi (a_k)]\) is defined as max\(_{i=1}^k(\mu (\pi (a_i)))\).

The sum-of-costs is another important objective used in the context of MAPF [23, 32]. Calculated as the summation over all agents of times they spend moving before arriving to the goal. Due to its more complex calculation, the sum-of-costs objective is more challenging to be integrated in the SMT-based solving framework.

The individual makespan is sometimes called an individual cost. A sum-of-cost for given temporal plan \(\pi (a)\) is defined as \(\sum _{i=1}^k{\mu (\pi (a_i))}\).

An example of MAPF\(^\mathcal {R}\) and makespan optimal solution is shown in Fig. 1. We note that the standard makespan optimal solution yields makespan suboptimal solution when interpreted as MAPF\(^\mathcal {R}\).

Fig. 1.
figure 1

An example of MAPF\(^\mathcal {R}\) instance on a [3, 1, 3]-graph with three agents and its makespan optimal solution (an optimal solution of the corresponding standard MAPF is shown too) [30].

Through the straightforward reduction of MAPF to MAPF\(^\mathcal {R}\) it can be observed that finding a makespan optimal solution with continuous time is an NP-hard problem [19, 29, 35].

2 Solving MAPF with Continuous Time

We will describe here how to find optimal solution of MAPF\(^\mathcal {R}\) using the conflict-based search (CBS) [22]. CBS uses the idea of resolving conflicts lazily; that is, a solution of MAPF instance is not searched against the complete set of movement constraints. Instead of forbidding all possible collisions between agents we start with initially empty set of collision forbidding constraints that gradually grows as new conflicts appear. CBS originally developed for MAPF can be modified for MAPF\(^\mathcal {R}\) as shown in [1]: let us call the modification CBS\(^\mathcal {R}\).

2.1 Conflict-Based Search

CBS\(^\mathcal {R}\) is shown using pseudo-code in Algorithm 1. The high-level of CBS\(^\mathcal {R}\) searches a constraint tree (CT) using a priority queue (ordered according to the makespan or other cumulative cost) in the breadth first manner. CT is a binary tree where each node N contains a set of collision avoidance constraints N.constraints - a set of triples \((a_i,\{u,v\},[t_0,t_+))\) forbidding occurrence of agent \(a_i\) in edge \(\{u,v\}\) (or in vertex u if \(u = v\)) at any time between \([t_0,t_+)\), a solution \(N.\pi \) - a set of k individual temporal plans, and the makespan \(N.\mu \) of the current solution.

figure a

The low-level process in CBS\(^\mathcal {R}\) associated with node N searches temporal plan for individual agent with respect to set of constraints N.constraints. For given agent \(a_i\), this is the standard single source shortest path search from \(\alpha _0(a_i)\) to \(\alpha _+(a_i)\) that at time t must avoid a set of edges (vertices) \(\{\{u,v\} \in E\;|\;(a_i,\{u,v\},[t_0,t_+)) \in N.constraints \wedge t \in [t_0,t_+)\}\). Various intelligent single source shortest path algorithms can be applied here such as A* [9].

CBS\(^\mathcal {R}\) stores nodes of CT into priority queue \(\textsc {Open}\) sorted according to the ascending makespan. At each step CBS takes node N with the lowest makespan from \(\textsc {Open}\) and checks if \(N.\pi \) represent non-colliding temporal plans. If there is no collision, the algorithms returns valid MAPF\(^\mathcal {R}\) solution \(N.\pi \). Otherwise the search branches by creating a new pair of nodes in CT - successors of N. Assume that a collision occurred between agents \(a_i\) and \(a_j\) when \(a_i\) traversed \(\{u,v\}\) during \([t_0,t_+)\) and \(a_j\) traversed \(\{u',v'\}\) during \([t'_0,t'_+)\). This collision can be avoided if either agent \(a_i\) or agent \(a_j\) does not occupy \(\{u,v\}\) or \(\{u',v'\}\) respectively during \([t_0,t_+) \cap [t'_0,t'_+) = [\tau _0, \tau _+)\). These two options correspond to new successor nodes of N: \(N_1\) and \(N_2\) that inherit set of conflicts from N as follows: \(N_1.conflicts = N.conflicts\) \(\cup \{(a_i,\{u,v\},[\tau _0, \tau _+))\}\) and \(N_2.conflicts = N.conflicts\) \(\cup \{(a_j,\{u',v'\},[\tau _0, \tau _+))\}\). \(N_1.\pi \) and \(N_1.\pi \) inherit plans from \(N.\pi \) except those for agent \(a_i\) and \(a_j\) respectively that are recalculated with respect to the new sets of conflicts. After this \(N_1\) and \(N_2\) are inserted into \(\textsc {Open}\).

Definition of collisions comes as a parameter to the algorithm though the implementation of validate-Plans procedure. We can switch to the less cautious definition of collisions that reports a collision after agents actually overlap their bodies. This can be done through changing the validate-Plans procedure while the rest of the algorithm remains the same.

2.2 A Satisfiability Modulo Theory (SMT) Approach

A close look at CBS reveals that it operates similarly as problem solving in satisfiability modulo theories (SMT) [6, 18]. The basic use of SMT divides a satisfiability problem in some complex theory T into an abstract propositional part that keeps the Boolean structure of the decision problem and a simplified decision procedure \( DECIDE_T \) that decides fragment of T restricted on conjunctive formulae. A general T-formula \(\varGamma \) being decided for satisfiability is transformed to a propositional skeleton by replacing its atoms with propositional variables. The standard SAT-solving procedure then decides what variables should be assigned \( TRUE \) in order to satisfy the skeleton - these variables tells what atoms hold in \(\varGamma \). \( DECIDE_T \) then checks if the conjunction of atoms assigned \( TRUE \) is valid with respect to axioms of T. If so then satisfying assignment is returned and we are finished. Otherwise a conflict from \( DECIDE_T \) (often called a lemma) is reported back to the SAT solver and the skeleton is extended with new constraints resolving the conflict. More generally not only new constraints are added to resolve a conflict but also new variables i.e. atoms can be added to \(\varGamma \).

The above observation inspired us to the idea to rephrase CBS\(^\mathcal {R}\) in terms of SMT. T will be represented by a theory with axioms describing movement rules of MAPF\(^\mathcal {R}\); a theory we will denote \(T_{ MAPF ^\mathcal {R}}\)Footnote 3.

A plan validation procedure known from CBS will act as \( DECIDE _{ MAPF ^\mathcal {R}}\) and will report back a set of conflicts found in the current solution. The propositional part working with the skeleton will be taken from existing propositional encodings of the standard MAPF such as the MDD-SAT [31] provided that constraints forbidding conflicts between agents will be omitted (at the beginning). In other words, we only preserve constraints ensuring that propositional assignments form proper paths for agents but each agent is treated as if it is alone in the instance.

2.3 Decision Variable Generation

MDD-SAT introduces decision variables \(\mathcal {X}_{v}^{t}(a_i)\) and \(\mathcal {E}_{u,v}^{t}(a_i)\) for discrete time-steps \(t \in \{0,1,2, ...\}\) describing occurrence of agent \(a_i\) in v or the traversal of edge \(\{u,v\}\) by \(a_i\) at time-step t. We refer the reader to [31] for the details of how to encode constraints of top of these variables. As an example we show here a constraint stating that if agent \(a_i\) appears in vertex u at time step t then it has to leave through exactly one edge connected to u or wait in u.

figure b
$$\begin{aligned} { \mathcal {X}_u^t(a_i) \Rightarrow \bigvee _{v\;|\;\{u,v\} \in E}{\mathcal {E}^t_{u,v}(a_i)} \vee \mathcal {E}^t_{u,u}(a_i), } \end{aligned}$$
(1)
$$\begin{aligned} { \sum _{v\,|\,\{u,v\} \in E }{\mathcal {E}_{u,v}^t{(a_i)} + \mathcal {E}^t_{u,u}(a_i) \le 1} } \end{aligned}$$
(2)

Vertex collisions expressed for example by the following constraint are omitted. The constraint says that in vertex v and time step t there is at most one agent.

$$\begin{aligned} {\sum _{a_i \in A \,|\,v \in V}{\mathcal {X}^t_v(a_i)} \le 1 } \end{aligned}$$
(3)

A significant difficulty in MAPF\(^\mathcal {R}\) is that we need decision variables with respect to continuous time. Fortunately we do not need a variable for any possible time but only for important moments.

If for example the duration of a conflict in neighbor v of u is \([t_0,t_+)\) and agent \(a_i\) residing in u at \(t \ge t_0\) wants to enter v then the earliest time \(a_i\) can do so is \(t_+\) since before it would conflict in v (according to the above definition of collisions). On the other hand if \(a_i\) does not want to waste time (let us note that we search for a makespan optimal solution), then waiting longer than \(t_+\) is not desirable. Hence we only need to introduce decision variable \(\mathcal {E}_{u,v}^{t_+}(a_i)\) to reflect the situation.

Generally when having a set of conflicts we need to generate decision variables representing occurrence of agents in vertices and edges of the graph at important moments with respect to the set of conflicts. The process of decision variable generation is formally described as Algorithm 2. It performs breadth-first search (BFS) on G using two types of actions: edge traversals and waiting. The edge traversal is the standard operation from BFS. Waiting is performed for every relevant period of time with respect to the end-times in the set of conflicts of neighboring vertices.

As a result each conflict during variable generation through BFS is treated as both present and absent which in effect generates all possible important moments.

Procedure generate-Decision generates decision variables that correspond to actions started on or before specified limit \(\mu _{max}\). For example variables corresponding to edge traversal started at \(t < \mu _{max}\) and finished as \(t' > \mu _{max}\) are included (line 10). Variables corresponding to times greater than \(\mu _{max}\) enable determining what should be the next relevant makespan limit to test (see the high-level algorithm for details). Assume having a decision node corresponding to vertex u at time t at hand. The procedure first adds decision variables corresponding to edge traversals from u to neighbors denoted v (lines 11–14). Then all possible relevant waiting actions in u with respect to its neighbors v are generated. Notice that waiting with respect to conflicts in u are treated as well.

2.4 Eliminating Branching in CBS by Disjunctive Refinements

The SMT-based algorithm itself is divided into two procedures: SMT-CBS\(^\mathcal {R}\) representing the main loop and SMT-CBS-Fixed\(^\mathcal {R}\) solving the input MAPF\(^\mathcal {R}\) for a fixed maximum makespan \(\mu \). The major difference from the standard CBS is that there is no branching at the high-level. The set of conflicts is iteratively collected during the entire execution of the algorithm whenever a collision is detected.

Procedures encode-Basic and augment-Basic build formula \(\mathcal {F}(\mu )\) over decision variables generated using the aforementioned procedure. The encoding is inspired by the MDD-SAT approach but ignores collisions between agents. That is, \(\mathcal {F}(\mu )\) constitutes an incomplete model for a given input \(\varSigma ^\mathcal {R}\): \(\varSigma ^\mathcal {R}\) is solvable within makespan \(\mu \) then \(\mathcal {F}(\mu )\) is satisfiable.

Conflicts are resolved by adding disjunctive constraints (lines 13–15 in Algorithm 4). The collision is avoided in the same way as in the original CBS that is one of the colliding agent does not perform the action leading to the collision. Consider for example a collision on two edges between agents \(a_i\) and \(a_j\) as follows: \(a_i\) traversed \(\{u,v\}\) during \([t_0,t_+)\) and \(a_j\) traversed \(\{u',v'\}\) during \([t'_0,t'_+)\).

These two movements correspond to decision variables \(\mathcal {E}_{u,v}^{t_0}(a_i)\) and \(\mathcal {E}_{u',v'}^{t_0'}(a_j)\) hence elimination of the collision caused by these two movements can be expressed as the following disjunction: \(\lnot \mathcal {E}_{u,v}^{t_0}(a_i) \vee \lnot \mathcal {E}_{u',v'}^{t'_0}(a_j)\). At level of the propositional formula there is no information about the semantics of a conflict happening in the continuous space; we only have information in the form of above disjunctive refinements. The disjunctive refinements are propagated at the propositional level from \( DECIDE _{ MAPF ^\mathcal {R}}\) that verifies solutions of incomplete propositional models.

The set of pairs of collected disjunctive conflicts is propagated across entire execution of the algorithm (line 16 in Algorithm 4).

figure c

Algorithm 3 shows the main loop of SMT-CBS\(^\mathcal {R}\). The algorithm checks if there is a solution for given MAPF\(^\mathcal {R}\) \(\varSigma ^\mathcal {R}\) of makespan \(\mu \). The algorithm starts at the lower bound for \(\mu \) that is obtained as the duration of the longest temporal plan from individual temporal plans ignoring other agents (lines 3–4).

Then \(\mu \) is iteratively increased in the main loop (lines 5–9). The algorithm relies on the fact that the solvability of MAPF\(^\mathcal {R}\) w.r.t. cumulative objective like the makespan behaves as a non decreasing monotonic function. Hence trying increasing makespans eventually leads to finding the optimal makespan provided we do not skip any relevant makespan \(\mu \). The next makespan to try will then be obtained by taking the current makespan plus the smallest duration of the continuing movement (line 19 of Algorithm 4). The iterative scheme for trying larger makespans follows MDD-SAT [31].

3 Evaluation of the Makespan Optimal Version

In this section we present results of the experimentation with SMT-CBS\(^\mathcal {R}\) for makespan optimal MAPF\(^\mathcal {R}\) solving. We implemented SMT-CBS\(^\mathcal {R}\) in C++ to evaluate its performanceFootnote 4. SMT-CBS\(^\mathcal {R}\) was implemented on top of Glucose 4 SAT solver [3] which ranks among the best SAT solvers according to recent SAT solver competitions [4]. The incremental mode of the SAT solver has been used - that is, when the formula has been modified the solver was not consulted from scratch but instead learned clauses are preserved from the previous run.

It turned out to be important to generate decision variables in a more advanced way than presented in Algorithm 2. We need to prune out decisions from that the goal vertex cannot be reached under given makespan bound \(\mu _{max}\). That is whenever we have a decision (ut) such that \(t + \varDelta t > \mu _{max}\), where \(\varDelta t = dist_{estimate}(u, \alpha _+(a)) / v_a \) and \(dist_{estimate}\) is a lower bound estimate of the distance between a pair of vertices, we rule out that decision from further consideration. Moreover we apply a postprocessing step in which we iteratively remove decisions that have no successors. The propositional model is generated only after this preprocessing.

figure d

In addition to SMT-CBS\(^\mathcal {R}\) we re-implemented in C++ CBS\(^\mathcal {R}\), currently the only alternative solver for MAPF\(^\mathcal {R}\) based on own dedicated search [1]. The distinguishing feature of CBS\(^\mathcal {R}\) is that at the low-level it uses a more complex single source shortest path algorithm that searches for paths that avoid forbidden intervals, a so-called safe-interval path planning (SIPP) [34].

Our implementation of CBS\(^\mathcal {R}\) used the standard heuristics to improve the performance such as the preference of resolving cardinal conflicts [7]. In the preliminary tests with SMT-CBS\(^\mathcal {R}\), we initially tried to resolve against single cardinal conflict too but eventually it turned out to be more efficient to resolve against all discovered conflicts (the presented pseudo-code shows this variant)Footnote 5 Footnote 6.

Fig. 2.
figure 2

Comparison of CBS\(^\mathcal {R}\) and SMT-CBS\(^\mathcal {R}\) in terms of average runtime, makespan, and success rate on layered graphs. The standard CBS on the corresponding standard MAPF is shown too (times are in seconds). Makespan is shown for the case when the instance is interpreted as the standard MAPF and as MAPF\(^\mathcal {R}\) [30].

3.1 Benchmarks and Setup

SMT-CBS\(^\mathcal {R}\) and CBS\(^\mathcal {R}\) were tested on synthetic benchmarks consisting of layered graphs, grids, game maps [26]. The layered graph of height h denoted \([l_1,l_2,...,l_h]\)-graph consists of h layers of vertices placed horizontally above each other in the 2D plane (see Fig. 1 for [3, 1, 3]-graph). More precisely the i-th layer is placed horizontally at \(y = i\). Layers are centered horizontally and the distance between consecutive points in the layer is 1.0. Size of all agents was 0.2 in radius.

Fig. 3.
figure 3

Comparison of CBS\(^\mathcal {R}\) and SMT-CBS\(^\mathcal {R}\) on \(8 \times 8\) grid with \(2^k\) neighborhood (\(k=3\)) [30].

We measured runtime and the number of decisions/iterations to compare the performance of SMT-CBS\(^\mathcal {R}\) and CBS\(^\mathcal {R}\). Small layered graphs consisting of 2 to 5 layers with up to 5 vertices per layer were used in tests. Three consecutive layers are always fully interconnected by edges. There is not edge across more than three layers of the graphs. That is in graphs with more than 3 layers agents cannot go directly to the goal vertex.

Fig. 4.
figure 4

Comparison of CBS\(^\mathcal {R}\) and SMT-CBS\(^\mathcal {R}\) on \(16 \times 16\) grid with \(2^k\) neighborhood (\(k=3\)) [30].

In all tests agents started in the 1-st layer and finished in the last h-th layer. To obtain instances of various difficulties random permutations of agents in the starting and goal configurations were used (the 1-st layer and h-th layer were fully occupied in the starting and goal configuration respectively). If for instance agents are ordered identically in the starting and goal configuration with \(h \le 3\), then the instance is relatively easy as it is sufficient that all agents move simultaneously straight into their goals.

We also used grids of sizes \(8 \times 8\) and \(16 \times 16\) with no obstacles in our tests. Initial and goal configuration of agents have been generated randomly. In contrast to MAPF benchmarks where grids are 4-connected we used interconnection with all vertices in the neighborhood up to certain distance called \(2^k\)-neighborhood in [1]. A similar setup has been used in game maps (Dragon Age). The difference here is that the game maps are larger and contain obstacles.

Ten random instances were generated for individual graph. The timeout for all tests has been set to 1 min in layered graphs and small grids and 10 min for game maps. Results from instances finished under this timeout were used to calculate average runtimes.

Fig. 5.
figure 5

Comparison of CBS\(^\mathcal {R}\) and SMT-CBS\(^\mathcal {R}\) on game maps (Dragon Age) with \(2^k\) neighborhood (\(k=3\)).

3.2 Comparison of MAPF\(^\mathcal {R}\) and MAPF Solving

Part of the results obtained in our experimentation with layered graphs is shown in Fig. 2. The general observation from our runtime evaluation is that MAPF\(^\mathcal {R}\) is significantly harder than the standard MAPF. When continuity is ignored, makespan optimal solutions consist of fewer steps. But due to regarding all edges to be unit in MAPF, the standard makespan optimal solutions yield significantly worse continuous makespan (this effect would be further manifested if we use longer edges).

SMT-CBS\(^\mathcal {R}\) outperforms CBS\(^\mathcal {R}\) on tested instances significantly. CBS\(^\mathcal {R}\) reached the timeout many more times than SMT-CBS\(^\mathcal {R}\). In the absolute runtimes, SMT-CBS\(^\mathcal {R}\) is faster by factor of 2 to 10 than CBS\(^\mathcal {R}\).

In terms of the number of decisions, SMT-CBS\(^\mathcal {R}\) generates order of magnitudes fewer iterations than CBS\(^\mathcal {R}\). This is however expected as SMT-CBS\(^\mathcal {R}\) shrinks the entire search tree into a single branch in fact. We note that branching within the search space in case of SMT-CBS\(^\mathcal {R}\) is deferred into the SAT solver where even more branches may appear.

In case of small grids and large maps (Figs. 3, 4 and 5), the difference between CBS\(^\mathcal {R}\) and SMT-CBS\(^\mathcal {R}\) is generally smaller but still for harder instances SMT-CBS\(^\mathcal {R}\) tends to have better runtime and success rate. We attribute the smaller difference between the two algorithms to higher regularity in grids compared to layered graphs that exhibit higher combinatorial difficulty.

4 Sum-of-Costs Bounds and Nogood Recording

We modified the SMT-based MAPF\(^\mathcal {R}\) solving framework for the sum-of-costs objective. Again the SMT-based algorithm for the sum-of-costs variants is divided into two procedures: SMT-CBS\(^\mathcal {R}_{SOC}\) representing the main loop (Algorithm 5) and SMT-CBS-Fixed\(^\mathcal {R}_{SOC}\) solving the input MAPF\(^\mathcal {R}\) for a fixed maximum makespan \(\mu \) and sum-of-costs \(\xi \) (Algorithm 6).

figure e
Fig. 6.
figure 6

Comparison of SMT-CBS\(^\mathcal {R}\) and CCBS on \(\mathtt {empty}\)-\(\mathtt {16}\)-\(\mathtt {16}\). Left: Success rate (the ratio of solved instances out of 25 under 120 s), the higher plot is better. Right: and sorted runtimes where the lower plot is better are shown.

Procedures encode-Basic and augment-Basic in Algorithm 6 build a formula according to given RDDs and the set of collected collision avoidance constraints. New collisions are resolved lazily by adding mutexes (disjunctive constraints). A collision is avoided in the same way as in the makespan optimal variant. Collision eliminations are tried until a valid solution is obtained or until a failure for current \(\mu \) and \(\xi \) which means to try bigger makespan and sum-of-costs.

figure f

After resolving all collisions we check whether the sum-of-costs bound is satisfied by plan \(\pi \). This can be done easily by checking if \(\mathcal {X}_u^{t}(a_i)\) variables across all agents together yield higher cost than \(\xi \) or not. If cost bound \(\xi \) is exceeded then corresponding nogood is recorded and added to \(\mathcal {F}(\mu )\) and the algorithm continues by searching for a new satisfying assignment to \(\mathcal {F}(\mu )\) now taking all recorded nogoods into account. The nogood says that \(\mathcal {X}_u^{t}(a_i)\) variables that jointly exceed \(\xi \) cannot be simultaneously set to \( TRUE \).

Formally, the nogood constraint can be represented as a set of variables \(\{\mathcal {X}_{u_1}^{t_1}(a_1),\) \(\mathcal {X}_{u_2}^{t_2}(a_2),...\) \(\mathcal {X}_{u_k}^{t_k}(a_k)\}\). We say the nogood to be dominated by another nogood \(\{\mathcal {X}_{u_1}^{t'_1}(a_1),\) \(\mathcal {X}_{u_2}^{t'_2}(a_2), ...\) \(\mathcal {X}_{u_k}^{t'_k}(a_k)\}\) if and only if \(t'_i \le t_i\) for \(i=1,2,...k\) and \(\exists i \in \{1,2,...,k\}\) such that \(t'_i < t_i\). To make the nogood reasoning more efficient we do not need to store nogoods that are dominated by some previously discovered nogood. In such case however, the single nogood does not forbid one particular assignment but all assignments that could lead to dominated nogoods.

Fig. 7.
figure 7

Comparison of SMT-CBS\(^\mathcal {R}\) and CCBS on \(\mathtt {maze}\)-\(\mathtt {32}\)-\(\mathtt {32}\)-\(\mathtt {4}\). Surprisingly the best performance with SMT-CBS\(^\mathcal {R}\) highly connected neighborhoods (\(K=4,5\) is easier than \(K=3\)).

The set of pairs of collision avoidance constraints is propagated across entire execution of the algorithm. Constraints originating from a single collision are grouped in pairs so that it is possible to introduce mutexes for colliding movements discovered in previous steps.

Algorithm 3 shows the main loop of SMT-CBS\(^\mathcal {R}\). The algorithm checks if there is a solution for \(\varSigma ^\mathcal {R}\) of makespan \(\mu \) and sum-of-costs \(\xi \). It starts at the lower bound for \(\mu \) and \(\xi \) obtained as the duration of the longest from shortest individual temporal plans ignoring other agents and the sum of these lengths respectively.

Then \(\mu \) and \(\xi \) are iteratively increased in the main loop following the style of SATPlan [12]. The algorithm relies on the fact that the solvability of MAPF\(^\mathcal {R}\) w.r.t. cumulative objective like the sum-of-costs or makespan behaves as a non decreasing function. Hence trying increasing makespan and sum-of-costs eventually leads to finding the optimum provided we do not skip any relevant value.

We need to ensure important property in the makespan/sum-of-costs increasing scheme: any solution of sum-of-costs \(\xi \) has the makespan of at most \(\mu \). The next sum-of-costs to try is be obtained by taking the current sum-of-costs plus the smallest duration of the continuing movement (lines 17–27 of Algorithm 6).

The following proposition is a direct consequence of soundness of CCBS and soundness of the encoding (Proposition 1) and soundness of the makespan/sum-of-costs increasing scheme (proof omitted).

Proposition 1

The SMT-CBS\(^\mathcal {R}\) algorithm returns sum-of-costs optimal solution for any solvable MAPF\(^\mathcal {R}\) instance \(\varSigma ^\mathcal {R}\).

5 Evaluation of the Sum-of-Costs Optimal Variant

The sum-of-costs optimal version of SMT-CBS\(^\mathcal {R}\) and CCBS were tested on benchmarks from the \(\mathtt {movingai.com}\) collection [26]. We tested algorithms on three categories of benchmarks:

  1. (i)

    small empty grids (presented representative benchmark \(\mathtt {empty}\text {-}\mathtt {16}\text {-}\mathtt {16}\)),

  2. (ii)

    medium sized grids with regular obstacles (presented \(\mathtt {maze}\text {-}\mathtt {32}\text {-}\mathtt {32}\text {-}\mathtt {4}\)),

  3. (iii)

    large game maps (presented \(\mathtt {ost003d}\), a map from Dragon Age game).

Fig. 8.
figure 8

Comparison of SMT-CBS\(^\mathcal {R}\) and CCBS on \(\mathtt {ost003d}\). SMT-CBS\(^\mathcal {R}\) is fastest for \(K=3\) but for higher K the performance decreases significantly.

In each benchmark, we interconnected cells using the \(2^K\)-neighborhood [20] for \(K=3,4,5\) - the same style of generating benchmarks as used in [2] (\(K=2\) corresponds to MAPF hence not omitted). Instances consisting of k agents were generated by taking first k agents from random scenario files accompanying each benchmark on \(\mathtt {movingai.com}\). Having 25 scenarios for each benchmarks this yields to 25 instances per number of agents.

Part of the results obtained in our experimentation is presented in this sectionFootnote 7. For each presented benchmark we show success rate as a function of the number of agents. That is, we calculate the ratio out of 25 instances per number of agents where the tested algorithm finished under the timeout of 120 s. In addition to this, we also show concrete runtimes sorted in the ascending order. Results for one selected representative benchmark from each category are shown in Figs. 6, 7 and 8.

The observable trend is that the difficulty of the problem increases with increasing size of the \(K-\)neighborhood with notable exception of \(\mathtt {maze}\text {-}\mathtt {32}\text {-}\mathtt {32}\text {-}\mathtt {4}\) for \(K=4\) and \(K=5\) which turned out to be easier than \(K=3\) for SMT-CBS\(^\mathcal {R}\).

Throughout all benchmarks SMT-CBS\(^\mathcal {R}\) tends to outperform CCBS. The dominance of SMT-CBS\(^\mathcal {R}\) is most visible in medium sized benchmarks. CCBS is, on the other hand, faster in instances containing few agents. The gap between SMT-CBS\(^\mathcal {R}\) and CCBS is smallest in large maps where SMT-CBS\(^\mathcal {R}\) struggles with relatively big overhead caused by the big size of the map (the encoding is proportionally big). Here SMT-CBS\(^\mathcal {R}\) wins only in hard cases.

6 Discussion and Conclusion

We extended the approach based on satisfiability modulo theories (SMT) for solving MAPF\(^\mathcal {R}\) from the makespan objective (described in the conference version of the paper [30]) towards the sum-of-costs objective. Our approach builds on the idea of treating constraints lazily as suggested in the CBS algorithm but instead of branching the search after encountering a conflict we refine the propositional model with the conflict elimination disjunctive constraint as it has been done in previous application of SMT in the standard MAPF. Bounding the sum-of-costs is done in similar lazy way through introducing nogoods incrementally. If it is detected that a conflict free solution exceeds given cost bound then decisions that jointly induce cost greater than given bound are forbidden via a nogood (that is, at least one of these decisions must not be taken). As nogoods storing all possible nogoods representing cases when the cost bound is exceeded could be inefficient, we introduce a concept of nogood dominance. It is sufficient to store important nogoods only while all dominated nogoods are enforced automatically.

SMT-CBS\(^\mathcal {R}\) was compared with CCBS [2], currently the only alternative algorithm for MAPF\(^\mathcal {R}\) that modifies the standard CBS algorithm, on a number of benchmarks. The outcome of our comparison is that SMT-CBS\(^\mathcal {R}\) performs well against CCBS. The best results SMT-CBS\(^\mathcal {R}\) are observable on medium sized benchmarks with regular obstacles. We attribute the better runtime results of SMT-CBS\(^\mathcal {R}\) to more efficient handling of disjunctive conflicts in the underlying SAT solver through propagation, clause learning, and other mechanisms. On the other hand SMT-CBS\(^\mathcal {R}\) is less efficient on large instances with few agents.

The important restriction which our concept rely on is that agents cannot move completely freely in the continuous space. We strongly assume that agents only move on the fixed embedding of finite graph \(G=(V,E)\) into some continuous space where vertices are assigned points and edges are assigned curves on which the definition of smooth movement is possible. Hence for example using curves other than straight lines for interconnecting vertices does not change the high-level SMT-CBS\(^\mathcal {R}\).

We plan to extend the RDD generation scheme to directional agents where we need to add the third dimension in addition to space (vertices) and time: direction (angle). The work on MAPF\(^\mathcal {R}\) could be further developed into multi-robot motion planning in continuous configuration spaces [14].