1 Introduction

A variety of robotic applications, ranging from manufacturing to logistics and service robotics, involve multiple robotic systems operating in the same workspace. In traditional industrial domains, such as car manufacturing, the environment is fully known and predictable. This allows the robots to operate in a highly scripted manner by repeating the same predefined motions as fast as possible. New types of tasks, however, require robotic manipulators that compute high-quality paths on the fly. For instance, a team of robotic arms can be tasked to pick and sort a variety of objects that are dynamically placed on a common surface. Multiple challenges need to be addressed in the context of such applications, such as detecting the configuration of the objects and grasping them. This work deals with the multi-robot motion planning (MMP) problem (Wagner and Choset 2013; Gravot and Alami 2003a; Gharbi et al. 2009) in the context of such setups, i.e., computing the paths of multiple, high-dimensional systems, such as robotic arms, that operate in a shared workspace, as shown in Fig. 1. The focus is to solve MMP in a computationally efficient way as well as in a coupled manner, which allows reasoning about the quality of the resulting paths.

Fig. 1

Planning in a coupled manner for multiple high-dimensional robots is a computationally challenging problem that motivates this work. The image shows an instance of a motion planning problem solved by the proposed approach. It involves 4 robotic arms, each with 7 degrees of freedom, operating in a shared workspace. The arms need to move simultaneously from an initial to a goal configuration

Planning for multiple, high-dimensional robotic systems is quite challenging. The motion planning problem is already computationally hard for a single robot (Canny 1988) that is a kinematic chain of rigid bodies. Thus, most approaches for multi-robot motion planning either quickly become intractable as the number of robots increases or sacrifice completeness and path quality guarantees. Problem instances are especially hard when the robots operate in a shared workspace and in close proximity. In this case, it is not easy to reason about the robots in a decoupled manner. Instead, it is necessary to operate in the composite configuration space of all robots. The space requirements for solving motion planning instances, however, increase exponentially with problem dimensionality. The composite space of all robots in MMP instances is typically too high-dimensional to explore in a comprehensive and resolution-complete manner, e.g., by discretizing it with a grid and searching over it.

Sampling-based planners aim to help with such dimensionality issues by approximating the connectivity of the underlying configuration space. They construct graph-based representations, such as a roadmap or a tree data structure, which store collision-free configurations and paths through a sampling process. Under certain conditions regarding the density of the corresponding graph, sampling-based planners can provide desirable path quality guarantees. Specifically, they achieve asymptotic optimality, i.e., as the sampling process progresses, the best path on the graph converges to the optimal one in the original configuration space. Nevertheless, even sampling-based planners face significant space challenges in the context of MMP problems, such as the one shown in Fig. 1, which corresponds to a 28-dimensional space. In particular, it becomes infeasible with standard, asymptotically optimal sampling-based planners to explicitly store a graph in the corresponding space that will allow the discovery of a solution in practice. This is due to the large number of samples required to cover an exponentially larger volume as the dimensionality of the underlying space increases. Asymptotically optimal planners must maintain on the order of \(\log n\) edges per sample, where n is the number of samples. Thus, when planning for high-dimensional systems, the space requirements of the corresponding roadmaps quickly surpass the capabilities of standard workstations.

A previously proposed sampling-based planner specifically designed for multi-robot problems, called \(\mathtt{dRRT}\)  (Solovey et al. 2015a), achieved progress in this area by leveraging an implicit representation of the composite space in order to provide both completeness and efficiency. This implicit representation is a graph, which corresponds to the tensor product of roadmaps explicitly constructed for each robot. This allows finding solutions for relatively high-dimensional multi-robot motion planning problems. Nevertheless, this prior method did not provide any path quality guarantees.

One key contribution of this work is to show that the structure of this implicit representation is guaranteed (asymptotically) to contain the optimal path for a set of robots moving simultaneously. Nevertheless, defining an implicit graph that contains a high-quality solution does not guarantee that the final solution is optimal unless the search process over this graph is appropriate. While a provably optimal search approach, such as \(\mathtt A^{\text{* }}\), could be implemented to search this graph, the extremely large branching factor of the implicit roadmap makes this prohibitively expensive, especially in the context of anytime planning. Instead, this work leverages the observation that a sampling-based method inspired by RRT\(^*\), which maintains a spanning tree over the underlying implicit graph, will return optimal solutions if it allows rewiring operations during the spanning tree construction. Namely, it must converge to the tree that contains the minimum-cost paths from the initial query state to every other node in the graph. Further, this work shows that a broad range of cost functions over paths in this graph can be used while still guaranteeing that the proposed \(\mathtt{dRRT^*}\) approach asymptotically converges towards such a tree.

This paper is an extension of prior work (Dobson et al. 2017), which introduced an initial version of \(\mathtt{dRRT^*}\) and sufficient conditions for generating an asymptotically optimal planner in this context. The current manuscript provides the following extensions:

  • A more thorough analysis that shows that the desirable guarantee can be achieved for an additional distance metric for multi-robot motion planning;

  • A more detailed description of the method, which has been further improved for computational efficiency purposes through the appropriate incorporation of heuristics;

  • The method has been extended to handle systems with shared degrees of freedom, as shown in related work (Shome and Bekris 2017);

  • The experimental section has been extended to include the new methods as well as demonstrations on physical platforms.

The following section summarizes related prior work on the subject before Sect. 3 introduces the problem setup. Section 4 describes the underlying structure of the implicit tensor-roadmap and the previous method \(\mathtt{dRRT}\) (Solovey et al. 2015a). The changes to \(\mathtt{dRRT}\) necessary to achieve asymptotic optimality and computational efficiency, which result in the proposed algorithm \(\mathtt{dRRT^*}\), are presented in Sect. 5. An analysis of the properties of the method is presented in Sect. 6. The method is extended in Sect. 7 to systems with shared degrees of freedom. Section 8 evaluates the methods experimentally and demonstrates their performance.

2 Prior work

The multi-robot motion planning problem (MMP) is notoriously difficult as it involves many degrees of freedom, and consequently a vast search space, as each additional robot introduces several additional degrees of freedom to the problem. Certain instances of the problem can be solved efficiently, i.e., in polynomial run time, and in a complete manner, at times even with optimality guarantees on the solution costs (Turpin et al. 2013; Adler et al. 2015; Solovey et al. 2015b). However, in general MMP is computationally intractable (Hopcroft et al. 1984; Spirakis and Yap 1984; Solovey and Halperin 2016; Johnson 2018).

Decoupled MMP techniques (Erdmann and Lozano-Perez 1987; Ghrist et al. 2005; LaValle and Hutchinson 1998; Peng and Akella 2004; Van Den Berg and Overmars 2005; Van Den Berg et al. 2009) reduce search space size by partitioning the problem into several subproblems, which are solved separately. Then, the different solutions are combined. These methods, however, typically lack completeness and optimality guarantees. While some hybrid approaches can take advantage of the inherent decoupling between robots and provide guarantees (Van Den Berg et al. 2009), they are often limited to discrete domains. The problem is more complex when the robots exhibit non-trivial dynamics (Peng and Akella 2005). Collision avoidance or control methods can scale to many robots, but lack path quality guarantees (Van Den Berg et al. 2011; Tang and Kumar 2015).

In contrast, centralized approaches (Kloder and Hutchinson 2005; O’Donnell and Lozano-Pérez 1989; Salzman et al. 2015; Solovey et al. 2015a; Svestka and Overmars 1998; Wagner and Choset 2013) usually work in the combined high-dimensional configuration space, and thus tend to be slower than decoupled techniques. However, centralized algorithms often come with stronger theoretical guarantees, such as completeness. Throughout the rest of this section we consider centralized methods, with an emphasis on sampling-based approaches.

Sampling-based algorithms for a single robot (Kavraki et al. 1996; LaValle and Kuffner 1999; Karaman and Frazzoli 2011) can be extended to the multi-robot case by considering the fleet of robots as one composite robot (Gildardo 2002). Such an approach suffers from inefficiency as it overlooks aspects of multi-robot planning, and hence can handle only a very small number of robots. Several techniques tailored for instances of MMP involving a small number of robots have been described (Hirsch and Halperin 2004; Salzman et al. 2015).

In previous work (Solovey and Halperin 2014), an extension of MMP was introduced, which consists of several groups of interchangeable robots. At the heart of the algorithm is a novel technique where the problem is reduced to several discrete pebble-motion problems (Kornhauser et al. 1984; Luna and Bekris 2011; Yu and LaValle 2013). These reductions amplify basic samples into massive collections of free placements and paths for the robots. An improved version (Krontiris et al. 2015) of this algorithm applied it to rearrange multiple objects using a robotic manipulator.

Previous work (Svestka and Overmars 1998) introduced a different approach, which leverages the following fundamental observation: the structure of the overall high-dimensional multi-robot configuration space can be inferred by first considering independently the free space of every robot, and combining these subspaces in a meaningful manner to account for robot-robot collisions. They suggested an approach which combines roadmaps constructed for individual robots into one tensor-product roadmap \(\hat{\mathbb {G}}\), which captures the structure of the joint configuration space (see more information in Sect. 4).

Due to the exponential nature of the resulting roadmap, this technique is only applicable to problems that involve a modest number of robots. A recent work (Wagner and Choset 2013) suggests that \(\hat{\mathbb {G}}\) does not necessarily have to be explicitly represented. They apply their \(\mathtt{M^*}\) algorithm to efficiently retrieve paths over \(\hat{\mathbb {G}}\), while minimizing the explored portion of the roadmap. The resulting technique is able to cope with a large number of robots, for certain types of scenarios. However, when the degree of simultaneous coordination between the robots increases, there is a sharp increase in the running time of this algorithm, as it has to consider many neighbors of a visited vertex of \(\hat{\mathbb {G}}\). This makes \(\mathtt{M^*}\) less effective when the motion of multiple robots needs to be tightly coordinated.

Recently a different sampling-based framework for MMP was introduced, which combines an implicit representation of \(\hat{\mathbb {G}}\) with a novel approach for pathfinding in geometrically-embedded graphs tailored for MMP (Solovey et al. 2015a). The discrete-RRT (\(\mathtt{dRRT}\)) algorithm is an adaptation of the celebrated \(\mathtt{RRT}\) algorithm for the discrete case of a graph, and it enables a rapid exploration of the high-dimensional configuration space by carefully walking through an implicit representation of the tensor product of roadmaps for the individual robots (see extensive description in Sect. 4). The approach was demonstrated experimentally on scenarios that involve as many as 60 DoFs and on scenarios that require tight coordination between robots. On most of these scenarios \(\mathtt{dRRT}\) was faster by a factor of at least ten when compared to existing algorithms, including the aforementioned \(\mathtt{M^*}\).

Later, \(\mathtt{dRRT}\) was applied to motion planning of a free-flying multi-link robot (Salzman et al. 2016). In that case, \(\mathtt{dRRT}\) allowed costly self-collision checks, which were done offline, to be efficiently decoupled from robot-obstacle collision checks by traversing an implicitly-defined roadmap, whose structure resembles that of \(\hat{\mathbb {G}}\). \(\mathtt{dRRT}\) has also been used in the study of the effectiveness of metrics for MMP, which are an essential ingredient in sampling-based planners (Atias et al. 2017).

The current work proposes \(\mathtt{dRRT^*}\) and shows that it is an efficient asymptotically optimal extension of the previously proposed \(\mathtt{dRRT}\). The \(\mathtt{dRRT^*}\) framework is an anytime algorithm, which quickly finds initial solutions and then refines them, while ensuring asymptotic convergence to optimal solutions. Simulations show that the method practically generates high-quality paths while scaling to complex, high-dimensional problems, where alternatives fail.

3 Problem setup and notation

We start with a definition of the problem. Consider a shared workspace with \(R \ge 2\) holonomic robots, each operating in a d-dimensional configuration space \(\mathbb {C}_i \subset \mathbb {R}^d\) for \(1\le i\le R\). For a given robot i, denote its free space, i.e., the set of all collision free configurations, by \(\mathbb {C}^{f }_i \subset \mathbb {C}_i\), and the obstacle space by \(\mathbb {C}^{o }_i=\mathbb {C}_i\setminus \mathbb {C}^{f }_i\).

The composite configuration space \(\mathbb {C}= \prod ^R_{i=1} \mathbb {C}_i\) is the Cartesian product of each robot’s configuration space. That is, a composite configuration \(Q = (q_1,\ldots ,q_R) \in \mathbb {C}\) is an R-tuple of robot configurations. For two distinct robots \(i, j\), denote by \(I_i^j(q_j)\subset \mathbb {C}_i\) the set of configurations of robot i that lead to collision with robot j at its configuration \(q_j\). Then, the composite free space \(\mathbb {C}^{f }\subset \mathbb {C}\) consists of configurations \(Q=(q_1,\ldots ,q_R)\) in which the robots do not collide with obstacles or pairwise with each other. Formally:

  • \(q_i \in \mathbb {C}^{f }_i\) for every \(1\le i\le R\);

  • \(q_i \not \in I_i^j(q_j), q_j \not \in I_j^i(q_i)\) for every \(1 \le i < j\le R\).

The composite obstacle space is defined as \(\mathbb {C}^{o }= \mathbb {C}\setminus \mathbb {C}^{f }\).
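The two membership conditions above can be checked directly for a candidate composite configuration. The following Python sketch illustrates this; the callback names (`robot_obstacle_free` for the per-robot test \(q_i \in \mathbb {C}^{f }_i\), and `robots_collide` standing in for membership in the interaction sets \(I_i^j\)) are hypothetical and would be supplied by a collision-checking backend:

```python
from typing import Callable, Sequence, Tuple

Config = Tuple[float, ...]

def in_composite_free_space(
    Q: Sequence[Config],
    robot_obstacle_free: Sequence[Callable[[Config], bool]],
    robots_collide: Callable[[int, Config, int, Config], bool],
) -> bool:
    """Check Q = (q_1, ..., q_R) against both free-space conditions."""
    R = len(Q)
    # Condition 1: each q_i lies in its robot's free space C_i^f.
    for i in range(R):
        if not robot_obstacle_free[i](Q[i]):
            return False
    # Condition 2: no pairwise robot-robot collision, checked for i < j.
    for i in range(R):
        for j in range(i + 1, R):
            if robots_collide(i, Q[i], j, Q[j]):
                return False
    return True
```

Note that the pairwise loop only checks \(i < j\), which suffices when the collision test is symmetric, matching the paired conditions \(q_i \not \in I_i^j(q_j)\) and \(q_j \not \in I_j^i(q_i)\).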

Multi-robot planning is concerned with finding (collision-free) composite trajectories of the form \(\varSigma :[0,1]\rightarrow \mathbb {C}^{f }\). \(\varSigma \) is an R-tuple \((\sigma _1, \ldots , \sigma _R)\) of single-robot trajectories \(\sigma _i:[0,1] \rightarrow \mathbb {C}_i\).

This work is concerned with producing high-quality trajectories, which minimize certain cost functions. In particular, we consider three cost functions \(cost (\cdot )\), which are presented below. Let \(\varSigma =(\sigma _1,\ldots ,\sigma _R)\) be a composite trajectory. For the following, \(\Vert \cdot \Vert \) denotes the standard arc length of a curve:

  • The sum of path lengths: \(cost (\varSigma )= \sum _{i=1}^R \Vert \sigma _i\Vert \).

  • The maximum path length: \(cost (\varSigma )= \max _{i=1:R} \Vert \sigma _i\Vert \).

  • The Euclidean arc length of \(\varSigma \): \(cost (\varSigma ) = \Vert \varSigma \Vert \).
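The three cost functions can be sketched for trajectories discretized as waypoint sequences; the helper names below are illustrative, and the Euclidean variant assumes all per-robot paths are time-synchronized with the same number of waypoints:

```python
import math
from typing import Sequence, Tuple

Config = Tuple[float, ...]

def arc_length(path: Sequence[Config]) -> float:
    """Polygonal arc length of a discretized trajectory."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

def cost_sum(paths: Sequence[Sequence[Config]]) -> float:
    """Sum of path lengths over all robots."""
    return sum(arc_length(p) for p in paths)

def cost_max(paths: Sequence[Sequence[Config]]) -> float:
    """Maximum path length over all robots."""
    return max(arc_length(p) for p in paths)

def cost_euclidean(paths: Sequence[Sequence[Config]]) -> float:
    """Arc length of the composite trajectory: at each (synchronized)
    waypoint the configurations of all robots are concatenated."""
    composite = [
        tuple(c for p in paths for c in p[k]) for k in range(len(paths[0]))
    ]
    return arc_length(composite)
```

For two robots moving 3 and 4 units along orthogonal directions, the sum cost is 7, the max cost is 4, and the Euclidean composite cost is 5, illustrating how the three metrics rank the same motion differently.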

Fig. 2

Given the individual robot roadmaps \(\mathbb {G}_i\) and \(\mathbb {G}_j\) shown on the left, the tensor product roadmap \(\hat{\mathbb {G}}_{i,j}\) arises, which is shown on the right. For each pair of nodes, where one is selected from \(\mathbb {G}_i\) and the other from \(\mathbb {G}_j\), a node is defined in \(\hat{\mathbb {G}}_{i,j}\). Two nodes in the tensor-product roadmap share an edge if their constituent nodes in the individual robot roadmaps also share an edge. For instance, nodes 1A and 3C do not share an edge in \(\hat{\mathbb {G}}_{i,j}\) because nodes 1 and 3 do not share an edge in \(\mathbb {G}_i\)

This work presents \(\mathtt{dRRT^*}\) as an efficient, anytime solution to the robustly-feasible composite motion planning (RFCMP) problem:

Definition 1

(RFCMP) Given R robots operating in the composite configuration space \(\mathbb {C}= \prod ^R_{i=1} \mathbb {C}_i\) and a query \(S=(s_1,\ldots ,s_R), T=(t_1,\ldots ,t_R)\), an RFCMP problem is one that admits a robustly-feasible trajectory \(\varSigma : [0,1] \rightarrow \mathbb {C}^{f }\) with \(\varSigma (0)=S, \varSigma (1)=T\). Namely, there exists a fixed constant \(\delta > 0\) such that \(\forall \ \tau \in [0,1],\ X\in \mathbb {C}^{o }\) it holds that

$$\begin{aligned} \Vert \varSigma (\tau )-X\Vert \ge \delta . \end{aligned}$$

One of the primary objectives of this work is to provide asymptotic optimality in the composite configuration space without explicitly constructing a planning structure in this space.

Definition 2

(Asymptotic Optimality) Let m be the time budget of the algorithm and let \(\varSigma ^{(m)}\) be the solution returned after time m. Denote by \(c^*\) the cost of a robustly optimal solution. The algorithm is asymptotically optimal if the following holds for any \(\epsilon >0\):

$$\begin{aligned} \lim _{m\rightarrow \infty }\Pr \left[ cost (\varSigma ^{(m)})\le (1+\epsilon )c^*\right] =1. \end{aligned}$$

4 Algorithmic foundations

This section provides a detailed description of the discrete-\(\mathtt{RRT}\) (\(\mathtt{dRRT}\)) method (Solovey et al. 2015a), which is the basis of our method presented in Sect. 5. \(\mathtt{dRRT}\) was proposed as an efficient way to search an implicitly defined tensor-product roadmap, which captures the structure of \(\mathbb {C}\) without explicitly sampling this space.

4.1 Tensor-product roadmap

Here we provide a formal definition of the tensor-product roadmap that \(\mathtt{dRRT}\) is designed to explore. For every robot \(1\le i\le R\) construct a PRM graph (Kavraki et al. 1996), denoted by \(\mathbb {G}_i = (\mathbb {V}_i, \mathbb {E}_i)\), which is embedded in \(\mathbb {C}^{f }_i\). That is, \(\mathbb {G}_i\) can be viewed as an approximation of \(\mathbb {C}^{f }_i\) and encodes collision free motions for robot i. The construction of \(\mathbb {G}_i\) is determined by two parameters n and \(r_n\), which represent the number of samples and the connection radius, respectively. As will be discussed in the following sections, it is necessary that the roadmaps \(\mathbb {G}_1,\ldots ,\mathbb {G}_R\) be constructed with a certain range of parameters to guarantee asymptotic optimality of the new planners (Sect. 5).

Define the tensor-product roadmap, denoted by \(\hat{\mathbb {G}}= (\mathbb {\hat{V}}, \mathbb {\hat{E}})\), as the tensor product between \(\mathbb {G}_1,\ldots ,\mathbb {G}_R\) (see Fig. 2). Each vertex of \(\hat{\mathbb {G}}\) describes a simultaneous placement of the R robots, and similarly an edge of \(\hat{\mathbb {G}}\) describes a simultaneous motion of the robots. Formally, \(\mathbb {\hat{V}}= \{ (v_1, v_2, \ldots , v_R ): \forall \ i,\ v_i \in \mathbb {V}_i \}\) is the Cartesian product of the nodes from each roadmap \(\mathbb {G}_i\). For two vertices \(V =(v_1,\ldots ,v_R) \in \mathbb {\hat{V}}, V'=(v'_1,\ldots ,v'_R) \in \mathbb {\hat{V}}\), the edge set \(\mathbb {\hat{E}}\) contains edge \((V,V')\) if \(\forall i \in [1,R]:\ v_i=v'_i\) or \((v_i,v'_i)\in \mathbb {E}_i\). Note that by the definition of \(\mathbb {G}_1,\ldots ,\mathbb {G}_R\), the motion described by each edge \(E\in \mathbb {\hat{E}}\) represents a path for the R robots in which the robots do not collide with obstacles. However, collisions between pairs of robots may still be possible.

It is important to note that the tensor-product roadmap has \(\Vert \mathbb {\hat{V}}\Vert = \prod _{i=1}^{R} \Vert \mathbb {V}_i\Vert \) vertices. Given the neighborhood of a node \( v_i \) in \( \mathbb {G}_i \) as \( \mathtt {Adj}(v_i,\mathbb {G}_i) \), the size of the neighborhood of a node \( V = ( v_1, \ldots , v_R ) \) in \( \hat{\mathbb {G}}\) is \( \Vert \mathtt {Adj}(V,\hat{\mathbb {G}}) \Vert = \prod _{i=1}^{R} \Vert \mathtt {Adj}(v_i,\mathbb {G}_i) \Vert \). Using the much smaller \(\mathbb {G}_1,\ldots ,\mathbb {G}_R\) to construct \(\hat{\mathbb {G}}\) online is computationally beneficial.
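Because \(\hat{\mathbb {G}}\) is defined entirely by the constituent roadmaps, the neighbors of a composite vertex can be enumerated on demand without ever materializing \(\hat{\mathbb {G}}\). A minimal sketch, assuming each roadmap is an adjacency mapping from a vertex to its neighbors, and following the edge definition above (each robot either stays put or traverses one of its roadmap edges):

```python
from itertools import product
from typing import Dict, Hashable, Iterator, Sequence, Tuple

Roadmap = Dict[Hashable, Sequence[Hashable]]  # vertex -> adjacent vertices

def implicit_neighbors(
    V: Tuple[Hashable, ...], roadmaps: Sequence[Roadmap]
) -> Iterator[Tuple[Hashable, ...]]:
    """Lazily enumerate Adj(V, G_hat): per robot, keep the current
    vertex or move along one roadmap edge; skip the trivial self-loop."""
    choices = [[v_i] + list(G_i[v_i]) for v_i, G_i in zip(V, roadmaps)]
    for V_new in product(*choices):
        if V_new != V:
            yield V_new
```

On the two roadmaps of Fig. 2, for example, the composite vertex 1A is adjacent to 1B, 2A, and 2B but not to 3C, since 1 and 3 share no edge in \(\mathbb {G}_i\).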

The presented algorithms share a common set of input and output parameters, such as the configuration space decompositions, which are predefined. In practice, the algorithms use pre-computed roadmaps in each constituent space online. The collision volumes that correspond to the robot and obstacles in the scene are also used online for validation. The algorithms output a trajectory in the configuration space of all robots, which is collision free with all obstacles and among robots.

4.2 Discrete RRT

An explicit construction of \(\hat{\mathbb {G}}\) is possible in very limited settings that either involve few robots, e.g., \(R=2\), or when the underlying single-robot roadmaps have few vertices and edges. However, in general it is prohibitively costly to fully represent it due to its size, which grows exponentially with the number of robots in terms of the number of vertices. Moreover, in some cases it may even be a challenge to represent all the edges adjacent to a single vertex of \(\hat{\mathbb {G}}\), as there may be exponentially many of those.

Fig. 3

a The method reasons over all neighbors \(q'\) of q so as to minimize the angle \(\angle _{q}(q', q'')\). b \(\mathbb {O}_d(\cdot ,\cdot )\) finds graph vertex \(V^{new }\) by minimizing angle \(\angle _{V^{near }}(V^{new },Q^{rand })\). c, d \(V^{near }\) and \(Q^{rand }\) are projected into each robot’s \(\mathbb {C}\)-space to find \(v^{new }_{i }\) and \(v^{new }_{j }\), respectively, which minimize both \(\angle _{ v^{near }_{i }} (v^{new }_{i }, q^{rand }_{i } )\) and \(\angle _{ v^{near }_{j }} (v^{new }_{j }, q^{rand }_{j })\)

The \(\mathtt{dRRT}\) algorithm enjoys the rich structure that \(\hat{\mathbb {G}}\) offers (see Sect. 6) without explicitly representing it. In particular, it gathers information on \(\hat{\mathbb {G}}\) only from the single-robot roadmaps \(\mathbb {G}_1,\ldots ,\mathbb {G}_R\).

Similarly to the single-robot planner \(\mathtt{RRT}\) (LaValle and Kuffner 1999), \(\mathtt{dRRT}\) grows a tree rooted at the start state of the given query (Line 1). \(\mathtt{dRRT}\) restricts the growth of its tree \(\mathbb {T}\) to the tensor-product roadmap \(\hat{\mathbb {G}}\) in contrast to \(\mathtt{RRT}\), which explores the entire space \(\mathbb {C}\). That is, \(\mathbb {T}\) is a subgraph of \(\hat{\mathbb {G}}\), and \(\mathbb {T}\subset \mathbb {C}^{f }\).

The high-level operations of the \(\mathtt{dRRT}\) approach are outlined in Algorithm 1. The approach iterates until a solution is found or the time limit is exceeded (Algorithm 1, Line 2), performing a fixed number \(n_{\text {it}}\) of expansion steps at the beginning of each iteration (Lines 3, 4). This expansion process is outlined in Algorithm 2. The approach then checks whether there is a connected path to the target (Line 5), and once a path is found, it is returned (Lines 6, 7).

The expansion procedure begins by drawing a random sample \(Q^{rand }\in \mathbb {C}\) (Line 1). It then finds the nearest neighbor \(V^{near }\) in the tree (Line 2) and then selects a neighbor \(V^{new }\), such that \((V^{near },V^{new })\in \mathbb {\hat{E}}\), according to a direction oracle function \(\mathbb {O}_d\) (Line 3). Then, if \(V^{new }\) is not in the tree (Line 4), it is added to the tree (Line 5) and an edge from \(V^{near }\) to \(V^{new }\) is also added (Line 6).

Fig. 4

a The Voronoi region \(Vor (V)\) of vertex V is shown where if \(Q^{rand }\) is drawn, vertex V is selected for expansion. b When \(Q^{rand }\) lies in the directional Voronoi region \(Vor '(V)\), the expand step expands to \(V^{new }\). c Thus, when \(Q^{rand }\) is drawn within volume \(Vol (V) = Vor (V) \cap Vor '(V)\), the method will generate \(V^{new }\) via V

We now elaborate on \(\mathbb {O}_d\). Given \(V^{near },Q^{rand }\), the oracle returns a vertex \(V^{new }\in \mathbb {\hat{V}}\) that is a neighbor of \(V^{near }\) (in \(\hat{\mathbb {G}}\)) found in the direction of \(Q^{rand }\). The crux of the approach is that \(\mathbb {O}_d\) can come up with such a neighbor efficiently without relying on an explicit representation of \(\hat{\mathbb {G}}\). Let \(Q,Q', Q''\in \mathbb {C}\) and define \(\rho (Q,Q')\) to be the ray starting at Q and passing through \(Q'\). Then, denote \(\angle _{Q} (Q',Q'')\) as the minimum angle between \(\rho (Q,Q')\) and \(\rho (Q,Q'')\). Denote by \(\mathtt {Adj}(V^{near },\hat{\mathbb {G}})\) the set of neighbor nodes of \(V^{near }\) in \(\hat{\mathbb {G}}\), i.e., for every \(V\in \mathtt {Adj}(V^{near },\hat{\mathbb {G}})\) it holds that \((V^{near }, V)\in \mathbb {\hat{E}}\). Then

$$\begin{aligned} \mathbb {O}_d(V^{near },Q^{rand })=\mathop {\text {argmin}}\limits _{V\in \mathtt {Adj}(V^{near },\hat{\mathbb {G}})} \angle _{V^{near }}(Q^{rand },V). \end{aligned}$$

The implementation of \(\mathbb {O}_d\) (Algorithm 5) proceeds in the following manner (see a two-robot case illustrated in Fig. 3). Let \(Q^{rand }=(q^{rand }_1,\ldots ,q^{rand }_R), V^{near }=(v^{near }_1,\ldots ,v^{near }_R)\). For every robot \(1\le i\le R\), the oracle extracts from \(\mathbb {G}_i\) the neighbor \(v^{new }_i\) of \(v^{near }_i\), which minimizes the expression \(\angle _{v^{near }_i}(q^{rand }_i,v^{new }_i)\). Notice that such a search can be performed efficiently as it only requires traversing all the neighbors of \(v^{near }_i\) in \(\mathbb {G}_i\). The combination of all \(v^{new }_i\) yields \(V^{new }\).
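The per-robot decomposition of the oracle can be sketched as follows. This is an illustrative reading of the procedure just described, not the authors' implementation; it assumes a `coords` mapping from (globally unique) roadmap vertex ids to configurations, and per-robot adjacency dicts:

```python
import math
from typing import Dict, Hashable, Sequence, Tuple

def angle_between(origin, a, b) -> float:
    """Minimum angle at `origin` between the rays toward a and b."""
    u = [x - o for x, o in zip(a, origin)]
    v = [x - o for x, o in zip(b, origin)]
    dot = sum(x * y for x, y in zip(u, v))
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.0
    return math.acos(max(-1.0, min(1.0, dot / (nu * nv))))

def oracle_Od(
    V_near: Tuple[Hashable, ...],
    Q_rand: Sequence[Tuple[float, ...]],
    roadmaps: Sequence[Dict[Hashable, Sequence[Hashable]]],
    coords: Dict[Hashable, Tuple[float, ...]],
) -> Tuple[Hashable, ...]:
    """Per robot i: pick the neighbor of v_near_i in G_i that
    minimizes the angle toward q_rand_i; combine into V_new."""
    V_new = []
    for v_i, q_i, G_i in zip(V_near, Q_rand, roadmaps):
        best = min(
            G_i[v_i],
            key=lambda nbr: angle_between(coords[v_i], coords[nbr], q_i),
        )
        V_new.append(best)
    return tuple(V_new)
```

Each per-robot minimization only scans \(\mathtt {Adj}(v^{near }_i, \mathbb {G}_i)\), so the cost is linear in the constituent degrees rather than in the exponential composite degree.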

As in \(\mathtt{RRT}\), \(\mathtt{dRRT}\) has a Voronoi-bias property (Lindemann and LaValle 2004). Showing that \(\mathtt{dRRT}\) exhibits Voronoi bias is slightly more involved compared to the basic \(\mathtt{RRT}\). This is illustrated in Fig. 4. To generate an edge \((V, V^{new })\), the random sample \(Q^{rand }\) must be drawn within the Voronoi cell of V, denoted as \(Vor (V)\) (Fig. 4a), and in the general direction of \(V^{new }\), denoted as \(Vor '(V)\) (Fig. 4b). The intersection of these two volumes, \(Vol (V) = Vor (V) \cap Vor '(V)\), is the volume to be sampled so as to generate \(V^{new }\) via V, as shown in Fig. 4.

The high-level loop of the algorithm remains similar across the method variants. The input parameter \(n_{it}\) denotes how many times the tree is expanded before the algorithm checks whether a solution has already been discovered. If \(n_{it}=1\), this check is performed every iteration. If tracing the path is an expensive operation—typically it corresponds to a heuristic search process over the tensor product roadmap—then the implementer can choose to use a higher value.

5 Asymptotically optimal discrete RRT

This section outlines two versions of the proposed asymptotically optimal variant of the \(\mathtt{dRRT}\) method. The first is a simple uninformed approach, which relies on the fact that a simple rewiring scheme is sufficient to provide asymptotic optimality. This simplified version will be called the asymptotically-optimal \(\mathtt{dRRT}\) (\(\mathtt{ao\hbox {-} dRRT}\)). For the sake of algorithmic efficiency, however, a second, more advanced version is also proposed, referred to as \(\mathtt{dRRT^*}\). To summarize the algorithmic contributions of the current work over the original \(\mathtt{dRRT}\):

  • \(\mathtt{dRRT^*}\) performs a rewiring step to refine paths in the tree, reducing costs to reach particular nodes.

  • \(\mathtt{dRRT^*}\) is anytime, employing branch and bound pruning after an initial solution is reached.

  • \(\mathtt{dRRT^*}\) promotes progress towards the goal during tree node selection.

  • \(\mathtt{dRRT^*}\) employs an informed expansion procedure \(\mathbb {I}_d\) capable of using heuristic guidance.

5.1 ao-\(\mathtt{dRRT}\)

This section outlines \(\mathtt{ao\hbox {-} dRRT}\), an asymptotically optimal version of the \(\mathtt{dRRT}\) algorithm which has been minimally modified to guarantee asymptotic optimality. At a high-level the approach uses a tree re-wiring technique reminiscent of \(\mathtt RRT^{\text{* }}\) (Karaman and Frazzoli 2011).

Algorithm 3 outlines \(\mathtt{ao\hbox {-} dRRT}\) which iteratively expands a tree \(\mathbb {T}\) over \(\hat{\mathbb {G}}\) given a time budget (Algorithm 3, Line 2), performing \(n_{\text {it}}\) consecutive calls to \(\mathtt{Expand\_\mathtt{ao\hbox {-} dRRT}}\) (Lines 3, 4). Then, the method attempts to trace the path \(\pi \) which connects the start S with the target T (Line 5). If such a path is found and is better than \(\pi _{best }\), it replaces \(\pi _{best }\) (Lines 6, 7). \(\pi _{best }\) is returned after the time limit is reached (Line 8).


The expansion procedure for \(\mathtt{ao\hbox {-} dRRT}\) is very similar to the original \(\mathtt{dRRT}\) method, and is outlined in Algorithm 4. It begins by drawing a random sample in the composite configuration space (Line 1), and then finds the nearest neighbor \(V^{near }\) to this sample in the tree (Line 2). It then selects a neighbor \(V^{new }\) according to the oracle function \( \mathbb {O}_d\) (Algorithm 5). This is the same oracle used in \( \mathtt{dRRT}\), which tries to select the neighbor of \( V^{near }\) most in the direction of \( Q^{rand }\). Then, if \(V^{new }\) is not in the tree (Line 4), it is added to the tree (Line 5) and an edge from \(V^{near }\) to \(V^{new }\) is also added (Line 6). Where this expansion step differs is that if \(V^{new }\) is already in the tree (Line 7), the method performs a rewiring step (Line 8) to check whether the new path to \(V^{new }\) is of lower cost than the existing one.
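The rewiring check can be sketched as below. This is a simplified illustration, not the paper's pseudocode: the tree is assumed to be stored as parent and cost-to-come dictionaries, and `edge_cost` is a hypothetical callback for the cost of the composite edge:

```python
from typing import Callable, Dict, Hashable

def try_rewire(
    tree_parent: Dict[Hashable, Hashable],
    tree_cost: Dict[Hashable, float],
    V_near: Hashable,
    V_new: Hashable,
    edge_cost: Callable[[Hashable, Hashable], float],
) -> bool:
    """Insert V_new as a child of V_near, or, if V_new is already in
    the tree, keep whichever parent yields the lower cost-to-come
    (the rewiring step). Returns True if the tree changed."""
    candidate = tree_cost[V_near] + edge_cost(V_near, V_new)
    if V_new not in tree_cost or candidate < tree_cost[V_new]:
        tree_parent[V_new] = V_near
        tree_cost[V_new] = candidate
        return True
    return False
```

A full implementation would also propagate an improved cost-to-come down the subtree rooted at \(V^{new }\); that bookkeeping is omitted here for brevity.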

The method would be similar to \( \mathtt{dRRT}\) in terms of the samples that constitute the tree; however, \( \mathtt{ao\hbox {-} dRRT}\) improves the solution cost over iterations and finds better solutions than \( \mathtt{dRRT}\). It is desirable, however, to focus the search in order to find the initial solution quickly, while preserving solution quality improvement over time.

5.2 \(\mathtt{dRRT^*}\)

The main body of the informed \(\mathtt{dRRT^*}\) algorithm is provided in Algorithm 6. The proposed method is an improvement on top of \( \mathtt{ao\hbox {-} dRRT}\) that preserves asymptotic optimality while benefiting computationally from branch-and-bound pruning once a solution is found, greedy child propagation during node selection, and heuristic guidance during expansions.

The key insight behind the algorithmic improvements is that, by virtue of the structure of the tensor-product roadmap \(\hat{\mathbb {G}}\), there readily exists a usable heuristic measure \(\mathbb {H}\) in the constituent roadmaps \(\mathbb {G}_i\). The shortest path on a constituent roadmap to the goal T can be used as a heuristic to guide the tree.

If the individual robot shortest paths introduce no robot interaction, the path composed of these individual shortest paths suffices as a solution. In cases of interaction between the robots, a shortest path is expected to deviate locally in regions of interaction. The best a robot can do from any constituent roadmap vertex is to follow the shortest path to the goal on the constituent roadmap. Although domain-specific heuristics can also be applied to the algorithm, in the currently proposed method the purpose of the heuristic is primarily to discover an initial solution as quickly as possible. This in turn lets branch-and-bound kick in and further focus the search once a bounding cost is ascertained from the initial solution.
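As a concrete sketch of this heuristic, the cost-to-goal values on a constituent roadmap can be computed with a single Dijkstra pass from the goal vertex. The adjacency-map format is an illustrative assumption, and roadmap edges are assumed symmetric so that searching from the goal yields shortest paths to it.

```python
import heapq

def shortest_path_to_goal(adj, goal):
    """Heuristic H for one constituent roadmap G_i: cost of the shortest
    roadmap path from every vertex to the goal, via Dijkstra run from the
    goal. adj maps a vertex to a list of (neighbor, edge_cost) pairs and
    is assumed symmetric (undirected roadmap edges)."""
    h = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > h.get(v, float("inf")):
            continue  # stale heap entry
        for u, w in adj.get(v, []):
            if d + w < h.get(u, float("inf")):
                h[u] = d + w
                heapq.heappush(heap, (d + w, u))
    return h
```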

At a high level, Algorithm 6 follows the structure of \(\mathtt{ao\hbox {-} dRRT}\). The only change is that the outer loop keeps track of the most recently added tree node as \(V^{last }\) and passes it on to the next call of the \(\mathtt {Expand\_\mathtt{dRRT^*}}\) subroutine. The use of this information for heuristic guidance is detailed in the description of the function.

Algorithm 7 outlines the expansion step. The default behavior is summarized in Algorithm 7, Lines 1–3, i.e., when no \(V^{last }\) is passed as an argument (Line 1). This operation corresponds to an exploration step similar to \(\mathtt{RRT}\): a random sample \(Q^{rand }\) is generated in \(\mathbb {C}\) (Line 2) and its nearest neighbor \(V^{near }\) in \(\mathbb {T}\) is found (Line 3). If the last iteration generated a node \(V^{last }\) that was closer to the goal than its parent, then \(V^{last }\) is provided to the function. In this case (Lines 4–6), \(Q^{rand }\) is set to the target configuration T, and \(V^{near }\) is selected as \(V^{last }\). This constitutes the greedy child propagation, which promotes progress towards the goal.

figure k
figure l
figure m

Informed expansion\(\mathbb {I}_d\) The expansion procedure in Algorithm 8 replaces the oracle in \(\mathtt{dRRT}\). It switches between distinct guided and exploratory behaviors according to whether \(q^{rand}_i\) attempts to drive the expansion towards the target T or not. When the method uses heuristic guidance (Fig. 5), among all the neighbors of \(v^{near }_i\) on a constituent roadmap \(\mathbb {G}_i\), \(\mathtt {Adj}(v^{near }_i, \mathbb {G}_i)\), the one with the best heuristic measure \(\mathbb {H}\) is selected. During the exploration phase, the method selects a random neighbor out of \(\mathtt {Adj}(v^{near }_i, \mathbb {G}_i)\).
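A minimal sketch of the per-robot selection inside \(\mathbb {I}_d\) follows. The adjacency lists and per-robot heuristic tables are illustrative assumptions, and the guided/exploratory switch is passed in as a flag rather than derived from \(q^{rand}_i\) as in the full method.

```python
import random

def informed_oracle(v_near, adj, H, guided):
    """I_d sketch: for each constituent roadmap, pick the adjacent vertex
    with the best heuristic H when guided, or a random one when exploring.
    Adj(.) includes the vertex itself, so a robot may stay put.
    adj[i] maps robot i's vertex to its neighbor list; H[i][v] estimates
    the cost-to-goal for robot i (names are illustrative)."""
    v_new = []
    for i, v_i in enumerate(v_near):
        candidates = adj[i][v_i] + [v_i]
        if guided:
            v_new.append(min(candidates, key=lambda u: H[i][u]))
        else:
            v_new.append(random.choice(candidates))
    return tuple(v_new)
```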

Fig. 5
figure 5

The method selects a neighbor with the best heuristic estimate \(\mathbb {H}\) from each of the constituent roadmaps during the guided expansion phase in \(\mathbb {I}_d\)

In either case, the oracle function \(\mathbb {I}_d\) returns to Algorithm 7 the implicit graph node \(V^{new }\) that is a neighbor of \(V^{near }\) on the implicit graph (Line 7). Then the method finds the neighbors N, which are adjacent to \(V^{new }\) in \(\hat{\mathbb {G}}\) and have also been added to \(\mathbb {T}\) (Line 8). Among N, the best node \(V^{best }\) is chosen as the parent of \( V^{new }\) according to the cost measure. Such an operation might yield no valid parent \( V^{best }\) due to collisions along \( \mathbb {L}(\cdot ) \); in such a case (Line 10) the method fails to add a node during the current iteration. Line 11 implements branch-and-bound pruning based on the cost of the best solution so far.

Lines 12–15 describe the tree addition and parent-rewire process. Lines 16–17 perform an additional rewiring step in the neighborhood if \(V^{new }\) is a better parent for any of the neighboring nodes. Line 19 triggers the child promotion by checking whether \(V^{new }\) made progress toward the goal according to the heuristic measure. The method ensures that in this case \(V^{new }\) (or \(V^{last }\)) is a child-promoted node, which will be selected during the next iteration. The effect of this behavior is that if the uncoordinated individual shortest paths are collision-free, they are greedily attempted first from the child-promoted nodes added to the tree. Evaluations indicate that this proves very effective in practice.

It should be noted that all candidate edges \(\mathbb {L}(\cdot )\) in Lines 9 and 17 are collision-free; for the sake of algorithmic clarity, collision checking is assumed to be encoded in the steering function \(\mathbb {L}(\cdot )\) and is enforced during tree additions and rewires. Any specialized sampling behavior is assumed to be part of the implementation of the subroutine \(\mathtt {Random\_Sample}\).

Notes on implementation In the implementation, the heuristic measure \( \mathbb {H}\) is efficiently calculated by precomputing all-pairs shortest paths on the constituent graphs with Johnson’s algorithm (Johnson 1977), which runs in \( \mathcal {O}(|V|^2 \log |V|) \). Precomputing the heuristic measure avoids any online computation overhead. For large graphs, it is proposed that the vertices can be subsampled and the heuristic estimated for representative nodes, which approximate the \(\mathbb {H}\) value in their neighborhoods. The neighbor with the best \(\mathbb {H}\) value can be computed once for a given target T and reused during the iterations inside \(\mathbb {I}_d\). In Algorithm 7 (Line 19), \( \mathbb {H}\) refers to a heuristic estimate in the composite space, which can be deduced from the constituent spaces. The method also includes additional focused random sampling (Gammell et al. 2015) once a solution is found to aid convergence. For a fraction of the random samples, goal biasing samples the target state for a robot.
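The composite-space estimate can be deduced from the precomputed constituent values. The composition below assumes a sum-of-arc-lengths cost (a max would suit a makespan-style cost); since the text leaves the exact composition to the implementation, this is an assumption.

```python
def composite_heuristic(V, H, combine=sum):
    """Composite-space heuristic deduced from the constituent roadmaps
    (sketch). H[i][v] is robot i's precomputed shortest-path-to-goal value;
    sum matches a sum-of-arc-lengths cost, max a makespan-style cost."""
    return combine(H[i][v_i] for i, v_i in enumerate(V))
```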

The set \( \mathtt {Adj}(\cdot ) \) returns the set of neighbors of a vertex on the graph, in addition to the vertex itself. This ensures that a robot may stay static during an edge expansion, which means the algorithm can also discover solutions where a subset of the robots must remain stationary for a period of time.

6 Analysis

This section examines the properties of \(\mathtt{dRRT^*}\), starting with the asymptotic convergence of the implicit roadmap \(\hat{\mathbb {G}}\) to one containing a path in \(\mathbb {C}^{f }\) with optimal cost. Then, it is shown that the online search eventually discovers the shortest path in \(\hat{\mathbb {G}}\). The combination of these two facts proves the asymptotic optimality of \(\mathtt{dRRT^*}\) and \(\mathtt{ao\hbox {-} dRRT}\).

For simplicity, the analysis considers robots operating in Euclidean space, i.e., \(\mathbb {C}_i\) is a d-dimensional Euclidean hypercube \([0,1]^d\) for fixed \(d \ge 2\). Robots are assumed to have the same number of degrees of freedom d. The results extend to a large class of systems that are locally Euclidean (see Dobson and Bekris (2013)), which covers all the systems under consideration in the paper, including manipulators with bounded angular degrees of freedom. Analysis of systems that are not locally Euclidean requires additional rigor, especially regarding the definition of the cost metric. The Discussion section includes a description of a possible extension of the presented analysis to non-holonomic systems; it is acknowledged that the arguments presented in the current section do not readily transfer to such systems.

6.1 Optimal convergence of \(\hat{\mathbb {G}}\)

In this section we prove that when the connection radius r(n)Footnote 2 used for the construction of the single-robot PRM roadmaps \(\mathbb {G}_1,\ldots ,\mathbb {G}_R\) is chosen appropriately, the resulting tensor-product graph \(\hat{\mathbb {G}}\) contains asymptotically optimal paths for MMP.

Definition 3

A trajectory \(\varSigma :[0,1]\rightarrow \mathbb {C}^{f }\) is robust if there exists a fixed \(\delta >0\) such that for every \(\tau \in [0,1],X\in \mathbb {C}^{o }\) it holds that \(\Vert \varSigma (\tau )-X\Vert \ge \delta \), where \(\Vert \cdot \Vert \) denotes the standard Euclidean distance.

Definition 4

Let \(cost \) be one of the cost functions defined in Sect. 3. A value \(c>0\) is robust if for every fixed \(\epsilon >0\) there exists a robust path \(\varSigma \) such that \(cost (\varSigma )\le (1+\epsilon )c\). The robust optimum \(c^*\) is the infimum over all such values.

For any fixed \(n\in \mathbb {N}^+\), and a specific instance of \(\hat{\mathbb {G}}\) constructed from R roadmaps, having n samples each, denote by \(\varSigma ^{(n)}\) the lowest-cost path (with respect to \(cost (\cdot )\)) from S to T over \(\hat{\mathbb {G}}\).

Definition 5

\(\hat{\mathbb {G}}\) is asymptotically optimal (AO) if for every fixed \(\epsilon > 0\) it holds that \(cost (\varSigma ^{(n)}) \le (1+\epsilon )c^*\) asymptotically almost surely,Footnote 3 where the probability is over all the instantiations of \(\hat{\mathbb {G}}\) with n samples for each PRM.

Using this definition, the following theorem is proven. Recall that d denotes the dimension of a single-robot configuration space.

Theorem 1

\(\hat{\mathbb {G}}\) is AO when

$$\begin{aligned} r(n)\ge r^*(n)= \gamma \left( \frac{\log n}{n} \right) ^{\frac{1}{d}}, \end{aligned}$$

where \(\gamma = (1+\eta )2 \left( \frac{1}{d} \right) ^{\frac{1}{d}} \left( \frac{\mu (\mathbb {C}^{f })}{\zeta _d} \right) ^{\frac{1}{d}}\), \(\eta \) is any constant larger than 0, \(\mu \) is the volume measure, and \(\zeta _d\) is the volume of a unit hyperball in \(\mathbb {R}^d\).
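As a numeric illustration, the bound can be evaluated directly; the free-space measure \(\mu (\mathbb {C}^{f })\) below is an assumed input, and the function is a sketch rather than part of the method.

```python
import math

def prm_star_radius(n, d, mu_free, eta=0.1):
    """Connection radius r*(n) from Theorem 1 (sketch). zeta_d is the
    volume of the unit d-ball; mu_free stands for mu(C^f) and eta > 0."""
    zeta_d = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    gamma = (1 + eta) * 2 * (1 / d) ** (1 / d) * (mu_free / zeta_d) ** (1 / d)
    return gamma * (math.log(n) / n) ** (1 / d)
```

The radius shrinks as the number of samples grows, keeping the expected number of neighbors per vertex logarithmic in n.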

Since the method solves the problem of finding a robust optimal solution, fix some \(\epsilon >0\). By the definition of the problem, there exists a robust trajectory \(\varSigma :[0,1] \rightarrow \mathbb {C}^{f }\) and a fixed \(\delta >0\) such that \(cost (\varSigma )\le (1+\epsilon ) c^*\). Additionally, for every \(X \in \mathbb {C}^{o },\ \tau \in [0,1]\) it holds that \(\Vert \varSigma (\tau )-X\Vert \ge \delta \).

If it can be shown that \(\hat{\mathbb {G}}\) contains a trajectory \(\varSigma ^{(n)}\), such thatFootnote 4:

$$\begin{aligned} cost (\varSigma ^{(n)})\le (1+o(1))\cdot cost (\varSigma ) \end{aligned}$$
(1)

a.a.s., this would imply that \(cost (\varSigma ^{(n)}) \le (1+\epsilon )c^*\), proving Theorem 1.

As a first step, it will be shown that the robustness of \(\varSigma = (\sigma _1,\ldots ,\sigma _R)\) in the composite space implies robustness in the single-robot setting, i.e., robustness along \(\sigma _i\).

For \(\tau \in [0,1]\) define the forbidden space parameterized by \(\tau \) as

$$\begin{aligned} \mathbb {C}^{o }_i(\tau ) = \mathbb {C}^{o }_i \cup \bigcup _{j=1, j \ne i}^R I_i^j( \sigma _j (\tau ) ). \end{aligned}$$
(2)

Claim 1

For every robot i, \(\tau \in [0,1]\), and \(q_i \in \mathbb {C}^{o }_i(\tau )\), \(\Vert \sigma _i(\tau ) - q_i \Vert \ge \delta \), i.e., the robustness of \(\varSigma = (\sigma _1,\ldots ,\sigma _R)\) in the composite space implies robustness over all single-robot paths \(\sigma _i\).

Proof

Fix a robot i, and fix some \(\tau \in [0,1]\) and a configuration \(q_i \in \mathbb {C}^{o }_i(\tau )\). Next, define the following composite configuration

$$\begin{aligned} Q = (\sigma _1(\tau ), \ldots , q_i, \ldots , \sigma _R(\tau )). \end{aligned}$$

Note that it differs from \(\varSigma (\tau )\) only in the i-th robot’s configuration. By the robustness of \(\varSigma \) it follows that

$$\begin{aligned} \delta&\le \Vert \varSigma (\tau ) - Q \Vert \\&= \bigg ( \Vert \sigma _i(\tau ) - q_i \Vert ^2 + \sum _{j=1,j \ne i}^R \Vert \sigma _j(\tau ) - \sigma _j(\tau ) \Vert ^2 \bigg )^{\frac{1}{2}} \\&= \Vert \sigma _i(\tau ) - q_i \Vert . \end{aligned}$$

\(\square \)

The result of Claim 1 is that the paths \(\sigma _1, \ldots , \sigma _R\) are robust in their individual spaces w.r.t. the parameterized forbidden space \( \mathbb {C}^{o }_i(\tau ) \). This means that there is sufficient clearance for each robot to avoid collisions with the other robots, given their positions along their own paths at the corresponding time.

Next, a Lemma is derived using proof techniques from the literature (Janson et al. 2015, Theorem 4.1), which implies that every \(\mathbb {G}_i\) contains a single-robot path \(\sigma _i^{(n)}\) that converges to \(\sigma _i\).

Lemma 1

For every robot i, let \(\mathbb {G}_i\) be constructed with n samples and a connection radius \(r(n)\ge r^*(n)\). Then it contains a path \(\sigma _i^{(n)}\) with the following attributes a.a.s.:

  1. (i)

    \(\sigma _i^{(n)}(0) = s_i\), \(\sigma _i^{(n)}(1) = t_i\);

  2. (ii)

    \(\Vert \sigma _i^{(n)}\Vert \le (1 + o(1)) \Vert \sigma _i\Vert \);

  3. (iii)

    \(\forall q \in Im (\sigma _i^{(n)})\), \(\exists \tau \in [0,1]\) s.t. \(\Vert q-\sigma _i(\tau )\Vert \le r^*(n)\), where Im\((\cdot )\) is the function image.

Proof

The first two properties, (i) and (ii), restate (Janson et al. 2015, Theorem 4.1), which applies to the setup of this work. The last property (iii) is an immediate corollary of the first two: since \(\sigma _i^{(n)}\) is obtained from \(\mathbb {G}_i\), every point along the path is either a vertex of the graph or lies on a straight-line path (i.e., an edge) between two vertices, whose length is at most \(r^*(n)\). \(\square \)

To complete the proof of Theorem 1 it remains to be shown that the combination of \(\sigma _1^{(n)},\ldots , \sigma _R^{(n)}\) yields a trajectory \( \varSigma ^{(n)} \) of the desired cost, i.e., one that conforms to Eq. 1. The bound derived in Lemma 1 (ii) resembles what is needed for proving Theorem 1; however, even though a similar bound exists in the individual spaces, it must be shown that Eq. 1 holds for a cost function in \( \mathbb {C}^{f }\). This is shown next, individually for the different cost functions.

6.1.1 Optimal convergence for a linear combination of Euclidean arc lengths

Lemma 2

Given Lemma 1 (ii), Eq. 1 holds for a cost function \(cost (\cdot )\) that is a linear combination of Euclidean arc lengths.

Proof

Consider the case where \(cost (\varSigma )= \sum _{i=1}^R \Vert \sigma _i\Vert \); the argument can easily be modified for \(\max _{i=1:R} \Vert \sigma _i\Vert \) or an arbitrary linear combination of the arc lengths.

In particular, define \(\varSigma ^{(n)}=(\sigma _1^{(n)},\ldots , \sigma _R^{(n)})\), where \(\sigma _i^{(n)}\) are obtained from Lemma 1. Then

$$\begin{aligned} cost (\varSigma ^{(n)})=\sum _{i=1}^R\Vert \sigma _i^{(n)}\Vert&\le (1 + o(1))\sum _{i=1}^R\Vert \sigma _i\Vert \\&\le (1+o(1))cost (\varSigma ). \end{aligned}$$

\(\square \)

Lemma 3

There exists a path \(\varSigma ^{(n)} = (\sigma _1^{(n)}\ldots \sigma _R^{(n)})\) that satisfies the properties of Lemma 1 and is collision-free in terms of both robot-obstacle and robot-robot collisions.

Proof

Every constituent roadmap \(\mathbb {G}_i\) of n samples is constructed to satisfy Lemma 1 and contains individual robot paths \(\sigma _i^{(n)}\). \(\hat{\mathbb {G}}\) defines the tensor-product graph in the composite configuration space \(\mathbb {C} \). The path \( \varSigma ^{(n)}\) is a combination of the individual robot paths \(\sigma _i^{(n)}\). Lemma 1 implies that \(\hat{\mathbb {G}}\) contains a path \( \varSigma ^{(n)}\) in \(\mathbb {C} \) that represents collision-free motions relative to obstacles and satisfies the cost bound of Lemma 1 (ii). Nevertheless, it is not immediately clear whether this ensures the existence of a path where robot-robot collisions are avoided. That is, although \(Im (\sigma ^{(n)}_i)\subset \mathbb {C}^{f }_i\), it might be the case that \(Im (\varSigma ^{(n)})\cap \mathbb {C}^{o }\ne \emptyset \). Next, it is shown that \(\sigma _1^{(n)},\ldots , \sigma _R^{(n)}\) can be reparametrized to induce a composite-space path whose image is fully contained in \(\mathbb {C}^{f }\), with length equivalent to \(\varSigma ^{(n)}\).

For each robot i, denote by \(V_i=(v_i^1,\ldots ,v_i^{\ell _i})\) the chain of \(\mathbb {G}_i\) vertices traversed by \(\sigma ^{(n)}_i\). For every \(v_i^j\in V_i\) assign a timestamp \(\tau _i^j\) of the closest configuration along \(\sigma _i\), i.e.,

$$\begin{aligned} \tau _i^j=\mathop {\text {argmin}}\limits _{\tau \in [0,1]}\Vert v_i^j-\sigma _i(\tau )\Vert . \end{aligned}$$

Also, define \(\mathcal {T}_i=(\tau _i^1,\ldots ,\tau _i^{\ell _i})\) and denote by \(\mathcal {T}\) the ordered list of \(\bigcup _{i=1}^R\mathcal {T}_i\), according to the timestamp values. Now, for every i, define a global timestamp function \(T\!S_i:\mathcal {T}\rightarrow V_i\), which assigns to each global timestamp in \(\mathcal {T}\) a single-robot configuration from \(V_i\). It thus specifies in which vertex robot i resides at time \(\tau \in \mathcal {T}\). For \(\tau \in \mathcal {T}\), let j be the largest index such that \(\tau _i^j \le \tau \); then simply assign \(T\!S_i(\tau )= v_i^j\). From property (iii) in Lemma 1 and Claim 1 it follows that no robot-robot collisions are induced by the reparametrization. This concludes the proof of Theorem 1. \(\square \)
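The reparametrization in the proof can be sketched directly: merge the per-robot timestamp lists into a global ordered list and, at each global timestamp, let every robot occupy its last-passed vertex. The list-based encoding is an illustrative assumption, and the first timestamp of each robot is assumed to be 0 (the start).

```python
def reparametrize(timestamps, vertices):
    """Global-timestamp reparametrization from the proof of Lemma 3
    (sketch). timestamps[i] holds tau_i^1..tau_i^{l_i} for the vertices
    vertices[i] of robot i's roadmap path, with timestamps[i][0] == 0.
    Returns the composite configuration at each global timestamp."""
    global_ts = sorted(set().union(*timestamps))
    schedule = []
    for tau in global_ts:
        config = []
        for ts_i, verts_i in zip(timestamps, vertices):
            # largest index j such that tau_i^j <= tau
            j = max(k for k, t in enumerate(ts_i) if t <= tau)
            config.append(verts_i[j])
        schedule.append(tuple(config))
    return schedule
```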

6.1.2 Optimal convergence for Euclidean arc length

Arguments for convergence of the cost of the solution in terms of the Euclidean arc length of the composite path \( \varSigma \) can be made to extend the results of Lemma 2. A robot having d DoFs, \( (F_1, \ldots F_d) \), exists in \(\mathbb {R}^d\). The motion of the robot constitutes a curve in \(\mathbb {R}^d\), defined as

$$\begin{aligned} \varSigma = (f_1(t), \ldots f_d(t)), \end{aligned}$$

where \(f_i(t)\) is the coordinate function of the curve \(\varSigma \) along the DoF \(F_i\), with \(t\in [0,1]\) and \(i \in [1,d]\). The Euclidean arc length of the path in \(\mathbb {R}^d\) is:

$$\begin{aligned} \Vert \varSigma \Vert = \int { \sqrt{(f'_1(t)^2 + \cdots + f'_d(t)^2)}}\ dt. \end{aligned}$$
(3)

The coordinate functions are assumed to be continuous with respect to the Lebesgue measure on t and of bounded variation. The Lebesgue measure assigns a measure to subsets of n-dimensional Euclidean space; for \(n = 1, 2,\) or 3, it coincides with the standard measure of length, area, or volume. The assumption here is that the curve is Lebesgue integrable over the coordinate functions, i.e., the variation of the curve is smooth under the parametrization, over subsets of the curve that correspond to the Lebesgue measure over t. \( \varSigma \) is also a rectifiable curve, i.e., it has finite length.

Definition 6

The partial arc length is defined as,

$$\begin{aligned} s(x) = \Vert \varSigma :[0,x]\Vert , x\le 1. \end{aligned}$$

By definition, and assuming smoothness and bounded variation (Pelling 1977), for \(t=0\) to \(t=x \le 1\):

$$\begin{aligned} s(x) = \sup _{P} \sum _{j=1}^{m} \sqrt{\bigg ( \sum _{i=1}^{d} ( f_i(t_j) - f_i(t_{j-1} ) )^2 \bigg )}. \end{aligned}$$
(4)

The value is the supremum over all possible finite partitions \(P:0=t_0<\cdots <t_m=x\) of t, each generating a finite set of m parts.

We denote the value of s(x) for some partition P as:

$$\begin{aligned} s(x)_P = \sum _{j=1}^{m} \sqrt{ \sum _{i=1}^{d} ( f_i(P(j)) - f_i(P({j-1}) ) )^2 }. \end{aligned}$$
(5)

A part shall refer to the curve between \( \varSigma (P(j-1)) \) and \( \varSigma (P(j)) \). The measure of the part would be

$$\begin{aligned} \mu _{j-1,j}^P = s(P(j)) - s(P(j-1)). \end{aligned}$$
(6)

Let \(P^*\) be the finite supremum partition over t with m parts, so that \(s(x) = s(x)_{P^*}\).

Without loss of generality, assume that \( P^* \) is the supremum partition with the fewest parts, \( |P^*|=m+1 \), i.e., there are no degenerate parts. A finer partition introduces additional parametrization values over t, and is hence a superset of \( P^* \), but cannot increase the value of s(x) since \( P^* \) attains the supremum. Note that a partition sequence with m parts has cardinality \(m+1\); a finer partition contains all these \(m+1\) parametrization values in addition to others. Since s(x) is finite and m is finite, \( \exists P^* \) such that \( \mu _{j-1,j}^{P^*} \in \mathbb {R}_{+} \) for all \( j \in [1,\ldots ,m] \).

Fig. 6
figure 6

Partitions of the curves in the full space (top) and the constituent spaces (bottom). \(P^*\) is the supremum partition but its parts do not have equal measures. \(P^*\) is subdivided into a finer partition Q with \(L=6\) equal-measure parts. The constituent curves exist in the individual robot configuration spaces \(\mathbb {C}_0\) and \(\mathbb {C}_1\), respectively. These curves also have a finer partition with 6 equal parts

Claim 2

Given a finite set of paths \(\xi \), there exists a finer partitioning \(Q^*\) of \(P^*\) over each \(\varSigma \in \xi \), which yields L equal-measure parts (Fig. 6) for every \(\varSigma \in \xi \).

Proof

$$\begin{aligned} \exists L \ \ \text {s.t.} \ \ s(1)_{Q^*} = s(1)_{P^{*}} = \Vert \varSigma \Vert , \\ Q^* \supseteq P^*,\ |Q^*| = L+1,\\ \mu _{j-1,j} = \mu _{k-1,k} = \frac{\Vert \varSigma \Vert }{L} \in \mathbb {R}_{+} \ \ \forall \ j,k \in [1,\ldots L]. \end{aligned}$$

This holds true for every \( \varSigma \) with a corresponding \( Q^* \supseteq P^* \). The measure of every part \( \varSigma (l) \) is equal and is denoted by \( \Vert \varSigma (l) \Vert = \frac{\Vert \varSigma \Vert }{L} \). This simplifies Eq. 5 to \( \Vert \varSigma \Vert = \sum _{l=1}^{L} \Vert \varSigma (l) \Vert \). \(\square \)
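Claim 2 asserts the existence of such equal-measure refinements. For a piecewise-linear curve, an equal-arc-length partition can be constructed explicitly; the polyline representation is an illustrative assumption.

```python
import math

def equal_arc_partition(points, L):
    """Split a polyline into L parts of equal arc length, sketching the
    finer partition Q* of Claim 2. Returns the L+1 breakpoints."""
    cum = [0.0]                                   # cumulative arc length
    for p, q in zip(points, points[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1]
    breaks = []
    for k in range(L + 1):
        s = total * k / L                         # target arc length
        # locate the segment containing s and interpolate linearly
        j = next(i for i in range(1, len(cum)) if cum[i] >= s - 1e-12)
        t = 0.0 if cum[j] == cum[j - 1] else (s - cum[j - 1]) / (cum[j] - cum[j - 1])
        p, q = points[j - 1], points[j]
        breaks.append(tuple(a + t * (b - a) for a, b in zip(p, q)))
    return breaks
```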

Claim 3

Additionally, by this simplification, Eq. 5 in the composite space can be restated for \(\varSigma _{Rd}=\{\varSigma _1\ldots \varSigma _R\}\), where \(\varSigma _{Rd}, \varSigma _1\ldots \varSigma _R\in \xi \).

Proof

In the multi-robot Euclidean space \(\mathbb {R}^{Rd}\), the arc length in the composite space can be expressed in terms of the arc lengths traversed in the individual robot spaces.

$$\begin{aligned} \Vert \varSigma _{Rd} \Vert&= s(1)_{Q^*} \\&= \sum _{l=1}^{L} \sqrt{ \sum _{i=1}^{Rd} ( f_i(Q^*(l)) - f_i(Q^*({l-1}) ) )^2 } \\&= \sum _{l=1}^{L} \sqrt{ \sum _{i=1}^{d} ( \delta f_i )^2 + \cdots + \sum _{i=(R-1)d+1}^{Rd} ( \delta f_i )^2} \\&= \sum _{l=1}^{L} \sqrt{ \Vert \varSigma _1(l) \Vert ^2 + \cdots + \Vert \varSigma _R(l) \Vert ^2} \\&= \sum _{l=1}^{L} \sqrt{ \sum _{i=1}^{R} \Vert \varSigma _i(l) \Vert ^2 }, \end{aligned}$$

where, with a slight abuse of notation, \( \delta f_i \) is shorthand for \( f_i(Q^*(l)) - f_i(Q^*({l-1}) ) \). \(\square \)

Lemma 4

For \(\varSigma ^{(n)} = (\sigma _1^{(n)}\ldots \sigma _R^{(n)})\), where each \(\sigma _i^{(n)}\) is obtained from Lemma 1 and satisfies \(\Vert \sigma _i^{(n)}\Vert \le (1 + o(1)) \Vert \sigma _i\Vert \), Eq. 1 holds for the Euclidean arc length.

Proof

Partitioning the arcs \(\sigma _i^{(n)}\) and \(\sigma _i\) into L pieces of equal length (with L chosen as per Claim 2) yields two trajectory sequences, for \(l \in \mathbb {N}_+, l \le L\).

The high-level idea is that, by leveraging the uniformity of the parameterized parts introduced by L, Lemma 1 (ii) can be recombined to represent the Euclidean arc length in the composite space.

$$\begin{aligned}&\Vert \sigma _i^{(n)}\Vert \le (1 + o(1)) \Vert \sigma _i\Vert \end{aligned}$$
(Using Lemma 1)
$$\begin{aligned} \Rightarrow&\sum _{l=1}^L \Vert \sigma _i^{(n)}(l) \Vert \le (1 + o(1)) \sum _{l=1}^L \Vert \sigma _i(l) \Vert \\ \Rightarrow&\Vert \sigma _i^{(n)}(l) \Vert \le (1 + o(1)) \Vert \sigma _i(l) \Vert \\ \Rightarrow&\Vert \sigma _i^{(n)}(l)\Vert ^2 \le (1 + o(1))^2 \Vert \sigma _i(l)\Vert ^2. \\&\text {Combining over } R \text { using Claim~3,}\\&\sum _{i=1}^R\Vert \sigma _i^{(n)}(l)\Vert ^2 \le (1 + o(1))^2\sum _{i=1}^R\Vert \sigma _i(l)\Vert ^2 \\ \Rightarrow&L\sqrt{\sum _{i=1}^R\Vert \sigma _i^{(n)}(l)\Vert ^2 } \le (1 + o(1)) L\sqrt{\sum _{i=1}^R\Vert \sigma _i(l)\Vert ^2 } \\ \Rightarrow&\Vert \varSigma ^{(n)}\Vert \le (1+o(1))\Vert \varSigma \Vert \end{aligned}$$
(Using Claim 2 and 3)

\(\square \)

Following the same parameterization described in Lemma 3, Theorem 1 can be shown for the Euclidean metric as well.

6.2 Asymptotic optimality of \(\mathtt{dRRT^*}\)

Finally, \(\mathtt{dRRT^*}\) is shown to be AO. Denote by m the time budget in Algorithm 6, i.e., the number of iterations of the loop. Denote by \(\varSigma ^{(n,m)}\) the solution returned by \(\mathtt{dRRT^*}\) for n samples in the individual constituent roadmaps and m iterations of the \(\mathtt{dRRT^*}\) algorithm.

Theorem 2

If \(r(n)>r^*(n)\) then for every fixed \(\epsilon >0\) it holds that

$$\begin{aligned} \lim _{n,m\rightarrow \infty }\Pr \left[ cost (\varSigma ^{(n,m)})\le (1+\epsilon )c^*\right] =1. \end{aligned}$$

Since \(\hat{\mathbb {G}}\) is AO (Theorem 1), it suffices to show that for any fixed n and a fixed instance of \(\hat{\mathbb {G}}\), defined over R PRMs with n samples each, \(\mathtt{dRRT^*}\) eventually (as m tends to infinity) finds the optimal trajectory over \(\hat{\mathbb {G}}\). This property is stated in Lemma 5 and proven subsequently. The same arguments hold for both \( \mathtt{dRRT^*}\) and \( \mathtt{ao\hbox {-} dRRT}\), with the difference highlighted explicitly in the proof.

Lemma 5

(Optimal Tree Convergence of \(\mathtt{dRRT^*}\)) Consider an arbitrary optimal path \(\pi ^*\) originating at \(v_0\) and ending at \(v_{t}\), and let \(O^{(m)}_k\) be the event that after m iterations of \(\mathtt{dRRT^*}\), the search tree \(\mathbb {T}\) contains the optimal path up to segment k. Then,

$$\begin{aligned} \liminf _{m \rightarrow \infty } \mathbb {P}\big ( O^{(m)}_t \big ) = 1. \end{aligned}$$

Proof

This is shown using Markov chain results (Grinstead and Snell 2012, Theorem 11.3). Specifically, absorbing Markov chains can be leveraged to show that \(\mathtt{dRRT^*}\) will eventually contain the optimal path over \(\hat{\mathbb {G}}\). An absorbing Markov chain has a subset of states for which the transition matrix only allows self-transitions.

The proof follows by showing that the \(\mathtt{dRRT^*}\) method can be described as an absorbing Markov chain, where the target state of a query is represented as an absorbing state in a Markov chain. For completeness, the theorem is re-stated here.

Theorem 3

(Thm 11.3 in Grinstead & Snell) In an absorbing Markov chain, the probability that the process will be absorbed is 1 (i.e., \(Q(m) \rightarrow 0\) as \(m \rightarrow \infty \)), where Q(m) is the transition submatrix for all non-absorbing states.

The proof has two parts: first, the \(\mathtt{dRRT^*}\) search is cast as an absorbing Markov chain; second, it is shown that the transition probability from each state to the next is nonzero, i.e., each state eventually connects to the target.

For query (S, T), let the sequence \(V = \{ v_1, v_2, \cdots , v_{t }\}\) of length t represent the vertices of \(\hat{\mathbb {G}}\) corresponding to the optimal path through the graph connecting these points, where \(v_{t }\) corresponds to the target vertex; furthermore, let \(v_{t }\) be an absorbing state. Theorem 3 applies under the assumption that the absorbing state \(v_{t }\) is reachable from each vertex \(v_{i }\).

Then, let the transition probability for each state take two values. One corresponds to the state transitioning to itself, i.e., the \(\mathtt{dRRT^*}\) search expanding along some other arbitrary path. The other is the transition probability from \(v_{i }\) to \(v_{i +1}\), which corresponds to two slightly different cases for \( \mathtt{ao\hbox {-} dRRT}\) and \( \mathtt{dRRT^*}\).

Case \(\mathtt{ao\hbox {-} dRRT}\) The transition probability from \(v_{i }\) to \(v_{i +1}\) corresponds to the method sampling within the volume \(Vol (v_{i })\). As the second step, it must be shown that this volume has a positive probability of being sampled in each iteration. It is sufficient to argue that \(\frac{\mu (Vol (v_{i }))}{\mu (\mathbb {C}^{f })} > 0\). Fortunately, for any finite n, previous work has already shown that this is the case given general position assumptions (Solovey et al. 2015a, Lemma 2).

Case \(\mathtt{dRRT^*}\) In the case of \( \mathtt{dRRT^*}\) due to the random neighborhood selection in the expansion \( \mathbb {I}_d\), there is a positive transition probability from \(v_{i }\) to \(v_{i +1}\).

Given these results, \(\mathtt{dRRT^*}\) is cast as an absorbing Markov chain that satisfies the assumptions of Theorem 3, and therefore the matrix \(Q(m) \rightarrow 0\). This implies that the optimal path to the goal is eventually expanded in the tree, and therefore \( \liminf _{m \rightarrow \infty } \mathbb {P}\big ( O^{(m)}_t \big ) = 1\). \(\square \)
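The absorption argument can be illustrated numerically: for any transient submatrix Q in which each state has a positive probability of escaping toward absorption, the entries of \(Q^m\) vanish as m grows. The two-state chain used in the check below is a hypothetical example, not taken from the paper.

```python
def q_power_max(Q, m):
    """Largest entry of Q^m, where Q is the transient (non-absorbing)
    submatrix of an absorbing Markov chain, given as a list of rows.
    Theorem 3 states this tends to 0 as m grows (sketch)."""
    size = len(Q)
    P = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(m):
        P = [[sum(P[i][k] * Q[k][j] for k in range(size))
              for j in range(size)] for i in range(size)]
    return max(max(row) for row in P)
```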

7 Extension to shared degrees of freedom

This section describes an extension of the \( \mathtt{dRRT^*}\) approach to systems with shared degrees of freedom (DoF), with a specific focus on humanoid robots with two arms. The challenge here arises from the high dimensionality of the robots. The shared DoF formulation is general and can refer to degrees of freedom in a torso, a mobile base, etc.

This section is structured in the same way as the rest of the algorithmic descriptions, and many of the shared notations and details are omitted for the sake of brevity. Instead, the interesting issues that arise due to the shared DoF are highlighted and resolved. A high-level overview of the differences of dual-arm \( \mathtt{dRRT^*}\) (\( \mathtt da\hbox {-}dRRT^*\)) from the previously stated methods includes:

  • \(\mathtt da\hbox {-}dRRT^*\) decomposes the space by grouping the shared DoF with one of the arms.

  • \(\mathtt da\hbox {-}dRRT^*\) implicitly builds two trees online that explore two tensor roadmaps.

  • \(\mathtt da\hbox {-}dRRT^*\) needs additional arguments for proving robustness in Claim 1.

Fig. 7
figure 7

The decomposition of the full configuration space into the arm spaces (\(\mathbb {C}_{\text {{l}}} \) and \(\mathbb {C}_{\text {{r}}} \)), and the shared degrees of freedom (\(\mathbb {C}_{\text {{s}}} \)) such that \(\mathbb {C} = \mathbb {C}_{\text {{l}}} \times \mathbb {C}_{\text {{s}}} \times \mathbb {C}_{\text {{r}}} \)

Fig. 8
figure 8

The image on the left illustrates the decomposition of the space to create arm-shared DoF roadmaps \(\mathbb {R}\) and arm-only roadmaps \(\mathbb {P}\). The example has three vertices in each roadmap, each consisting of a combination of Left (l), Shared (s), and Right (r) values. For the sake of clarity, the vertices on the arm-only roadmaps correspond to those of the \(\mathbb {R}\) roadmaps. The image on the right shows the connectivity in the tensor product roadmap \(\hat{\mathbb {G}}_l= \mathbb {R}_{\text {ls}}\times \mathbb {P}_{\text {r}}\). A similar tensor product is obtained for \(\hat{\mathbb {G}}_r\)

The current work does not address aspects related to manipulation. Nevertheless, the primitives designed here can speed up dual-arm manipulation task planning, where computational benefits can be achieved by operating over multiple roadmaps (Gravot et al. 2002; Gravot and Alami 2003b). The topology of dual-arm manipulation has been formalized (Koga and Latombe 1994; Harada et al. 2014) and extended to the N-arm case (Dobson and Bekris 2015). It requires the consideration of multi-robot grasp planning (Vahrenkamp et al. 2010; Dogar et al. 2015), regrasping (Vahrenkamp et al. 2009), and closed kinematic chain constraints (Cortés and Siméon 2005; Bonilla et al. 2017). Furthermore, force control strategies are helpful for multi-arm manipulation of a common object (Caccavale and Uchiyama 2008). Recently, coordinated control has been applied to solve human-robot interaction tasks (Sina Mirrazavi Salehian et al. 2016).

The algorithm is meant to address the applicability of \( \mathtt{dRRT^*}\) to high-dimensional humanoid robots with shared DoF.

7.1 Problem setup and notation

As shown in Fig. 7, the DoF \([F_1, \ldots , F_d]\) can be grouped into left, right, and shared DoF subsets, so that \(\mathbb {C} = \mathbb {C}_{\text {{l}}} \times \mathbb {C}_{\text {{s}}} \times \mathbb {C}_{\text {{r}}} \). A candidate solution path \(\varSigma :[0,1]\rightarrow \mathbb {C}^{f }\) can be decomposed into projections \([\varSigma _{l}, \varSigma _{s}, \varSigma _{r}]\) along \(\mathbb {C}_{\text {{l}}} \), \(\mathbb {C}_{\text {{s}}} \), and \(\mathbb {C}_{\text {{r}}} \), respectively.

The method proposes the construction of the following roadmaps, as shown in Fig. 8:

  • A left-shared \(\mathbb {R}_{ls}(\mathbb {V}_{ls}, \mathbb {E}_{ls})\) and a right-shared DoF roadmap \(\mathbb {R}_{sr}(\mathbb {V}_{sr}, \mathbb {E}_{sr})\), where \(\mathbb {V}_{ls} \subset \mathbb {C}_{\text {{l}}} \times \mathbb {C}_{\text {{s}}} \) and \(\mathbb {V}_{sr} \subset \mathbb {C}_{\text {{s}}} \times \mathbb {C}_{\text {{r}}} \). The edges are collision-free paths in the same spaces, i.e., they avoid collisions with the static geometry as well as self-collisions among the arm links and the shared DoF.

  • A left arm \(\mathbb {P}_{\text {l}}(\mathbb {V}_{l}, \mathbb {E}_{l})\) and a right arm roadmap \(\mathbb {P}_{\text {r}}(\mathbb {V}_{r}, \mathbb {E}_{r})\), such that \(\mathbb {V}_{l} \subset \mathbb {C}_{\text {{l}}} \) and \(\mathbb {V}_{r} \subset \mathbb {C}_{\text {{r}}} \). These roadmaps do not consider the static geometry, as they are not grounded by the shared DoF; only self-collisions between arm links are avoided.

The method focuses on two tensor product roadmaps: \( \hat{\mathbb {G}}_l= \mathbb {R}_{ls} \times \mathbb {P}_r \), and \( \hat{\mathbb {G}}_r= \mathbb {R}_{sr} \times \mathbb {P}_l\). The method then simultaneously searches over \(\hat{\mathbb {G}}_l\) and \(\hat{\mathbb {G}}_r\) in a \( \mathtt{dRRT^*}\)-like fashion.
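The implicit nature of such tensor product roadmaps can be illustrated with a short sketch: neighbors of a composite vertex are generated on demand from the constituent adjacency lists, so the product graph is never stored explicitly. The toy roadmaps and the convention that a component may also stay in place are illustrative assumptions, not the paper's exact edge definition.

```python
from itertools import product

# Toy constituent roadmaps as adjacency sets (illustrative values):
R_ls = {0: {1}, 1: {0, 2}, 2: {1}}   # left + shared DoF roadmap
P_r  = {0: {1, 2}, 1: {0}, 2: {0}}   # right-arm roadmap

def tensor_neighbors(v, G1, G2):
    """Neighbors of composite vertex v = (a, b) in the implicit tensor
    product G1 x G2.  Here each component may traverse one of its own
    edges or stay put (an assumption for illustration); the composite
    vertex itself is excluded."""
    a, b = v
    moves1 = G1[a] | {a}
    moves2 = G2[b] | {b}
    return {(x, y) for x, y in product(moves1, moves2) if (x, y) != v}

# Neighbors are generated on demand; the tensor product is never stored.
print(sorted(tensor_neighbors((1, 0), R_ls, P_r)))
```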

7.2 Methodology

This section describes the proposed method and the way \( \mathtt da\hbox {-}dRRT^*\) builds a forest \(\mathbb {T}\) of two trees, which explores both \(\hat{\mathbb {G}}_l\) and \(\hat{\mathbb {G}}_r\). In terms of the method's properties it is sufficient to consider only one roadmap, but in practice, exploring both simultaneously helps convergence, since more candidate solutions and rewiring operations can be evaluated. The approach shows faster convergence compared to \( \mathtt RRT^{\text{* }}\) in \(\mathbb {C} \), and scales better than \( \mathtt{PRM^*} \).

At a high level, the proposed Dual-arm \(\mathtt{dRRT^*}\) (\(\mathtt da\hbox {-}dRRT^*\)) simultaneously explores the tensor product roadmaps \(\hat{\mathbb {G}}_l\) and \(\hat{\mathbb {G}}_r\), building a search tree over each one so as to find a solution from the start configuration S to the target configuration T. For every vertex, the algorithm keeps track of which tensor product roadmap the vertex belongs to. Upon initialization, the forest starts with two vertices, \(S_l\) and \(S_r\), one corresponding to tensor product roadmap \( \hat{\mathbb {G}}_l\) and the other to \( \hat{\mathbb {G}}_r\). Then, at every iteration, the tree data structure \(\mathbb {T}\) is expanded by adding a new edge and node by calling an expand subroutine like Algorithm 7. The difference arises in the neighborhood calculation in Algorithm 7, Line 8. The neighborhood N for \( V^{new }\) considers the tensor roadmap neighborhoods that are part of the tree for both roadmaps. \( V^{new }\) belongs to either \( \hat{\mathbb {G}}_l\) or \( \hat{\mathbb {G}}_r\), and \( \hat{V^{new }} \) is chosen as the nearest tree vertex that was generated on the other tensor roadmap. N is the set of all tree vertices that are tensor roadmap neighbors of \( V^{new }\) or \( \hat{V^{new }} \). During rewiring, care is taken to only rewire nodes belonging to the same tensor roadmap. The consideration of this richer neighborhood lets the algorithm ensure adequate exploration of both tensor roadmaps. The informed oracle \( \mathbb {I}_d\) is similar to Algorithm 8, with the difference that for the constituent roadmap \( \mathbb {P} \) the \( \mathbb {H}\) estimate is simply the shortest Euclidean distance to the goal.
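The neighborhood rule above can be sketched as follows. The tags, helper signatures, and toy data are hypothetical and only mirror the description; this is not the authors' implementation.

```python
# Hedged sketch of the da-dRRT* neighborhood rule; tags, helper
# signatures, and the toy data are illustrative assumptions.

def neighborhood(v_new, tag_new, tree, tensor_neighbors, nearest_on):
    """Candidate neighbor set N for v_new.

    tree            : dict mapping tree vertices to the tensor roadmap
                      tag ('l' or 'r') they were generated on
    tensor_neighbors: (vertex, tag) -> neighbors in that tensor roadmap
    nearest_on      : (vertex, tag) -> nearest tree vertex on roadmap tag
    """
    other = 'r' if tag_new == 'l' else 'l'
    v_hat = nearest_on(v_new, other)  # counterpart on the other roadmap
    candidates = (tensor_neighbors(v_new, tag_new)
                  | tensor_neighbors(v_hat, other))
    # Only vertices already in the tree qualify; rewiring (not shown)
    # additionally requires both endpoints to carry the same tag.
    return {u for u in candidates if u in tree}

# Toy example: the tree holds vertices 'a'..'d'; 'x' is the new vertex.
tree = {'a': 'l', 'b': 'l', 'c': 'r', 'd': 'r'}
adj = {('x', 'l'): {'a', 'e'}, ('c', 'r'): {'d'}}
N = neighborhood('x', 'l', tree,
                 lambda v, t: adj.get((v, t), set()),
                 lambda v, t: 'c')
print(sorted(N))  # ['a', 'd']
```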

Notes on efficiency The difference of the shared-DoF decomposition compared to \( \mathtt{dRRT^*}\) is that \(\mathbb {G}= \mathbb {R} \times \mathbb {P} \) does not yield two kinematically independent spaces. Specifically, \( \mathbb {P} \) depends on the shared \( \mathtt{DoF}\) to be grounded to the frame of the robot. This means that the heuristic \( \mathbb {H}\) is less informed for \( \mathbb {P} \) and can only use the straight-line distance. The \(\mathtt{dRRT^*}\) algorithm does not work out of the box for robots with shared degrees of freedom. The effect of the less expressive heuristic in \(\mathtt da\hbox {-}dRRT^*\) translates into some degradation in performance relative to the case of two kinematically independent robotic arms. Nevertheless, \(\mathtt da\hbox {-}dRRT^*\) is still significantly faster than operating directly in the composite space of the entire robot. Few methods can practically compute solutions for such high-dimensional (e.g., 15 degrees of freedom) systems with kinematic dependencies. The proposed \(\mathtt da\hbox {-}dRRT^*\) method preserves much of the scalability of \(\mathtt{dRRT^*}\) and addresses the kinematic dependence that arises for many popular humanoid robots.

7.3 Analysis

Asymptotic optimality of tensor roadmaps Given the decomposition, \( \mathbb {C} \) is divided into two parts, \( \mathbb {C}_{\text {ls}} \) and \( \mathbb {C}_{\text {r}} \), covered by the roadmaps \(\mathbb {R}_{\text {ls}}\) and \(\mathbb {P}_{\text {r}}\) respectively.

If a robust optimal path \( \varSigma ^* \) exists in \( \mathbb {C} \), most of the arguments of Sect. 6 still hold for this decomposition. Since the constituent spaces of this decomposition do not correspond to kinematically independent robots, however, the clearance assumption in Claim 1 needs to be reworked.

Claim 4

Robustness in \(\mathbb {C} \) implies robustness in \(\mathbb {C}_{ls}\) and \(\mathbb {C}_r\). For every decomposition, \(\tau \in [0,1]\), and \(q_i \in \mathbb {C}^{o }_i(\tau )\), \(\Vert \sigma _i(\tau ) - q_i \Vert _2 \ge \delta \).

Proof

Consider any \(Q=(\sigma _{ls}(\tau ), q_r)\), where \(q_r\) is a configuration in \(\mathbb {C}_r\) so that the right arm collides either with the static geometry or with the left-shared part of the robot, which is at \(\sigma _{ls}(\tau )\). Given a robust \(\varSigma \), Q is a colliding configuration: \(\delta \le \Vert \varSigma (\tau )-Q\Vert \). But Q and \(\varSigma (\tau )\) only differ in \(q_r\), so the path \(\sigma _r\) has clearance \(\delta \):

$$\begin{aligned} \delta \le || \sigma _r(\tau ) - q_r||. \end{aligned}$$

By switching the decomposition of \( \varSigma \) to \( (\sigma _{l}, \sigma _{sr}) \), which corresponds to \( \hat{\mathbb {G}}_r\), the above reasoning gives:

$$\begin{aligned} \delta \le || \varSigma (\tau ) - Q || \implies \delta&\le || \sigma _l(\tau ) - q_l||\\ \text {Now, since~ }\forall \tau :\ \ \ || \sigma _{ls}(\tau ) - q_{ls}||&\ge || \sigma _l(\tau ) - q_l|| \\ \implies \delta&\le || \sigma _{ls}(\tau ) - q_{ls}||. \end{aligned}$$

This proves the robustness for \( \sigma _{ls} \). The same reasoning can be applied to \( \mathbb {C}_{sr} \) and \( \mathbb {C}_l \). \(\square \)

It suffices to follow the proof structure outlined in Sect. 6 to argue asymptotic optimality for the method. It should be noted that, due to the coupled nature of \( \mathbb {C} \) introduced by the shared \( \mathtt{DoF}\), the Euclidean cost metric in the composite space is the more applicable choice.

8 Experimental validation

This section provides an experimental evaluation of \(\mathtt{dRRT^*}\), demonstrating practical convergence, scalability for disk robots, and applicability to dual-arm manipulation. The choice of a cost metric depends on the type of application and the underlying system properties. For systems without shared degrees of freedom, the considered cost function is the sum of individual Euclidean arc lengths, a popular choice for multi-robot systems. For systems with shared degrees of freedom, the coupled nature of the underlying configuration space motivates the use of the Euclidean arc length in the composite space as the metric. The results show that the properties and benefits of the proposed algorithms remain robust for both choices of cost function.
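The two cost conventions can be written down explicitly. This is a minimal sketch assuming a path represented as a list of composite waypoints; the function names are hypothetical.

```python
from math import dist

def sum_of_arc_lengths(path, dims):
    """Sum of per-robot Euclidean arc lengths (no shared DoF); `dims`
    lists each robot's DoF count within the composite configuration."""
    cost, start = 0.0, 0
    for d in dims:
        for a, b in zip(path, path[1:]):
            cost += dist(a[start:start + d], b[start:start + d])
        start += d
    return cost

def composite_arc_length(path):
    """Euclidean arc length measured in the composite space."""
    return sum(dist(a, b) for a, b in zip(path, path[1:]))

# Two 2-DoF robots, one segment: robot 1 moves (3, 4), robot 2 moves (0, 3).
path = [[0, 0, 1, 1], [3, 4, 1, 4]]
print(sum_of_arc_lengths(path, [2, 2]))   # 8.0
print(composite_arc_length(path))         # sqrt(34), slightly less
```

The example shows why the two metrics differ: the sum of per-robot lengths (5 + 3 = 8) upper-bounds the single composite arc length.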

Fig. 9
figure 9

The 2D environment where the 2 disk robots operate

8.1 Tests on systems without shared DoF

The approach and alternatives are executed on a cluster with Intel(R) Xeon(R) CPU E5-4650 @ 2.70 GHz processors, and 128 GB of RAM. The solution costs are evaluated in terms of the sum of Euclidean arc lengths.

2 disk robots among 2D polygons This base-case test involves 2 disks (\(\mathbb {C}_i := \mathbb {R}^2\)) of radius 0.2 with bounded velocity in a \(10 \times 10\) region, with obstacles inflated by the disk radius, as in Fig. 9. The disks have to swap positions between (0, 0) and (9, 9). This is a setup where it is possible to compute the explicit roadmap, which is not practical in more involved scenarios. In particular, \(\mathtt{dRRT^*}\) is tested against: (a) running \(\mathtt{A^{\text{* }}}\) on the implicit tensor roadmap \(\hat{\mathbb {G}}\) (referred to as "Implicit \(\mathtt A^{\text{* }}\)"), where \(\hat{\mathbb {G}}\) is defined over the same individual roadmaps with N nodes as those used by \(\mathtt{dRRT^*}\); (b) an explicitly constructed \( \mathtt{PRM^*} \) roadmap with \(N^2\) nodes in \(\mathbb {C}\); and (c) the \( \mathtt{ao\hbox {-} dRRT}\) variant of the algorithm.

Results are shown in Fig. 10. \(\mathtt{dRRT^*}\) converges to the optimal path over \(\hat{\mathbb {G}}\), similar to the one discovered by Implicit \(\mathtt A^{\text{* }}\), while quickly finding an initial solution of high quality. Furthermore, the implicit tensor product roadmap \(\hat{\mathbb {G}}\) is of comparable quality to the explicitly constructed roadmap. The convergence of \( \mathtt{dRRT^*}\) is faster compared to the corresponding \( \mathtt{ao\hbox {-} dRRT}\) variant, as evident from Fig. 10 (top).

Table 1 presents running times. \(\mathtt{dRRT^*}\) and Implicit \(\mathtt A^{\text{* }}\) construct two roadmaps of N nodes each (row 3), which are faster to construct than the \( \mathtt{PRM^*} \) roadmap in \(\mathbb {C}\) (row 1). \( \mathtt{PRM^*} \) becomes very costly as N increases. For \(N=500\), the explicit roadmap contains 250,000 vertices, taking 1.7 GB of RAM to store, which was the upper limit for the machine used. When the roadmap can be constructed, it is fast to query (row 2). \(\mathtt{dRRT^*}\) quickly returns an initial solution (row 5), on par with the solution times from the explicit roadmap and well before Implicit \(\mathtt A^{\text{* }}\) returns a solution (row 4). The initial solution times are compared visually in Fig. 10, which also demonstrates the efficiency of \(\mathtt{dRRT^*}\) compared to \(\mathtt{ao\hbox {-} dRRT}\). The next benchmark further emphasizes this point.

The comparison between the time the proposed method needs to find an early, suboptimal solution and the computation time needed by the optimal \(\mathtt A^{\text{* }}\) highlights the impact of roadmaps of increasing size. While \(\mathtt{dRRT^*}\)'s initial solution times barely change, the time taken by any variant of heuristic search over the composite roadmap grows with the size of the roadmap. This indicates that problems defined over roadmaps of size similar to the tensor roadmaps considered here would rapidly become intractable without anytime performance similar to that of \(\mathtt{dRRT^*}\).

Fig. 10
figure 10

For every n, 10 random pairs of roadmaps are generated. \(\mathtt{dRRT^*}\) and \(\mathtt{ao\hbox {-} dRRT}\) are run in 5 random experiments for every roadmap pair, and the implicit \(\mathtt{A^{\text{* }}}\) searches these 10 tensor combinations. \( \mathtt{PRM^*} \) is run 10 times for every n. (Top): Average solution cost over iterations, averaged over the 10 roadmap pairs. \(\mathtt{dRRT^*}\) (solid circled line) and \(\mathtt{ao\hbox {-} dRRT}\) (solid triangled line) converge to the optimal path through \(\hat{\mathbb {G}}\) (dashed line). (Bottom): Initial solution times for the algorithms

Fig. 11
figure 11

Data averaged over 10 runs for \(R = 3\) to 10 robots. The data is reported for the algorithms \(\mathtt{dRRT^*}\), \(\mathtt{ao\hbox {-} dRRT}\), \(\mathtt{dRRT}\), and \(\mathtt RRT^{\text{* }}\). (Top inset): The success ratio shows the fraction of the runs that returned a solution. (Top): Relative solution cost of the algorithms for increasing R over 100,000 iterations. The horizontal green line at 1 denotes the best possible cost estimate, which is a combination of the individual robot shortest paths for each problem. All the other costs are represented as multiples of this estimate. \(\mathtt{dRRT}\) only reports a single solution, and is denoted by the inverted blue triangles. The other algorithms improve the solution over the iterations, represented by vertical bars between the average initial solution cost and average final reported solution cost. The numbers above the \(\mathtt{dRRT^*}\) bars represent the iteration number of the first solution for \(\mathtt{dRRT^*}\). (Bottom left): The average initial solution times for the algorithms. (Bottom right): The plot of the reported solution cost over time for the different algorithms. \(\mathtt{dRRT}\) only reports a single solution and is represented by the inverted triangles

Many disk robots among 2D polygons In the same environment as above, the number of robots R is increased to evaluate scalability. The environment is kept the same so that additional complexity is introduced purely through the addition of more robots to the planning problem; more difficult and practical planning scenarios are explored in the subsequent benchmarks with manipulators. Each robot starts on the perimeter of the environment and is tasked with reaching the opposite side. An \(N=50\) roadmap is constructed for every robot. It quickly becomes intractable to construct a \( \mathtt{PRM^*} \) roadmap in the composite space of many robots.

Figure 11 shows the inability of alternatives to compete with \(\mathtt{dRRT^*}\) in scalability. Solution costs are normalized by an optimistic estimate of the path cost for each case, which is the sum of the optimal solutions for each robot, disregarding robot-robot interactions. The colored vertical bars represent the range of the average initial and final solution costs. Implicit \(\mathtt A^{\text{* }}\) fails to return solutions even for 3 robots. Directly executing \(\mathtt RRT^{\text{* }}\) in the composite space fails for \(R \ge 6\). The original \(\mathtt{dRRT}\) method (without the informed search component) starts suffering in success ratio for \(R \ge 4\) and returns solutions of worse quality than \(\mathtt{dRRT^*}\). The \( \mathtt{ao\hbox {-} dRRT}\) variant performs similarly to \( \mathtt{dRRT}\) in terms of success ratio but, as expected, finds better solutions than \( \mathtt{dRRT}\). \( \mathtt{dRRT^*}\) finds solutions up to \( R=10 \).

Table 1 Construction and query times (s) for 2 disk robots

To give an estimate of the immensity of the search space: for \( R=10 \), the tensor-product roadmap represents an implicit structure consisting of \(50^{10}\), or \(\mathtt {\sim }\,100\) million-billion, vertices.
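A quick arithmetic check confirms the quoted figure for ten constituent roadmaps of \(N=50\) vertices each:

```python
# Size of the implicit tensor product for R = 10 robots, N = 50 each.
n_vertices = 50 ** 10
print(f"{n_vertices:,}")       # 97,656,250,000,000,000
# A "million-billion" is 10^15, so this is ~98 of them, i.e. ~100.
print(n_vertices // 10 ** 15)  # 97
```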

Dual-arm manipulator This test (Fig. 12) shows the benefits of \(\mathtt{dRRT^*}\) when planning for two 7-dimensional arms. Figure 13 shows that \(\mathtt RRT^{\text{* }}\) fails to return solutions within 100K iterations. Using small roadmaps is also insufficient for this problem: both \(\mathtt{dRRT^*}\) and Implicit \(\mathtt A^{\text{* }}\) require larger roadmaps to begin succeeding. With \(N \ge 500\), however, Implicit \(\mathtt A^{\text{* }}\) always fails, while \(\mathtt{dRRT^*}\) maintains a 100% success ratio. As expected, roadmaps of increasing size result in higher-quality paths. The informed nature of \(\mathtt{dRRT^*}\) also allows it to find initial solutions fast, which, together with the branch-and-bound primitive, allows for good convergence. The initial solution times in Fig. 13 indicate that the heuristic guidance succeeds in finding fast initial solutions even for larger roadmaps.

8.2 Tests on systems with shared DoF

This section showcases three benchmarks of increasing difficulty, which are used to evaluate the performance of \(\mathtt da\hbox {-}dRRT^*\). All the experiments were run on a cluster with Intel(R) Xeon(R) CPU E5-4650 @ 2.70 GHz processors and 128 GB of RAM. In each benchmark, different sizes n of the constituent roadmaps \(\mathbb {R}_{\text {ls}}\) and \(\mathbb {R}_{\text {sr}}\) were evaluated. The \(\mathtt da\hbox {-}dRRT^*\) algorithm is compared against \(\mathtt RRT^{\text{* }}\) and \( \mathtt{PRM^*} \). The platforms used are a \(\mathtt{Motoman}\) SDA10F, with a torsional DoF, and a \(\mathtt{Baxter}\) on a mobile base that can rotate and translate. For the \( \mathtt{PRM^*} \) algorithm and all benchmarks, 20 randomly seeded roadmaps with 50,000 nodes are constructed in \(\mathbb {C} \) and data are gathered from 20 experiments. A 50,000-node roadmap has \(\approx 1\) million edges and takes \(\approx 7\) h to construct in these high-dimensional spaces. Larger roadmaps run into memory scalability issues. These roadmaps in the full space occupied \(\approx 50\) MB; in comparison, the space requirement for the two arm roadmaps was \(<1\) MB.

For all benchmarks, both \(\mathtt RRT^{\text{* }}\) and \(\mathtt da\hbox {-}dRRT^*\) were allowed to run for 100,000 iterations. \(\mathtt RRT^{\text{* }}\) is run in 20 different randomly seeded experiments for every benchmark. For the \(\mathtt da\hbox {-}dRRT^*\) algorithm, 20 experiments are run for every benchmark, for the different constituent roadmap sizes n, by building 4 pairs of randomly seeded constituent roadmaps and running 5 randomly seeded experiments over each roadmap combination.

Fig. 12
figure 12

The start and target configuration for the dual-arm manipulator benchmark on a \(\mathtt{Motoman}\) SDA10F. \(\mathtt{dRRT^*}\) is run for a dual-arm manipulator to go from its home position (left) to a reaching configuration (right)

Fig. 13
figure 13

(Top): 5 random experiments are run for 4 random roadmap pairs for every n. \(\mathtt{dRRT^*}\) achieves perfect success ratio as n increases. (Middle): \(\mathtt{dRRT^*}\) solution quality over time. Here, larger roadmaps provide benefits in terms of running time and solution quality. (Bottom): Initial solution times for \(\mathtt{dRRT^*}\)

Motoman tabletop benchmark A set of 20 random collision-free starts and goals are selected in the tabletop environment, shown in Fig. 14.

They are only used if they are sufficiently far away from each other. \(\mathtt da\hbox {-}dRRT^*\) is tested with constituent roadmap sizes of 100, 250 and 500. All the algorithms succeed in every experiment. In this simpler problem, smaller roadmaps are quicker to search and generate initial solutions faster compared to \(\mathtt RRT^{\text{* }}\), as shown in Fig. 15 (top).

Fig. 14
figure 14

The Motoman tabletop benchmark setup for \(\mathtt da\hbox {-}dRRT^*\) with randomly sampled start and goal configurations

Searching the \( \mathtt{PRM^*} \) roadmap is the fastest (online), but the solution quality is worse than that obtained by the other methods. \(\mathtt da\hbox {-}dRRT^*\) converges to better solutions than the other algorithms, as shown in Fig. 15 (bottom).

Fig. 15
figure 15

Motoman Tabletop Benchmark: Top: The initial solution times are reported for every algorithm. Bottom: The average solution costs over time are reported

Motoman shelf benchmark This benchmark places the \(\mathtt{Motoman}\) in front of 3 shelves. The robot has to plan between two states where both arms are inside different shelving units, which requires the rotation of its torso (Fig. 16 (top)).

This is a significantly harder problem, and \(\mathtt RRT^{\text{* }}\) suffers in terms of success ratio (Fig. 16 (second)). \(\mathtt RRT^{\text{* }}\) also takes much longer to find an initial solution, as indicated by Fig. 16 (middle). \( \mathtt{PRM^*} \) is still the fastest in finding solutions (again, only the online cost is considered). The \(\mathtt da\hbox {-}dRRT^*\) solution cost is much better than both the average \( \mathtt{PRM^*} \) solution and \(\mathtt RRT^{\text{* }}\), as shown in Fig. 16 (bottom). \(\mathtt da\hbox {-}dRRT^*\) quickly converges for smaller roadmaps and then stops improving the cost. The larger roadmaps contain better solutions, causing \(\mathtt da\hbox {-}dRRT^*\) to converge more slowly.

Fig. 16
figure 16

Motoman Shelf Benchmark: Top: The setup of the benchmark. Second: Success ratios of the algorithms are shown over time. Middle: The initial solution times are reported for every algorithm. Bottom: The average solution costs over time are reported. The dashed horizontal line denotes the average solution cost discovered by \( \mathtt{PRM^*} \). The shaded regions represent the corresponding algorithm’s standard deviation of cost

Mobile Baxter benchmark This benchmark uses a Rethink \(\mathtt{Baxter}\) robot with a mobile base. The robot is grasping two long objects inside a shelf (Fig. 17 (top)) and has to navigate across a cramped, walled room to a placing configuration inside a shelf on the other side of the room.

This proves to be the most challenging of the three benchmarks. As shown in Fig. 17 (middle), \(\mathtt RRT^{\text{* }}\) fails to find a solution. It should be noted that, when tested on a simpler version of the benchmark without the pillar in the room, \(\mathtt RRT^{\text{* }}\) could find solutions. \( \mathtt{PRM^*} \) also falters, showing a very low success rate. This indicates that even larger roadmaps in \(\mathbb {C} \) would be needed to solve harder problems. The problem is solved when a dense implicit structure, with \(n=1000\), is explored by \(\mathtt da\hbox {-}dRRT^*\).

Figure 17 (bottom) shows that \(\mathtt da\hbox {-}dRRT^*\) finds better initial and converged solutions when compared to the instances in which \( \mathtt{PRM^*} \) succeeded.

Fig. 17
figure 17

Mobile Baxter Benchmark: Top: The setup of the benchmark. Middle: Success ratios of the algorithms are shown over time. Bottom: The average solution costs over time are reported. The dashed horizontal line denotes the average solution cost discovered by \( \mathtt{PRM^*} \)

Fig. 18
figure 18

Real world experiments were performed in a 28 dimensional space with 2 dual-armed manipulators planning their motion to a goal configuration corresponding to pre-grasping states for 4 objects resting on a table in the shared workspace. The sequence corresponds to freeze-frames starting in sequence from the top-left, and progressing along each row till the bottom-right

Fig. 19
figure 19

Real world experiments were performed in a 28 dimensional space with 2 dual-armed manipulators planning their motion from a start state to an approach state close to a pole positioned in the center of a tightly coupled shared workspace. The arms then swap positions on the pole and return back to the start state. The goals correspond to the last images in each row. The sequence corresponds to freeze-frames starting in sequence from the top-left, and progressing along each row till the bottom-right

8.3 Real world experiments

Experiments were performed in a 28-dimensional space with two dual-armed manipulators (a \(\mathtt{Motoman}\) SDA10F and a \(\mathtt{Baxter}\)). Initial solutions were obtained in a fraction of a second for the two experimental setups, with the method allowed to run for 1000 iterations to improve the quality of the demonstrated trajectories. The two setups were chosen carefully to demonstrate, in the first instance, a typical application of simultaneous grasping that may arise in real-world scenarios, and in the second, a problem that forces very close interactions between the arms.

Pre-grasp demonstration As shown in Fig. 18, the demonstration simulates an application to multi-arm manipulation, where the goal of the motion planning problem is for the 4 arms to reach pre-grasping configurations for 4 objects placed on a table in the shared workspace between the robots. 1000-node roadmaps were constructed for each arm and \(\mathtt{dRRT^*}\) was used to search for a solution to the motion planning problem. The solution was computed offline and an open-loop execution was performed on the real system.

Coupled workspace demonstration As shown in Fig. 19, a pole is positioned between the two robots so that the arms cannot cross over. The objective is for the 4 arms to (a) approach the pole at alternating heights, (b) then swap the height of their approaching configurations, and (c) finally return back to the start state. 1000-node roadmaps were constructed for each arm and \(\mathtt{dRRT^*}\) was used to search for a solution to the three motion planning problems. The solutions, computed offline, were stitched together and replayed in an open-loop execution on the real system.

9 Discussion

This work proves asymptotic optimality of sampling-based multi-robot planning over implicit structures by extending the \(\mathtt{dRRT}\) approach. Asymptotic optimality is achieved through a modification, resulting in \(\mathtt{ao\hbox {-} dRRT}\), which expands a spanning tree over an implicitly defined tensor product roadmap and leverages a simple rewiring scheme.

This method already has the advantage of avoiding the construction of a large, dense roadmap in the composite configuration space of many robots. It is further improved by using heuristics to search in an informed manner, resulting in the \(\mathtt{dRRT^*}\) method. The method is also extended to work with robot systems that share degrees of freedom, resulting in \(\mathtt da\hbox {-}dRRT^*\).

Experimental results show the efficacy of the proposed approaches. Furthermore, by leveraging heuristics, \(\mathtt{dRRT^*}\) is able to solve more challenging problem instances than the baseline \(\mathtt{ao\hbox {-} dRRT}\) method, and the approach is demonstrated to solve complex, real-world problems with robot manipulators operating in a shared workspace with a high degree of coupling.

In terms of practical applicability, \(\mathtt{dRRT^*}\) promises fast initial solution times (Figs. 11, 13), on the order of a fraction of a second for most problems, including high-dimensional, kinematically independent multi-robot problems. The improvement of solution quality over time reflects the anytime property of the approach: paths of improved quality are discovered as more computation time is invested. While problems with shared degrees of freedom provide less heuristic guidance and result in some performance degradation, the scalability benefits over composite-space planning remain even in this case. Future work includes the consideration of dynamics. The existing theoretical analysis of \(\mathtt{dRRT^*}\) assumes that the individual robot systems are holonomic, which guarantees the existence of near-optimal single-robot paths (see Lemma 1 and (Janson et al. 2015, Theorem 4.1)). Recent results concerning asymptotic optimality of PRM for non-holonomic systems (Schmerling et al. 2015a, b) bring the hope of achieving a more general analysis of the current work as well. The proposed framework can also be leveraged toward efficiently solving simultaneous task and motion planning for many robot manipulators (Dobson and Bekris 2015). The demonstrated applications to manipulators also motivate dual-arm rearrangement challenges (Shome et al. 2018).