1 Introduction

There is an emerging need to enhance the capabilities of mobile robots deployed in factories, warehouses, and other logistics operations. In these settings, robots are often required to pick up and deliver objects from multiple locations within certain time windows. This gives rise to challenging task-and-motion planning problems, as it intertwines logical and temporal constraints on the operations with geometric and differential constraints related to obstacle avoidance and robot dynamics. Robot dynamics, which are often high-dimensional and nonlinear, constrain the feasible motions, making it difficult to generate trajectories that enable the robot to complete its tasks. Moreover, due to its limited capacity, a robot may have to deliver some objects before picking up others, causing it to travel longer, which could violate temporal constraints. When the problem is oversubscribed, the robot also has to decide which tasks to abandon while seeking to complete as many of the remaining tasks as possible.

Fig. 1. Example of a multi-goal motion-planning problem with time windows and capacity constraints. The vehicle is required to pick up items, without exceeding its capacity, and deliver them to the corresponding locations within the specified time windows. Videos of solutions obtained by our approach can be found anonymously at https://bit.ly/2Ef7HMQ.

To address these challenges, we build upon the burgeoning body of research on task-and-motion planning. In a discrete setting, which ignores the obstacles and robot dynamics, there has been significant progress in solving challenging vehicle routing problems that often include time windows, pickups, deliveries, capacities, or premium service requests [1, 2, 4, 7, 10, 14]. In a continuous setting, sampling-based motion planning has shown great success in dealing with constraints imposed by collision avoidance and robot dynamics [3, 5, 11, 12, 15]. Sampling-based motion planning has even been coupled with TSP solvers to effectively solve multi-goal motion-planning problems [16]. Temporal reasoning has also been integrated to allow for time windows [6].

We advance research on task-and-motion planning by developing an effective approach that solves significantly more complex problems. While related work can solve multi-goal motion planning with time windows [6], this paper develops the first effective approach that also incorporates pickups, deliveries, and capacities. Another contribution is that our approach deals with oversubscription by generating collision-free and dynamically-feasible trajectories that enable the robot to pick up and deliver as many objects as possible within the specified time windows. This is made possible by first developing an effective discrete solver based on Monte-Carlo search to quickly generate low-cost tours. Central to our approach is an effective coupling of vehicle-routing solvers in a discrete setting with sampling-based motion planning in a continuous setting that accounts for obstacles and robot dynamics. On the one hand, vehicle routing provides low-cost tours that effectively guide sampling-based motion planning as it explores the vast space of feasible motions. On the other hand, motion planning provides feasibility estimates that vehicle routing uses to refine its plans. To deal with oversubscription, our approach associates costs with travel and benefits with fulfilling pickups and deliveries, seeking to minimize temporal violations and maximize the net benefit. Experiments with nonlinear vehicle models show the efficiency and scalability of the approach as the complexity of the operations increases.

2 Problem Formulation

Robot dynamics are expressed as differential equations

$$\begin{aligned} \dot{s} \leftarrow f(s, a), s \in \mathcal {S}, a \in \mathcal {A}, \end{aligned}$$
(1)

where \(\mathcal {S}\) and \(\mathcal {A}\) denote the state space and action space, respectively. As an example, the dynamics of the car model (Fig. 2) used in the experiments are defined as

$$\begin{aligned} \dot{x} = v \cos (\theta ) \cos (\psi ), \dot{y} = v \sin (\theta ) \cos (\psi ), \end{aligned}$$
(2)
$$\begin{aligned} \dot{\theta } = v \sin (\psi ), \dot{v} = a_{{\mathrm {acc}}}, \dot{\psi } = a_\omega , \end{aligned}$$
(3)

where the state \((x, y, \theta , \psi , v) \in \mathcal {S}\) defines the position, orientation, steering angle, and velocity; the action \((a_{{\mathrm {acc}}}, a_\omega ) \in \mathcal {A}\) defines the acceleration and steering rate. As another example, a snake model (Fig. 2) can be obtained by attaching several trailers to the car and augmenting f as

$$\begin{aligned} \dot{\theta _i} = ({v}/{H})(\sin (\theta _{i-1})-\sin (\theta _0)) \mathop {\prod }\nolimits _{j=1}^{i-1} \cos (\theta _{j-1}-\theta _j), \end{aligned}$$
(4)

where \(\theta _0 = \theta \), \(\theta _i\) is the orientation of the i-th trailer for \(i \in \{1, \ldots , N\}\), N is the number of trailers, and H is the hitch distance [11].
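For concreteness, a minimal sketch of the car dynamics f from Eqs. (2)–(3) follows; the tuple layouts and Python names are illustrative, not taken from the implementation.

```python
import math

def car_dynamics(s, a):
    """Car model of Eqs. (2)-(3).  State s = (x, y, theta, psi, v):
    position, orientation, steering angle, velocity.  Action
    a = (a_acc, a_omega): acceleration and steering rate."""
    x, y, theta, psi, v = s
    a_acc, a_omega = a
    return (
        v * math.cos(theta) * math.cos(psi),  # x'
        v * math.sin(theta) * math.cos(psi),  # y'
        v * math.sin(psi),                    # theta'
        a_omega,                              # psi'
        a_acc,                                # v'
    )
```

The snake model would augment the returned tuple with the trailer terms of Eq. (4).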

Fig. 2. Vehicle models (car and snake) used in the experiments.

When applying an action \(a \in \mathcal {A}\) to a state \(s \in \mathcal {S}\), the robot changes state according to its dynamics. The new state \(s_{{\mathrm {new}}} \in \mathcal {S}\) is computed by a function

$$\begin{aligned} s_{{\mathrm {new}}} \leftarrow \small {\textsc {simulate}}(s, a, f, dt), \end{aligned}$$
(5)

which numerically integrates \(f\) for one time step dt. Applying a sequence of actions \({\langle {a_1, \ldots , a_{i-1}} \rangle }\) gives rise to a dynamically-feasible motion trajectory \(\zeta : \{1, \ldots , i\} \rightarrow \mathcal {S}\), where \(\zeta (1) \leftarrow s\) and \(\forall j \in \{2, \ldots , i\}\):

$$\begin{aligned} \zeta (j) \leftarrow \small {\textsc {simulate}}(\zeta (j-1), a_{j-1}, f, dt). \end{aligned}$$
(6)
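For concreteness, a minimal sketch of SIMULATE (Eq. 5) and the trajectory rollout (Eq. 6); the paper does not specify the numerical integrator, so plain Euler integration with substeps is assumed here, and the 1-indexed \(\zeta \) becomes a 0-indexed Python list.

```python
def simulate(s, a, f, dt, substeps=10):
    """Eq. (5): numerically integrate the dynamics f for one time
    step dt under constant action a (Euler integration assumed)."""
    h = dt / substeps
    for _ in range(substeps):
        ds = f(s, a)
        s = tuple(si + h * dsi for si, dsi in zip(s, ds))
    return s

def trajectory(s, actions, f, dt):
    """Eq. (6): roll out the dynamically-feasible trajectory zeta
    obtained by applying a sequence of actions from state s."""
    zeta = [s]
    for a in actions:
        zeta.append(simulate(zeta[-1], a, f, dt))
    return zeta
```

With the car_dynamics sketch above, `trajectory((0, 0, 0, 0, 1), [(0.1, 0.0)] * 50, car_dynamics, 0.1)` rolls out 50 steps of gentle acceleration.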

Multi-Goal Motion Planning with Time Windows, Pickups, Deliveries, and Capacities: The world contains obstacles, pickup regions \(\{\mathcal {R}_1, \ldots ,\mathcal {R}_k\}\), and delivery regions \(\{\mathcal {R}_{k+1}, \ldots , \mathcal {R}_{n=2k}\}\). Each \(\mathcal {R}_i \in \mathcal {R}= \{\mathcal {R}_1, \ldots , \mathcal {R}_n\}\) is associated with a load \(\ell _i \in \mathbb {Z}\) and a time interval \([t_i^{{\mathrm {start}}}, t_i^{{\mathrm {end}}}]\) during which the pickup or delivery can be made. For each pickup-delivery pair \({\langle {\mathcal {R}_i, \mathcal {R}_{i+\lfloor \frac{n}{2}\rfloor }} \rangle }\), it holds that \(\ell _i = -\ell _{i + \lfloor {\frac{n}{2}}\rfloor } \ge 0\). The robot has a maximum capacity \(C_{{\mathrm {max}}} \in {\mathbb {N}}\).

Let \(\mathcal {G}= \{\mathcal {G}_1, \ldots , \mathcal {G}_n\}\) denote the goals, where \(\mathcal {G}_i = {\langle {\mathcal {R}_i, [t_i^{{\mathrm {start}}}, t_i^{{\mathrm {end}}}], \ell _i} \rangle }\). Let \(\small {\textsc {goals}}(\zeta )\) denote the goals completed by the robot as it moves according to a trajectory \(\zeta \). A pickup goal \(\mathcal {G}_i\) is completed only when the robot reaches \(\mathcal {G}_i\) within its time window \([t_i^{{\mathrm {start}}}, t_i^{{\mathrm {end}}}]\) and has sufficient remaining capacity to add the load \(\ell _i\). The robot makes a delivery only when the region is reached within its time window and the corresponding load has already been picked up.

Given an initial state \(s_{{\mathrm {init}}} \in \mathcal {S}\), the objective is then to compute a collision-free and dynamically-feasible trajectory \(\zeta \) that enables the robot to complete all the goals, i.e., \(\small {\textsc {goals}}(\zeta ) = \mathcal {G}\). When this is not possible, the objective is to minimize the number of missed goals \(|\mathcal {G}\setminus \small {\textsc {goals}}(\zeta )|\).
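To illustrate the completion semantics, the following sketch computes \(\small {\textsc {goals}}(\zeta )\) for a timed trajectory; the region tests, the goal ordering (pickups \(\mathcal {R}_1, \ldots , \mathcal {R}_k\) followed by their paired deliveries), and all names are our own encoding.

```python
def completed_goals(zeta, dt, goals, C_max):
    """Sketch of GOALS(zeta).  goals[i] = (inside, t_lo, t_hi, load),
    where inside(s) tests membership in region R_i; pickups (load > 0)
    occupy indices 0..n/2-1 and delivery i is paired with pickup
    i - n//2, mirroring the formulation."""
    done, load, n = set(), 0, len(goals)
    for j, s in enumerate(zeta):
        t = j * dt
        for i, (inside, t_lo, t_hi, l) in enumerate(goals):
            if i in done or not inside(s) or not (t_lo <= t <= t_hi):
                continue
            if l > 0 and load + l <= C_max:       # pickup fits on board
                done.add(i); load += l
            elif l < 0 and (i - n // 2) in done:  # delivery after pickup
                done.add(i); load += l
    return done
```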

3 Discrete Problem Solving

As mentioned, our approach relies on vehicle routing over a discrete abstraction to guide sampling-based motion planning. The discrete solver is required to solve a variant of TSP that includes prizes, time windows, and capacity constraints for pickup and delivery tasks. The discrete solver is invoked numerous times by the approach. While the initial call requires the discrete solver to find a solution involving all pickups and deliveries, subsequent calls involve only a subset of the original problem as some goals may have already been reached. This makes the problem more challenging since the discrete solver has to deal with partial problems where some orders have already been completed and some items have already been picked up and are still on the vehicle.

We now define the discrete problem and then describe a specialized solver based on Monte-Carlo search and a general PDDL3 planner. Section 4 describes how the discrete solvers are integrated within the overall framework.

Algorithm 1. Rollout function for the discrete problem.

Definition 1

(Prize-Collecting TSP with Time Windows, Pickups, Deliveries, and Capacity Constraints) Given

  • a graph \(G = (V, E, T)\) with \(V = \{v_{{\mathrm {start}}}, g_1, \ldots , g_m\}\), \(E = V \times V\), and times \(T = \{t_{(v',v'')} : (v', v'') \in E\}\)

  • goals \(\{{\langle {g_1, t_1^{{\mathrm {start}}}, t_1^{{\mathrm {end}}}, \ell _1, p_1} \rangle }, \ldots , {\langle {g_m, t_m^{{\mathrm {start}}}, t_m^{{\mathrm {end}}}, \ell _m, p_m} \rangle }\}\)

    • \([t_i^{{\mathrm {start}}}, t_i^{{\mathrm {end}}}]\) is the time interval to visit \(g_i\)

    • \(\ell _i \in \mathbb {Z}\) represents the load associated with \(g_i\)

    • \(p_i \in {\mathbb {R}}\) is the profit gained by visiting \(g_i\)

  • orders \(O = \{{\langle {p_1,d_1} \rangle }, \ldots ,{\langle {p_k,d_k} \rangle }\}\) of pickup \(P = \{p_1,\ldots , p_k\}\) and delivery \(D = \{d_1,\ldots , d_k\}\) tasks

    • \(P \cup D = \{g_1,\ldots ,g_m\}\), with P and D mutually disjoint

    • if \(p_i\) is empty (denoted by \(p_\bot \)), then \({\langle {p_i, d_i} \rangle }\) is a delivery-only order

    • otherwise, the load for \(p_i\) must be positive and equal to the negation of the load for \(d_i\)

  • maximum capacity \(C_{{\mathrm {max}}}\) that exceeds the preload, i.e., \(C_{{\mathrm {max}}} \ge C_{{\mathrm {pre}}} = -\sum _{p_i \in P, p_i = p_\bot } {{\mathrm {load}}}(d_i)\)

compute a path \(\sigma = {\langle {\sigma _1,\ldots ,\sigma _{|\sigma |}} \rangle }\) over \(G = (V, E, T)\) and start times \({\langle {t_1, \ldots , t_{|\sigma |}} \rangle }\) such that (i) \(\sigma \) starts at \(v_{{\mathrm {start}}}\); (ii) each goal \(\sigma _i\) is reached within its time window; (iii) pickups are visited before their corresponding deliveries; (iv) the capacity is never exceeded; and (v) the accrued net benefit \(\small {\textsc {payoff}}(\sigma ) - a \cdot \small {\textsc {travelcost}}(\sigma ) + b \cdot \small {\textsc {goalsvisited}}(\sigma )\) is maximized, where a and b are user-defined constants.

This objective function allows for oversubscription, making it possible to generate partial tours. We have implemented a solver based on Monte-Carlo search and another on PDDL3.
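To make Definition 1 concrete, the following sketch checks conditions (i)–(iv) for a candidate tour and accrues the net benefit of condition (v); the data structures are ours, and travel time stands in for the travel cost.

```python
def tour_score(sigma, t_edge, windows, loads, prizes, pickup_of,
               C_pre, C_max, a, b):
    """Feasibility and net benefit of a tour sigma = [v_start, g, ...].
    t_edge[(u, v)]: travel time; windows[g] = (t_lo, t_hi);
    pickup_of[g]: pickup paired with delivery g (None if delivery-only).
    Returns None if the tour is infeasible."""
    t, load, cost, payoff, visited = 0.0, C_pre, 0.0, 0.0, set()
    for u, g in zip(sigma, sigma[1:]):
        t += t_edge[(u, g)]
        cost += t_edge[(u, g)]
        t_lo, t_hi = windows[g]
        t = max(t, t_lo)                        # wait on early arrival
        if t > t_hi:
            return None                         # (ii) window missed
        p = pickup_of.get(g)
        if loads[g] < 0 and p is not None and p not in visited:
            return None                         # (iii) delivery before pickup
        load += loads[g]
        visited.add(g)
        if not (0 <= load <= C_max):
            return None                         # (iv) capacity violated
        payoff += prizes[g]
    return payoff - a * cost + b * len(visited)
```

Oversubscription then amounts to scoring tours over subsets of the goals and keeping the best.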

Monte-Carlo Solver. Monte-Carlo search [19] is a randomized reinforcement-learning strategy that utilizes random rollouts. Nested rollout policy adaptation (NRPA) [17] was introduced as a way to use gradient ascent on a rollout policy rather than navigating the search tree directly. The algorithm starts at nesting level L and recursively calls levels \(L-1\), \(L-2\), and so on; when the lowest recursion level is reached, a rollout is performed, which may find a better solution.

Pseudocode for the rollout function developed for our discrete problem is shown in Algorithm 1. The rollout incrementally constructs the tour, penalizing capacity-constraint violations and rewarding deliveries. To handle the fact that the vehicle may have to wait when it arrives early at a goal, we use two different variables, one for the makespan and one for the distance.

An important principle of NRPA is to bias rollouts through learning by weighting the possible actions. The action weights are updated in each step of the algorithm to prioritize the moves of the best routing sequence found so far. In addition, each simulation state is encoded distinctly so that the policy learned for one state does not affect the action selection at subsequent steps. The choices within a rollout are random, but the probabilities differ from step to step according to the learned weights. Policy adaptation increases the weight of the actions in the best sequence and decreases the weights of the alternative actions.
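A compact sketch of NRPA adapted to this setting is shown below; the interfaces legal_moves and score are our own, the policy is keyed by (step, move) to keep simulation states distinct, and in the actual solver the rollout is replaced by Algorithm 1 with its penalties and rewards.

```python
import copy, math, random

def nrpa(level, policy, legal_moves, score, n_iter=100, alpha=1.0):
    """Nested rollout policy adaptation [17] (sketch)."""
    if level == 0:
        return rollout(policy, legal_moves, score)
    best, best_seq = -math.inf, []
    for _ in range(n_iter):
        s, seq = nrpa(level - 1, copy.deepcopy(policy), legal_moves,
                      score, n_iter, alpha)
        if s > best:
            best, best_seq = s, seq
        policy = adapt(policy, best_seq, legal_moves, alpha)
    return best, best_seq

def rollout(policy, legal_moves, score):
    """Sample one tour, choosing moves with softmax probabilities."""
    seq = []
    while True:
        moves = legal_moves(seq)
        if not moves:
            return score(seq), seq
        w = [math.exp(policy.get((len(seq), m), 0.0)) for m in moves]
        seq.append(random.choices(moves, weights=w)[0])

def adapt(policy, seq, legal_moves, alpha):
    """Gradient step: boost the moves of the best sequence found and
    proportionally demote the alternatives at each step."""
    for i, m in enumerate(seq):
        moves = legal_moves(seq[:i])
        z = sum(math.exp(policy.get((i, m2), 0.0)) for m2 in moves)
        for m2 in moves:
            policy[(i, m2)] = (policy.get((i, m2), 0.0)
                               - alpha * math.exp(policy.get((i, m2), 0.0)) / z)
        policy[(i, m)] = policy.get((i, m), 0.0) + alpha
    return policy
```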

Fig. 3. Two actions in the PDDL3 domain.

PDDL3 Planner. Preferences are soft constraints on logical expressions. In PDDL3 [8], the quality of a plan is computed based on how many preferences it satisfies. We transform the routing problem of Definition 1 into a PDDL3 planning domain and task description, and call a domain-independent planner to generate a feasible plan. We use OPTIC [2], which handles preferences and uses CPLEX as its backend MILP solver.

The two main PDDL3 actions are shown in Fig. 3. For pickup and delivery tasks, we split the durative action execute task into one action for pickups, one for deliveries, and one that executes a delivery-only task, based on the preload value.

4 Overall Approach

Our approach enhances the framework for integrating temporal reasoning into motion planning [6]. The main components are as follows: (i) construct a roadmap to provide a suitable discrete abstraction, (ii) expand a tree in the vast state space of feasible motions and partition the motion tree into equivalence classes, and (iii) use discrete solvers to guide the expansion of the equivalence classes.

We make significant technical contributions to enhance these components, namely: (a) develop an efficient discrete solver based on Monte-Carlo search, (b) develop effective notions of equivalence classes that account for pickups, deliveries, and capacities, and (c) enable guidance of the motion-tree expansion by partial tours to handle oversubscription.

Pseudocode is shown in Algorithm 2. The rest of the section describes the components and their interplay in more detail.

Fig. 4. Scenes used in the experiments (scene 1 shown in Fig. 1). Figures (b) and (c) also show roadmaps constructed by our approach to facilitate navigation.

Discrete Abstraction. As in [6], the discrete abstraction is obtained by constructing a roadmap \(RM\) to capture the connectivity of the environment. As in PRM [9], the roadmap is obtained by sampling collision-free configurations and connecting neighboring configurations. This process continues until all the regions in \(\mathcal {R}\) are connected. Figure 4 shows some examples.

After constructing the roadmap, Dijkstra's shortest-path algorithm is invoked from each region \(\mathcal {R}_i\) to compute minimum-cost paths to every roadmap vertex. As an enhancement to [6], in order to generate paths that do not pass too close to the obstacles, we add a clearance component to the cost of each roadmap edge, i.e., \(\small {\textsc {cost}}(q_i, q_j) = ||q_i-q_j|| / \min (\small {\textsc {clear}}(q_i), \small {\textsc {clear}}(q_j))\), where \(\small {\textsc {clear}}(q_i)\) denotes the separation of \(q_i\) from the obstacles. These paths are later used by the discrete solver when computing tours.
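A sketch of the clearance-weighted edge cost (clear(q) is assumed to be supplied by the collision checker):

```python
import math

def edge_cost(q_i, q_j, clear):
    """Roadmap edge cost: Euclidean length divided by the smaller of
    the endpoint clearances, so that Dijkstra favors slightly longer
    paths with good obstacle clearance over paths that hug obstacles."""
    return math.dist(q_i, q_j) / min(clear(q_i), clear(q_j))
```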

Motion Tree and Equivalence Classes. A motion tree \(\mathcal {T}\) is incrementally expanded by adding new vertices and edges. Each vertex v is associated with a collision-free state, denoted by v.s. Each edge \((v_i, v_j)\) is labeled with a control action a and represents a collision-free and dynamically-feasible motion from \(v_i.s\) to \(v_j.s\), i.e., \(v_j.s \leftarrow \small {\textsc {simulate}}(v_i.s, a, f, dt)\).

Let \(\zeta _\mathcal {T}(v)\) denote the trajectory from the root of \(\mathcal {T}\) to the vertex v. If \(\small {\textsc {goals}}(\zeta _\mathcal {T}(v)) = \mathcal {G}\), then \(\zeta _\mathcal {T}(v)\) constitutes a solution since all goals would have been completed. As \(\mathcal {T}\) is expanded, our approach keeps track of the vertex v that minimizes \(|\mathcal {G}\setminus \small {\textsc {goals}}(\zeta _\mathcal {T}(v))|\), using the time duration of \(\zeta _\mathcal {T}(v)\) to break ties. The tree expansion continues until a solution is found, a runtime limit is reached, or a convergence criterion is met (no improvement in best v after numerous iterations). At the end, the approach returns the best trajectory \(\zeta _\mathcal {T}(v)\).

How should \(\mathcal {T}\) be expanded from v? Since \(\zeta _\mathcal {T}(v)\) has already completed \(\small {\textsc {goals}}(\zeta _\mathcal {T}(v))\), the objective is to complete as many of the remaining goals as possible. This is where the discrete solver comes into play, as it can provide tours that visit the remaining goals while accounting for time windows, pickups, deliveries, and capacity constraints. Since the discrete solver operates over the roadmap, we must also map v.s to the roadmap, e.g., by selecting the roadmap configuration nearest to v.s. Let v.t denote the time duration of \(\zeta _\mathcal {T}(v)\), \(v.\mathrm {rem}\) the remaining goals, i.e., \(v.\mathrm {rem} = \mathcal {G}\setminus \small {\textsc {goals}}(\zeta _\mathcal {T}(v))\), and v.q the mapping of v.s to a roadmap configuration. The discrete solver can then be invoked to provide a tour over the roadmap that starts from v.q at time v.t and completes as many of the remaining goals \(v.\mathrm {rem}\) as possible.

Algorithm 2. Overall guided-search approach.

It is infeasible, however, to call the discrete solver for every vertex since \(\mathcal {T}\) can have tens of thousands of vertices. For this reason, \(\mathcal {T}\) is partitioned into equivalence classes, which group together vertices that provide the same discrete information. A new vertex \(v_{{\mathrm {new}}}\) is added to an equivalence class \(\mathcal {X}_v\) only if it has the same remaining goals (hence, the same remaining capacity, pickups, and deliveries), maps to the same roadmap configuration, and the tour associated with v (denoted by \(v.\sigma \)) is compatible with the start time of \(v_{{\mathrm {new}}}\), i.e.,

$$\begin{aligned} \mathcal {X}_v = \{v_{{\mathrm {new}}} \in \mathcal {T}: v.q = v_{{\mathrm {new}}}.q \wedge v.{{\mathrm {rem}}} = v_{{\mathrm {new}}}.{{\mathrm {rem}}} \wedge \\ \small {\textsc {CompatibleTour}}(v.\sigma , v_{{\mathrm {new}}}.t)\} \end{aligned}$$

When a new vertex \(v_{{\mathrm {new}}}\) is added to \(\mathcal {T}\), the approach checks whether it can be added to an existing equivalence class. If not, a new equivalence class \(\mathcal {X}_{v_{{\mathrm {new}}}}\) is created, and the discrete solver is invoked to compute the tour associated with \(\mathcal {X}_{v_{{\mathrm {new}}}}\).
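A sketch of this bookkeeping follows; the vertex fields q, rem, and t mirror the text, while the class structure and helper names are ours.

```python
from dataclasses import dataclass, field

@dataclass
class EquivalenceClass:
    rep: object                  # representative vertex
    sigma: list                  # tour computed by the discrete solver
    vertices: list = field(default_factory=list)
    nr_selections: int = 0       # used by the selection weight, Eq. (7)

def class_key(v):
    """Vertices agreeing on roadmap configuration and remaining goals
    are candidates for the same class; tour/start-time compatibility
    is checked separately."""
    return (v.q, frozenset(v.rem))

def insert_vertex(classes, v_new, compatible_tour, discrete_solver):
    """Join a compatible existing class, or create a new one; only the
    latter case invokes the (expensive) discrete solver."""
    for X in classes.setdefault(class_key(v_new), []):
        if compatible_tour(X.sigma, v_new.t):
            X.vertices.append(v_new)
            return X
    X_new = EquivalenceClass(v_new, discrete_solver(v_new), [v_new])
    classes[class_key(v_new)].append(X_new)
    return X_new
```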

Putting it All Together: Guided Search

As shown in Algorithm 2, the overall approach starts by constructing a roadmap \(RM\), computing shortest paths in \(RM\), and then initializing the motion tree \(\mathcal {T}\) and the equivalence classes \(\mathcal {X}\). The core loop is driven by two procedures: (i) selecting an equivalence class \(\mathcal {X}_v\) and (ii) expanding the motion tree \(\mathcal {T}\) along the temporal plan \(\mathcal {X}_v.\sigma \). The equivalence classes are continually updated each time a vertex is added to \(\mathcal {T}\). These procedures are invoked repeatedly until a solution is found, a runtime limit is reached, or the convergence criterion is met.

Selecting an Equivalence Class. A weight is defined for each equivalence class \(\mathcal {X}_v \in \mathcal {X}\) as

$$\begin{aligned} w(\mathcal {X}_v) = \frac{\alpha ^{\small {\textsc {NrSelections}}(\mathcal {X}_v)}}{\beta ^{\small {\textsc {duration}}(\mathcal {X}_{v}.\sigma )} + \gamma ^{|\mathcal {X}_v.\sigma |} + \delta ^{|\mathcal {X}_v.{{\mathrm {rem}}}| - |\mathcal {X}_v.\sigma |}}. \end{aligned}$$
(7)

and the equivalence class with maximum weight is selected for expansion. The term \(\beta ^{\small {\textsc {duration}}(\mathcal {X}_{v}.\sigma )}\), \(\beta > 1\), increases the weight when the temporal plan \(\mathcal {X}_v.\sigma \) is short. The term \(\gamma ^{|\mathcal {X}_v.\sigma |}\), \(\gamma > 1\), increases the weight when the tour has only a few goals. The term \(\delta ^{|\mathcal {X}_v.{{\mathrm {rem}}}| - |\mathcal {X}_v.\sigma |}\), \(\delta > 1\), is essential for dealing with oversubscription (which was not considered in [6]). Note that \(\mathcal {X}_v.{{\mathrm {rem}}}\) denotes the goals that remain to be reached, so \(\mathcal {X}_v.\sigma \) is a partial or complete tour over these goals; the term essentially penalizes temporal plans that miss goals. We recommend setting \(\delta \gg \gamma \gg \beta \) so that the weight is dominated by the number of goals a temporal plan misses, followed by the number of goals in the tour, and lastly by the plan duration. The term \(\alpha ^{\small {\textsc {NrSelections}}(\mathcal {X}_v)}\), \(0< \alpha < 1\), decreases the weight over repeated selections. This avoids repeatedly selecting \(\mathcal {X}_v\) when expansions from it keep failing due to constraints imposed by the obstacles and the robot dynamics.
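A sketch of the selection step, extending the EquivalenceClass sketch above with rem and duration fields; the constants are illustrative, chosen only to respect \(\delta \gg \gamma \gg \beta \).

```python
def weight(X, alpha=0.95, beta=1.05, gamma=4.0, delta=16.0):
    """Eq. (7).  X.rem: remaining goals; X.sigma: the tour over them;
    X.duration: duration of the temporal plan (assumed precomputed
    when the tour is stored)."""
    missed = len(X.rem) - len(X.sigma)
    return alpha ** X.nr_selections / (
        beta ** X.duration + gamma ** len(X.sigma) + delta ** missed)

def select_class(all_classes):
    """Core-loop step (i): pick the maximum-weight class from the flat
    collection; recording the selection lets alpha discount classes
    whose expansions keep failing."""
    X = max(all_classes, key=weight)
    X.nr_selections += 1
    return X
```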

Expanding the Equivalence Class. After selecting \(\mathcal {X}_v\), the approach seeks to expand \(\mathcal {T}\) to complete the pickup and delivery goals of the temporal plan \(\mathcal {X}_v.\sigma \). To reach the first goal \(\mathcal {G}_1\) in \(\mathcal {X}_v.\sigma \), the approach retrieves the minimum-cost path \({\langle {q_1,\ldots , q_k} \rangle }\) in the roadmap from v.q to \(\mathcal {G}_1\) and attempts to expand \(\mathcal {T}\) along this path. A proportional-integral-derivative (PID) controller [18] is used to expand \(\mathcal {T}\) toward \(q_1, \ldots , q_k\) in succession by turning and then moving the vehicle toward each target. Intermediate collision-free states are added to \(\mathcal {T}\) as new vertices. The expansion stops when a collision occurs or \(\mathcal {G}_1\) is reached. Once the expansion stops, the approach updates the equivalence classes and their weights to account for the new vertices added to \(\mathcal {T}\) and returns to the core loop, where possibly another equivalence class is selected. When the approach has difficulty expanding \(\mathcal {T}\), it penalizes the roadmap edges involved so that in the next iteration the expansion can be attempted along a different path. This allows the approach to discover alternative paths to reach each goal.

5 Experiments and Results

Experiments are conducted using challenging logistics scenarios, where a vehicle with nonlinear dynamics (car or snake) has to navigate in obstacle-rich environments (Figs. 1 and 4) to pick up and deliver items from numerous locations within specified time windows. The efficiency and scalability of the approach are evaluated by increasing the number of goals, tightening the time windows, and reducing the vehicle capacity.

Fig. 5. Runtime results when varying the discrete planner in our approach: (a) MC, (b) Optic, (c) Random.

Discrete Solvers. We use Monte-Carlo as the specialized solver and Optic [2] as the general temporal planner. We also use a Random solver as a baseline for small problem instances. Random iterates in a random order over all the possible permutations with n goals (then with \(n-1, \ldots , 1\)), stopping as soon as it finds a valid tour.
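A sketch of the Random baseline; valid_tour is assumed to check time-window, precedence, and capacity constraints (e.g., along the lines of the tour-scoring sketch in Sect. 3).

```python
import itertools, random

def random_solver(goals, valid_tour):
    """Try permutations of all n goals in random order, then of n-1
    goals, and so on, returning the first valid tour found.  Only
    practical for the small instances on which Random serves as a
    baseline, since there are n! permutations."""
    for k in range(len(goals), 0, -1):
        perms = list(itertools.permutations(goals, k))
        random.shuffle(perms)
        for sigma in perms:
            if valid_tour(sigma):
                return list(sigma)
    return []
```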

Benchmark Instances. We extended the benchmark generator in [6] to incorporate loads and capacity constraints. Specifically, 30 instances were generated for each combination of a scene S and number of goals n, denoted by \(\mathcal {I}_{\langle {S, n} \rangle }\). Each instance is generated by randomly placing the goals and the vehicle until there are no collisions. Each goal \(\mathcal {G}_i\) is assigned a random load (from 1 to 5) and a random assignment is then used to create pickup-delivery pairs. Time bounds are generated by first computing a random tour with the constraint that a pickup must appear before its corresponding delivery. The time window for goal \(\mathcal {G}_i\) is set to \([(1 - \epsilon ) t_i, (1 + \epsilon ) t_i]\), where \(t_i\) is the time to reach \(\mathcal {G}_i\) in the tour when traveling along the shortest paths in the roadmap, using the expected velocity to convert distances to time. The parameter \(\epsilon \) is used to tighten the time window (default is 0.2).
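A sketch of the window generation; travel_time is assumed to return the shortest-path roadmap time between consecutive stops, converting distance to time via the expected velocity.

```python
def time_windows(tour, travel_time, eps=0.2):
    """Walk a random tour (pickups before their paired deliveries),
    accumulate the arrival time t_i at each goal, and set its window
    to [(1 - eps) * t_i, (1 + eps) * t_i].  tour[0] is the start."""
    t, windows, prev = 0.0, {}, tour[0]
    for g in tour[1:]:
        t += travel_time(prev, g)
        windows[g] = ((1 - eps) * t, (1 + eps) * t)
        prev = g
    return windows
```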

The vehicle capacity, by default, is set to \(100\%\) of the sum of the loads. Even in such cases, it may be difficult or even impossible to reach all the goals due to constraints imposed by the time windows, obstacles, and dynamics. Experiments are also conducted with reduced capacity.

For each combination of parameters, the planner is run over all instances. Results report mean runtime, distance traveled, time duration, and goals missed. The mean is computed after dropping runs that are below the first quartile or above the third quartile to avoid the influence of outliers. Runtime measures everything from reading the input until finding a solution or reaching the runtime limit (set to 40 s per run).
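The trimmed mean can be computed as in the following sketch (the exact quartile convention used in the experiments may differ at the boundaries).

```python
import statistics

def iqr_mean(runs):
    """Mean after dropping runs below the first quartile or above the
    third quartile, to limit the influence of outliers."""
    q1, _, q3 = statistics.quantiles(runs, n=4)
    return statistics.mean(x for x in runs if q1 <= x <= q3)
```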

Fig. 6. Quality of the dynamically-feasible and collision-free solution trajectories computed by the approach, as measured by the time duration, distance traveled, and goals missed. The results are normalized with respect to the tour obtained from the initial location over the discrete abstraction. Results are shown for our approach using the Monte-Carlo discrete solver.

Fig. 7. (left) Results when adjusting the time window \([(1 - \epsilon ) t_i, (1 + \epsilon ) t_i]\). (right) Results when adjusting the vehicle capacity as a percentage of the sum of the loads.

Fig. 8. Runtime distribution as a percentage of the total runtime (bottom to top): (1) roadmap construction and shortest paths, (2) collision checking and SIMULATE, (3) discrete solver, and (4) other.

Results: Increasing the Number of Goals. Figure 5 shows the efficiency and scalability of the approach as the number of goals is increased from 6 to 30. Even when coupled with Random, the approach is able to quickly solve problems with up to 10 goals. The approach performs similarly when using Optic, which provides generality but is not as fast as specialized solvers. Indeed, the approach achieves its best results when coupled with the Monte-Carlo discrete solver, scaling up to 30 goals in a matter of seconds.

Figure 6 shows results on the quality of the solution trajectories in terms of the time duration, distance traveled, and goals missed. The results are normalized with respect to the tour obtained from the initial location (tour associated with \(\mathcal {X}_{v_{{\mathrm {init}}}}\)). The results show that our approach produces high-quality solutions. Note that in some cases our approach is able to find even shorter solutions than those associated with the initial tour. This could be due to the vehicle traveling faster than the expected velocity used in the tour calculations or taking smoother turns when following tour edges.

Results: Adjusting the Time Windows. Figure 7 (left) shows the results when varying \(\epsilon \in \{0.05, 0.1, 0.2, 0.3, 0.4\}\) to adjust the time window \([(1 - \epsilon ) t_i, (1 + \epsilon ) t_i]\) for each goal \(\mathcal {G}_i\). Note that the approach is quite efficient even for small values. As expected, when \(\epsilon \) is too small, e.g., \(\epsilon = 0.05\), it becomes quite difficult to find solutions, causing the approach to miss some goals (which is also why the reported solutions are shorter).

Results: Adjusting the Vehicle Capacity. Figure 7 (right) summarizes the results when varying the vehicle capacity from 100% of the sum of the loads down to 40%. The results show that the approach is quite efficient even when the vehicle capacity is considerably reduced. When the capacity becomes too small, the problem becomes significantly more difficult, which in some cases causes our approach to miss some goals.

Results: Runtime Distribution. Figure 8 summarizes the runtime distribution. As the problems increase in complexity so does the time taken by the discrete solver. Recall that the discrete solver is invoked numerous times during each run to compute tours for each equivalence class that is created during the motion-tree expansion. The discrete solver enables our approach to focus the expansion along promising directions, thus saving valuable computational time.

6 Discussion

We have developed an effective and scalable approach for solving challenging multi-goal motion-planning problems with time windows, pickups, deliveries, and capacity constraints. A crucial aspect of our approach is the use of a discrete solver over a suitable abstraction to effectively guide sampling-based motion planning. We expect our framework to benefit further from advances in discrete planning. A natural but involved research direction is minimizing energy consumption, e.g., by adding recharging stations. Another direction is to enable more expressive PDDL tasks. We also plan to extend this work toward cooperative motion planning for several robots [13].