1 Introduction

To operate safely in stochastic environments, it is crucial for agents to be able to plan in real-time in the presence of uncertainty. However, the nature of such environments often precludes the existence of guaranteed-safe, collision-free paths. Therefore, this work considers probabilistically safe planning, in which paths must be able to satisfy all constraints with a user-mandated minimum probability. A major challenge in safely navigating such environments is how to properly address the multiple sources of external uncertainty, often classified as environment sensing (ES) and environment predictability (EP) (Lavalle and Sharma 1997). Under this partition, ES uncertainties might be attributable to imperfect sensor measurements or incomplete knowledge of the environment, while EP uncertainties address limited knowledge of the future state of the environment. This work focuses on addressing robustness to EP uncertainty, a key challenge for existing path planning approaches (Melchior and Simmons 2007; Fulgenzi et al. 2008; Leonard et al. 2008; Aoude et al. 2010c).

More specifically, this paper considers the problem of probabilistically safe motion planning to avoid one or more dynamic obstacles with uncertain motion patterns. While existing probabilistic planning frameworks can readily admit dynamic obstacles (Thrun et al. 2005; LaValle 2006), such objects typically demonstrate complex motion patterns in real-world domains, making them difficult to model and predict. For instance, to reliably traverse a busy intersection, an autonomous vehicle would need to predict the underlying intents of the surrounding vehicles (e.g. turning right vs. going straight), in addition to estimating the possible trajectories corresponding to each intent. Even with perfect sensors, accurately predicting possible variations in the long-term trajectories of other mobile agents remains a difficult problem.

One of the main objectives of this work is to accurately model and predict the future behavior of dynamic obstacles in structured environments, such that an autonomous agent can identify trajectories which safely avoid such obstacles. In order to provide long-term trajectory predictions, this work uses pattern-based approaches for modeling the evolution of dynamic obstacles, including clustering of observations (Sect. 2). Such algorithms group previously-observed trajectories into clusters, with each represented by a single trajectory prototype (Bennewitz et al. 2005); predictions are then performed by comparing the partial path to each prototype. While this reduces the dependence on expert knowledge, selecting a model which is sufficiently representative of the behavior without over-fitting remains a key challenge.

In previous work (Joseph et al. 2010, 2011), the authors presented a Bayesian nonparametric approach for modeling dynamic obstacles with unknown motion patterns. This nonparametric model, a mixture of Gaussian processes (GP), generalizes well from small amounts of data and allows the model to capture complex trajectories as more data is collected. However, in practice, GPs suffer from two interconnected shortcomings: their high computational cost and their inability to embed static feasibility or vehicle dynamical constraints. To address both problems simultaneously, this work introduces the RR-GP algorithm, a clustering-based trajectory prediction solution which uses Bayesian nonparametric reachability trees to improve the original GP prediction. Similar to GPs, RR-GP is a data-driven approach, using observed past trajectories of the dynamic obstacles to learn a motion pattern model. By conditioning the obstacle trajectory predictions obtained via GPs on a reachability-based simulation of dynamically feasible paths (Aoude et al. 2011), RR-GP yields a more accurate prediction of future behavior.

The other main objective of this paper is to demonstrate that through appropriate choice of planner, an autonomous agent can utilize RR-GP predictions to identify and execute probabilistically feasible paths in real-time, in the presence of uncertain dynamic obstacles. This agent is subject to limiting dynamical constraints, such as minimum turning rates, acceleration bounds, and/or high speeds. This work proposes a real-time path planning framework using chance-constrained rapidly exploring random trees, or chance constrained RRTs (CC-RRT) (Luders et al. 2010b), to guarantee probabilistic feasibility with respect to the dynamic obstacles and other constraints. CC-RRT extends previously-developed chance constraint formulations (Blackmore et al. 2006; Calafiore and Ghaoui 2007), which efficiently evaluate bounds on the risk of constraint violation at each timestep, to incorporate an RRT-based framework. By applying RRT to solve this risk-constrained planning problem, this planning algorithm is able to rapidly identify probabilistically safe trajectories online in a dynamic and constrained environment. As a sampling-based algorithm (LaValle 2006), RRT incrementally constructs trajectories which satisfy all problem constraints, including the probabilistic feasibility guarantees, and thus scales favorably with problem complexity.

The proposed planning algorithm tightly integrates CC-RRT with the RR-GP algorithm, which provides a likelihood and time-varying Gaussian state distribution for each possible behavior of a dynamic obstacle at each future timestep. This work shows that probabilistic feasibility can be guaranteed for a linear system subject to such uncertainty. An alternative, particle-based approximation which admits the use of nonlinear dynamics and/or non-Gaussian uncertainty is also presented. Though this alternative can only approximate the feasibility guarantees, it avoids the conservatism needed to establish them theoretically. While this work focuses on intersection collision avoidance, the proposed algorithm can be applied to a variety of structured domains, such as mid-air collision avoidance and military applications.

After Sect. 2 presents related work, preliminaries are provided in Sect. 3, which establishes the problem statement, and Sect. 4, which reviews the GP-based motion pattern modeling approach. Section 5 presents the RR-GP algorithm, with simulation results demonstrating its effectiveness in Sect. 6. Section 7 extends the CC-RRT framework to integrate RR-GP trajectory predictions, allowing dynamic obstacles with multiple possible behaviors. Finally, Sect. 8 presents simulation results, which demonstrate the ability of the fully-integrated algorithm to significantly reduce the risk of collisions with dynamic obstacles.

2 Related work

Modeling the evolution of dynamic obstacles can be classified into three main categories: (1) worst-case, (2) dynamic-based, and (3) pattern-based approaches (Lachner 1997; Mazor et al. 2002; Vasquez et al. 2008). In the worst-case approach, the dynamic obstacle is assumed to be actively trying to collide with the planning agent, or “host vehicle” (Miloh and Sharma 1976; Lachner 1997). The predicted trajectory of the dynamic obstacle is the solution of a differential game, where the dynamic obstacle is modeled as a pursuer and the host vehicle as an evader (Aoude et al. 2010a). Despite providing a lower bound on safety, such solutions are inherently conservative, and thus limited to short time horizons in collision warning/mitigation problems to keep the level of false positives below a reasonable threshold (Kuchar and Yang 2002).

The dynamic-based approach predicts an obstacle’s future trajectory by propagating its dynamics forward in time, based on its current state and an assumed fixed mode of operation. This prediction typically uses a continuous Bayes filter, such as the Kalman filter or its variations (Sorenson 1985). A popular extension in the target tracking literature is the Interacting Multiple Model Kalman filter, which matches the obstacle’s current mode of operation from among a bank of continuously-updated Kalman filters (Mazor et al. 2002). Though useful for short-term prediction, dynamic-based approaches tend to perform poorly in the long-term prediction of trajectories, due to their inability to model future changes in control inputs or external factors.

In the pattern-based approach, such as the one used in this work, target obstacles are assumed to move according to typical patterns across the environment, learned via previous observation of the targets. There are two main techniques that fall under this category, discrete state-space techniques and clustering-based techniques. In the discrete state-space technique, the motion model is developed via Markov chains; the object state evolves from one state to another according to a learned transition probability (Zhu 2002). In the clustering-based technique, previously-observed trajectories are grouped into different clusters, with each represented by a single trajectory prototype (Bennewitz et al. 2005). Given a partial path, prediction is then performed by finding the most likely cluster, or computing a probability distribution over the different clusters. Both pattern-based techniques have proven popular in solving long-term prediction problems for mobile agents (Fulgenzi et al. 2008; Vasquez et al. 2008). However, discrete state-space techniques can often suffer from over-fitting for discretizations of sufficient resolution, unlike clustering-based techniques (Joseph et al. 2011).

Many existing approaches in the literature seek to emulate human-like navigation in crowded environments, where obstacle density is high and interaction between agents and obstacles can significantly influence behavior. Trautman and Krause (2010) use GPs to model interactions between the agent and dynamic obstacles present in the environment. Althoff et al. (2011) use Monte Carlo sampling to estimate inevitable collision states probabilistically, while Henry et al. (2010) apply inverse reinforcement learning for human-like behavior. By contrast, our algorithm considers constrained, often non-holonomic agents operating in structured environments, where encounters with dynamic obstacles are less frequent but more heavily constrained. The proposed algorithm is similar to Fulgenzi et al. (2008), which uses GPs to model moving obstacles in an RRT path planner; however, Fulgenzi et al. (2008) relies solely on GPs for its modeling, which can lead to less accurate prediction, and uses heuristics to assess path safety.

While several approaches have been previously proposed for path planning with probabilistic constraints, the approach developed in this work does not rely on the use of an optimization-based framework (Blackmore et al. 2006; Calafiore and Ghaoui 2007). While such optimizations have been demonstrated for real-time path planning, they lack the scalability with respect to problem complexity inherent to sampling-based algorithms, a crucial consideration in complex and dynamic environments. For example, MILP-based optimizations—NP-hard in the number of binary variables (Garey and Johnson 1979)—tend to scale poorly in the number of obstacles and timesteps, resulting in many approaches being proposed specifically to overcome MILP’s computational limits (Earl and D’Andrea 2005; Vitus et al. 2008; Ding et al. 2011). Because sampling-based algorithms such as CC-RRT perform trajectory-wise constraint checking, they avoid these scalability concerns—feasible solutions can typically be quickly identified even in the presence of many obstacles, and observed changes in the environment. The trade-off is that such paths do not satisfy any optimality guarantees, though performance will improve as more sampled trajectories are made available. Extensions such as RRT\(^\star \) (Karaman and Frazzoli 2009) can provide asymptotic optimality guarantees, with the trade-off of requiring additional per-node computation (in particular, a steering method).

CC-RRT primarily assesses the probabilistic feasibility at each timestep, rather than over the entire path. Because of the dynamics, the uncertainty is correlated, and thus the probability of path feasibility cannot be approximated by assuming independence between timesteps. While path-wise bounds on constraint violation can be established by evenly allocating risk margin across all obstacles and timesteps (Blackmore 2006), this allocation significantly increases planning conservatism, rendering the approach intractable for most practical scenarios. Though this allocation could also be applied to CC-RRT by bounding the timestep horizon length, it is not pursued further in this work.

This work also proposes an alternative, particle-based approximation of the uncertainty within CC-RRT, assessing path feasibility based on the fraction of feasible particles. Both the approaches of Blackmore et al. (2010) and particle CC-RRT (PCC-RRT) are able to admit non-Gaussian probability distributions and approximate path feasibility, without the conservatism introduced by bounding risk. The optimization-versus-sampling considerations here are the same as noted above; as a sampling-based algorithm, PCC-RRT can also admit nonlinear dynamics without an appreciable increase in complexity. While the particle-based CC-RRT algorithm is similar to the work developed in Melchior and Simmons (2007), the former is generalizable both in the types of probabilistic feasibility that are assessed (timestep-wise and path-wise) and in the types of uncertainty that are modeled using particles. This framework can be extended to consider hybrid combinations of particle-based and distribution-based uncertainty, though this may limit the ability to assess path-wise infeasibility. Furthermore, by not clustering particles, a one-to-one mapping between inputs and nodes is maintained.

3 Problem statement

Consider a discrete-time linear time-invariant system with process noise,

$$\begin{aligned} x_{t+1}&= Ax_t + Bu_t + w_t, \end{aligned}$$
(1)
$$\begin{aligned} x_0&\sim \mathcal N (\hat{x}_0,P_{x_0}), \end{aligned}$$
(2)
$$\begin{aligned} w_t&\sim \mathcal N (0,P_{w_t}), \end{aligned}$$
(3)

where \(x_t \in \mathbb{R ^{n_x}}\) is the state vector, \(u_t \in \mathbb{R ^{n_u}}\) is the input vector, and \(w_t \in \mathbb{R ^{n_x}}\) is a disturbance vector acting on the system; \(\mathcal{N }(\hat{a},P_a)\) represents a random variable whose probability distribution is Gaussian with mean \(\hat{a}\) and covariance \(P_a\). The i.i.d. random variables \(w_t\) are unknown at current and future timesteps, but have the known probability distribution Eq. (3) (\(P_{w_t} \equiv P_w ~\forall ~t\)). Equation (2) represents Gaussian uncertainty in the initial state \(x_0\), corresponding to uncertain localization; Eq. (3) represents a zero-mean, Gaussian process noise, which may correspond to model uncertainty, external disturbances, and/or other factors.

The system is subject to the state and input constraints

$$\begin{aligned} x_t&\in \mathcal{X }_t \equiv \mathcal{X } - { \mathcal X }_{t1} - \cdots - \mathcal{X }_{tB}, \end{aligned}$$
(4)
$$\begin{aligned} u_t&\in \mathcal{U }, \end{aligned}$$
(5)

where \(\mathcal{X }, \mathcal{X }_{t1}, \ldots , \mathcal{X }_{tB} \subset \mathbb{R ^{n_x}}\) are convex polyhedra, \(\mathcal{U } \subset \mathbb{R ^{n_u}}\), and the \(-\) operator denotes set subtraction. The set \(\mathcal{X }\) defines a set of time-invariant convex constraints acting on the state, while \(\mathcal{X }_{t1}, \ldots , \mathcal{X }_{tB}\) represent \(B\) convex, possibly time-varying obstacles to be avoided. Observations of dynamic obstacles are assumed to be available, such as through a vehicle-to-vehicle or vehicle-to-infrastructure system.

For each obstacle, the shape and orientation are assumed known, while the placement is uncertain:

$$\begin{aligned} \mathcal{X }_{tj}&= \mathcal{X }^0_{j} + c_{tj}, ~\forall ~j \in {\mathbb{Z }_{1,B}}, ~\forall ~t,\end{aligned}$$
(6)
$$\begin{aligned} c_{tj}&\sim p(c_{tj})~\forall ~j \in {\mathbb{Z }_{1,B}}, ~\forall ~t, \end{aligned}$$
(7)

where the \(+\) operator denotes set translation and \({\mathbb{Z }_{a,b}}\) represents the set of integers between \(a\) and \(b\) inclusive. In this model, \(\mathcal{X }_j^0 \subset {\mathbb{R }^{n_x}}\) is a convex polyhedron of known, fixed shape, while \(c_{tj} \in {\mathbb{R }^{n_x}}\) is an uncertain and possibly time-varying translation represented by the probability distribution \(p(c_{tj})\). This can represent both environmental sensing uncertainty (Luders et al. 2010b) and environmental predictability uncertainty (e.g. dynamic obstacles), as long as future state distributions are known (Sect. 7.2).

The objective of the planning problem is to reach the goal region \(\mathcal{X }_{\text{ goal }} \subset {\mathbb{R }^{n_x}}\) in minimum time,

$$\begin{aligned} t_{\text{ goal }} = \inf \{ t \in {\mathbb{Z }}_{0,t_f} ~|~ x_t \in \mathcal{X }_{\text{ goal }} \}, \end{aligned}$$
(8)

while ensuring the constraints in Eqs. (45) are satisfied at each timestep \(t \in \{0,\ldots ,t_{\text{ goal }} \}\) with probability of at least \(p_{\text{ safe }}\). In practice, due to state uncertainty, it is assumed sufficient for the distribution mean to reach the goal region \(\mathcal{X }_{\text{ goal }}\). A penalty function \(\psi (x_t,\mathcal{X }_t,\mathcal{U })\) may also be incorporated.

Problem 1

Given the initial state distribution \((\hat{x}_0,P_{x_0})\) and constraint sets \(\mathcal{X }_t\) and \(\mathcal{U }\), compute the input control sequence \(u_t,~t \in {\mathbb{Z }_{0,t_f}},~t_f \in {\mathbb{Z }_{0,\infty }}\) that minimizes

$$\begin{aligned} J(\mathbf{u }) = t_{\text{ goal }} + \sum _{t=0}^{t_{\text{ goal }}} \psi (x_t,\mathcal{X }_t,\mathcal{U }) \end{aligned}$$
(9)

while satisfying Eq. (1) for all timesteps \(t \in \{0,\ldots ,t_{\text{ goal }} \}\) and Eqs. (45) at each timestep \(t \in \{0,\ldots ,t_{\text{ goal }} \}\) with probability of at least \(p_{\text{ safe }}\). \(\square \)

3.1 Motion pattern

A motion pattern is defined here as a mapping from states to a distribution over trajectory derivatives.Footnote 1 In this work, motion patterns are used to represent dynamic obstacles, also referred to as agents. Given an agent’s position \(( x_t , y_t )\) and trajectory derivative \(( \frac{\Delta x_t}{\Delta t} , \frac{\Delta y_t}{\Delta t} )\), its predicted next position \(( x_{t+1} , y_{t+1} )\) is \(( x_t + \frac{\Delta x_t}{\Delta t} \Delta t , y_t + \frac{\Delta y_t}{\Delta t} \Delta t )\). Thus, modeling trajectory derivatives is sufficient for modeling trajectories. By modeling motion patterns as flow fields rather than single paths, the approach is independent of the lengths and discretizations of the trajectories.

3.2 Mixtures of motion patterns

The finite mixture modelFootnote 2 defines a distribution over the \(i\mathrm{{th}}\) observed trajectory \(t^i\). This distribution is written as

$$\begin{aligned} p( t^i ) = \sum _{j=1}^M p(b_j) p( t^i | b_j ), \end{aligned}$$
(10)

where \(b_j\) is the \(j\)th motion pattern and \(p(b_j)\) is its prior probability. It is assumed the number of motion patterns, \(M\), is known a priori based on prior observations, and may be identified by the operator or through an automated clustering process (Joseph et al. 2011).

4 Motion model

The motion model is defined as the mixture of weighted motion patterns (10). Each motion pattern is weighted by its probability and is modeled by a pair of Gaussian processes mapping \(( x , y )\) locations to distributions over trajectory derivatives \(\frac{\Delta x}{\Delta t}\) and \(\frac{\Delta y}{\Delta t}\). This motion model has been previously presented in Aoude et al. (2011), Joseph et al. (2011); Sects. 4.1 and 4.2 briefly review the approach.

4.1 Gaussian process motion patterns

This section describes the model for \(p(t^i|b_j)\) from Eq. (10), the probability of trajectory \(t^i\) given motion pattern \(b_j\). This model is the distribution over trajectories expected for a given mobility pattern.

There are a variety of models that can be chosen to represent these distributions. A simple example is a linear model with Gaussian noise, but this approach cannot capture the dynamics of the variety expected in this work. Discrete Markov models are also commonly used, but are not well-suited to model mobile agents in the types of real-world domains of interest here, particularly due to challenges in choosing the discretization (Tay and Laugier 2007; Joseph et al. 2010, 2011). To fully represent the variety of trajectories that might be encountered, a fine discretization is required. However, such a model either requires a large amount of training data, which is costly or impractical in real-world domains, or is prone to over-fitting. A coarser discretization can be used to prevent over-fitting, but may be unable to accurately capture the agent’s dynamics.

This work uses GP as the model for motion patterns. Although GPs have a significant mathematical and computational cost, they provide a natural balance between generalization in regions with sparse data and avoidance of overfitting in regions of dense data (Rasmussen and Williams 2005). GP models are extremely robust to unaligned, noisy measurements and are well-suited for modeling the continuous paths underlying potentially non-uniform time-series samples of the agent’s locations. Trajectory observations are discrete measurements from its continuous path through space; a GP places a distribution over functions, serving as a non-parametric form of interpolation between these discrete measurements.

After observing an agent’s trajectory \(t^i\), the posterior probability of the \(j\)th motion pattern is

$$\begin{aligned} p(b_j|t^i) \propto p(t^i|b_j) p(b_j), \end{aligned}$$
(11)

where \(p(b_j)\) is the prior probability of motion pattern \(b_j\) and \(p(t^i | b_j)\) is the probability of trajectory \(t^i\) under \(b_j\). This distribution, \(p(t^i | b_j)\), is computed by

$$\begin{aligned} p(t^i | b_j)&= \prod _{t=0}^{L^i} p \left( \left. \frac{\Delta x_{t}}{\Delta t} \right| x^i_{0:t} , y^i_{0:t} , \{ t^k : z_k = j \} , \theta ^{GP}_{x,j} \right) \nonumber \\&\quad \cdot p \left( \left. \frac{\Delta y_{t}}{\Delta t} \right| x^i_{0:t} , y^i_{0:t} , \{ t^k : z_k = j \} , \theta ^{GP}_{y,j}\right) , \end{aligned}$$
(12)

where \(L^i\) is the length of trajectory \(t^i\), \(z_k\) indicates the motion pattern trajectory \(t^k\) is assigned to, \(\{t^k : z_k = j\}\) is the set of all trajectories assigned to motion pattern \(j\), and \(\theta ^{GP}_{x,j}\) and \(\theta ^{GP}_{y,j}\) are the hyperparameters of the Gaussian process for motion pattern \(b_j\).

A motion pattern’s GP is specified by a set of mean and covariance functions. The mean functions are written as \(E\left[ \frac{\Delta x}{\Delta t}\right] = \mu _x(x,y)\) and \(E\left[ \frac{\Delta y}{\Delta t}\right] = \mu _y(x,y)\), both of which are implicitly initialized to zero for all \(x\) and \(y\) by the choice of parametrization of the covariance function. This encodes the prior bias that, without any additional knowledge, the target is expected to stay in the same place. The “true” covariance function of the \(x\)-direction is denoted by \(K_x(x,y,x^{\prime },y^{\prime })\), which describes the correlation between trajectory derivatives at two points, \((x,y)\) and \((x^{\prime },y^{\prime })\). Given locations \((x_1,y_1,\ldots ,x_k,y_k)\), the corresponding trajectory derivatives \(\left( \frac{\Delta x_1}{\Delta t},\ldots ,\frac{\Delta x_k}{\Delta t}\right) \) are jointly distributed according to a Gaussian with mean \(\{\mu _x(x_1,y_1),\ldots ,\mu _x(x_k,y_k)\}\) and covariance \(\Sigma \), where the \(\Sigma _{ij} = K_x(x_i,y_i,x_j,y_j)\). This work uses the squared exponential covariance function

$$\begin{aligned} K_x(x,y,x^{\prime },y^{\prime })&= \sigma _x^2 \exp \left( - \frac{(x - x^{\prime })^2}{2{w_x}^2} - \frac{(y - y^{\prime })^2}{2{w_y}^2} \right) \nonumber \\&+ \sigma _n^2 \delta (x,y,x^{\prime },y^{\prime }), \end{aligned}$$
(13)

where \(\delta (x,y,x^{\prime },y^{\prime }) = 1\) if \(x=x^{\prime }\) and \(y=y^{\prime }\) and zero otherwise. The exponential term encodes that similar trajectories should make similar predictions, while the length-scale parameters \(w_x\) and \(w_y\) normalize for the scale of the data. The \(\sigma _n\)-term represents within-point variation (e.g., due to noisy measurements); the ratio of \(\sigma _n\) and \(\sigma _x\) weights the relative effects of noise and influences from nearby points. Here \(\theta ^{GP}_{x,j}\) is used to refer to the set of hyperparameters \(\sigma _x\), \(\sigma _n\), \(w_x\), and \(w_y\) associated with motion pattern \(b_j\) (each motion pattern has a separate set of hyperparameters). While the covariance is written above for two dimensions, it can easily be generalized to higher dimensional problems.

For a GP over trajectory derivatives trained with tuples \(\left( x_k,y_k,\frac{\Delta x_k}{\Delta t}\right) \), the predictive distribution over the trajectory derivative \(\frac{\Delta x}{\Delta t}^*\) for a new point \((x^*,y^*)\) is given by

$$\begin{aligned} \mu _{\frac{\Delta x}{\Delta t}^*}&= K_x\!(x^*\!,\!y^* \!,\!X\!,\!Y\!)K_x\!(\!X\!,\!Y\!,\!X\!,\!Y)^{-1} \frac{\Delta X}{\Delta t}, \nonumber \\ \sigma ^2_{\frac{\Delta x}{\Delta t}^*}&= K_x\!(x^* \!,\!y^*\!,\!X\!,\!Y\!)K_x\!(X\!,\!Y\!,\!X,\!Y\!)^{-1} K_x\!(X\!,\!Y\!,\!x^*\!,\!y^*\!), \end{aligned}$$
(14)

where the expression \(K_x(X,Y,X,Y)\) is shorthand for the covariance matrix \(\Sigma \) with terms \(\Sigma _{ij} = K_x(x_i,y_i,x_j,y_j)\), with \(\{X,Y\}\) representing the previous trajectory points. The equations for \(\frac{\Delta y}{\Delta t}^*\) are equivalent to those above, using the covariance \(K_y\).

In Eq. (12), the likelihood is assumed to be decoupled, and hence independent, in each position coordinate. This enables decoupled GPs to be used for the trajectory position in each coordinate, dramatically reducing the required computation, and is assumed in subsequent developments. While the algorithm permits the use of correlated GPs, the resulting increase in modeling complexity is generally intractable for real-time operation. Simulation results (Sect. 8) demonstrate that the decoupled GPs provide a sufficiently accurate approximation of the correlated GP to achieve meaningful predictions of future trajectories.

4.2 Estimating future trajectories

The Gaussian process motion model can be used to calculate the Gaussian distribution over trajectory derivatives \(\left( \frac{\Delta x}{\Delta t},\frac{\Delta y}{\Delta t}\right) \) for every location \((x,y)\) using Eq. (14). This distribution over the agent’s next location can be used to generate longer-term predictions over future trajectories, but not in closed form. Instead, the proposed approach is to draw trajectory samples to be used for the future trajectory distribution.

To sample a trajectory from a current starting location \((x_0,y_0)\), first a trajectory derivative \(\left( \frac{\Delta x_0}{\Delta t},\frac{\Delta y_0}{\Delta t}\right) \) is sampled to calculate the agent’s next location \((x_1,y_1)\). Starting from \((x_1,y_1)\), the trajectory derivative \(\left( \frac{\Delta x_1}{\Delta t},\frac{\Delta y_1}{\Delta t}\right) \) is sampled to calculate the agent’s next location \((x_2,y_2)\). This process is repeated until the trajectory is of the desired length \(L\). The entire sampling procedure is then repeated from \((x_0, y_0)\) multiple times to obtain a set of possible future trajectories. Given a current location \((x_t,y_t)\) and a given motion pattern \(b_j\), the agent’s predicted position \(K\) timesteps into the future is computed as

$$\begin{aligned}&p(x_{t+K},y_{t+K} | x_{t}, y_{t}, b_{j})\nonumber \\&\quad = \prod _{k=0}^{K-1} p(x_{t+k+1},y_{t+k+1} | x_{t+k}, y_{t+k}, b_j)\nonumber \\&\quad = \prod _{k=0}^{K-1} p \left( \left. \frac{ \Delta x_{t+k+1}}{\Delta t},\frac{\Delta y_{t+k+1}}{ \Delta t} \right| x_{t+k}, y_{t+k}, b_j \right) \nonumber \\&\quad = \prod _{k=0}^{K-1} p \left( \left. \frac{\Delta x_{t+k+1}}{ \Delta t} \right| x_{t+k}, y_{t+k}, b_j \right) \nonumber \\&\quad ~~~~~~~~~~~~~~ \cdot p \left( \left. \frac{\Delta y_{t+k+1}}{ \Delta t} \right| x_{t+k}, y_{t+k}, b_j \right) \nonumber \\&\quad = \prod _{k=0}^{K-1} \mathcal{N} \left( x_{t+k+1};\mu _{j, \frac{\Delta x_{t+k+1}}{\Delta t}},\sigma ^2_{j,\frac{ \Delta x_{t+k+1}}{\Delta t}} \right) \nonumber \\&\quad ~~~~~~~~~~~~~\cdot \mathcal{N} \left( y_{t+k+1};\mu _{j, \frac{\Delta y_{t+k+1}}{\Delta t}},\sigma ^2_{j,\frac{ \Delta y_{t+k+1}}{\Delta t}} \right) , \end{aligned}$$
(15)

where the Gaussian distribution parameters are calculated using Eq. (14). When this process is done online, the trajectory’s motion pattern \(b_j\) will not be known directly. Given the past observed trajectory \((x_0,y_0),\ldots ,(x_t,y_t)\), the distribution can be calculated \(K\) timesteps into the future by combining Eqs. (10) and (15).

Formally,

$$\begin{aligned}&p(x_{t+K},y_{t+K} | x_{0:t}, y_{0:t}) \nonumber \\&\quad = \sum _{j=1}^M p(x_{t+K},y_{t+K} | x_{0:t}, y_{0:t}, b_j) p(b_j | x_{0:t}, y_{0:t})\end{aligned}$$
(16)
$$\begin{aligned}&\quad = \sum _{j=1}^M p(x_{t+K},y_{t+K} | x_{t}, y_{t}, b_j) p(b_j | x_{0:t}, y_{0:t}), \end{aligned}$$
(17)

where \(p(b_j | x_{0:t}, y_{0:t})\) is the probability of motion pattern \(b_j\) given the observed portion of the trajectory. The progression from Eq. (16) to Eq. (17) is based on the assumption that, given \(b_j\), the trajectory’s history provides no additional information about the future location of the agent (Joseph et al. 2010, 2011).

The GP motion model over trajectory derivatives in this paper gives a Gaussian distribution over possible target locations at each timestep. While samples drawn from this procedure are an accurate representation of the posterior over trajectories, sampling \(N_1\) trajectories \(N_2\) steps in the future requires \(N_1 \times N_2\) queries to the GP. It also does not take advantage of the unimodal, Gaussian distributions being used to model the trajectory derivatives. By using the approach of Girard et al. (2003) and Deisenroth et al. (2009), which provides a fast, analytic approximation of the GP output given the input distribution, future trajectories are efficiently predicted in this work. In particular, given the inputs of a distribution on the target position at time \(t\), and a distribution of trajectory derivatives, the approach yields a distribution on the target position at time \(t+1\), effectively linking the Gaussian distributions together.

By estimating the target’s future trajectories analytically, only \(N_2\) queries to the GP are needed to predict trajectories \(N_2\) steps into the future, and the variance introduced by sampling future trajectories is avoided. This facilitates the use of GPs for accurate and efficient trajectory prediction.

5 RR-GP trajectory prediction algorithm

Section 4 outlined the approach of using GP mixtures to model mobility patterns and its benefits over other approaches. However, in practice, GPs suffer from two interconnected shortcomings: their high computational cost and their inability to embed static feasibility or vehicle dynamical constraints. Since GPs are based on statistical learning, they are unable to model prior knowledge of road boundaries, static obstacle location, or dynamic feasibility constraints (e.g. minimum turning radius). Very dense training data may alleviate this feasibility problem by capturing, in great detail, the environment configuration and physical limitations of the vehicle. Unfortunately, the computation time for predicting future trajectories using the resulting GPs would suffer significantly, rendering the motion model unusable for real-time applications.

To handle both of these problems simultaneously, this section introduces a novel trajectory prediction algorithm, denoted as RR-GP (Fig. 1). RR-GP includes two main components, rapidly-exploring random trees (RRT), and a GP-based mobility model. For each GP-based pattern \(b_j\), \(j \in \{1,\ldots ,M\}\) as defined in Sect. 4, and the current position of the target vehicle, RR-GP uses an RRT-based technique to grow a tree of trajectories that follows \(b_j\) while guaranteeing dynamical feasibility and collision avoidance. More specifically, it is based on the closed-loop RRT (CL-RRT) algorithm (Kuwata et al. 2009), successfully used by the MIT team in the 2007 DARPA Grand Challenge (Leonard et al. 2008). CL-RRT grows a tree by randomly sampling points in the environment and simulating dynamically feasible trajectory towards them in closed-loop, allowing the generation of smoother trajectories more efficiently than traditional RRT algorithms.

Fig. 1
figure 1

RR-GP high level architecture

In contrast to the original CL-RRT approach, the RR-GP tree is used not to create paths leading to a goal location, but instead to grow trees toward regions corresponding to the learned mobility patterns, or intents, \(b_j\) (Fig. 2). RR-GP does not approximate the complete reachability set of the target vehicle; instead, it generates a tree based on each potential motion pattern, and computes the likelihood of each based on the observed partial path. In this manner, RR-GP conditions the original GP prediction by removing infeasible patterns and providing a finer discretization, resulting in better trajectory prediction.

5.1 Single tree RR-GP algorithm

Fig. 2
figure 2

Simple RR-GP illustration (\(M = 1\)). RR-GP grows a tree (in brown) using GP samples (orange dots) sampled at \(\Delta t\) intervals for a given motion pattern. The green circles represent the actual size of the target vehicle. The resulting tree provides a distribution of predicted trajectories of the target at \(\delta t \ll \Delta t\) increments (Color figure online)

figure f

Algorithm 1 details the single-tree RR-GP algorithm, which constructs a tree of dynamically feasible motion segments to generate a distribution of predicted trajectories for a given intent \(b\), dynamical model, and low-level controller. Every time Algorithm 1 is called, a tree \(\mathcal T _{GP}\) is initialized with a root node at the current target vehicle position \((x(t), y(t))\). Variable \(t_{GP}\), which is used for time bookkeeping in the expansion mechanism of \(\mathcal T _{GP}\), is also initialized to \(t + \Delta t\), where \(t\) is the current time, and \(\Delta t\) is the GP sampling time interval.

At each step, only nodes belonging to the “time bucket” \(t_{GP} - \Delta t\) are eligible to be expanded, corresponding to the nodes added at the previous timestep \(t_{GP}\). To grow \(\mathcal T _{GP}\), first a sample \(x_{\text{ samp }}\) is taken from the environment (line 3) at time \(t_{GP} + K\Delta t\), or equivalently \(K\) timesteps in the future, using Eq. (15). The nodes added in the previous step (i.e. belonging to time bucket \(t_{GP} - \Delta t\)) are identified for tree expansion in terms of some distance heuristics (line 4). The algorithm attempts to connect the nearest node to the sample using an appropriate reference path for the closed-loop system consisting of the vehicle and the controller. The resulting path is dynamically feasible (by construction) and is then checked for collisions. Since the \(\mathcal T _{GP}\) is trying to generate typical trajectories that the target vehicle might follow, only a simulated trajectory that reaches the sample without a collision is kept, and any corresponding nodes are added to the tree and \(t_{GP}\) time bucket (line 8). When the total number of successful connections \(n_{\text{ success }}\) is reached, \(t_{GP}\) and \(K\) are incremented, and the next timestep is sampled (line 15).

RR-GP keeps track of the total number of unsuccessful connections \(n_{\text{ infeas }}\) at each iteration. When \(n_{\text{ infeas }}\) reaches some predetermined threshold, the variance of the GP for the current iteration is temporarily grown to capture a more dispersed set of paths (line 3). This heuristic is typically useful to generate feasible trajectories when GP samples are close to obstacles. If \(n_{\text{ infeas }}\) then reaches a second, larger threshold, RR-GP “gives up” on growing the tree, and simply returns \(\mathcal T _{GP}\) (line 11). This situation usually happens when the mobility pattern \(b\) has a low likelihood and is generating a large number of GP samples in infeasible regions. These heuristics, along with the time bucket logic, facilitate efficient feasible trajectory generation in RR-GP.

The resulting tree is post-processed to produce a dense, time-parametrized distribution of the target vehicle position at future timesteps. Since the RR-GP tree is grown at a higher rate compared to the original GP learning phase, the resulting distribution is generated at \(\delta t \ll \Delta t\) increments, where \(\delta t\) is the low-level controller rate. The result is a significant improvement of the accuracy of the prediction without a significant increase in the computation times (Sect. 6).

5.2 Multi-tree RR-GP algorithm

figure g
figure h

This section introduces the Multi-Tree RR-GP (Algorithm 2), which extends Algorithm 1 to consider multiple motion patterns for a dynamic obstacle. The length of the prediction problem is \(T\) seconds, and the prediction time horizon is \(T_h\) seconds. The value of \(T_h\) is problem-specific, and depends on the time length of the training data. For example, in a threat assessment problem for road intersections, \(T_h\) will typically be on the order of \(3\) to \(6\)  seconds (Aoude et al. 2010b). The RR-GP algorithm updates its measurement of the target vehicle every \({dt}\) seconds, chosen such that the inner loop (lines 6–9) of Algorithm 2 reaches completion before the next measurement update. Finally, the time period of the low-level controller is equal to \(\delta t\), typically 0.02–0.1 s for the problems of interest.

The input to Algorithm 2 is a set of GP motion patterns, along with a prior probability distribution, proportional to the number of observed trajectories belonging to each pattern. In line 4, the position of the target vehicle is measured. Then, the probability that the vehicle trajectory belongs to each of the \(M\) motion patterns is updated using Eq. (11). For each motion pattern (in parallel), line 7 grows a single-tree RR-GP rooted at the current position of the target vehicle using Algorithm 1. The means and variances of the predicted positions are computed for each motion pattern at each timestep, using position and time information from the single-tree output (line 8). This process can be parallelized for each motion pattern, since there is no information sharing between the growth operations of each RR-GP tree. Note that even if the vehicle’s position has significantly deviated from the expected GP behaviors, each GP prediction will still attempt to reconcile the vehicle’s current position \((x(t),y(t))\) with the behavior.

The probability of each motion pattern is adjusted using Algorithm 3, which removes the probability of any pattern that ended in an infeasible region (line 4) due to surpassing the second \(n_\text{ infeas }\) threshold. By using dynamic feasibility to recalculate the probabilities (line 11), this important modification helps the RR-GP algorithm converge to the more likely patterns faster, yielding an earlier prediction of the intent of the target vehicle. In the event that all RR-GP trees end with an infeasible condition, probability values are not adjusted (line 10). In practice, this infeasibility case should rarely occur, since the target vehicle is assumed to follow one of the available patterns.

If any of the motion pattern probabilities is altered, line 11 of Algorithm 2 recomputes the probability distribution of the positions of the target vehicle (\(\hat{x}(t),\hat{y}(t)\)) for times \(\tau \in [0, \delta t, 2\delta t, \ldots , t]\). This step is called backward propagation (BP), as it propagates the effects of the updated likelihoods to the previously computed position distributions. Finally, line 12 of Algorithm 2 combines the position predictions from each single-tree RR-GP output into one distribution, by incorporating the updated motion pattern probabilities \(b_j\). This computation is performed for all time \(\tau \) where \(\tau \in [t+ \delta t, t + 2\delta t, \ldots , t + T_h]\), resulting in a probability distribution of the future trajectories of the target vehicle that is based on a mixture of GPs.

In subsequent results, it is assumed that the target vehicle has car-like dynamics, more specifically a bicycle dynamical model (LaValle 2006). The target vehicle inputs are approximated by the outputs of a pure-pursuit (PP) low-level controller (Amidi and Thorpe 1990) that computes a sequence of commands towards samples generated from the GP. This path is then tracked using the propagation function of a PP controller for steering and a proportional-integral controller for tracking the GP-predicted velocity.

Figure 3 demonstrates the dynamic feasibility and collision avoidance features of the RR-GP approach on a simple example consisting of two motion patterns, corresponding to passing a single obstacle on the left or right. (Training and testing procedures of RR-GP scenarios are explained in detail in Sect. 6.) In this illustration, the test trajectory belongs to the left motion pattern. At each step, Algorithm 2 updates the likelihoods of each motion pattern using Eqs. (11) and (12), and generates two new dynamically feasible trees.

Fig. 3
figure 3

Snapshots of the GP samples (top row) and Algorithm 2 output (bottom row) on a test trajectory

At time \(t=0\) s (Fig. 3d), the target vehicle is pointing straight at the obstacle. By \(t=1\) s (Fig. 3e), the vehicle has moved forward and rotated slightly left. Due to the forward movement and left rotation, RR-GP finds more feasible left trajectories than right trajectories (this can be seen by comparing the trajectories as they pass the corners of the obstacle), a behavior GP samples alone cannot capture (Fig. 3b). At \(t=2\) s (Fig. 3f), the vehicle has more clearly turned to the left; the RR-GP algorithm returns an incomplete right tree, reflecting that all attempts to grow the tree further along the right motion pattern failed.

Note that the GP samples for the right pattern are not all infeasible at \(t=2\) s (Fig. 3c); an algorithm based on GP samples alone would not have predicted the dynamic infeasibility of the right pattern. Furthermore, simple interpolation techniques would not have been able to detect dynamic infeasibility, highlighting the importance of the dynamic model embedded within the RR-GP algorithm. RR-GP also predicts that the left-side trees are dynamically feasible and reaching collision-free regions. This early detection of the correct motion pattern is a major advantage of RR-GP compared to GP algorithms alone.

Remark

(complexity) The proposed approach scales linearly with both the number of dynamic obstacles and the number of intents for each. However, both Algorithms 1 and 2 can be run in parallel, with a separate process running for each potential behavior of each dynamic obstacle. Assuming the computational resources are available to implement this parallelization, runtime scaling with dynamic obstacle complexity can be effectively eliminated.

While there are no theoretical limitations with respect to the number of dynamic obstacles or motion patterns considered, in practice few are needed at any one time for the structured environments in this work’s domain of interest. Complex obstacle environments can typically be broken down into a sequence of interactions with a smaller number of dynamic obstacles, while most of the “decisions” associated with motion intentions can be broken down to a series of consecutive decisions involving fewer branching paths. The trade-off in this case is that the time horizons being considered may need to be reduced.

6 RR-GP demonstration on human-operated target

To highlight the advantages of the RR-GP algorithm, Algorithm 2 is applied on an example scenario consisting of a single target vehicle traversing past several fixed obstacles. The purpose of this example is to compare the performance (in terms of accuracy and computation time) of the RR-GP approach against two baseline GP mixture algorithms, given either sparse training data (Sparse-GP) or dense training data (Dense-GP). The results below demonstrate that RR-GP, given only the same data as Sparse-GP, matches or exceeds the accuracy of Dense-GP while maintaining the runtime benefits of sparse data.

6.1 Setup

Trajectories were manually generated by using a steering wheel to guide a simulated robot through a virtual urban environment described in Aoude et al. (2010c). The vehicle uses the iRobot Create software platform (iRobot 2011) with a skid-steered vehicle, modified in software to emulate the standard bicycle model

$$\begin{aligned} \begin{aligned} \dot{x}&= v \cos {(\theta )},&\dot{y}&= v \sin {(\theta )}, \\ \dot{\theta }&= \frac{v}{L} \tan {(\delta )},&\dot{v}&= a, \end{aligned} \end{aligned}$$
(18)

where \((x,y)\) is the rear axle position, \(v\) is the forward speed, \(\theta \) is the heading, \(L\) is the wheelbase equal to \(0.33\) m, \(a\) is the forward acceleration, and \(\delta \) is the steering angle (positive counter-clockwise). The state of the vehicle is \(s = (x,y,\theta ,v) \in \mathcal S \), while the input is \(u = (\delta ,a) \in \mathcal U \), including the constraints \(a_{\min } \le a \le a_{\max }\) and \(|\delta |\le \delta _{\max }\), where \(a_{\min } = -0.7\) m/s\(^2\), \(a_{\max } = 0.4\) m/s\(^2\), and \(\delta _{\max } = 0.6\) rad. Starting from its initial location, the vehicle is either driven to the left of the obstacle or to the right, for a total of \(M=2\) motion patterns.

A total of \(30\) (\(15\) left, \(15\) right) trajectories were generated for training, with data collected at \(50\) Hz (Fig. 4). Each motion pattern is learned according to Eqs. (1114); both RR-GP and Sparse-GP use data that were downsampled to 1 Hz, while Dense-GP was trained with 2-Hz data. The test data consists of 90 additional trajectories (45 left, 45 right) generated in the same manner as the training data. In testing, the algorithms received simulated measurements from the test trajectories at one-second intervals, i.e. \(dt=1\)s in Algorithm 2. For each timestep, Sparse-GP, Dense-GP, and RR-GP are run using the current state of the target vehicle for a time horizon \(T_h = 8\)s.

Fig. 4
figure 4

Training trajectories generated in the simulated road environment according to two motion patterns. The black circle (bottom) represents the target vehicle, and the arrow inside the circle represents its heading. The orange arrows point in the direction of each pattern (left and right) (Color figure online)

In the RR-GP implementation, the control timestep is \(\delta t = 0.1\) s, while the GP samples are produced at \(\Delta t = 1\)s. The limits of successful and infeasible connections per \(\Delta t\) (lines 9 and 11 of Algorithm 1) are 30 and 150, respectively. The approach of Yepes et al. (2007) is followed in calculating prediction error as the root mean square (RMS) difference between the true position \((x,y)\) and mean predicted position \((\hat{x},\hat{y})\). The mean is computed using Eq. (17) for the Sparse-GP and Dense-GP techniques, and the multi-tree probability distribution for RR-GP. Prediction errors are averaged across all 90 test trajectories at each timestep.

6.2 Simulation results

This section presents simulation results for the RR-GP algorithm which compare the prediction accuracy and computation time with both Sparse-GP and Dense-GP. Two variations of RR-GP are implemented, one with BP (line 11 of Algorithm 2) and one without.

6.2.1 Motion pattern probabilities

Figure 5 and Table 1 show the probability of identifying the correct motion pattern given the observed portion of path followed by the target vehicle, computed as a function of time. While Sparse-GP and Dense-GP only use Eq. (11) to update these probabilities, RR-GP embeds additional logic for dynamic feasibility (see Algorithm 3). Note that the probability corresponding to RR-GP without BP is the same as RR-GP with BP.

Fig. 5
figure 5

Average probability (over 90 trajectories) of the correct motion pattern for RR-GP (w/ BP), Sparse-GP (1 Hz), and Dense-GP (2 Hz) algorithms as function of time. For example, the values at \(t = 1\) s represent the probability of the correct motion patterns after the target vehicle has actually moved for 1 s on its path

Table 1 Average probability of correct motion pattern (over 90 tests)

At time \(t=0\) s, Sparse-GP and Dense-GP’s likelihoods are based on the size of the training data of each motion pattern. Since they are equal, the probability of the correct motion pattern is 0.5. On the other hand, by using collision checks and BP RR-GP is able to improve its “guess” of the correct motion pattern from 0.5 to more than 0.92. At time \(t=1\) s, using observation of the previous target position, all three algorithms improve in accuracy, though RR-GP maintains a significant advantage. After three seconds, the probability of the correct motion pattern has nearly reached 1.0 for all algorithms.

6.2.2 Prediction errors

Figure 6 shows the performance of each algorithm in terms of the RMS of the prediction error between the true value and the predicted mean position of the target vehicle. At the start of each test, when time \(t=0\) s (Fig. 6a), the four algorithms are initialized with the same likelihood values for each motion pattern. This is seen in the Sparse-GP and Dense-GP plots, which are almost identical. RR-GP (w/o BP) also has a similar performance from \(t=0\) s until \(t=4\) s because no dynamic infeasibilities or collisions with obstacles happen in this time range. However, the target first encounters the obstacle around \(t = 5\) s, at which point the prediction of RR-GP (w/o BP) improves significantly compared to the GP algorithms since the algorithm is able to detect the infeasibility of the wrong pattern, and therefore adjust the trajectory prediction. The full RR-GP algorithm, denoted as RR-GP (w/ BP) in Fig. 6, displays the best performance. Using the BP feature, its prediction accuracy shows significant improvement over that of RR-GP (w/o BP), between \(t=1\) s and \(t=5\) s. This is accomplished by back-propagating knowledge of the future dynamic infeasibility to previous timesteps, which improves the accuracy of the earlier portion of the prediction. This reduces the RMS prediction errors by a factor of 2.4 over the GP-only based algorithms at \(t=8\) s.

Fig. 6
figure 6

Average position prediction errors (over 90 trajectories) for Sparse-GP (1 Hz), Dense-GP (2 Hz), and the two variations of the RR-GP algorithm at different times of the example

After 1 s has elapsed (Fig. 6b), the vehicle has moved to a new position, and the likelihood values of each motion pattern have been updated (Fig. 5). The probability of the correct motion pattern computed using Eq. (11) has slightly increased, leading to lower errors for all three algorithms. But as in Fig. 6a, a trend is seen for the four algorithms; the performance of the RR-GP (w/ BP) algorithm is consistently and significantly better than both Sparse-GP and Dense-GP, as well as RR-GP (w/o BP) in the time range prior to the collision detection. Note that Dense-GP performs slightly better than Sparse-GP between \(t = 5\) s and \(t = 8\) s, due to a more accurate GP model obtained through additional training data.

After three seconds have elapsed (Fig. 6c), the probability of the correct motion pattern has approached 1.0 (Fig. 5), yielding decreased prediction error across all algorithms. The vehicle has moved to a region where dynamic feasibility and collision checks are not significant, due to the negligible likelihood of the wrong motion pattern prediction. Eq. (17) then simplifies to \(p(x_{t+K},y_{t+K} | x_{t}, y_{t}, b_{j^*})\), where \(j^*\) is the index of the “correct” motion pattern, and thus the prediction accuracy is only related to the position distribution of the correct motion pattern. This explains why Sparse-GP and both RR-GP variations show a very similar accuracy, since their position distributions are based on the same sparse data. On the other hand, Dense-GP, due to more dense data, is the best performer, as expected, with RMS errors 1.5 times smaller than the other algorithms at \(t = 8\) s.

Figure 7 presents box-and-whisker plots of the average difference between the prediction errors of Sparse-GP (1 Hz) and RR-GP (w/ BP), using the same data as Fig. 6. The length of the whisker (dashed black vertical line) is \(W= 1.5\), such that data are considered outliers if they are either smaller than \(Q1-W \times (Q3-Q1)\) or larger than \(Q3+W \times (Q3-Q1)\), where \(Q1 = 25{\text{ th }}\) percentile and \(Q3 = 75{\text{ th }}\) percentile.

Fig. 7
figure 7

Box plots of the difference of prediction errors (for the 90 test trajectories) between Sparse-GP (1 Hz) and the full RR-GP algorithm at different times of the example

At \(t = 0\) s, Fig. 7a shows the improvement of RR-GP (w/ BP) over Sparse-GP consistently increase, reaching a 2.5 m difference at \(t = 8\) s. The outliers in this case are the test trajectories that are dynamically feasible at \(t=0\) s. Here, either of the two motion patterns may be followed, and RR-GP does not have an advantage over Sparse-GP, leading to no error difference. But in the majority of tests, as shown by the whisker lengths and box sizes, RR-GP (w/ BP) significantly reduces prediction error.

Figure 7b presents the error difference after \(1\) s; a similar trend can be seen. The median of the difference grows with the timesteps, but its magnitude slightly decreases compared to the previous timestep, reaching 2.1 m difference at \(t=8\) s. Here the whiskers have increased in length, eliminating many of the outliers from the prior timestep. Finally, after the target vehicle has moved for \(t=3\) s (Fig. 7c), the differences between RR-GP (w/ BP) and Sparse-GP are not statistically significant.

6.2.3 Computation times

Table 2 summarizes the average computation times per iteration of the three algorithms over the \(90\) testing paths. Both RR-GP variations have identical computation times, so only one computation time is shown for RR-GP. Note that the implementation uses the GPML MATLAB toolbox (Rasmussen and Williams 2005), and these tests were run on a 2.5 GHz quad-core computer.

Table 2 Average computation times per iteration (over 90 trajectories)

As expected, Sparse-GP has the lowest computation time, while the RR-GP algorithm ranks second with times well below 1 s, which are suitable for real-time application as shown in Sect. 7.2. Out of the \(0.69\) s of RR-GP computation time, an average \(0.5\) s is spent in the RR-GP tree generation while the remaining \(0.19\) s is used to generate the GP samples. On the other hand, Dense-GP had an average computation time of 2.3 s, which is significantly worse than the other two approaches and violates the \(dt = 1\) s measurement cycle. Such times render the GP mixture model useless for any typical real-time implementation, as expected due to GP’s matrix inversions. If \(dt\) is changed to 2.3 s for the Dense-GP tests, the prediction accuracy decreases due to slower and fewer updates of the probabilities of the different motion patterns. Additional details on computation time can be found in Aoude (2011).

In summary, this simulated environment with a human-driven target vehicle shows that the RR-GP algorithm consistently performs better than Sparse-GP and Dense-GP in the long-term prediction of the target vehicle motion. It is important to highlight that RR-GP can predict trajectories at high output frequencies, significantly higher than both Sparse-GP (1 Hz) and Dense-GP (2 Hz). While the GP approach could be augmented with some form of interpolation technique, the solution approach developed in this work systematically guarantees collision avoidance and dynamic feasibility of the trajectory predictions at the higher rates. Another feature of the RR-GP algorithm is its low computation time (Table 2), which is small enough to be suitable for real-time implementation in collision warning systems or probabilistic path planners.

Finally, note that RR-GP was also validated in Aoude (2011) on intersection traffic data collected through the Cooperative Intersection Collision Avoidance System for Violations (CICAS-V) project (Maile et al. 2008). The obtained results demonstrated that RR-GP reduced prediction errors by almost a factor of 2 when compared to two standard GP-based algorithms, while maintaining computation times that are suitable for real-time implementation.

7 CC-RRT path planning with RR-GP predictions

As noted in Sect. 1, one of the main objectives of this work is to demonstrate that through appropriate choice of planner, an autonomous agent can utilize RR-GP predictions to identify and execute probabilistically feasible paths in real-time, in the presence of uncertain dynamic obstacles. This section introduces a path planning algorithm which extends the CC-RRT framework (Fig. 8) of Luders et al. (2010b); Luders and How (2011) to guarantee probabilistic robustness with respect to dynamic obstacles with uncertain motion patterns. These guarantees are obtained through direct use of RR-GP trajectory predictions (Sect. 5). As these predictions are provided in the form of Gaussian uncertainty distributions at each timestep for each intent, they are well-suited for the CC-RRT framework. After the chance constraint formulation of Blackmore et al. (2006) is reviewed, the CC-RRT framework is presented, then extended to consider dynamic obstacles with uncertain motion patterns. Finally, an alternative particle-based approximation of CC-RRT for nonlinear dynamics and/or non-Gaussian uncertainty is also presented.

Fig. 8
figure 8

Diagram of the chance constrained RRT algorithm. Given an initial state distribution at the tree root (blue) and constraints (gray), the algorithm grows a tree of state distributions to find a probabilistically feasible path to the goal (yellow star). The uncertainty in the state at each node is represented as an uncertainty ellipse. If the probability of collision is too high, the node is discarded (red); otherwise the node is kept (green) and may be used to grow future trajectories (Color figure online)

7.1 Extension of CC-RRT chance constraint formulation

Recall the LTI system Eqs. (13); for now, assume that the uncertainty of each obstacle, Eqs. (67), can be represented by a single Gaussian distribution:

$$\begin{aligned} c_{jt}&\sim \mathcal N (\hat{c}_{jt},P_{c_{jt}}) ~\forall ~j \in {\mathbb{Z }_{1,B}}, ~\forall ~t. \end{aligned}$$
(19)

In the context of RR-GP, this implies that the dynamic obstacle is following a single, known behavior, though its future state is uncertain.

Given a sequence of inputs \(u_0,\ldots ,u_{N-1}\) and the dynamics of Eq. (1), the distribution of the state \(x_t\) (represented as the random variable \(X_t\)) can be shown to be Gaussian (Blackmore et al. 2006):

$$\begin{aligned} P(X_t | u_0,\ldots ,u_{N-1})&\sim \mathcal{N }(\hat{x}_t,P_{x_t})~~\forall ~t \in {\mathbb{Z }_{0,N}}, \end{aligned}$$

where \(N\) is some timestep horizon. The mean \(\hat{x}_t\) and covariance \(P_{x_t}\) can be updated implicitly using the relations

$$\begin{aligned} \hat{x}_{t+1}&= A\hat{x}_t + Bu_t ~~\forall ~t \in { \mathbb{Z }_{0,N-1}}, \end{aligned}$$
(20)
$$\begin{aligned} P_{x_{t+1}}&= AP_{x_t}A^T + P_w ~~ \forall ~t \in {\mathbb{Z }_{0,N-1}}. \end{aligned}$$
(21)

Note that by using Eqs. (2021), CC-RRT can simulate state distributions within an RRT framework in much the same way that a nominal (e.g. disturbance-free) trajectory would be simulated. Instead of propagating the nominal state, the distribution mean is propagated via Eq. (20); Eq. (21) can be used to compute the covariance offline.

As presented in Blackmore et al. (2006), to ensure that the probability of collision with any obstacle on a given timestep does not exceed \(\Delta \equiv 1 - p_{\text{ safe }}\), it is sufficient to show that the probability of collision with each of the \(B\) obstacles at that timestep does not exceed \(\Delta /B\). The \(j\)th obstacle is represented through the conjunction of linear inequalities

$$\begin{aligned} \bigwedge _{i=1}^{n_j} a_{ij}^Tx_t < a_{ij}^Tc_{ijt} ~~\forall ~t \in {\mathbb{Z }_{0,t_f}}, \end{aligned}$$
(22)

where \(n_j\) is the number of constraints defining the \(j\)th obstacle, and \(c_{ijt}\) is a point nominally (i.e. \(c_{jt} = \hat{c}_{jt}\)) on the \(i\)th constraint at timestep \(t\); note that \(a_{ij}\) is not dependent on \(t\), since the obstacle shape and orientation are fixed.

It is shown in Blackmore et al. (2006) (for optimization-based frameworks) and Luders et al. (2010b) (for sampling-based frameworks) that to ensure the probability of constraint satisfaction exceeds \(p_{\text{ safe }}\), the system must satisfy a set of deterministic but tightened constraints for each obstacle, where the degree of tightening is a function of the degree of uncertainty, number of obstacles, and \(p_{\text{ safe }}\). These tightened constraints can be applied offline to assure probabilistic guarantees; however, this requires applying a fixed probability bound \(\Delta /B\) across all obstacles, regardless of how likely they are to cause infeasibility, leading to conservative behavior (Blackmore et al. 2006).

Alternatively, the CC-RRT algorithm leverages a key property of the RRT algorithm—trajectory-wise constraint checking—by explicitly computing a bound on the probability of collision at each node, rather than simply satisfying tightened constraints for a fixed bound (Luders et al. 2010b). In doing so, CC-RRT can compute bounds on the risk of constraint violation online, based on the most recent RR-GP trajectory prediction data.

As shown in Luders et al. (2010b), the upper bound on the probability of collision with any obstacle at timestep \(t\) is

$$\begin{aligned} \Delta _t(\hat{x}_t,P_{x_t})&\equiv \sum _{j=1}^B \min _{i = 1,\ldots ,n_j} \Delta _{ijt}(\hat{x}_t,P_{x_t}), \nonumber \\ \Delta _{ijt}(\hat{x}_t,P_{x_t})&\equiv \frac{1}{2} \left( 1 - \text{ erf } \left[ \frac{a_{ij}^T\hat{x}_t - a_{ij}^Tc_{ijt}}{\sqrt{2a_{ij}^T(P_{x_t} + P_{c_{jt}})a_{ij}}} \right] \right) , \nonumber \\ \end{aligned}$$
(23)

where \(\text{ erf }(\cdot )\) denotes the standard error function. Thus, for a node/timestep with state distribution \(\mathcal N (\hat{x}_t,P_{x_t})\) to be probabilistically feasible, it is sufficient to check that \(\Delta _t(\hat{x}_t,P_{x_t}) \le 1-p_{\text{ safe }}\).

Now, suppose the \(j\)th obstacle is one of the dynamic obstacles modeled using RR-GP (Sect. 5) and that it may follow one of \(k = 1,\ldots ,M\) possible behaviors. At each timestep \(t\), and for each behavior \(k\), the RR-GP algorithm provides a likelihood \(\delta ^k_j\) and Gaussian distribution \(\mathcal N (\hat{c}_{jt}^k,P_{c_{jt}}^k)\). Thus, the overall state distribution for this obstacle at timestep \(t\) is given by

$$\begin{aligned} c_{jt}&\sim \sum _{k=1}^M \delta ^k_j \mathcal N (\hat{c}_{jt}^k,P_{c_{jt}}^k). \end{aligned}$$
(24)

At each timestep, the probability of collision with dynamic obstacle \(j\) can be written as a weighted sum of the probabilities of collision for the dynamic obstacle \(j\) under each behavior. With this modification, it can be shown that all existing probabilistic guarantees (Luders et al. 2010b) are maintained by treating each behavior’s state distribution as a separate obstacle with the resulting risk scaled by \(\delta ^k_j\):

$$\begin{aligned} P(\text{ collision })&\le \sum _{j=1}^B P(\text{ col. } \text{ w/ } \text{ obstacle } j) \\&= \sum _{j=1}^B \sum _{k=1}^M \delta ^k_j P(\text{ col. } \text{ with } \text{ obstacle } j, \text{ behavior } k) \nonumber \\&\le \sum _{j=1}^B \sum _{k=1}^M \delta ^k_j \min _{i = 1,\ldots ,n_j} P(a_{ij}^TX_t < a_{ij}^TC_{ijt}^k) \nonumber \\&= \sum _{j=1}^B \sum _{k=1}^M \delta ^k_j \min _{i = 1,\ldots ,n_j} \Delta _{ijt}^k(\hat{x}_t,P_{x_t}),\nonumber \end{aligned}$$
(25)

where \(C_{ijt}^k\) is a random variable representing the translation of the \(j\)th obstacle under the \(k\)th behavior, and \(\Delta _{ijt}^k\) is used as in Eq. (23) for the \(k\)th behavior. By comparison with Eq. (23), the desired result is obtained.

7.2 CC-RRT with integrated RR-GP

To perform robust planning this work uses CC-RRT, an extension of the traditional RRT algorithm that allows for probabilistic constraints. Whereas the traditional RRT algorithm (LaValle 1998) grows a tree of states that are known to be feasible, the chance constrained RRT algorithm grows a tree of state distributions that are known to satisfy an upper bound on probability of collision (Fig. 8), using the formulation developed in Sect. 7.1.

The fundamental operation in the standard RRT algorithm is the incremental growth of a tree of dynamically feasible trajectories, rooted at the system’s current state \(x_t\). To grow a tree of dynamically feasible trajectories, it is necessary for the RRT to have an accurate model of the (linear) vehicle dynamics, Eq. (1), for simulation. Since the CC-RRT algorithm grows a tree of Gaussian state distributions, in this case the model is assumed to be the propagation of the state conditional mean and covariance, Eqs. (2021). These are rewritten here as

$$\begin{aligned} \hat{x}_{t+k+1|t}&= A\hat{x}_{t+k|t} + Bu_{t+k|t}, \end{aligned}$$
(26)
$$\begin{aligned} P_{t+k+1|t}&= AP_{t+k|t}A^T + P_w, \end{aligned}$$
(27)

where \(t\) is the current system timestep and \((\cdot )_{t+k|t}\) denotes the predicted value of the variable at timestep \(t+k\).

The CC-RRT tree expansion step, used to incrementally grow the tree, is given in Algorithm 4. Each time the algorithm is called, a sample state is taken from the environment (line 2), and the nodes nearest to this sample, in terms of some heuristic(s), are identified as candidates for tree expansion (line 3). An attempt is made to form a connection from the nearest node to the sample by generating a probabilistically feasible trajectory between them (lines 7–12). This trajectory is incrementally simulated by selecting some feasible input (line 8), then applying Eqs. (2627) to yield the state distribution at the next timestep. This input may be selected at the user’s discretion, such as through random sampling or a closed-loop controller, but should guide the state distribution toward the sample. Probabilistic feasibility is then checked using RR-GP trajectory predictions with Eqs. (23) and (25); trajectory simulation continues until either the state is no longer probabilistically feasible, or the distribution mean has reached the sample (line 7). Even if the latter case does not occur, it is useful and efficient to keep probabilistically feasible portions of this trajectory for future expansion (Kuwata et al. 2009), via intermediate nodes (line 10). As a result, one or more probabilistically feasible nodes may be added to the tree (lines 13–16).

figure i

A number of heuristics are also utilized to facilitate tree growth, identify probabilistically feasible trajectories to the goal, and identify “better” paths (in terms of Eq. (9)) once at least one probabilistically feasible path has been found. Samples are identified (line 2) by probabilistically choosing between a variety of global and local sampling strategies, some of which may be used to efficiently generate complex maneuvers (Kuwata et al. 2009). The nearest node selection (lines 3–4) strategically alternates between several distance metrics for sorting the nodes, including an exploration metric based on cost-to-go and a path optimization metric based on estimated total path length (Frazzoli et al. (2002)). Each time a sample is generated, \(m \ge 1\) attempts are made to connect a node to this sample before being discarded. Additional heuristics include attempting direct connections to the goal anytime a new node is added, and maintaining bounds on the cost-to-go to enable a branch-and-bound pruning scheme (Frazzoli et al. 2002).

figure j

For the real-time applications considered in this work, the CC-RRT tree should grow continuously during the execution cycle to account for changes in the situational awareness, such as updated RR-GP predictions. Algorithm 5 shows how the algorithm executes some portion of the tree while continuing to grow it. The planner updates the current path to be executed by the system every \(\Delta \tau \) seconds, using the most recent RR-GP predictions for any dynamic obstacles as they become available (line 3). During each cycle, for the duration of the timestep, the tree is repeatedly expanded using Algorithm 4 (lines 4–6). Following this growth, some cost metric is used to select the “best” path in the tree (line 7). Once a path is chosen, a “lazy check” (Kuwata et al. 2009) is performed in which the path is repropagated from the current state distribution using the same model dynamics, Eqs. (2627), and tested for probabilistic feasibility (line 8). Due to the presence of dynamic obstacles, it is crucial to re-check probabilistic feasibility at every iteration, even if the agent itself is deterministic. If this path is still probabilistically feasible, it is chosen as the current path to execute (line 10); otherwise the infeasible portion of the path is removed and the process is repeated (line 12) until a probabilistically feasible path is found. In the event that no probabilistically feasible path can be found, mitigation strategies can be implemented to maximize safety, e.g. Wu and How (2012).

7.3 PCC-RRT

In the case of nonlinear dynamics and/or non-Gaussian noise, an alternative particle-based framework (Fig. 9) can be used to statistically represent uncertainty at a resolution which can be dictated by the user (Luders and How 2011). Though the generation of particles increases the per-node complexity, the algorithm maintains the benefits of sampling-based approaches for rapid replanning. The PCC-RRT framework of Luders and How (2011) is generalizable both in the types of probabilistic feasibility that are assessed (timestep-wise and path-wise) and in the types of uncertainty that are modeled using particles. This framework can be extended to consider hybrid combinations of particle-based and distribution-based uncertainty; for example, an agent’s dynamics/process noise can be represented via particles, while interactions with dynamic obstacles are modeled using traditional Gaussian distributions. However, this may limit the ability to assess path-wise infeasibility.

Fig. 9
figure 9

Diagram of the PCC-RRT algorithm for a single, forward simulation step. Each green circle/path represents a simulated particle that terminates in a feasible state, while each red circle/path represents a simulated particle that terminates in an infeasible state (Color figure online)

Algorithm 6 presents the tree expansion step for the particle-based extension of CC-RRT, PCC-RRT. A set of \(P_{\max }\) particles are maintained at each node, each with a position \(x\) and a weight \(w\) (\(\sum w = 1\) across all particles at each timestep). There are two parameters the user can specify to indicate the degree of probabilistic constraint violation allowed: the average likelihood of feasibility at each node cannot exceed \(p_{\text{ safe }}^{\text{ node }}\), while the average likelihood of feasibility over an entire path cannot exceed \(p_{\text{ safe }}^{\text{ path }}\). The latter bound is a key advantage of this particle-based approach, as it is quite difficult to approximate analytically in real-time without introducing significant conservatism. The former likelihood is computed by summing the weights of all feasible nodes, \(\sum _p w_{t+k|t}^{(p)}\) (line 7), while the latter is computed iteratively over a path by multiplying the prior node’s path probability by the weight of existing nodes that are still feasible (line 15).

figure k

One of several resampling schemes may be used for identifying new particles as older ones become infeasible (line 8). In the uniform resampling scheme, all particles are assigned an identical weight at line 12, \(w_{t+k+1|t}^{(p)} = 1/P_{\max }\). When this is the case, every particle has an equal likelihood of being resampled. In the probabilistic resampling scheme, each particle is assigned a weight based on the likelihood of that particle actually existing; this is a function of the likelihood of each portion of the uncertainty that is sampled. This can be computed iteratively by node, as

$$\begin{aligned} w_{t+k+1|t}^{(p)}\alpha w_{t+k|t}^{(p)}\cdot P(X_{t+k+1|t} = x_{t+k+1|t}). \end{aligned}$$
(28)

This requires additional computation, especially as the complexity of the uncertainty environment increases; however, it can provide a better overall approximation of the state distribution at each timestep, for the same number of particles.

8 Results

This section presents simulation results which demonstrate the effectiveness of the RR-GP algorithm in predicting the future behavior of an unknown, dynamic vehicle, allowing the CC-RRT planner to design paths which can safely avoid it. Examples are provided for three scenarios of varying complexity, in terms of dynamics, environment, and possible behaviors. In the first two examples, planning is performed on a vehicle with linear dynamics, such that the extended theoretical framework of Sect. 7 is valid. The final example considers a vehicle with car-like dynamics, using the PCC-RRT algorithm of Sect. 7.3 to approximate path-wise feasibility. Though a single dynamic, uncertain obstacle (in these examples, a “target vehicle”) is present in each case, the approach can be extended to multiple dynamic, uncertain obstacles without further modification, and in fact will scale well under such conditions if parallelization is utilized.

8.1 Infrastructure

Algorithms 2–5 have been implemented using a multi-threaded, real-time Java application, modular with respect to all aspects of the problem definition. All simulations were run on a 2.53 GHz quad-core laptop with 3.48 GB of RAM.

The software implementation consists of three primary modules, each in a separate thread. The Vehicle thread manages the overall simulation, including all simulation objects; it is run in real-time at 10–50 Hz, and operates continuously until a collision has occurred or the vehicle has safely reached the goal. The RRT thread implements Algorithms 4 and 5, growing the CC-RRT tree while periodically sending the current best path in the tree to the Vehicle thread. Finally, the RRGP thread maintains predictions on the likelihoods and future state distributions of possible behaviors for each dynamic obstacle by incorporating Algorithm 2, embedded as a MATLAB program.

To ensure the RR-GP algorithm is tested on realistic driving behavior, the target vehicle’s motion is chosen from among a set of simulated trajectories, pre-generated for each behavior by having a human operator manually drive the vehicle in simulation. As in Sect. 6.1, the target vehicle dynamics are based on the iRobot Create platform; the simulated vehicle was driven via a wireless steering apparatus, tuned to emulate traditional, nonlinear control of an automobile. During each trial, one of these paths is randomly selected as the trajectory for the target vehicle.

8.2 Intersection scenario

Consider a ground vehicle operating in a constrained, two-dimensional environment (Fig. 10). This environment is a representative road network designed to emulate a real-world driving environment within the RAVEN testbed (How et al. 2008). The road network is \(11.2 \times 5.5\) m\(^2\) in size, and is capable of accommodating multiple intersection types.

Fig. 10
figure 10

Representative screenshots of the RR-GP and CC-RRT algorithms during trial #25 of the intersection scenario, for two different values of \(p_{\text{ safe }}\). The host vehicle’s path history and current path are in orange. The objective of the host vehicle (large orange circle) is to reach the goal position (green circle) while avoiding all static obstacles (black) and the dynamic target vehicle (magenta diamond). The blue paths indicate the paths predicted by the RR-GP algorithm for each possible behavior, including \(2-\sigma \) uncertainty ellipses; more likely paths are indicated with a brighter shade of blue. All objects are shown at true size; the gray lines are lane markings, which do not serve as constraints (Color figure online)

In this scenario, the objective of the host vehicle is to go straight through the intersection at bottom-center of Fig. 10a, reaching a goal location on the opposite side. However, to get there, the host vehicle must successfully avoid an errant (rule-violating) driver which is traveling through the intersection in the perpendicular direction, and is likely to cross the intersection at the same time as the host vehicle. There are three possible behaviors for the target vehicle as it enters the intersection: (a) left turn, (b) right turn, and (c) straight. The host vehicle is assumed to have a radius of 0.2 m, while the target vehicle has a radius of 0.14 m; both start at zero velocity.

The host vehicle is modeled as a double integrator,

$$\begin{aligned} \left[ \begin{array}{c} x_{t+1} \\ y_{t+1} \\ v^x_{t+1} \\ v^y_{t+1} \end{array} \right]&\!=\!&\left[ \begin{array}{cccc} 1 &{} 0 &{} dt &{} 0 \\ 0 &{} 1 &{} 0 &{} dt \\ 0 &{} 0 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 \end{array} \right] \left[ \begin{array}{c} x_t \\ y_t \\ v^x_t \\ v^y_t \end{array} \right] \!+\! \left[ \begin{array}{cc} \frac{dt^2}{2} &{} 0 \\ 0 &{} \frac{dt^2}{2} \\ 1 &{} 0 \\ 0 &{} 1 \end{array} \right] \left[ \begin{array}{c} u^x_t + w^x_t \\ u^y_t + w^y_t \end{array} \right] , \end{aligned}$$

where \(dt = 0.1\)s, subject to avoidance constraints \(\mathcal{X }\) (including velocity bounds) and input constraints

$$\begin{aligned} \mathcal{U } = \{ (u^x,u^y) ~|~ |u^x| \le 4, |u^y| \le 4 \}. \end{aligned}$$

To emphasize the impact of the dynamic obstacle’s uncertainty, the host vehicle’s own dynamics are assumed deterministic: \(w^x_t \equiv w^y_t \equiv 0\). Trajectories are simulated and executed in closed-loop via the controller

$$\begin{aligned} u^x_t&= -1.5(x_t - r^x_t) - 3(v^x_t - r^{v_x}_t), \\ u^y_t&= -1.5(y_t - r^y_t) - 3(v^y_t - r^{v_y}_t), \end{aligned}$$

where \((r^x_t,r^y_t)\) is the reference position and \((r^{v_x}_t,r^{v_y}_t)\) is the reference velocity; the reference \(r_t\) is moved continuously between waypoints at a fixed speed of 0.35 m/s. The speed of the target vehicle is capped at 0.4 m/s.

In Algorithms 4–5, the tree capacity is limited to 1,000 nodes, with a replan time interval (line 14, Algorithm 5) of 0.5 s. In Algorithms 1-2, \(T_h = 8\) s and \(\Delta t = 1\) s, though the vehicle dynamics are simulated at 10 Hz. The limits of successful and infeasible connections per \(\Delta t\) (lines 9 and 11, Algorithm 1) are 20 and 100, respectively. The RR-GP algorithm is called once every 1.0 s, each time giving Algorithm 2 a total of 0.3 s to grow its trees. If the time limit is reached, the algorithm is terminated without reaching the time horizon \(T_h\).

A total of 400 trials were performed, consisting of 50 trials each for eight different algorithms:

  • Naive RRT: Nominal RRT (no chance constraints) in which target vehicle is ignored; this sets a baseline for the minimum expected likelihood of safety

  • Nominal RRT: Nominal RRT in which target vehicle is treated as a static obstacle at its most recent location

  • Velocity-Avoidance RRT: Nominal RRT in which the future position of the target vehicle is predicted by propagating its current position based on its current velocity and heading

  • CC-RRT (5 cases): Algorithms 4-5 with \(p_{\text{ safe }} =\) 0.5, 0.8, 0.9, 0.99, or 0.999. Note that since \(p_{\text{ safe }}\) is a bound on feasibility at each timestep, rather than over an entire path, it does not act as a bound on the percentage of paths which safely reach the goal.

Each trial differs only in the path followed by the target vehicle and the random sampling used in the RR-GP and CC-RRT algorithms; the sequence of target vehicle paths is consistent across all sets of 50 trials. Four quantities were measured and averaged across these trials: the percentage of trials in which the vehicle safely reaches the goal; the average duration of such paths; the average time to generate an RRT/CC-RRT tree node; and the average time per execution of Algorithm 2.

Table 3 presents the averaged results over the 50 trials for each case. Note that in all five cases using CC-RRT, the host vehicle safely navigates the intersection with a much higher likelihood than any of the cases not using chance constraints. Furthermore, the CC-RRT results demonstrate the clear trade-off between overall path safety (in terms of percentage of trials which reach the goal) and average path duration when using CC-RRT. As \(p_{\text{ safe }}\) is increased from 0.5 to 0.999, the percentage of safe trajectories generally increases (the exception of \(p_{\text{ safe }} = 0.5\) is not statistically significant), culminating with the host vehicle using CC-RRT with \(p_{\text{ safe }} = 0.999\) reaching the goal safely in all fifty trials. On the other hand, as \(p_{\text{ safe }}\) is increased and the planner becomes more conservative, the average time duration of the safe trajectories increases.

Table 3 Simulation results, intersection scenario

Figure 10 sheds some light on how different values of \(p_{\text{ safe }}\) affect the types of paths chosen by the planner. In this particular trial, the target vehicle ultimately makes a left turn through the intersection, and would collide with the host vehicle if it did not deviate from an initial straight-path trajectory. The RR-GP algorithm is initially undecided whether the target vehicle is going straight or turning left (as indicated by the shading on the predicted trajectories in Fig. 10a, b); by \(t=6\) s RR-GP is very confident that the vehicle is turning left (Fig. 10c, d). When \(p_{\text{ safe }} = 0.8\), the planner selects a path with the minimum perturbation needed to avoid the target vehicle’s most likely trajectories (Fig. 10a). As the target vehicle closes in on the intersection (Fig. 10c), the host vehicle continues to hedge that it can cross the intersection safely and avoid the target vehicle’s approach in either direction, and thus does not modify its plan. In contrast, when \(p_{\text{ safe }} = 0.999\), the planner selects a larger initial perturbation to maintain the host vehicle’s distance from the target vehicle (Fig. 10b). After several RR-GP updates, the host vehicle demonstrates a much more risk-averse behavior, by loitering outside the intersection (Fig. 10d) for several seconds before making its approach. Ultimately, the host vehicle using \(p_{\text{ safe }} = 0.8\) reaches the goal (Fig. 10e) before the host vehicle using \(p_{\text{ safe }} = 0.999\) (Fig. 10f). In realistic driving scenarios, the most desirable behavior is likely somewhere between these two extremes.

Table 3 shows that with Naive RRT, by ignoring the target vehicle, the time-optimal path is almost always achieved, but a collision takes place in a majority of trials, with collisions occurring in most instances of the target vehicle going straight or left. In some instances, the Nominal RRT algorithm maintains safety by selecting an alternative trajectory when the target vehicle’s current position renders the host vehicle’s current trajectory infeasible; however, the overall likelihood of safety is still low. In many cases, the target vehicle collides with the host vehicle from the side, such that a replan is not possible. Of the Nominal RRT algorithms, the Velocity-Avoidance RRT algorithm performs the most competitively with CC-RRT, with 74 % of trials yielding a safe trajectory. Since the target vehicle always starts by driving straight toward the intersection (right to left in the figures), the host vehicle responds in nearly all trials by immediately perturbing its own path, based on the assumption that the target vehicle will go straight through the intersection. This contributes to the larger average path duration obtained by Velocity-Avoidance RRT compared to the other nominal algorithms in Table 3. However, in many instances, Velocity-Avoidance RRT is still unable to respond to rapid changes in the target vehicle’s heading, such as if the target vehicle makes a rapid, left turn inside the intersection.

Finally, note that the average time to either generate an RRT node or call RR-GP is largely independent of \(p_{\text{ safe }}\) for CC-RRT. There is a modest increase in average time per node when moving from Naive or Nominal RRT to Velocity-Avoidance RRT (scales by a factor of 1.5) or CC-RRT (scales by a factor of 2.5), though previous work has demonstrated that these factors scale well with environment complexity (Luders et al. 2010b).

8.3 Complex scenario

In this scenario, the problem complexity is increased, with more obstacles and more possible behaviors for the target vehicle (Fig. 11). This environment is the same size as the previous one, with rearranged obstacles; as the target vehicle moves from one side of the environment to the other, it may display as many as six possible behaviors, corresponding to whether each of the four obstacles is passed by the target vehicle on its left or right. Furthermore, by design the target vehicle has a higher maximum speed than the host vehicle, meaning that the host vehicle is at risk of being overtaken from behind if its path is not planned carefully.

Fig. 11
figure 11

Environment used in the complex scenario, including possible behaviors for the target vehicle (at right)

Figure 11b shows the trajectories generated by RR-GP for the six behaviors in this scenario. This demonstrates the RR-GP algorithm’s ability to model complex behaviors for long time horizons, building off available training data in the form of synthetic training trajectories (Fig. 11a).

The same double integrator dynamics and controller are used as in Sect. 8.2, but in this case the reference movement speed is increased from 0.35 to 0.6 m/s. Due to the increased number of behaviors and more complex environment, the Gaussian process formulation is more challenging; as a result, several of the parameters in Algorithms 1 and 2 have been tuned to improve performance. Here, the limits of successful and infeasible connections per \(\Delta t\) (lines 9 and 11, Algorithm 1) are 10 and 100, respectively. The RR-GP algorithm is called once every 1.25 s, each time giving Algorithm 2 up to a full second to grow trees for each behavior, with the time horizon \(T_h\) increased from 8 to 12 s. Both vehicles have a radius of 0.2 m and start at zero velocity.

The objective of this scenario is to demonstrate the ability of the CC-RRT algorithm, using RR-GP dynamic obstacle predictions, to exhibit safe driving behavior for long-duration missions. The same eight algorithms used in Sect. 8.2 are again used here; however, rather than performing 50 trials, each algorithm is used to guide the host vehicle through a continuous sequence of 50 waypoints. The host vehicle starts on the left side of the room, and each subsequent waypoint is among a set of four, located near each of the room’s corners. The host vehicle is given the next waypoint as soon as the current waypoint is reached; consecutive waypoints are required to be on opposite sides of the room, with respect to the room’s long axis. While the host vehicle completes this sequence, the target vehicle moves continuously back and forth between the left and right ends of the room, each time selecting one of the six possible behaviors and one of the four pre-generated trajectories for that behavior (Fig. 11).Footnote 3 Note that the sequences of waypoints and target vehicle behaviors are both consistent across all algorithms.

If the host vehicle collides with either the target vehicle or an element of the environment, the mission continues; however, the target vehicle is penalized for this collision by being re-set back to its last reached waypoint. This allows the trial for each algorithm to be performed as a single, continuous simulation, while also providing a figure of merit which factors in both path duration/length and collision risk.

Four quantities were measured and averaged for each algorithm: the total time required to reach all 50 waypoints, including collision time penalties; the number of collisions which take place; the average time to generate an RRT/CC-RRT tree node; and the average time per execution of Algorithm 2.

Table 4 presents the results for each algorithm. The most desirable behavior is achieved using CC-RRT with \(p_{\text{ safe }} = 0.9\), with only 2 collisions taking place over a sequence of 50 waypoints. Two aspects of this scenario tend to increase the likelihood of collision across all algorithms, such that 100 % safety becomes unreasonable for this scenario. First, when the target vehicle changes direction, the RR-GP prediction environment changes rapidly, and the host vehicle may not be able to react quickly enough if nearby. Second, the host vehicle may find itself in a corridor being pursued by the target vehicle; since the target vehicle has a larger maximum speed, a collision may become inevitable. Regardless, CC-RRT outperforms both Naive RRT and Nominal RRT for all values of \(p_{\text{ safe }}\) tested, as well as Velocity-Avoidance RRT when \(p_{\text{ safe }} = 0.9\).

Table 4 Simulation results, complex scenario

Observing the Nominal RRT results in Table 4, it is clear that neither Naive RRT nor Nominal RRT can provide a sufficient level of safety for the host vehicle, with a double-digit number of collisions occurring in each case. On the other hand, Velocity-Avoidance RRT is quite competitive with CC-RRT. Though it does not achieve the minimum number of collisions obtained by CC-RRT for \(p_{\text{ safe }} = 0.9\), Velocity-Avoidance RRT still only has 4 collisions, as well as a mission duration significantly shorter than any of the CC-RRT trials. Since the trajectories executed by the target vehicle are relatively straight, with few sharp turns (especially compared to Sect. 8.2), the forward propagation done in this case actually tends to be a good prediction, allowing the host vehicle to make a rapid response when needed.

A notable trend in these results is that as \(p_{\text{ safe }}\) increases, both the number of collisions and mission duration tend to decrease, reach minimum values at \(p_{\text{ safe }} = 0.9\), then actually increase beyond that value. As \(p_{\text{ safe }}\) is increased, the minimum required probability of feasibility at each timestep is increased, effectively reducing the set of probabilistically feasible paths that may be identified and selected by the CC-RRT algorithm. Thus, assuming the algorithm continues to identify probabilistically feasible paths satisfying the \(p_{\text{ safe }}\) requirement, the performance (in terms of percentage of safe trials) is expected to increase, with a trade-off of additional path conservatism. However, if this assumption is broken, the agent may be unable to find a path at all, and will come to a stop. This “frozen robot” behavior (Trautman and Krause 2010) actually puts the agent at added risk in a dynamic environment. This is particularly likely to occur in heavily constrained environments for large values of \(p_{\text{ safe }}\), as is the case here. Many of the collisions in this scenario are caused by the agent being unable to find any safe paths at all to execute. In such cases, it is important for the operator to tune \(p_{\text{ safe }}\) to best meet the needs of the problem being considered.

Figure 12 demonstrates how CC-RRT can exhibit complex, robust avoidance behavior for large values of \(p_{\text{ safe }}\) in order to remain risk-averse. In each trial, the agent’s first task (shown in the figure) is to move from the left side of the room to the waypoint at bottom-right; the shortest path is to move along the bottom of the room. The target vehicle’s trajectory takes it to the left of the first and last obstacles, and down the central corridor. When the first RR-GP update is performed, there is little data available to infer which way the target vehicle is going, and thus all six behaviors are equally likely. For \(p_{\text{ safe }} = 0.8\), the planner selects a complete trajectory which reaches the goal (Fig. 12a). Even though this would lead to a head-on collision for one of the behaviors, the likelihood of that behavior being active is roughly 1 in 6, an acceptable risk for \(p_{\text{ safe }} = 0.8 ~(< 5/6)\). On the other hand, when \(p_{\text{ safe }} = 0.999\), the planner is not willing to select a trajectory with crosses the target vehicle’s path for any possible behavior. Instead, it selects a partial path behind one of the central obstacles, the location which brings it closest to the goal without being in any of the target vehicle’s possible paths (Fig. 12b).

Fig. 12
figure 12

Representative screenshots of the RR-GP and CC-RRT algorithms as the host vehicle approaches the first waypoint in the complex scenario, for two different values of \(p_{\text{ safe }}\)

As the mission progresses, the target vehicle’s path is revealed to go through the central corridor. For \(p_{\text{ safe }} = 0.8\), this was not the behavior that risked a head-on collision, so it continues on its initial trajectory (Fig. 12c, e), reaching the goal state quickly. On the other hand, when \(p_{\text{ safe }} = 0.999\), the agent holds its position behind the obstacle until the target vehicle has passed through the central corridor (Fig. 12d); once the target vehicle has passed by, the agent identifies a new trajectory which reaches the goal (Fig. 12f), though it arrives at the goal significantly later than if a lower value of \(p_{\text{ safe }}\) were used.

The average runtime needed to generate a tree node using CC-RRT is larger than the average runtime for Nominal RRT, by a factor which is larger than the one observed in Sect. 8.2. Nonetheless, the runtime per node averages only 5 ms, meaning many hundreds of nodes can be generated every second. Coupled with the fact that the RR-GP update averages a fraction of a second (about 0.65 s), the CC-RRT algorithm with RR-GP is still amenable to real-time implementation for this more complex example. To better demonstrate the operation of this example, a video showing a representative simulation is available at http://acl.mit.edu/rrgp.mp4.

Table 5 Simulation results, nonlinear dynamics scenario

8.4 Nonlinear dynamics example

In this final example, the same environment and behaviors are used as in the previous example (Fig. 11); however, the host vehicle is now modeled as nonlinear car dynamics

$$\begin{aligned} x_{t+1}&= x_t + (dt) v \cos \theta _t + w_t^x, \\ y_{t+1}&= y_t + (dt) v \sin \theta _t + w_t^y, \\ \theta _{t+1}&= \theta _t + (dt) \frac{v}{L_w} \tan \delta _t + w_t^{\theta }, \end{aligned}$$

where \(v = 0.4\) m/s, \(dt = 0.1\)s, \((x,y)\) is the vehicle position, \(\theta \) is the heading, \(L_w = 0.2\) m, and \(\delta _t \in [-\pi /4,+\pi /4]\) is the steering angle input. As in Sect. 5, a PP low-level controller is applied for steering control. In each trial, the agent’s objective is to cross the environment from top-left to bottom-right, while avoiding the dynamic obstacle moving from right to left (as in the previous example).

However, the agent is subject to a non-Gaussian uncertainty: the steering control is subject to an unknown, fixed bias

$$\begin{aligned} \delta _t = \bar{\delta }_t + \widehat{\delta }_t, \end{aligned}$$

where \(\bar{\delta }_t\) is the control input and \(\widehat{\delta }_t\) is a fixed, unknown offset uniformly sampled on the interval \([-\pi /10,+\pi /10]\). This bias does not change over the course of the mission; however, the agent does not receive observations of its own state, and thus cannot ascertain the value of \(\widehat{\delta }_t\). By using closed-loop RRT, this poor mapping will only result in a bounded error (Luders et al. 2010a), making it still possible to control safely through the environment. Nonetheless, PCC-RRT is appropriate here to address the non-Gaussian uncertainty, as well as the nonlinear dynamics.

The PCC-RRT algorithm, given in Algorithm 6, is structured in this case by generating 20 particles for each trajectory state. Each particle samples on a joint distribution for both \(\widehat{\delta }_t\) and the dynamic obstacle placement, which is itself distributed according to the likelihoods generated by the RR-GP prediction. The dynamic obstacle state is assumed to be correlated along each trajectory, modeled as a fixed white noise sample that is transformed relative to each state distribution. A path is deemed probabilistically feasible if \(p_{\text{ safe }}^{\text{ path }} \ge 0.8\) is satisfied.

In this scenario, 24 trials are performed (one for each trajectory in the training data); the same algorithms are compared, but this time only using \(p_{\text{ safe }}^{\text{ path }} = 0.8\) for comparison. Three quantities were measured and averaged across these trials: the percentage of trials in which the vehicle safely reaches the goal; the average duration of such paths; and the average time to generate an RRT/PCC-RRT tree node. Table 5 shows the results for the trials performed. Not only does PCC-RRT successfully guide the uncertain agent to the goal in more trials than the Nominal RRT algorithms, but it also exceeds the desired probabilistic bound of 0.8 across the path. As expected, though, there is a significant trade-off in per-node computational complexity.

Figure 13 shows a typical execution of the PCC-RRT algorithm for this example, as well as a demonstration of typical CC-RRT trees generated in this work (RR-GP output is suppressed for clarity), showing the nonlinearity of the dynamics. The agent initially plans a path above all obstacles to the goal (Fig. 13a); however, this path becomes infeasible as the planner assesses that the likely trajectory of the dynamic obstacle (magenta) will cause a sufficient number of particles to become infeasible (Fig. 13b). On the subsequent step, the planner selects an alternative, incomplete path which successfully avoids the obstacle (Fig. 13c). Within several seconds, a new path to the goal has been found (Fig. 13d).

Fig. 13
figure 13

Representative screenshots of the PCC-RRT algorithm for \(p_{\text{ safe }}^{\text{ path }} = 0.8\) for the nonlinear dynamics scenario. Agent’s PCC-RRT tree is shown in green; the executed path is shown in orange (RR-GP output is suppressed for clarity) (Color figure online)

9 Conclusion

This paper has developed a real-time path planning framework which allows autonomous agents to safely navigate environments while avoiding dynamic obstacles with uncertain motion patterns. A key contribution of the algorithm is the RR-GP learned motion model, which efficiently identifies predicted trajectories for a dynamic obstacle with multiple behaviors by combining Gaussian processes with a sampling-based reachability computation. As demonstrated, this motion model is capable of developing motion predictions conditioned on dynamic feasibility with runtimes suitable for real-time operation. Further, by integrating these RR-GP predictions within an appropriately modified CC-RRT planning framework, an autonomous agent can identify probabilistically safe trajectories in the presence of these dynamic obstacles. Real-time simulation results have demonstrated the effectiveness of the integrated approach in improving overall vehicle safety for a variety of dynamics, environments, and behaviors.

Future work will focus on ways to potentially increase the complexity of the GP modeling while maintaining real-time suitability. By increasing the GP model complexity, a wide variety of potentially relevant behaviors can be represented, such as correlated position GPs, dynamic obstacles with more complex dynamics, and interaction terms between the agent and dynamic obstacles. However, any such increase in the model complexity can significantly affect real-time performance, requiring a careful trade-off and continued improvements to algorithmic performance. Other future work includes demonstration of RRT\(^\star \) integration (Karaman and Frazzoli 2009) and simultaneous interaction with multiple dynamic obstacles.