
1 Introduction

Guidance and control problems of Unmanned Aerial Vehicles (UAVs) have been widely studied in recent years as the use of UAVs in various fields has increased. One of these problems is the trajectory control problem, which concerns the controller that directs a vehicle to follow a predetermined path. This controller is typically designed by one of two approaches: the trajectory-tracking approach, which includes time constraints, and the path-following approach, which does not have time constraints [1].

There have been many studies on path-following using various approaches and techniques. Some studies have used the Lyapunov stability condition based on the Lyapunov theory for the convergence of the controller [2]. Other techniques for controlling nonlinear systems have been variously used to solve this problem, including backstepping [3], feedback linearization [4], and sliding mode control [5].

An algorithm that can handle the path-following problem as an optimization problem is Model Predictive Control (MPC). In the MPC framework, a finite-horizon optimal control problem is solved by repeating the computation at each sampling instant within the time horizon [6]. This optimization approach can deal more conveniently with the various constraints on states and control inputs than other nonlinear system control techniques. In [7], the authors addressed a path-following problem in the presence of wind disturbances with the MPC approach.

Many MPC approaches become intractable for optimization problems with complex desired costs and constraints, and they require an optimization solver to deal with them. The Model Predictive Path Integral (MPPI) method applies path-integral control theory to solve the optimal control problem [8]. It obtains the optimal control sequence using model predictive trajectory samples without an optimization solver, and it remains comparably tractable and capable of handling complex costs, constraints, and dynamics. In [9], the authors addressed an optimal control problem with MPPI in an aggressive driving task.

This paper presents a path-following control algorithm based on model predictive path integral control for autonomous vehicles. The iterative path integral uses the importance sampling method within model predictive control to provide acceleration commands to the vehicle, track a virtual target on the desired path, and achieve the optimal trajectory under practical constraints. This approach allows us to efficiently solve the nonlinear control problem under complex costs and constraints without intractable convexification or linearization. We implemented the algorithm on a Graphics Processing Unit (GPU) to show that it can compute this problem rapidly. We tested the proposed algorithm on various paths and under wind disturbance using a nonlinear disturbance observer, which allowed us to predict the model more accurately in an uncertain environment. The simulation results showed that the algorithm was effective and applicable to path-following guidance for various paths under disturbances.

The study of MPPI control for the path-following problem of UAVs is still in its infancy compared to other approaches, such as nonlinear control techniques and convex optimization-based methods. This paper presents one approach to the problem using the MPPI control framework. To be clear, this study is not intended to replace the existing approaches mentioned above, but to extend a methodology that works using an attractive approach. In particular, aerial vehicles are vulnerable to disturbances such as wind, which can affect their stability, guidance, and control. Therefore, in this study, disturbance estimation and model prediction based on it were added to the existing MPPI control framework.

2 Problem Formulation

To formulate the path-following problem with a virtual target, it is necessary to define a UAV dynamics model and a look-ahead virtual target on the desired path.

2.1 UAV Dynamics Model

In this study, we consider a two-dimensional planar kinematic model to describe UAV motion as follows,

$$ \dot{x}_R = V\cos\left( \psi \right) + W_{x_R} $$
(1)
$$ \dot{y}_R = V\sin\left( \psi \right) + W_{y_R} $$
(2)
$$ \dot{\psi} = \frac{a_{y_B}}{V} $$
(3)

where the subscripts R and B denote the inertial reference frame and the body frame, respectively. \(\left( x_R, y_R \right)\) is the inertial position, V is the airspeed, \(\psi\) is the heading angle, \(\left( W_{x_R}, W_{y_R} \right)\) is the wind disturbance, and \(a_{y_B}\) is the lateral acceleration.
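For concreteness, the model (1)-(3) can be propagated with a simple forward-Euler step; the following sketch is ours (the function name, signature, and Euler discretization are assumptions, not from the paper):

```python
import math

def uav_step(x, y, psi, V, a_yB, wx, wy, dt):
    """One forward-Euler step of the planar kinematics (1)-(3).

    (x, y): inertial position, psi: heading angle [rad], V: airspeed,
    a_yB: lateral acceleration command, (wx, wy): wind disturbance,
    dt: integration step [s].
    """
    x_next = x + (V * math.cos(psi) + wx) * dt
    y_next = y + (V * math.sin(psi) + wy) * dt
    psi_next = psi + (a_yB / V) * dt  # heading rate from Eq. (3)
    return x_next, y_next, psi_next
```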

2.2 Virtual Target

It is assumed in this problem that a virtual target moves along the desired path, ahead of the UAV by a look-ahead distance. The desired path is defined as the lines connecting waypoints. The virtual target is located on the first waypoint at the beginning of the guidance phase, so the first desired path is the line between waypoints 1 and 2, as shown in Fig. 1, where \(R_{la}\) is the look-ahead distance and \(\lambda\) is the line-of-sight (LOS) angle. When the virtual target arrives at waypoint 2, the desired path changes to the line between waypoints 2 and 3. This process repeats until the virtual target arrives at the final waypoint (Fig. 1).
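One plausible realization of this virtual-target placement is sketched below; the exact placement rule is not fixed in the text, so the projection-based scheme (and all names) are our assumptions:

```python
import math

def virtual_target(uav, wp_a, wp_b, R_la):
    """Place the virtual target on the segment wp_a -> wp_b at look-ahead
    distance R_la from the UAV (one plausible realization).

    Projects the UAV onto the segment, then steps forward along the
    segment until the point is R_la away from the UAV (clamped to wp_b,
    where the path switches to the next segment).
    """
    ax, ay = wp_a; bx, by = wp_b; ux, uy = uav
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    tx, ty = dx / seg_len, dy / seg_len              # unit tangent
    s_proj = (ux - ax) * tx + (uy - ay) * ty         # UAV projection arc-length
    d_ct = abs((ux - ax) * ty - (uy - ay) * tx)      # cross-track distance
    step = math.sqrt(max(R_la**2 - d_ct**2, 0.0))    # forward step to reach R_la
    s_t = min(s_proj + step, seg_len)
    return ax + s_t * tx, ay + s_t * ty              # target position
```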

Fig. 1

Virtual target and relative kinematics

Fig. 2

UAV dynamic model

3 MPPI Control

The MPPI control framework has a favorable feature: it is convenient to implement using model predictive trajectory samples with a stochastic control approach. Furthermore, it can deal extensively with complex desired costs and constraints. In other words, it only requires many sample trajectories based on Monte-Carlo simulation, without other intractable tasks such as obtaining derivatives, linearization, or convexification.

This approach can be explained as the optimal control problem for a stochastic, control- and noise-affine system. The optimal control problem can be expressed using the stochastic Hamilton-Jacobi-Bellman (HJB) equation, a type of Partial Differential Equation (PDE). By introducing the exponential transform with an assumption [8], the PDE can be linearized. The transformed linearized PDE is the linear Chapman-Kolmogorov equation, and one can apply the Feynman-Kac lemma to transform the equation into a path integral that takes the form of an expectation over trajectories. The path-integral form of the optimal control therefore contains expectation terms, which can be approximated by the empirical expectation over thousands of sample trajectories from Monte-Carlo simulation. This is a brief explanation of the MPPI control method; a more detailed explanation can be found in [8].

Let us consider a stochastic dynamic system.

$$ d\boldsymbol{x}_t = f\left( \boldsymbol{x}_t, \boldsymbol{u}_t + \delta\boldsymbol{u}_t \right)dt $$
(4)

where \(\boldsymbol{x}_t \in \mathbb{R}^{n}\) denotes the state vector, \(\boldsymbol{u}_t \in \mathbb{R}^{m}\) denotes the control input at time t, and \(\delta\boldsymbol{u}_t \sim N\left( 0, \Sigma \right)\) is a Gaussian noise vector with zero mean and covariance \(\Sigma\). We consider the stochastic optimal control problem that minimizes the following objective.

$$ J = \mathop{\min}\limits_{\boldsymbol{u}} \, E\left[ \phi\left( \boldsymbol{x}_T \right) + \sum\nolimits_{t=0}^{T-1} \left( q\left( \boldsymbol{x}_t \right) + \frac{1}{2}\boldsymbol{u}_t^{\mathrm{T}} \boldsymbol{R} \boldsymbol{u}_t \right) \right] $$
(5)

where the subscript T denotes the final time, \(\phi\) is the terminal cost, q is the state-dependent running cost, and \(\boldsymbol{R} \in \mathbb{R}^{m \times m}\) is a positive definite matrix. Based on the MPPI algorithm [8, 9], we can determine the path integral form of the iterative optimal control as

$$ \boldsymbol{u}_{t_i}^{*} \approx \boldsymbol{u}_{t_i} + \frac{\sum\nolimits_{k=1}^{K} \exp\left( -\frac{1}{\lambda}\tilde{S}\left( \tau_{i,k} \right) \right) \delta\boldsymbol{u}_{i,k}}{\sum\nolimits_{k=1}^{K} \exp\left( -\frac{1}{\lambda}\tilde{S}\left( \tau_{i,k} \right) \right)} $$
(6)
$$ \tilde{S}\left( \tau \right) = \phi\left( \boldsymbol{x}_T \right) + \sum\nolimits_{t=0}^{T-1} \tilde{q}\left( \boldsymbol{x}_t, \boldsymbol{u}_t, \delta\boldsymbol{u}_t \right) $$
(7)
$$ \tilde{q}\left( \boldsymbol{x}_t, \boldsymbol{u}_t, \delta\boldsymbol{u}_t \right) = q\left( \boldsymbol{x}_t \right) + \frac{1}{2}\boldsymbol{u}_t^{\mathrm{T}} \boldsymbol{R} \boldsymbol{u}_t + \frac{1 - \upsilon^{-1}}{2}\delta\boldsymbol{u}_t^{\mathrm{T}} \boldsymbol{R} \, \delta\boldsymbol{u}_t + \boldsymbol{u}_t^{\mathrm{T}} \boldsymbol{R} \, \delta\boldsymbol{u}_t $$
(8)

where K is the number of random samples, \(\tilde{S}\left( \tau \right)\) is the modified cost-to-go, and \(\tilde{q}\left( \boldsymbol{x}_t, \boldsymbol{u}_t, \delta\boldsymbol{u}_t \right)\) is the modified running cost. The hyper-parameter \(\lambda \in \mathbb{R}^{+}\) is the temperature, and \(\upsilon \in \mathbb{R}^{+}\) is the exploration variance. These modified terms are derived following the simplifications in [9]. The MPPI control algorithm is given in Algorithm 1.

Algorithm 1: Model Predictive Path Integral Control. Given K (number of samples), N (number of timesteps), an initial control sequence, the system sampling dynamics, and the cost parameters: while the task is not completed, estimate the disturbance \(\hat{d}\); for each sample k = 0 to K − 1, roll out the dynamics for t = 0 to N − 1 and accumulate the sample cost; then update the control sequence with the weighted perturbations (6), update the current state, and check for target completion.
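A minimal single-input sketch of the update (6) and the rollout loop of Algorithm 1 is shown below. For brevity, the control-dependent terms of the modified running cost (8) are folded into `running_cost`, and the disturbance-estimation step is omitted; all names and the simplified cost are our assumptions:

```python
import numpy as np

def mppi_update(x0, u_seq, step, running_cost, terminal_cost,
                K=1024, sigma=1.0, lam=1.0):
    """One MPPI iteration in the spirit of (6)-(8), simplified.

    x0: current state, u_seq: nominal control sequence, shape (N,),
    step(x, u): one-step sampling dynamics, returns the next state,
    running_cost(x): state cost q(x), terminal_cost(x): phi(x),
    K: number of samples, sigma: exploration std, lam: temperature.
    """
    N = len(u_seq)
    du = np.random.normal(0.0, sigma, size=(K, N))   # control perturbations
    S = np.zeros(K)                                  # cost-to-go per sample
    for k in range(K):
        x = x0
        for t in range(N):
            x = step(x, u_seq[t] + du[k, t])         # roll out the dynamics
            S[k] += running_cost(x)
        S[k] += terminal_cost(x)
    w = np.exp(-(S - S.min()) / lam)                 # min-subtraction for stability
    w /= w.sum()
    return u_seq + w @ du                            # importance-weighted update (6)
```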

4 Simulation Results

In this section, we present the results of various case studies. The first case study demonstrates the optimality of the MPPI controller: we set up an MPPI controller whose cost function was the same as the objective of the pursuit guidance with optimal error dynamics, and compared it with the pursuit guidance. The second case study shows the convenience of the MPPI controller when dealing with a complex cost function; to this end, we designed the cost function as desired. The final case study investigates the performance of the MPPI controller under wind disturbances, which make the model prediction inaccurate; we then tested the proposed method in the same environment using a nonlinear disturbance observer that helped to predict the model uncertainty more precisely. In the simulation studies, we implemented the algorithm in Python 3 with the PyCUDA library. The computer specifications were an Intel(R) Core(TM) i7-10700 CPU, 32 GB RAM, and an NVIDIA GeForce RTX 3090 GPU. The simulation was a 3-DOF UAV simulation with the MPPI controller, K = 4096 and N = 100, for 10 s of simulation time with a 0.01 s time step; the recorded computation time was 25 s. When we changed the MPPI parameters to K = 1024, N = 20 and the acceleration update time step to 0.05 s, while the whole simulation still ran on the 0.01 s time step, the recorded computation time was 5 s and the simulation result was almost the same.

4.1 Comparison of an MPPI Controller and Pursuit Guidance

In this subsection, the proposed MPPI controller is compared with the pursuit guidance to show the optimality of the MPPI controller. The pursuit guidance with optimal error dynamics is written as:

$$ a_{y_B} = -\frac{kV}{t_{go}}\left( \sigma - \psi \right) $$
(9)

where k is the guidance gain, V is the airspeed of the UAV, \(\sigma\) is the line-of-sight angle, \(\psi\) is the heading angle, and \(t_{go}\) is the time-to-go, which is calculated as \(t_{go} = \frac{{R_{la} }}{V}\).
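As a quick sketch, the guidance law (9) is a one-liner; the gain value in the usage example is illustrative, not the Table 1 setting:

```python
def pursuit_accel(V, sigma_los, psi, R_la, k):
    """Pursuit guidance with optimal error dynamics, Eq. (9):
    a_yB = -(k V / t_go) * (sigma - psi), with t_go = R_la / V."""
    t_go = R_la / V
    return -(k * V / t_go) * (sigma_los - psi)

# e.g. V = 10 m/s, R_la = 20 m, LOS angle 0.1 rad, heading 0 rad, k = 2
a_cmd = pursuit_accel(10.0, 0.1, 0.0, 20.0, 2.0)  # -> -1.0 m/s^2
```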

The desired heading angle error dynamics of the pursuit guidance law is given by:

$$ \psi_e \equiv \psi - \psi_d, \quad \dot{\psi}_e + \frac{k}{t_{go}}\psi_e = 0 $$
(10)

The objective that pursues the minimum control effort via the optimal error dynamics is described in [10]:

$$ \mathop{\min}\limits_{u} J = \frac{1}{2}\int_{t}^{t_f} \frac{1}{t_{go}^{k-1}} u^2\left( \tau \right) d\tau $$
(11)

where \(u = a_{y_B}\) and \(t_{go} = t_f - t\). We compare the simulation results between the proposed MPPI control-based guidance and the pursuit guidance. The simulation settings and the sums of costs are provided in Table 1 and Table 2, respectively.

Table 1 Parameter setting of MPPI controllers and pursuit guidance
Table 2 Sum of costs of MPPI controllers and pursuit guidance

The cost of the MPPI controller in this section is the same as the pursuit guidance objective:

$$ \text{Cost} = \frac{1}{t_{go}^{k-1}} u^2\left( \tau \right) = \frac{1}{t_{go}}\left( \frac{kV}{t_{go}}\left( \sigma - \psi \right) \right)^2 $$
(12)

The difference between MPPI controller #1 and MPPI controller #2 is the hyper-parameter \(\lambda\) in Table 1. This is called the temperature, a parameter analogous to that of a Boltzmann distribution or softmax function: it weights lower-cost control inputs more heavily. The smaller \(\lambda\) is, the more the low-cost control inputs dominate. Its appropriate scale also depends on the scale of the cost and on N and K. We compared two MPPI controllers whose \(\lambda\) were 1000 and 10,000, respectively.
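The role of the temperature can be seen in a tiny numerical sketch with hypothetical sample costs (the values are chosen for illustration only):

```python
import numpy as np

def mppi_weights(costs, lam):
    """Normalized MPPI sample weights, exp(-S/lam); the minimum cost is
    subtracted for numerical stability, which leaves the normalized
    weights unchanged."""
    w = np.exp(-(costs - costs.min()) / lam)
    return w / w.sum()

costs = np.array([1.0, 2.0, 10.0])
print(mppi_weights(costs, 0.1))   # small lam: lowest-cost sample dominates
print(mppi_weights(costs, 1e6))   # large lam: weights approach uniform
```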

In Table 2, MPPI controller #1 has the lowest sum of costs, showing that it outperforms the pursuit guidance in this case. This is because \(t_{go} = \frac{R_{la}}{V}\) in (9) actually differs from \(t_{go} = t_f - t\) in (11), so the control sequence from the pursuit guidance is not optimal over the whole trajectory; MPPI controller #1's control input is therefore closer to the optimal control than this pursuit guidance. However, MPPI controller #2 was worse than the pursuit guidance because its \(\lambda\) is too large: the optimality of the MPPI control degrades as it weights all the other high-cost trajectories more nearly equally.

The performance advantage of the MPPI controllers was larger on the trapezoid-shaped path than on the line-shaped path. On the trapezoidal path, the virtual target changes its path segment; the MPPI controllers can predict the target's change and take it into account, whereas the pursuit guidance cannot. This causes the different cost sums between the MPPI controllers and the pursuit guidance, especially on the trapezoid-shaped path (Figs. 3, 4).

Fig. 3

Line shape path simulation results, \(\text{Cost} = \frac{1}{t_{go}}\left( \frac{kV}{t_{go}}\left( \sigma - \psi \right) \right)^2\)

Fig. 4

Trapezoid shape path simulation results, \(\text{Cost} = \frac{1}{t_{go}}\left( \frac{kV}{t_{go}}\left( \sigma - \psi \right) \right)^2\)

4.2 Complex Cost Function

In this subsection, we test how convenient the MPPI controller is when dealing with a complex cost function. The cost function is defined as:

$$ \text{Cost} = \text{distance to path} + 0.1\left( \text{distance to target} \right) + \left| \psi_e \right| $$
(13)

In this case study, the scale of \(\lambda\) is smaller than in the previous case because the cost itself is smaller. In Table 4, the MPPI controllers have lower cost sums than the pursuit guidance because the MPPI controllers are designed to minimize the cost function (13) while the pursuit guidance is not. In Fig. 5, the UAV under MPPI controller #1 is closer to the path at locations #1, #2, and #4, but not at location #3. At location #3, the magnitude of the heading angle error increases and becomes dominant in the cost due to the drastic change in path angle, so MPPI controller #1 rotates the UAV in advance even though this increases the distance-to-path cost. This means that the MPPI controller provides good performance for the cost (13) (Table 3).
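A sketch of the running cost (13) for a straight path segment follows; measuring the heading-angle error against the current segment direction is our reading, and the helper geometry is ours:

```python
import math

def complex_cost(uav_xy, psi, target_xy, wp_a, wp_b):
    """Running cost (13): distance to path + 0.1*(distance to target)
    + |psi_e|, with psi_e taken as heading minus the current
    path-segment direction."""
    ax, ay = wp_a; bx, by = wp_b; ux, uy = uav_xy
    dx, dy = bx - ax, by - ay
    seg_len = math.hypot(dx, dy)
    # cross-track (perpendicular) distance from the UAV to the path line
    d_path = abs((ux - ax) * dy - (uy - ay) * dx) / seg_len
    d_target = math.hypot(target_xy[0] - ux, target_xy[1] - uy)
    psi_d = math.atan2(dy, dx)                                  # path direction
    psi_e = (psi - psi_d + math.pi) % (2 * math.pi) - math.pi   # wrap to (-pi, pi]
    return d_path + 0.1 * d_target + abs(psi_e)
```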

Fig. 5

Trapezoid shape path simulation results, \(\text{Cost} = \text{distance to path} + 0.1\left( \text{distance to target} \right) + \left| \psi_e \right|\)

Table 3 Parameter setting of the complex cost function case study
Table 4 Sum of costs of the complex cost function case study

4.3 Simulation with Wind Disturbances

We tested the performance of an MPPI controller in the presence of slowly varying wind disturbances, with and without disturbance estimation by a Nonlinear Disturbance Observer (NDO). Three cases were simulated. The first case was an MPPI controller without the NDO, which did not estimate wind disturbances. The second case was an MPPI controller with the NDO, which did estimate wind disturbances. The last case was an MPPI controller without wind disturbances, for comparison with the other cases.

The nonlinear disturbance observer model is described in [11]:

$$ \begin{aligned} \dot{z}_x &= -l_x z_x - l_x \left( l_x x + V\cos\left( \psi \right) \right), & \hat{d}_x &= z_x + l_x x \\ \dot{z}_y &= -l_y z_y - l_y \left( l_y y + V\sin\left( \psi \right) \right), & \hat{d}_y &= z_y + l_y y \end{aligned} $$
(14)

where l is a constant observer gain, z is an internal state variable, \(\hat{d}_x\) and \(\hat{d}_y\) are the estimated disturbances, and (x, y) are the position states. In this test, we set \(l_x = l_y = 20\) and used the same MPPI controller for all three cases. The details are summarized in Table 5. Assuming that the wind disturbances were slowly varying, \(v_{wx} = -4 + \sin\left( 0.2\pi t \right)\), \(v_{wy} = 0\) were chosen.
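A discrete-time sketch of the observer (14) follows; the Euler discretization and the step size are our assumptions, while the default gains match the values above:

```python
import math

def ndo_step(zx, zy, x, y, V, psi, lx=20.0, ly=20.0, dt=0.01):
    """One Euler step of the nonlinear disturbance observer (14).

    Returns the updated internal states (zx, zy) and the disturbance
    estimates computed from the states at the start of the step.
    """
    d_hat_x = zx + lx * x                    # current estimates, Eq. (14)
    d_hat_y = zy + ly * y
    zx += (-lx * zx - lx * (lx * x + V * math.cos(psi))) * dt
    zy += (-ly * zy - ly * (ly * y + V * math.sin(psi))) * dt
    return zx, zy, d_hat_x, d_hat_y
```

With gain l, the estimation error decays as \(e^{-lt}\), so l = 20 gives a time constant of 0.05 s, fast relative to the slowly varying wind assumed above.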

Table 5 Parameter setting of wind disturbance case study

The simulation results are presented in Table 6. The worst case is the one with wind disturbances and without the NDO. This is a natural result, because the MPPI controller's prediction is incorrect due to the unpredicted wind disturbances in the trajectory samples. The cost sum of the case with wind disturbances and the NDO is higher than that of the case without wind disturbances. This is because the cost function of the MPPI controller includes the magnitude of the heading angle error: due to the wind disturbances, a gap exists between the azimuth attitude (i.e., heading direction) and the flight path angle (i.e., velocity direction). The heading error cost persists because the flight path angle follows the path while the heading differs from it in the case with the NDO. Figure 6 shows that the case with wind disturbance using the NDO has a higher cost than the case without wind disturbance, even though both cases follow the path similarly well.

Table 6 Sum of costs of wind disturbance case study
Fig. 6

Line shape path simulation results with wind disturbance

In Fig. 7, the trajectory of the case without the NDO is twisted because the wind disturbances affected the UAV's trajectory; the virtual target then failed to reach the third waypoint on the first pass, so the UAV looped back toward the waypoint. In contrast, the trajectory of the case with the NDO is similar to the case without wind disturbances. The MPPI controller can thus achieve good performance with correct disturbance estimation (Figs. 7, 8).

Fig. 7

Trapezoid shape path simulation results with wind disturbance

Fig. 8

Wind disturbance estimation by the nonlinear disturbance observer

5 Conclusion

In this paper, we proposed a path-following guidance using Model Predictive Path Integral control. We tested the algorithm on various paths and under wind disturbance with a nonlinear disturbance observer, which allowed us to predict the model more accurately in an uncertain environment. We compared the pursuit guidance and an MPPI controller whose cost was the same as the pursuit guidance objective in our problem; the MPPI controller's performance was better than the pursuit guidance due to the predictive property of model predictive control. We also conducted a simple test of a complex cost function with the MPPI controller, which can easily handle complex desired costs and constraints using only the sample trajectories. Finally, we considered wind disturbances in our experiment and tested the MPPI controller using disturbance estimation by a nonlinear disturbance observer under slowly varying wind disturbances.