Introduction

This article concerns optimal control problems governed by nonlinear ordinary differential equations of the form

$$\displaystyle{ \dot{x}(t) = f(x(t),u(t)) }$$
(1)

with \(f : \mathbb{R}^{n} \times \mathbb{R}^{m} \rightarrow \mathbb{R}^{n}\). We assume that for each initial value \(x \in \mathbb{R}^{n}\) and each measurable control function \(u(\cdot ) \in L^{\infty }(\mathbb{R}, \mathbb{R}^{m})\) there exists a unique solution \(x(t) = x(t,x,u(\cdot ))\) of (1) satisfying \(x(0,x,u(\cdot )) = x\).

Given a state constraint set \(X \subseteq \mathbb{R}^{n}\) and a control constraint set \(U \subseteq \mathbb{R}^{m}\), a running cost \(g : X \times U \rightarrow \mathbb{R}\), a terminal cost \(F : X \rightarrow \mathbb{R}\), and a discount rate δ ≥ 0, we consider the optimal control problem

$$\displaystyle{ \mathop{\mbox{ minimize}}_{u(\cdot )\in \mathcal{U}^{T}(x)}\;J^{T}(x,u(\cdot )) }$$
(2)

where

$$\displaystyle{ \begin{array}{rcl} J^{T}(x,u(\cdot ))& :=&\int _{0}^{T}e^{-\delta s}g(x(s,x,u(\cdot )),u(s))ds \\ & &\;\; +\;\; e^{-\delta T}F(x(T,x,u(\cdot ))) \end{array} }$$
(3)

and

$$\displaystyle\begin{array}{rcl} & & \mathcal{U}^{T}(x) := \\ & & \left \{u(\cdot ) \in L^{\infty }(\mathbb{R},U)\,\left \vert \,\begin{array}{l} x(s,x,u(\cdot )) \in X \\ \mbox{ for all }s \in [0,T] \end{array} \right.\right \}{}\end{array}$$
(4)

In addition to this finite horizon optimal control problem, we also consider the infinite horizon problem in which T is replaced by ∞, i.e.,

$$\displaystyle{ \mathop{\mbox{ minimize}}_{u(\cdot )\in \mathcal{U}^{\infty }(x)}\;J^{\infty }(x,u(\cdot )) }$$
(5)

where

$$\displaystyle{ J^{\infty }(x,u(\cdot )) :=\int _{ 0}^{\infty }e^{-\delta s}g(x(s,x,u(\cdot )),u(s))ds }$$
(6)

and

$$\displaystyle\begin{array}{rcl} & & \mathcal{U}^{\infty }(x) := \\ & & \left \{u(\cdot ) \in L^{\infty }(\mathbb{R},U)\left \vert \begin{array}{l} x(s,x,u(\cdot )) \in X \\ \mbox{ for all }s \geq 0 \end{array} \right.\right \},{}\end{array}$$
(7)

respectively.

The term “solving” (2)–(4) or (5)–(7) can have various meanings. First, the optimal value functions

$$\displaystyle{V ^{T}(x) =\inf _{ u(\cdot )\in \mathcal{U}^{T}(x)}\;J^{T}(x,u(\cdot ))}$$

or

$$\displaystyle{V ^{\infty }(x) =\inf _{ u(\cdot )\in \mathcal{U}^{\infty }(x)}\;J^{\infty }(x,u(\cdot ))}$$

may be of interest. Second, and often more importantly, one would like to know the optimal control policy. This can be expressed in open-loop form \(u^{\star } : \mathbb{R} \rightarrow U\), in which the function \(u^{\star }\) depends on the initial value x and on the initial time which we set to 0 here. Alternatively, the optimal control can be computed in state- and time-dependent closed-loop form, in which a feedback law \(\mu ^{\star } : \mathbb{R} \times X \rightarrow U\) is sought. Via \(u^{\star }(t) =\mu ^{\star }(t,x(t))\), this feedback law can then be used in order to generate the time-dependent optimal control function for all possible initial values. Since the feedback law is evaluated along the trajectory, it is able to react to perturbations and uncertainties which may make x(t) deviate from the predicted path. Finally, knowing \(u^{\star }\) or \(\mu ^{\star }\), one can reconstruct the corresponding optimal trajectory by solving

$$\displaystyle\begin{array}{rcl} & & \dot{x}(t) = f(x(t),u^{\star }(t))\quad \mbox{ or} {}\\ & & \dot{x}(t) = f(x(t),\mu ^{\star }(t,x(t))). {}\\ \end{array}$$

Hamilton-Jacobi-Bellman Approach

In this section we describe the numerical approach to solving optimal control problems via Hamilton-Jacobi-Bellman equations. We first describe how this approach can be used in order to compute approximations to the optimal value functions \(V^{T}\) and \(V^{\infty }\), respectively, and afterwards how the optimal control can be synthesized using these approximations. In order to formulate this approach for finite horizon T, we interpret \(V^{T}(x)\) as a function of T and x. We denote differentiation w.r.t. T and x with subscripts T and x, i.e., \(V_{x}^{T}(x) = dV^{T}(x)/dx\), \(V_{T}^{T}(x) = dV^{T}(x)/dT\), etc.

We define the Hamiltonian of the optimal control problem as

$$\displaystyle{H(x,p) :=\max _{u\in U}\{ - g(x,u) - p \cdot f(x,u)\},}$$

with \(x,p \in \mathbb{R}^{n}\), f from (1), g from (3) or (6), and “⋅ ” denoting the inner product in \(\mathbb{R}^{n}\). Then, under appropriate regularity conditions on the problem data, the optimal value functions \(V^{T}\) and \(V^{\infty }\) satisfy the first order partial differential equations (PDEs)

$$\displaystyle{V _{T}^{T}(x) +\delta V ^{T}(x) + H(x,V _{ x}^{T}(x)) = 0}$$

and

$$\displaystyle{\delta V ^{\infty }(x) + H(x,V _{ x}^{\infty }(x)) = 0}$$

in the viscosity solution sense. In the case of \(V^{T}\), the equation holds for all T ≥ 0 with the boundary condition \(V^{0}(x) = F(x)\).

The framework of viscosity solutions is needed because in general the optimal value functions will not be smooth; thus, a generalized solution concept for PDEs must be employed (see Bardi and Capuzzo Dolcetta 1997). Of course, appropriate boundary conditions are needed at the boundary of the state constraint set X.
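
For illustration, consider the simple scalar example (added here for exposition and not taken from the cited references) \(f(x,u) = u\), \(g(x,u) = \frac{1}{2}(x^{2} + u^{2})\), \(U = X = \mathbb{R}\), and \(\delta = 0\). Then

$$\displaystyle{H(x,p) =\max _{u\in \mathbb{R}}\left \{-\tfrac{1}{2}(x^{2} + u^{2}) - pu\right \} = \tfrac{1}{2}(p^{2} - x^{2}),}$$

with the maximum attained at u = −p. The function \(V ^{\infty }(x) = \frac{1}{2}x^{2}\) satisfies \(\delta V ^{\infty }(x) + H(x,V _{x}^{\infty }(x)) = 0\) (here even in the classical sense, since \(V ^{\infty }\) is smooth), and the maximizing control evaluated at \(p = V _{x}^{\infty }(x)\) yields the optimal feedback u = −x.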

Once the Hamilton-Jacobi-Bellman characterization is established, one can compute numerical approximations to \(V^{T}\) or \(V^{\infty }\) by solving these PDEs numerically. To this end, various numerical schemes have been suggested, including various types of finite element and finite difference schemes. Among those, semi-Lagrangian schemes (Falcone 1997; Falcone and Ferretti 2013) allow for a particularly elegant interpretation in terms of optimal control synthesis, which we explain for the infinite horizon case.

In the semi-Lagrangian approach, one takes advantage of the fact that by the chain rule, for \(p = V_{x}^{\infty }(x)\) and constant control functions u, the identity

$$\displaystyle{\delta V ^{\infty }(x) - p \cdot f(x,u) = \left.\frac{d} {dt}\right \vert _{t=0}\left[-(1 -\delta t)V ^{\infty }(x(t,x,u))\right]}$$

holds. Hence, the left-hand side of this equality can be approximated by the difference quotient

$$\displaystyle{\frac{V ^{\infty }(x) - (1 -\delta h)V ^{\infty }(x(h,x,u))} {h} }$$

for small h > 0. Inserting this approximation into the Hamilton-Jacobi-Bellman equation, replacing x(h, x, u) by a numerical approximation \(\tilde{x}(h,x,u)\) (in the simplest case, the Euler method \(\tilde{x}(h,x,u) = x + hf(x,u)\)), multiplying by h, and rearranging terms, one arrives at the equation

$$\displaystyle\begin{array}{rcl} V _{h}^{\infty }(x)& =& \min _{ u\in U}\{hg(x,u) {}\\ & & +(1 -\delta h)V _{h}^{\infty }(\tilde{x}(h,x,u))\} {}\\ \end{array}$$

defining an approximation \(V _{h}^{\infty }\approx V ^{\infty }\). This is now a purely algebraic dynamic programming-type equation which can be solved numerically, e.g., by using a finite element approach. The equation is typically solved iteratively, using a suitable minimization routine for computing the “min” in each iteration (in the simplest case, U is discretized with finitely many values and the minimum is determined by direct comparison). We denote the resulting approximation of \(V _{h}^{\infty }\) by \(\tilde{V }_{h}^{\infty }\). Here, approximation is usually understood in the \(L^{\infty }\) sense (see Falcone 1997 or Falcone and Ferretti 2013).
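
As a concrete (and deliberately simple) illustration of how this dynamic programming equation can be solved, the following Python sketch performs the fixed-point iteration on a one-dimensional grid with linear interpolation, discretizing U with finitely many values as described above. The dynamics, cost, discount rate, grid, and step size are illustrative assumptions and not part of the original text.

```python
import numpy as np

# Illustrative problem data (assumptions): dynamics f(x,u) = u,
# running cost g(x,u) = (x^2 + u^2)/2, discount rate delta > 0.
f = lambda x, u: u
g = lambda x, u: 0.5 * (x**2 + u**2)
delta, h = 0.5, 0.05                      # discount rate and time step
X = np.linspace(-2.0, 2.0, 201)           # grid on the state constraint set X = [-2, 2]
U = np.linspace(-2.0, 2.0, 41)            # finitely many control values

V = np.zeros_like(X)                      # initial guess for V_h^infty on the grid
for _ in range(2000):                     # fixed-point (value) iteration
    # Euler step x~(h, x, u) for all grid points x and control values u
    x_next = X[None, :] + h * f(X[None, :], U[:, None])
    x_next = np.clip(x_next, X[0], X[-1]) # crude handling of the state constraint
    # right-hand side h*g + (1 - delta*h)*V(x~), with V evaluated by interpolation
    rhs = h * g(X[None, :], U[:, None]) + (1 - delta * h) * np.interp(x_next, X, V)
    V_new = rhs.min(axis=0)               # minimize over the discretized control set
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new                             # V approximates V_h^infty on the grid
```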

The semi-Lagrangian scheme is appealing for synthesis of an approximately optimal feedback because \(V _{h}^{\infty }\) is the optimal value function of the auxiliary discrete-time problem defined by \(\tilde{x}\). This implies that the expression

$$\displaystyle\begin{array}{rcl} \mu _{h}^{\star }(x)& :=& \mathop{\mathrm{argmin}}_{ u\in U}\{hg(x,u) {}\\ & & +(1 -\delta h)V _{h}^{\infty }(\tilde{x}(h,x,u))\}, {}\\ \end{array}$$

is an optimal feedback control value for this discrete-time problem for the next time step, i.e., on the time interval [t, t + h) if x = x(t). This feedback law will be approximately optimal for the continuous-time control system when applied as a discrete-time feedback law, and this approximate optimality remains true if we replace \(V _{h}^{\infty }\) in the definition of \(\mu _{h}^{\star }\) by its numerically computable approximation \(\tilde{V }_{h}^{\infty }\). A similar construction can be made based on any other numerical approximation \(\tilde{V }^{\infty }\approx V ^{\infty }\), but the explicit correspondence of the semi-Lagrangian scheme to a discrete-time auxiliary system facilitates the interpretation and error analysis of the resulting control law.
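
Continuing the previous sketch (and reusing the objects f, g, delta, h, X, U, and V defined there), the feedback \(\mu _{h}^{\star }\) can then be evaluated along a trajectory as follows; again, this is only an illustration under the assumptions stated above.

```python
import numpy as np  # (already imported in the previous sketch)

def mu_h_star(x, V, X, U, f, g, delta, h):
    """Approximately optimal feedback value at state x for the discrete-time problem."""
    x_next = np.clip(x + h * f(x, U), X[0], X[-1])    # Euler steps for all u in U
    values = h * g(x, U) + (1 - delta * h) * np.interp(x_next, X, V)
    return U[np.argmin(values)]                       # minimizer over the control grid

# Sampled-data closed loop: apply the feedback on each interval [t, t + h)
x, trajectory = 1.5, [1.5]
for _ in range(200):
    u = mu_h_star(x, V, X, U, f, g, delta, h)
    x = x + h * f(x, u)                               # Euler step of the closed loop
    trajectory.append(x)
```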

The main advantage of the Hamilton-Jacobi approach is that it directly computes an approximately optimal feedback law. Its main disadvantage is that the number of grid nodes needed for maintaining a given accuracy in a finite element approach to compute \(\tilde{V }_{h}^{\infty }\) in general grows exponentially with the state dimension n. This fact – known as the curse of dimensionality – restricts this method to low-dimensional state spaces. Unless special structure is available which can be exploited, as, e.g., in the max-plus approach (see McEneaney 2006), it is currently almost impossible to go beyond state dimensions of about n = 10, typically less for strongly nonlinear problems.

Maximum Principle Approach

In contrast to the Hamilton-Jacobi-Bellman approach, the approach via Pontryagin’s maximum principle does not compute a feedback law. Instead, it yields an approximation to the optimal open-loop control \(u^{\star }\) together with an approximation to the optimal trajectory \(x^{\star }\) for a fixed initial value. We explain the approach for the finite horizon problem. For simplicity of presentation, we omit state constraints, i.e., we set \(X = \mathbb{R}^{n}\), and refer to, e.g., Vinter (2000), Bryson and Ho (1975), or Grass et al. (2008) for more general formulations as well as for rigorous versions of the following statements.

In order to state the maximum principle (which, since we are considering a minimization problem here, could also be called minimum principle), we define the non-minimized Hamiltonian as

$$\displaystyle{\mathcal{H}(x,p,u) = g(x,u) + p \cdot f(x,u).}$$

Then, under appropriate regularity assumptions, there exists an absolutely continuous function \(p : [0,T] \rightarrow \mathbb{R}^{n}\) such that the optimal trajectory \(x^{\star }\) and the corresponding optimal control function \(u^{\star }\) for (2)–(4) satisfy

$$\displaystyle{ \dot{p}(t) =\delta p(t) -\mathcal{H}_{x}(x^{\star }(t),p(t),u^{\star }(t)) }$$
(8)

with terminal or transversality condition

$$\displaystyle{ p(T) = F_{x}(x^{\star }(T)) }$$
(9)

and

$$\displaystyle{ u^{\star }(t) =\mathop{ \mathrm{argmin}}_{ u\in U}\mathcal{H}(x^{\star }(t),p(t),u), }$$
(10)

for almost all t ∈ [0, T] (see Grass et al. 2008, Theorem 3.4). The variable p is referred to as the adjoint or costate variable.
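
To illustrate these conditions, consider again the scalar example \(f(x,u) = u\), \(g(x,u) = \frac{1}{2}(x^{2} + u^{2})\), now with \(F \equiv 0\), \(\delta = 0\), and \(U = \mathbb{R}\) (an example added for exposition). Then \(\mathcal{H}(x,p,u) = \frac{1}{2}(x^{2} + u^{2}) + pu\), conditions (8) and (9) read

$$\displaystyle{\dot{p}(t) = -x^{\star }(t),\quad p(T) = 0,}$$

and the minimization in (10) gives \(u^{\star }(t) = -p(t)\). The optimal pair is thus obtained by solving the two-point boundary value problem \(\dot{x}^{\star } = -p\), \(\dot{p} = -x^{\star }\) with \(x^{\star }(0) = x_{0}\) and p(T) = 0.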

For a given initial value \(x_{0} \in \mathbb{R}^{n}\), the numerical approach now consists of finding functions \(x : [0,T] \rightarrow \mathbb{R}^{n}\), u : [0, T] → U and \(p : [0,T] \rightarrow \mathbb{R}^{n}\) satisfying

$$\displaystyle\begin{array}{rcl} \dot{x}(t)& =& f(x(t),u(t)){}\end{array}$$
(11)
$$\displaystyle\begin{array}{rcl} \dot{p}(t)& =& \delta p(t) -\mathcal{H}_{x}(x(t),p(t),u(t)){}\end{array}$$
(12)
$$\displaystyle\begin{array}{rcl} u(t)& =& \mathop{\mathrm{argmin}}_{u\in U}\mathcal{H}(x(t),p(t),u){}\end{array}$$
(13)
$$\displaystyle\begin{array}{rcl} x(0)& =& x_{0},\quad p(T) = F_{x}(x(T)){}\end{array}$$
(14)

for t ∈ [0, T]. Depending on the regularity of the underlying data, the conditions (11)–(14) may only be necessary but not sufficient for x and u being an optimal trajectory \(x^{\star }\) and control function \(u^{\star }\), respectively. However, x and u satisfying these conditions are usually good candidates for the optimal trajectory and control, which justifies using these conditions in the numerical approach. If needed, optimality of the candidates can be checked using suitable sufficient optimality conditions, for which we refer to, e.g., Maurer (1981) or Malanowski et al. (2004). Because the maximum principle approach first derives optimality conditions which are then discretized for the numerical solution, it is also termed first optimize, then discretize.

Solving (11)–(14) numerically amounts to solving a boundary value problem, because the condition \(x^{\star }(0) = x_{0}\) is posed at the beginning of the time interval [0, T] while the condition \(p(T) = F_{x}(x^{\star }(T))\) is required at the end. In order to solve such a problem, the simplest approach is the single shooting method which proceeds as follows:

We select a numerical scheme for solving the ordinary differential equations (11) and (12) for t ∈ [0, T] with initial conditions \(x(0) = x_{0}\), \(p(0) = p_{0}\) and control function u(t). Then, we proceed iteratively as follows:

  (0) Find initial guesses \(p_{0}^{0} \in \mathbb{R}^{n}\) and \(u^{0}(t)\) for the initial costate and the control, fix \(\varepsilon > 0\), and set k := 0.

  (1) Solve (11) and (12) numerically with initial values \(x_{0}\) and \(p_{0}^{k}\) and control function \(u^{k}\). Denote the resulting trajectories by \(\tilde{x}^{k}(t)\) and \(\tilde{p}^{k}(t)\).

  (2) Apply one step of an iterative method for solving the zero-finding problem G(p) = 0 with

    $$\displaystyle{G(p_{0}^{k}) :=\tilde{ p}^{k}(T) - F_{ x}(\tilde{x}^{k}(T))}$$

    for computing \(p_{0}^{k+1}\). For instance, in case of the Newton method we get

    $$\displaystyle{p_{0}^{k+1} := p_{ 0}^{k} - DG(p_{ 0}^{k})^{-1}G(p_{ 0}^{k}).}$$

    If \(\|p_{0}^{k+1} - p_{0}^{k}\| < \varepsilon\), stop; else compute

    $$\displaystyle{u^{k+1}(t) :=\mathop{ \mathrm{argmin}}_{ u\in U}\mathcal{H}(\tilde{x}^{k}(t),\tilde{p}^{k}(t),u),}$$

    set k := k + 1, and go to (1).

The procedure described in this algorithm is called single shooting because the iteration is performed on the single initial value \(p_{0}^{k}\). For an implementable scheme, several details still need to be made precise, e.g., how to parameterize the function u(t) (e.g., piecewise constant, piecewise linear, or polynomial), how to compute the derivative DG and its inverse (or an approximation thereof), and how to compute the argmin in (2). The last task simplifies considerably if the structure of the optimal control, e.g., the number of switchings in case of a bang-bang control, is known.
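
The following Python sketch implements this single shooting iteration for the scalar example introduced above (\(f(x,u) = u\), \(g(x,u) = \frac{1}{2}(x^{2} + u^{2})\), \(F \equiv 0\), \(\delta = 0\)), where the argmin in (2) is available in closed form as u = −p; for simplicity, a bracketing root finder replaces the Newton iteration on G. Horizon, initial value, and tolerances are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq        # 1-d root finder standing in for Newton's method

T, x0 = 1.0, 1.0                         # horizon and initial state (illustrative)

def shoot(p0):
    """Integrate the state/costate system (11)-(12) with the closed-form control u = -p."""
    def rhs(t, y):
        x, p = y
        u = -p                           # argmin of H(x,p,u) = (x^2 + u^2)/2 + p*u
        return [u, -x]                   # x' = f(x,u) = u,  p' = -H_x = -x
    sol = solve_ivp(rhs, (0.0, T), [x0, p0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1], sol.y[1, -1]    # x(T), p(T)

def G(p0):
    """Boundary residual G(p0) = p(T) - F_x(x(T)); here F = 0, so G(p0) = p(T)."""
    _, pT = shoot(p0)
    return pT

p0_star = brentq(G, -10.0, 10.0)         # solve G(p0) = 0
print("initial costate p(0) =", p0_star)
```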

However, even if all these points are settled, the set of initial guesses \(p_{0}^{0}\) and \(u^{0}\) for which the method converges to a solution of (11)–(14) tends to be very small. One reason for this is that the solutions of (11) and (12) typically depend very sensitively on \(p_{0}^{0}\) and \(u^{0}\). In order to circumvent this problem, multiple shooting can be used. To this end, one selects a time grid \(0 = t_{0} < t_{1} < t_{2} <\ldots < t_{N} = T\) and, in addition to \(p_{0}^{k}\), introduces variables \(x_{1}^{k},\ldots,x_{N-1}^{k},p_{1}^{k},\ldots,p_{N-1}^{k} \in \mathbb{R}^{n}\). Then, starting from initial guesses \(p_{0}^{0}\), \(u^{0}\), and \(x_{1}^{0},\ldots,x_{N-1}^{0},p_{1}^{0},\ldots,p_{N-1}^{0}\), in each iteration the Eqs. (11)–(13) are solved numerically on the intervals \([t_{j},t_{j+1}]\) with initial values \(x_{j}^{k}\) and \(p_{j}^{k}\), respectively. We denote the respective solutions in the k-th iteration by \(\tilde{x}_{j}^{k}\) and \(\tilde{p}_{j}^{k}\). In order to enforce that the trajectory pieces computed on the individual intervals \([t_{j},t_{j+1}]\) fit together continuously, the map G is redefined as

$$\displaystyle{\begin{array}{r} G(x_{1}^{k},\ldots,x_{N-1}^{k},p_{0}^{k},p_{1}^{k},\ldots,p_{N-1}^{k})\;\; = \\ \left (\begin{array}{c} \tilde{x}_{0}^{k}(t_{1}) - x_{1}^{k}\\ \vdots \\ \tilde{x}_{N-2}^{k}(t_{N-1}) - x_{N-1}^{k} \\ \tilde{p}_{0}^{k}(t_{1}) - p_{1}^{k}\\ \vdots \\ \tilde{p}_{N-2}^{k}(t_{N-1}) - p_{N-1}^{k} \\ \tilde{p}_{N-1}^{k}(T) - F_{x}(\tilde{x}_{N-1}^{k}(T)) \end{array} \right ). \end{array} }$$

The benefit of this approach is that the solutions on the shortened time intervals depend much less sensitively on the initial values and the control, thus making the problem numerically much better conditioned. The obvious disadvantage is that the problem becomes larger, as the function G is now defined on a much higher dimensional space, but this additional effort usually pays off.
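
A corresponding multiple shooting sketch for the same illustrative example is given below; the residual map G collects the matching and transversality conditions and is passed to a standard root finder (the data are again assumptions made for illustration).

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import root

# Illustrative data (assumptions): f(x,u) = u, g = (x^2 + u^2)/2, F = 0, delta = 0,
# so (13) gives u = -p in closed form and (11)-(12) become x' = -p, p' = -x.
T, x0, N = 1.0, 1.0, 5
t_grid = np.linspace(0.0, T, N + 1)

def integrate(xj, pj, tj, tj1):
    """Solve (11)-(12) on [t_j, t_{j+1}] with initial values x_j, p_j."""
    sol = solve_ivp(lambda t, y: [-y[1], -y[0]], (tj, tj1), [xj, pj],
                    rtol=1e-10, atol=1e-12)
    return sol.y[0, -1], sol.y[1, -1]

def G(z):
    """Residual map; z = (p_0, x_1, ..., x_{N-1}, p_1, ..., p_{N-1})."""
    p0, xs, ps = z[0], z[1:N], z[N:]
    x_nodes = np.concatenate(([x0], xs))          # x_0, ..., x_{N-1}
    p_nodes = np.concatenate(([p0], ps))          # p_0, ..., p_{N-1}
    res = []
    for j in range(N):
        xe, pe = integrate(x_nodes[j], p_nodes[j], t_grid[j], t_grid[j + 1])
        if j < N - 1:
            res += [xe - x_nodes[j + 1], pe - p_nodes[j + 1]]   # matching at t_{j+1}
        else:
            res += [pe]                                          # p(T) = F_x(x(T)) = 0
    return np.array(res)

sol = root(G, np.zeros(2 * N - 1), tol=1e-10)
print("initial costate p(0) =", sol.x[0])
```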

While the convergence behavior of the multiple shooting method is considerably better than that of single shooting, it is still a difficult task to select good initial guesses \(x_{j}^{0}\), \(p_{j}^{0}\), and \(u^{0}\). To accomplish this, homotopy methods can be used (see, e.g., Pesch 1994), or the result of a direct approach as presented in the next section can serve as an initial guess. The latter is reasonable because the maximum principle-based approach can yield approximations of higher accuracy than the direct method.

In the presence of state constraints or mixed state and control constraints, the conditions (12)–(14) become considerably more technical and thus more difficult to be implemented numerically (cf. Pesch 1994).

Direct Discretization

Despite being the simplest of the approaches described in this article, direct discretization is currently the most widely used approach for computing single finite horizon optimal trajectories. In the direct approach, we first discretize the problem and then solve a finite dimensional nonlinear optimization problem (NLP), i.e., we first discretize, then optimize. The main reasons for the popularity of this approach are the simplicity with which constraints can be handled and the numerical efficiency due to the availability of fast and reliable NLP solvers.

The direct approach again applies to the finite horizon problem and computes an approximation to a single optimal trajectory \(x^{\star }(t)\) and control function \(u^{\star }(t)\) for a given initial value \(x_{0} \in X\). To this end, a time grid \(0 = t_{0} < t_{1} < t_{2} <\ldots < t_{N} = T\) and a set \(\mathcal{U}_{d}\) of control functions which are parameterized by finitely many values are selected. The simplest way to do so is to choose \(u(t) \equiv u_{i} \in U\) for all \(t \in [t_{i},t_{i+1}]\). However, other approaches like piecewise linear or piecewise polynomial control functions are possible, too. We use a numerical algorithm for ordinary differential equations in order to approximately solve the initial value problems

$$\displaystyle{ \dot{x}(t) = f(x(t),u_{i}),\quad x(t_{i}) = x_{i} }$$
(15)

for \(i = 0,\ldots,N - 1\) on \([t_{i},t_{i+1}]\). We denote the exact and numerical solution of (15) by \(x(t,t_{i},x_{i},u_{i})\) and \(\tilde{x}(t,t_{i},x_{i},u_{i})\), respectively. Finally, we choose a numerical integration rule in order to compute an approximation

$$\displaystyle{I(t_{i},t_{i+1},x_{i},u_{i}) \approx \int _{t_{i}}^{t_{i+1} }e^{-\delta t}g(x(t,t_{i},x_{i},u_{i}),u_{i})\,dt.}$$

In the simplest case, one might choose \(\tilde{x}\) as the Euler scheme and I as the rectangle rule, leading to

$$\displaystyle{\tilde{x}(t_{i+1},t_{i},x_{i},u_{i}) = x_{i} + (t_{i+1} - t_{i})f(x_{i},u_{i})}$$

and

$$\displaystyle{I(t_{i},t_{i+1},x_{i},u_{i}) = (t_{i+1} - t_{i})e^{-\delta t_{i} }g(x_{i},u_{i}).}$$

Introducing the optimization variables \(u_{0},\ldots,u_{N-1} \in \mathbb{R}^{m}\) and \(x_{1},\ldots,x_{N} \in \mathbb{R}^{n}\), the discretized version of (2)–(4) reads

$$\displaystyle{\mathop{\mbox{ minimize}}_{x_{j}\in \mathbb{R}^{n},u_{j}\in \mathbb{R}^{m}}\;\sum _{i=0}^{N-1}I(t_{ i},t_{i+1},x_{i},u_{i}) + e^{-\delta T}F(x_{ N})}$$

subject to the constraints

$$\displaystyle{\begin{array}{ll} u_{j} \in U,\quad &j = 0,\ldots,N - 1 \\ x_{j} \in X,\quad &j = 1,\ldots,N \\ x_{j+1} =\tilde{ x}(t_{j+1},t_{j},x_{j},u_{j}),\quad &j = 0,\ldots,N - 1. \end{array} }$$

This way, we have converted the optimal control problem (2)–(4) into a finite dimensional nonlinear optimization problem (NLP). As such, it can be solved with any numerical method for solving such problems. Popular methods are, for instance, sequential quadratic programming (SQP) or interior point (IP) algorithms. The convergence of this approach was proved in Malanowski et al. (1998); for an up-to-date account on theory and practice of the method, see Gerdts (2012) and Betts (2010). These references also explain how information about the costates p(t) can be extracted from a direct discretization, thus linking the approach to the maximum principle.
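
As a minimal sketch of this “first discretize, then optimize” route, the following Python code assembles the Euler/rectangle-rule discretization for the scalar example used before (\(f(x,u) = u\), \(g(x,u) = \frac{1}{2}(x^{2} + u^{2})\), \(F \equiv 0\), \(\delta = 0\), U = [−1, 1], no state constraints) and hands the resulting NLP to scipy's SQP-type solver SLSQP; all problem data are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

T, N, x0 = 1.0, 20, 1.0
h = T / N                                           # uniform time grid t_i = i*h

def unpack(z):
    """Optimization variables z = (u_0, ..., u_{N-1}, x_1, ..., x_N)."""
    return z[:N], z[N:]

def objective(z):
    u, x = unpack(z)
    xs = np.concatenate(([x0], x))                  # x_0, x_1, ..., x_N
    # rectangle rule for the running cost; F = 0, so no terminal term
    return sum(h * 0.5 * (xs[i]**2 + u[i]**2) for i in range(N))

def dynamics_residual(z):
    """Continuity constraints x_{j+1} = x_j + h*f(x_j, u_j) of the Euler scheme."""
    u, x = unpack(z)
    xs = np.concatenate(([x0], x))
    return np.array([xs[j + 1] - (xs[j] + h * u[j]) for j in range(N)])

z0 = np.zeros(2 * N)                                # initial guess
bounds = [(-1.0, 1.0)] * N + [(None, None)] * N     # u_j in U = [-1, 1], x_j free
res = minimize(objective, z0, method="SLSQP", bounds=bounds,
               constraints={"type": "eq", "fun": dynamics_residual})
u_opt, x_opt = unpack(res.x)
print("approximate optimal cost:", res.fun)
```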

The direct method sketched here is again a multiple shooting method, and the benefit of this approach is the same as for boundary value problems: thanks to the short intervals \([t_{i},t_{i+1}]\), the solutions depend much less sensitively on the data than the solution on the whole interval [0, T], thus making the iterative solution of the resulting discretized NLP much easier. The price to pay is again the increase in the number of optimization variables. However, due to the particular structure of the constraints guaranteeing continuity of the solution, the resulting matrices in the NLP have a particular structure which can be exploited numerically by a method called condensing (see Bock and Plitt 1984).

An alternative to multiple shooting methods is provided by collocation methods, in which the internal variables of the numerical algorithm for solving (15) are also optimization variables. Nowadays, however, the multiple shooting approach as described above is usually preferred. For a more detailed description of various direct approaches, see also Binder et al. (2001), Sect. 5.

Further Approaches for Infinite Horizon Problems

The last two approaches only apply to finite horizon problems. While the maximum principle approach can be generalized to infinite horizon problems, the necessary conditions become weaker and the numerical solution becomes considerably more involved (see Grass et al. 2008). Both the maximum principle and the direct approach can, however, be applied in a receding horizon fashion, in which an infinite horizon problem is approximated by the iterative solution of finite horizon problems. The resulting control technique is known under the name of model predictive control (MPC; see Grüne and Pannek 2011), and under suitable assumptions, a rigorous approximation result can be established.
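
To illustrate the receding horizon idea, the following Python sketch repeatedly solves a short finite horizon problem (here by direct optimization over the control sequence only, for the same illustrative scalar example) and applies only the first control value in each step; horizon length, sampling period, and all other data are assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative data (assumptions): f(x,u) = u, g = (x^2 + u^2)/2, delta = 0, U = [-1, 1].
h, N_mpc = 0.05, 10                   # sampling period and (short) prediction horizon

def finite_horizon_cost(u_seq, x):
    """Cost of applying the control sequence u_seq from state x (Euler discretization)."""
    J = 0.0
    for u in u_seq:
        J += h * 0.5 * (x**2 + u**2)
        x = x + h * u                 # Euler step of x' = f(x,u) = u
    return J

x, closed_loop = 1.0, [1.0]
for _ in range(100):                  # MPC loop: solve, apply first value, shift horizon
    res = minimize(finite_horizon_cost, np.zeros(N_mpc), args=(x,),
                   bounds=[(-1.0, 1.0)] * N_mpc)
    x = x + h * res.x[0]              # apply only the first control value
    closed_loop.append(x)
```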

Summary and Future Directions

The three main numerical approaches to optimal control are:

  • The Hamilton-Jacobi-Bellman approach, which provides a global solution in feedback form but is computationally expensive for higher dimensional systems

  • The Pontryagin maximum principle approach which computes single optimal trajectories with high accuracy but needs good initial guesses for the iteration

  • The direct approach which also computes single optimal trajectories but is less demanding in terms of the initial guesses at the expense of a somewhat lower accuracy

Currently, the main trends in numerical optimal control lie in the areas of Hamilton-Jacobi-Bellman equations and direct discretization. For the former, the focus is on the development of discretization schemes suitable for increasingly higher dimensional problems. For the latter, the popularity of these methods in online applications like MPC triggers a continuing effort to make this approach faster and more reliable.

Beyond ordinary differential equations, the development of numerical algorithms for the optimal control of partial differential equations (PDEs) has attracted considerable attention in recent years. While many of these methods are still restricted to linear systems, in the near future we can expect to see many extensions to (classes of) nonlinear PDEs. It is worth noting that for PDEs, approaches based on the maximum principle are more popular than they are for ordinary differential equations.
