Abstract
This entry illustrates the application of Bellman’s dynamic programming principle within the context of optimal control problems for continuous-time dynamical systems. The approach leads to a characterization of the optimal value of the cost functional, over all possible trajectories given the initial conditions, in terms of a partial differential equation called the Hamilton–Jacobi–Bellman equation. Importantly, this can be used to synthesize the corresponding optimal control input as a state-feedback law.
Introduction
The dynamic programming principle (DPP) is a fundamental tool in optimal control theory. It was largely developed by Richard Bellman in the 1950s (Bellman 1957) and has since been applied to various problems in deterministic and stochastic optimal control. The goal of optimal control is to determine the control function and the corresponding trajectory of a dynamical system which together optimize a given criterion, usually expressed in terms of an integral along the trajectory (the cost functional) (Fleming and Rishel 1975; Macki and Strauss 1982). The function which associates with the initial condition of the dynamical system the optimal value of the cost functional over all possible trajectories is called the value function. The most interesting point is that, via the dynamic programming principle, one can derive a characterization of the value function in terms of a nonlinear partial differential equation (the Hamilton–Jacobi–Bellman equation) and then use it to synthesize a feedback control law. This is the major advantage over the approach based on the Pontryagin Maximum Principle (PMP) (Boltyanskii et al. 1956; Pontryagin et al. 1962). In fact, the PMP merely gives necessary conditions for the characterization of the open-loop optimal control and of the corresponding optimal trajectory. The DPP has also been applied to construct approximation schemes for the value function, although this approach suffers from the “curse of dimensionality” since one has to solve a nonlinear partial differential equation in high dimension. Despite the elegance of the DPP approach, its practical application is limited by this bottleneck, and the solution of many optimal control problems has been accomplished instead via the two-point boundary value problem associated with the PMP.
The Infinite Horizon Problem
Let us present the main ideas for the classical infinite horizon problem. Let a controlled dynamical system be given by
$$\begin{cases} \dot{y}(s) = f(y(s), \alpha(s)), & s \in (t_0, T], \\ y(t_0) = x_0, \end{cases} \tag{1}$$
where \(x_{0}, y(s) \in \mathbb{R}^{d}\), and the control takes values in a compact set \(A \subset \mathbb{R}^{m}\),
$$\alpha(s) \in A \quad \text{for a.e. } s \in [t_0, T],$$
with T finite or + ∞. Under the assumption that the control is measurable, existence and uniqueness properties for the solution of (1) are ensured by the Carathéodory theorem:
Theorem 1 (Carathéodory)
Assume that:
1. f(⋅,⋅) is continuous.
2. There exists a constant \(L_{f} > 0\) such that
$$\vert f(x,a) - f(y,a)\vert \leq L_{f}\vert x - y\vert,$$
for all \(x, y \in \mathbb{R}^{d}\) and a ∈ A.
3. f(x,α(t)) is measurable with respect to t.
Then, there is a unique absolutely continuous function \(y : [t_{0},T] \rightarrow \mathbb{R}^{d}\) that satisfies
$$y(t) = x_0 + \int_{t_0}^{t} f(y(s), \alpha(s))\, ds, \quad t \in [t_0, T],$$
which is interpreted as the solution of (1).
Note that the solution is continuous, but only a.e. differentiable, so it must be regarded as a weak solution of (1). By the theorem above, fixing a control in the set of admissible controls
$$\mathcal{A} := \{ \alpha : [t_0, T] \rightarrow A,\ \alpha \text{ measurable} \}$$
yields a unique trajectory of (1) which is denoted by \(y_{x_{0},\ t_{0}}(s;\alpha )\). Changing the control policy generates a family of solutions of the controlled system (1) with index α. Since the dynamics (1) are “autonomous,” the initial time t0 can be shifted to 0 by a change of variable. So to simplify the notation for autonomous dynamics, we can set t0 = 0 and we denote this family by \(y_{x_{0}}(s;\alpha )\) (or even write it as y(s) if no ambiguity over the initial state or control arises). It is customary in dynamic programming, moreover, to use the notations x and t instead of x0 and t0 (since x and t appear as variables in the Hamilton–Jacobi–Bellman equation).
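The trajectory \(y_{x}(s;\alpha)\) defined above can be approximated numerically for any fixed measurable control, for instance by an explicit Euler scheme. A minimal sketch in Python (the integrator dynamics and the bang-bang control below are illustrative choices, not taken from the entry):

```python
import numpy as np

def simulate(f, x0, alpha, t0=0.0, T=2.0, n_steps=1000):
    """Explicit Euler approximation of the trajectory y_{x0,t0}(s; alpha)
    of y'(s) = f(y(s), alpha(s)), y(t0) = x0."""
    h = (T - t0) / n_steps
    y = np.empty((n_steps + 1, np.size(x0)))
    y[0] = np.atleast_1d(x0)
    t = t0
    for k in range(n_steps):
        y[k + 1] = y[k] + h * np.atleast_1d(f(y[k], alpha(t)))
        t += h
    return y

# Illustrative example: the scalar integrator y' = a under the
# bang-bang control alpha(t) = sign(sin t), starting from y(0) = 0.
traj = simulate(lambda y, a: a, x0=0.0, alpha=lambda t: np.sign(np.sin(t)))
```

Changing the function `alpha` produces a different member of the family of solutions indexed by the control, exactly as described above.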
Optimal control problems require the introduction of a cost functional \(J : \mathcal{A}\rightarrow \mathbb{R}\) which is used to select the “optimal trajectory” for (1). In the case of the infinite horizon problem, we set t0 = 0, x0 = x, and this functional is defined as
$$J_x(\alpha) := \int_0^{\infty} g(y_x(s;\alpha), \alpha(s))\, e^{-\lambda s}\, ds, \tag{3}$$
for a given λ > 0. The function g represents the running cost, and λ is the discount factor, which can be used to take into account the reduced value, at the initial time, of future costs. From a technical point of view, the presence of the discount factor ensures that the integral is finite whenever g is bounded. Note that one can also consider the undiscounted problem (λ = 0), provided the integral is still finite. The goal of optimal control is to find an optimal pair \((y^{{\ast}},\alpha ^{{\ast}})\) that minimizes the cost functional. If we seek optimal controls in open-loop form, i.e., as functions of t, then the Pontryagin Maximum Principle furnishes necessary conditions for a pair \((y^{{\ast}},\alpha ^{{\ast}})\) to be optimal.
A major drawback of an open-loop control is that being constructed as a function of time, it cannot take into account errors in the true state of the system, due, for example, to model errors or external disturbances, which may take the evolution far from the optimal forecasted trajectory. Another limitation of this approach is that a new computation of the control is required whenever the initial state is changed.
For these reasons, we are interested in the so-called feedback controls, that is, controls expressed as functions of the state of the system. Under feedback control, if the system trajectory is perturbed, the system reacts by changing its control strategy according to the change in the state. One of the main motivations for using the DPP is that it yields solutions to optimal control problems in the form of feedback controls.
DPP for the Infinite Horizon Problem
The starting point of dynamic programming is to introduce an auxiliary function, the value function, which for our problem is
$$v(x) := \inf_{\alpha \in \mathcal{A}} J_x(\alpha),$$
where, as above, x is the initial position of the system. The value function has a clear meaning: it is the optimal cost associated with the initial position x. This is a reference value which can be useful to evaluate the efficiency of a control – if \(J_{x}(\bar{\alpha })\) is close to v(x), this means that \(\bar{\alpha }\) is “efficient.”
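This comparison between \(J_{x}(\bar{\alpha })\) and v(x) can be made concrete by evaluating the cost functional (3) numerically for a candidate control. A sketch assuming explicit Euler time-stepping and truncation of the integral at a finite horizon (justified since the tail is of order \(e^{-\lambda t}\) when g is bounded); the linear test problem below is illustrative:

```python
import numpy as np

def discounted_cost(f, g, lam, x0, alpha, horizon=20.0, n_steps=20000):
    """Approximate J_x(alpha) = int_0^inf g(y(s), alpha(s)) e^{-lam*s} ds
    by explicit Euler time-stepping, truncating the integral at `horizon`."""
    h = horizon / n_steps
    y = np.atleast_1d(np.asarray(x0, dtype=float))
    J, t = 0.0, 0.0
    for _ in range(n_steps):
        a = alpha(t)
        J += h * g(y, a) * np.exp(-lam * t)   # left-endpoint quadrature rule
        y = y + h * np.atleast_1d(f(y, a))
        t += h
    return J

# Illustrative check: uncontrolled stable dynamics y' = -y, running cost
# g = y^2, x0 = 1, lam = 1.  Then y(s) = e^{-s}, so
# J = int_0^inf e^{-2s} e^{-s} ds = 1/3.
J = discounted_cost(lambda y, a: -y, lambda y, a: float(y[0] ** 2),
                    lam=1.0, x0=1.0, alpha=lambda t: 0.0)
```

Here the control plays no role; the point is only the quadrature of the discounted integral used to score a given policy against v(x).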
Bellman’s dynamic programming principle provides a first characterization of the value function.
Proposition 1 (DPP for the infinite horizon problem)
Under the assumptions of Theorem 1, for all \(x \in \mathbb{R}^{d}\) and τ > 0,
$$v(x) = \inf_{\alpha \in \mathcal{A}} \left\{ \int_0^{\tau} g(y_x(s;\alpha), \alpha(s))\, e^{-\lambda s}\, ds + e^{-\lambda\tau}\, v(y_x(\tau;\alpha)) \right\}. \tag{5}$$
Proof
Denote by \(\bar{v}(x)\) the right-hand side of (5). First, we remark that for any \(x \in \mathbb{R}^{d}\) and \(\bar{\alpha }\in \mathcal{A}\),
$$J_x(\bar{\alpha}) = \int_0^{\tau} g(\bar{y}(s), \bar{\alpha}(s))\, e^{-\lambda s}\, ds + e^{-\lambda\tau} J_{\bar{y}(\tau)}(\bar{\alpha}(\cdot + \tau)) \geq \int_0^{\tau} g(\bar{y}(s), \bar{\alpha}(s))\, e^{-\lambda s}\, ds + e^{-\lambda\tau}\, v(\bar{y}(\tau))$$
(here, \(y_{x}(s;\bar{\alpha })\) is abbreviated as \(\bar{y}(s)\)). Taking the infimum over all trajectories, first over the right-hand side and then the left of this inequality, yields
$$v(x) \geq \bar{v}(x).$$
To prove the opposite inequality, we recall that \(\bar{v}\) is defined as an infimum, and so, for any \(x \in \mathbb{R}^{d}\) and \(\varepsilon> 0\), there exists a control \(\bar{\alpha }_{\varepsilon }\) (and the corresponding evolution \(\bar{y}_{\varepsilon }\)) such that
$$\int_0^{\tau} g(\bar{y}_{\varepsilon}(s), \bar{\alpha}_{\varepsilon}(s))\, e^{-\lambda s}\, ds + e^{-\lambda\tau}\, v(\bar{y}_{\varepsilon}(\tau)) \leq \bar{v}(x) + \varepsilon.$$
On the other hand, the value function v being also defined as an infimum, for any \(x \in \mathbb{R}^{d}\) and \(\varepsilon> 0\), there exists a control \(\tilde{\alpha }_{\varepsilon }\) such that
$$J_{\bar{y}_{\varepsilon}(\tau)}(\tilde{\alpha}_{\varepsilon}) \leq v(\bar{y}_{\varepsilon}(\tau)) + \varepsilon.$$
Then,
$$v(x) \leq J_x(\hat{\alpha}) = \int_0^{\tau} g(\bar{y}_{\varepsilon}(s), \bar{\alpha}_{\varepsilon}(s))\, e^{-\lambda s}\, ds + e^{-\lambda\tau} J_{\bar{y}_{\varepsilon}(\tau)}(\tilde{\alpha}_{\varepsilon}) \leq \bar{v}(x) + (1 + e^{-\lambda\tau})\,\varepsilon, \tag{9}$$
where \(\hat{\alpha }\) is a control defined by
$$\hat{\alpha}(s) := \begin{cases} \bar{\alpha}_{\varepsilon}(s), & s \in [0, \tau], \\ \tilde{\alpha}_{\varepsilon}(s - \tau), & s > \tau. \end{cases} \tag{10}$$
(Note that \(\hat{\alpha }(\cdot )\) is still measurable). Since \(\varepsilon\) is arbitrary, (9) finally yields \(\bar{v}(x) \geq v(x)\), which, combined with the reverse inequality, proves (5).
We observe that this proof crucially relies on the fact that the control defined by (10) still belongs to \(\mathcal{A}\), being a measurable control. The possibility of obtaining an admissible control by joining together two different measurable controls is known as the concatenation property.
The Hamilton–Jacobi–Bellman Equation
The DPP can be used to characterize the value function in terms of a nonlinear partial differential equation. In fact, let \(\alpha ^{{\ast}} \in \mathcal{A}\) be the optimal control, and y∗ the associated evolution (to simplify, we are assuming that the infimum is a minimum). Then,
$$v(x) = \int_0^{\tau} g(y^{\ast}(s), \alpha^{\ast}(s))\, e^{-\lambda s}\, ds + e^{-\lambda\tau}\, v(y^{\ast}(\tau)),$$
that is,
$$v(x) - e^{-\lambda\tau}\, v(y^{\ast}(\tau)) = \int_0^{\tau} g(y^{\ast}(s), \alpha^{\ast}(s))\, e^{-\lambda s}\, ds,$$
so that adding and subtracting \(e^{-\lambda \tau }v(x)\) and dividing by τ, we get
$$\frac{1 - e^{-\lambda\tau}}{\tau}\, v(x) - e^{-\lambda\tau}\, \frac{v(y^{\ast}(\tau)) - v(x)}{\tau} = \frac{1}{\tau} \int_0^{\tau} g(y^{\ast}(s), \alpha^{\ast}(s))\, e^{-\lambda s}\, ds.$$
Assume now that v is regular. Passing to the limit as τ → 0+, and assuming that \(\alpha ^{{\ast}}(\cdot )\) is continuous at 0, we can conclude
$$\lambda v(x) - f(x, a^{\ast}) \cdot Dv(x) - g(x, a^{\ast}) = 0, \tag{11}$$
where \(a^{{\ast}} =\alpha ^{{\ast}}(0)\). Similarly, using the equivalent form
$$v(x) \leq \int_0^{\tau} g(y_x(s;\alpha), \alpha(s))\, e^{-\lambda s}\, ds + e^{-\lambda\tau}\, v(y_x(\tau;\alpha)), \quad \alpha \in \mathcal{A},$$
of the DPP and the same computation with the inequality, this implies, for any (continuous at 0) control \(\alpha \in \mathcal{A}\) with a = α(0),
$$\lambda v(x) - f(x, a) \cdot Dv(x) - g(x, a) \leq 0. \tag{12}$$
Combining (11) and (12), we obtain the Hamilton–Jacobi–Bellman equation (or dynamic programming equation):
$$\lambda v(x) + \max_{a \in A} \left\{ -f(x,a) \cdot Dv(x) - g(x,a) \right\} = 0, \quad x \in \mathbb{R}^{d}, \tag{13}$$
which characterizes the value function for the infinite horizon problem associated with minimizing (3). Note that given x, the value of a achieving the max (assuming it exists) corresponds to the control a∗ = α∗(0), and this makes it natural to interpret the argmax in (13) as the optimal feedback at x (see Bardi and Capuzzo Dolcetta (1997) for more details).
In short, (13) can be written as
$$H(x, u(x), Du(x)) = 0,$$
with \(x \in \mathbb{R}^{d}\), and
$$H(x, u, p) := \lambda u + \max_{a \in A} \left\{ -f(x,a) \cdot p - g(x,a) \right\}.$$
Note that H(x, u, ⋅ ) is convex (being the max of a family of affine functions of p) and that H(x, ⋅ , p) is monotone (since λ > 0). It is also easy to see that the solution u need not be differentiable even when f and g are smooth functions (e.g., \(f, g \in C^{\infty }\)), so we need to deal with weak solutions of the Bellman equation. This can be done in the framework of viscosity solutions, a theory initiated by Crandall and Lions in the 1980s which has been successfully applied in many areas such as optimal control, fluid dynamics, and image processing (see the books Barles (1994) and Bardi and Capuzzo Dolcetta (1997) for an extended introduction and numerous applications to optimal control). Typically, viscosity solutions are Lipschitz continuous, so they are differentiable almost everywhere.
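In low dimension, the value function can be approximated by iterating a discrete version of the DPP on a grid, reading off an approximate optimal feedback as the minimizing control at each node. A minimal sketch assuming a semi-Lagrangian discretization \(v(x) = \min_a \{ h\, g(x,a) + e^{-\lambda h} v(x + h f(x,a)) \}\), with the illustrative data f(x, a) = a, g(x, a) = x², λ = 1, A = {−1, 0, 1} (none of which come from the entry):

```python
import numpy as np

# Toy infinite horizon problem on [-1, 1] (illustrative data):
#   dynamics f(x, a) = a, running cost g(x, a) = x**2,
#   discount lam = 1, control set A = {-1, 0, 1}.
# Discrete DPP (semi-Lagrangian scheme, time step h):
#   v(x) = min_a { h*g(x, a) + exp(-lam*h) * v(x + h*f(x, a)) }.
lam, h = 1.0, 0.05
xs = np.linspace(-1.0, 1.0, 201)
A = np.array([-1.0, 0.0, 1.0])

def bellman_operator(v):
    """One value-iteration sweep; returns the updated values and the
    minimizing control (the approximate feedback) at every grid node."""
    vals = []
    for a in A:
        x_next = np.clip(xs + h * a, xs[0], xs[-1])  # keep the state in the domain
        vals.append(h * xs**2 + np.exp(-lam * h) * np.interp(x_next, xs, v))
    vals = np.array(vals)
    return vals.min(axis=0), A[vals.argmin(axis=0)]

v = np.zeros_like(xs)
for _ in range(2000):   # the operator is a contraction; iterate to a fixed point
    v_new, feedback = bellman_operator(v)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
```

On this toy problem the computed feedback drives the state toward the minimum of the running cost at x = 0, matching the interpretation of the argmax in (13) as the optimal feedback.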
An Extension to the Minimum Time Problem
In the minimum time problem, we want to minimize the time of arrival of the state on a given target set \(\mathcal{T}\). We will assume that \(\mathcal{T} \subset \mathbb{R}^{d}\) is a closed set. Then our cost functional will be given by
$$J_x(\alpha) := t_x(\alpha),$$
where
$$t_x(\alpha) := \begin{cases} \min \{ s \geq 0 : y_x(s;\alpha) \in \mathcal{T} \} & \text{if } y_x(s;\alpha) \in \mathcal{T} \text{ for some } s, \\ +\infty & \text{otherwise}. \end{cases}$$
The corresponding value function is called the minimum time function:
$$T(x) := \inf_{\alpha \in \mathcal{A}} t_x(\alpha).$$
The main difference with respect to the previous problem is that now the value function T will be finite-valued only on a subset \(\mathcal{R}\), which depends on the target, on the dynamics, and on the set of admissible controls.
Definition 1
The reachable set \(\mathcal{R}\) is defined by
$$\mathcal{R} := \bigcup_{t > 0} \mathcal{R}(t),$$
where, for t > 0, \(\mathcal{R}(t) :=\{ x \in \mathbb{R}^{d} : T(x) <t\}\).
The meaning is clear: \(\mathcal{R}\) is the set of initial points which can be driven to the target in finite time. The system is said to be controllable on \(\mathcal{T}\) if, for all t > 0, \(\mathcal{T} \subset \mbox{ int}(\mathcal{R}(t))\) (here, int(D) denotes the interior of the set D). Assuming controllability in a neighborhood of the target, one gets the continuity of the minimum time function, and under the assumptions made on f, A, and \(\mathcal{T}\), one can prove some interesting properties:
(i) \(\mathcal{R}\) is open.
(ii) T is continuous on \(\mathcal{R}\).
(iii) \(\lim \limits _{x\rightarrow x_{0}}T(x) = +\infty\) for any \(x_{0} \in \partial \mathcal{R}\).
Now let us denote by \(\mathcal{X}_{D}\) the characteristic function of the set D. Using in \(\mathcal{R}\) arguments similar to those in the proof of the DPP in the previous section, one can obtain the following DPP:
Proposition 2 (DPP for the minimum time problem)
For any \(x \in \mathcal{R}\), the value function satisfies
$$T(x) = \inf_{\alpha \in \mathcal{A}} \left\{ \min\big(\tau,\, t_x(\alpha)\big) + \mathcal{X}_{\{\tau < t_x(\alpha)\}}\, T\big(y_x(\tau;\alpha)\big) \right\}, \quad \text{for all } \tau > 0,$$
and
$$T(x) = 0, \quad \text{for all } x \in \mathcal{T}.$$
From the previous DPP, one can also obtain the following characterization of the minimum time function.
Proposition 3
Let \(\mathcal{R}\setminus \mathcal{T}\) be open and \(T \in C(\mathcal{R}\setminus \mathcal{T} )\); then T is a viscosity solution of
$$\max_{a \in A} \left\{ -f(x,a) \cdot DT(x) \right\} = 1, \quad x \in \mathcal{R}\setminus\mathcal{T},$$
coupled with the natural boundary condition
$$T(x) = 0, \quad x \in \partial\mathcal{T}.$$
By the change of variable \(v(x) = 1 - e^{-T(x)}\) (the Kruzkov transform), one can obtain a simpler problem, getting rid of the boundary condition on \(\partial \mathcal{R}\) (which is unknown). The new function v will be the unique viscosity solution of an exterior Dirichlet problem (see Bardi and Capuzzo Dolcetta (1997) for more details), and the reachable set can be recovered a posteriori via the relation \(\mathcal{R} = \{ x \in \mathbb{R}^{d} : v(x) < 1 \}\).
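The change of variable also lends itself to computation: iterating a discrete version of the transformed DPP, \(v(x) = \min_a \{ 1 - e^{-h} + e^{-h} v(x + h f(x,a)) \}\) with v = 0 on the target, yields v, from which T = −log(1 − v) and the reachable set {v < 1} can be recovered. A sketch for the illustrative one-dimensional problem x′ = a, a ∈ {−1, 0, 1}, with target [−0.05, 0.05] (all data assumed for the example):

```python
import numpy as np

# Minimum time to reach the target [-0.05, 0.05] for the illustrative
# dynamics x' = a, a in {-1, 0, 1}, via the Kruzkov transform
# v(x) = 1 - exp(-T(x)).  Transformed discrete DPP (time step h):
#   v(x) = min_a { 1 - exp(-h) + exp(-h) * v(x + h*a) }  outside the target,
#   v(x) = 0                                             on the target.
h = 0.02
xs = np.linspace(-1.0, 1.0, 401)
A = np.array([-1.0, 0.0, 1.0])
on_target = np.abs(xs) <= 0.05

v = np.ones_like(xs)          # v = 1 - e^{-T} <= 1; start from the upper bound
v[on_target] = 0.0
for _ in range(5000):
    vals = [1 - np.exp(-h) + np.exp(-h) *
            np.interp(np.clip(xs + h * a, xs[0], xs[-1]), xs, v) for a in A]
    v_new = np.minimum.reduce(vals)
    v_new[on_target] = 0.0    # boundary condition on the target
    if np.max(np.abs(v_new - v)) < 1e-12:
        break
    v = v_new

T_min = -np.log(1.0 - np.clip(v, 0.0, 1.0 - 1e-15))  # recover T = -log(1 - v)
```

For this dynamics the exact minimum time is the distance to the target, T(x) = |x| − 0.05, which the scheme reproduces up to O(h); here the whole interval is reachable, so v < 1 everywhere.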
Further Extensions and Related Topics
The DPP has been extended from deterministic control problems to many other problems. In the framework of stochastic control problems where the dynamics are given by a diffusion, the characterization of the value function obtained via the DPP leads to a second-order Hamilton–Jacobi–Bellman equation (Fleming and Soner 1993; Kushner and Dupuis 2001). Another interesting extension has been made in differential games, where the DPP is based on the delicate notion of nonanticipative strategies for the players and leads to a nonconvex nonlinear partial differential equation (the Isaacs equation) (Bardi and Capuzzo Dolcetta 1997). For a short introduction to numerical methods based on DP and exploiting the so-called “value iteration,” we refer the interested reader to Appendix A of Bardi and Capuzzo Dolcetta (1997) and to Kushner and Dupuis (2001) (see also the book Howard (1960) for the “policy iteration”).
Bibliography
Bardi M, Capuzzo Dolcetta I (1997) Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Birkhäuser, Boston
Barles G (1994) Solutions de viscosité des équations de Hamilton-Jacobi. In: Mathématiques et applications, vol 17. Springer, Paris
Bellman R (1957) Dynamic programming. Princeton University Press, Princeton
Bertsekas DP (1987) Dynamic programming: deterministic and stochastic models. Prentice Hall, Englewood Cliffs
Boltyanskii VG, Gamkrelidze RV, Pontryagin LS (1956) On the theory of optimal processes (in Russian). Doklady Akademii Nauk SSSR 110:7–10
Fleming WH, Rishel RW (1975) Deterministic and stochastic optimal control. Springer, New York
Fleming WH, Soner HM (1993) Controlled Markov processes and viscosity solutions. Springer, New York
Howard RA (1960) Dynamic programming and Markov processes. Wiley, New York
Kushner HJ, Dupuis P (2001) Numerical methods for stochastic control problems in continuous time. Springer, Berlin
Macki J, Strauss A (1982) Introduction to optimal control theory. Springer, Berlin/Heidelberg/New York
Pontryagin LS, Boltyanskii VG, Gamkrelidze RV, Mishchenko EF (1961) Matematicheskaya teoriya optimal'nykh prozessov. Fizmatgiz, Moscow. English translation: The mathematical theory of optimal processes. John Wiley and Sons (Interscience Publishers), New York, 1962
Ross IM (2009) A primer on Pontryagin’s principle in optimal control. Collegiate Publishers, San Francisco
© 2015 Springer-Verlag London
Falcone, M. (2015). Optimal Control and the Dynamic Programming Principle. In: Baillieul, J., Samad, T. (eds) Encyclopedia of Systems and Control. Springer, London. https://doi.org/10.1007/978-1-4471-5058-9_209