Abstract
In this chapter we give an introduction to nonlinear infinite horizon optimal control. The dynamic programming principle as well as several consequences of this principle are proved. One of the main results of this chapter is that the infinite horizon optimal feedback law asymptotically stabilizes the system and that the infinite horizon optimal value function is a Lyapunov function for the closed loop system. Motivated by this property we formulate a relaxed version of the dynamic programming principle, which allows us to prove stability and suboptimality results for nonoptimal feedback laws without using the optimal value function. A practical version of this principle is provided as well. These results will be central in the following chapters for the stability and performance analysis of NMPC algorithms. For the special case of sampled-data systems we finally show that for suitable integral costs asymptotic stability of the continuous time sampled data closed loop system follows from the asymptotic stability of the associated discrete time system.
4.1 Definition and Well Posedness of the Problem
For the finite horizon optimal control problems from the previous chapter we can define infinite horizon counterparts by replacing the upper limits N−1 in the respective sums by ∞. Since for this infinite horizon formulation the terminal state x_u(N) vanishes from the problem, it is not reasonable to consider terminal constraints. Furthermore, we will not consider any weights in the infinite horizon case. Hence, the most general infinite horizon problem we consider is the following:
Here, the function ℓ is as in (3.8), i.e., it penalizes the distance to a (possibly time varying) reference trajectory x^ref. We optimize over the set of admissible control sequences \(\mathbb{U}^{\infty}(x_{0})\) defined in Definition 3.2 and assume that this set is nonempty for all \(x_{0}\in \mathbb{X}\), which is equivalent to the viability of \(\mathbb{X}\) according to Assumption 3.3. In order to keep the presentation self-contained, all subsequent statements are formulated for a general time varying reference x^ref. In the special case of a constant reference x^ref ≡ x_∗ the running cost ℓ and the functional J_∞ in (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) do not depend on the time n.
Similar to Definition 3.14 we define the optimal value function and optimal trajectories.
Definition 4.1
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with initial value \(x_{0}\in \mathbb{X}\) and time instant n ∈ ℕ0.
-
(i)
The function
$$V_\infty(n,x_0) := \inf_{u(\cdot)\in \mathbb{U}^\infty(x_0)} J_\infty\bigl(n,x_0,u(\cdot)\bigr)$$ is called the optimal value function.
-
(ii)
A control sequence \(u^{\star}(\cdot)\in \mathbb{U}^{\infty}(x_{0})\) is called an optimal control sequence for x_0 if
$$V_\infty(n,x_0) = J_\infty\bigl(n,x_0,u^\star(\cdot)\bigr)$$ holds. The corresponding trajectory \(x_{u^{\star}}(\cdot,x_{0})\) is called an optimal trajectory.
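For an unconstrained scalar linear system with quadratic running cost, the infimum in Definition 4.1(i) can actually be evaluated. The following sketch is not taken from this chapter; the system x⁺ = ax + u with a = 1.2 and ℓ(x,u) = x² + u² is an illustrative LQR special case for which the quadratic ansatz V(x) = px² turns the computation of V_∞ into a scalar fixed-point (Riccati) iteration:

```python
# Approximate V_inf for the scalar system x+ = a*x + u with running cost
# l(x,u) = x^2 + u^2 (an illustrative LQR special case, not from the text).
# With the ansatz V_k(x) = p_k * x^2 the minimization over u in the value
# iteration V_{k+1}(x) = min_u [ l(x,u) + V_k(a*x + u) ] is explicit and
# the iteration reduces to a scalar Riccati recursion for p_k.
a = 1.2                    # unstable open loop: |a| > 1

p = 0.0                    # start from V_0 = 0
for _ in range(200):
    p = 1 + a**2 * p - (a * p)**2 / (1 + p)

# The limit solves p = 1 + a^2*p - (a*p)^2/(1+p), i.e. p^2 = 1 + a^2*p,
# so V_inf(x) = p*x^2 with p ~ 1.9522 for a = 1.2.
print(p)
```

In this example the infimum is attained, and an optimal control sequence in the sense of Definition 4.1(ii) is generated by the linear feedback u = −ap/(1+p)·x.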
Since now, in contrast to the finite horizon problem, an infinite sum appears in the definition of J_∞, it is no longer straightforward that V_∞ is finite. In order to ensure that this is the case, the following definition is helpful.
Definition 4.2
Consider the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). We say that the system is (uniformly) asymptotically controllable to x^ref if there exists a function \(\beta\in \mathcal{KL}\) such that for each initial time n_0 ∈ ℕ0 and each admissible initial value \(x_{0}\in \mathbb{X}\) there exists an admissible control sequence \(u\in \mathbb{U}^{\infty}(x_{0})\) such that the inequality
holds for all n ∈ ℕ0. We say that this asymptotic controllability has the small control property if \(u\in \mathbb{U}^{\infty}(x_{0})\) can be chosen such that the inequality
holds for all n ∈ ℕ0. Here, as in Sect. 2.3, we write \(|x_{1}|_{x_{2}}= d_{X}(x_{1},x_{2})\) and \(|u_{1}|_{u_{2}}= d_{U}(u_{1},u_{2})\).
Observe that uniform asymptotic controllability is a necessary condition for uniform feedback stabilization. Indeed, if we assume asymptotic stability of the closed-loop system x^+ = g(n,x) = f(x,μ(n,x)), then we immediately get asymptotic controllability with control u(n) = μ(n+n_0, x(n+n_0, n_0, x_0)). The small control property, however, is not satisfied in general.
In order to use Definition 4.2 for deriving bounds on the optimal value function, we need a result known as Sontag’s \(\mathcal{KL}\)-Lemma [24, Proposition 7]. This proposition states that for each \(\mathcal{KL}\)-function β there exist functions \(\gamma_{1},\gamma_{2}\in \mathcal{K}_{\infty}\) such that the inequality
holds for all r, n ≥ 0 (in fact, the result holds for real n ≥ 0 but we only need it for integers here). Using the functions γ_1 and γ_2 we can define running cost functions
for λ ≥ 0. The following theorem states that under Definition 4.2 this running cost ensures (uniformly) finite upper and positive lower bounds on V_∞.
Theorem 4.3
Consider the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). If the system is asymptotically controllable to x^ref, then there exist \(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\) such that the optimal value function V_∞ corresponding to the cost function \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) from (4.3) with λ = 0 satisfies
for all n_0 ∈ ℕ0 and all \(x_{0}\in \mathbb{X}\).
If, in addition, the asymptotic controllability has the small control property, then the statement also holds for ℓ from (4.3) with arbitrary λ ≥ 0.
Proof
For each x_0, n_0 and \(u\in \mathbb{U}^{\infty}(x_{0})\) we get
for each λ ≥ 0. Hence, from the definition of V_∞ we get
This proves the lower bound in (4.4) for \(\alpha_{1} = \gamma_{1}^{-1}\).
For proving the upper bound, we first consider the case λ = 0. For all n_0 and x_0 the control \(u\in \mathbb{U}^{\infty}(x_{0})\) from Definition 4.2 yields
i.e., the upper inequality from (4.4) with α_2(r) = eγ_2(r)/(e−1). If the small control property holds, then the upper bound for λ > 0 follows similarly with α_2(r) = (1+λ)eγ_2(r)/(e−1). □
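The two ingredients of this proof, the Sontag decomposition of β and the geometric sum producing the factor e/(e−1), can be checked numerically. The sketch below uses an illustrative exponential β(r,n) = Ce^{−σn}r (C = 2, σ = 0.5 are arbitrary choices, not from the text), for which γ_1(r) = r^{1/σ} and γ_2(s) = Cs^σ provide an explicit decomposition:

```python
import math

# Sontag's KL-lemma checked on a concrete exponential beta (illustrative):
# beta(r,n) = C*exp(-sigma*n)*r is bounded by gamma2(exp(-n)*gamma1(r))
# with gamma1(r) = r**(1/sigma) and gamma2(s) = C*s**sigma, both K_infinity.
C, sigma = 2.0, 0.5

def beta(r, n):   return C * math.exp(-sigma * n) * r
def gamma1(r):    return r ** (1 / sigma)
def gamma2(s):    return C * s ** sigma

for r in (0.1, 1.0, 7.5):
    for n in range(20):
        assert beta(r, n) <= gamma2(math.exp(-n) * gamma1(r)) + 1e-12

# The factor e/(e-1) in the upper bound comes from summing the resulting
# geometric decay along the trajectory: sum_{n>=0} exp(-n) = e/(e-1).
assert abs(sum(math.exp(-n) for n in range(60)) - math.e / (math.e - 1)) < 1e-12
print("decomposition verified")
```

For this particular β the decomposition is tight (equality holds for every r and n), which is why the exponent 1/σ in γ_1 is exactly compensated by the exponent σ in γ_2.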
In fact, the specific form (4.3) is just one possible choice of ℓ for which this theorem holds. It is rather easy to extend the result to any ℓ which is bounded from below by some \(\mathcal{K}_{\infty}\)-function in x (uniformly for all u and n) and bounded from above by ℓ from (4.3) in balls \(\mathcal{B}_{\varepsilon }(x^{\mathrm{ref}}(n))\). Since, however, the choice of appropriate cost functions ℓ for infinite horizon optimal control problems is not a central topic of this book, we leave this extension to the interested reader.
4.2 The Dynamic Programming Principle
In this section we essentially restate and reprove the results from Sect. 3.4 for the infinite horizon case. We begin with the dynamic programming principle for the infinite horizon problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)). Throughout this section we assume that V_∞(n,x) is finite for all \(x\in \mathbb{X}\) and all n ∈ ℕ0, as ensured, e.g., by Theorem 4.3.
Theorem 4.4
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with \(x_{0}\in \mathbb{X}\) and n ∈ ℕ0. Then for all K ∈ ℕ the equation
holds. If, in addition, an optimal control sequence u⋆(⋅) exists for x_0, then we get the equation
In particular, in this case the “inf” in (4.5) is a “min”.
Proof
From the definition of J_∞, for \(u(\cdot)\in \mathbb{U}^{\infty}(x_{0})\) we immediately obtain
where u(⋅+K) denotes the shifted control sequence defined by u(⋅+K)(k) = u(k+K), which is admissible for x_u(K,x_0).
We now prove (4.5) by showing “≥” and “≤” separately: From (4.7) we obtain
Since this inequality holds for all \(u(\cdot)\in \mathbb{U}^{\infty}(x_{0})\), it also holds when taking the infimum on both sides. Hence we get
i.e., (4.5) with “≥”.
In order to prove “≤”, fix ε > 0 and let u^ε(⋅) be an approximately optimal control sequence for the right hand side of (4.7), i.e.,
Now we decompose \(u(\cdot)\in \mathbb{U}^{\infty}(x_{0})\) analogously to Lemma 3.12(ii) and (iii) into \(u_{1}\in \mathbb{U}^{K}(x_{0})\) and \(u_{2}\in \mathbb{U}^{\infty}(x_{u_{1}}(K,x_{0}))\) via
This implies
Now (4.7) yields
i.e.,
Since ε > 0 was arbitrary and the expressions in this inequality are independent of ε, this inequality also holds for ε = 0, which shows (4.5) with “≤” and thus (4.5).
In order to prove (4.6) we use (4.7) with u(⋅) = u⋆(⋅). This yields
where we used the (already proved) equality (4.5) in the last step. Hence, the two “≥” in this chain are actually “=”, which implies (4.6). □
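Equation (4.5) with K = 1 can be verified numerically in a special case. The sketch below uses the illustrative scalar LQR example x⁺ = ax + u with a = 1.2 and ℓ(x,u) = x² + u² (an assumption for demonstration, not from the text), for which V_∞(x) = px² with p the positive root of p² = 1 + a²p:

```python
import math

# Numerical check of the dynamic programming principle (4.5) with K = 1
# for the illustrative scalar LQR example x+ = a*x + u, l(x,u) = x^2 + u^2.
a = 1.2
p = (a**2 + math.sqrt(a**4 + 4)) / 2   # positive root of p^2 = 1 + a^2*p

def V(x):                              # V_inf(x) = p*x^2 in this example
    return p * x**2

for x in (-2.0, -0.5, 0.3, 1.7):
    # evaluate min_u [ l(x,u) + V(f(x,u)) ] over a fine control grid
    rhs = min(x**2 + u**2 + V(a * x + u)
              for u in (i / 1000 - 5 for i in range(10001)))
    assert abs(rhs - V(x)) < 1e-3      # V satisfies (4.5) up to grid error
print("dynamic programming principle verified at sample states")
```

Here the grid minimization stands in for the infimum over \(\mathbb{U}^{1}(x_{0})\); since the cost is strictly convex in u, the grid value is accurate to roughly the square of the grid spacing.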
The following corollary states an immediate consequence of the dynamic programming principle. It shows that tails of optimal control sequences are again optimal control sequences for suitably adjusted initial value and time.
Corollary 4.5
If u⋆(⋅) is an optimal control sequence for (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with initial value x_0 and initial time n, then for each K ∈ ℕ the sequence \(u^{\star}_{K}(\cdot)=u^{\star}(\cdot+K)\), i.e.,
is an optimal control sequence for initial value \(x_{u^{\star}}(K,x_{0})\) and initial time n + K.
Proof
Inserting V_∞(n,x_0) = J_∞(n,x_0,u⋆(⋅)) and the definition of \(u_{K}^{\star}(\cdot)\) into (4.7) we obtain
Subtracting (4.6) from this equation yields
which shows the assertion. □
The next two results are the analogs of Theorem 3.17 and Corollary 3.18 in the infinite horizon setting.
Theorem 4.6
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with \(x_{0}\in \mathbb{X}\) and n ∈ ℕ0 and assume that an optimal control sequence u⋆(⋅) exists. Then the feedback law μ_∞(n,x_0) = u⋆(0) satisfies
and
where in (4.8), as usual, we interpret \(\mathbb{U}^{1}(x_{0})\) as a subset of U, i.e., we identify the one element sequence u = u(⋅) with its only element u = u(0).
Proof
The proof is identical to the finite horizon counterpart Theorem 3.17. □
As in the finite horizon case, the following corollary shows that the feedback law (4.8) can be used in order to construct the optimal control sequence.
Corollary 4.7
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with \(x_{0}\in \mathbb{X}\) and n ∈ ℕ0 and consider an admissible feedback law \(\mu_{\infty}:\mathbb{N}_{0}\times \mathbb{X}\to U\) in the sense of Definition 3.2(iv). Denote the solution of the closed-loop system
by \(x_{\mu_{\infty}}\) and assume that μ_∞ satisfies (4.8) for initial values \(x_{0}=x_{\mu_{\infty}}(k)\) for all k = 0, 1, …. Then
is an optimal control sequence for initial time n and initial value x_0, and the solution of the closed-loop system (4.10) is a corresponding optimal trajectory.
Proof
From (4.11) for x(n) from (4.10) we immediately obtain
Hence we need to show that
where it is enough to show “≥” because the opposite inequality follows by definition of V_∞. Using (4.11) and (4.9) we get
for k = 0, 1, …. Summing these equalities for k = 0, …, K−1 for arbitrary K ∈ ℕ and eliminating the identical terms V_∞(n+k,x_0), k = 1, …, K−1, on the left and on the right we obtain
Since the sum is monotone increasing in K and bounded from above, for K → ∞ the right hand side converges to J_∞(n,x_0,u⋆), showing the assertion. □
Corollary 4.7 implies that infinite horizon optimal control is nothing but NMPC with N = ∞: Formula (4.11) for k = 0 yields that if we replace the optimization problem (\(\mathrm{OCP}_{\mathrm{N}}^{\mathrm{n}}\)) in Algorithm 3.7 by (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)), then the feedback law resulting from this algorithm equals μ_∞. The following theorem shows that this infinite horizon NMPC-feedback law yields an asymptotically stable closed loop and thus solves the stabilization and tracking problem.
Theorem 4.8
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the inequalities
hold for all \(x\in \mathbb{X}\), n ∈ ℕ0 and u ∈ U. Assume furthermore that an optimal feedback μ_∞ exists, i.e., an admissible feedback law \(\mu_{\infty}:\mathbb{N}_{0}\times \mathbb{X}\to U\) satisfying (4.8) for all n ∈ ℕ0 and all \(x\in \mathbb{X}\). Then this optimal feedback asymptotically stabilizes the closed-loop system
on \(\mathbb{X}\) in the sense of Definition 2.16.
Proof
For the closed-loop system, (4.9) and the last inequality in (4.12) yield
Together with the first two inequalities in (4.12) this shows that V_∞ is a Lyapunov function on \(\mathbb{X}\) in the sense of Definition 2.21 with α_V = α_3. Thus, Theorem 2.22 yields asymptotic stability on \(\mathbb{X}\). □
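The Lyapunov property established in this proof can be observed directly in a special case. The following sketch again uses the illustrative scalar LQR example x⁺ = ax + u with a = 1.2 and ℓ(x,u) = x² + u² (not from the text): along the closed loop driven by the optimal feedback, V_∞(x) = px² strictly decreases and the state converges to the reference x_∗ = 0:

```python
import math

# Closed-loop simulation for the illustrative scalar LQR example
# x+ = a*x + u, l(x,u) = x^2 + u^2.  The optimal feedback is linear,
# mu_inf(x) = -K*x, and V_inf(x) = p*x^2 acts as a Lyapunov function.
a = 1.2
p = (a**2 + math.sqrt(a**4 + 4)) / 2   # V_inf(x) = p*x^2
K = a * p / (1 + p)                    # minimizer of u^2 + p*(a*x + u)^2

x = 2.0
values = [p * x**2]
for _ in range(30):
    x = a * x - K * x                  # x+ = f(x, mu_inf(x))
    values.append(p * x**2)

# V_inf decreases strictly along the closed loop (Lyapunov property) ...
assert all(v2 < v1 for v1, v2 in zip(values, values[1:]))
print(abs(x))                          # ... and the state converges to 0
```

The closed-loop dynamics here are x⁺ = (a − K)x with |a − K| = a/(1+p) ≈ 0.41 < 1, which makes the geometric decrease of V_∞ explicit.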
By Theorem 4.3 we can replace (4.12) by the asymptotic controllability condition from Definition 4.2 if ℓ is of the form (4.3). This is used in the following corollary in order to give a stability result without explicitly assuming (4.12).
Corollary 4.9
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that the system is asymptotically controllable to x^ref and that an optimal feedback μ_∞, i.e., a feedback satisfying (4.8), exists for the cost function \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) from (4.3) with λ = 0. Then this optimal feedback asymptotically stabilizes the closed-loop system
on \(\mathbb{X}\) in the sense of Definition 2.16.
If, in addition, the asymptotic controllability has the small control property, then the statement also holds for ℓ from (4.3) with arbitrary λ ≥ 0.
Proof
Theorem 4.3 yields
for suitable \(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\). Furthermore, by (4.3) the third inequality in (4.12) holds with \(\alpha_{3}=\gamma_{1}^{-1}\). Hence, (4.12) holds and Theorem 4.8 yields asymptotic stability on \(\mathbb{X}\). □
4.3 Relaxed Dynamic Programming
The last results of the previous section show that infinite horizon optimal control can be used in order to derive a stabilizing feedback law. Unfortunately, a direct solution of infinite horizon optimal control problems is in general impossible, both analytically and numerically. Still, infinite horizon optimal control plays an important role in our analysis since we will interpret the model predictive control algorithm as an approximation of the infinite horizon optimal control problem. Here the term “approximation” is not necessarily to be understood in the sense of “being close to” (although this aspect is not excluded) but rather in the sense of “sharing the important structural properties”.
Looking at the proof of Theorem 4.8 we see that the important property for stability is the inequality
which follows from the feedback version (4.9) of the dynamic programming principle. Observe that although (4.9) yields equality, only this inequality is needed in the proof of Theorem 4.8.
This observation motivates a relaxed version of this dynamic programming inequality which on the one hand yields asymptotic stability and on the other hand provides a quantitative measure of the closed-loop performance of the system. This relaxed version will be formulated in Theorem 4.11, below. In order to quantitatively measure the closed-loop performance, we use the infinite horizon cost functional evaluated along the closed-loop trajectory which we define as follows.
Definition 4.10
Let \(\mu:\mathbb{N}_{0}\times \mathbb{X}\to U\) be an admissible feedback law. For the trajectories x_μ(n) of the closed-loop system x^+ = f(x,μ(n,x)) with initial value \(x_{\mu}(n_{0})=x_{0}\in \mathbb{X}\) we define the infinite horizon cost as
Since by (3.8) our running cost ℓ is always nonnegative, either the infinite sum has a well defined finite value or it diverges to infinity, in which case we write J_∞(n_0,x_0,μ) = ∞.
By Corollary 4.7, for the infinite horizon optimal feedback law μ_∞ we obtain
while for all other admissible feedback laws μ we get
In other words, V_∞ is a lower bound for J_∞(n_0,x_0,μ) which is attained for μ = μ_∞.
The following theorem now gives a relaxed dynamic programming condition from which we can derive both asymptotic stability and an upper bound on the infinite horizon cost J_∞(n_0,x_0,μ) for an arbitrary admissible feedback law μ.
Theorem 4.11
Consider a running cost \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) and a function \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\). Let \(\mu:\mathbb{N}_{0}\times \mathbb{X}\to U\) be an admissible feedback law and let \(S(n)\subseteq \mathbb{X}\), n ∈ ℕ0, be a family of forward invariant sets for the closed-loop system
Assume there exists α ∈ (0,1] such that the relaxed dynamic programming inequality
holds for all n ∈ ℕ0 and all x ∈ S(n). Then the suboptimality estimate
holds for all n ∈ ℕ0 and all x ∈ S(n).
If, in addition, there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the inequalities
hold for all \(x\in \mathbb{X}\), n ∈ ℕ0, u ∈ U and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0} \to \mathbb{X}\), then the closed-loop system (4.13) is asymptotically stable on S(n) in the sense of Definition 2.16.
Proof
In order to prove (4.15), consider n ∈ ℕ0, x ∈ S(n) and the trajectory x_μ(⋅) of (4.13) with x_μ(n) = x. By forward invariance of the sets S(n) this trajectory satisfies x_μ(n+k) ∈ S(n+k). Hence, from (4.14) for all k ∈ ℕ0 we obtain
Summing over k yields for all K ∈ ℕ
since V(n+K, x_μ(n+K)) ≥ 0 and x_μ(n) = x. Since the running cost ℓ is nonnegative, the term on the left is monotone increasing and bounded, hence for K → ∞ it converges to αJ_∞(n,x,μ). Since the right hand side is independent of K, this yields (4.15).
The stability assertion now immediately follows by observing that V satisfies all assumptions of Theorem 2.22 with α_V = αα_3. □
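Inequality (4.14) and the estimate (4.15) can be made concrete for a deliberately suboptimal feedback. The following sketch uses the illustrative scalar example x⁺ = ax + u with a = 1.2 and ℓ(x,u) = x² + u² (an assumption for demonstration, not from the text), the deadbeat law μ(x) = −ax and V = V_∞(x) = px² with p² = 1 + a²p:

```python
import math

# Relaxed dynamic programming (Theorem 4.11) for a deliberately suboptimal
# feedback on the illustrative scalar example x+ = a*x + u, l(x,u) = x^2 + u^2.
# The deadbeat law mu(x) = -a*x steers the state to 0 in one step.
a = 1.2
p = (a**2 + math.sqrt(a**4 + 4)) / 2       # V(x) = V_inf(x) = p*x^2

# Largest alpha satisfying (4.14): V(x) - V(f(x,mu(x))) >= alpha*l(x,mu(x)).
# Here f(x,mu(x)) = 0, so p*x^2 >= alpha*(1 + a^2)*x^2 for all x, i.e.
alpha = p / (1 + a**2)
assert 0 < alpha <= 1                      # alpha ~ 0.80 for a = 1.2

# Closed-loop cost J_inf(x, mu): one step of cost, then the state is 0.
x0 = 3.0
J_mu = x0**2 + (a * x0)**2                 # = (1 + a^2) * x0^2

# Suboptimality estimate (4.15): alpha * J_mu <= V(x0) (tight here).
assert alpha * J_mu <= p * x0**2 + 1e-9
print(alpha)
```

So the deadbeat law is guaranteed to achieve at least the fraction α ≈ 0.80 of the optimal performance, and in this example the estimate (4.15) is an equality.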
Remark 4.12
An inspection of the proofs of Theorems 2.19 and 2.22 reveals that for fixed \(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\) and α_V = αα_3 with fixed \(\alpha_{3}\in \mathcal{K}_{\infty}\) and varying α ∈ (0,1], the attraction rate \(\beta\in \mathcal{KL}\) constructed in this proof depends on α in the following way: if β_α and β_α′ are the attraction rates from Theorem 2.22 for α_V = αα_3 and α_V = α′α_3, respectively, with α′ ≥ α, then β_α′(r,t) ≤ β_α(r,t) holds for all r, t ≥ 0. This in particular implies that for every \(\bar{\alpha}\in(0,1)\) the attraction rate \(\beta_{\bar{\alpha}}\) is also an attraction rate for all \(\alpha\in[\bar{\alpha},1]\), i.e., we can find an attraction rate \(\beta\in \mathcal{KL}\) which is independent of \(\alpha\in[\bar{\alpha},1]\).
Remark 4.13
Theorem 4.11 proves asymptotic stability of the discrete time closed-loop system (4.13) or (2.5). For a sampled data system (2.8) with sampling period T > 0 this implies the discrete time stability estimate (2.47) for the sampled data closed-loop system (2.30). For sampled data systems we may define the running cost ℓ as an integral over a function L according to (3.4), i.e.,
We show that for this choice of ℓ a mild condition on L ensures that the sampled data closed-loop system (2.30) is also asymptotically stable in the continuous time sense, i.e., that (2.48) holds. For simplicity, we restrict ourselves to a time invariant reference x^ref ≡ x_∗.
The condition we use is that there exists \(\delta\in \mathcal{K}_{\infty}\) such that the vector field f_c in (2.6) satisfies
for all x ∈ X, all u ∈ U and all ε > 0. For instance, in a linear–quadratic problem with X = ℝ^d, U = ℝ^m and x_∗ = 0 we have ‖f_c(x,u)‖ = ‖Ax+Bu‖ ≤ C_1(‖x‖+‖u‖) and L(x,u) = x^⊤Qx + u^⊤Ru ≥ C_2(‖x‖+‖u‖)^2 for suitable constants C_1, C_2 > 0, provided Q and R are positive definite. In this case, (4.16) holds with \(\delta(r) = C_{1}^{2}/C_{2}\, r\), since ‖f_c(x,u)‖ > ε implies C_1(‖x‖+‖u‖) > ε and thus
In the general nonlinear case, (4.16) holds if f_c is continuous with f_c(x_∗,u_∗) = 0, L(x,u) is positive definite and the inequality ‖f_c(x,u)‖ ≤ CL(x,u) holds for some constant C > 0 whenever ‖f_c(x,u)‖ is sufficiently large.
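The linear–quadratic computation above can be spot-checked numerically. The following sketch uses illustrative data (the matrices A, B and the choice Q = R = I are assumptions for demonstration, not from the text); C_1 is taken as the larger Frobenius norm of A and B, and C_2 = 1/2 works for Q = R = I since ‖x‖² + ‖u‖² ≥ (‖x‖+‖u‖)²/2:

```python
import math
import random

# Spot check of condition (4.16) in the linear-quadratic case with
# illustrative data A, B and Q = R = I.  Then ||f_c(x,u)|| <= C1*(||x||+||u||)
# and L(x,u) >= C2*(||x||+||u||)^2, so delta(r) = (C1^2/C2)*r gives
# ||f_c(x,u)|| > eps  =>  eps < delta(L(x,u)/eps), i.e. eps^2 < (C1^2/C2)*L.
A = [[0.0, 1.0], [-2.0, 0.5]]
B = [[0.0], [1.0]]
C1 = max(math.sqrt(sum(a * a for row in A for a in row)),
         math.sqrt(sum(b * b for row in B for b in row)))
C2 = 0.5

def f_c(x, u):
    return [A[i][0] * x[0] + A[i][1] * x[1] + B[i][0] * u for i in range(2)]

def L(x, u):                      # Q = R = identity
    return x[0]**2 + x[1]**2 + u**2

random.seed(0)
for _ in range(10000):
    x = [random.uniform(-3, 3), random.uniform(-3, 3)]
    u = random.uniform(-3, 3)
    eps = random.uniform(1e-6, 5.0)
    if math.hypot(f_c(x, u)[0], f_c(x, u)[1]) > eps:
        assert eps**2 < (C1**2 / C2) * L(x, u)
print("condition (4.16) holds on all samples")
```

The Frobenius norm is only an upper bound for the spectral norm, so this C_1 is conservative; any valid pair (C_1, C_2) yields a valid δ.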
We now show that (4.16) together with Theorem 4.11 implies the continuous time stability estimate (2.48). If the assumptions of Theorem 4.11 hold, then (4.15) implies \(\ell(x,\mu(x)) \le V(x)/\alpha \le \alpha_{2}(|x|_{x_{*}})/\alpha\). Thus, for t ∈ [0,T] Inequality (4.16) yields
Setting \(\varepsilon =\tilde{\gamma}(|x|_{x_{*}})\) with
for r > 0 and \(\tilde{\gamma}(0)=0\) yields \(\tilde{\gamma}\in \mathcal{K}_{\infty}\) and
Hence, defining
we finally obtain
for all t ∈ [0,T] with \(\gamma\in \mathcal{K}_{\infty}\).
Hence, if (4.16) and the assumptions of Theorem 4.11 hold, then the sampled data closed-loop system (2.30) fulfills the uniform boundedness over T property from Definition 2.24 and consequently, by Theorem 2.27, the sampled data closed-loop system (2.30) is asymptotically stable.
We now turn to investigating practical stability. Recalling Definitions 2.15 and 2.17 of P-practical asymptotic stability and their Lyapunov function characterizations in Theorems 2.20 and 2.23, we can formulate the following practical version of Theorem 4.11.
Theorem 4.14
Consider a running cost \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) and a function \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\). Let \(\mu:\mathbb{N}_{0}\times \mathbb{X}\to U\) be an admissible feedback law and let \(S(n)\subseteq \mathbb{X}\) and P(n) ⊂ S(n), n ∈ ℕ0, be families of forward invariant sets for the closed-loop system (4.13).
Assume there exists α ∈ (0,1] such that the relaxed dynamic programming inequality (4.14) holds for all n ∈ ℕ0 and all x ∈ S(n)∖P(n). Then the suboptimality estimate
holds for all n ∈ ℕ0 and all x ∈ S(n), where k^∗ ∈ ℕ0 is the minimal time with x_μ(k^∗+n, n, x) ∈ P(k^∗+n) and
is the truncated closed-loop performance functional from Definition 4.10.
If, in addition, there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the inequalities
hold for all \(x\in \mathbb{X}\), n ∈ ℕ0 and u ∈ U and a reference \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\), then the closed-loop system (4.13) is P-asymptotically stable on S(n) in the sense of Definition 2.17.
Proof
The proof follows by arguments analogous to the proof of Theorem 4.11, considering only k < k^∗ in the first part and using Theorem 2.23 with Y(n) = S(n) instead of Theorem 2.22 in the second part. □
Remark 4.15
-
(i)
Note that Remark 4.12 holds accordingly for Theorem 4.14. Furthermore, it is easily seen that both Theorem 4.11 and Theorem 4.14 remain valid if f in (4.13) depends on n.
-
(ii)
The suboptimality estimate (4.17) states that the closed-loop trajectories x_μ(⋅,x) from (4.13) behave like suboptimal trajectories until they reach the sets P(⋅).
As a consequence of Theorem 4.11, we can show the existence of a stabilizing, almost optimal infinite horizon feedback even if no infinite horizon optimal feedback exists. The assumptions of the following Theorem 4.16 are identical to those of Theorem 4.8, except that we do not assume the existence of an infinite horizon optimal feedback law μ_∞.
Theorem 4.16
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with running cost ℓ of the form (3.8) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the Inequalities (4.12) hold for all \(x\in \mathbb{X}\), n ∈ ℕ0 and u ∈ U.
Then for each α ∈ (0,1) there exists an admissible feedback \(\mu_{\alpha}:\mathbb{N}_{0}\times \mathbb{X}\to U\) which asymptotically stabilizes the closed-loop system
on \(\mathbb{X}\) in the sense of Definition 2.16 and satisfies
for all \(x\in \mathbb{X}\) and n ∈ ℕ0.
Proof
Fix α ∈ (0,1) and pick an arbitrary \(x\in \mathbb{X}\). From (4.5) for K = 1, for each \(x\in \mathbb{X}\) and each ε > 0 there exists \(u_{x}^{\varepsilon }\in \mathbb{U}^{1}(x)\) with
If V_∞(n,x) > 0, then (4.12) implies x ≠ x^ref(n) and thus again (4.12) yields the inequality inf_{u∈U} ℓ(n,x,u) > 0. Hence, choosing ε = (1−α) inf_{u∈U} ℓ(n,x,u) and setting \(\mu_{\alpha}(n,x) = u_{x}^{\varepsilon }\) yields
If V_∞(n,x) = 0, then (4.12) implies x = x^ref(n) and thus from the definition of u^ref we get f(x,u^ref(n)) = x^ref(n+1). Using (4.12) once again gives us V_∞(n+1, f(x,u^ref(n))) = 0 and from (3.8) we get ℓ(n,x,u^ref(n)) = 0. Thus, μ_α(n,x) = u^ref(n) satisfies (4.18). Hence, we obtain (4.14) with V = V_∞ for all \(x\in \mathbb{X}\). In conjunction with (4.12) this implies that all assumptions of Theorem 4.11 are satisfied for V = V_∞ with \(S(n)=\mathbb{X}\). Thus, the assertion follows. □
Again we can replace (4.12) by the asymptotic controllability condition from Definition 4.2.
Corollary 4.17
Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that the system is asymptotically controllable to x^ref and that the cost function \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) is of the form (4.3) with λ = 0. Then for each α ∈ (0,1) there exists an admissible feedback \(\mu_{\alpha}:\mathbb{N}_{0}\times \mathbb{X}\to U\) which asymptotically stabilizes the closed-loop system
on \(\mathbb{X}\) in the sense of Definition 2.16 and satisfies
for all \(x\in \mathbb{X}\) and n ∈ ℕ0.
If, in addition, the asymptotic controllability has the small control property, then the statement also holds for ℓ from (4.3) with arbitrary λ ≥ 0.
Proof
Theorem 4.3 yields
for suitable \(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\). Furthermore, by (4.3) the third inequality in (4.12) holds with \(\alpha_{3}=\gamma_{1}^{-1}\). Hence, (4.12) holds and Theorem 4.16 yields the assertion. □
While Theorem 4.16 and Corollary 4.17 improve on Theorem 4.8 and Corollary 4.9, respectively, in the sense that the existence of an optimal feedback law is not needed, for practical applications both theorems still require the (at least approximate) solution of an infinite horizon optimal control problem, which is in general a hard, often infeasible computational task; see also the discussion in Sect. 4.4, below.
Hence, in the following chapters we are going to use Theorem 4.11 and Theorem 4.14 in a different way: we will derive conditions under which (4.14) is satisfied by the finite horizon optimal value function V = V_N and the corresponding NMPC-feedback law μ = μ_N. The advantage of this approach lies in the fact that in order to compute μ_N(n_0,x_0) it is sufficient to know the finite horizon optimal control sequence u⋆ for initial value x_0. This is a much easier computational task, at least if the optimization horizon N is not too large.
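This receding horizon idea can be sketched in a special case. The code below uses the illustrative scalar example x⁺ = ax + u with a = 1.2 and ℓ(x,u) = x² + u² (an assumption for demonstration, not from the text), for which the finite horizon problem is solved exactly by backward dynamic programming:

```python
# Receding horizon (NMPC) sketch for the illustrative scalar example
# x+ = a*x + u with l(x,u) = x^2 + u^2.  For this LQR case the finite
# horizon problem is solved exactly by backward dynamic programming:
# V_0 = 0 and V_{k+1}(x) = min_u [ l(x,u) + V_k(a*x + u) ] = p_{k+1}*x^2.
# The NMPC feedback mu_N applies the first optimal control u*(0), which
# minimizes l(x,u) + V_{N-1}(a*x + u).
a, N = 1.2, 3

p = 0.0                              # p_0, i.e. V_0 = 0
for _ in range(N - 1):               # backward pass up to V_{N-1}
    p = 1 + a**2 * p - (a * p)**2 / (1 + p)
K = a * p / (1 + p)                  # mu_N(x) = -K*x

x = 2.0
for _ in range(40):
    x = a * x - K * x                # NMPC closed loop x+ = f(x, mu_N(x))

# |a - K| < 1 here, so the NMPC closed loop is asymptotically stable even
# for the short horizon N = 3; the following chapters analyze when the
# relaxed inequality (4.14) guarantees this in general.
print(abs(x))
```

Note that the closed-loop gain K here differs from the infinite horizon optimal one; the point of the relaxed dynamic programming framework is precisely to quantify how much performance such a finite horizon feedback can lose.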
4.4 Notes and Extensions
Infinite horizon optimal control is a classical topic in control theory. The version presented in Sect. 4.1 can be seen as a nonlinear generalization of the classical (discrete time) linear–quadratic regulator (LQR) problem, see, e.g., Dorato and Levis [6]. A rather general existence result for optimal control sequences and trajectories in the metric space setting considered here was given by Keerthi and Gilbert [15]. Note, however, that by Theorem 4.16 we do not need the existence of optimal controls for the existence of almost optimal stabilizing feedback controls.
Dynamic programming as introduced in Sect. 4.2 is a very common approach also for infinite horizon optimal control and we refer to the discussion in Sect. 3.5 for some background information. As in the finite horizon case, the monographs of Bertsekas [2,3] provide a good source for more information on this method.
The connection between infinite horizon optimal control and stabilization problems for nonlinear systems has been recognized for quite a while. Indeed, the well known construction of control Lyapunov functions in continuous time by Sontag [23] is based on techniques from infinite horizon optimal control. As already observed after Corollary 4.7, discrete time infinite horizon optimal control is nothing but NMPC with N = ∞. This has led to the investigation of infinite horizon NMPC algorithms, e.g., by Keerthi and Gilbert [16], Meadows and Rawlings [19], and Alamir and Bornard [1]. For linear systems, this approach was also considered in the monograph of Bitmead, Gevers and Wertz [4].
The stability results in this chapter are easily generalized to the stability of sets \(X^{\mathrm{ref}}(n)\subset \mathbb{X}\) when ℓ is of the form (3.24). In this case, it suffices to replace the bounds \(\alpha_{j}(|x|_{x^{\mathrm{ref}}(n)})\), j = 1, 2, 3, in, e.g., Theorem 4.11 by bounds of the form
Alternatively, one could formulate these bounds via so-called proper indicator functions as used, e.g., by Grimm et al. in [8].
By Formula (4.8), the optimal (and stabilizing) feedback law μ_∞ can be computed by solving a rather simple optimization problem once the optimal value function V_∞ is known. This has motivated a variety of approaches for solving the dynamic programming equation (4.5) (usually for K = 1) numerically in order to obtain an approximation of μ_∞ from a numerical approximation of V_∞. Approximation techniques like linear and multilinear approximations are proposed, e.g., in Kreisselmeier and Birkhölzer [17], Camilli, Grüne and Wirth [5] or by Falcone [7]. A set oriented approach was developed in Junge and Osinga [14] and used for computing stabilizing feedback laws in Grüne and Junge [10] (see also [11, 12] for further improvements of this method). All such methods, however, suffer from the so-called curse of dimensionality, which means that the numerical effort grows exponentially with the dimension of the state space X. In practice, this means that these approaches can only be applied to low-dimensional systems, typically of dimension no higher than 4–5. For homogeneous systems, Tuna [25] (see also Grüne [9]) observed that it is sufficient to compute V_∞ on a sphere, which reduces the dimension of the problem by one. Still, this only slightly reduces the computational burden. In contrast to this, a numerical approximation of the optimal control sequence u⋆ for finite horizon optimal control problems like (OCP_N) and its variants is possible also in rather high space dimensions, at least when the optimization horizon N is not too large. This makes the NMPC approach computationally attractive.
Relaxed dynamic programming in the form introduced in Sect. 4.3 was originally developed by Lincoln and Rantzer [18] and Rantzer [20] in order to lower the computational complexity of numerical dynamic programming approaches. Instead of trying to solve the dynamic programming equation (4.5) exactly, it is only solved approximately using numerical approximations of V_∞ from a suitable class of functions, e.g., polynomials. The idea of using such relaxations is classical and can be realized in various other ways, too; see, e.g., [2, Chap. 6]. Here we use relaxed dynamic programming not for solving (4.5) but rather for proving properties of closed-loop solutions, cf. Theorems 4.11 and 4.14. While the specific form of the assumptions in these theorems was first used in an NMPC context in Grüne and Rantzer [13], the conceptual idea is actually older and can be found, e.g., in Shamma and Xiong [22] or in Scokaert, Mayne and Rawlings [21]. The fact that stability of the sampled data closed loop can be derived from the stability of the associated discrete time system for integral costs (3.4), cf. Remark 4.13, was, to the best of our knowledge, not observed before.
4.5 Problems
1. Consider the problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with finite optimal value function \(V_{\infty}:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\) and asymptotically stabilizing admissible optimal feedback law \(\mu_{\infty}:\mathbb{N}_{0}\times \mathbb{X}\to U\). Let \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\) be a function which satisfies
$$V(n,x_0)= \min_{u\in \mathbb{U}^1(x_0)} \bigl\{ \ell(n,x_0,u) + V\bigl(n+1,f(x_0,u)\bigr) \bigr\} \tag{4.20}$$
for all \(n\in \mathbb{N}_{0}\) and all \(x_0\in X\).
(a) Prove that \(V(n,x)\ge V_{\infty}(n,x)\) holds for all \(n\in \mathbb{N}_{0}\) and all \(x\in \mathbb{X}\).
(b) Prove that for the optimal feedback law the inequality
$$V(n,x) - V_\infty(n,x) \le V\bigl(n+1,f\bigl(x,\mu_\infty(n,x)\bigr)\bigr) - V_\infty\bigl(n+1,f\bigl(x,\mu_\infty(n,x)\bigr)\bigr)$$
holds for all \(n\in \mathbb{N}_{0}\) and all \(x\in \mathbb{X}\).
(c) Assume in addition that there exists \(\alpha_{2}\in \mathcal{K}_{\infty}\) such that the inequality
$$V(n,x) \le \alpha_2\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr)$$
holds for all \(n\in \mathbb{N}_{0}\), \(x\in \mathbb{X}\) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0} \to \mathbb{X}\). Prove that under this condition \(V(n,x)=V_{\infty}(n,x)\) holds for all \(n\in \mathbb{N}_{0}\) and all \(x\in \mathbb{X}\).
(d) Find a function \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\) satisfying (4.20) for which \(V(n,x)=V_{\infty}(n,x)\) does not hold. Of course, for this function the additional condition on V from (c) must be violated.
Hint for (a): Define a feedback μ which assigns to each pair (n,x) a minimizer of the right hand side of (4.20), check that Theorem 4.11 is applicable for \(S(n)=\mathbb{X}\) (for which \(\alpha\in(0,1]\)?) and conclude the desired inequality from (4.15).
Hint for (c): Perform an induction over the inequality from (b) along the optimal closed-loop trajectory.
2. Consider the unconstrained linear control system
$$x^+ = Ax + Bu$$
with matrices \(A\in \mathbb{R}^{d\times d}\), \(B\in \mathbb{R}^{d\times m}\). Consider problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with
$$\ell(x,u) = x^\top Q x + u^\top R u$$
with symmetric positive definite matrices Q, R of appropriate dimensions (this setting is called the linear–quadratic regulator (LQR) problem). If the pair (A,B) is stabilizable, then it is known that the discrete time algebraic Riccati equation
$$P = Q + A^\top \bigl(P - PB\bigl(B^\top PB + R\bigr)^{-1} B^\top P\bigr)A$$
has a unique symmetric and positive definite solution \(P\in \mathbb{R}^{d\times d}\).
(a) Show that the function \(V(x)=x^\top Px\) satisfies (4.20). Note that since the problem here is time invariant we do not need the argument n.
(b) Use the results from Problem 1 to conclude that \(V_{\infty}(x)=x^\top Px\) holds. You may assume without proof that an optimal feedback \(\mu_{\infty}\) exists.
(c) Prove that the corresponding optimal feedback law asymptotically stabilizes the equilibrium \(x_*=0\).
Hint for (a): For matrices C, D, E of appropriate dimensions with C, D symmetric and D positive definite the formula
$$\min_{u\in \mathbb{R}^m} \bigl\{ x^\top Cx + u^\top Du + u^\top E^\top x + x^\top Eu\bigr\}= x^\top \bigl(C-ED^{-1} E^\top\bigr)x$$
holds. This formula is proved by computing the zero of the derivative of the expression in the “min” with respect to u (which is also a nice exercise).
Hint for (b) and (c): For any symmetric and positive definite matrix \(M\in \mathbb{R}^{d\times d}\) there exist constants \(C_2\ge C_1>0\) such that the inequality \(C_1\|x\|^2\le x^\top Mx\le C_2\|x\|^2\) holds for all \(x\in \mathbb{R}^{d}\).
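For readers who want to experiment, P can be approximated by simply iterating the map defined by the right hand side of the Riccati equation. The double-integrator data below are illustrative choices, not part of the problem; the sketch also checks the stability assertion of part (c) via the spectral radius of the closed-loop matrix:

```python
import numpy as np

# Fixed-point iteration for the discrete time algebraic Riccati equation
#   P = Q + A^T (P - P B (B^T P B + R)^{-1} B^T P) A
# with illustrative data: a double integrator, which is stabilizable.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = np.eye(2)
for _ in range(500):
    K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)  # LQR gain, u = -Kx
    P = Q + A.T @ P @ A - A.T @ P @ B @ K              # Riccati right hand side

# residual of the Riccati equation at the computed P
res = P - (Q + A.T @ (P - P @ B @ np.linalg.solve(B.T @ P @ B + R, B.T @ P)) @ A)

# the optimal feedback mu(x) = -Kx gives closed-loop matrix A - BK;
# spectral radius < 1 corresponds to the asymptotic stability in part (c)
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

For this data P comes out symmetric and positive definite with a vanishing residual, and rho < 1, in line with parts (a)–(c).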
3. Consider the finite horizon counterpart (OCPN) of Problem 2. For this setting one can show that the optimal value function is of the form \(V_N(x)=x^\top P_N x\) and that the matrix \(P_N\) converges to the matrix P from Problem 2 as \(N\to\infty\). This convergence implies that for each \(\varepsilon>0\) there exists \(N_\varepsilon>0\) such that the inequality
$$\bigl|x^\top P_N x - x^\top P x\bigr| \le \varepsilon \|x\|^2$$
holds for all \(N\ge N_\varepsilon\). Use this property and Theorem 4.11 in order to prove that the NMPC feedback law from Algorithm 3.1 is asymptotically stabilizing for sufficiently large optimization horizon N>0.
Hint: Look at the hint for Problem 2(b) and (c).
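Numerically, the convergence of P_N to P can be watched directly by running the finite horizon Riccati recursion. As in the sketch for Problem 2, the double-integrator data below are illustrative choices:

```python
import numpy as np

# Finite horizon Riccati recursion, starting from P_0 = 0:
#   P_{N+1} = Q + A^T P_N A - A^T P_N B (B^T P_N B + R)^{-1} B^T P_N A.
# For stabilizable (A, B) the matrices P_N converge to the solution P
# of the algebraic Riccati equation (illustrative double-integrator data).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
B = np.array([[0.0],
              [1.0]])
Q, R = np.eye(2), np.array([[1.0]])

def riccati_step(P):
    K = np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)
    return Q + A.T @ P @ A - A.T @ P @ B @ K

# approximate P by running the recursion for a long time
P = np.zeros((2, 2))
for _ in range(1000):
    P = riccati_step(P)

# record ||P_N - P|| for moderate horizons N
P_N, errs = np.zeros((2, 2)), []
for N in range(1, 31):
    P_N = riccati_step(P_N)
    errs.append(np.linalg.norm(P_N - P))
# errs decays toward zero, i.e. for each eps > 0 there is an N_eps
# with ||P_N - P|| <= eps for all N >= N_eps
```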
4. Consider the scalar control system
$$x^+ = x + u$$
with \(x\in X=\mathbb{R}\), \(u\in U=\mathbb{R}\), which shall be controlled via the NMPC Algorithm 3.1 using the quadratic running cost function
$$\ell(x,u) = x^2 + u^2.$$
Compute \(V_N(x_0)\) and \(J_\infty(x_0,\mu_N(\cdot))\) for N=2 (cf. Chap. 3, Problem 3). Using these values, derive the degree of suboptimality α from the relaxed dynamic programming inequality (4.14) and from the suboptimality estimate (4.15).
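The following sketch can serve as a numerical cross-check for this problem; it is not a substitute for the hand computation. The grid minimization, the initial value x0 = 1 and the 200 simulation steps are illustrative choices, and the identity V_1(x) = x^2 (obtained by minimizing x^2 + u^2 over u) is used:

```python
import numpy as np

# Numerical cross-check for N = 2: x+ = x + u, l(x,u) = x^2 + u^2.
# Since V_1(x) = min_u {x^2 + u^2} = x^2, we have
#   V_2(x0) = min_u {x0^2 + u^2 + V_1(x0 + u)};
# the minimizer is found here by a fine grid search instead of by hand.
x0 = 1.0
us = np.linspace(-2.0, 2.0, 40001)
vals = x0**2 + us**2 + (x0 + us)**2
V2 = vals.min()                      # V_2(x0)
mu2 = us[vals.argmin()]              # mu_2(x0), the NMPC feedback value

# J_inf(x0, mu_2): the feedback is linear in x, so simulate the closed loop
J, x = 0.0, x0
for _ in range(200):
    u = (mu2 / x0) * x
    J += x**2 + u**2
    x = x + u

# alpha from the relaxed DP inequality (4.14):
#   V_2(x) - V_2(f(x, mu_2(x))) >= alpha * l(x, mu_2(x));
# V_2 is quadratic, so V_2(y) = (V_2(x0)/x0^2) * y^2
alpha_dp = (V2 - (V2 / x0**2) * (x0 + mu2)**2) / (x0**2 + mu2**2)

# alpha from the suboptimality estimate (4.15): alpha * J_inf <= V_2
alpha_sub = V2 / J
```

Comparing alpha_dp and alpha_sub with the values derived by hand is a good consistency check.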
References
Alamir, M., Bornard, G.: Stability of a truncated infinite constrained receding horizon scheme: the general discrete nonlinear case. Automatica 31(9), 1353–1356 (1995)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. I, 3rd edn. Athena Scientific, Belmont (2005)
Bertsekas, D.P.: Dynamic Programming and Optimal Control, vol. II, 2nd edn. Athena Scientific, Belmont (2001)
Bitmead, R.R., Gevers, M., Wertz, V.: Adaptive Optimal Control. The Thinking Man’s GPC. International Series in Systems and Control Engineering. Prentice Hall, New York (1990)
Camilli, F., Grüne, L., Wirth, F.: A regularization of Zubov’s equation for robust domains of attraction. In: Isidori, A., Lamnabhi-Lagarrigue, F., Respondek, W. (eds.) Nonlinear Control in the Year 2000, vol. 1. Lecture Notes in Control and Information Sciences, vol. 258, pp. 277–289. Springer, London (2001)
Dorato, P., Levis, A.H.: Optimal linear regulators: the discrete-time case. IEEE Trans. Automat. Control 16, 613–620 (1971)
Falcone, M.: Numerical solution of dynamic programming equations. Appendix A. In: Bardi, M., Capuzzo Dolcetta, I. (eds.) Optimal Control and Viscosity Solutions of Hamilton–Jacobi–Bellman Equations. Birkhäuser, Boston (1997)
Grimm, G., Messina, M.J., Tuna, S.E., Teel, A.R.: Model predictive control: for want of a local control Lyapunov function, all is not lost. IEEE Trans. Automat. Control 50(5), 546–558 (2005)
Grüne, L.: Homogeneous state feedback stabilization of homogeneous systems. SIAM J. Control Optim. 38, 1288–1314 (2000)
Grüne, L., Junge, O.: A set oriented approach to optimal feedback stabilization. Systems Control Lett. 54, 169–180 (2005)
Grüne, L., Junge, O.: Global optimal control of perturbed systems. J. Optim. Theory Appl. 136, 411–429 (2008)
Grüne, L., Junge, O.: Set oriented construction of globally optimal controllers. Automatisierungstechnik 57, 287–295 (2009)
Grüne, L., Rantzer, A.: On the infinite horizon performance of receding horizon controllers. IEEE Trans. Automat. Control 53, 2100–2111 (2008)
Junge, O., Osinga, H.M.: A set oriented approach to global optimal control. ESAIM Control Optim. Calc. Var. 10, 259–270 (2004)
Keerthi, S.S., Gilbert, E.G.: An existence theorem for discrete-time infinite-horizon optimal control problems. IEEE Trans. Automat. Control 30(9), 907–909 (1985)
Keerthi, S.S., Gilbert, E.G.: Optimal infinite-horizon feedback laws for a general class of constrained discrete-time systems: stability and moving-horizon approximations. J. Optim. Theory Appl. 57(2), 265–293 (1988)
Kreisselmeier, G., Birkhölzer, T.: Numerical nonlinear regulator design. IEEE Trans. Automat. Control 39, 33–46 (1994)
Lincoln, B., Rantzer, A.: Relaxing dynamic programming. IEEE Trans. Automat. Control 51(8), 1249–1260 (2006)
Meadows, E.S., Rawlings, J.B.: Receding horizon control with an infinite cost. In: Proceedings of the American Control Conference – ACC 1993, San Francisco, California, USA, pp. 2926–2930 (1993)
Rantzer, A.: Relaxed dynamic programming in switching systems. IEE Proc., Control Theory Appl. 153(5), 567–574 (2006)
Scokaert, P.O.M., Mayne, D.Q., Rawlings, J.B.: Suboptimal model predictive control (feasibility implies stability). IEEE Trans. Automat. Control 44(3), 648–654 (1999)
Shamma, J.S., Xiong, D.: Linear nonquadratic optimal control. IEEE Trans. Automat. Control 42(6), 875–879 (1997)
Sontag, E.D.: A Lyapunov-like characterization of asymptotic controllability. SIAM J. Control Optim. 21(3), 462–471 (1983)
Sontag, E.D.: Comments on integral variants of ISS. Systems Control Lett. 34, 93–100 (1998)
Tuna, E.S.: Optimal regulation of homogeneous systems. Automatica 41, 1879–1890 (2005)
© 2011 Springer-Verlag London Limited
Grüne, L., Pannek, J. (2011). Infinite Horizon Optimal Control. In: Nonlinear Model Predictive Control. Communications and Control Engineering. Springer, London. https://doi.org/10.1007/978-0-85729-501-9_4
Print ISBN: 978-0-85729-500-2
Online ISBN: 978-0-85729-501-9