In this chapter we give an introduction to nonlinear infinite horizon optimal control. The dynamic programming principle as well as several consequences of this principle are proved. One of the main results of this chapter is that the infinite horizon optimal feedback law asymptotically stabilizes the system and that the infinite horizon optimal value function is a Lyapunov function for the closed loop system. Motivated by this property we formulate a relaxed version of the dynamic programming principle, which allows us to prove stability and suboptimality results for nonoptimal feedback laws without using the optimal value function. A practical version of this principle is provided, too. These results will be central in the following chapters for the stability and performance analysis of NMPC algorithms. For the special case of sampled-data systems we finally show that for suitable integral costs asymptotic stability of the continuous time sampled data closed loop system follows from the asymptotic stability of the associated discrete time system.

4.1 Definition and Well Posedness of the Problem

For the finite horizon optimal control problems from the previous chapter we can define infinite horizon counterparts by replacing the upper limits N−1 in the respective sums by ∞. Since for this infinite horizon formulation the terminal state \(x_u(N)\) vanishes from the problem, it is not reasonable to consider terminal constraints. Furthermore, we will not consider any weights in the infinite horizon case. Hence, the most general infinite horizon problem we consider is the following:

$$\begin{aligned} &\text{minimize } J_\infty\bigl(n,x_0,u(\cdot)\bigr) := \sum_{k=0}^{\infty}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) \quad\text{with respect to } u(\cdot)\in\mathbb{U}^{\infty}(x_0),\\ &\text{subject to}\quad x_u(0,x_0)=x_0,\qquad x_u(k+1,x_0)=f\bigl(x_u(k,x_0),u(k)\bigr)\end{aligned}$$
(\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\))

Here, the function \(\ell\) is as in (3.8), i.e., it penalizes the distance to a (possibly time varying) reference trajectory \(x^{\mathrm{ref}}\). We optimize over the set of admissible control sequences \(\mathbb{U}^{\infty}(x_{0})\) defined in Definition 3.2 and assume that this set is nonempty for all \(x_{0}\in \mathbb{X}\), which is equivalent to the viability of \(\mathbb{X}\) according to Assumption 3.3. In order to keep the presentation self-contained all subsequent statements are formulated for general time varying reference \(x^{\mathrm{ref}}\). In the special case of a constant reference \(x^{\mathrm{ref}}\equiv x_*\) the running cost \(\ell\) and the functional \(J_\infty\) in (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) do not depend on the time n.

Similar to Definition 3.14 we define the optimal value function and optimal trajectories.

Definition 4.1

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with initial value \(x_{0}\in \mathbb{X}\) and time instant n∈ℕ0.

  1. (i)

    The function

    $$V_\infty(n,x_0) := \inf_{u(\cdot)\in \mathbb{U}^\infty(x_0)} J_\infty\bigl(n,x_0,u(\cdot)\bigr)$$

    is called the optimal value function.

  2. (ii)

    A control sequence \(u^{\star}(\cdot)\in \mathbb{U}^{\infty}(x_{0})\) is called an optimal control sequence for \(x_0\) if

    $$V_\infty(n,x_0) = J_\infty\bigl(n,x_0,u^\star(\cdot)\bigr)$$

    holds. The corresponding trajectory \(x_{u^{\star}}(\cdot,x_{0})\) is called an optimal trajectory.

Since now—in contrast to the finite horizon problem—an infinite sum appears in the definition of \(J_\infty\), it is no longer straightforward that \(V_\infty\) is finite. In order to ensure that this is the case the following definition is helpful.

Definition 4.2

Consider the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). We say that the system is (uniformly) asymptotically controllable to \(x^{\mathrm{ref}}\) if there exists a function \(\beta\in \mathcal{KL}\) such that for each initial time \(n_0\)∈ℕ0 and each admissible initial value \(x_{0}\in \mathbb{X}\) there exists an admissible control sequence \(u\in \mathbb{U}^{\infty}(x_{0})\) such that the inequality

$$\big|x_u(n,x_0)\big|_{x^{\mathrm{ref}}(n+n_0)} \le \beta \bigl(|x_0|_{x^{\mathrm{ref}}(n_0)},n\bigr) $$
(4.1)

holds for all n∈ℕ0. We say that this asymptotic controllability has the small control property if \(u\in \mathbb{U}^{\infty}(x_{0})\) can be chosen such that the inequality

$$\big|x_u(n,x_0)\big|_{x^{\mathrm{ref}}(n+n_0)} + \big|u(n)\big|_{u^{\mathrm{ref}}(n+n_0)} \le \beta \bigl(|x_0|_{x^{\mathrm{ref}}(n_0)},n\bigr) $$
(4.2)

holds for all n∈ℕ0. Here, as in Sect. 2.3 we write \(|x_{1}|_{x_{2}}= d_{X}(x_{1},x_{2})\) and \(|u_{1}|_{u_{2}}= d_{U}(u_{1},u_{2})\).

Observe that uniform asymptotic controllability is a necessary condition for uniform feedback stabilization. Indeed, if we assume asymptotic stability of the closed-loop system \(x^+=g(n,x)=f(x,\mu(n,x))\), then we immediately get asymptotic controllability with control \(u(n)=\mu(n+n_0,x(n+n_0,n_0,x_0))\). The small control property, however, is not satisfied in general.

In order to use Definition 4.2 for deriving bounds on the optimal value function, we need a result known as Sontag’s \(\mathcal{KL}\)-Lemma [24, Proposition 7]. This proposition states that for each \(\mathcal{KL}\)-function \(\beta\) there exist functions \(\gamma_{1},\gamma_{2}\in \mathcal{K}_{\infty}\) such that the inequality

$$\beta(r,n)\le \gamma_1\bigl(e^{-n}\gamma_2(r)\bigr)$$

holds for all r,n≥0 (in fact, the result holds for real n≥0 but we only need it for integers here). Using the functions \(\gamma_1\) and \(\gamma_2\) we can define running cost functions

$$\ell(n,x,u) := \gamma_1^{-1}\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) + \lambda \gamma_1^{-1}\bigl(|u|_{u^{\mathrm{ref}}(n)}\bigr) $$
(4.3)

for λ≥0. The following theorem states that under Definition 4.2 this running cost ensures (uniformly) finite upper and positive lower bounds on \(V_\infty\).
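For illustration, the following minimal sketch (assuming a purely hypothetical exponential \(\beta\) and a constant reference \(x^{\mathrm{ref}}\equiv 0\), \(u^{\mathrm{ref}}\equiv 0\); the constants and the choice \(\gamma_1(s)=\sqrt{s}\) are not from the text) picks one admissible pair \(\gamma_1,\gamma_2\) for Sontag's decomposition and evaluates the resulting running cost (4.3).

```python
import numpy as np

# Hypothetical data: beta(r, n) = C * r * exp(-sigma * n) with C = 2, sigma = 1.
C, sigma = 2.0, 1.0
beta = lambda r, n: C * r * np.exp(-sigma * n)

# One admissible decomposition beta(r, n) <= gamma1(exp(-n) * gamma2(r)):
gamma1 = np.sqrt                      # gamma1(s) = sqrt(s)
gamma2 = lambda r: (C * r) ** 2       # gamma2(r) = (C r)^2
gamma1_inv = lambda s: s ** 2         # inverse of gamma1

# Running cost (4.3) for the constant reference x_ref = 0, u_ref = 0 and weight lam;
# with this gamma1 the cost is quadratic in the distances to the reference.
lam = 0.1
ell = lambda n, x, u: gamma1_inv(abs(x)) + lam * gamma1_inv(abs(u))

# Sanity check of the decomposition on a grid of (r, n) values:
for r in [0.1, 1.0, 5.0]:
    for n in range(5):
        assert beta(r, n) <= gamma1(np.exp(-n) * gamma2(r)) + 1e-12
print(ell(0, 0.5, 0.2))               # 0.5**2 + 0.1 * 0.2**2 = 0.254
```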

Theorem 4.3

Consider the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). If the system is asymptotically controllable to \(x^{\mathrm{ref}}\), then there exist \(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\) such that the optimal value function \(V_\infty\) corresponding to the cost function \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) from (4.3) with λ=0 satisfies

$$ \alpha_1\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr) \le V_\infty(n_0,x_0)\le \alpha_2\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr)$$
(4.4)

for all \(n_0\)∈ℕ0 and all \(x_{0}\in \mathbb{X}\).

If, in addition, the asymptotic controllability has the small control property then the statement also holds for \(\ell\) from (4.3) with arbitrary λ≥0.

Proof

For each \(x_0\), \(n_0\) and \(u\in \mathbb{U}^{\infty}(x_{0})\) we get

$$J_\infty(n_0,x_0,u) = \sum_{k=0}^{\infty}\ell\bigl(n_0+k,x_u(k,x_0),u(k)\bigr) \ge \ell\bigl(n_0,x_u(0,x_0),u(0)\bigr) \ge \gamma_1^{-1}\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr)$$

for each λ≥0. Hence, from the definition of \(V_\infty\) we get

$$V_\infty(n_0,x_0) = \inf_{u(\cdot)\in \mathbb{U}^\infty(x_0)} J_\infty\bigl(n_0,x_0,u(\cdot)\bigr)\ge \gamma_1^{-1}\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr).$$

This proves the lower bound in (4.4) for\(\alpha_{1} = \gamma_{1}^{-1}\).

For proving the upper bound, we first consider the case λ=0. For all \(n_0\) and \(x_0\) the control \(u\in \mathbb{U}^{\infty}(x_{0})\) from Definition 4.2 yields

$$\begin{aligned} V_\infty(n_0,x_0) &\le J_\infty(n_0,x_0,u) = \sum_{k=0}^{\infty}\ell\bigl(n_0+k,x_u(k,x_0),u(k)\bigr)\\ &= \sum_{k=0}^{\infty}\gamma_1^{-1}\bigl(|x_u(k,x_0)|_{x^{\mathrm{ref}}(n_0+k)}\bigr) \le \sum_{k=0}^{\infty}\gamma_1^{-1}\bigl(\beta\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)},k\bigr)\bigr)\\ &\le \sum_{k=0}^{\infty} e^{-k}\gamma_2\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr) = \frac{e}{e-1}\gamma_2\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr),\end{aligned}$$

i.e., the upper inequality from (4.4) with \(\alpha_2(r)=e\gamma_2(r)/(e-1)\). If the small control property holds, then the upper bound for λ>0 follows similarly with \(\alpha_2(r)=(1+\lambda)e\gamma_2(r)/(e-1)\). □

In fact, the specific form (4.3) is just one possible choice of \(\ell\) for which this theorem holds. It is rather easy to extend the result to any \(\ell\) which is bounded from below by some \(\mathcal{K}_{\infty}\)-function in x (uniformly for all u and n) and bounded from above by \(\ell\) from (4.3) in balls \(\mathcal{B}_{\varepsilon }(x^{\mathrm{ref}}(n))\). Since, however, the choice of appropriate cost functions for infinite horizon optimal control problems is not a central topic of this book, we leave this extension to the interested reader.

4.2 The Dynamic Programming Principle

In this section we essentially restate and reprove the results from Sect. 3.4 for the infinite horizon case. We begin with the dynamic programming principle for the infinite horizon problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)). Throughout this section we assume that \(V_\infty(n,x)\) is finite for all \(x\in \mathbb{X}\) as ensured, e.g., by Theorem 4.3.

Theorem 4.4

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with \(x_{0}\in \mathbb{X}\) and n∈ℕ0. Then for all K∈ℕ the equation

$$V_\infty(n,x_0) = \inf_{u(\cdot)\in \mathbb{U}^K(x_0)} \Biggl\{ \sum_{k=0}^{K-1} \ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + V_\infty\bigl(n+K,x_u(K,x_0)\bigr) \Biggr\}$$
(4.5)

holds. If, in addition, an optimal control sequence \(u^\star(\cdot)\) exists for \(x_0\), then we get the equation

$$V_\infty(n,x_0) = \sum_{k=0}^{K-1} \ell\bigl(n+k,x_{u^\star}(k,x_0),u^\star(k)\bigr) + V_\infty\bigl(n+K,x_{u^\star}(K,x_0)\bigr). $$
(4.6)

In particular, in this case the “inf” in (4.5) is a “min”.

Proof

From the definition of \(J_\infty\) for \(u(\cdot)\in \mathbb{U}^{\infty}(x_{0})\) we immediately obtain

$$J_\infty\bigl(n,x_0,u(\cdot)\bigr) = \sum_{k=0}^{K-1} \ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + J_\infty\bigl(n+K,x_u(K,x_0),u(\cdot+K)\bigr),$$
(4.7)

where \(u(\cdot+K)\) denotes the shifted control sequence defined by \(u(\cdot+K)(k)=u(k+K)\), which is admissible for \(x_u(K,x_0)\).

We now prove (4.5) by showing “≥” and “≤” separately: From (4.7) we obtain

$$\begin{aligned} J_\infty\bigl(n,x_0,u(\cdot)\bigr) &= \sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + J_\infty\bigl(n+K,x_u(K,x_0),u(\cdot+K)\bigr)\\ &\ge \sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + V_\infty\bigl(n+K,x_u(K,x_0)\bigr).\end{aligned}$$

Since this inequality holds for all \(u(\cdot)\in \mathbb{U}^{\infty}(x_0)\), it also holds when taking the infimum on both sides. Hence we get

$$\begin{aligned} V_\infty(n,x_0) &= \inf_{u(\cdot)\in \mathbb{U}^\infty(x_0)} J_\infty\bigl(n,x_0,u(\cdot)\bigr)\\ &\ge \inf_{u(\cdot)\in \mathbb{U}^K(x_0)}\Biggl\{\sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + V_\infty\bigl(n+K,x_u(K,x_0)\bigr)\Biggr\},\end{aligned}$$

i.e., (4.5) with “≥”.

In order to prove “≤”, fix ε>0 and let \(u^\varepsilon(\cdot)\) be an approximately optimal control sequence for the right hand side of (4.7), i.e.,

$$\begin{aligned} &\sum_{k=0}^{K-1}\ell\bigl(n+k,x_{u^\varepsilon}(k,x_0),u^\varepsilon(k)\bigr) + J_\infty\bigl(n+K,x_{u^\varepsilon}(K,x_0),u^\varepsilon(\cdot+K)\bigr)\\ &\quad\le \inf_{u(\cdot)\in \mathbb{U}^\infty(x_0)}\Biggl\{\sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + J_\infty\bigl(n+K,x_u(K,x_0),u(\cdot+K)\bigr)\Biggr\} + \varepsilon.\end{aligned}$$

Now we decompose \(u(\cdot)\in \mathbb{U}^{\infty}(x_{0})\) analogously to Lemma 3.12(ii) and (iii) into \(u_{1}\in \mathbb{U}^{K}(x_{0})\) and \(u_{2}\in \mathbb{U}^{\infty}(x_{u_{1}}(K,x_{0}))\) via

$$u(k) = \begin{cases} u_1(k), & k=0,\ldots,K-1,\\ u_2(k-K), & k\ge K.\end{cases}$$

This implies

$$\begin{aligned} &\inf_{u(\cdot)\in \mathbb{U}^\infty(x_0)}\Biggl\{\sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + J_\infty\bigl(n+K,x_u(K,x_0),u(\cdot+K)\bigr)\Biggr\}\\ &\quad= \inf_{\substack{u_1(\cdot)\in \mathbb{U}^K(x_0)\\ u_2(\cdot)\in \mathbb{U}^\infty(x_{u_1}(K,x_0))}}\Biggl\{\sum_{k=0}^{K-1}\ell\bigl(n+k,x_{u_1}(k,x_0),u_1(k)\bigr) + J_\infty\bigl(n+K,x_{u_1}(K,x_0),u_2(\cdot)\bigr)\Biggr\}.\end{aligned}$$

Now (4.7) yields

$$\begin{aligned} V_\infty(n,x_0) &\le J_\infty\bigl(n,x_0,u^\varepsilon(\cdot)\bigr)\\ &= \sum_{k=0}^{K-1}\ell\bigl(n+k,x_{u^\varepsilon}(k,x_0),u^\varepsilon(k)\bigr) + J_\infty\bigl(n+K,x_{u^\varepsilon}(K,x_0),u^\varepsilon(\cdot+K)\bigr)\\ &\le \inf_{u(\cdot)\in \mathbb{U}^K(x_0)}\Biggl\{\sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + V_\infty\bigl(n+K,x_u(K,x_0)\bigr)\Biggr\} + \varepsilon,\end{aligned}$$

i.e.,

$$V_\infty(n,x_0) \le \inf_{u(\cdot)\in \mathbb{U}^K(x_0)}\Biggl\{\sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + V_\infty\bigl(n+K,x_u(K,x_0)\bigr)\Biggr\} + \varepsilon.$$

Since ε>0 was arbitrary and the expressions in this inequality are independent of ε, this inequality also holds for ε=0, which shows (4.5) with “≤” and thus (4.5).

In order to prove (4.6) we use (4.7) with \(u(\cdot)=u^\star(\cdot)\). This yields

$$\begin{aligned} V_\infty(n,x_0) &= J_\infty\bigl(n,x_0,u^\star(\cdot)\bigr)\\ &= \sum_{k=0}^{K-1}\ell\bigl(n+k,x_{u^\star}(k,x_0),u^\star(k)\bigr) + J_\infty\bigl(n+K,x_{u^\star}(K,x_0),u^\star(\cdot+K)\bigr)\\ &\ge \sum_{k=0}^{K-1}\ell\bigl(n+k,x_{u^\star}(k,x_0),u^\star(k)\bigr) + V_\infty\bigl(n+K,x_{u^\star}(K,x_0)\bigr)\\ &\ge \inf_{u(\cdot)\in \mathbb{U}^K(x_0)}\Biggl\{\sum_{k=0}^{K-1}\ell\bigl(n+k,x_u(k,x_0),u(k)\bigr) + V_\infty\bigl(n+K,x_u(K,x_0)\bigr)\Biggr\}\\ &= V_\infty(n,x_0),\end{aligned}$$

where we used the (already proved) Equality (4.5) in the last step. Hence, the two “≥” in this chain are actually “=” which implies (4.6). □

The following corollary states an immediate consequence from the dynamic programming principle. It shows that tails of optimal control sequences are again optimal control sequences for suitably adjusted initial value and time.

Corollary 4.5

If \(u^\star(\cdot)\) is an optimal control sequence for (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with initial value \(x_0\) and initial time n, then for each K∈ℕ the sequence \(u^{\star}_{K}(\cdot)=u^{\star}(\cdot+K)\), i.e.,

$$u^\star_K(k) = u^\star(K+k), \quad k=0,1,\ldots $$

is an optimal control sequence for initial value \(x_{u^{\star}}(K,x_{0})\) and initial time n+K.

Proof

Inserting \(V_\infty(n,x_0)=J_\infty(n,x_0,u^\star(\cdot))\) and the definition of \(u_{K}^{\star}(\cdot)\) into (4.7) we obtain

$$V_\infty(n,x_0) = \sum_{k=0}^{K-1} \ell\bigl(n+k,x_{u^\star}(k,x_0),u^\star(k)\bigr) +J_\infty\bigl(n+K,x_{u^\star}(K,x_0),u^\star_K(\cdot)\bigr).$$

Subtracting (4.6) from this equation yields

$$0 = J_\infty\bigl(n+K,x_{u^\star}(K,x_0),u^\star_K(\cdot)\bigr) - V_\infty \bigl(n+K,x_{u^\star}(K,x_0)\bigr)$$

which shows the assertion. □

The next two results are the analogs of Theorem 3.17 and Corollary 3.18 in the infinite horizon setting.

Theorem 4.6

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with \(x_{0}\in \mathbb{X}\) and n∈ℕ0 and assume that an optimal control sequence \(u^\star(\cdot)\) exists. Then the feedback law \(\mu_\infty(n,x_0)=u^\star(0)\) satisfies

$$\mu_\infty(n,x_0)=\operatorname{argmin}_{u\in \mathbb{U}^1(x_0)} \bigl\{ \ell(n,x_0,u) + V_\infty \bigl(n+1,f(x_0,u)\bigr) \bigr\} $$
(4.8)

and

$$V_\infty(n,x_0)= \ell\bigl(n,x_0,\mu_\infty(n,x_0)\bigr) + V_\infty\bigl(n+1,f\bigl(x_0,\mu_\infty(n,x_0)\bigr)\bigr) $$
(4.9)

where in (4.8)—as usual—we interpret \(\mathbb{U}^{1}(x_{0})\) as a subset of U, i.e., we identify the one element sequence u=u(⋅) with its only element u=u(0).

Proof

The proof is identical to the finite horizon counterpart Theorem 3.17. □

As in the finite horizon case, the following corollary shows that the feedback law (4.8) can be used in order to construct the optimal control sequence.

Corollary 4.7

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with \(x_{0}\in \mathbb{X}\) and n∈ℕ0 and consider an admissible feedback law \(\mu_{\infty}:\mathbb{N}_{0}\times \mathbb{X}\to U\) in the sense of Definition 3.2(iv). Denote the solution of the closed-loop system

$$x(0) = x_0, \qquad x(k+1) =f\bigl(x(k),\mu_{\infty}\bigl(n+k,x(k)\bigr)\bigr),\quad k=0,1,\ldots $$
(4.10)

by \(x_{\mu_{\infty}}\) and assume that \(\mu_\infty\) satisfies (4.8) for initial values \(x_{0}=x_{\mu_{\infty}}(k)\) for all k=0,1,…. Then

$$u^\star(k) = \mu_{\infty}\bigl(n+k,x_{u^\star}(k,x_0)\bigr), \quad k=0,1,\ldots $$
(4.11)

is an optimal control sequence for initial time n and initial value x 0 and the solution of the closed-loop system (4.10)is a corresponding optimal trajectory.

Proof

From (4.11) for x(n) from (4.10) we immediately obtain

$$x_{u^\star}(n,x_0) = x(n), \quad n=0,1,\ldots{}.$$

Hence we need to show that

$$V_\infty(n,x_0) = J_\infty\bigl(n,x_0,u^\star\bigr),$$

where it is enough to show “≥” because the opposite inequality follows by definition of \(V_\infty\). Using (4.11) and (4.9) we get

$$V_{\infty}\bigl(n+k,x(k)\bigr)= \ell\bigl(n+k,x(k),u^\star(k)\bigr) + V_{\infty}\bigl(n+k+1,x(k+1)\bigr)$$

for k=0,1,…. Summing these equalities for k=0,…,K−1 for arbitrary K∈ℕ and eliminating the identical terms \(V_\infty(n+k,x(k))\), k=1,…,K−1 on the left and on the right we obtain

$$\begin{aligned} V_\infty(n,x_0) &= \sum_{k=0}^{K-1}\ell\bigl(n+k,x(k),u^\star(k)\bigr) + V_\infty\bigl(n+K,x(K)\bigr)\\ &\ge \sum_{k=0}^{K-1}\ell\bigl(n+k,x(k),u^\star(k)\bigr).\end{aligned}$$

Since the sum is monotone increasing in K and bounded from above, for K→∞ the right hand side converges to \(J_\infty(n,x_0,u^\star)\) showing the assertion. □
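To illustrate the dynamic programming principle and the feedback construction of Corollary 4.7, the following minimal sketch (a hypothetical finite example with clipped integer dynamics, not taken from the text) approximates \(V_\infty\) by value iteration, builds the feedback (4.8) by minimizing over the finite control set and checks that the accumulated closed-loop cost reproduces \(V_\infty(x_0)\).

```python
# Minimal value iteration sketch on a hypothetical finite state/control set.
X = list(range(5))                            # states 0,...,4, reference x_ref = 0
U = [-1, 0, 1]                                # integer controls
f = lambda x, u: min(max(x + u, 0), max(X))   # clipped dynamics x+ = x + u
ell = lambda x, u: x**2 + 0.1 * u**2          # running cost penalizing distance to 0

V = {x: 0.0 for x in X}
for _ in range(100):      # value iteration: V(x) <- min_u { ell(x,u) + V(f(x,u)) }
    V = {x: min(ell(x, u) + V[f(x, u)] for u in U) for x in X}

# Feedback (4.8): minimizer of ell(x,u) + V(f(x,u)) over the control set.
mu = {x: min(U, key=lambda u: ell(x, u) + V[f(x, u)]) for x in X}

# Closed-loop trajectory from x0 = 4; by Corollary 4.7 its accumulated cost
# equals V_inf(4) (up to the value iteration error).
x, cost = 4, 0.0
for _ in range(20):
    u = mu[x]
    cost += ell(x, u)
    x = f(x, u)
print(V[4], cost)                             # both 30.4 in this example
```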

Corollary 4.7 implies that infinite horizon optimal control is nothing but NMPC with N=∞: Formula (4.11) for k=0 yields that if we replace the optimization problem (\(\mathrm{OCP}_{\mathrm{N}}^{\mathrm{n}}\)) in Algorithm 3.7 by (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)), then the feedback law resulting from this algorithm equals \(\mu_\infty\). The following theorem shows that this infinite horizon NMPC-feedback law yields an asymptotically stable closed loop and thus solves the stabilization and tracking problem.

Theorem 4.8

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the inequalities

$$\alpha_1\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) \le V_\infty(n,x) \le \alpha_2\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) \quad \mathit{and} \quad \ell(n,x,u) \ge \alpha_3\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr)$$
(4.12)

hold for all \(x\in \mathbb{X}\), n∈ℕ0 and u∈U. Assume furthermore that an optimal feedback \(\mu_\infty\) exists, i.e., an admissible feedback law \(\mu_{\infty}:\mathbb{N}_{0}\times \mathbb{X}\to U\) satisfying (4.8) for all n∈ℕ0 and all \(x\in \mathbb{X}\). Then this optimal feedback asymptotically stabilizes the closed-loop system

$$x^+ = g(n,x) = f\bigl(x,\mu_\infty(n,x)\bigr)$$

on \(\mathbb{X}\) in the sense of Definition 2.16.

Proof

For the closed-loop system, (4.9) and the last inequality in (4.12) yield

$$\begin{aligned} V_\infty(n,x) &= \ell\bigl(n,x,\mu_\infty(n,x)\bigr) + V_\infty\bigl(n+1,f\bigl(x,\mu_\infty(n,x)\bigr)\bigr)\\ &\ge \alpha_3\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) + V_\infty\bigl(n+1,f\bigl(x,\mu_\infty(n,x)\bigr)\bigr).\end{aligned}$$

Together with the first two inequalities in (4.12) this shows that \(V_\infty\) is a Lyapunov function on \(\mathbb{X}\) in the sense of Definition 2.21 with \(\alpha_V=\alpha_3\). Thus, Theorem 2.22 yields asymptotic stability on \(\mathbb{X}\). □

By Theorem 4.3 we can replace (4.12) by the asymptotic controllability condition from Definition 4.2 if \(\ell\) is of the form (4.3). This is used in the following corollary in order to give a stability result without explicitly assuming (4.12).

Corollary 4.9

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that the system is asymptotically controllable to \(x^{\mathrm{ref}}\) and that an optimal feedback \(\mu_\infty\), i.e., a feedback satisfying (4.8), exists for the cost function \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) from (4.3) with λ=0. Then this optimal feedback asymptotically stabilizes the closed-loop system

$$x^+ = g(n,x) = f\bigl(x,\mu_\infty(n,x)\bigr)$$

on \(\mathbb{X}\) in the sense of Definition 2.16.

If, in addition, the asymptotic controllability has the small control property then the statement also holds for \(\ell\) from (4.3) with arbitrary λ≥0.

Proof

Theorem 4.3 yields

$$\alpha_1\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr) \le V_\infty(n_0,x_0)\le\alpha_2\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr)$$

for suitable\(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\). Furthermore, by (4.3) the third inequality in (4.12) holds with\(\alpha_{3}=\gamma_{1}^{-1}\). Hence, (4.12) holds and Theorem 4.8 yields asymptotic stability on \(\mathbb{X}\). □

4.3 Relaxed Dynamic Programming

The last results of the previous section show that infinite horizon optimal control can be used in order to derive a stabilizing feedback law. Unfortunately, a direct solution of infinite horizon optimal control problems is in general impossible, both analytically and numerically. Still, infinite horizon optimal control plays an important role in our analysis since we will interpret the model predictive control algorithm as an approximation of the infinite horizon optimal control problem. Here the term “approximation” is not necessarily to be understood in the sense of “being close to” (although this aspect is not excluded) but rather in the sense of “sharing the important structural properties”.

Looking at the proof of Theorem 4.8 we see that the important property for stability is the inequality

$$V_\infty(n,x) \ge \ell\bigl(n,x,\mu_\infty(n,x)\bigr) + V_\infty\bigl(n+1,f\bigl(x,\mu_\infty(n,x)\bigr)\bigr)$$

which follows from the feedback version (4.9) of the dynamic programming principle. Observe that although (4.9) yields equality, only this inequality is needed in the proof of Theorem 4.8.

This observation motivates a relaxed version of this dynamic programming inequality which on the one hand yields asymptotic stability and on the other hand provides a quantitative measure of the closed-loop performance of the system. This relaxed version will be formulated in Theorem 4.11, below. In order to quantitatively measure the closed-loop performance, we use the infinite horizon cost functional evaluated along the closed-loop trajectory which we define as follows.

Definition 4.10

Let \(\mu:\mathbb{N}_{0}\times \mathbb{X}\to U\) be an admissible feedback law. For the trajectories \(x_\mu(n)\) of the closed-loop system \(x^+=f(x,\mu(n,x))\) with initial value \(x_{\mu}(n_{0})=x_{0}\in \mathbb{X}\) we define the infinite horizon cost as

$$J_\infty(n_0,x_0,\mu) := \sum_{k=0}^\infty \ell\bigl(n_0+k, x_\mu(n_0+k),\mu\bigl(n_0+k,x_\mu(n_0+k)\bigr)\bigr).$$

Since by (3.8) our running cost is always nonnegative, either the infinite sum has a well defined finite value or it diverges to infinity, in which case we writeJ (n 0,x 0,μ)=∞.

By Corollary 4.7 for the infinite horizon optimal feedback law \(\mu_\infty\) we obtain

$$J_\infty(n_0,x_0,\mu_\infty) = V_\infty(n_0,x_0)$$

while for all other admissible feedback laws μ we get

$$J_\infty(n_0,x_0,\mu) \ge V_\infty(n_0,x_0).$$

In other words, \(V_\infty\) is a lower bound for \(J_\infty(n_0,x_0,\mu)\) which is attained for the optimal feedback law \(\mu_\infty\).

The following theorem now gives a relaxed dynamic programming condition from which we can derive both asymptotic stability and an upper bound on the infinite horizon cost \(J_\infty(n_0,x_0,\mu)\) for an arbitrary admissible feedback law μ.

Theorem 4.11

Consider a running cost \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) and a function \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\). Let \(\mu:\mathbb{N}_{0}\times \mathbb{X}\to U\) be an admissible feedback law and let \(S(n)\subseteq \mathbb{X}\), n∈ℕ0, be a family of forward invariant sets for the closed-loop system

$$x^+ = g(n,x) = f\bigl(x,\mu(n,x)\bigr). $$
(4.13)

Assume there exists α∈(0,1] such that the relaxed dynamic programming inequality

$$V(n,x) \ge \alpha\ell\bigl(n,x,\mu(n,x)\bigr) + V\bigl(n+1,f\bigl(x,\mu(n,x)\bigr)\bigr) $$
(4.14)

holds for all n∈ℕ0 and all x∈S(n). Then the suboptimality estimate

$$J_\infty(n,x,\mu) \le V(n,x)/\alpha $$
(4.15)

holds for all n∈ℕ0 and all x∈S(n).

If, in addition, there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the inequalities

$$\alpha_1\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) \le V(n,x) \le \alpha_2\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) \quad \mathit{and} \quad \ell(n,x,u) \ge \alpha_3\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr)$$

hold for all \(x\in \mathbb{X}\), n∈ℕ0, u∈U and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0} \to \mathbb{X}\), then the closed-loop system (4.13) is asymptotically stable on S(n) in the sense of Definition 2.16.

Proof

In order to prove (4.15) consider n∈ℕ0, x∈S(n) and the trajectory \(x_\mu(\cdot)\) of (4.13) with \(x_\mu(n)=x\). By forward invariance of the sets S(n) this trajectory satisfies \(x_\mu(n+k)\in S(n+k)\). Hence from (4.14) for all k∈ℕ0 we obtain

$$\alpha\ell\bigl(n+k,x_\mu(n+k),\mu\bigl(n+k,x_\mu(n+k)\bigr)\bigr) \le V\bigl(n+k,x_\mu(n+k)\bigr) - V\bigl(n+k+1,x_\mu(n+k+1)\bigr).$$

Summing over k yields for all K∈ℕ

$$\alpha\sum_{k=0}^{K-1}\ell\bigl(n+k,x_\mu(n+k),\mu\bigl(n+k,x_\mu(n+k)\bigr)\bigr) \le V\bigl(n,x_\mu(n)\bigr) - V\bigl(n+K,x_\mu(n+K)\bigr) \le V(n,x)$$

since \(V(n+K,x_\mu(n+K))\ge 0\) and \(x_\mu(n)=x\). Since the running cost \(\ell\) is nonnegative, the term on the left is monotone increasing and bounded, hence for K→∞ it converges to \(\alpha J_\infty(n,x,\mu)\). Since the right hand side is independent of K, this yields (4.15).

The stability assertion now immediately follows by observing that V satisfies all assumptions of Theorem 2.22 with \(\alpha_V=\alpha\alpha_3\). □
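The following sketch illustrates Theorem 4.11 on a purely hypothetical scalar example (system, running cost, candidate function V and feedback are illustrative assumptions): it estimates the largest \(\alpha\) for which (4.14) holds on a grid and compares the accumulated closed-loop cost with the bound (4.15).

```python
import numpy as np

# Hypothetical scalar example: x+ = x + u, ell(x,u) = x^2 + u^2,
# candidate function V(x) = x^2 and the (suboptimal) linear feedback mu(x) = -0.5*x.
f   = lambda x, u: x + u
ell = lambda x, u: x**2 + u**2
V   = lambda x: x**2
mu  = lambda x: -0.5 * x

# Largest alpha in (0,1] with V(x) >= alpha*ell(x,mu(x)) + V(f(x,mu(x))) on a grid:
xs = np.linspace(-5.0, 5.0, 200)              # grid avoiding x = 0
alpha = min(1.0, np.min((V(xs) - V(f(xs, mu(xs)))) / ell(xs, mu(xs))))
print("alpha =", alpha)                       # 0.6 for this example

# Closed-loop cost J_inf(x0, mu) versus the bound V(x0)/alpha from (4.15):
x, J = 3.0, 0.0
for _ in range(200):                          # geometric decay, sum has converged
    u = mu(x)
    J += ell(x, u)
    x = f(x, u)
print(J, "<=", V(3.0) / alpha)                # 15.0 <= 15.0, bound tight here
```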

Remark 4.12

An inspection of the proof of Theorems 2.19 and 2.22 reveals that for fixed \(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\) and \(\alpha_V=\alpha\alpha_3\) with fixed \(\alpha_{3}\in \mathcal{K}_{\infty}\) and varying α∈(0,1] the attraction rate \(\beta\in \mathcal{KL}\) constructed in this proof depends on α in the following way: if \(\beta_\alpha\) and \(\beta_{\alpha'}\) are the attraction rates from Theorem 2.22 for \(\alpha_V=\alpha\alpha_3\) and \(\alpha_V=\alpha'\alpha_3\), respectively, with α′≥α, then \(\beta_{\alpha'}(r,t)\le\beta_\alpha(r,t)\) holds for all r,t≥0. This in particular implies that for every \(\bar{\alpha}\in(0,1)\) the attraction rate \(\beta_{\bar{\alpha}}\) is also an attraction rate for all \(\alpha\in[\bar{\alpha},1]\), i.e., we can find an attraction rate \(\beta\in \mathcal{KL}\) which is independent of \(\alpha\in[\bar{\alpha},1]\).

Remark 4.13

Theorem 4.11 proves asymptotic stability of the discrete time closed-loop system (4.13) or (2.5). For a sampled data system (2.8) with sampling period T>0 this implies the discrete time stability estimate (2.47) for the sampled data closed-loop system (2.30). For sampled data systems we may define the running cost \(\ell\) as an integral over a function L according to (3.4), i.e.,

$$\ell(x,u) := \int_0^T L\bigl(\varphi(t,0,x,u),u(t)\bigr)\,dt.$$

We show that for this choice of \(\ell\) a mild condition on L ensures that the sampled data closed-loop system (2.30) is also asymptotically stable in the continuous time sense, i.e., that (2.48) holds. For simplicity, we restrict ourselves to a time invariant reference \(x^{\mathrm{ref}}\equiv x_*\).

The condition we use is that there exists \(\delta\in \mathcal{K}_{\infty}\) such that the vector field \(f_c\) in (2.6) satisfies

$$\big\|f_c(x,u)\big\| \le \max\bigl\{ \varepsilon , \delta(1/\varepsilon ) L(x,u) \bigr\} $$
(4.16)

for all x∈X, all u∈U and all ε>0. For instance, in a linear–quadratic problem with \(X=\mathbb{R}^d\), \(U=\mathbb{R}^m\) and \(x_*=0\) we have \(\|f_c(x,u)\|=\|Ax+Bu\|\le C_1(\|x\|+\|u\|)\) and \(L(x,u)=x^\top Qx+u^\top Ru\ge C_2(\|x\|+\|u\|)^2\) for suitable constants \(C_1,C_2>0\) provided Q and R are positive definite. In this case, (4.16) holds with \(\delta(r) = C_{1}^{2}r/C_{2}\), since \(\|f_c(x,u)\|>\varepsilon\) implies \(C_1(\|x\|+\|u\|)>\varepsilon\) and thus

$$C_1\bigl(\|x\|+\|u\|\bigr) \le \frac{C_1^2}{\varepsilon }\bigl(\|x\|+\|u\|\bigr)^2 \le \frac{C_1^2}{C_2 \varepsilon }C_2\bigl(\|x\|+\|u\|\bigr)^2 = \delta(1/\varepsilon )L(x,u).$$

In the general nonlinear case, (4.16) holds if \(f_c\) is continuous with \(f_c(x_*,u_*)=0\), L(x,u) is positive definite and the inequality \(\|f_c(x,u)\|\le CL(x,u)\) holds for some constant C>0 whenever \(\|f_c(x,u)\|\) is sufficiently large.
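As a numerical complement (with hypothetical matrices chosen only for this sketch), the following code checks condition (4.16) on random samples, with the constants \(C_1\), \(C_2\) and the function \(\delta(r)=C_1^2r/C_2\) constructed exactly as in the linear–quadratic discussion above.

```python
import numpy as np

# Hypothetical linear-quadratic data: f_c(x,u) = A x + B u, L(x,u) = x'Qx + u'Ru.
A = np.array([[0.0, 1.0], [-2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

C1 = max(np.linalg.norm(A, 2), np.linalg.norm(B, 2))   # ||Ax+Bu|| <= C1*(||x||+||u||)
C2 = 0.5 * min(np.linalg.eigvalsh(Q).min(),            # L(x,u) >= C2*(||x||+||u||)^2
               np.linalg.eigvalsh(R).min())
delta = lambda r: C1**2 * r / C2

rng = np.random.default_rng(0)
for _ in range(1000):
    x, u = rng.normal(size=2), rng.normal(size=1)
    eps = 10.0 ** rng.uniform(-3, 3)
    lhs = np.linalg.norm(A @ x + B @ u)
    L = x @ Q @ x + u @ R @ u
    assert lhs <= max(eps, delta(1.0 / eps) * L) + 1e-9
print("condition (4.16) holds on all samples")
```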

We now show that (4.16) together with Theorem 4.11 implies the continuous time stability estimate (2.48). If the assumptions of Theorem 4.11 hold, then (4.15) implies \(\ell(x,\mu(x)) \le J_\infty(x,\mu) \le V(x)/\alpha \le \alpha_{2}(|x|_{x_{*}})/\alpha\). Thus, for t∈[0,T] Inequality (4.16) yields

$$\begin{aligned} \bigl|\varphi(t,0,x,\mu)\bigr|_{x_*} &\le |x|_{x_*} + \int_0^t \bigl\|f_c\bigl(\varphi(\tau,0,x,\mu),\mu(x)(\tau)\bigr)\bigr\|\,d\tau\\ &\le |x|_{x_*} + \max\biggl\{ t\varepsilon,\ \delta(1/\varepsilon)\int_0^t L\bigl(\varphi(\tau,0,x,\mu),\mu(x)(\tau)\bigr)\,d\tau \biggr\}\\ &\le |x|_{x_*} + \max\bigl\{ T\varepsilon,\ \delta(1/\varepsilon)\ell\bigl(x,\mu(x)\bigr)\bigr\}\\ &\le |x|_{x_*} + \max\bigl\{ T\varepsilon,\ \delta(1/\varepsilon)\alpha_2\bigl(|x|_{x_*}\bigr)/\alpha\bigr\}.\end{aligned}$$

Setting\(\varepsilon =\tilde{\gamma}(|x|_{x_{*}})\) with

$$\tilde{\gamma}(r)=\frac{1}{\delta^{-1}(\frac{1}{\sqrt{\alpha_2(r)}} )}$$

forr>0 and\(\tilde{\gamma}(0)=0\) yields\(\tilde{\gamma}\in \mathcal{K}_{\infty}\) and

$$\delta(1/\varepsilon )\alpha_2\bigl(|x|_{x_*}\bigr) = \sqrt{\alpha_2\bigl(|x|_{x_*}\bigr)}.$$

Hence, defining

$$\gamma(r) = r + \max\bigl\{ T\tilde{\gamma}(r), \sqrt{\alpha_2(r)}/\alpha\bigr\}$$

we finally obtain

$$\big|\varphi(t,0,x,\mu)\big|_{x_*} \le \gamma\bigl(|x|_{x_*}\bigr)$$

for allt∈[0,T] with\(\gamma\in \mathcal{K}_{\infty}\).

Hence, if (4.16) and the assumptions of Theorem 4.11 hold, then the sampled data closed-loop system (2.30) fulfills the uniform boundedness over T property from Definition 2.24 and consequently by Theorem 2.27 the sampled data closed-loop system (2.30) is asymptotically stable.

We now turn to investigating practical stability. Recalling Definitions 2.15 and 2.17 ofP-practical asymptotic stability and their Lyapunov function characterizations in Theorems 2.20 and 2.23 we can formulate the following practical version of Theorem 4.11.

Theorem 4.14

Consider a running cost \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) and a function \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\). Let \(\mu:\mathbb{N}_{0}\times \mathbb{X}\to U\) be an admissible feedback law and let \(S(n)\subseteq \mathbb{X}\) and \(P(n)\subset S(n)\), n∈ℕ0, be families of forward invariant sets for the closed-loop system (4.13).

Assume there exists α∈(0,1] such that the relaxed dynamic programming inequality (4.14) holds for all n∈ℕ0 and all x∈S(n)∖P(n). Then the suboptimality estimate

$$J_{k^*}(n,x,\mu) \le V(n,x)/\alpha $$
(4.17)

holds for all n∈ℕ0 and all x∈S(n), where \(k^*\)∈ℕ0 is the minimal time with \(x_\mu(k^*+n,n,x)\in P(k^*+n)\) and

$$J_{k^*}(n,x,\mu) := \sum_{k=0}^{k^*-1} \ell\bigl(n+k,x_\mu(n+k,n,x),\mu\bigl(n+k,x_\mu(n+k,n,x)\bigr)\bigr)$$

is the truncated version of the closed-loop performance functional from Definition 4.10.

If, in addition, there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the inequalities

$$\alpha_1\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) \le V(n,x) \le \alpha_2\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr) \quad \mathit{and} \quad \ell(n,x,u) \ge \alpha_3\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr)$$

hold for all \(x\in \mathbb{X}\), n∈ℕ0 and u∈U and a reference \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\), then the closed-loop system (4.13) is P-asymptotically stable on S(n) in the sense of Definition 2.17.

Proof

The proof follows with analogous arguments as the proof of Theorem 4.11 by only considering \(k<k^*\) in the first part and using Theorem 2.23 with Y(n)=S(n) instead of Theorem 2.22 in the second part. □

Remark 4.15

  1. (i)

    Note that Remark 4.12 holds accordingly for Theorem 4.14. Furthermore, it is easily seen that both Theorem 4.11 and Theorem 4.14 remain valid if f in (4.13) depends on n.

  2. (ii)

    The suboptimality estimate (4.17) states that the closed-loop trajectories \(x_\mu(\cdot,x)\) from (4.13) behave like suboptimal trajectories until they reach the sets P(⋅).

As a consequence of Theorem 4.11, we can show the existence of a stabilizing and almost optimal infinite horizon feedback even if no infinite horizon optimal feedback exists. The assumptions of the following Theorem 4.16 are identical with the assumptions of Theorem 4.8 except that we do not assume the existence of an infinite horizon optimal feedback law \(\mu_\infty\).

Theorem 4.16

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with running cost \(\ell\) of the form (3.8) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that there exist \(\alpha_{1},\alpha_{2},\alpha_{3}\in \mathcal{K}_{\infty}\) such that the Inequalities (4.12) hold for all \(x\in \mathbb{X}\), n∈ℕ0 and u∈U.

Then for each α∈(0,1) there exists an admissible feedback \(\mu_{\alpha}:\mathbb{N}_{0}\times \mathbb{X}\to U\) which asymptotically stabilizes the closed-loop system

$$x^+ = g(n,x) = f\bigl(x,\mu_\alpha(n,x)\bigr)$$

on \(\mathbb{X}\) in the sense of Definition 2.16 and satisfies

$$J_\infty(n,x,\mu_\alpha) \le V_\infty(n,x)/\alpha $$

for all \(x\in \mathbb{X}\) and n∈ℕ0.

Proof

Fix α∈(0,1) and pick an arbitrary \(x\in \mathbb{X}\). From (4.5) for K=1 for each \(x\in \mathbb{X}\) and each ε>0 there exists \(u_{x}^{\varepsilon }\in \mathbb{U}^{1}(x)\) with

$$V_\infty(n,x) \ge \ell\bigl(n,x,u_x^\varepsilon \bigr) + V_\infty\bigl(n+1,f\bigl(x,u_x^\varepsilon \bigr)\bigr)-\varepsilon .$$

If \(V_\infty(n,x)>0\), then (4.12) implies \(x\ne x^{\mathrm{ref}}(n)\) and thus again (4.12) yields the inequality \(\inf_{u\in U}\ell(n,x,u)>0\). Hence, choosing \(\varepsilon=(1-\alpha)\inf_{u\in U}\ell(n,x,u)\) and setting \(\mu_{\alpha}(n,x) = u_{x}^{\varepsilon }\) yields

$$V_\infty(n,x) \ge \alpha\ell\bigl(n,x,\mu_\alpha(n,x)\bigr) + V_\infty \bigl(n+1,f\bigl(x,\mu_\alpha(n,x)\bigr)\bigr). $$
(4.18)

If \(V_\infty(n,x)=0\), then (4.12) implies \(x=x^{\mathrm{ref}}(n)\) and thus from the definition of \(u^{\mathrm{ref}}\) we get \(f(x,u^{\mathrm{ref}}(n))=x^{\mathrm{ref}}(n+1)\). Using (4.12) once again gives us \(V_\infty(n+1,f(x,u^{\mathrm{ref}}(n)))=0\) and from (3.8) we get \(\ell(n,x,u^{\mathrm{ref}}(n))=0\). Thus, \(\mu_\alpha(n,x)=u^{\mathrm{ref}}(n)\) satisfies (4.18). Hence, we obtain (4.14) with \(V=V_\infty\) for all \(x\in \mathbb{X}\). In conjunction with (4.12) this implies that all assumptions of Theorem 4.11 are satisfied for \(V=V_\infty\) with \(S(n)=\mathbb{X}\). Thus, the assertion follows. □

Again we can replace (4.12) by the asymptotic controllability condition from Definition 4.2.

Corollary 4.17

Consider the optimal control problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) for the control system (2.1) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0}\to \mathbb{X}\) with reference control sequence \(u^{\mathrm{ref}}\in \mathbb{U}^{\infty}(x^{\mathrm{ref}}(0))\). Assume that the system is asymptotically controllable to \(x^{\mathrm{ref}}\) and that the cost function \(\ell:\mathbb{N}_{0}\times X\times U\to \mathbb{R}_{0}^{+}\) is of the form (4.3) with λ=0. Then for each α∈(0,1) there exists an admissible feedback \(\mu_{\alpha}:\mathbb{N}_{0}\times \mathbb{X}\to U\) which asymptotically stabilizes the closed-loop system

$$x^+ = g(n,x) = f\bigl(x,\mu_\alpha(n,x)\bigr)$$

on \(\mathbb{X}\) in the sense of Definition 2.16 and satisfies

$$J_\infty(n,x,\mu_\alpha) \le V_\infty(n,x)/\alpha $$

for all \(x\in \mathbb{X}\) and n∈ℕ0.

If, in addition, the asymptotic controllability has the small control property then the statement also holds for \(\ell\) from (4.3) with arbitrary λ≥0.

Proof

Theorem 4.3 yields

$$\alpha_1\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr) \le V_\infty(n_0,x_0)\le\alpha_2\bigl(|x_0|_{x^{\mathrm{ref}}(n_0)}\bigr)$$

for suitable\(\alpha_{1},\alpha_{2}\in \mathcal{K}_{\infty}\). Furthermore, by (4.3) the third inequality in (4.12) holds with\(\alpha_{3}=\gamma_{1}^{-1}\). Hence, (4.12) holds and Theorem 4.16 yields the assertion. □

While Theorem 4.16 and Corollary 4.17 are already nicer than Theorem 4.8 and Corollary 4.9, respectively, in the sense that no existence of an optimal feedback law is needed, for practical applications both theorems require the (at least approximate) solution of an infinite horizon optimal control problem, which is in general a hard, often infeasible computational task, see also the discussion in Sect. 4.4, below.

Hence, in the following chapters we are going to use Theorem 4.11 and Theorem 4.14 in a different way: we will derive conditions under which (4.14) is satisfied by the finite horizon optimal value function \(V=V_N\) and the corresponding NMPC-feedback law \(\mu=\mu_N\). The advantage of this approach lies in the fact that in order to compute \(\mu_N(n_0,x_0)\) it is sufficient to know the finite horizon optimal control sequence \(u^\star\) for initial value \(x_0\). This is a much easier computational task, at least if the optimization horizon N is not too large.

4.4 Notes and Extensions

Infinite horizon optimal control is a classical topic in control theory. The version presented in Sect. 4.1 can be seen as a nonlinear generalization of the classical (discrete time) linear–quadratic regulator (LQR) problem, see, e.g., Dorato and Levis [6]. A rather general existence result for optimal control sequences and trajectories in the metric space setting considered here was given by Keerthi and Gilbert [15]. Note, however, that by Theorem 4.16 we do not need the existence of optimal controls for the existence of almost optimal stabilizing feedback controls.

Dynamic programming as introduced in Sect. 4.2 is a very common approach also for infinite horizon optimal control and we refer to the discussion in Sect. 3.5 for some background information. As in the finite horizon case, the monographs of Bertsekas [2,3] provide a good source for more information on this method.

The connection between infinite horizon optimal control and stabilization problems for nonlinear systems has been recognized for quite a while. Indeed, the well known construction of control Lyapunov functions in continuous time by Sontag [23] is based on techniques from infinite horizon optimal control. As already observed after Corollary 4.7, discrete time infinite horizon optimal control is nothing but NMPC with N=∞. This has led to the investigation of infinite horizon NMPC algorithms, e.g., by Keerthi and Gilbert [16], Meadows and Rawlings [19], Alamir and Bornard [1]. For linear systems, this approach was also considered in the monograph of Bitmead, Gevers and Wertz [4].

The stability results in this chapter are easily generalized to the stability of sets \(X^{\mathrm{ref}}(n)\subset \mathbb{X}\) when \(\ell\) is of the form (3.24). In this case, it suffices to replace the bounds \(\alpha_{j}(|x|_{x^{\mathrm{ref}}(n)})\), j=1,2,3, in, e.g., Theorem 4.11 by bounds of the form

$$\alpha_j \Bigl(\,\min_{y\in X^{\mathrm{ref}}(n)} |x|_{y} \Bigr). $$
(4.19)

Alternatively, one could formulate these bounds via so-called proper indicator functions as used, e.g., by Grimm et al. in [8].

By Formula (4.8) the optimal—and stabilizing—feedback law \(\mu_\infty\) can be computed by solving a rather simple optimization problem once the optimal value function \(V_\infty\) is known. This has motivated a variety of approaches for solving the dynamic programming equation (4.5) (usually for K=1) numerically in order to obtain an approximation of \(\mu_\infty\) from a numerical approximation of \(V_\infty\). Approximation techniques like linear and multilinear approximations are proposed, e.g., in Kreisselmeier and Birkhölzer [17], Camilli, Grüne and Wirth [5] or by Falcone [7]. A set oriented approach was developed in Junge and Osinga [14] and used for computing stabilizing feedback laws in Grüne and Junge [10] (see also [11,12] for further improvements of this method). All such methods, however, suffer from the so-called curse of dimensionality which means that the numerical effort grows exponentially with the dimension of the state space X. In practice, this means that these approaches can only be applied for low-dimensional systems, typically not higher than 4–5. For homogeneous systems, Tuna [25] (see also Grüne [9]) observed that it is sufficient to compute \(V_\infty\) on a sphere, which reduces the dimension of the problem by one. Still, this only slightly reduces the computational burden. In contrast to this, a numerical approximation of the optimal control sequence \(u^\star\) for finite horizon optimal control problems like (OCPN) and its variants is possible also in rather high space dimensions, at least when the optimization horizon N is not too large. This makes the NMPC approach computationally attractive.

Relaxed dynamic programming in the form introduced in Sect. 4.3 was originally developed by Lincoln and Rantzer [18] and Rantzer [20] in order to lower the computational complexity of numerical dynamic programming approaches. Instead of trying to solve the dynamic programming equation (4.5) exactly, it is only solved approximately using numerical approximations for \(V_\infty\) from a suitable class of functions, e.g., polynomials. The idea of using such relaxations is classical and can be realized in various other ways, too; see, e.g., [2, Chap. 6]. Here we use relaxed dynamic programming not for solving (4.5) but rather for proving properties of closed-loop solutions, cf. Theorems 4.11 and 4.14. While the specific form of the assumptions in these theorems was first used in an NMPC context in Grüne and Rantzer [13], the conceptual idea is actually older and can be found, e.g., in Shamma and Xiong [22] or in Scokaert, Mayne and Rawlings [21]. The fact that stability of the sampled data closed loop can be derived from the stability of the associated discrete time system for integral costs (3.4), cf. Remark 4.13, was, to the best of our knowledge, not observed before.

4.5 Problems

  1. 1.

    Consider the problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with finite optimal value function \(V_{\infty}:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\) and asymptotically stabilizing admissible optimal feedback law \(\mu_{\infty}:\mathbb{N}_{0}\times \mathbb{X}\to U\). Let \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\) be a function which satisfies

    $$V(n,x_0)= \min_{u\in \mathbb{U}^1(x_0)} \bigl\{ \ell(n,x_0,u) + V\bigl(n+1,f(x_0,u)\bigr) \bigr\} $$
    (4.20)

    for all n∈ℕ0 and all \(x_0\in X\).

    1. (a)

      Prove that \(V(n,x)\ge V_\infty(n,x)\) holds for all n∈ℕ0 and all \(x\in \mathbb{X}\).

    2. (b)

      Prove that for the optimal feedback law the inequality

      $$V(n,x) - V_\infty(n,x) \le V\bigl(n+1,f\bigl(x,\mu_\infty(n,x)\bigr)\bigr) - V_\infty\bigl(n+1,f\bigl(x,\mu_\infty(n,x)\bigr)\bigr)$$

      holds for all n∈ℕ0 and all \(x\in \mathbb{X}\).

    3. (c)

      Assume that in addition there exists \(\alpha_{2}\in \mathcal{K}_{\infty}\) such that the inequality

      $$V(n,x) \le \alpha_2\bigl(|x|_{x^{\mathrm{ref}}(n)}\bigr)$$

      holds for all n∈ℕ0, \(x\in \mathbb{X}\) and a reference trajectory \(x^{\mathrm{ref}}:\mathbb{N}_{0} \to \mathbb{X}\). Prove that under this condition \(V(n,x)=V_\infty(n,x)\) holds for all n∈ℕ0 and all \(x\in \mathbb{X}\).

    4. (d)

      Find a function \(V:\mathbb{N}_{0}\times X\to \mathbb{R}_{0}^{+}\) satisfying (4.20) but for which \(V(n,x)=V_\infty(n,x)\) does not hold. Of course, for this function the additional condition on V from (c) must be violated.

    Hint for (a): Define a feedback μ which assigns to each pair (n,x) a minimizer of the right hand side of (4.20), check that Theorem 4.11 is applicable for \(S(n)=\mathbb{X}\) (for which α∈(0,1]?) and conclude the desired inequality from (4.15).

    Hint for (c): Perform an induction over the inequality from (b) along the optimal closed-loop trajectory.

  2. 2.

    Consider the unconstrained linear control system

    $$ x^ + = Ax + Bu $$

    with matrices \(A\in\mathbb{R}^{d\times d}\), \(B\in\mathbb{R}^{d\times m}\). Consider problem (\(\mathrm{OCP}_{\infty}^{\mathrm{n}}\)) with

    $$\ell(x,u) = x^\top Q x + u^\top R u$$

    with symmetric positive definite matrices Q, R of appropriate dimension (this setting is called the linear–quadratic regulator (LQR) problem). If the pair (A,B) is stabilizable, then it is known that the discrete time algebraic Riccati equation

    $$ P = Q + A^ \top (P - PB(B^ \top PB + R)^{ - 1} B^ \top P)A $$

    has a unique symmetric and positive definite solution \(P\in\mathbb{R}^{d\times d}\).

    1. (a)

      Show that the function \(V(x)=x^\top Px\) satisfies (4.20). Note that since the problem here is time invariant we do not need the argument n.

    2. (b)

      Use the results from Problem 1 to conclude that \(V_\infty(x)=x^\top Px\) holds. You may assume without proof that an optimal feedback \(\mu_\infty\) exists.

    3. (c)

      Prove that the corresponding optimal feedback law asymptotically stabilizes the equilibrium \(x_*=0\).

    Hint for (a): For matrices C, D, E of appropriate dimensions with C, D symmetric and D positive definite the formula

    $$\min_{u\in \mathbb{R}^m} \bigl\{ x^\top Cx + u^\top Du + u^\top E^\top x + x^\top Eu\bigr\}= x^\top \bigl(C-ED^{-1} E^\top\bigr)x$$

    holds. This formula is proved by computing the zero of the derivative of the expression in the “min” with respect to u (which is also a nice exercise).

    Hint for (b) and (c): For any symmetric and positive definite matrix \(M\in\mathbb{R}^{d\times d}\) there exist constants \(C_2\ge C_1>0\) such that the inequality \(C_1\|x\|^2\le x^\top Mx\le C_2\|x\|^2\) holds for all \(x\in\mathbb{R}^d\).

  3. 3.

    Consider the finite horizon counterpart (OCPN) of Problem 2. For this setting one can show that the optimal value function is of the form \(V_N(x)=x^\top P_N x\) and that the matrix \(P_N\) converges to the matrix P from Problem 2 as N→∞ (a numerical sketch illustrating this convergence is given after Problem 4). This convergence implies that for each ε>0 there exists \(N_\varepsilon>0\) such that the inequality

    $$\big|x^\top P_N x - x^\top P x\big| \le \varepsilon \|x\|^2$$

    holds for all \(N\ge N_\varepsilon\). Use this property and Theorem 4.11 in order to prove that the NMPC-feedback law from Algorithm 3.1 is asymptotically stabilizing for sufficiently large optimization horizon N>0.

    Hint: Look at the hint for Problem 2(b) and (c).

  4. 4.

    Consider the scalar control system

    $$ x^ + = x + u $$

    with \(x\in X=\mathbb{R}\), \(u\in U=\mathbb{R}\) which shall be controlled via the NMPC Algorithm 3.1 using the quadratic running cost function

    $$ \ell (x,u) = x^2 + u^2 . $$

    Compute \(V_N(x_0)\) and \(J_\infty(x_0,\mu_N(\cdot))\) for N=2 (cf. Chap. 3, Problem 3). Using these values, derive the degree of suboptimality α from the relaxed dynamic programming inequality (4.14) and from the suboptimality estimate (4.15).
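The following numerical sketch (hypothetical matrices A, B, Q, R; it complements but does not replace the proofs asked for in Problems 2 and 3) solves the discrete time algebraic Riccati equation with SciPy, iterates the finite horizon Riccati recursion to observe the convergence \(P_N\to P\) and checks the dynamic programming equation (4.20) for \(V(x)=x^\top Px\) at a sample point.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical LQR data (unstable but stabilizable pair (A, B)):
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

P = solve_discrete_are(A, B, Q, R)     # solution of the Riccati equation in Problem 2

# Finite horizon iterates: P_0 = 0, P_{N+1} = Q + A'(P_N - P_N B (B'P_N B + R)^{-1} B'P_N) A.
PN = np.zeros((2, 2))
for _ in range(50):
    K = np.linalg.solve(B.T @ PN @ B + R, B.T @ PN @ A)
    PN = Q + A.T @ (PN - PN @ B @ K) @ A
print(np.linalg.norm(PN - P))          # small: P_N has numerically converged to P

# Check (4.20) for V(x) = x'Px at a sample point, using the minimizer
# u = -(B'PB + R)^{-1} B'PA x obtained from the formula in the hint to Problem 2(a):
x = np.array([1.0, -2.0])
u = -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A @ x)
xplus = A @ x + B @ u
print(x @ P @ x, x @ Q @ x + u @ R @ u + xplus @ P @ xplus)   # approximately equal
```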