1 Introduction

Whenever the state of some dynamical system can be influenced by repeatedly applying some control (“decision”) to the system, the question arises how the sequence of controls – the policy – can be chosen in such a way that some given objective is met. For example, one might be interested in steering the system to an equilibrium point, i.e. to stabilize the otherwise unstable point. In many contexts, the application of some control comes at some cost (fuel, money, time, ...) which then is accumulated over time. Typically, one is interested in meeting the given objective at minimal accumulated cost. This is the context of Richard Bellman’s famous quote which already hints at how to solve the problem: One can recursively construct an optimal sequence of controls backwards in time by starting at the/some final state. It just so happens that this is also the idea of Edsger Dijkstra’s celebrated algorithm for finding shortest paths in weighted directed graphs.

At the core, this procedure requires one to store the minimal accumulated cost at each state, the value function. According to the recursive construction of the sequence of optimal controls, the value function satisfies a recursion, i.e. a fixed point equation, the Bellman equation. From the value function at some state, the optimal control associated to that state can be recovered by solving a static optimization problem. This assignment defines a function from (a subset of) the states into the set of all possible control values, and so the state can be fed back into the system, yielding a dynamical system without any external input. By construction, the accumulated cost along any trajectory of this closed loop system will be minimal.

In the case of a finite state space (with a reasonable number of states), storing the value function is easy. In many applications from, e.g., the engineering sciences, however, the state space is a subset of Euclidean space and thus the value function a function defined on a continuum of states. In this case, the value function typically cannot be represented in a closed form. Rather, some approximation scheme has to be decided upon and the value function (and thus the feedback) has to be approximated numerically.

In this chapter, we review contributions by the authors developing an approach for approximating the value function and the associated feedback by piecewise constant functions. This may seem like a bad idea at first, since in general one would prefer approximation spaces of higher order. However, it turns out that this ansatz enables an elegant solution of the discretized problem by standard shortest path algorithms (e.g. Dijkstra’s algorithm). What is more, it also enables a unified treatment of system classes which otherwise would require specialized algorithms, like hybrid systems, event systems or systems with quantized state spaces.

As is common for discretizations, the discrete value function does not inherit a crucial property of the true one: In general, it does not decrease monotonically along trajectories of the closed loop system. In other words, it does not constitute a Lyapunov function of the closed loop system. As a consequence, the associated feedback may fail to stabilize some initial state. This deficiency can be cured by considering a more general problem class, namely a system which can be influenced by two independent controls – a dynamic game. In particular, if the second input is interpreted as some perturbation induced by the discretization, a discrete feedback results which retains the Lyapunov function property.

On the other hand, as any construction based on the Bellman equation, or more generally as any computational scheme which requires to represent a function with domain in some Euclidean space, our construction is prone to the curse of dimension (a term already coined by Bellman): In general, i.e. unless some specialized approximation space is employed, the computational cost for storing the value function grows exponentially in the dimension of state space. That is, in practice, our approach is limited to systems with a low dimensional state space (i.e. of dimension \(\le \)4, say).

2 Problem Formulation

We are given a control system in discrete time

$$\begin{aligned} x_{k+1} = f(x_k, u_k, w_k), \quad k=0,1,\ldots , \end{aligned}$$
(1)

where \(x_k\in X\) is the state of the system, \(u_k\in U\) is the control input and \(w_k\in W\) is some external perturbation. We are further given an instantaneous cost function g which assigns the cost

$$ g(x_k,u_k) \ge 0 $$

to any transition \(x_k \mapsto f(x_k,u_k,w)\), \(w\in W\).

Our task is to globally and optimally stabilize a given target set \(T\subset X\) by constructing a feedback \(u:S\rightarrow U\), \(S\subset X\), such that T is an asymptotically stable set for the closed loop system

$$\begin{aligned} x_{k+1}=f(x_k,u(x_k),w_k), \quad k=0,1,\ldots \end{aligned}$$
(2)

with \(x_0\in S\) for any sequence \((w_k)_k\) of perturbations and such that the accumulated cost

$$\begin{aligned} \sum _{k=0}^\infty g(x_k,u(x_k)) \end{aligned}$$
(3)

is minimal.

System Classes. Depending on the choice of the spaces \(X\), \(U\) and \(W\) and the form of the map f, a quite large class of systems can be modelled by (1). Most generally, \(X\), \(U\) and \(W\) have to be compact metric spaces – in particular, they may be discrete. Common examples, which will also be considered later, include

  • sampled-data systems: \(X\), \(U\) and \(W\) are compact subsets of Euclidean space, f is the time-T-map of the control flow of some underlying continuous time control system and g typically integrates terms along the continuous time solution over one sampling interval;

  • hybrid systems: \(X=Y\times D\), where \(Y\subset \mathbb {R}^n\) is compact and D is finite; U and W may be continuous (compact) sets or finite (cf. Sect. 8);

  • discrete event systems: f may be chosen as a (generalized) Poincaré map (cf. Sect. 8);

  • quantized systems: The feedback may receive only quantized information on the state x, i.e. x is projected onto a finite subset of X before u is evaluated on this quantized state.

3 The Optimality Principle

The construction of the feedback law u will be based on a discretized version of the optimality principle. In order to convey the basic idea more clearly, we start by considering problem (1) without perturbations, i.e.

$$\begin{aligned} x_{k+1}=f(x_k,u_k), \quad k=0,1,\ldots \end{aligned}$$
(4)

and assume that \(X\subset \mathbb {R}^d\) and \(U\subset \mathbb {R}^m\) are compact, \(0\in X\) and \(0\in U\). We further assume that \(0\in X\) is a fixed point of \(f(\,\cdot \,,0)\), i.e. \(f(0,0)=0\), constituting our target set \(T:=\{0\}\), that \(f: X \times U \rightarrow X\) and \(g:X\times U\rightarrow [0,\infty )\) are continuous, that \(g(0,0)=0\) and \(\inf _{u\in U}g(x,u) > 0\) for all \(x\ne 0\).

For a given initial state \(x_0\in X\) and a given sequence \({\textit{\textbf{u}}}=(u_0,u_1,\ldots )\in U^\mathbb {N}\) of controls, there is a unique trajectory \(\textit{\textbf{x}}(x_0,\textit{\textbf{u}})=(x_k(x_0,\textit{\textbf{u}}))_{k\in \mathbb {N}}\) of (4). For \(x\in X\), let

$$ \mathcal{U}(x) = \{ {\textit{\textbf{u}}}\in U^\mathbb {N}: x_k(x,\textit{\textbf{u}}) \rightarrow 0 \text { as }k\rightarrow \infty \} $$

denote the set of stabilizing control sequences and

$$ S = \{ x \in X : \mathcal{U}(x)\ne \emptyset \} $$

the stabilizable subset of X. The accumulated cost along some trajectory \(\textit{\textbf{x}}(x_0,\textit{\textbf{u}})\) is given by

$$\begin{aligned} J(x_0,{\textit{\textbf{u}}}) = \sum _{k=0}^\infty g(x_k(x_0,{\textit{\textbf{u}}}), u_k). \end{aligned}$$
(5)

Note that this series might not converge for some \((x_0,\textit{\textbf{u}})\). The least possible value of the accumulated cost over all stabilizing control sequences defines the (optimal) value function \(V:X\rightarrow [0,\infty ]\),

$$\begin{aligned} V(x) = \inf _{\textit{\textbf{u}}\in \mathcal {U}(x)} J(x,{\textit{\textbf{u}}}) \end{aligned}$$
(6)

of the problem. Let \(S_0:=\{x\in X: V(x) < \infty \}\) be the set of states in which the value function is finite. Clearly, \(S_0\subset S\). On \(S_0\), the value function satisfies the optimality principle [2]

$$\begin{aligned} V(x) = \inf _{u\in U} \left\{ g(x,u) + V(f(x,u)) \right\} . \end{aligned}$$
(7)

The right hand side

$$ L[v](x) := \inf _{u\in U} \left\{ g(x,u) + v(f(x,u)) \right\} $$

of (7) defines the Bellman operator L on real valued functions on X. The value function V is the unique fixed point of L satisfying the boundary condition \(V(0)=0\).
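To fix ideas, the fixed point of L can be computed by iterating the operator (value iteration). The following minimal Python sketch does this for the simple 1D system of Example 1 below (the chapter's own listings, cf. Figs. 3, 6, 9 and 11, are Matlab/GAIO; the grid sizes, the nearest-neighbour evaluation of v and the clipping of f to [0, 1] are our choices):

```python
import numpy as np

a = 0.8
xs = np.linspace(0.0, 1.0, 1001)        # grid on X = [0, 1]
us = np.linspace(-1.0, 1.0, 21)         # finite sample of U = [-1, 1]
f = lambda x, u: np.clip(x + (1 - a) * u * x, 0.0, 1.0)
g = lambda x, u: (1 - a) * x

V = np.zeros_like(xs)                   # start below the fixed point
for _ in range(200):
    # L[v](x) = min_u { g(x,u) + v(f(x,u)) }, v read off the grid
    LV = np.min([g(xs, u) + V[np.rint(f(xs, u) * (len(xs) - 1)).astype(int)]
                 for u in us], axis=0)
    LV[0] = 0.0                         # boundary condition V(0) = 0
    V = LV
# V now approximates the value function V(x) = x of Example 1
```

Note that this is not the method pursued in this chapter – the point of the piecewise constant ansatz below is precisely to avoid such a fixed point iteration in favor of a single shortest path computation.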

Using the value function V, one can construct the feedback \(u:S_0\rightarrow U\),

$$\begin{aligned} u(x) := \mathop {\mathrm {argmin}}_{u\in U} \left\{ g(x,u) + V(f(x,u)) \right\} , \end{aligned}$$
(8)

whenever this minimum exists. Obviously, V then satisfies

$$\begin{aligned} V(x) \ge g(x,u(x)) + V(f(x,u(x))), \end{aligned}$$
(9)

for \(x\in S_0\), i.e. the optimal value function is a Lyapunov function for the closed loop system on \(S_0\) (provided that V is continuous at \(T=\{0\}\)) – and this guarantees asymptotic stability of \(T=\{0\}\) for the closed loop system. By construction, this feedback u is also optimal in the sense that the accumulated cost J is minimized along any trajectory of the closed loop system.

4 A Discrete Optimality Principle

In general, the value function (resp. the associated feedback) cannot be determined exactly and some numerical approximation has to be sought. Here, we are going to approximate V by functions which are piecewise constant on some partition of X. This approach is motivated by the fact that the resulting discrete problem can be solved efficiently and that, via a generalization of the framework to perturbed systems in Sect. 5, the feedback is also piecewise constant and can be computed offline.

Let \(\mathcal{P}\) be a finite partition of the state space X, i.e. a finite collection of pairwise disjoint subsets of X whose union covers X. For \(x\in X\), let \(\pi (x)\in \mathcal{P}\) denote the partition element that contains x. In what follows, we identify any subset \(\{P_1,\ldots ,P_k\}\) of \(\mathcal{P}\) with the corresponding subset \(\bigcup _{i=1,\ldots ,k} P_i\) of X.

Let \(\mathbb {R}^\mathcal{P}\subset \mathbb {R}^X=\{v:X\rightarrow \mathbb {R}\}\) be the subspace of real valued functions on X which are piecewise constant on the elements of \(\mathcal{P}\). Using the projection

$$\begin{aligned} \psi [v](x) := \inf _{x'\in \pi (x)} v(x'), \end{aligned}$$
(10)

from \(\mathbb {R}^X\) onto \(\mathbb {R}^\mathcal{P}\), we define the discretized Bellman operator

$$ L_\mathcal{P}:= \psi \circ L. $$

Again, this operator has a unique fixed point \(V_\mathcal{P}\) satisfying the boundary condition \(V_\mathcal{P}(0)=0\), which will serve as an approximation to the exact value function V.

Explicitly, the discretized operator reads

$$\begin{aligned} L_{\mathcal{P}}[v](x) = \inf _{x'\in \pi (x)} \left\{ \inf _{u\in U}\left\{ g(x',u)+ v(f(x',u))\right\} \right\} \end{aligned}$$

and \(V_{\mathcal{P}}\) satisfies the optimality principle

$$\begin{aligned} V_{\mathcal{P}}(x) = \inf _{x'\in \pi (x),u\in U} \left\{ g(x',u)+V_{\mathcal{P}}(f(x',u))\right\} . \end{aligned}$$
(11)

Recalling that \(V_{\mathcal{P}}\) is constant on each element P of the partition \(\mathcal{P}\), we write \(V_\mathcal{P}(P)\) in order to denote the value \(V_\mathcal{P}(x)\) for some \(x\in P\). We can rewrite (11) as

$$\begin{aligned} V_{\mathcal{P}}(x) = \min _P\;\inf _{(x',u)} \left\{ g(x',u)+V_{\mathcal{P}}(P)\right\} \end{aligned}$$
(12)

where the min is taken over all \(P\in \mathcal{P}\) for which \(P\cap f(\pi (x),U)\ne \emptyset \) and the inf over all pairs \(x'\in \pi (x)\), \(u\in U\) such that \(f(x',u)\in P\). Now define the multivalued map \(\mathcal{F}:\mathcal{P}\rightrightarrows \mathcal{P}\),

$$\begin{aligned} \mathcal{F}(P) = \{ P'\in \mathcal{P}: P'\cap f(P,U)\ne \emptyset \} \end{aligned}$$
(13)

and the cost function \(\mathcal{G}:\mathcal{P}\times \mathcal{P}\rightarrow [0,\infty )\),

$$\begin{aligned} \mathcal{G}(P,P')=\inf _{u\in U}\{g(x,u)\mid x\in P, f(x,u)\in P'\}. \end{aligned}$$
(14)

Equation (12) can then be rewritten as

$$ V_{\mathcal{P}}(P)=\min _{P'\in \mathcal{F}(P)}\{ \mathcal{G}(P,P') + V_{\mathcal{P}}(P')\}. $$

Graph Interpretation. It is useful to think of this reformulation of the discrete optimality principle in terms of a directed weighted graph \(G_\mathcal{P}=(\mathcal{P},E_\mathcal{P})\). The nodes of the graph are given by the elements of the partition \(\mathcal{P}\), the edges are defined by the map \(\mathcal{F}\): there is an edge \((P,P')\in E_\mathcal{P}\) whenever \(P'\in \mathcal{F}(P)\) and the edge \(e=(P,P')\) is weighted by \(\mathcal{G}(e):=\mathcal{G}(P,P')\), cf. Fig. 1. In fact, the value \(V_\mathcal{P}(P)\) is the length \(\mathcal{G}(p):=\sum _{k=1}^m \mathcal{G}(e_k)\) of the shortest path \(p=(e_1,\ldots ,e_m)\) from P to the element \(\pi (0)\) containing 0 in this graph. As such, it can be computed by, e.g., the following algorithm with complexity \(\mathcal{O}(|\mathcal{P}|\log (|\mathcal{P}|)+|E|)\):

Fig. 1. Partition of phase space, image of an element (left) and corresponding edges in the induced graph (right).

Algorithm 1 (Dijkstra [5]): figure a.    \(\square \)

The time complexity of this algorithm depends on the data structure which is used in order to store the set \(\mathcal{Q}\). In our implementation we use a binary heap which leads to a complexity of \(\mathcal{O}((|\mathcal{P}|+|E|)\log |\mathcal{P}|)\). This can be improved to \(\mathcal{O}(|\mathcal{P}|\log |\mathcal{P}|+|E|)\) by employing a Fibonacci heap.
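For illustration, a minimal Python sketch of this shortest path computation follows (the helper names dijkstra, edges_in, weight and target are ours and recur in the sketches below). Values are propagated backwards from the target node along reversed edges; the binary heap with lazy deletion realizes the \(\mathcal{O}((|\mathcal{P}|+|E|)\log |\mathcal{P}|)\) bound just mentioned.

```python
import heapq

def dijkstra(nodes, edges_in, weight, target):
    """V(P) = length of the shortest path from P to `target` in G_P.
    edges_in[P'] lists all P with an edge (P, P'); weight[(P, P')] = G(P, P')."""
    V = {P: float("inf") for P in nodes}
    V[target] = 0.0
    heap = [(0.0, target)]
    done = set()
    while heap:
        vP, P = heapq.heappop(heap)
        if P in done:
            continue                    # stale heap entry (lazy deletion)
        done.add(P)
        for Q in edges_in.get(P, ()):   # relax every edge (Q, P)
            alt = weight[(Q, P)] + vP
            if alt < V[Q]:
                V[Q] = alt
                heapq.heappush(heap, (alt, Q))
    return V
```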

A similar idea is at the core of fast marching methods [16, 18] and ordered upwind methods [17].

Implementation. We use the approach from [3, 4] as implemented in GAIO in order to construct a cubical partition of X, stored in a binary tree. For the construction of the edges and their weights, we use a finite set of sample points \(\tilde{U}\subset U\) and \(\tilde{P}\subset P\) for each \(P\in \mathcal{P}\) and compute the approximate image

$$\begin{aligned} \tilde{\mathcal{F}}(P) = \{ P'\in \mathcal{P}: P'\cap f(\tilde{P},\tilde{U})\ne \emptyset \}, \end{aligned}$$
(15)

so that the set of edges is approximately given by all pairs \((P,P')\) for which \(P'\in \tilde{\mathcal{F}}(P)\). Correspondingly, the weight of the edge \((P,P')\) is approximated by

$$ \tilde{\mathcal{G}}(P,P') = \min _{(x,u) \in \tilde{P}\times \tilde{U}} \{g(x,u) \mid f(x,u) \in P'\}. $$

This construction of the graph via the mapping of sample points indeed constitutes the main computational effort in computing the discrete value function. It might be particularly expensive if the control system f is given by the control flow of a continuous time system. Note, however, that a sampling of the system will be required in any method that computes the value function. In fact, in standard methods like value iteration, the same point might be sampled multiple times (in contrast to the approach described here).

Certainly, this approximation of the box images introduces some error, i.e. one always has that \(\tilde{\mathcal{F}}(P)\subset \mathcal{F}(P)\), but typically \(\tilde{\mathcal{F}}(P)\subsetneqq \mathcal{F}(P)\). In experiments, one often increases the number of sample points until the result of the computation stabilizes. Alternatively, in the case that one is interested in a rigorous computation, either techniques based on Lipschitz estimates [13] or interval arithmetic [19] can be employed.
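In code, the sample-based construction of \(\tilde{\mathcal{F}}\) and \(\tilde{\mathcal{G}}\) might look as follows (a sketch; the membership map locate plays the role of \(\pi \) and stands in for the lookup in GAIO's binary tree):

```python
def build_graph(partition, samples, controls, f, g, locate):
    """Edges and weights from sample points: every image f(x, u), x in a
    sample set of P and u in a sample set of U, yields an edge
    (P, locate(f(x, u))); its weight is the minimal cost over all samples
    realizing that edge, cf. (15) and the formula for the weights above."""
    weight, edges_in = {}, {}
    for P in partition:
        for x in samples[P]:
            for u in controls:
                e = (P, locate(f(x, u)))
                if g(x, u) < weight.get(e, float("inf")):
                    weight[e] = g(x, u)
                edges_in.setdefault(e[1], set()).add(P)
    return edges_in, weight
```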

Example 1

(A simple 1D system). Consider the system

$$\begin{aligned} x_{k+1} = x_k + (1-a) u_k x_k, \quad k=0,1,\ldots , \end{aligned}$$
(16)

where \(x_k\in X=[0,1], u_k\in U=[-1,1]\) and \(a\in (0,1)\) is a fixed parameter. Let

$$ g(x,u) = (1-a)x, $$

such that the optimal control policy is to steer to the origin as fast as possible, i.e. for every x, the optimal sequence of controls is \((-1,-1,\ldots )\): with \(u_k=-1\), the dynamics reduce to \(x_{k+1}=ax_k\), so that \(J(x,{\textit{\textbf{u}}})=\sum _{k=0}^\infty (1-a)a^kx = x\). This yields \(V(x)=x\) as the value function.

For the experiment, we consider \(a=0.8\) and use partitions of equally sized subintervals of [0, 1]. The edge weights (14) are approximated by minimizing over 100 equally spaced sample points in each subinterval and 10 equally spaced points in U. Figure 2 shows the exact and two discrete value functions, resulting from running the code in Fig. 3 in Matlab (requires the GAIO toolbox).
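Since the Matlab/GAIO listing of Fig. 3 is not reproduced here, the following hypothetical Python stand-in combines the two sketches above for this example (clipping f to [0, 1] keeps all images inside X and is our assumption):

```python
import numpy as np

a, n = 0.8, 64
partition = range(n)
locate = lambda x: min(int(x * n), n - 1)      # pi(x) for n equal intervals
samples = {P: np.linspace(P / n, (P + 1) / n, 100) for P in partition}
controls = np.linspace(-1.0, 1.0, 10)          # 10 sample values in U
f = lambda x, u: float(np.clip(x + (1 - a) * u * x, 0.0, 1.0))
g = lambda x, u: (1 - a) * x

edges_in, weight = build_graph(partition, samples, controls, f, g, locate)
V_P = dijkstra(partition, edges_in, weight, target=locate(0.0))
# V_P[P] lies below the exact value V(x) = x on each interval P, cf. Fig. 2
```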

Fig. 2. Exact (red) and discrete value functions for the simple example on partitions of 64 (black) and 1024 (blue) intervals.

Fig. 3. Code: value function for a simple 1d system.

4.1 The Discrete Value Function

Proposition 1

[14]. For every partition \(\mathcal{P}\) of X, \(V_\mathcal{P}(x) \le V(x)\) for all \(x\in X\).

Proof

The statement obviously holds for \(x \in X\) with \(V(x) = \infty \). So let \(x\in S_0\), i.e. \(V(x) < \infty \). For arbitrary \(\varepsilon > 0\), let \({\textit{\textbf{u}}} = (u_0, u_1, \ldots ) \in \mathcal{U}(x)\) be a control sequence such that \(J(x,{\textit{\textbf{u}}}) < V(x) + \varepsilon \) and \((x_k(x,\textit{\textbf{u}}))_k\) the associated trajectory of (4). Consider the path

$$ (e_1,\ldots ,e_m), \quad e_k=(\pi (x_{k-1}), \pi (x_k)), \quad k=1,\ldots ,m, $$

where \(x=x_0\) and m is minimal with \(x_m \in \pi (0)\). The length of this path is

$$\begin{aligned} \sum _{k=1}^m \mathcal{G}(e_k)&= \sum _{k=1}^m \; \inf _{u\in U} \{g(x,u) \mid x \in \pi (x_{k-1}), f(x,u) \in \pi (x_k)\} \\&\le \sum _{k=1}^m \; g(x_{k-1},u_{k-1}) \le \sum _{k=1}^\infty \; g(x_{k-1},u_{k-1}) = J(x,{\textit{\textbf{u}}}), \end{aligned}$$

yielding the claim.    \(\square \)

This property immediately yields an efficient a posteriori error estimate for \(V_\mathcal{P}\): For \(x\in S_0\) consider

$$\begin{aligned} e(x) = {\inf _{u\in U}} \{g(x,u)+ V_\mathcal {P}(f(x,u)) \} - V_\mathcal {P}(x). \end{aligned}$$
(17)

Note that \(e(x) \ge 0\). Since

$$\begin{aligned} V(x)-V_\mathcal {P}(x)&= {\inf _{u\in U}}\{g(x,u)+V(f(x,u))\} - V_\mathcal {P}(x) \\&\ge {\inf _{u\in U}}\{g(x,u)+V_\mathcal {P}(f(x,u))\} - V_\mathcal {P}(x) = e(x), \end{aligned}$$

we obtain

Proposition 2

The function \(e:S_0\rightarrow [0,\infty )\) is a lower bound on the error between the true value function V and its approximation \(V_\mathcal {P}\):

$$ e(x) \le V(x)-V_\mathcal {P}(x), \quad x\in S_0. $$
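In the sampled setting, e is cheap to evaluate; a sketch (with the infimum over U replaced by the finite sample set, as in the implementation above):

```python
def error_lower_bound(x, V_P, controls, f, g, locate):
    """e(x) from (17): a lower bound on V(x) - V_P(x) by Proposition 2."""
    return (min(g(x, u) + V_P[locate(f(x, u))] for u in controls)
            - V_P[locate(x)])
```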

Now consider a sequence \((\mathcal{P}^{(\ell )})_{\ell \in \mathbb {N}}\) of partitions of X which is nested in the sense that for all \(\ell \) and every \(P \in \mathcal{P}^{(\ell +1)}\) there is a \(P' \in \mathcal{P}^{(\ell )}\) such that \(P \subset P'\). For the next proposition recall that \(S\subset X\) is the set of initial conditions that can be asymptotically controlled to 0.

Proposition 3

[14]. For fixed \(x \in S\), the sequence \((V_{\mathcal{P}^{(\ell )}}(x))_{\ell \in \mathbb {N}}\) is monotonically increasing.

Proof

For \(x\in S\), the value \(V_{\mathcal{P}^{(\ell )}}(x)\) is the length of a shortest path \(p=(e_1, \ldots , e_m)\), \(e_k\in E_{\mathcal{P}^{(\ell )}}\), connecting \(\pi (x)\) to \(\pi (0)\) in \(\mathcal{P}^{(\ell )}\). Suppose that the claim were not true, i.e. for some \(\ell \) there are shortest paths p in \(G_{\mathcal{P}^{(\ell )}}\) and \(p'\) in \(G_{\mathcal{P}^{(\ell +1)}}\) such that \(\mathcal{G}(p') < \mathcal{G}(p)\). Using \(p'\), we are going to construct a path \(\tilde{p}\) in \(G_{\mathcal{P}^{(\ell )}}\) with \(\mathcal{G}(\tilde{p}) < \mathcal{G}(p)\), contradicting the minimality of p: Let \(p' = (e'_1, \ldots ,e'_{m'})\), with \(e'_k = (P'_{k-1}, P'_{k}) \in E_{\mathcal{P}^{(\ell +1)}}\). Hence, \(f(P'_{k-1}, U) \cap P'_{k} \ne \emptyset \), for \(k=1,\ldots ,m'\). Since the partitions \(\mathcal{P}^{(\ell )}\) are nested, there are sets \(\tilde{P}_k \in \mathcal{P}^{(\ell )}\) such that \(P'_k \subset \tilde{P}_k\) for \(k=0,\ldots ,m'\). Thus, \(f(\tilde{P}_{k-1}, U) \cap \tilde{P}_{k} \ne \emptyset \), i.e. \(\tilde{e}_{k} = (\tilde{P}_{k-1}, \tilde{P}_{k})\) is an edge in \(E_{\mathcal{P}^{(\ell )}}\) and \(\tilde{p} = (\tilde{e}_1, \ldots , \tilde{e}_{m'})\) is a path in \(G_{\mathcal{P}^{(\ell )}}\). Furthermore, for \(k=1,\ldots ,m'\),

$$\begin{aligned} \mathcal{G}(\tilde{e}_k)&= \inf _{u\in U} \{g(x,u) \mid x \in \tilde{P}_{k-1}, f(x,u) \in \tilde{P}_k\} \\&\le \inf _{u\in U} \{g(x,u) \mid x\in P'_{k-1}, f(x,u)\in P'_k\} = \mathcal{G}(e'_k). \end{aligned}$$

This yields \(\mathcal{G}(\tilde{p}) \le \mathcal{G}(p') < \mathcal{G}(p)\), contradicting the minimality of p.    \(\square \)

So far we have shown that for every \(x\in S\) we have a monotonically increasing sequence \((V_{\mathcal{P}^{(\ell )}}(x))_{\ell \in \mathbb {N}}\), which is bounded by V(x) due to Proposition 1. The following theorem states that for points \(x\in S\) the limit is indeed V(x) if the maximal diameter of the partition elements goes to 0. For a finite partition \(\mathcal{P}\) of X, let \(\text {diam}\,\mathcal{P}:=\max _{P\in \mathcal{P}}\text {diam}\,P\) be the diameter of the partition \(\mathcal{P}\).

Theorem 1

[14]. If \(\text {diam}\,\mathcal{P}^{(\ell )}\rightarrow 0\), then \(V_{\mathcal{P}^{(\ell )}}(x) \rightarrow V(x)\) as \(\ell \rightarrow \infty \) for all \(x\in S\).

4.2 The Discrete Feedback

Recall that an optimally stabilizing feedback can be constructed using the (exact) value function for the problem (cf. (8)). We will use this idea, replacing V by its approximation \(V_\mathcal {P}\): using \(\tilde{U}\) from (15), for \(x\in S\) we define

$$\begin{aligned} u_\mathcal{P}(x) := \mathop {\mathrm {argmin}}_{u\in \tilde{U}} \left\{ g(x,u) + V_\mathcal{P}(f(x,u)) \right\} \end{aligned}$$
(18)

(the minimum exists because \(\tilde{U}\) is a finite set) and consider the closed loop system

$$\begin{aligned} x_{k+1} = f(x_k, u_\mathcal{P}(x_k)), \quad k=0,1,\ldots . \end{aligned}$$
(19)
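A sketch of this feedback and the resulting closed loop iteration (minimizing over the finite sample set \(\tilde{U}\), helper names as in the sketches above):

```python
def feedback(x, V_P, controls, f, g, locate):
    """The discrete feedback (18), minimizing over the finite set of
    control samples."""
    return min(controls, key=lambda u: g(x, u) + V_P[locate(f(x, u))])

def closed_loop(x0, steps, V_P, controls, f, g, locate):
    """A finite piece of the trajectory of the closed loop system (19)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1], feedback(xs[-1], V_P, controls, f, g, locate)))
    return xs
```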

The following theorems state in which sense this feedback is stabilizing and approximately optimal. Let again \((\mathcal {P}^{(\ell )})_{\ell \in \mathbb {N}}\) be a nested sequence of partitions of X and let \(D\subseteq S\), \(0\in D\), be an open set with the property that for each \(\varepsilon >0\) there exists \(\ell _0(\varepsilon )>0\) such that

$$ \max _{x\in D}|V(x)-V_{\mathcal {P}^{(\ell )}}(x)| \le \varepsilon , \quad \text {for } \ell \ge \ell _0({\varepsilon }). $$

Let further \(c>0\) be the largest value such that

$$ V_{\mathcal{P}^{(1)}}^{-1}([0,c])\subset D. $$

Note that by Proposition 3 this implies that \(V_{\mathcal{P}^{(\ell )}}^{-1}([0,c])\subset D\) for all \(\ell \in \mathbb {N}\).

Theorem 2

[7]. Under the assumptions above, there exists \(\varepsilon _0>0\) and a function \(\delta :\mathbb {R}\rightarrow \mathbb {R}\) with \(\lim _{\alpha \rightarrow 0}\delta (\alpha )=0\), such that for all \(\varepsilon \in (0,\varepsilon _0]\), all \(\ell \ge \ell _0(\varepsilon /2)\), all \(\eta \in (0,1)\) and all \(x_0\in V_{\mathcal{P}^{(\ell )}}^{-1}([0,c])\) the trajectory \((x_k)_k\) generated by the closed loop system (19) with feedback \(u_{\mathcal{P}^{(\ell )}}\) satisfies

$$ V(x_k) \le \max \left\{ V(x_0)-(1-\eta )\sum _{j=0}^{k-1} g(x_j,u_{\mathcal {P}^{(\ell )}}(x_j)), \delta (\varepsilon /\eta )+\varepsilon \right\} . $$

This a priori estimate shows in which sense the feedback \(u_\mathcal{P}\) approximately yields optimal performance. However, the theorem does not give information about the partition \(\mathcal {P}\) which is needed in order to achieve a desired level of accuracy. This can be achieved by employing the error function e from above.

Consider some partition \(\mathcal{P}\) of X. Let \(g_0(x):=\inf _{u\in U} g(x,u)\) and \(C_{\varepsilon }(\mathcal{P}) := \{x\in V_{\mathcal{P}}^{-1}([0,c])\,|\, g_0(x) \le \varepsilon \}\) and define \(\delta (\varepsilon ) := \sup _{x\in C_{\varepsilon }(\mathcal{P})} V(x)\). Note that if V is continuous at \(T=\{0\}\) then \(\delta (\varepsilon )\rightarrow 0\) as \(\varepsilon \rightarrow 0\) because \(C_\varepsilon (\mathcal{P})\) shrinks down to 0 since g and thus \(g_0\) are continuous.

Theorem 3

[7]. Assume that for some \(\varepsilon >0\) and some \(\eta \in (0,1)\), the error function e satisfies

$$\begin{aligned} e(x)\le \max \{ \eta g_0(x),\, \varepsilon \} \quad \text {for all } x\in V_{\mathcal{P}}^{-1}([0,c]). \end{aligned}$$
(20)

Then, for each \(x_0\in V_{\mathcal{P}}^{-1}([0,c])\), the trajectory \((x_k)_k\) generated by the closed loop system (19) satisfies

$$\begin{aligned} V_\mathcal {P}(x_k) \le \max \left\{ V_\mathcal {P}(x_0)-(1-\eta )\sum _{j=0}^{k-1} g(x_j,u_{\mathcal{P}}(x_j)), \delta (\varepsilon /\eta )+\varepsilon \right\} . \end{aligned}$$
(21)

Example 2

(An inverted pendulum). We consider a model for an inverted pendulum on a cart, cf. [7, 14]. We ignore the dynamics of the cart, and so we only have one degree of freedom, namely the angle \(\varphi \in [0,2\pi ]\) between the pendulum and the upright vertical. The origin \((\varphi ,\dot{\varphi })=(0,0)\) is an unstable equilibrium (with the pendulum pointing upright) which we would like to stabilize. The model reads

$$\begin{aligned} \textstyle \left( \frac{4}{3} - m_r \cos ^2 \varphi \right) \ddot{\varphi }+ \frac{m_r}{2} \dot{\varphi }^2 \sin 2 \varphi - \frac{g}{\ell }\sin \varphi = - u\;\frac{m_r}{m \ell } \cos \varphi , \end{aligned}$$
(22)

where \(m=2\) is the mass of the pendulum, \(M=8\) the mass of the cart, \(m_r = m / (m + M)\), \(\ell = 0.5\) the length of the pendulum and \(g = 9.8\) the gravitational constant. We consider the discrete time control system (4) with \(f(x,u) = \Phi ^t(x,u)\), \(x=(\varphi ,\dot{\varphi })\), for \(t=0.1\), where \(\Phi ^{t}(x,u)\) denotes the controlled flow of (22) with constant control input \(u(\tau )=u\) for \(\tau \in [0,t]\). For the instantaneous cost function we choose

$$ g(x,u) = \int _0^t q(\Phi ^\tau (x,u),u) \; d\tau , $$

with the quadratic cost \(q(x,u) = \frac{1}{2} \left( 0.1 \varphi ^2 + 0.05 \dot{\varphi }^2 + 0.01 u^2\right) \).

We use the classical Runge-Kutta scheme of order 4 with step size 0.02 in order to approximate \(\Phi ^t\), choose \(X = [-8, 8] \times [-10,10]\) as state space for \(x=(\varphi ,\dot{\varphi })\), which we partition into \(2^{9}\times 2^9\) boxes of equal size, and \(U=[-64,64]\) as the control space. In approximating the graph’s edges and their weights, we map an equidistant grid of \(3\times 3\) points on each partition box, choosing from 17 equally spaced values in U.
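A Python sketch of the sampled map \(\Phi ^t\) and the integrated cost g (parameters as above; the left-endpoint quadrature for the cost integral is our assumption, the text does not specify the rule):

```python
import numpy as np

m, M, ell, grav = 2.0, 8.0, 0.5, 9.8
mr = m / (m + M)

def rhs(x, u):
    """Right-hand side of (22), solved for the angular acceleration."""
    phi, dphi = x
    ddphi = (-(mr / 2) * dphi**2 * np.sin(2 * phi)
             + (grav / ell) * np.sin(phi)
             - u * mr / (m * ell) * np.cos(phi)) / (4 / 3 - mr * np.cos(phi)**2)
    return np.array([dphi, ddphi])

def f_and_g(x, u, t=0.1, h=0.02):
    """Classical RK4 approximation of Phi^t with constant control u,
    accumulating the quadratic cost q along the solution."""
    q = lambda x: 0.5 * (0.1 * x[0]**2 + 0.05 * x[1]**2 + 0.01 * u**2)
    cost = 0.0
    for _ in range(int(round(t / h))):
        cost += h * q(x)                # left-endpoint quadrature
        k1 = rhs(x, u)
        k2 = rhs(x + h / 2 * k1, u)
        k3 = rhs(x + h / 2 * k2, u)
        k4 = rhs(x + h * k3, u)
        x = x + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x, cost
```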

Figure 4 shows the discrete value function as well as the trajectory generated by the discrete feedback for the initial value (3.1, 0.1), as computed by the GAIO code in Fig. 6. As shown on the right of this figure, the discrete value function does not decrease monotonically along the feedback trajectory, indicating that the assumptions of Theorem 3 are not satisfied. And indeed, as shown in Fig. 5, this trajectory repeatedly moves through regions in state space where the error function e is not smaller than \(g_0\). In fact, on a coarser partition (\(2^7\times 2^7\) boxes), the discrete feedback (18) does not even stabilize this initial condition any more. We will address this deficiency in the next sections.

Fig. 4. Left: Discrete value function and feedback trajectory for the inverted pendulum. Right: Behaviour of the discrete value function along the feedback trajectory.

Fig. 5. Inverted pendulum: region where \(e(x) < g_0(x)\) (green) and feedback trajectory.

Fig. 6. Code: discrete value function for the inverted pendulum.

5 The Optimality Principle for Perturbed Systems

Let us now return to the full problem from Sect. 2 of optimally stabilizing the discrete time perturbed control system

$$\begin{aligned} x_{k+1} = f(x_k, u_k, w_{k}), \quad k=0,1,\ldots . \end{aligned}$$
(23)

subject to an instantaneous cost \(g(x_{k},u_{k})\). For the convergence statements later, we assume \(f:X\times U\times W\rightarrow X\) and \(g:X\times U\rightarrow [0,\infty )\) to be continuous and \(X\subset \mathbb {R}^{d}, U\subset \mathbb {R}^m\) and \(W\subset \mathbb {R}^\ell \) to be compact. More general spaces will be discussed in Sect. 8. For a given initial state \(x_0\in X\), a control sequence \({\textit{\textbf{u}}}=(u_{k})_{k\in \mathbb {N}}\in U^\mathbb {N}\) and a perturbation sequence \({\textit{\textbf{w}}}=(w_{k})_{k\in \mathbb {N}}\in W^\mathbb {N}\), we obtain the trajectory \((x_{k}(x,{\textit{\textbf{u}}},{\textit{\textbf{w}}}))_{k\in \mathbb {N}}\) satisfying (23) while the associated accumulated cost is given by

$$ J(x,{\textit{\textbf{u}}},{\textit{\textbf{w}}})=\sum _{k=0}^\infty g(x_{k}(x,{\textit{\textbf{u}}},{\textit{\textbf{w}}}),u_{k}). $$

Recall that our goal is to derive a feedback \(u:S\rightarrow U\), \(S\subset X\), that stabilizes the closed loop system

$$\begin{aligned} x_{k+1} = f(x_k,u(x_k),w_k), \quad k=0,1,2,\ldots \end{aligned}$$
(24)

for any perturbation sequence \((w_k)_k\), i.e. for every trajectory \((x_k(x_0,{\textit{\textbf{w}}}))_k\) of (24) with \(x_0\in S\) and \({\textit{\textbf{w}}}\in W^\mathbb {N}\) arbitrary, we have \(x_k\rightarrow T\) as \(k\rightarrow \infty \), where \(T\subset S\) is a given target set, and the accumulated cost \(\sum _{k=0}^\infty g(x_k,u(x_k))\) is minimized.

The problem formulation can be interpreted as describing a dynamic game (see e.g. [6]), where at each step of the iteration (23) two players choose a control \(u_{k}\) and a perturbation \(w_{k}\), respectively. The goal of the controlling player is to minimize J, while the perturbing player wants to maximize it. We assume that the controlling player chooses \(u_{k}\) first and that the perturbing player knows \(u_{k}\) when choosing \(w_{k}\). We further assume that the perturbing player cannot foresee future choices of the controlling player. This can be formalized by restricting the possible \({\textit{\textbf{w}}}\) to

$$ {\textit{\textbf{w}}}=\beta ({\textit{\textbf{u}}}), $$

where \(\beta :U^\mathbb {N}\rightarrow W^\mathbb {N}\) is a nonanticipating strategy, i.e. a strategy satisfying

$$ u_{k}=u_{k}' \quad \forall k\le K\quad \Rightarrow \quad \beta _k({\textit{\textbf{u}}}) = \beta _k({\textit{\textbf{u}}}') \quad \forall k\le K $$

for any \({\textit{\textbf{u}}}=(u_{k})_{k},{\textit{\textbf{u}}}'=(u'_{k})_{k}\in U^\mathbb {N}\). We denote by \(\mathcal {B}\) the set of all nonanticipating strategies \(\beta :U^\mathbb {N}\rightarrow W^\mathbb {N}\).

The control task is finished once we are in T; we therefore assume that T is compact and robustly forward invariant, i.e. for all \(x\in T\) there is a control \(u\in U\) such that \(f(x,u,w)\in T\) for all \(w\in W\), that \(g(x,u)=0\) for all \(x\in T\), \(u\in U\) and \(g(x,u)>0\) for all \(x\not \in T\), \(u\in U\).

Our construction of the feedback \(u:S\rightarrow U\) will be based on the upper value function \(V:X\rightarrow [0,\infty ]\),

$$\begin{aligned} V(x)=\sup _{\beta \in \mathcal {B}}\inf _{{\textit{\textbf{u}}}\in U^\mathbb {N}} J(x,{\textit{\textbf{u}}},\beta ({\textit{\textbf{u}}})), \end{aligned}$$
(25)

of the game (23), which is finite on the set \(S_0:=\{x\in X\mid V(x)<\infty \}\). The upper value function satisfies the optimality principle [9]

$$\begin{aligned} V(x) = \inf _{u\in U} \left[ g(x,u) + \sup _{w\in W} V(f(x,u,w))\right] , \quad x\in S_0. \end{aligned}$$
(26)

The right hand side \( L[v](x)=\inf _{u\in U}\left[ g(x,u)+\sup _{w\in W} v(f(x,u,w))\right] \) of this fixed point equation again defines a dynamic programming operator \(L:\mathbb {R}^X\rightarrow \mathbb {R}^X\). The upper value function is the unique fixed point of L satisfying the boundary condition \(V(x)=0\), \(x\in T\). Like in the unperturbed case, using the upper value function V, one can construct the feedback \(u:S_0\rightarrow U\),

$$\begin{aligned} u(x) := \mathop {\mathrm {argmin}}_{u\in U} \left[ g(x,u) + \sup _{w\in W} V(f(x,u,w))\right] , \end{aligned}$$
(27)

whenever this minimum exists.

6 A Discrete Optimality Principle for Perturbed Systems

Analogously to the discretization in Sect. 4 we now derive a discrete version of (26), cf. [9]. Again, to this end, we will approximate the upper value function by a function which is piecewise constant on the elements of some partition of X. This approach will lead to a directed weighted hypergraph instead of the ordinary directed graph in Sect. 4 and, again, the approximate upper value function can be computed by an associated shortest path algorithm.

Let \(\mathcal{P}\) be a finite partition of X. Using the projection (10), the discretized dynamic game operator \(L_\mathcal{P}:\mathbb {R}^\mathcal{P}\rightarrow \mathbb {R}^\mathcal{P}\) is defined by

$$ L_\mathcal{P}:= \psi \circ L. $$

Again, this operator has a unique fixed point \(V_\mathcal{P}\) satisfying the boundary condition \(V_\mathcal{P}(x)=0\), \(x\in T\), which will serve as an approximation to the exact value function V.

Explicitly, the discretized operator reads

$$\begin{aligned} L_{\mathcal{P}}[v](x) = \inf _{x'\in \pi (x)} \left( \inf _{u\in U}\left[ g(x',u)+\sup _{w\in W} v(f(x',u,w))\right] \right) \end{aligned}$$

and \(V_{\mathcal{P}}\) satisfies the optimality principle

$$\begin{aligned} V_{\mathcal{P}}(x) = \inf _{x'\in \pi (x),u\in U} \left[ g(x',u)+\sup _{w\in W} V_{\mathcal{P}}(f(x',u,w))\right] . \end{aligned}$$
(28)

Note that since \(V_\mathcal{P}\) is constant on each partition element, we can rewrite this as

$$ V_{\mathcal{P}}(x) = \inf _{x'\in \pi (x),u\in U} \left[ g(x',u)+\sup _{P'\in \mathcal{F}(x',u)} V_{\mathcal{P}}(P')\right] , $$

where

$$ \mathcal{F}(x',u)=\{P\in \mathcal{P}\mid f(x',u,w)\in P \text { for some } w\in W\}. $$

Since the partition \(\mathcal{P}\) is finite, there are only finitely many possible sets \(\mathcal{F}(x',u)\) and we can further rewrite (28) as

$$ V_{\mathcal{P}}(x) = \min _\mathcal{N}\inf _{(x',u)} \left[ g(x',u)+\sup _{P'\in \mathcal{N}} V_{\mathcal{P}}(P')\right] , $$

where the \(\min \) is taken over all collections \(\mathcal{N}\in \{\mathcal{F}(x',u)\mid x'\in \pi (x), u\in U\}\) and the \(\inf \) over all \((x',u)\) such that \(\mathcal{F}(x',u)=\mathcal{N}\). Now define the multivalued map \(\mathcal{F}:\mathcal{P}\rightrightarrows 2^\mathcal{P}\),

$$ \mathcal{F}(P) = \{ \mathcal{F}(x,u) : (x,u)\in P\times U\}, $$

and the cost function

$$ \mathcal{G}(P,\mathcal{N}) = \inf _{u\in U} \{ g(x,u) : x\in P, \mathcal{F}(x,u)=\mathcal{N}\}. $$

Equation (28) can then be rewritten as

$$ V_{\mathcal{P}}(P) = \min _{\mathcal{N}\in \mathcal{F}(P)} \left[ \mathcal{G}(P,\mathcal{N}) + \sup _{P'\in \mathcal{N}} V_{\mathcal{P}}(P')\right] . $$

Graph Interpretation. Like in the unperturbed case, we can think of this reformulation of the optimality principle in terms of a graph. More precisely, we have a directed hypergraph \((\mathcal{P},E_\mathcal{P})\) with the set \(E_\mathcal{P}\subset \mathcal{P}\times 2^\mathcal{P}\) of directed hyperedges given by

$$ E_\mathcal{P}=\left\{ (P,\mathcal{N}) \mid \mathcal{N}=\mathcal{F}(x,u) \text{ for } \text{ some } (x,u)\in P\times U\right\} , $$

and each edge \((P,\mathcal{N})\) is weighted by \(\mathcal{G}(P,\mathcal{N})\), cf. Fig. 7. The discrete upper value function \(V_\mathcal{P}(P)\) is the length of a shortest path from P to some element \(P'\) which has a nonempty intersection with the target set T (and, thus, by the boundary condition, \(V_\mathcal{P}(P')=0\)).

Fig. 7. Illustration of the construction of the hypergraph.

Shortest Paths in Hypergraphs. Algorithm 1 can be generalized to the hypergraph case, cf. [9, 20]. To this end, we modify lines 5–7 such that the maximization over the perturbations is taken into account:

figure b

Note that during the while-loop of Algorithm 1,

$$ V(P) \ge V(P')\quad \text {for all } P'\in \mathcal{P}\backslash \mathcal{Q}. $$

Thus, if \(\mathcal{N}\subset \mathcal{P}\backslash \mathcal{Q}\), then \( \max _{N\in \mathcal{N}} V(N) = V(P), \) and the value of the node Q will never be decreased again. On the other hand, if \(\mathcal{N}\not \subset \mathcal{P}\backslash \mathcal{Q}\), then the value of Q will be further decreased at a later time – and thus we can skip updating it in the current iteration of the while-loop. We can therefore dispense with the explicit maximization and replace lines 5–7 by

figure c

The overall algorithm for the hypergraph case is as follows. Here, \(\mathcal{T}:=\{P\in \mathcal{P}\mid P\cap T\ne \emptyset \}\) is the set of target nodes.

Algorithm 2 (minmax-Dijkstra): figure d.    \(\square \)

Time Complexity. In line 5, each hyperedge is considered at most N times, with N being a bound on the cardinality of the hypernodes \(\mathcal{N}\). Additionally, we need to perform the check in line 6, which has linear complexity in N. Thus, the overall complexity of the minmax-Dijkstra algorithm is \(\mathcal{O}(|\mathcal{P}|\log |\mathcal{P}| + |E|N(N+\log |\mathcal{P}|))\) (when using a binary heap for storing \(\mathcal{Q}\)), cf. [20].
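A Python sketch of the minmax-Dijkstra algorithm (names are ours; hypernodes are modelled as frozensets). It realizes the observation above with a per-hyperedge counter: a hyperedge is relaxed exactly once, namely when the last element of its hypernode leaves the queue – at that moment \(\max _{P'\in \mathcal{N}} V(P')\) equals the value of the node currently being finalized.

```python
import heapq

def minmax_dijkstra(nodes, hyperedges, weight, targets):
    """hyperedges: list of pairs (P, N), N a frozenset of nodes;
    weight[(P, N)] = G(P, N); targets: nodes with V = 0."""
    V = {P: float("inf") for P in nodes}
    for T in targets:
        V[T] = 0.0
    remaining = {e: len(e[1]) for e in hyperedges}  # not-yet-final members of N
    touching = {}                                   # node -> hyperedges containing it
    for e in hyperedges:
        for P in e[1]:
            touching.setdefault(P, []).append(e)
    heap = [(0.0, T) for T in targets]
    heapq.heapify(heap)
    done = set()
    while heap:
        vP, P = heapq.heappop(heap)
        if P in done:
            continue                                # stale heap entry
        done.add(P)
        for e in touching.get(P, ()):
            remaining[e] -= 1
            if remaining[e] == 0:                   # max over N is now vP
                Q, alt = e[0], weight[e] + vP
                if alt < V[Q]:
                    V[Q] = alt
                    heapq.heappush(heap, (alt, Q))
    return V
```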

Space Complexity. The storage requirement grows linearly with \(|\mathcal{P}|\). This number, however, grows exponentially with the dimension of state space (if the entire state space is covered and under the assumption of uniformly large elements). The number of hyperedges is determined by the Lipschitz constant of f; the size of the hypernodes \(\mathcal{N}\) is determined by the magnitude of the perturbation.

Implementation. We use the same approach as in the unperturbed case: A cubical partition is constructed hierarchically and stored in a binary tree. In order to approximate the set \(E_\mathcal{P}\subset \mathcal{P}\times 2^\mathcal{P}\) of hyperedges, we choose finite sets \(\tilde{P}\subset P\), \(\tilde{U}\subset U\) and \(\tilde{W}\subset W\) of sample points, set

$$ \tilde{\mathcal{F}}(x,u) = \{ P\in \mathcal{P}\mid f(x,u,w)\in P \text { for some } w \in \tilde{W}\} $$

and compute

$$ \tilde{\mathcal{F}}(P) := \{\tilde{\mathcal{F}}(x,u) : (x,u)\in \tilde{P}\times \tilde{U}\} \subset 2^\mathcal{P}$$

as an approximation to \(\mathcal{F}(P)\). Correspondingly, the weight on the hyperedge \((P,\mathcal{N})\) is approximated by

$$ \tilde{\mathcal{G}}(P,\mathcal{N}) = \min \{ g(x,u) : (x,u)\in \tilde{P}\times \tilde{U}, \tilde{\mathcal{F}}(x,u)=\mathcal{N}\}. $$

Example: A simple 1D System. We reconsider system (16), adding a small perturbation at each time step:

$$ x_{k+1} = x_k + (1-a)u_k x_k + w_k, \quad k=0,1,\ldots , $$

with \(x_k\in [0,1]\), \(u_k\in [-1,1]\), \(w_k\in [-{\varepsilon },{\varepsilon }]\) for some \({\varepsilon }> 0\) and the fixed parameter \(a\in (0,1)\). The cost function is still \( g(x,u) = (1-a)x \) so that the optimal control policy is again \(u_k=-1\) for all k, independently of the perturbation sequence. The optimal strategy for the perturbing player is to slow down the dynamics as much as possible, corresponding to \(w_k={\varepsilon }\) for all k. The dynamical system resulting from inserting the optimal strategies is

$$ x_{k+1} = ax_k + {\varepsilon }, \quad k=0,1,\ldots . $$

This map has a fixed point at \(x={\varepsilon }/(1-a)\). In the worst case, i.e. \(w_k={\varepsilon }\) for all k, it is not possible to get closer than \(\alpha _0:={\varepsilon }/(1-a)\) to the origin. We therefore define \(T=[0,\alpha ]\) with \(\alpha > \alpha _0\) as the target set. With

$$ k(x) = \left\lceil \frac{\log \frac{\alpha -\alpha _0}{x-\alpha _0}}{\log a}\right\rceil + 1, $$

the exact optimal value function is

$$ V(x) = (x-\alpha _0)\left( 1-a^{k(x)}\right) +{\varepsilon }k(x), $$

as shown in Fig. 8 for \(a=0.8\), \({\varepsilon }= 0.01\) and \(\alpha =1.1\alpha _0\). In that figure, we also show the approximate optimal value functions on partitions of 64, 256 and 1024 intervals, respectively. In the construction of the hypergraph, we used an equidistant grid of ten points in each partition interval, in the control space and in the perturbation space.
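A hypothetical Python counterpart of this experiment, building the hyperedges from sample points as described above and solving with the minmax_dijkstra sketch (clipping f to [0, 1] is again our assumption):

```python
import numpy as np

a, eps, n = 0.8, 0.01, 64
locate = lambda x: min(int(x * n), n - 1)
f = lambda x, u, w: float(np.clip(x + (1 - a) * u * x + w, 0.0, 1.0))
g = lambda x, u: (1 - a) * x
U, W = np.linspace(-1, 1, 10), np.linspace(-eps, eps, 10)

weight, hyperedges = {}, set()
for P in range(n):
    for x in np.linspace(P / n, (P + 1) / n, 10):
        for u in U:
            N = frozenset(locate(f(x, u, w)) for w in W)   # F~(x, u)
            e = (P, N)
            hyperedges.add(e)
            weight[e] = min(weight.get(e, float("inf")), g(x, u))

alpha = 1.1 * eps / (1 - a)                                # target T = [0, alpha]
targets = [P for P in range(n) if P / n < alpha]           # cells meeting T
V_P = minmax_dijkstra(range(n), list(hyperedges), weight, targets)
```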

6.1 Convergence

It is natural to ask whether the approximate value function converges to the true one when the element diameter of the underlying partition goes to zero. This has been proven pointwise on the stabilizable set S in the unperturbed case [14], as well as in an \(L^1\)-sense on S and an \(L^\infty \)-sense on the domain of continuity in the perturbed case, assuming continuity of V on the boundary of the target set T [9]. The same reference also provides an analysis for state constrained problems. Here an additional robustness condition is needed, namely that the optimal value function changes continuously with respect to the \(L^p\)-norm for some \(p\in \{1,\ldots ,\infty \}\) if the state constraints are tightened. If this condition holds, then the convergence statement remains valid under state constraints, with \(L^\infty \) replaced by \(L^p\).

Fig. 8. Exact (red) and discrete upper value functions for the perturbed simple example on partitions of 64 (black) and 1024 (blue) intervals.

Fig. 9. Code: upper value function for the perturbed simple 1d system.

Due to the construction of the discretization, the approximation \(V_\mathcal{P}\) of the optimal value function is always less than or equal to the true optimal value function. This is not necessarily a good property. For instance, for proving stability of the system controlled by the numerical feedback law it would be convenient if \(V_\mathcal{P}\) was a Lyapunov function. Lyapunov functions, however, are supersolutions to the dynamic programming equation, rather than subsolutions as our \(V_\mathcal{P}\). In order to overcome this disadvantage, in the next section we present a particular construction of a dynamic game in which the discretization error is treated as a perturbation.

7 The Discretization as a Perturbation

As shown in Theorems 2 and 3, the discrete feedback (18) will practically stabilize the closed loop system (19) under suitable conditions. Our numerical experiment in Example 2, however, revealed that a rather fine partition might be needed in order to achieve stability. More generally, as we have seen in Fig. 4 (right), the discrete value function is not a Lyapunov function of the closed loop system in every case.

Construction of the Dynamic Game. In order to cope with this problem we are going to use the ideas on treating perturbed systems in Sects. 5 and 6. The idea is to view the discretization error as a perturbation of the original system. Under the discretization described in Sect. 4, the original map \((x,u)\mapsto f(x,u)\) is perturbed to

$$ (x,u)\mapsto \hat{f}(x,u,w) := f(x+w,u), \quad x+w\in \pi (x). $$

Note that this constitutes a generalization of the setting in Sects. 5 and 6 since the perturbation space W here depends on the state, \(W=W(x)\). Correspondingly, the associated cost function is

$$\begin{aligned} \hat{g}(x,u) = \sup _{x'\in \pi (x)} g(x',u). \end{aligned}$$
(29)
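In the sampled implementation, this game is set up directly on the hypergraph: for each element P and each control u, the “perturbation” ranges over the sample points of P. A sketch (helper names as before):

```python
def game_from_discretization(partition, samples, controls, f, g, locate):
    """Hyperedges of the game: per element P and control u, the hypernode
    collects the cells hit by all samples of P, and the cost is the
    maximum over the samples, cf. (29)."""
    weight, hyperedges = {}, set()
    for P in partition:
        for u in controls:
            N = frozenset(locate(f(x, u)) for x in samples[P])
            g_hat = max(g(x, u) for x in samples[P])
            e = (P, N)
            hyperedges.add(e)
            weight[e] = min(weight.get(e, float("inf")), g_hat)
    return list(hyperedges), weight
```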

Theorem 4

[8]. Let V denote the value function (6) of the control system (fg), \(\hat{V}\) the value function (25) of the associated game \((\hat{f}, \hat{g})\) and \(V_\mathcal{P}\) the discrete value function (28) of \((\hat{f}, \hat{g})\) on a given partition \(\mathcal{P}\) with numerical target set \(T_\mathcal{P}\subset \mathcal{P}\), \(T=\{0\}\subset T_\mathcal{P}\). Then \(V_\mathcal{P}(x) = \hat{V}(x)\) and

$$\begin{aligned} V(x) - \max _{y\in {T_\mathcal{P}}} V(y) \le V_\mathcal{P}(x), \end{aligned}$$
(30)

i.e. \(V_\mathcal{P}\) is an upper bound for \(V-\max V|_{{T_\mathcal{P}}}\). Furthermore, \(V_\mathcal{P}\) satisfies

$$\begin{aligned} V_\mathcal{P}(x) \ge \min _{u\in U} \left\{ g(x,u) + V_\mathcal{P}(f(x,u)) \right\} \end{aligned}$$
(31)

for all \(x\in X\setminus {T_\mathcal{P}}\).

Proof

We first note that \(\hat{V}\) is constant on the elements of \(\mathcal{P}\): On \({T_\mathcal{P}}\), this is true since \({T_\mathcal{P}}\) is a union of partition elements by assumption. Outside of \({T_\mathcal{P}}\), by definition of the game \((\hat{f}, \hat{g})\) we have

$$ \hat{V}(x) = \inf _{u\in U} \left\{ \sup _{x'\in \pi (x)} g(x',u) + \sup _{x'\in f(\pi (x),u)} \hat{V}(x')\right\} , $$

so that \(\inf _{x'\in \pi (x)} \hat{V}(x') = \hat{V}(x)\). On the other hand, according to [9, Proposition 7.1] we have \(V_\mathcal{P}(x) = \inf _{x'\in \pi (x)} \hat{V}(x')\), so that \(V_\mathcal{P}= \hat{V}\).

Now for \(x\notin {T_\mathcal{P}}\), Eq. (26) yields

$$\begin{aligned} \hat{V}(x)&= \inf _{u\in U} \sup _{x'\in \pi (x)}\left\{ g(x',u) + \hat{V}(f(x',u))\right\} \nonumber \\&\ge \min _{u\in U} \left\{ g(x,u) + \hat{V}(f(x,u))\right\} \end{aligned}$$
(32)

which shows (31).

In order to prove (30), we order the elements \(P_1,P_2,\ldots \in \mathcal{P}\) such that \(i\ge j\) implies \(V_\mathcal{P}(P_i) \ge V_\mathcal{P}(P_j)\). Since \(\inf _{u\in U} g(x,u)>0\) for \(x\ne 0\) by assumption, \(V_\mathcal{P}(P_i)=0\) is equivalent to \(P_i\subseteq {T_\mathcal{P}}\). By the ordering of the elements this implies that there exists \(i^*\ge 1\) such that \(P_i\subseteq {T_\mathcal{P}}\) \(\Leftrightarrow \) \(i\in \{ 1,\ldots ,i^*\}\) and thus (30) holds for \(x\in P_1,\ldots , P_{i^*}\). We now use induction: fix some \(i\in \mathbb {N}\), assume (30) holds for \(x\in P_1,\ldots , P_{i-1}\) and consider \(x\in P_i\). If \(V_\mathcal{P}(P_i)=\infty \) there is nothing to show. Otherwise, since V satisfies the dynamic programming principle, using (32) we obtain

$$\begin{aligned} V(x) - \hat{V}(x)&\le \inf _{u\in U} \left\{ g(x,u) + V(f(x,u))\right\} - \min _{u\in U} \left\{ g(x,u) + \hat{V}(f(x,u))\right\} \\&\le V(f(x,u^*)) - \hat{V}(f(x,u^*)), \end{aligned}$$

where \(u^*\in U\) realizes the minimum in (32). Now, since \(g(x,u^*)>0\), we have \(\hat{V}(f(x,u^*))<\hat{V}(x)\) implying \(f(x,u^*)\in P_j\) for some \(j<i\). Since by the induction assumption the inequality in (30) holds on \(P_j\), this implies that it also holds on \(P_i\) which finishes the induction step.    \(\square \)

The Feedback Is the Shortest Path. As usual, we construct the discrete feedback by

$$ u_\mathcal{P}(x) := \mathop {\mathrm {argmin}}_{u\in U} \left\{ \hat{g}(x,u) + \sup _{x'\in \pi (x)} V_\mathcal{P}(f(x',u)) \right\} . $$

By construction, this feedback is constant on each partition element. Moreover, we can directly extract \(u_\mathcal{P}\) from the minmax-Dijkstra algorithm: We associate the minimizing control value \(\underline{u}(P,\mathcal{N})\) to each hyperedge \((P,\mathcal{N})\),

$$\begin{aligned} \underline{u}(P,\mathcal{N}) := \mathop {\mathrm {argmin}}_{u\in U} \{ g(x,u) \mid x\in P,\, \mathcal{F}(x,u)=\mathcal{N}\}. \end{aligned}$$
(33)

The feedback is then immediately given by

$$\begin{aligned} u_\mathcal{P}(x) = \underline{u}(\pi (x),\underline{\mathcal{N}}(\pi (x))), \end{aligned}$$
(34)

where

$$ \underline{\mathcal{N}}(P) := \mathop {\mathrm {argmin}}_{\mathcal{N}\in \mathcal{F}(P)} \left[ \mathcal{G}(P,\mathcal{N}) + \sup _{P'\in \mathcal{N}} V_{\mathcal{P}}(P')\right] $$

defines the hypernode of minimal value adjacent to some node P in the hypergraph. The computation of \(\underline{\mathcal{N}}(P)\) can be done on the fly within the minmax-Dijkstra Algorithm 2:

Algorithm 3 (minmax-Dijkstra with feedback): figure e.    \(\square \)

Consequently, the discrete feedback \(\underline{u}\) can be computed offline. Once \(\underline{u}(P,\underline{\mathcal{N}}(P))\) has been computed for every partition element P, the only remaining online computation is the determination of \(\pi (x_k)\) for each state \(x_k\) on the feedback trajectory. In our case, this can be done efficiently, since we store the partition in a binary tree. Note, however, that the fast online evaluation of the feedback law is enabled by a comparatively large offline computation, namely the construction of the hypergraph.
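A sketch of the two ingredients which Algorithm 3 adds to the minmax-Dijkstra sketch above: the relaxation step additionally remembers the realizing hyperedge, and the online evaluation reduces to the lookup (34) (ubar stores the minimizing controls (33), recorded while the hypergraph is built):

```python
import heapq

def relax_with_feedback(e, vP, V, best, weight, heap):
    """Modified relaxation: besides the value of Q, store the hyperedge
    realizing the minimum -- this is the hypernode N_(Q) of (34)."""
    Q, alt = e[0], weight[e] + vP
    if alt < V.get(Q, float("inf")):
        V[Q] = alt
        best[Q] = e                     # optimal hyperedge for Q
        heapq.heappush(heap, (alt, Q))

def u_P(x, best, ubar, locate):
    """Online evaluation of (34): only pi(x) is computed at runtime."""
    return ubar[best[locate(x)]]
```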

Behaviour of the Closed Loop System.

Theorem 5

[8]. Under the assumptions of Theorem 4, if \((x_k)_k\) denotes the trajectory of the closed loop system (19) with feedback (34) and if \(V_\mathcal{P}(x_0)<\infty \), then there exists \(k^*\in \mathbb {N}\) such that \(x_{k^*} \in T_\mathcal{P}\) and

$$ V_\mathcal{P}(x_{k}) \ge g(x_k,u_{\mathcal{P}}(x_k)) + V_\mathcal{P}(x_{k+1}), \quad k=0,\ldots ,k^*-1. $$

Proof

From the construction of \(u_\mathcal{P}\) we immediately obtain the inequality

$$\begin{aligned} V_\mathcal{P}(x_{k}) \ge g(x_k,u_{\mathcal{P}}(x_k)) + V_\mathcal{P}(x_{k+1}) \end{aligned}$$
(35)

for all \(k\in \mathbb {N}_0\) with \(x_k\in X\setminus {T_\mathcal{P}}\). This implies the existence of such a \(k^*\), since \(g(x_k,u_\mathcal{P}(x_k))>0\) for \(x_k\not \in {T_\mathcal{P}}\), \(V_\mathcal{P}\) is piecewise constant and equals zero only on \({T_\mathcal{P}}\).    \(\square \)

Theorem 5 implies that the closed-loop solution reaches the target \(T_\mathcal{P}\) at time step \(k^*\) and that the optimal value function decreases monotonically until the target is reached, i.e., it behaves like a Lyapunov function. While it is in principle possible that the closed-loop solution leaves the target after time \(k^*\), this Lyapunov function property implies that after such excursions it will return to \(T_\mathcal{P}\).

If the system (4) is asymptotically controllable to the origin and V is continuous, then we can use the same arguments as in [9] in order to show that on increasingly finer partitions \(\mathcal{P}_\ell \) and for targets \(T_{\mathcal{P}_\ell }\) shrinking down to \(\{0\}\) we obtain \(V_{\mathcal{P}_\ell } \rightarrow V\). This can also be used to conclude that the distance of possible excursions from the target \(T_{\mathcal{P}_\ell }\) becomes smaller and smaller as \(\mathcal{P}_\ell \) becomes finer.

We note that the Lyapunov function property of \(V_\mathcal{P}\) outside \(T_\mathcal{P}\) holds regardless of the size of the partition elements. However, if the partition is too coarse then \(V_\mathcal{P}=\infty \) will hold on large parts of X, which makes the Lyapunov function property useless. In case that large partition elements are desired—for instance, because they correspond to a quantization of the state space representing, e.g., the resolution of certain sensors—infinite values can be avoided by choosing the control value not only depending on one partition element but on two (or more) consecutive elements. The price to pay for this modification is that the construction of the hypergraph becomes significantly more expensive, but the benefit is that stabilization with much coarser discretization or quantization is possible. For details we refer to [10, 11].

Example 3

(The inverted pendulum reconsidered). We reconsider Example 2 and apply the construction from this section. Figure 10, which results from running the code shown in Fig. 11 as well as lines 25ff. from the code in Fig. 6, shows the discrete upper value function on a partition of \(2^{16}\) boxes with target set \(T=[-0.1,0.1]^2\) as well as the trajectory generated by the discrete feedback (34) for the initial value (3.1, 0.1). As expected, the approximate value function is decreasing monotonically along this trajectory. Furthermore, this trajectory is clearly closer to the optimal one because it converges to the origin much faster.

Fig. 10. Inverted pendulum: Discrete upper value function and robust feedback trajectory (left); decrease of the discrete value function along the feedback trajectory (right).

Fig. 11. Code: discrete upper value function and robust feedback for the inverted pendulum.

8 Hybrid, Event and Quantized Systems

Hybrid Systems. The discretization of the optimality principle described in Sects. 4–7 can be used in order to deal with hybrid systems in a natural way. Hybrid systems can often be modeled by a discrete time control system of the form

$$\begin{aligned} \begin{array}{rcl} x_{k+1} &=& f_c(x_k, y_k, u_k)\\ y_{k+1} &=& f_d(x_k, y_k, u_k) \end{array} \quad k=0,1,\ldots , \end{aligned}$$
(36)

with two maps \(f_c:X\times Y\times U\rightarrow X\subset \mathbb {R}^n\) and \(f_d:X\times Y\times U\rightarrow Y\). The set U of control inputs can be discrete or continuous, the (compact) set \(X\subset \mathbb {R}^n\) is the continuous part of state space and the set Y of discrete states (or modes) is a finite set. The class of hybrid systems described by (36) is quite general: It comprises

  • models with purely continuous state space (i.e. \(Y=\{0\}\), \(f_c(x,y,u)=f_c(x,u)\), \(f_d\equiv 0\)), but discrete or finite control space U;

  • models in which the continuous part \(f_c\) is controlled by the mode y and only the discrete part \(f_d\) of the map is controlled by the input (\(f_c(x,y,u)=f_c(x,y)\) and \(f_d(x,y,u) = f_d(y,u)\) may be given by an automaton);

  • models with state dependent switching: Here we have a general map \(f_c\) and \(f_d(x,y,u) = f_d(x)\).

As in the previous sections, we denote the solutions of (36) for initial values \(x_0=x\), \(y_0=y\) and some control sequence \({\textit{\textbf{u}}}=(u_0,u_1,\ldots )\in U^\mathbb {N}\) by \(x_k(x,y,{\textit{\textbf{u}}})\) and \(y_k(x,y,{\textit{\textbf{u}}})\), respectively. We assume that for each k, the map \(x_k(\cdot ,y,{\textit{\textbf{u}}})\) is continuous for each \(y\in Y\) and each \({\textit{\textbf{u}}}\in U^\mathbb {N}\). We prescribe a target set \(T\subset X\) (i.e. a subset of the continuous part of state space) and our aim is to find a control sequence \({\textit{\textbf{u}}}=(u_k)_{k\in \mathbb {N}}\) such that \(x_k(x,y,{\textit{\textbf{u}}})\rightarrow T\) as \(k\rightarrow \infty \) for initial values (x, y) in some stabilizable set \(S\subset X\times Y\), while minimizing the accumulated cost \(\sum _{k=0}^\infty g(x_k,y_k,u_k)\), where \(g:X\times Y\times U\rightarrow [0,\infty )\) is a given instantaneous cost with \(g(x,y,u) > 0\) for all \(x\notin T\), \(y\in Y\) and \(u\in U\). To this end, we would like to construct an approximately optimal feedback \(u:S\rightarrow U\) such that a suitable asymptotic stability property for the resulting closed loop system holds. Again, the construction will be based on a discrete value function. For an appropriate choice of g this function is continuous in x at least in a neighborhood of T [12].

Computational Approach. Let \(\mathcal{Q}\) be a partition of the continuous part X of state space. Then the sets

$$\begin{aligned} \mathcal {P} := \{ Q_i \times \{y\}\,|\, Q_i\in \mathcal {Q},\, y\in Y\}\end{aligned}$$
(37)

form a partition of the product state space \(Z=X\times Y\). On \(\mathcal{P}\) the approaches from Sects. 4–7 can be applied literally.
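A sketch of this product construction (assuming a lookup locate_box for the continuous part, as in the earlier sketches):

```python
def product_partition(boxes, modes):
    """The partition (37) of Z = X x Y: one node per (box, mode) pair; the
    graph and hypergraph constructions then apply verbatim."""
    return [(Q, y) for Q in boxes for y in modes]

def locate_hybrid(z, locate_box):
    """pi on the product space: locate the box of the continuous part and
    keep the discrete mode."""
    x, y = z
    return (locate_box(x), y)
```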

Example 4

(A switched voltage controller). This example is taken from [15]: Within a device for DC to DC conversion, a semiconductor is switching the polarity of a voltage source \(V_\text {in}\) in order to keep the output voltage \(x_1\) as constant as possible close to a prescribed value \(V_\text {ref}\), cf. Fig. 12, while the load is varying and thus the output current \(I_\text {load}\) changes. The model is

$$\begin{aligned} \dot{x}_1&= \frac{1}{C}\left( x_2 - I_\text {load}\right) \nonumber \\ \dot{x}_2&= -\frac{1}{L}x_1 - \frac{R}{L}x_2 + \frac{1}{L}uV_\text {in}\\ \dot{x}_3&= V_\text {ref}-x_1, \nonumber \end{aligned}$$
(38)

where \(u\in \{-1,1 \}\) is the control input. The corresponding discrete time system is given by the time-t-map \(\Phi ^t\) (\(t=0.1\) in our case) of (38), with the control input held constant during this sampling period. We use the quadratic instantaneous cost function

$$ g(x,u) = q_P(\Phi _1^t(x) - V_\text {ref})^2 + q_D(\Phi _2^t(x) - I_\text {load})^2 + q_I \Phi _3^t(x)^2. $$

The third component in (38) is only used in order to penalize a large \(L^1\)-error of the output voltage. We slightly simplify the problem (over its original formulation in [15]) by using \(x_3=0\) as initial value in each evaluation of the discrete map. Correspondingly, the map reduces to a two-dimensional one on the \(x_1,x_2\)-plane.

Fig. 12. A switched DC/DC converter (cf. [15]).

In the following numerical experiment we use the same parameter values as given in [15], i.e. \(V_\text {in} = 1~V\), \(V_\text {ref} = 0.5\), \(R = 1~\Omega \), \(L=0.1~H\), \(C = 4~F\), \(I_\text {load} = 0.3~A\), \(q_P = 1\), \(q_D=0.3\) and \(q_I = 1\). Confining our domain of interest to the rectangle \(X=[0,1]\times [-1,1]\), our target set is given by \(T=\{V_\text {ref}\}\times [-1,1]\). For the construction of the finite graph, we employ a partition of X into \(64\times 64\) equally sized boxes. We use 4 test points in each box, namely their vertices, in order to construct the edges of the graph.

Using the resulting discrete value function (associated to a nominal \(I_\text {load}=0.3~A\)) and the associated feedback, we repeated the stabilization experiment from [15], where the load current is changed every 100 iterations. Figure 13 shows the result of this simulation, demonstrating that our controller stabilizes the system as desired.

Fig. 13. Simulation of the controlled switched power converter.

Event Systems. In many cases, the discrete-time system (1) is given by time-sampling an underlying continuous time control system (an ordinary differential equation with inputs u and w), i.e. by the time-t-map of the flow of the continuous time system. In some cases, instead of fixing the time step t in each evaluation of f, it might be more appropriate to choose t in dependence on the dynamics. Formally, based on the discrete time model (1) of the plant, we are dealing with the discrete time system

$$\begin{aligned} x_{\ell +1} = \tilde{f}(x_\ell ,u_\ell ), \quad \ell =0,1,\ldots , \end{aligned}$$
(39)

where

$$\begin{aligned} \tilde{f}(x,u) = f^{r(x,u)}(x,u), \end{aligned}$$
(40)

\(r:X\times U\rightarrow \mathbb {N}_0\) is a given event function and the iterate \(f^r\) is defined by \(f^0(x,u)=x\) and \(f^r(x,u)=f(f^{r-1}(x,u),u)\), cf. [10]. The associated instantaneous cost \(\tilde{g}:X\times U\rightarrow [0,\infty )\) is given by

$$\begin{aligned} \tilde{g}(x,u) = \sum _{k=0}^{r(x,u)-1} g(f^k(x,u),u). \end{aligned}$$
(41)

The time k of the underlying system (1) can be recovered from the event time \(\ell \) through

$$ k_{\ell +1} = k_\ell +r(x_\ell ,u_\ell ). $$

Note that this model covers an event-triggered scenario, in which the event function is constructed from a comparison of the state of (1) with the state of (39), as well as the scenario of self-triggered control (cf. [1]), in which the event function is computed from the state of (1) alone.
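
In code, one evaluation of (40) and (41) amounts to iterating the one-step model while accumulating the cost; a minimal sketch, assuming the map f, the cost g and the event function r are given as Python callables:

def event_step(f, g, r, x, u):
    """One step of the event system: returns (f~(x,u), g~(x,u)) as in
    (40) and (41), i.e. f is iterated r(x,u) times with u held fixed,
    and the instantaneous cost g is summed along the way."""
    cost = 0.0
    for _ in range(r(x, u)):
        cost += g(x, u)    # g(f^k(x,u), u) for k = 0, ..., r(x,u)-1
        x = f(x, u)
    return x, cost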

Quantized Systems. The approach for discretizing the optimality principle described in Sects. 4–6 is based on a discretization of state space in the form of a finite partition. While in general the geometry of the partition elements is arbitrary (apart from reasonable regularity assumptions), in many cases (e.g. in our implementation in GAIO) cubical partitions are a convenient choice. In this case, the discretization can be interpreted as a quantization of (1), where the quantized system is given by the finite state system

$$\begin{aligned} P_{k+1} = F(P_k,u_k,\gamma _k), \quad k=0,1,\ldots , \end{aligned}$$
(42)

with

$$ F(P,u,\gamma )=\pi (f(\gamma (P),u)),\qquad P\in \mathcal{P}, u\in U, $$

where \(\gamma :\mathcal{P}\rightarrow X\) is a function which chooses a point x from each partition element \(P\in \mathcal{P}\), i.e. it satisfies \(\pi (\gamma (P))=P\) for all \(P\in \mathcal{P}\) [10]. The choice function models the fact that it is unknown to the controller from which exact state \(x_k\) the system transits to the next cell \(P_{k+1}\). It may be viewed as a perturbation which might prevent us from reaching the target set – in this sense, (42) constitutes a dynamic game in the sense of Sect. 6.
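
Continuing the converter sketches above (reusing Phi, box_index and the boxes Q), the quantized system (42) might be realized as follows; here \(\gamma \) draws a uniformly distributed point from the cell, so that \(\pi (\gamma (P))=P\) holds and the unknown exact state appears as a random perturbation:

import random

def pi(x):
    """Quantization map: point -> index of its partition element."""
    return box_index(x)

def gamma(P):
    """Choice function: pick a point from the cell with index P."""
    (ax, ay), (bx, by) = Q[P]
    return (random.uniform(ax, bx), random.uniform(ay, by))

def F(P, u):
    """One step of the quantized system (42); the randomness in
    gamma plays the role of the perturbation gamma_k."""
    return pi(Phi(gamma(P), u)[:2])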

9 Lazy Feedbacks

In some applications, e.g. when data needs to be transmitted between the system and the controller over a channel with limited bandwidth, it might be desirable to minimize the amount of transmitted data. More specifically, the question might be how to minimize the number of times that a new control value has to be transmitted from the controller to the system. In this section, we show how this can be achieved in an optimization based feedback construction by defining a suitable instantaneous cost function.

In order to detect a change in the control value we need to be able to compare its current value to the one in the previous time step. Based on the setting from Sect. 2, we consider the discrete-time control system

$$\begin{aligned} z_{k+1} = \bar{f}(z_k, u_k), \quad k=0,1,2,\ldots \end{aligned}$$
(43)

with \(z_k=(x_k,w_k)\in Z:=X \times U\), \(u_k \in U\) and

$$ \bar{f}(z,u) = \bar{f}((x,w),u) := \left[ \begin{array}{c} f(x,u) \\ u \end{array}\right] . $$

Given some target set \(T\subset X\), we define \(\bar{T}:=T\times U\) as the target set in the extended state space Z. The instantaneous cost function \(\bar{g}_\lambda :Z\times U \rightarrow [0,\infty )\), which penalizes control value changes, is given by

$$\begin{aligned} \bar{g}_\lambda (z,u) = \bar{g}_\lambda ((x,w),u) := (1-\lambda )\, g(x,u) + \lambda \, \delta (u-w) \end{aligned}$$
(44)

with

$$\begin{aligned} \delta (u) = \begin{cases} 0 & \text {if } u = 0,\\ 1 & \text {else.} \end{cases} \end{aligned}$$
(45)

Here, \(\lambda \in [0,1)\); the restriction \(\lambda <1\) guarantees that \(\bar{g}_\lambda (z,u) > 0\) whenever \(z \notin \bar{T}\).
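
Realizing the extension in code is cheap, since only the previously applied control value has to be carried along; a minimal sketch, assuming the one-step map f and the cost g of the original problem are given as Python callables:

def make_lazy(f, g, lam):
    """Given the one-step map f and cost g, return the extended
    system (43) and the lazy cost (44) for a weight 0 <= lam < 1."""
    def f_bar(z, u):
        x, w = z
        return (f(x, u), u)        # the new state remembers the control used
    def delta(v):                  # (45)
        return 0 if v == 0 else 1
    def g_bar(z, u):
        x, w = z
        return (1 - lam) * g(x, u) + lam * delta(u - w)
    return f_bar, g_bar

The pair (f_bar, g_bar) can then be fed into the same graph construction as before, now on the partition of the extended state space.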

In order to apply the construction from Sect. 7, we choose a finite partition \(\mathcal{P}\) of X. Let \(\hat{V}_\mathcal{P}\) denote the associated discrete upper value function, \(\hat{S}=\{x\in X: \hat{V}_\mathcal{P}(x) < \infty \}\) the stabilizable set, and \(\hat{u}_\mathcal{P}\) the associated feedback for the original system \((f,g)\). For simplicity, we assume that U is finite and use \(\mathcal{P}\times U\) as the partition of the extended state space Z. We denote the discrete upper value function of \((\bar{f}, \bar{g}_\lambda )\) by \(\bar{V}_\lambda : Z \rightarrow [0,\infty ]\), the stabilizable subset by \(\bar{S}_\lambda :=\{z \in Z : \bar{V}_\lambda (z) < \infty \}\) and the associated feedback by \(\bar{u}_\lambda :\bar{S}_\lambda \rightarrow U\).

For some arbitrary feedback \(u_\lambda :\bar{S}_\lambda \rightarrow U\), consider the closed loop system

$$\begin{aligned} z_{k+1} = \bar{f}(z_k,u_\lambda (z_k)), \quad k=0,1,2,\ldots . \end{aligned}$$
(46)

We will show that the closed loop system (46) with \(u_\lambda =\bar{u}_\lambda \) is asymptotically stable on \(\bar{S}_\lambda \), i.e. that for \(z_0\in \bar{S}_\lambda \) the trajectory of (46) enters \(\bar{T}\) in finitely many steps, and that for \(\lambda < 1\) sufficiently close to 1 the number of control value changes along this trajectory is minimal.

To this end, for some initial state \(z_0\in \bar{S}_\lambda \), let \((z_k)_k\in Z^\mathbb {N}\), \(z_k=(x_k,w_k)\), be the trajectory of (46). Let \(\kappa (z_0,u_\lambda ) = \min \{k \ge 0: z_k \in \bar{T}\}\) be the minimal number of time steps until the trajectory reaches the target set \(\bar{T}\),

$$ E(z_0,u_\lambda ) = \sum _{k=0}^{\kappa (z_0,u_\lambda )} \delta \bigl (u_\lambda (z_k)-w_k\bigr ) $$

the number of control value changes along the corresponding trajectory as well as

$$ J(z_0,u_\lambda ) = \sum _{k=0}^{\kappa (z_0,u_\lambda )} g(x_k,u_\lambda (z_k)), \quad \text {resp.}\quad \bar{J}(z_0,u_\lambda ) = \sum _{k=0}^{\kappa (z_0,u_\lambda )} \bar{g}_\lambda (z_k,u_\lambda (z_k)) $$

the associated accumulated costs. Note that

$$ \bar{J}(z_0,u_\lambda ) = (1-\lambda ) J (z_0,u_\lambda ) + \lambda {E}(z_0,u_\lambda ). $$
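
Indeed, inserting (44) into the definition of \(\bar{J}\) and splitting the sum yields

$$ \bar{J}(z_0,u_\lambda ) = \sum _{k=0}^{\kappa (z_0,u_\lambda )} \Bigl [ (1-\lambda )\, g\bigl (x_k,u_\lambda (z_k)\bigr ) + \lambda \, \delta \bigl (u_\lambda (z_k)-w_k\bigr ) \Bigr ] = (1-\lambda ) J(z_0,u_\lambda ) + \lambda E(z_0,u_\lambda ). $$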

Theorem 6

For all \(\lambda \in [0,1)\), \(\hat{S} \times U\subset \bar{S}_\lambda \), and using the optimal feedback \(\bar{u}_\lambda \) in (46), \(z_k \rightarrow \bar{T}\) as \(k\rightarrow \infty \) for every \(z_0\in \bar{S}_\lambda \). Further, for every \(K\in \mathbb {N}\) there exists \(\lambda < 1\) such that for any feedback \(u_\lambda :\bar{S}_\lambda \rightarrow U\) and any \(z_0\in \bar{S}_\lambda \) with \(\kappa (z_0,u_\lambda ) < K\), we have \(E(z_0, u_\lambda ) \ge E(z_0, \bar{u}_\lambda )\).

Proof

By construction, the system (43), (44) fulfills the assumptions of Theorem 5, so we have asymptotic stability of the closed loop system (46) with \(u_\lambda =\bar{u}_\lambda \) for all \(z_0 \in \bar{S}_\lambda \).

In order to show that \(\hat{S}\times U\subset \bar{S}_\lambda \) for all \(\lambda \in [0,1)\), choose \(\lambda \in [0,1)\) and some initial value \(z_0 = (x_0,u_0)\in \hat{S}\times U\). Consider the feedback

$$ u(z) = u((x,w)):= \hat{u}_\mathcal{P}(x) $$

for system (43). This leads to a trajectory \((x_k,u_k)_k\) of the extended system, with \((x_k)_k\) being a trajectory of the closed loop system for f with feedback \(\hat{u}_\mathcal{P}\). Since \(x_0 \in \hat{S}\), \(\hat{V}_\mathcal{P}(x_0)\) is finite and the accumulated cost \(\bar{J}(z_0,u)\) for this trajectory does not exceed \((1-\lambda )\hat{V}_\mathcal{P}(x_0) + \lambda \kappa (z_0,u)\), which is finite. By the optimality of \(\bar{V}_\lambda \),

$$ \bar{V}_\lambda (z_0) \le (1-\lambda ) \hat{V}_\mathcal{P}(x_0) + \lambda \kappa (z_0,u) <\infty $$

follows, i.e. \(z_0 \in \bar{S}_\lambda \).

To show the optimality of \(\bar{u}_\lambda \) with respect to the functional E, assume there exists a feedback \(u_\lambda :\bar{S}_\lambda \rightarrow U\) with \(E(z_0,u_\lambda ) \le E(z_0,\bar{u}_\lambda )-1\) for some \(z_0 \in \bar{S}_\lambda \). Since \(\bar{u}_\lambda \) is optimal, the following inequality holds:

$$\begin{aligned} (1-\lambda ) J (z_0,u_\lambda ) + \lambda E(z_0,u_\lambda )&= \bar{J}(z_0,u_\lambda ) \\&\ge \bar{J}(z_0,\bar{u}_\lambda )\\&= (1-\lambda ) J (z_0,\bar{u}_\lambda ) + \lambda E(z_0,\bar{u}_\lambda ) \\&\ge (1-\lambda ) J (z_0,\bar{u}_\lambda ) + \lambda E(z_0,u_\lambda ) + \lambda \end{aligned}$$

and thus

$$\begin{aligned} (1-\lambda ) J(z_0,u_\lambda ) \ge (1-\lambda ) J(z_0,\bar{u}_\lambda ) + \lambda . \end{aligned}$$
(47)

Let \(C(u_\lambda ) = \max _{z_0} \{ J(z_0,u_\lambda )\mid \kappa (z_0,u_\lambda )<K\}\), which is finite. From (47) we get

$$\begin{aligned} (1-\lambda ) C(u_\lambda ) \ge (1-\lambda ) C(\bar{u}_\lambda ) + \lambda , \end{aligned}$$
(48)

so that letting \(\lambda \rightarrow 1\) leads to a contradiction.    \(\square \)