Dynamic Programming Based Adaptive Optimal Control for Inverted Pendulum

Nam, Dao Phuong; Van Huong, Nguyen; Minh, Ha Duc; Long, Nguyen Thanh

doi:10.1007/978-3-319-69814-4_44

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 465)

Included in the following conference series:

International Conference on Advanced Engineering Theory and Applications

Abstract

This work addresses the adaptive optimal control problem for a class of continuous-time systems with input disturbance and unknown parameters. The main objective is to find an adaptive optimal control law, based on the adaptive dynamic programming (ADP) method, that stabilizes the closed-loop system. In addition, the convergence properties of the proposed algorithm are established. Theoretical analysis and simulation results demonstrate the performance of the proposed algorithm for an inverted pendulum.

1 Introduction

Approximate/adaptive dynamic programming (ADP) is a biologically inspired, non-model-based computational method for designing optimal control laws for uncertain systems. It has been used in numerous studies [1], such as the reinforcement learning system designs of Werbos and the neuro-dynamic programming of Bertsekas. According to [1], recent developments in ADP theory follow three directions: operations research, real-time control of dynamical systems, and the application of earlier ADP results to nonlinear uncertain systems.

The non-model-based approach has been used extensively to investigate adaptive optimal stabilization and the corresponding tracking problems [5, 6]. In [5], the control law was obtained after transforming the robust control problem into an optimal control problem. The corresponding optimal control depends on the discrete-time Hamilton-Jacobi-Bellman (HJB) equation, which was solved using a neural network. The control law proposed in [5] ensures local asymptotic stability of the closed-loop uncertain nonlinear system under an inequality condition. In [6], Fan and Yang proposed a sliding-mode controller based on adaptive optimal control theory for partially unknown nonlinear systems with input disturbances. The nearly optimal control design ensures stability of the equivalent sliding-mode dynamics by using a policy iteration algorithm [6]. A critic network is used to approximate the cost function, which overcomes the difficulty in the second step of the policy iteration algorithm. The controller proposed in [6] ensures uniformly ultimately bounded (UUB) stability of the closed-loop uncertain nonlinear system, relying on the boundedness of the signals.

In [7], Jiang and Jiang presented a control design for continuous-time systems based on the corresponding HJB equation. Because an analytical solution of the HJB equation is difficult to obtain, [7] proposed an online policy iteration (PI) technique. The stability analysis of the closed-loop system established the input-to-state stability (ISS) property, and the estimate of the region of attraction depends on class $ KL $ functions. In [8], Jiang and Jiang proposed an adaptive optimal control law based on the algebraic Riccati equation for uncertain linear systems without external disturbances. That computational adaptive optimal control algorithm was developed from the result of Kleinman (1968). In [2, 9], a discretized model is used to propose PI-based and value iteration (VI)-based output ADP control designs. However, the performance of these control laws depends strongly on the sampling time of the discrete-time system.

This paper extends [2, 9] to a robust control law for uncertain continuous-time systems with external disturbances. Specifically, we develop a new adaptive optimal control law, within the ADP framework, for continuous-time systems subject to uncertainties and external disturbances.

2 Adaptive Optimal Control Design

In this paper, we study a class of continuous-time systems described by:

$$ \left\{ {\begin{array}{*{20}l} {\dot{z} = g\left( {z;y;v} \right)} \hfill \\ {\dot{x} = Ax + B\left( {u + \Delta \left( {z;y;v} \right)} \right) + Dv} \hfill \\ {\dot{v} = Ev} \hfill \\ {e = Cx + Fv} \hfill \\ \end{array} } \right. $$

(1)

where $ x \in \mathbb{R}^{n} $ is the measured component of the state available for feedback control, $ u \in \mathbb{R}^{m} $ is the input, $ y = Cx \in \mathbb{R}^{r} $ is the output of the plant, $ y_{d} = - Fv \in \mathbb{R}^{r} $ is the reference signal to be tracked, and $ e \in \mathbb{R}^{r} $ is the tracking error. $ z \in \mathbb{R}^{p} $ is the state of the dynamic uncertainty and $ v \in \mathbb{R}^{q} $ is the state of the exosystem. The functions $ g:\mathbb{R}^{p} \times \mathbb{R}^{r} \times \mathbb{R}^{q} \to \mathbb{R}^{p} $ and $ \Delta :\mathbb{R}^{p} \times \mathbb{R}^{r} \times \mathbb{R}^{q} \to \mathbb{R}^{m} $ are locally Lipschitz and satisfy $ g\left( {0;0;0} \right) = 0 $ and $ \Delta \left( {0;0;0} \right) = 0 $. The matrices $ A \in \mathbb{R}^{n \times n} $, $ B \in \mathbb{R}^{n \times m} $, $ C \in \mathbb{R}^{r \times n} $, $ D \in \mathbb{R}^{n \times q} $, $ E \in \mathbb{R}^{q \times q} $, $ F \in \mathbb{R}^{r \times q} $ and the functions $ g $, $ \Delta $ are unknown, and $ z;v;y;y_{d} $ are unmeasurable.

The control objective is to find an adaptive optimal control law, computed by an iterative algorithm, that drives the tracking error to zero, and to establish the convergence of this iterative algorithm in the presence of uncertainties and external disturbances. The following assumptions, which are standard in the adaptive optimal control literature, are made.

Assumption 1:

The pair $ \left( {A;B} \right) $ is controllable.

Assumption 2:

The transmission zeros condition holds, i.e.,

$$ \operatorname{rank}\left[ {\begin{array}{*{20}c} {A - \lambda I} & B \\ C & 0 \\ \end{array} } \right] = n + r;\quad \forall \lambda \in \sigma \left( E \right), $$

where $ \sigma \left( E \right) $ denotes the spectrum of $ E $.

Assumption 3:

The minimal polynomial of $ E $ is available, which is:

$$ {\Gamma}_{E} \left( s \right) = \prod\limits_{i = 1}^{M} {\left( {s - \lambda_{i} } \right)^{a_{i}} } \prod\limits_{j = 1}^{N} {\left( {s^{2} - 2\mu_{j} s + \mu_{j}^{2} + \omega_{j}^{2} } \right)^{b_{j}} } $$

with degree $ q_{E} \le q $, where $ a_{i} $ and $ b_{j} $ are positive integers and $ \lambda_{i} ;\mu_{j} ;\omega_{j} \in \mathbb{R} $ for $ i = 1, \ldots ,M $ and $ j = 1, \ldots ,N $.

Assumption 4:

There exist functions $ \beta_{z} ;\beta_{\Delta } $ of class $ KL $ and functions $ \gamma_{z} ;\gamma_{\Delta } $ of class $ K $, all independent of $ v $, such that:

$$ \begin{aligned} & \left\| {z\left( t \right)} \right\| \le \beta_{z} \left( {\left\| {z\left( 0 \right)} \right\|,t} \right) + \gamma_{z} \left( {\left\| e \right\|} \right) \\ & \left\| {\Delta \left( t \right)} \right\| \le \beta_{\Delta } \left( {\left\| {\Delta \left( 0 \right)} \right\|,t} \right) + \gamma_{\Delta } \left( {\left\| e \right\|} \right) \\ \end{aligned} $$

Assumption 5:

There exist a continuously differentiable, positive definite and radially unbounded function $ {\Pi} :\mathbb{R}^{p} \to \mathbb{R} $ and two constants $ c_{1} > 0;c_{2} > 0 $ such that:

$$ \begin{aligned} & \frac{{\partial {\Pi} }}{\partial z}g\left( {z,y,v} \right) \le - c_{1} \left| {\Delta \left( {z,y,v} \right)} \right|^{2} + c_{2} \left| e \right|^{2} ; \\ & \forall z \in {\mathbf{\mathbb{R}}}^{p} ;y \in {\mathbf{\mathbb{R}}}^{r} \\ \end{aligned} $$

Assumption 6:

There exists a known constant $ \xi > 0 $ such that the matrix $ C $ satisfies $ \left\| C \right\| \le \xi $.

Remark 1:

In contrast to [2], this paper considers a class of systems in which $ z;v;y;y_{d} $ are unmeasurable while $ x,u,e $ are measurable.

We recall the classical result of Kleinman (1968) [4]: let $ K_{0} $ be any stabilizing feedback gain matrix, and repeat the following steps for $ k = 0, 1, \ldots $, where $ A_{k} = A - BK_{k} $.

Step 1: Solve for the real symmetric positive definite solution $ P_{k} $ of the Lyapunov equation:

$$ A_{k}^{T} P_{k} + P_{k} A_{k} + Q + K_{k}^{T} K_{k} = 0 $$

(2)

Step 2: Update the feedback gain matrix by:

$$ K_{k + 1} = B^{T} P_{k} $$

(3)

Then, the following properties hold:

(1) $ A - BK_{k} $ is Hurwitz.
(2) $ P^{*} \le P_{k + 1} \le P_{k} $.
(3) $ \lim\limits_{k \to \infty } K_{k} = K^{*} ; \lim\limits_{k \to \infty } P_{k} = P^{*} $.
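
The two steps of the Kleinman iteration can be sketched numerically as follows. This is a minimal illustration, assuming a known model: the double-integrator matrices, the weight $ Q = I $, the initial gain `K0` and the helper names `solve_lyapunov` and `kleinman` are all hypothetical choices made for the example and do not come from the paper. The Lyapunov equation (2) is solved by vectorization with a Kronecker product.

```python
import numpy as np

def solve_lyapunov(Ak, M):
    """Solve Ak^T P + P Ak + M = 0 for symmetric P by column-major vectorization."""
    n = Ak.shape[0]
    I = np.eye(n)
    # vec(Ak^T P) = (I kron Ak^T) vec(P), vec(P Ak) = (Ak^T kron I) vec(P)
    L = np.kron(I, Ak.T) + np.kron(Ak.T, I)
    p = np.linalg.solve(L, -M.reshape(-1, order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # remove round-off asymmetry

def kleinman(A, B, Q, K0, iters=20):
    """Repeat Step 1 (Lyapunov equation (2)) and Step 2 (gain update (3))."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        P = solve_lyapunov(Ak, Q + K.T @ K)
        K = B.T @ P
    return P, K

# Hypothetical example: double integrator with a stabilizing initial gain.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
K0 = np.array([[1.0, 1.0]])

P, K = kleinman(A, B, Q, K0)
# Residual of the algebraic Riccati equation with unit input weight.
residual = A.T @ P + P @ A + Q - P @ B @ B.T @ P
```

For this example the Riccati equation can be solved by hand, giving $ K^{*} = \left[ 1 \;\; \sqrt{3} \right] $, so the iterates can be compared against a closed-form value.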

The proposed adaptive optimal control law is based on the following theorems.

Theorem 1:

Let $ \varepsilon = x - Xv $. If the controller is designed as $ u = - K^{*} \varepsilon + Uv $, where $ X $ and $ U $ solve the following regulator equations:

$$ \left\{ {\begin{array}{*{20}l} {XE = AX + BU + D} \hfill \\ {0 = CX + F} \hfill \\ \end{array} } \right. $$

and $ K^{*} = B^{T} P^{*} $, where the symmetric matrix $ P^{*} > 0 $ is the unique solution of the algebraic Riccati equation:

$$ P^{*} A + A^{T} P^{*} + Q - P^{*} BB^{T} P^{*} = 0 $$

(4)

and the weighting matrix $ Q $ in (4) satisfies $ \lambda_{\min } \left( Q \right) > \gamma \xi^{2} > \frac{{c_{2} }}{{c_{1} }}\xi^{2} $ for some constant $ \gamma $, then the closed-loop system achieves disturbance rejection and asymptotic tracking.

Proof:

By Assumption 2, the regulator equations are solvable for any matrices $ D $ and $ F $. Recall that

$$ \varepsilon = x - Xv $$

(5)

$$ \begin{aligned} & \Rightarrow \dot{\varepsilon } = \dot{x} - X\dot{v} \\ & \Rightarrow \dot{\varepsilon } = Ax + B\left( {u + \Delta \left( {z;y;v} \right)} \right) + Dv - XEv \Rightarrow \dot{\varepsilon } = A\left( {\varepsilon + Xv} \right) + B\left( {u + \Delta \left( {z;y;v} \right)} \right) + Dv - XEv \\ & \Rightarrow \dot{\varepsilon } = A\varepsilon + B\left( {u - Uv + \Delta \left( {z;y;v} \right)} \right) \\ \end{aligned} $$

$$ e = y - y_{d} = Cx + Fv = C\left( {\varepsilon + Xv} \right) + Fv\quad \quad \Rightarrow e = C\varepsilon $$

(6)

Define $ V_{1} = \varepsilon^{T} P^{*} \varepsilon $. With $ u = - K^{*} \varepsilon + Uv $, so that $ u - Uv = - K^{*} \varepsilon $, we have:

$$ \begin{aligned} \dot{V}_{1} & = \varepsilon^{T} P^{*} \left( {A\varepsilon + B\left( { - K^{*} \varepsilon + \Delta } \right)} \right) + \left( {A\varepsilon + B\left( { - K^{*} \varepsilon + \Delta } \right)} \right)^{T} P^{*} \varepsilon \\ & = \varepsilon^{T} \left( {P^{*} \left( {A - BK^{*} } \right) + \left( {A - BK^{*} } \right)^{T} P^{*} } \right)\varepsilon + \varepsilon^{T} P^{*} B\Delta + \Delta^{T} B^{T} P^{*} \varepsilon \\ & = \varepsilon^{T} \left( { - Q - P^{*} BB^{T} P^{*} } \right)\varepsilon + 2\varepsilon^{T} P^{*} B\Delta = - \varepsilon^{T} Q\varepsilon - \left| {B^{T} P^{*} \varepsilon - \Delta } \right|^{2} + \left| \Delta \right|^{2} \\ \end{aligned} $$
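
The last equality in the expression for $ \dot{V}_{1} $ follows by completing the square. As a worked check using only the symbols already defined, expanding the squared norm gives:

$$ \begin{aligned} - \left| {B^{T} P^{*} \varepsilon - \Delta } \right|^{2} + \left| \Delta \right|^{2} & = - \left( {\varepsilon^{T} P^{*} BB^{T} P^{*} \varepsilon - 2\varepsilon^{T} P^{*} B\Delta + \Delta^{T} \Delta } \right) + \Delta^{T} \Delta \\ & = - \varepsilon^{T} P^{*} BB^{T} P^{*} \varepsilon + 2\varepsilon^{T} P^{*} B\Delta \\ \end{aligned} $$

Since the squared term is non-negative, dropping it can only increase the right-hand side, which is what the subsequent bound on $ \dot{V}_{1} $ uses.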

By Assumption 6 and the condition on $ Q $ in Theorem 1, we have:

$$ Q \ge \lambda_{\min } \left( Q \right)I > \gamma \xi^{2} I \ge \gamma \left\| C \right\|^{2} I \ge \gamma C^{T} C $$

(7)

Then

$$ \dot{V}_{1} \le - \gamma \varepsilon^{T} C^{T} C\varepsilon + \left| \Delta \right|^{2} = - \gamma \left| e \right|^{2} + \left| \Delta \right|^{2} $$

Setting $ V = V_{1} + \frac{1}{{c_{1} }}{\Pi} \left( z \right) $, by using assumption 5, we have:

$$ \dot{V} \le - \gamma \left| e \right|^{2} + \left| \Delta \right|^{2} + \frac{1}{{c_{1} }}\left( { - c_{1} \left| \Delta \right|^{2} + c_{2} \left| e \right|^{2} } \right) \Rightarrow \dot{V} \le - \left( {\gamma - \frac{{c_{2} }}{{c_{1} }}} \right)\left| e \right|^{2} $$

A direct application of LaSalle’s invariance principle then yields global asymptotic stability (GAS) of the closed-loop system.
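
The regulator equations of Theorem 1 are linear in $ \left( {X;U} \right) $, so when the model is known they can be solved by vectorization. The sketch below is only an illustration under stated assumptions: the helper name `solve_regulator`, the requirement $ m = r $ (so that the linear system is square) and the numerical matrices (a double integrator tracking a sinusoid) are all hypothetical and are not taken from the paper, where the matrices are unknown.

```python
import numpy as np

def solve_regulator(A, B, C, D, E, F):
    """Solve X E = A X + B U + D and 0 = C X + F for (X, U).

    Assumes m == r, so that the vectorized linear system is square.
    """
    n, m = B.shape
    q = E.shape[0]
    r = C.shape[0]
    Iq = np.eye(q)
    # Column-major vec: vec(X E) = (E^T kron I_n) vec(X), vec(A X) = (I_q kron A) vec(X)
    top = np.hstack([np.kron(E.T, np.eye(n)) - np.kron(Iq, A), -np.kron(Iq, B)])
    bottom = np.hstack([np.kron(Iq, C), np.zeros((r * q, m * q))])
    lhs = np.vstack([top, bottom])
    rhs = np.concatenate([D.reshape(-1, order="F"), -F.reshape(-1, order="F")])
    sol = np.linalg.solve(lhs, rhs)
    X = sol[: n * q].reshape((n, q), order="F")
    U = sol[n * q:].reshape((m, q), order="F")
    return X, U

# Hypothetical data: double integrator, harmonic exosystem, reference y_d = v_1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
E = np.array([[0.0, 1.0], [-1.0, 0.0]])
D = np.array([[0.0, 0.0], [0.5, 0.0]])
F = np.array([[-1.0, 0.0]])

X, U = solve_regulator(A, B, C, D, E, F)
```

For this data the equations can also be solved by hand, giving $ X = I $ and $ U = \left[ { - 1.5 \;\; 0} \right] $, which provides a simple cross-check of the vectorized solve.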

Theorem 2:

There exists a small constant $ \alpha > 0 $ such that, for every symmetric matrix $ P > 0 $ satisfying $ \left| {P - P^{*} } \right| < \alpha $, the overall system is GAS under the controller $ u = - B^{T} P\varepsilon + Uv $.

Proof:

From (4), any symmetric matrix $ P > 0 $ satisfies $ A^{T} P + PA + \widehat{Q} - PBB^{T} P = 0 $, where:

$$ \widehat{Q} = Q + \left( {P^{*} - P} \right)A + A^{T} \left( {P^{*} - P} \right) + PBB^{T} P - P^{*} BB^{T} P^{*} $$

From (7), we have $ Q > \gamma C^{T} C $, so there exists a constant $ \mu > 0 $ such that $ Q - \gamma C^{T} C > \mu I $. Then, by continuity, there exists $ \alpha > 0 $ such that every symmetric matrix $ P > 0 $ satisfying $ \left| {P - P^{*} } \right| < \alpha $ gives $ \widehat{Q} > Q - \mu I $, which implies $ \widehat{Q} > \gamma C^{T} C $. Therefore, by Theorem 1, the control $ u = - B^{T} P\varepsilon + Uv $ globally asymptotically stabilizes the system.

Remark 2:

The proposed optimal control law guarantees the GAS property of the closed-loop system in the presence of uncertain parameters and external disturbances.

Using assumption 3, we can always find a vector $ \bar{v}\left( t \right) \in {\mathbf{\mathbb{R}}}^{{q_{E} }} $ and a matrix $ \bar{E} \in {\mathbf{\mathbb{R}}}^{{q_{E} \times q_{E} }} $ such that:

$$ \begin{aligned} \dot{\bar{v}}\left( t \right) = \bar{E}.\bar{v}\left( t \right) \hfill \\ v\left( t \right) = G.\bar{v}\left( t \right) \, \hfill \\ \end{aligned} $$

(8)

where $ G \in \mathbb{R}^{{q \times q_{E} }} $ is an unknown constant matrix.

From (5), (8) and Theorem 1, the linear optimal output regulation problem (LOORP) is solved if we design the controller $ u = - K^{*} \left( {x - XG\bar{v}\left( t \right)} \right) + UG\bar{v}\left( t \right) $.

PI based output ADP design:

Suppose $ \Delta $ is available during the learning phase. Writing $ w = u + \Delta $ and $ A_{k} = A - BK_{k} $, the error dynamics become:

$$ \begin{aligned} & \dot{\varepsilon } = A\varepsilon + B\left( {u - UG\bar{v} + \Delta \left( {z;y;v} \right)} \right) = \left( {A - BK_{k} } \right)\varepsilon + B\left( {u + K_{k} \varepsilon - UG\bar{v} + \Delta } \right) \\ & \dot{\varepsilon } = A_{k} \varepsilon + B\left( {w + K_{k} \varepsilon - UG\bar{v}} \right) \\ \end{aligned} $$

Let $ \bar{P}_{k} $ be such that $ P_{k} = C^{T} \bar{P}_{k} C $. From (6), we have:

$$ \begin{array}{*{20}l} {e\left( {t + \delta t} \right)^{T} \overline{P}_{k} e\left( {t + \delta t} \right) - e\left( t \right)^{T} \overline{P}_{k} e\left( t \right) = \,\varepsilon \left( {t + \delta t} \right)^{T} P_{k} \varepsilon \left( {t + \delta t} \right) - \varepsilon \left( t \right)^{T} P_{k} \varepsilon \left( t \right)} \hfill \\ {\;\; = \int\limits_{t}^{t + \delta t} {\left[ {\varepsilon^{T} \left( {A_{k}^{T} P_{k} + P_{k} A_{k} } \right)\varepsilon + 2\left( {w + K_{k} \varepsilon - UG\bar{v}} \right)^{T} B^{T} P_{k} \varepsilon } \right]} d\tau } \hfill \\ \end{array} $$

(9)

$$ = - \int\limits_{t}^{t + \delta t} {\varepsilon^{T} \left( {Q + K_{k}^{T} K_{k} } \right)} \varepsilon d\tau + 2\int\limits_{t}^{t + \delta t} {\left( {w + K_{k} \varepsilon } \right)^{T} K_{k + 1} \varepsilon } d\tau - 2\int\limits_{t}^{t + \delta t} {\bar{v}^{T} \left( {UG} \right)^{T} K_{k + 1} \varepsilon } d\tau $$

(10)

$$ {\text{Define}}\;XG = \bar{X};UG = \bar{U};Q + K_{k}^{T} K_{k} = Q_{k} . $$

(11)

From (5), (8) and (11), we have:

$$ \varepsilon^{T} \left( {Q + K_{k}^{T} K_{k} } \right)\varepsilon = \left( {x^{T} - \bar{v}^{T} \bar{X}^{T} } \right)Q_{k} \left( {x - \bar{X}\bar{v}} \right) = x^{T} Q_{k} x - \bar{v}^{T} \bar{X}^{T} Q_{k} x - x^{T} Q_{k} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} Q_{k} \bar{X}\bar{v} $$

$$ \begin{aligned} & \left( {w + K_{k} \varepsilon } \right)^{T} K_{k + 1} \varepsilon = w^{T} K_{k + 1} \varepsilon + \varepsilon^{T} K_{k}^{T} K_{k + 1} \varepsilon = w^{T} K_{k + 1} \left( {x - \bar{X}\bar{v}} \right) + \left( {x^{T} - \bar{v}^{T} \bar{X}^{T} } \right)K_{k}^{T} K_{k + 1} \left( {x - \bar{X}\bar{v}} \right) \\ & = w^{T} K_{k + 1} x - w^{T} K_{k + 1} \bar{X}\bar{v} + x^{T} K_{k}^{T} K_{k + 1} x - \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} x - x^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v} \\ \end{aligned} $$

$$ \bar{v}^{T} \left( {UG} \right)^{T} K_{k + 1} \varepsilon = \bar{v}^{T} \bar{U}^{T} K_{k + 1} \left( {x - \bar{X}\bar{v}} \right) = \bar{v}^{T} \bar{U}^{T} K_{k + 1} x - \bar{v}^{T} \bar{U}^{T} K_{k + 1} \bar{X}\bar{v} $$

Then

$$ \begin{aligned} & e\left( {t + \delta t} \right)^{T} \overline{P}_{k} e\left( {t + \delta t} \right) - e\left( t \right)^{T} \overline{P}_{k} e\left( t \right) \\ & = - \int\limits_{t}^{t + \delta t} {\left[ {x^{T} Q_{k} x - \bar{v}^{T} \bar{X}^{T} Q_{k} x - x^{T} Q_{k} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} Q_{k} \bar{X}\bar{v}} \right]} d\tau \\ & \quad + 2\int\limits_{t}^{t + \delta t} {\left[ {w^{T} K_{k + 1} x - w^{T} K_{k + 1} \bar{X}\bar{v} + x^{T} K_{k}^{T} K_{k + 1} x - \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} x - x^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v}} \right]} d\tau \\ & \quad - 2\int\limits_{t}^{t + \delta t} {\left[ {\bar{v}^{T} \bar{U}^{T} K_{k + 1} x - \bar{v}^{T} \bar{U}^{T} K_{k + 1} \bar{X}\bar{v}} \right]} d\tau \\ & = 2\int\limits_{t}^{t + \delta t} {w^{T} K_{k + 1} x} \,d\tau - 2\int\limits_{t}^{t + \delta t} {w^{T} K_{k + 1} \bar{X}\bar{v}} \,d\tau + \int\limits_{t}^{t + \delta t} {x^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right)x} \,d\tau + \int\limits_{t}^{t + \delta t} {x^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}\bar{v}} \,d\tau \\ & \quad + \int\limits_{t}^{t + \delta t} {\bar{v}^{T} \bar{X}^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)x} \,d\tau + \int\limits_{t}^{t + \delta t} {\bar{v}^{T} \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}\bar{v}} \,d\tau - 2\int\limits_{t}^{t + \delta t} {\bar{v}^{T} \bar{U}^{T} K_{k + 1} x} \,d\tau \\ & = 2\int\limits_{t}^{t + \delta t} {w^{T} K_{k + 1} x} \,d\tau - 2\int\limits_{t}^{t + \delta t} {w^{T} K_{k + 1} \bar{X}\bar{v}} \,d\tau + 2\int\limits_{t}^{t + \delta t} {x^{T} K_{k}^{T} K_{k + 1} x} \,d\tau - \int\limits_{t}^{t + \delta t} {x^{T} Q_{k} x} \,d\tau \\ & \quad + 2\int\limits_{t}^{t + \delta t} {x^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}\bar{v}} \,d\tau + \int\limits_{t}^{t + \delta t} {\bar{v}^{T} \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}\bar{v}} \,d\tau - 2\int\limits_{t}^{t + \delta t} {\bar{v}^{T} \bar{U}^{T} K_{k + 1} x} \,d\tau \\ \end{aligned} $$

Applying the Kronecker product representation, with $ vec\left( \cdot \right) $ stacking the columns of a matrix, gives:

$$ e^{T} \overline{P}_{k} e = \left( {e^{T} \otimes e^{T} } \right)vec\left( {\overline{P}_{k} } \right);w^{T} K_{k + 1} x = \left( {x^{T} \otimes w^{T} } \right)vec\left( {K_{k + 1} } \right);w^{T} K_{k + 1} \bar{X}\bar{v} = \left( {\bar{v}^{T} \otimes w^{T} } \right)vec\left( {K_{k + 1} \bar{X}} \right); $$

$$ x^{T} K_{k}^{T} K_{k + 1} x = \left( {x^{T} \otimes \left( {K_{k} x} \right)^{T} } \right)vec\left( {K_{k + 1} } \right);x^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}\bar{v} = \left( {\bar{v}^{T} \otimes x^{T} } \right)vec\left( {\left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}} \right); $$

$$ \bar{v}^{T} \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}\bar{v} = \left( {\bar{v}^{T} \otimes \bar{v}^{T} } \right)vec\left[ {\left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}} \right] $$

$$ \bar{v}^{T} \bar{U}^{T} K_{k + 1} x = \left( {x^{T} \otimes \bar{v}^{T} } \right)vec\left( {\bar{U}^{T} K_{k + 1} } \right); \, $$
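
The Kronecker identities above all follow from $ vec\left( {AXB} \right) = \left( {B^{T} \otimes A} \right)vec\left( X \right) $ with a column-stacking $ vec $. A short numerical check of two of them is sketched below; the dimensions and the random data are arbitrary illustrative choices, and the helper `vec` is a hypothetical name introduced for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q_e = 4, 1, 2  # illustrative sizes, not taken from the paper

x = rng.standard_normal(n)
w = rng.standard_normal(m)
v_bar = rng.standard_normal(q_e)
K = rng.standard_normal((m, n))        # stands in for K_{k+1}
X_bar = rng.standard_normal((n, q_e))  # stands in for X G

def vec(M):
    """Stack the columns of M into one vector (column-major vec)."""
    return M.reshape(-1, order="F")

# w^T K x = (x^T kron w^T) vec(K)
lhs_1 = w @ K @ x
rhs_1 = np.kron(x, w) @ vec(K)

# w^T K X_bar v_bar = (v_bar^T kron w^T) vec(K X_bar)
lhs_2 = w @ K @ X_bar @ v_bar
rhs_2 = np.kron(v_bar, w) @ vec(K @ X_bar)
```

Note that the identities hold for the column-major `vec`; with NumPy's default row-major flattening the order of the Kronecker factors would have to be swapped.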

Define: $ \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X} = G_{1;k} ; $

$$ \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X} = G_{2;k} ;\bar{U}^{T} K_{k + 1} = G_{3;k} $$

$$ {\Phi}_{k} = \left[ {\begin{array}{*{20}c} {{\Upsilon} \left( {t_{0}^{\left( k \right)} } \right)} & {\vartheta \left( {t_{0}^{\left( k \right)} } \right)} & {\nu \left( {t_{0}^{\left( k \right)} } \right)} & {\chi \left( {t_{0}^{\left( k \right)} } \right)} & {\pi \left( {t_{0}^{\left( k \right)} } \right)} & {\sigma \left( {t_{0}^{\left( k \right)} } \right)} \\ {{\Upsilon} \left( {t_{1}^{\left( k \right)} } \right)} & {\vartheta \left( {t_{1}^{\left( k \right)} } \right)} & {\nu \left( {t_{1}^{\left( k \right)} } \right)} & {\chi \left( {t_{1}^{\left( k \right)} } \right)} & {\pi \left( {t_{1}^{\left( k \right)} } \right)} & {\sigma \left( {t_{1}^{\left( k \right)} } \right)} \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ {{\Upsilon} \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\vartheta \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\nu \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\chi \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\pi \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\sigma \left( {t_{l - 1}^{\left( k \right)} } \right)} \\ \end{array} } \right];{\Psi}_{k} = \left[ \begin{aligned} \rho \left( {t_{0}^{\left( k \right)} } \right) \hfill \\ \rho \left( {t_{1}^{\left( k \right)} } \right) \hfill \\ \ldots \hfill \\ \rho \left( {t_{l - 1}^{\left( k \right)} } \right) \hfill \\ \end{aligned} \right] $$

$$ \begin{aligned} & {\Upsilon} \left( t \right) = e^{T} \left( t \right) \otimes e^{T} \left( t \right) - e^{T} \left( {t + \delta t} \right) \otimes e^{T} \left( {t + \delta t} \right);\quad \vartheta \left( t \right) = 2\int\limits_{t}^{t + \delta t} {\left( {x^{T} \otimes \left( {w + K_{k} x} \right)^{T} } \right)d\tau } \\ & \nu \left( t \right) = - 2\int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \otimes w^{T} } \right)d\tau } ;\quad \chi \left( t \right) = 2\int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \otimes x^{T} } \right)d\tau } ;\quad \pi \left( t \right) = \int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \otimes \bar{v}^{T} } \right)d\tau } \\ & \sigma \left( t \right) = - 2\int\limits_{t}^{t + \delta t} {\left( {x^{T} \otimes \bar{v}^{T} } \right)d\tau } ;\quad \rho \left( t \right) = \int\limits_{t}^{t + \delta t} {x^{T} Q_{k} x\,d\tau } \\ \end{aligned} $$

Consequently, we have:

$$ {\Phi}_{k} .\left[ {\begin{array}{*{20}l} {vec\left( {\bar{P}_{k} } \right)} \hfill \\ {vec\left( {K_{k + 1} } \right)} \hfill \\ {vec\left( {K_{k + 1} \bar{X}} \right)} \hfill \\ {vec\left( {G_{1;k} } \right)} \hfill \\ {vec\left( {G_{2;k} } \right)} \hfill \\ {vec\left( {G_{3;k} } \right)} \hfill \\ \end{array} } \right] = {\Psi}_{k} $$

(12)

Assumption 7:

For each $ k = 1;2; \ldots $ there exists an integer $ N $ such that, when $ k \ge N $, the following rank condition holds:

$$ rank\left( {{\Phi}_{k} } \right) = \frac{{n\left( {n + 1} \right)}}{2} + \left( {m + q} \right)n $$

By Assumption 7, the vector of unknowns in (12) is uniquely determined by:

$$ \left[ {vec\left( {\bar{P}_{k} } \right),vec\left( {K_{k + 1} } \right),vec\left( {K_{k + 1} \bar{X}} \right),vec\left( {G_{1;k} } \right),vec\left( {G_{2;k} } \right),vec\left( {G_{3;k} } \right)} \right]^{T} = \left( {{\Phi}_{k}^{T} {\Phi}_{k} } \right)^{ - 1} {\Phi}_{k}^{T} {\Psi}_{k} $$

(13)
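
The least-squares step in (13) can be illustrated on synthetic data. In the sketch below, `Phi`, `Psi` and `theta_true` are random stand-ins (hypothetical, not generated from a pendulum trajectory) for $ {\Phi}_{k} $, $ {\Psi}_{k} $ and the stacked vector of unknowns. It shows the normal-equation formula next to `numpy.linalg.lstsq`, which is usually the numerically safer way to compute the same solution.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_unknowns = 40, 6  # hypothetical sizes for illustration

Phi = rng.standard_normal((n_samples, n_unknowns))  # stand-in for Phi_k
theta_true = rng.standard_normal(n_unknowns)        # stand-in for the stacked unknowns
Psi = Phi @ theta_true                              # stand-in for Psi_k

# Normal-equation form, as written in (13); valid when Phi has full column rank.
theta_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ Psi)

# Equivalent least-squares solve that avoids forming Phi^T Phi explicitly.
theta_lstsq, *_ = np.linalg.lstsq(Phi, Psi, rcond=None)
```

Both solves recover the same vector only when the data matrix has full column rank, which is the role played by the rank condition in Assumption 7.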

Assumption 2 implies that $ B $ has full column rank, so $ K_{k + 1} = B^{T} P_{k} $ has full row rank. Hence:

$$ \bar{U}^{T} = G_{3;k} .K_{k + 1}^{T} .\left( {K_{k + 1} .K_{k + 1}^{T} } \right)^{ - 1} $$

(14)
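
Equation (14) recovers $ \bar{U}^{T} $ from $ G_{3;k} = \bar{U}^{T} K_{k + 1} $ with a right inverse of $ K_{k + 1} $. A small numerical check is sketched below; the sizes and the random matrices `K_next` and `U_bar` are hypothetical stand-ins chosen only so that `K_next` has full row rank.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, q_e = 2, 4, 3  # illustrative sizes, not taken from the paper

K_next = rng.standard_normal((m, n))   # stand-in for K_{k+1}, full row rank
U_bar = rng.standard_normal((m, q_e))  # stand-in for U G
G3 = U_bar.T @ K_next                  # G_{3;k} = U_bar^T K_{k+1}

# Right inverse of K_{k+1}: K^T (K K^T)^{-1}, so that K_next @ right_inv = I_m.
right_inv = K_next.T @ np.linalg.inv(K_next @ K_next.T)
U_bar_T_recovered = G3 @ right_inv
```

The inverse exists only because $ K_{k + 1} K_{k + 1}^{T} $ is nonsingular, which is where the full row rank property is needed.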

Remark 3:

In contrast to [2], we obtain the adaptive optimal control law for continuous-time systems affected by external disturbances.

Now, we are ready to propose the following adaptive optimal control algorithm for practical online implementation.

$$ u_{k} = - K_{{j^{*} }} .\varepsilon + \left( {G_{{3;j^{*} }} .K_{{j^{*} + 1}}^{T} .\left( {K_{{j^{*} + 1}} .K_{{j^{*} + 1}}^{T} } \right)^{ - 1} } \right)^{T} \bar{v} $$

Theorem 3:

Let $ K_{0} $ be any stabilizing feedback gain matrix, and let $ \left( {P_{k} ;K_{k + 1} ;\bar{U}} \right) $ be obtained from Algorithm 1. Then, under Assumption 7, the following properties hold:

(1) $ A - BK_{k} $ is Hurwitz.
(2) $ P^{*} \le P_{k + 1} \le P_{k} $.
(3) $ \lim\limits_{k \to \infty } K_{k} = K^{*} ; \lim\limits_{k \to \infty } P_{k} = P^{*} $.

Proof:

From (9) and (10), the pair $ \left( {P_{k} ;K_{k + 1} } \right) $ obtained from (2) and (3) must satisfy condition (12). In addition, by Assumption 7, the solution of (12) is unique. Therefore, the solution given by the theorem of Kleinman (1968) coincides with the solution of (13) for any $ k \ge N $.

3 Simulation Results

$$ \begin{aligned} & \left[ {\begin{array}{*{20}c} {\dot{x}} \\ {\ddot{x}} \\ {\dot{\phi }} \\ {\ddot{\phi }} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 & 1 & 0 & 0 \\ 0 & {\frac{{ - \left( {I + ml^{2} } \right)b}}{{I\left( {M + m} \right) + Mml^{2} }}} & {\frac{{m^{2} gl^{2} }}{{I\left( {M + m} \right) + Mml^{2} }}} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & {\frac{ - mlb}{{I\left( {M + m} \right) + Mml^{2} }}} & {\frac{{mgl\left( {M + m} \right)}}{{I\left( {M + m} \right) + Mml^{2} }}} & 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} x \\ {\dot{x}} \\ \phi \\ {\dot{\phi }} \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} 0 \\ {\frac{{I + ml^{2} }}{{I\left( {M + m} \right) + Mml^{2} }}} \\ 0 \\ {\frac{ml}{{I\left( {M + m} \right) + Mml^{2} }}} \\ \end{array} } \right]u \\ & y = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} 1 & 0 & 0 & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} 0 & 0 & 1 & 0 \\ \end{array} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} x \\ {\dot{x}} \\ \phi \\ {\dot{\phi }} \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} 0 \\ 0 \\ \end{array} } \right]u \\ \end{aligned} $$

(15)

In this section, we apply the proposed adaptive optimal control law to an inverted pendulum on a cart described by (15) and Table 1. The simulation results in Fig. 1 show that the matrices $ P $ and $ K $ produced by the proposed algorithm converge and that the tracking errors converge to zero.
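
As an illustration of how the matrices in (15) are assembled, the sketch below builds $ A $, $ B $ and $ C $ from numerical parameters and checks controllability (Assumption 1). The parameter values and the physical meaning given to each symbol in the comments are assumptions chosen for the example; they are not the values of Table 1.

```python
import numpy as np

# Illustrative values only (assumed; not the paper's Table 1).
M = 0.5    # cart mass [kg] (assumed meaning)
m = 0.2    # pendulum mass [kg] (assumed meaning)
b = 0.1    # cart friction coefficient [N s/m] (assumed meaning)
I = 0.006  # pendulum moment of inertia [kg m^2] (assumed meaning)
g = 9.8    # gravitational acceleration [m/s^2]
l = 0.3    # distance to the pendulum centre of mass [m] (assumed meaning)

p = I * (M + m) + M * m * l**2  # common denominator in (15)

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, -(I + m * l**2) * b / p, m**2 * g * l**2 / p, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, -m * l * b / p, m * g * l * (M + m) / p, 0.0],
])
B = np.array([[0.0], [(I + m * l**2) / p], [0.0], [m * l / p]])
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

# Controllability matrix [B, AB, A^2 B, A^3 B] for Assumption 1.
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(4)])
rank_ctrb = np.linalg.matrix_rank(ctrb)

# Largest real part among the open-loop poles.
max_real_pole = np.max(np.linalg.eigvals(A).real)
```

With these assumed values the pair $ \left( {A;B} \right) $ is controllable and the open-loop model has a pole in the right half-plane, so a stabilizing initial gain $ K_{0} $ has to be supplied before the iteration can start.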

Table 1. The parameters of the inverted pendulum

4 Conclusion

This paper presents an adaptive optimal control algorithm for practical online implementation on continuous-time systems with unknown system dynamics and external disturbances. The global asymptotic stability of the closed-loop system and the convergence of the algorithm are established. The theoretical analysis and simulation results illustrate the effectiveness of the proposed algorithm.

References

1. Jiang, Z.P., Jiang, Y.: Robust adaptive dynamic programming for linear and nonlinear systems: an overview. Eur. J. Control 19(5), 417–425 (2013)
2. Gao, W., Jiang, Z.P.: Adaptive optimal output regulation via output-feedback: an adaptive dynamic programing approach. In: IEEE 55th Conference on Decision and Control (CDC), Las Vegas, USA, pp. 5845–5850 (2016)
3. Hewer, G.: An iterative technique for the computation of the steady state gains for the discrete optimal regulator. IEEE Trans. Autom. Control 16(4), 382–384 (1971)
4. Lancaster, P., Rodman, L.: Algebraic Riccati Equations. Oxford University Press Inc., New York (1995)
5. Wang, D., Liu, D., Li, H., Luo, B., Ma, H.: An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties. IEEE Trans. Syst. Man Cybern.: Syst. 46(5), 713–717 (2016)
6. Fan, Q.Y., Yang, G.H.: Adaptive actor-critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 27(1), 165–177 (2016)
7. Jiang, Y., Jiang, Z.P.: Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 882–893 (2014)
8. Jiang, Y., Jiang, Z.P.: Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica 48(10), 2699–2704 (2012)
9. Gao, W., Jiang, Y., Jiang, Z.P., Chai, T.: Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming. Automatica 72, 37–45 (2016)

Author information

Authors and Affiliations

Hanoi University of Science and Technology, Hanoi, Vietnam
Dao Phuong Nam, Nguyen Van Huong, Ha Duc Minh & Nguyen Thanh Long

Corresponding author

Correspondence to Dao Phuong Nam.

Editor information

Editors and Affiliations

Faculty of Electrical - Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Vo Hoang Duy
International Cooperation, Research and Training Institute, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Tran Trong Dao
Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VŠB-TUO, Ostrava, Czech Republic
Ivan Zelinka
Department of Mechanical Design Engineering, Pukyong National University, Busan, Korea (Republic of)
Sang Bong Kim
Faculty of Electrical - Electronics Engineering, Ton Duc Thang University, Ho Chi Minh City, Vietnam
Tran Thanh Phuong

Cite this paper

Nam, D.P., Van Huong, N., Minh, H.D., Long, N.T. (2018). Dynamic Programming Based Adaptive Optimal Control for Inverted Pendulum. In: Duy, V., Dao, T., Zelinka, I., Kim, S., Phuong, T. (eds) AETA 2017 - Recent Advances in Electrical Engineering and Related Sciences: Theory and Application. AETA 2017. Lecture Notes in Electrical Engineering, vol 465. Springer, Cham. https://doi.org/10.1007/978-3-319-69814-4_44

DOI: https://doi.org/10.1007/978-3-319-69814-4_44
Published: 11 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69813-7
Online ISBN: 978-3-319-69814-4
eBook Packages: Engineering, Engineering (R0)
