Abstract
This work addresses the design of an adaptive optimal control law for a class of continuous-time systems with input disturbances and unknown parameters. The main objective is to find an adaptive optimal control law, based on the adaptive dynamic programming (ADP) method, that stabilizes the closed-loop system. In addition, the convergence properties of the proposed algorithm are established. Theoretical analysis and simulation results demonstrate the performance of the proposed algorithm on an inverted pendulum.
1 Introduction
To design optimal control laws for uncertain systems, the approximate/adaptive dynamic programming (ADP) approach offers a biologically inspired, non-model-based computational method that has been used in numerous studies [1], such as the reinforcement learning systems designed by Werbos and the neuro-dynamic programming of Bertsekas. As surveyed in [1], recent developments in ADP theory fall into three directions: operations research, real-time control of dynamical systems, and the application of ADP to nonlinear uncertain systems.
The non-model-based approach has been extensively applied to adaptive optimal stabilization and the corresponding tracking problems [5, 6]. In [5], the control law was obtained by transforming the robust control problem into an optimal control problem. The corresponding optimal control depends on a discrete-time HJB equation, which was solved using a neural network. The control law proposed in [5] ensures local asymptotic stability of the closed-loop uncertain nonlinear system under an inequality condition. In [6], Fan et al. proposed a sliding-mode controller based on adaptive optimal control theory for partially unknown nonlinear systems with input disturbances. The nearly optimal design ensures stability of the equivalent sliding-mode dynamics via a policy iteration algorithm [6]; a critic network is utilized to approximate the cost function and overcome the difficulty in the second step of policy iteration. The controller proposed in [6] guarantees uniform ultimate boundedness (UUB) of the closed-loop uncertain nonlinear system based on a bounded-signal property.
In [7], Jiang and Jiang developed a control design for continuous-time systems based on the equivalent HJB equation. Since the analytical solution of the HJB equation is difficult to obtain, [7] proposed an online policy iteration (PI) technique. The stability analysis of the closed-loop system established an input-to-state stability (ISS) property, and the estimate of the region of attraction depends on class-KL functions. In [8], Jiang proposed an adaptive optimal control law based on the algebraic Riccati equation for uncertain linear systems without external disturbances; the computational adaptive optimal control algorithm was developed from Kleinman's (1968) result. In [2, 9], a discretized model is used to propose PI- and VI-based output-feedback ADP control techniques. However, the results of these control laws depend strongly on the sampling time of the discretized systems.
Remarkably, this paper extends [2, 9], which focus on discrete-time settings, to continuous-time systems with external disturbances and to robust control of uncertain systems. We develop a new adaptive optimal control law for continuous-time systems with uncertainties and external disturbances within the ADP framework.
2 Adaptive Optimal Control Design
In this paper, we study a class of continuous-time systems described by:

$$ \begin{aligned} \dot{z} & = g\left( {z;e;v} \right) \\ \dot{x} & = Ax + B\left[ {u + \Delta \left( {z;e;v} \right)} \right] + Dv \\ \dot{v} & = Ev \\ e & = y - y_{d} = Cx + Fv \\ \end{aligned} $$(1)
where \( x \in {\mathbf{\mathbb{R}}}^{n} \) is the measured component of the state available for feedback control, \( u \in {\mathbf{\mathbb{R}}}^{m} \) is the input, \( y = Cx \in {\mathbf{\mathbb{R}}}^{r} \) represents the output of the plant, \( y_{d} = - Fv \in {\mathbf{\mathbb{R}}}^{r} \) is the reference signal to be tracked, \( e \in {\mathbf{\mathbb{R}}}^{r} \) is the tracking error, and \( z \in {\mathbf{\mathbb{R}}}^{p} ;v \in {\mathbf{\mathbb{R}}}^{q} \) are the states of the exosystem. The functions \( g:{\mathbf{\mathbb{R}}}^{p} \times {\mathbf{\mathbb{R}}}^{r} \times {\mathbf{\mathbb{R}}}^{q} \to {\mathbf{\mathbb{R}}}^{p} \) and \( \Delta :{\mathbf{\mathbb{R}}}^{p} \times {\mathbf{\mathbb{R}}}^{r} \times {\mathbf{\mathbb{R}}}^{q} \to {\mathbf{\mathbb{R}}}^{m} \) are locally Lipschitz and satisfy \( g\left( {0;0;0} \right) = 0 \) and \( \Delta \left( {0;0;0} \right) = 0 \). Suppose \( A \in {\mathbf{\mathbb{R}}}^{n \times n} ; \) \( B \in {\mathbf{\mathbb{R}}}^{n \times m} ; \) \( C \in {\mathbf{\mathbb{R}}}^{r \times n} ; \) \( D \in {\mathbf{\mathbb{R}}}^{n \times q} ; \) \( E \in {\mathbf{\mathbb{R}}}^{q \times q} ; \) \( F \in {\mathbf{\mathbb{R}}}^{r \times q} ; \) \( g \in {\mathbf{\mathbb{R}}}^{p} ; \) and \( \Delta \in {\mathbf{\mathbb{R}}}^{m} \) are unknown and that \( z;v;y;y_{d} \) are unmeasurable.
The control objective is to design an adaptive optimal control law, via an iterative algorithm, that ensures the tracking error converges to zero, and to establish the convergence properties of this iterative algorithm in the presence of uncertainties and external disturbances. As is standard in the adaptive optimal control literature, the following assumptions are made for solving the problem.
Assumption 1:
The pair \( \left( {A;B} \right) \) is controllable.
Assumption 2:
The transmission zeros condition holds, i.e.,

$$ rank\left[ {\begin{array}{*{20}c} {A - \lambda I} & B \\ C & 0 \\ \end{array} } \right] = n + r,\;\forall \lambda \in \sigma \left( E \right) $$
Assumption 3:
The minimal polynomial of \( E \) is available, which is:

$$ \pi_{E} \left( \lambda \right) = \prod\limits_{i = 1}^{M} {\left( {\lambda - \lambda_{i} } \right)^{{a_{i} }} } \prod\limits_{j = 1}^{N} {\left( {\left( {\lambda - \mu_{j} } \right)^{2} + \omega_{j}^{2} } \right)^{{b_{j} }} } $$
with degree \( q_{E} \le q \), where \( a_{i} ;b_{j} \) are positive integers and \( \lambda_{i} ;\mu_{j} ;\omega_{j} \in {\mathbf{\mathbb{R}}} \) for \( i = \overline{1;M} ;j = \overline{1;N} \).
Assumption 4:
There exist functions \( \beta_{z} ;\beta_{\Delta } \) of class \( KL \) and functions \( \gamma_{z} ;\gamma_{\Delta } \) of class \( K \), all independent of \( v \), such that:
Assumption 5:
There exist a continuously differentiable, positive definite, and radially unbounded function \( {\Pi} :{\mathbf{\mathbb{R}}}^{p} \to {\mathbf{\mathbb{R}}} \) and two constants \( c_{1} > 0;c_{2} > 0 \) such that:
Assumption 6:
There exists a known constant \( \xi > 0 \) such that the matrix \( C \) satisfies \( \left\| C \right\| \le \xi \).
Remark 1:
Unlike [2], this paper considers the class of systems in which \( z;v;y;y_{d} \) are unmeasurable while \( x,u,e \) are measurable.
We recall the classical result of Kleinman (1968) [4]: let \( K_{0} \) be any stabilizing feedback gain matrix, and repeat the following steps for \( k = 0;1; \ldots \)
- Step 1: Solve for the real symmetric positive definite solution \( P_{k} \) of the Lyapunov equation:

  $$ A_{k}^{T} P_{k} + P_{k} A_{k} + Q + K_{k}^{T} K_{k} = 0 $$(2)

  where \( A_{k} = A - BK_{k} \).
- Step 2: Update the feedback gain matrix by:

  $$ K_{k + 1} = B^{T} P_{k} $$(3)
Then, the following properties hold:

(1) \( A - BK_{k} \) is Hurwitz.

(2) \( P^{*} \le P_{k + 1} \le P_{k} \)

(3) \( \lim\limits_{k \to \infty } K_{k} = K^{*} ;\; \lim\limits_{k \to \infty } P_{k} = P^{*} \)
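Kleinman's iteration is straightforward to sketch numerically. The following is a minimal Python illustration using SciPy's Lyapunov and Riccati solvers, with \( R = I \) and a hypothetical double-integrator plant (not the paper's system):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman(A, B, Q, K0, iters=20):
    """Kleinman (1968) policy iteration for the ARE with R = I.

    Step 1: solve A_k^T P_k + P_k A_k + Q + K_k^T K_k = 0, A_k = A - B K_k.
    Step 2: K_{k+1} = B^T P_k.
    """
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # solve_continuous_lyapunov(M, Q_rhs) solves M X + X M^H = Q_rhs
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ K))
        K = B.T @ P
    return P, K

# Hypothetical example: double integrator with an initial stabilizing gain.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
K0 = np.array([[1.0, 1.0]])          # A - B K0 is Hurwitz
P, K = kleinman(A, B, Q, K0)
P_star = solve_continuous_are(A, B, Q, np.eye(1))   # direct ARE solution
```

Because each step solves a Lyapunov equation for a Hurwitz \( A_{k} \), the sequence \( P_{k} \) decreases monotonically toward \( P^{*} \), matching properties (2) and (3).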
We propose the adaptive optimal control law based on the following theorems.
Theorem 1:
We denote \( \varepsilon = x - Xv \). If the controller is designed as \( u = - K^{*} \varepsilon + Uv \), where \( X;U \) solve the following regulator equation:

$$ XE = AX + BU + D,\quad 0 = CX + F $$

and \( K^{*} = B^{T} P^{*} \), where the symmetric matrix \( P^{*} > 0 \) is the unique solution of the well-known algebraic Riccati equation:

$$ A^{T} P^{*} + P^{*} A + Q - P^{*} BB^{T} P^{*} = 0 $$(4)
and the weighting matrix \( Q \) in (4) satisfies \( \lambda_{\hbox{min} } \left( Q \right) > \gamma \xi^{2} > \frac{{c_{2} }}{{c_{1} }}\xi^{2} \), then the closed-loop system achieves disturbance rejection and asymptotic tracking.
Proof:
By assumption 2, the regulator equation is solvable for any matrices \( D;F \).
Define \( V_{1} = \varepsilon^{T} P^{*} \varepsilon \); we have:
By using assumption 6, we have:
Then
Setting \( V = V_{1} + \frac{1}{{c_{1} }}{\Pi} \left( z \right) \), by using assumption 5, we have:
It is clear that a direct application of LaSalle's invariance principle yields the global asymptotic stability (GAS) of the closed-loop system.
Theorem 2:
There exists a small constant \( \alpha > 0 \) such that, for every symmetric matrix \( P > 0 \) satisfying \( \left| {P - P^{*} } \right| < \alpha \), the overall system is GAS under the controller \( u = - B^{T} P\varepsilon + Uv \).
Proof:
From (4), for any symmetric matrix \( P > 0 \) we can write: \( A^{T} P + PA + \widehat{Q} - PBB^{T} P = 0 \)
where:
Since (7), we have \( Q > \gamma C^{T} C \), so there exists a constant \( \mu > 0 \) such that \( Q - \gamma C^{T} C > \mu I \). Then, by continuity, there exists \( \alpha > 0 \) such that, for every symmetric matrix \( P > 0 \) satisfying \( \left| {P - P^{*} } \right| < \alpha \), we have \( \widehat{Q} > Q - \mu I \), which implies \( \widehat{Q} > \gamma C^{T} C \). Therefore, by Theorem 1, the control \( u = - B^{T} P\varepsilon + Uv \) globally asymptotically stabilizes the system.
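The continuity argument above can be illustrated numerically: on a hypothetical double-integrator plant (not the paper's system), small symmetric perturbations of \( P^{*} \) still yield Hurwitz closed-loop matrices \( A - BB^{T}P \):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical plant (illustrative only): double integrator, Q = I, R = I.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
P_star = solve_continuous_are(A, B, np.eye(2), np.eye(1))

rng = np.random.default_rng(1)
stable = []
for _ in range(100):
    # Small symmetric perturbation, so |P - P*| stays small
    W = 0.05 * rng.standard_normal((2, 2))
    P = P_star + (W + W.T) / 2
    K = B.T @ P                      # gain of u = -B^T P eps + U v
    eigs = np.linalg.eigvals(A - B @ K)
    stable.append(np.all(eigs.real < 0))
```

For perturbations this small, every sampled gain remains stabilizing, consistent with Theorem 2.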
Remark 2:
The proposed optimal control law guarantees the GAS property of the closed-loop system in the presence of uncertain parameters and external disturbances.
Using assumption 3, we can always find a vector \( \bar{v}\left( t \right) \in {\mathbf{\mathbb{R}}}^{{q_{E} }} \) and a matrix \( \bar{E} \in {\mathbf{\mathbb{R}}}^{{q_{E} \times q_{E} }} \) such that:

$$ v = G\bar{v}\left( t \right),\quad \dot{\bar{v}} = \bar{E}\bar{v} $$(8)

where \( G \in {\mathbf{\mathbb{R}}}^{{q \times q_{E} }} \) is an unknown constant matrix.
From (5), (8), and Theorem 1, the linear optimal output regulation problem (LOORP) is solved if we design the controller \( u = - K^{*} \left( {x - XG\bar{v}\left( t \right)} \right) + UG\bar{v}\left( t \right) \).
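The regulator equation can be solved numerically by vectorization, using \( vec\left( {AXB} \right) = \left( {B^{T} \otimes A} \right)vec\left( X \right) \). The sketch below uses hypothetical matrices (a double integrator tracking a sinusoidal exosystem), not the paper's system:

```python
import numpy as np

def solve_regulator(A, B, C, D, E, F):
    """Solve X E = A X + B U + D and 0 = C X + F for (X, U)
    via column-major vectorization of the unknowns [vec(X); vec(U)]."""
    n, m = B.shape
    r, q = C.shape[0], E.shape[0]
    In, Iq = np.eye(n), np.eye(q)
    # vec(XE) = (E^T kron I_n) vec(X); vec(AX) = (I_q kron A) vec(X); etc.
    top = np.hstack([np.kron(E.T, In) - np.kron(Iq, A), -np.kron(Iq, B)])
    bot = np.hstack([np.kron(Iq, C), np.zeros((r * q, m * q))])
    M = np.vstack([top, bot])
    rhs = np.concatenate([D.flatten(order="F"), -F.flatten(order="F")])
    sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    X = sol[: n * q].reshape(n, q, order="F")
    U = sol[n * q:].reshape(m, q, order="F")
    return X, U

# Hypothetical data: double integrator, sinusoidal exosystem.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
E = np.array([[0.0, 1.0], [-1.0, 0.0]])
D = np.zeros((2, 2))
F = np.array([[1.0, 0.0]])
X, U = solve_regulator(A, B, C, D, E, F)
```

Under the transmission zeros condition (assumption 2) the stacked linear system is nonsingular, so \( (X;U) \) is unique.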
PI-based output ADP design:
Suppose \( \Delta \) is available during the learning phase.
Define \( \bar{P}_{k} = C^{T} P_{k} C \). From (6), we have:
Since (5); (8); (11), we have:
Then
Applying the Kronecker product representation gives:
Define: \( \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X} = G_{1;k} ; \)
Consequently, we have:
Assumption 7:
There exists an integer \( N \) such that, for each \( k \ge N \), the following rank condition holds:
By assumption 7,
\( \left[ {vec\left( {\bar{P}_{k} } \right),vec\left( {K_{k + 1} } \right),vec\left( {K_{k + 1} \bar{X}} \right),vec\left( {G_{1;k} } \right),vec\left( {G_{2;k} } \right),vec\left( {G_{3;k} } \right)} \right]^{T} \) can be uniquely determined by:
Assumption 2 implies that \( B \) has full column rank, so \( K_{k + 1} = B^{T} P_{k} \) has full row rank; then:
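The Kronecker-product vectorization and the unique least-squares determination under a full-rank data matrix (the analogue of assumption 7) can be illustrated generically, with hypothetical matrices rather than the paper's data matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
X = rng.standard_normal((3, 4))
vec = lambda M: M.flatten(order="F")   # column-major vectorization

# Identity: vec(A X B) = (B^T kron A) vec(X)
lhs = vec(A @ X @ B)
M = np.kron(B.T, A)

# When M has full column rank (the rank condition), vec(X) is uniquely
# recovered from the data by least squares.
full_rank = np.linalg.matrix_rank(M) == M.shape[1]
x_ls, *_ = np.linalg.lstsq(M, lhs, rcond=None)
X_rec = x_ls.reshape(3, 4, order="F")
```

This is the mechanism by which the unknowns \( \bar{P}_{k} ;K_{k + 1} \) and the \( G \)-terms are extracted from measured data in the PI-based output ADP step.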
Remark 3:
Unlike [2], we obtain an adaptive optimal control law for continuous-time systems affected by external disturbances.
Now, we are ready to propose the following adaptive optimal control algorithm for practical online implementation.
Theorem 3:
Let \( K_{0} \) be any stabilizing feedback gain matrix, and let \( \left( {P_{k} ;K_{k + 1} ;\bar{U}} \right) \) be obtained from Algorithm 1. Then, under assumption 7, the following properties hold:
(1) \( A - BK_{k} \) is Hurwitz.

(2) \( P^{*} \le P_{k + 1} \le P_{k} \)

(3) \( \lim\limits_{k \to \infty } K_{k} = K^{*} ;\; \lim\limits_{k \to \infty } P_{k} = P^{*} \)
Proof:
From (9) and (10), one sees that the pair \( \left( {P_{k} ;K_{k + 1} } \right) \) obtained from (2) and (3) must satisfy condition (12). In addition, by assumption 7, it is unique. Therefore, the solution of Kleinman's (1968) iteration coincides with the solution of (13) for any \( k \ge N \).
3 Simulation Results
In this section, we apply the proposed adaptive optimal control law to an inverted pendulum on a cart, described by (15) and Table 1. The simulation results in Fig. 1 show the convergence of the matrices \( P \) and \( K \) under the proposed algorithm, and the tracking errors converge to zero.
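Since model (15) and the Table 1 parameters are not reproduced here, the sketch below uses a standard frictionless cart-pole linearization about the upright equilibrium with assumed values (cart mass 0.5 kg, pole mass 0.2 kg, pole length 0.3 m) to illustrate the nominal design step \( K = B^{T} P \):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Assumed parameters (illustrative only; the paper's Table 1 values may differ).
M_c, m_p, l, g = 0.5, 0.2, 0.3, 9.81

# Standard frictionless linearization about the upright equilibrium,
# state = [cart position, cart velocity, pole angle, pole angular rate]:
#   x'' = (u - m g theta) / M,  theta'' = ((M + m) g theta - u) / (M l)
A = np.array([
    [0.0, 1.0, 0.0,                         0.0],
    [0.0, 0.0, -m_p * g / M_c,              0.0],
    [0.0, 0.0, 0.0,                         1.0],
    [0.0, 0.0, (M_c + m_p) * g / (M_c * l), 0.0],
])
B = np.array([[0.0], [1.0 / M_c], [0.0], [-1.0 / (M_c * l)]])

Q = np.eye(4)
P = solve_continuous_are(A, B, Q, np.eye(1))
K = B.T @ P                       # nominal optimal gain, as in (3)
eigs = np.linalg.eigvals(A - B @ K)
```

With these assumed parameters the closed-loop matrix \( A - BK \) is Hurwitz, so the linearized pendulum is stabilized, mirroring the convergence behavior reported in Fig. 1.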
4 Conclusion
This paper presented an adaptive optimal control algorithm, suitable for practical online implementation, for continuous-time systems with unknown dynamics and external disturbances. The global asymptotic stability of the closed-loop system and the convergence properties of the algorithm were established. Theoretical analysis and simulation results illustrate the effectiveness of the proposed algorithm.
References
Jiang, Z.P., Jiang, Y.: Robust adaptive dynamic programming for linear and nonlinear systems: an overview. Eur. J. Control 19(5), 417–425 (2013)
Gao, W., Jiang, Z.P.: Adaptive optimal output regulation via output-feedback: an adaptive dynamic programing approach. In: IEEE 55th Conference on Decision and Control (CDC), Las Vegas, USA, pp. 5845–5850 (2016)
Hewer, G.: An iterative technique for the computation of the steady state gains for the discrete optimal regulator. IEEE Trans. Autom. Control 16(4), 382–384 (1971)
Lancaster, P., Rodman, L.: Algebraic Riccati Equations. Oxford University Press Inc., New York (1995)
Wang, D., Liu, D., Li, H., Luo, B., Ma, H.: An approximate optimal control approach for robust stabilization of a class of discrete-time nonlinear systems with uncertainties. IEEE Trans. Syst. Man Cybern.: Syst. 46(5), 713–717 (2016)
Fan, Q.Y., Yang, G.H.: Adaptive actor-critic design-based integral sliding-mode control for partially unknown nonlinear systems with input disturbances. IEEE Trans. Neural Netw. Learn. Syst. 27(1), 165–177 (2016)
Jiang, Y., Jiang, Z.P.: Robust adaptive dynamic programming and feedback stabilization of nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 882–893 (2014)
Jiang, Y., Jiang, Z.P.: Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics. Automatica 48(10), 2699–2704 (2012)
Gao, W., Jiang, Y., Jiang, Z.P., Chai, T.: Output-feedback adaptive optimal control of interconnected systems based on robust adaptive dynamic programming. Automatica 72, 37–45 (2016)
Nam, D.P., Van Huong, N., Minh, H.D., Long, N.T. (2018). Dynamic Programming Based Adaptive Optimal Control for Inverted Pendulum. In: Duy, V., Dao, T., Zelinka, I., Kim, S., Phuong, T. (eds) AETA 2017 - Recent Advances in Electrical Engineering and Related Sciences: Theory and Application. AETA 2017. Lecture Notes in Electrical Engineering, vol 465. Springer, Cham. https://doi.org/10.1007/978-3-319-69814-4_44