Abstract
A novel self-learning optimal control method for a class of discrete-time nonlinear systems is proposed, based on an iteration adaptive dynamic programming (ADP) algorithm. It is proven that the sequence of iteration costate functions converges to the optimal costate function, and a detailed convergence analysis of the iteration ADP algorithm is given. Furthermore, an echo state network (ESN) architecture is used as the approximator of the costate function at each iteration. To ensure the reliability of the ESN approximator, the ESN mean square training error is constrained to a satisfactory range. Two simulation examples are given to demonstrate that the proposed control method has a fast response speed, owing to its special structure and fast training process.
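The convergence claim can be illustrated with a minimal sketch on a special case. This is not the paper's algorithm: it assumes a scalar linear system x_{k+1} = a*x_k + b*u_k with quadratic cost q*x^2 + r*u^2, where the costate is linear in the state (lambda_i(x) = c_i * x) and no ESN approximator is needed. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import math


def costate_iteration(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Iterate the costate slope c_i, where lambda_i(x) = c_i * x.

    Assumed setting (not from the paper): scalar linear dynamics
    x_next = a*x + b*u and stage cost q*x**2 + r*u**2.
    """
    c = 0.0  # start from lambda_0(x) = 0
    history = [c]
    for _ in range(max_iter):
        p = c / 2.0  # value function J_i(x) = p * x**2
        # Greedy control for J_i: u = gain * x
        gain = -a * b * p / (r + b * b * p)
        # Costate update: lambda_{i+1}(x) = 2*q*x + a * lambda_i(a*x + b*u)
        c_next = 2.0 * q + a * c * (a + b * gain)
        history.append(c_next)
        if abs(c_next - c) < tol:
            break
        c = c_next
    return history


def optimal_costate_slope(a, b, q, r):
    """Closed-form optimal slope 2*p, with p the positive root of
    b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0 (scalar Riccati equation)."""
    lin = r - q * b * b - a * a * r
    p = (-lin + math.sqrt(lin * lin + 4.0 * b * b * q * r)) / (2.0 * b * b)
    return 2.0 * p


if __name__ == "__main__":
    hist = costate_iteration(a=1.0, b=1.0, q=1.0, r=1.0)
    print("iterations:", len(hist) - 1)
    print("final slope:", hist[-1])
    print("optimal slope:", optimal_costate_slope(1.0, 1.0, 1.0, 1.0))
```

In this special case the slopes increase monotonically from zero toward the fixed point of the scalar Riccati equation. For the nonlinear systems the abstract considers, each iterate would be a function that has to be approximated, which is the role the abstract assigns to the ESN.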
Cite this article
Song, R., Xiao, W. & Sun, C. A new self-learning optimal control laws for a class of discrete-time nonlinear systems based on ESN architecture. Sci. China Inf. Sci. 57, 1–10 (2014). https://doi.org/10.1007/s11432-013-4954-y
DOI: https://doi.org/10.1007/s11432-013-4954-y