Abstract
In this paper, a novel Q-learning based policy iteration adaptive dynamic programming (ADP) algorithm is developed to solve the optimal control problems for discrete-time nonlinear systems. The idea is to use a policy iteration ADP technique to construct the iterative control law which stabilizes the system and simultaneously minimizes the iterative Q function. Convergence property is analyzed to show that the iterative Q function is monotonically non-increasing and converges to the solution of the optimality equation. Finally, simulation results are presented to show the performance of the developed algorithm.
This work was supported in part by the National Natural Science Foundation of China under Grants 61273140, 61304086, 61374105, and 61233001, and in part by Beijing Natural Science Foundation under Grant 4132078.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
Keywords
References
Abu-Khalaf, M., Lewis, F.L.: Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5), 779–791 (2005)
Al-Tamimi, A., Lewis, F.L., Abu-Khalaf, M.: Discrete-time nonlinear HJB solution using approximate dynamic programming: convergence proof. IEEE Transactions on Systems, Man, and Cybernetics–Part B: Cybernetics 38(4), 943–949 (2008)
Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Athena Scientific, Belmont (1996)
Jiang, Y., Jiang, Z.P.: Robust adaptive dynamic programming with an application to power systems. IEEE Transactions on Neural Networks and Learning Systems 24(7), 1150–1156 (2013)
Liu, D., Wei, Q.: Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems 25(3), 621–634 (2014)
Modares, H., Lewis, F.L.: Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning. IEEE Transactions on Automatic Control 59(11), 3051–3056 (2014)
Modares, H., Lewis, F.L., Naghibi-Sistani, M.B.: Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Transactions on Neural Networks and Learning systems 24(10), 1513–1525 (2013)
Modares, H., Lewis, F.L.: Optimal tracking control of nonlinear partially-unknown constrained-input systems using integral reinforcement learning. Automatica 50(7), 1780–1792 (2014)
Kiumarsi, B., Lewis, F.L., Modares, H., Karimpur, A., Naghibi-Sistani, M.B.: Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics. Automatica 50(4), 1167–1175 (2014)
Prokhorov, D.V., Wunsch, D.C.: Adaptive critic designs. IEEE Transactions on Neural Networks 8(5), 997–1007 (1997)
Song, R., Xiao, W., Zhang, H., Sun, C.: Adaptive dynamic programming for a class of complex-valued nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems 25(9), 1733–1739 (2014)
Song, R., Lewis, F.L., Wei, Q., Zhang, H., Jiang, Z.-P., Levine, D.: Multiple actor-critic structures for continuous-time optimal control using input-output data. IEEE Transactions on Neural Networks and Learning Systems 26(4), 851–865 (2015)
Si, J., Wang, Y.-T.: On-line learning control by association and reinforcement. IEEE Transactions on Neural Networks 12(2), 264–276 (2001)
Watkins, C.: Learning from Delayed Rewards. Ph.D. Thesis, Cambridge University, Cambridge (1989)
Watkins, C., Dayan, P.: Q-learning. Machine Learning 8(3-4), 279–292 (1992)
Wei, Q., Liu, D.: An iterative ε-optimal control scheme for a class of discrete-time nonlinear systems with unfixed initial state. Neural Networks 32, 236–244 (2012)
Wei, Q., Zhang, H., Dai, J.: Model-free multiobjective approximate dynamic programming for discrete-time nonlinear systems with general performance index functions. Neurocomputing 72(7–9), 1839–1848 (2009)
Wei, Q., Liu, D.: A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems. IEEE Transactions on Automation Science and Engineering 11(4), 1176–1190 (2014)
Wei, Q., Liu, D.: Data-driven neuro-optimal temperature control of water gas shift reaction using stable iterative adaptive dynamic programming. IEEE Transactions on Industrial Electronics 61(11), 6399–6408 (2014)
Wei, Q., Liu, D.: Neural-network-based adaptive optimal tracking control scheme for discrete-time nonlinear systems with approximation errors. Neurocomputing 149(3), 106–115 (2015)
Wei, Q., Liu, D., Shi, G., Liu, Y.: Optimal multi-battery coordination control for home energy management systems via distributed iterative adaptive dynamic programming. IEEE Transactions on Industrial Electronics (2015) (article in press)
Wei, Q., Liu, D., Shi, G.: A novel dual iterative Q-learning method for optimal battery management in smart residential environments. IEEE Transactions on Industrial Electronics 62(4), 2509–2518 (2015)
Wei, Q., Liu, D.: Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification. IEEE Transactions on Automation Science and Engineering 11(4), 1020–1036 (2014)
Wei, Q., Liu, D., Yang, X.: Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems. IEEE Transactions on Neural Networks and Learning Systems 26(4), 866–879 (2015)
Werbos, P.J.: Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook 22, 25–38 (1977)
Werbos, P.J.: A menu of designs for reinforcement learning over time. In: Miller, W.T., Sutton, R.S., Werbos, P.J. (eds.) Neural Networks for Control, pp. 67–95. MIT Press, Cambridge (1991)
Xu, X., Lian, C., Zuo, L., He, H.: Kernel-based approximate dynamic programming for real-time online learning control: An experimental study. IEEE Transactions on Control Systems Technology 22(1), 146–156 (2014)
Zhang, H., Wei, Q., Luo, Y.: A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm. IEEE Transactions on System, Man, and cybernetics–Part B: Cybernetics 38(4), 937–942 (2008)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wei, Q., Liu, D. (2015). A New Discrete-Time Iterative Adaptive Dynamic Programming Algorithm Based on Q-Learning. In: Hu, X., Xia, Y., Zhang, Y., Zhao, D. (eds) Advances in Neural Networks – ISNN 2015. ISNN 2015. Lecture Notes in Computer Science(), vol 9377. Springer, Cham. https://doi.org/10.1007/978-3-319-25393-0_6
Download citation
DOI: https://doi.org/10.1007/978-3-319-25393-0_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25392-3
Online ISBN: 978-3-319-25393-0
eBook Packages: Computer ScienceComputer Science (R0)