Abstract
In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for unknown discrete-time (DT) linear quadratic regulation (LQR) problems. Q-learning is an effective scheme for unknown dynamical systems because it solves optimal control problems without requiring any knowledge of the system dynamics. By applying the MGPI concept, an extension of basic GPI with multirate time-horizon steps, we derive a new Q-learning algorithm for the LQR problem. Furthermore, the proposed algorithm is proven to converge to the optimal solution; that is, it iteratively learns the optimal control policy using only state and control-input information. Finally, we employ a two-degree-of-freedom (2-DOF) helicopter model to verify the effectiveness of the proposed method and to investigate its convergence properties.
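For concreteness, the following is a minimal, self-contained Python sketch of the classical single-rate Q-learning policy iteration for DT LQR on which MGPI builds; it is not the paper's MGPI algorithm itself, which would additionally vary the number of policy-evaluation steps per iteration. The learner fits the quadratic Q-function matrix H from state and input data by least squares and improves the policy from H alone, so no knowledge of the dynamics (A, B) enters the learning rule; the plant matrices, noise level, and iteration counts below are illustrative assumptions, not the paper's helicopter setup.

import numpy as np

# Plant (used only to SIMULATE data; the learner never reads A or B).
n, m = 2, 1                                  # state / input dimensions
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                   # illustrative stable plant
B = np.array([[0.0],
              [0.1]])
Qc, R = np.eye(n), np.eye(m)                 # stage-cost weights

def phi(x, u):
    # Quadratic features: vec(z z^T) with z = [x; u], so Q(x,u) = h^T phi.
    z = np.concatenate([x, u])
    return np.outer(z, z).ravel()

K = np.zeros((m, n))                         # initial stabilizing policy u = -K x
rng = np.random.default_rng(0)

for it in range(15):                         # policy-iteration loop
    # Policy evaluation: fit H in Q(x,u) = [x;u]^T H [x;u] by least squares
    # on the Bellman identity
    #   Q(x,u) = x^T Qc x + u^T R u + Q(x', -K x'),  x' = A x + B u.
    Phi, y = [], []
    x = rng.standard_normal(n)
    for k in range(200):
        u = -K @ x + 0.1 * rng.standard_normal(m)      # exploration noise
        xn = A @ x + B @ u
        Phi.append(phi(x, u) - phi(xn, -K @ xn))
        y.append(x @ Qc @ x + u @ R @ u)
        x = rng.standard_normal(n) if (k + 1) % 20 == 0 else xn  # re-excite
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = 0.5 * (h.reshape(n + m, n + m) + h.reshape(n + m, n + m).T)
    # Policy improvement: u = argmin_u Q(x,u) = -(H_uu)^{-1} H_ux x.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned feedback gain K =", K)

Because the improvement step uses only the learned H, the scheme is model-free; with persistently exciting data each sweep reproduces exact policy iteration, whose convergence to the optimal LQR gain is classical.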
Additional information
Recommended by Associate Editor Do Wan Kim under the direction of Editor Euntai Kim.
Tae Yoon Chun received his B.S., M.S., and Ph.D. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2010, 2012, and 2017, respectively. His major research interests include approximate dynamic programming/reinforcement learning, optimal/adaptive control, synchrophasors, and power systems.
Jin Bae Park received the B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 1977, and the M.S. and Ph.D. degrees in electrical engineering from Kansas State University, Manhattan, KS, USA, in 1985 and 1990, respectively. He has been with the Department of Electrical and Electronic Engineering, Yonsei University, since 1992, where he is currently a Professor. His current research interests include robust control and filtering, nonlinear control, drones, intelligent mobile robots, fuzzy logic control, neural networks, adaptive dynamic programming, and genetic algorithms. Dr. Park served as the Editor-in-Chief of the International Journal of Control, Automation, and Systems from 2006 to 2010, and as the President of the Institute of Control, Robotics and Systems (ICROS) in 2013.
Yoon Ho Choi received his B.S., M.S., and Ph.D. degrees in electrical engineering from Yonsei University, Seoul, Korea, in 1980, 1982, and 1991, respectively. He was with the Department of Electrical Engineering, Ohio State University, Columbus, OH, USA, as a Visiting Scholar from 2000 to 2002 and from 2009 to 2010. He has been with the Department of Electronic Engineering, Kyonggi University, Suwon, Korea, since 1993, where he is currently a Professor. His current research interests include nonlinear control, intelligent control, multilegged and mobile robots, networked control systems, and ADP-based control. He was the Director of the Institute of Control, Robotics and Systems (ICROS) from 2003 to 2004 and from 2007 to 2008, and served as its Vice President from 2012 to 2015.
Cite this article
Chun, T.Y., Park, J.B. & Choi, Y.H. Reinforcement Q-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-DOF Helicopter. Int. J. Control Autom. Syst. 16, 377–386 (2018). https://doi.org/10.1007/s12555-017-0172-5