Abstract
In this paper, we propose a novel Q-learning method based on multirate generalized policy iteration (MGPI) for unknown discrete-time (DT) linear quadratic regulation (LQR) problems. Q-learning is an effective scheme for unknown dynamical systems because it solves optimal control problems without requiring any knowledge of the system dynamics. By applying the MGPI concept, an extension of basic GPI with multirate time-horizon steps, we derive a new Q-learning algorithm for the LQR problem. Furthermore, the proposed algorithm is proven to converge to the optimal solution; that is, it iteratively learns the optimal control policy using only state and control-input information. Finally, we employ a two-degree-of-freedom (2-DOF) helicopter model to verify the effectiveness of the proposed method and to investigate its convergence properties.
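For concreteness, the following is a minimal, self-contained Python sketch of the classical single-rate Q-learning policy iteration for DT LQR on which MGPI builds; it is not the paper's MGPI algorithm itself, which would additionally vary the number of policy-evaluation steps per iteration. The learner fits the quadratic Q-function matrix H from state and input data by least squares and improves the policy from H alone, so no knowledge of the dynamics (A, B) enters the learning rule; the plant matrices, noise level, and iteration counts below are illustrative assumptions, not the paper's helicopter setup.

import numpy as np

# Plant (used only to SIMULATE data; the learner never reads A or B).
n, m = 2, 1                                  # state / input dimensions
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])                   # illustrative stable plant
B = np.array([[0.0],
              [0.1]])
Qc, R = np.eye(n), np.eye(m)                 # stage-cost weights

def phi(x, u):
    # Quadratic features: vec(z z^T) with z = [x; u], so Q(x,u) = h^T phi.
    z = np.concatenate([x, u])
    return np.outer(z, z).ravel()

K = np.zeros((m, n))                         # initial stabilizing policy u = -K x
rng = np.random.default_rng(0)

for it in range(15):                         # policy-iteration loop
    # Policy evaluation: fit H in Q(x,u) = [x;u]^T H [x;u] by least squares
    # on the Bellman identity
    #   Q(x,u) = x^T Qc x + u^T R u + Q(x', -K x'),  x' = A x + B u.
    Phi, y = [], []
    x = rng.standard_normal(n)
    for k in range(200):
        u = -K @ x + 0.1 * rng.standard_normal(m)      # exploration noise
        xn = A @ x + B @ u
        Phi.append(phi(x, u) - phi(xn, -K @ xn))
        y.append(x @ Qc @ x + u @ R @ u)
        x = rng.standard_normal(n) if (k + 1) % 20 == 0 else xn  # re-excite
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = 0.5 * (h.reshape(n + m, n + m) + h.reshape(n + m, n + m).T)
    # Policy improvement: u = argmin_u Q(x,u) = -(H_uu)^{-1} H_ux x.
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

print("learned feedback gain K =", K)

Because the improvement step uses only the learned H, the scheme is model-free; with persistently exciting data each sweep reproduces exact policy iteration, whose convergence to the optimal LQR gain is classical.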
Additional information
Recommended by Associate Editor Do Wan Kim under the direction of Editor Euntai Kim.
Tae Yoon Chun received his B.S., M.S., and Ph.D. degrees in Electrical and Electronic Engineering from Yonsei University, Seoul, Korea, in 2010, 2012, and 2017, respectively. His major research interests include approximate dynamic programming/reinforcement learning, optimal/adaptive control, synchrophasors, and power systems.
Jin Bae Park received the B.S. degree in electrical engineering from Yonsei University, Seoul, Korea, in 1977, and the M.S. and Ph.D. degrees in electrical engineering from Kansas State University, Manhattan, KS, USA, in 1985 and 1990, respectively. He has been with the Department of Electrical and Electronic Engineering, Yonsei University, since 1992, where he is currently a Professor. His current research interests include robust control and filtering, nonlinear control, drones, intelligent mobile robots, fuzzy logic control, neural networks, adaptive dynamic programming, and genetic algorithms. Dr. Park served as the Editor-in-Chief of the International Journal of Control, Automation, and Systems from 2006 to 2010, and as the President of the Institute of Control, Robotics and Systems (ICROS) in 2013.
Yoon Ho Choi received his B.S., M.S., and Ph.D. degrees in electrical engineering from Yonsei University, Seoul, Korea, in 1980, 1982, and 1991, respectively. He was with the Department of Electrical Engineering, Ohio State University, Columbus, OH, USA, as a Visiting Scholar from 2000 to 2002 and from 2009 to 2010. He has been with the Department of Electronic Engineering, Kyonggi University, Suwon, Korea, since 1993, where he is currently a Professor. His current research interests include nonlinear control, intelligent control, multilegged and mobile robots, networked control systems, and ADP-based control. He was the Director of the Institute of Control, Robotics and Systems (ICROS) from 2003 to 2004 and from 2007 to 2008, and served as its Vice President from 2012 to 2015.
Cite this article
Chun, T.Y., Park, J.B. & Choi, Y.H. Reinforcement Q-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-DOF Helicopter. Int. J. Control Autom. Syst. 16, 377–386 (2018). https://doi.org/10.1007/s12555-017-0172-5