Abstract
This paper studies the fully cooperative game tracking control problem (FCGTCP) for a class of discrete-time multi-player linear systems with unknown dynamics. The reference trajectory is generated by a command generator system. An augmented multi-player system, composed of the original multi-player system and the command generator system, is constructed, and an exponentially discounted cost function is introduced to derive an augmented fully cooperative game tracking algebraic Riccati equation (FCGTARE). When the system dynamics are known, a model-based policy iteration (PI) algorithm is proposed to solve the augmented FCGTARE. Furthermore, to relax the requirement of known system dynamics, an online reinforcement Q-learning algorithm is designed to obtain the solution to the augmented FCGTARE. The convergence of the designed online reinforcement Q-learning algorithm is proved. Finally, two simulation examples are given to verify the validity of the model-based PI algorithm and the online reinforcement Q-learning algorithm.
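Although the paper's algorithms target the multi-player game setting, the model-based PI step described in the abstract can be illustrated on its single-controller analogue: in a fully cooperative game all players minimize one common cost, so their inputs can be stacked into a single input vector. The sketch below is a minimal illustration under that assumption, with hypothetical matrices A, B, weights Q, R, and discount factor gamma (none taken from the paper); it alternates policy evaluation (a discounted discrete Lyapunov equation) with policy improvement, which is the standard structure of model-based PI for discounted discrete-time Riccati equations.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_iteration(A, B, Q, R, gamma, K0=None, iters=100, tol=1e-10):
    """Model-based PI for the discounted discrete-time ARE (illustrative sketch).

    Cost: sum_k gamma^k (x_k' Q x_k + u_k' R u_k), policy u_k = K x_k.
    K0 (zero by default) must make sqrt(gamma) * (A + B K0) Schur stable.
    """
    n, m = B.shape
    K = np.zeros((m, n)) if K0 is None else K0
    for _ in range(iters):
        Ac = A + B @ K
        # Policy evaluation: solve P = gamma * Ac' P Ac + Q + K' R K,
        # a discrete Lyapunov equation in sqrt(gamma) * Ac.
        P = solve_discrete_lyapunov(np.sqrt(gamma) * Ac.T, Q + K.T @ R @ K)
        # Policy improvement: minimize x'Qx + u'Ru + gamma*(Ax+Bu)'P(Ax+Bu) over u.
        K_new = -np.linalg.solve(R + gamma * B.T @ P @ B, gamma * B.T @ P @ A)
        if np.linalg.norm(K_new - K) < tol:
            K = K_new
            break
        K = K_new
    # Re-evaluate the final policy so P and K are consistent.
    Ac = A + B @ K
    P = solve_discrete_lyapunov(np.sqrt(gamma) * Ac.T, Q + K.T @ R @ K)
    return P, K

# Hypothetical two-state, one-input example (A is stable, so K0 = 0 suffices).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
P, K = policy_iteration(A, B, np.eye(2), np.eye(1), gamma=0.8)
```

The paper's Q-learning algorithm estimates the same improvement step from measured data instead of the model matrices; the sketch above is only the model-based baseline it is compared against.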
Ethics declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
This work was supported in part by the Shandong Provincial Natural Science Foundation under Grant ZR2022QF096, and in part by the Weifang University Doctoral Research Startup Fund under Grant 2021BS26. The author would like to thank his alma mater, Beijing Institute of Technology.
Jin-Gang Zhao received his B.E. degree in automation from Qingdao University of Technology, Qingdao, China, in 2013, an M.Sc. degree in pattern recognition and intelligent systems from Beijing Information Science and Technology University, Beijing, China, in 2016, and a Ph.D. degree in control science and engineering from Beijing Institute of Technology, Beijing, China, in 2020. From 2018 to 2019, he was a Visiting Scholar with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA. He is currently a lecturer with the School of Machinery and Automation, Weifang University. His research interests include optimal control, reinforcement learning, adaptive dynamic programming, and hybrid systems.
Cite this article
Zhao, JG. Reinforcement Q-learning and Optimal Tracking Control of Unknown Discrete-time Multi-player Systems Based on Game Theory. Int. J. Control Autom. Syst. 22, 1751–1759 (2024). https://doi.org/10.1007/s12555-022-1133-1