Abstract
This paper studies the fully cooperative game tracking control problem (FCGTCP) for a class of discrete-time multi-player linear systems with unknown dynamics. The reference trajectory is generated by a command generator system. An augmented multi-player system, composed of the original multi-player system and the command generator system, is constructed, and an exponentially discounted cost function is introduced to derive an augmented fully cooperative game tracking algebraic Riccati equation (FCGTARE). When the system dynamics are known, a model-based policy iteration (PI) algorithm is proposed to solve the augmented FCGTARE. Furthermore, to relax the requirement of known system dynamics, an online reinforcement Q-learning algorithm is designed to obtain the solution to the augmented FCGTARE. The convergence of the designed online reinforcement Q-learning algorithm is proved. Finally, two simulation examples are given to verify the validity of the model-based PI algorithm and the online reinforcement Q-learning algorithm.
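Although the paper's algorithms target the multi-player game setting, the model-based PI step described in the abstract can be illustrated on its single-controller analogue: in a fully cooperative game all players minimize one common cost, so their inputs can be stacked into a single input vector. The sketch below is a minimal illustration under that assumption, with hypothetical matrices A, B, weights Q, R, and discount factor gamma (none taken from the paper); it alternates policy evaluation (a discounted discrete Lyapunov equation) with policy improvement, which is the standard structure of model-based PI for discounted discrete-time Riccati equations.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def policy_iteration(A, B, Q, R, gamma, K0=None, iters=100, tol=1e-10):
    """Model-based PI for the discounted discrete-time ARE (illustrative sketch).

    Cost: sum_k gamma^k (x_k' Q x_k + u_k' R u_k), policy u_k = K x_k.
    K0 (zero by default) must make sqrt(gamma) * (A + B K0) Schur stable.
    """
    n, m = B.shape
    K = np.zeros((m, n)) if K0 is None else K0
    for _ in range(iters):
        Ac = A + B @ K
        # Policy evaluation: solve P = gamma * Ac' P Ac + Q + K' R K,
        # a discrete Lyapunov equation in sqrt(gamma) * Ac.
        P = solve_discrete_lyapunov(np.sqrt(gamma) * Ac.T, Q + K.T @ R @ K)
        # Policy improvement: minimize x'Qx + u'Ru + gamma*(Ax+Bu)'P(Ax+Bu) over u.
        K_new = -np.linalg.solve(R + gamma * B.T @ P @ B, gamma * B.T @ P @ A)
        if np.linalg.norm(K_new - K) < tol:
            K = K_new
            break
        K = K_new
    # Re-evaluate the final policy so P and K are consistent.
    Ac = A + B @ K
    P = solve_discrete_lyapunov(np.sqrt(gamma) * Ac.T, Q + K.T @ R @ K)
    return P, K

# Hypothetical two-state, one-input example (A is stable, so K0 = 0 suffices).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
P, K = policy_iteration(A, B, np.eye(2), np.eye(1), gamma=0.8)
```

The paper's Q-learning algorithm estimates the same improvement step from measured data instead of the model matrices; the sketch above is only the model-based baseline it is compared against.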
Ethics declarations
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
This work was supported in part by the Shandong Provincial Natural Science Foundation under Grant ZR2022QF096, and in part by the Weifang University Doctoral Research Startup Fund under Grant 2021BS26. The author would like to thank his alma mater, Beijing Institute of Technology.
Jin-Gang Zhao received his B.E. degree in automation from Qingdao University of Technology, Qingdao, China, in 2013, an M.Sc. degree in pattern recognition and intelligent systems from Beijing Information Science and Technology University, Beijing, China, in 2016, and a Ph.D. degree in control science and engineering from Beijing Institute of Technology, Beijing, China, in 2020. From 2018 to 2019, he was a Visiting Scholar with the Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH, USA. He is currently a lecturer with the School of Machinery and Automation, Weifang University. His research interests include optimal control, reinforcement learning, adaptive dynamic programming, and hybrid systems.
Cite this article
Zhao, JG. Reinforcement Q-learning and Optimal Tracking Control of Unknown Discrete-time Multi-player Systems Based on Game Theory. Int. J. Control Autom. Syst. 22, 1751–1759 (2024). https://doi.org/10.1007/s12555-022-1133-1