Policy iteration based Q-learning for linear nonzero-sum quadratic differential games

Li, Xinxing; Peng, Zhihong; Liang, Li; Zha, Wenzhong

doi:10.1007/s11432-018-9602-1

Research Paper
Published: 02 April 2019

Volume 62, article number 52204 (2019)

Science China Information Sciences

Xinxing Li^1,2, Zhihong Peng^1,2, Li Liang^1,2 & Wenzhong Zha^3

Abstract

In this paper, a policy iteration-based Q-learning algorithm is proposed to solve infinite-horizon linear nonzero-sum quadratic differential games with completely unknown dynamics. The Q-learning algorithm, which employs off-policy reinforcement learning (RL), can learn the Nash equilibrium and the corresponding value functions online, using the data sets generated by behavior policies. First, we prove the equivalence between the proposed off-policy Q-learning algorithm and an offline policy iteration (PI) algorithm by selecting specific initial admissible policies that can be learned online. Then, the convergence of the off-policy Q-learning algorithm is proved under a mild rank condition that can be easily met by injecting appropriate probing noises into the behavior policies. The generated data sets can be reused repeatedly during the learning process, which is computationally efficient. The simulation results demonstrate the effectiveness of the proposed Q-learning algorithm.
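
The abstract does not reproduce the equations of the offline PI algorithm, so the following is only a minimal sketch of what such a scheme commonly looks like for a two-player linear-quadratic nonzero-sum game: a model-based Lyapunov iteration that alternates policy evaluation and policy improvement. It is an assumption, not the authors' algorithm. In particular it requires the system matrices, whereas the paper's Q-learning method does not. The cost convention (player i penalises x'Q_i x plus the sum over j of u_j'R_ij u_j, with u_j = -K_j x) and all names (`solve_lyapunov`, `offline_pi`) are hypothetical choices made for illustration.

```python
import numpy as np


def solve_lyapunov(A_c, M):
    """Solve A_c^T P + P A_c + M = 0 for P by Kronecker vectorisation."""
    n = A_c.shape[0]
    I = np.eye(n)
    lhs = np.kron(I, A_c.T) + np.kron(A_c.T, I)
    P = np.linalg.solve(lhs, -M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrise against round-off


def offline_pi(A, B, Q, R, K, iters=50):
    """Two-player Lyapunov iteration (assumed form, model-based).

    B = (B1, B2), Q = (Q1, Q2), R[i][j] weights u_j in player i's cost,
    K = (K1, K2) are initial stabilising gains with u_j = -K_j x.
    """
    K = [k.copy() for k in K]
    for _ in range(iters):
        # Closed-loop matrix under the current pair of policies.
        A_c = A - B[0] @ K[0] - B[1] @ K[1]
        # Policy evaluation: one Lyapunov equation per player.
        P = []
        for i in range(2):
            M = Q[i] + sum(K[j].T @ R[i][j] @ K[j] for j in range(2))
            P.append(solve_lyapunov(A_c, M))
        # Policy improvement: K_i = R_ii^{-1} B_i^T P_i.
        K = [np.linalg.solve(R[i][i], B[i].T @ P[i]) for i in range(2)]
    return P, K


# Scalar toy game (made-up numbers): stable plant, zero initial gains.
A = np.array([[-1.0]])
B = (np.array([[1.0]]), np.array([[1.0]]))
Q = (np.eye(1), np.eye(1))
R = ((np.eye(1), np.zeros((1, 1))), (np.zeros((1, 1)), np.eye(1)))
K0 = (np.zeros((1, 1)), np.zeros((1, 1)))
P, K = offline_pi(A, B, Q, R, K0)
print(P[0][0, 0], P[1][0, 0])
```

For this symmetric scalar toy game the coupled Riccati equations reduce to 3p^2 + 2p - 1 = 0, so both value coefficients should approach p = 1/3. Convergence of this kind of iteration is not guaranteed for general nonzero-sum games; the sketch only illustrates the evaluate-then-improve structure.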

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61203078) and the Key Project of Shenzhen Robotics Research Center NSFC (Grant No. U1613225).

Author information

Authors and Affiliations

School of Automation, Beijing Institute of Technology, Beijing, 100081, China
Xinxing Li, Zhihong Peng & Li Liang
State Key Laboratory of Intelligent Control and Decision of Complex System, Beijing, 100081, China
Xinxing Li, Zhihong Peng & Li Liang
Information Science Academy, China Electronics Technology Group Corporation, Beijing, 100086, China
Wenzhong Zha

Corresponding author

Correspondence to Zhihong Peng.

About this article

Cite this article

Li, X., Peng, Z., Liang, L. et al. Policy iteration based Q-learning for linear nonzero-sum quadratic differential games. Sci. China Inf. Sci. 62, 52204 (2019). https://doi.org/10.1007/s11432-018-9602-1

Received: 04 July 2018
Accepted: 05 September 2018
Published: 02 April 2019
DOI: https://doi.org/10.1007/s11432-018-9602-1
