Abstract
This entry discusses model-free reinforcement learning algorithms based on a continuous-time Q-learning framework. The presented schemes solve infinite-horizon optimal control problems for linear time-invariant systems with completely unknown dynamics and single or multiple players/controllers. We first formulate the appropriate Q-functions (action-dependent value functions) and then derive tuning laws based on actor/critic structures for several cases, including optimal regulation, Nash games, multi-agent systems, and settings with intermittent feedback.
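The continuous-time Q-learning formulations treated in the entry are fairly involved; as a minimal, hypothetical sketch of the underlying model-free principle, the discrete-time linear-quadratic analogue below learns a quadratic action-dependent value function Q(x, u) = z'Hz, z = [x; u], purely from measured data and improves the policy from it. The system matrices and cost weights here are illustrative assumptions; crucially, A and B are used only to generate data, never inside the learner.

```python
import numpy as np

# Hypothetical open-loop-stable LTI plant (illustration only): the learner
# never touches A or B; they appear solely in the data-generating simulation.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
Qc, Rc = np.eye(2), np.eye(1)    # quadratic stage cost x'Qc x + u'Rc u
n, m = 2, 1
rng = np.random.default_rng(0)

iu = np.triu_indices(n + m)
scale = np.where(iu[0] == iu[1], 1.0, 2.0)   # off-diagonals count twice in z'Hz

def phi(x, u):
    """Monomial features of z = [x; u] so that Q(x, u) = z' H z = h . phi(x, u)."""
    z = np.concatenate([x, u])
    return scale * np.outer(z, z)[iu]

K = np.zeros((m, n))             # initial admissible policy u = -K x (A is stable)
for _ in range(8):
    # Policy evaluation: least squares on the Q-function Bellman equation
    #   Q_K(x, u) = x'Qc x + u'Rc u + Q_K(x', -K x'),  x' = A x + B u
    Phi, c = [], []
    x = rng.standard_normal(n)
    for _ in range(200):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # exploration noise
        xp = A @ x + B @ u
        Phi.append(phi(x, u) - phi(xp, -K @ xp))
        c.append(x @ Qc @ x + u @ Rc @ u)
        x = xp
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = np.zeros((n + m, n + m))
    H[iu] = h
    H = H + H.T - np.diag(np.diag(H))               # symmetric kernel of Q
    # Policy improvement: minimize z' H z over u, giving u = -H_uu^{-1} H_ux x
    K = np.linalg.solve(H[n:, n:], H[n:, :n])

# Model-based Riccati recursion, used only to check the learned gain.
P = np.eye(n)
for _ in range(500):
    Kstar = np.linalg.solve(Rc + B.T @ P @ B, B.T @ P @ A)
    P = Qc + A.T @ P @ (A - B @ Kstar)
```

Because the plant is deterministic, the Bellman equation holds exactly at every sample, so each least-squares policy evaluation is exact and the iteration recovers the optimal gain; the learned `K` matches the Riccati gain `Kstar`. The continuous-time schemes in the entry replace this batch iteration with online actor/critic tuning laws.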
Acknowledgements
This material is based upon the work supported by NSF under Grant Numbers CPS-1851588, SATC-1801611, and S&AS-1849198, by ARO under Grant Number W911NF1910270, and by Minerva Research Initiative under Grant Number N00014-18-1-2160.
Copyright information
© 2020 Springer-Verlag London Ltd., part of Springer Nature
About this entry
Cite this entry
Vamvoudakis, K.G. (2020). Model-Free Reinforcement Learning-Based Control for Continuous-Time Systems. In: Baillieul, J., Samad, T. (eds) Encyclopedia of Systems and Control. Springer, London. https://doi.org/10.1007/978-1-4471-5102-9_100065-1
DOI: https://doi.org/10.1007/978-1-4471-5102-9_100065-1
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5102-9
Online ISBN: 978-1-4471-5102-9
eBook Packages: Springer Reference Engineering, Reference Module Computer Science and Engineering