Abstract
This study explores and compares three model-free reinforcement learning methods: deep Q-networks (DQN), dueling deep Q-networks (DDQN), and state-action-reward-state-action (SARSA), detailing the mathematical principles behind each. These methods were chosen to bring out the contrast between off-policy (DQN) and on-policy (SARSA) learners; DDQN was included as a modification of DQN. The methods were compared by their performance on the classic CartPole control problem. In post-training testing, DQN obtained an average per-episode reward of 496.36; its variant DDQN obtained a perfect score of 500; and SARSA obtained a score of 438.28. The theoretical inferences were decisively reaffirmed by observations based on descriptive plots of the training and testing results.
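The off-policy versus on-policy contrast highlighted above comes down to which action the bootstrap target uses. As a minimal sketch (not the paper's actual neural-network implementation), the tabular forms of the two update rules can be written as follows; the state/action names, Q-values, and hyperparameters here are illustrative assumptions only:

```python
# Tabular sketch of the off-policy (Q-learning/DQN) vs on-policy (SARSA)
# update rules. All states, actions, and values below are hypothetical.

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy: bootstraps from the greedy (max-valued) action in s_next,
    regardless of which action the behaviour policy actually takes."""
    target = r + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy: bootstraps from the action a_next actually selected
    by the current (e.g. epsilon-greedy) policy in s_next."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])

# Toy Q-tables over two states and two actions (values are arbitrary).
Q1 = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 2.0}}
Q2 = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 2.0}}

q_learning_update(Q1, "s0", "right", r=1.0, s_next="s1")
sarsa_update(Q2, "s0", "right", r=1.0, s_next="s1", a_next="left")

print(round(Q1["s0"]["right"], 4))  # 0.298  = 0.1 * (1 + 0.99 * 2)
print(round(Q2["s0"]["right"], 4))  # 0.199  = 0.1 * (1 + 0.99 * 1)
```

The exploratory choice `a_next="left"` pulls SARSA's target below Q-learning's greedy target, which is the mechanism behind the performance gap the abstract reports. DQN replaces the table with a neural network and adds experience replay and a target network, but the target computation follows the same off-policy form.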
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
Cite this paper
Moudgalya, A., Shafi, A., Arun, B.A. (2021). A Comparative Study of Model-Free Reinforcement Learning Approaches. In: Hassanien, A., Bhatnagar, R., Darwish, A. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2020. Advances in Intelligent Systems and Computing, vol 1141. Springer, Singapore. https://doi.org/10.1007/978-981-15-3383-9_50