A Comparative Study of Model-Free Reinforcement Learning Approaches

Conference paper in Advanced Machine Learning Technologies and Applications (AMLTA 2020)

Abstract

This study explores and compares three model-free reinforcement learning methods, namely deep Q-networks (DQN), dueling deep Q-networks (DDQN), and state-action-reward-state-action (SARSA), while detailing the mathematical principles behind each method. These methods were chosen to bring out the contrast between off-policy (DQN) and on-policy (SARSA) learners; DDQN was included because it is a modification of DQN. The methods were compared on their performance on the classic CartPole problem. Post-training test results for each of the models were as follows: DQN obtained an average per-episode reward of 496.36; its variant and improvement, DDQN, obtained a perfect score of 500; and SARSA obtained a score of 438.28. To conclude, the theoretical inferences were decisively reaffirmed by observations based on descriptive plots of the training and testing results.
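
As a concrete illustration of the off-policy versus on-policy distinction described above, the following sketch shows the tabular temporal-difference updates that underlie the two families of methods. It is a minimal, hypothetical example: the learning rate, discount factor, and toy table sizes are assumed values chosen for demonstration, and the paper itself uses neural-network function approximation (DQN/DDQN) rather than a lookup table.

```python
import numpy as np

# Assumed hyperparameters for illustration only (not taken from the paper).
GAMMA = 0.99   # discount factor
ALPHA = 0.1    # learning rate


def q_learning_update(Q, s, a, r, s_next):
    """Off-policy TD update (Q-learning, the rule behind DQN's target):
    bootstraps from the greedy action in the next state."""
    target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])


def sarsa_update(Q, s, a, r, s_next, a_next):
    """On-policy TD update (SARSA): bootstraps from the action the
    behaviour policy actually takes in the next state."""
    target = r + GAMMA * Q[s_next, a_next]
    Q[s, a] += ALPHA * (target - Q[s, a])


# Toy table with 4 states and 2 actions, purely for demonstration.
Q = np.zeros((4, 2))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=2, a_next=0)
print(Q)
```

The only difference between the two updates is the bootstrapping term: Q-learning maximises over next-state actions regardless of which action is subsequently taken, while SARSA uses the action actually selected, which is why exploration noise flows into SARSA's value estimates but not into Q-learning's.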

Author information

Corresponding author

Correspondence to Anant Moudgalya.

Copyright information

© 2021 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Moudgalya, A., Shafi, A., Arun, B.A. (2021). A Comparative Study of Model-Free Reinforcement Learning Approaches. In: Hassanien, A., Bhatnagar, R., Darwish, A. (eds) Advanced Machine Learning Technologies and Applications. AMLTA 2020. Advances in Intelligent Systems and Computing, vol 1141. Springer, Singapore. https://doi.org/10.1007/978-981-15-3383-9_50
