Bring Color to Deep Q-Networks: Limitations and Improvements of DQN Leading to Rainbow DQN

Chapter in Reinforcement Learning Algorithms: Analysis and Applications

Abstract

After the Deep Q-Networks (DQN) algorithm initially demonstrated that deep neural networks can be applied successfully to reinforcement learning (RL), several limitations of the approach have been revealed over the past years. In consequence, many powerful extensions of DQN have been developed, making it capable of achieving state-of-the-art performance and, hence, putting it on par with other recent RL algorithms. In this paper, we give an overview of the limitations of DQN together with the extensions that address them, so that issues in DQN settings and ways to overcome them can be identified effectively. Finally, we present the recent and remarkably well-performing Rainbow agent, which combines six DQN extensions.
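For orientation, the objective that the surveyed extensions build on can be summarized as follows; this is the standard DQN formulation from the literature, not an excerpt from the chapter itself. DQN learns an action-value function Q(s, a; θ) by minimizing a temporal-difference loss against a periodically updated target network with parameters θ⁻:

L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right],

where \mathcal{D} is the experience replay buffer and \gamma the discount factor. The six extensions combined in Rainbow (double Q-learning, prioritized experience replay, dueling networks, multi-step targets, distributional RL, and noisy networks) each modify parts of this objective or the way transitions are sampled from \mathcal{D}.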



Author information


Correspondence to Jonas Jäger.


Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this chapter

Jäger, J., Helfenstein, F., Scharf, F. (2021). Bring Color to Deep Q-Networks: Limitations and Improvements of DQN Leading to Rainbow DQN. In: Belousov, B., Abdulsamad, H., Klink, P., Parisi, S., Peters, J. (eds) Reinforcement Learning Algorithms: Analysis and Applications. Studies in Computational Intelligence, vol 883. Springer, Cham. https://doi.org/10.1007/978-3-030-41188-6_12
