Abstract
Although the Deep Q-Networks (DQN) algorithm was the first to successfully apply deep neural networks to reinforcement learning (RL), several of its limitations have been revealed over the past years. In consequence, many powerful extensions of DQN have been developed, restoring state-of-the-art performance and making it competitive with other recent RL algorithms. In this paper, we give an overview of the limitations of DQN together with the extensions that address them, so that issues in DQN settings and ways to overcome them can be identified effectively. Finally, we present the recent and remarkably well-performing Rainbow agent, which combines six DQN extensions.
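To make the flavor of these extensions concrete, the minimal sketch below contrasts the standard one-step DQN bootstrap target with the Double DQN target, one of the six components later combined in Rainbow. This is an illustrative sketch under assumed function names and array shapes, not code from the chapter.

```python
# Illustrative sketch (assumed names/shapes): one-step DQN target vs.
# the Double DQN target, one of the six extensions combined in Rainbow.
import numpy as np

def dqn_target(reward, done, q_next_target, gamma=0.99):
    """Standard DQN target: r + gamma * max_a Q_target(s', a)."""
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)

def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it, reducing overestimation bias."""
    best_actions = q_next_online.argmax(axis=1)
    q_eval = q_next_target[np.arange(len(best_actions)), best_actions]
    return reward + gamma * (1.0 - done) * q_eval

# Toy batch of 3 transitions with 4 actions each (random Q-values).
rng = np.random.default_rng(0)
reward = np.array([1.0, 0.0, -1.0])
done = np.array([0.0, 0.0, 1.0])  # terminal transitions bootstrap to 0
q_online = rng.normal(size=(3, 4))
q_target = rng.normal(size=(3, 4))
print(dqn_target(reward, done, q_target))
print(double_dqn_target(reward, done, q_online, q_target))
```

The design point the sketch highlights is that decoupling action selection (online network) from action evaluation (target network) mitigates the overestimation bias of the plain max operator.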
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jäger, J., Helfenstein, F., Scharf, F. (2021). Bring Color to Deep Q-Networks: Limitations and Improvements of DQN Leading to Rainbow DQN. In: Belousov, B., Abdulsamad, H., Klink, P., Parisi, S., Peters, J. (eds) Reinforcement Learning Algorithms: Analysis and Applications. Studies in Computational Intelligence, vol 883. Springer, Cham. https://doi.org/10.1007/978-3-030-41188-6_12
DOI: https://doi.org/10.1007/978-3-030-41188-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41187-9
Online ISBN: 978-3-030-41188-6
eBook Packages: Intelligent Technologies and Robotics (R0)