Abstract
Although the Deep Q-Networks (DQN) algorithm was the first to successfully apply deep neural networks to reinforcement learning (RL), several of its limitations have been revealed over the past years. In consequence, many powerful extensions of DQN have been developed, restoring state-of-the-art performance and making it competitive with other recent RL algorithms. In this paper, we give an overview of the limitations of DQN together with the extensions that address them, so that issues in DQN settings and ways to overcome them can be identified effectively. Finally, we present the recent and remarkably well-performing Rainbow agent, which combines six DQN extensions.
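To make the flavor of these extensions concrete, the minimal sketch below contrasts the standard one-step DQN bootstrap target with the Double DQN target, one of the six components later combined in Rainbow. This is an illustrative sketch under assumed function names and array shapes, not code from the chapter.

```python
# Illustrative sketch (assumed names/shapes): one-step DQN target vs.
# the Double DQN target, one of the six extensions combined in Rainbow.
import numpy as np

def dqn_target(reward, done, q_next_target, gamma=0.99):
    """Standard DQN target: r + gamma * max_a Q_target(s', a)."""
    return reward + gamma * (1.0 - done) * q_next_target.max(axis=1)

def double_dqn_target(reward, done, q_next_online, q_next_target, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it, reducing overestimation bias."""
    best_actions = q_next_online.argmax(axis=1)
    q_eval = q_next_target[np.arange(len(best_actions)), best_actions]
    return reward + gamma * (1.0 - done) * q_eval

# Toy batch of 3 transitions with 4 actions each (random Q-values).
rng = np.random.default_rng(0)
reward = np.array([1.0, 0.0, -1.0])
done = np.array([0.0, 0.0, 1.0])  # terminal transitions bootstrap to 0
q_online = rng.normal(size=(3, 4))
q_target = rng.normal(size=(3, 4))
print(dqn_target(reward, done, q_target))
print(double_dqn_target(reward, done, q_online, q_target))
```

The design point the sketch highlights is that decoupling action selection (online network) from action evaluation (target network) mitigates the overestimation bias of the plain max operator.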
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jäger, J., Helfenstein, F., Scharf, F. (2021). Bring Color to Deep Q-Networks: Limitations and Improvements of DQN Leading to Rainbow DQN. In: Belousov, B., Abdulsamad, H., Klink, P., Parisi, S., Peters, J. (eds) Reinforcement Learning Algorithms: Analysis and Applications. Studies in Computational Intelligence, vol 883. Springer, Cham. https://doi.org/10.1007/978-3-030-41188-6_12
DOI: https://doi.org/10.1007/978-3-030-41188-6_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41187-9
Online ISBN: 978-3-030-41188-6
eBook Packages: Intelligent Technologies and Robotics (R0)