Abstract
World models facilitate sample-efficient reinforcement learning (RL) and, by design, can benefit from multitask information, yet typical model-based RL (MBRL) agents do not exploit it. We propose a data-centric approach to this problem: a controllable optimization process for MBRL agents that selectively prioritizes the data the model-based agent trains on in order to improve its performance. We show how this favors implicit task generalization in a custom environment built on Meta-World with parametric task variability. Furthermore, by bootstrapping the agent's own data, our method improves performance on unstable environments from the DeepMind Control Suite. This is achieved without any additional data or architectural changes, while outperforming state-of-the-art visual model-based RL algorithms. Finally, we frame the approach within the scope of methods that have unintentionally followed the controllable-optimization-process paradigm, filling the gap among data-centric task-bootstrapping methods.
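The abstract does not specify the prioritization rule itself. As a rough illustration of the general idea of data-centric prioritization (priority-weighted sampling of trajectories for world-model training), the following Python sketch uses hypothetical names (`PrioritizedTrajectoryBuffer`, `alpha`) and assumes the world model's loss as the priority signal, in the spirit of prioritized experience replay; it is not the paper's actual method.

```python
# A minimal, hypothetical sketch of data-centric prioritization for
# world-model training. Using the model's loss as a priority signal
# is an assumption borrowed from prioritized experience replay.
import numpy as np

class PrioritizedTrajectoryBuffer:
    def __init__(self, capacity: int, alpha: float = 0.7):
        self.capacity = capacity   # max number of stored trajectories
        self.alpha = alpha         # how strongly priorities skew sampling
        self.trajectories = []     # each item: sequence of (obs, action, reward)
        self.priorities = []       # one scalar priority per trajectory

    def add(self, trajectory, priority: float = 1.0):
        if len(self.trajectories) >= self.capacity:
            # Drop the lowest-priority trajectory to make room.
            drop = int(np.argmin(self.priorities))
            self.trajectories.pop(drop)
            self.priorities.pop(drop)
        self.trajectories.append(trajectory)
        self.priorities.append(priority)

    def sample(self, batch_size: int):
        # Sample trajectory indices proportionally to priority**alpha.
        p = np.asarray(self.priorities) ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.trajectories), size=batch_size, p=p)
        return [self.trajectories[i] for i in idx], idx

    def update_priorities(self, idx, new_priorities):
        # E.g., set each priority to the world model's loss on that
        # trajectory, so poorly modeled data is revisited more often.
        for i, pr in zip(idx, new_priorities):
            self.priorities[i] = float(pr)
```

In such a scheme, the world-model training loop would call `sample`, compute per-trajectory losses, and feed them back through `update_priorities`, steering optimization toward under-modeled tasks without any change to the agent's architecture.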
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zholus, A., Ivchenkov, Y., Panov, A.I.: Addressing task prioritization in model-based reinforcement learning. In: Kryzhanovsky, B., Dunin-Barkowski, W., Redko, V., Tiumentsev, Y. (eds.) Advances in Neural Computation, Machine Learning, and Cognitive Research VI, NEUROINFORMATICS 2022. Studies in Computational Intelligence, vol. 1064. Springer, Cham (2023). https://doi.org/10.1007/978-3-031-19032-2_3
DOI: https://doi.org/10.1007/978-3-031-19032-2_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-19031-5
Online ISBN: 978-3-031-19032-2
eBook Packages: Intelligent Technologies and Robotics (R0)