Abstract
Intrinsic motivation is vital for living beings. It enables skill acquisition, triggers exploratory behaviour, and hence enhances cognitive capabilities. One way of formalising the variety of behaviours induced by intrinsic motivation is empowerment, an information-theoretic measure that encodes the influence an agent exerts on its environment. Formally, empowerment is the maximum mutual information between actions and the states they lead to, a quantity that is prohibitively hard to compute, especially in nonlinear continuous spaces. In this work, we introduce a method for efficiently computing a lower bound on empowerment, enabling its use as an unsupervised cost function for real-time control. We demonstrate that our algorithm reliably handles continuous dynamical systems even when the system dynamics are learnt from raw data. The resulting empowerment-maximising policies consistently drive the agents into states with high potential impact.
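For context, the quantity being bounded can be written out explicitly. The following is a minimal sketch in generic notation (state s, action sequence a, successor state s', source distribution ω, variational decoder q) that is not drawn verbatim from the paper. Empowerment is the channel capacity from actions to the states they induce,

$$\mathcal{E}(s) \;=\; \max_{\omega(a \mid s)} \, I(a;\, s' \mid s),$$

and a tractable lower bound of the kind referred to in the abstract follows the variational information-maximisation argument of Barber and Agakov: for any decoder q(a | s', s),

$$I(a;\, s' \mid s) \;\ge\; \mathbb{E}_{\omega(a \mid s)\, p(s' \mid s,\, a)}\!\left[\log q(a \mid s',\, s)\right] \;+\; H\!\left(\omega(\cdot \mid s)\right),$$

with equality when q matches the true action posterior p(a | s', s). Maximising the right-hand side jointly over ω and q therefore yields a lower bound on empowerment that is amenable to gradient-based optimisation; the exact parameterisation used in the paper may differ from this sketch.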
Notes
- 1.
We adopted the term "source" from the channel-capacity literature.
- 2.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Karl, M., Becker-Ehmck, P., Soelch, M., Benbouzid, D., van der Smagt, P., Bayer, J. (2022). Unsupervised Real-Time Control Through Variational Empowerment. In: Asfour, T., Yoshida, E., Park, J., Christensen, H., Khatib, O. (eds) Robotics Research. ISRR 2019. Springer Proceedings in Advanced Robotics, vol 20. Springer, Cham. https://doi.org/10.1007/978-3-030-95459-8_10
DOI: https://doi.org/10.1007/978-3-030-95459-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95458-1
Online ISBN: 978-3-030-95459-8
eBook Packages: Intelligent Technologies and Robotics (R0)