Abstract
Intrinsic motivation is vital for living beings. It enables skill acquisition, triggers exploratory behaviour, and hence enhances cognitive capabilities. One way of formalising the variety of behaviours induced by intrinsic motivation is empowerment, an information-theoretic measure that encodes the influence an agent exerts on its environment. Formally, empowerment is the maximum mutual information between actions and the states they lead to, a quantity that is prohibitively hard to compute, especially in nonlinear continuous spaces. In this work, we introduce a method for efficiently computing a lower bound on empowerment, enabling its use as an unsupervised cost function for real-time control. We demonstrate that our algorithm reliably handles continuous dynamical systems even when the system dynamics are learnt from raw data. The resulting empowerment-maximising policies consistently drive the agents into states with high potential impact.
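For context, the quantity being bounded can be written out explicitly. The following is a minimal sketch in generic notation (state s, action sequence a, successor state s', source distribution ω, variational decoder q) that is not drawn verbatim from the paper. Empowerment is the channel capacity from actions to the states they induce,

$$\mathcal{E}(s) \;=\; \max_{\omega(a \mid s)} \, I(a;\, s' \mid s),$$

and a tractable lower bound of the kind referred to in the abstract follows the variational information-maximisation argument of Barber and Agakov: for any decoder q(a | s', s),

$$I(a;\, s' \mid s) \;\ge\; \mathbb{E}_{\omega(a \mid s)\, p(s' \mid s,\, a)}\!\left[\log q(a \mid s',\, s)\right] \;+\; H\!\left(\omega(\cdot \mid s)\right),$$

with equality when q matches the true action posterior p(a | s', s). Maximising the right-hand side jointly over ω and q therefore yields a lower bound on empowerment that is amenable to gradient-based optimisation; the exact parameterisation used in the paper may differ from this sketch.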
Notes
- 1.
We adopted the term "source" from the channel-capacity literature.
- 2.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Karl, M., Becker-Ehmck, P., Soelch, M., Benbouzid, D., van der Smagt, P., Bayer, J. (2022). Unsupervised Real-Time Control Through Variational Empowerment. In: Asfour, T., Yoshida, E., Park, J., Christensen, H., Khatib, O. (eds) Robotics Research. ISRR 2019. Springer Proceedings in Advanced Robotics, vol 20. Springer, Cham. https://doi.org/10.1007/978-3-030-95459-8_10
DOI: https://doi.org/10.1007/978-3-030-95459-8_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95458-1
Online ISBN: 978-3-030-95459-8
eBook Packages: Intelligent Technologies and Robotics (R0)