Abstract
We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.
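The abstract's idea of switching between a Bayes-optimal policy and an exploring policy can be illustrated with a toy sketch. This is not the paper's algorithm or its history-based setting: it is a minimal two-armed Bernoulli bandit with a hypothetical finite environment class `ENVS` and an ad-hoc posterior-concentration threshold, written only to show the explore-then-commit-to-Bayes-optimal pattern.

```python
import random

# Hedged sketch (not the paper's algorithm): mix a Bayes-optimal policy with
# an exploring policy in a toy two-armed Bernoulli bandit. We maintain a
# posterior over a finite class of candidate environments, explore while the
# posterior is still uncertain, and act Bayes-optimally otherwise.

# Hypothetical finite environment class: each entry lists the success
# probabilities of arms 0 and 1.
ENVS = [(0.9, 0.1), (0.1, 0.9)]

def bayes_optimal_arm(posterior):
    """Arm maximising expected immediate reward under the posterior."""
    expected = [sum(w * env[a] for w, env in zip(posterior, ENVS))
                for a in range(2)]
    return max(range(2), key=lambda a: expected[a])

def update(posterior, arm, reward):
    """Exact Bayes update of the posterior after observing `reward` on `arm`."""
    likes = [env[arm] if reward else 1 - env[arm] for env in ENVS]
    z = sum(w * l for w, l in zip(posterior, likes))
    return [w * l / z for w, l in zip(posterior, likes)]

def act(posterior, t, explore_threshold=0.95):
    # Explore (round-robin over arms) while no candidate environment
    # dominates the posterior; otherwise commit to the Bayes-optimal arm.
    # The threshold 0.95 is an arbitrary choice for this illustration.
    if max(posterior) < explore_threshold:
        return t % 2          # exploring policy
    return bayes_optimal_arm(posterior)

random.seed(0)
true_env = ENVS[1]            # the "easy" case: arm 1 is clearly better
posterior = [0.5, 0.5]
for t in range(50):
    arm = act(posterior, t)
    reward = 1 if random.random() < true_env[arm] else 0
    posterior = update(posterior, arm, reward)

print(bayes_optimal_arm(posterior))  # posterior has concentrated on ENVS[1]
```

Because the two candidate environments disagree sharply, the exploring phase here is short and the agent quickly commits, loosely mirroring the adaptive behaviour the abstract describes for easier-than-worst-case environments.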
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Lattimore, T., Hutter, M. (2014). Bayesian Reinforcement Learning with Exploration. In: Auer, P., Clark, A., Zeugmann, T., Zilles, S. (eds) Algorithmic Learning Theory. ALT 2014. Lecture Notes in Computer Science(), vol 8776. Springer, Cham. https://doi.org/10.1007/978-3-319-11662-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11661-7
Online ISBN: 978-3-319-11662-4