FPL Analysis for Adaptive Bandits

Poland, Jan

doi:10.1007/11571155_7

Jan Poland

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3777)

Included in the following conference series:

International Symposium on Stochastic Algorithms

Abstract

A main problem of “Follow the Perturbed Leader” (FPL) strategies for online decision problems is that regret bounds are typically proven against an oblivious adversary. In partial observation cases, it was not clear how to obtain performance guarantees against an adaptive adversary without worsening the bounds. We propose a conceptually simple argument to resolve this problem. Using this argument, a regret bound of \(O(t^{\frac{2}{3}})\) is shown for FPL in the adversarial multi-armed bandit problem. This bound holds for the common FPL variant that uses only the observations from designated exploration rounds. Using all observations allows for the stronger bound of \(O(\sqrt{t})\), matching the best bound known so far (and essentially the known lower bound) for adversarial bandits. Surprisingly, this variant does not even need explicit exploration; it is self-stabilizing. However, the sampling probabilities have to be either externally provided or approximated to sufficient accuracy, using \(O(t^{2}\log t)\) samples in each step.
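
To make the first variant concrete, the following is a minimal Python sketch of FPL for a multi-armed bandit that updates its loss estimates only in designated exploration rounds. It is an illustration under stated assumptions, not the algorithm or the parameter tuning analyzed in the paper: the function name `fpl_bandit`, the fixed learning rate `eta`, the fixed exploration probability `gamma`, the exponential perturbations, and the callback `loss_fn` are all hypothetical choices made for this example.

```python
import random


def fpl_bandit(loss_fn, n_arms, horizon, eta=0.1, gamma=0.1, seed=0):
    """Sketch of FPL with designated exploration rounds (illustrative only).

    loss_fn(t, arm) returns the loss in [0, 1] of the arm actually played.
    eta and gamma are fixed here for simplicity (an assumption of this sketch);
    regret analyses typically let them vary with t.
    """
    rng = random.Random(seed)
    est = [0.0] * n_arms      # estimated cumulative loss per arm
    pulls = [0] * n_arms      # how often each arm was played
    total_loss = 0.0

    for t in range(horizon):
        if rng.random() < gamma:
            # Exploration round: play a uniformly random arm and record
            # an importance-weighted estimate. Arm i is observed in this
            # way with probability gamma / n_arms, so scaling the loss by
            # n_arms / gamma keeps the estimate unbiased.
            arm = rng.randrange(n_arms)
            loss = loss_fn(t, arm)
            est[arm] += loss * n_arms / gamma
        else:
            # Exploitation round: follow the perturbed leader. The loss
            # observed here is incurred but NOT used to update estimates.
            perturbed = [est[i] - rng.expovariate(1.0) / eta
                         for i in range(n_arms)]
            arm = min(range(n_arms), key=lambda i: perturbed[i])
            loss = loss_fn(t, arm)
        total_loss += loss
        pulls[arm] += 1

    return total_loss, pulls


if __name__ == "__main__":
    # Toy oblivious environment (assumed for the demo): arm 0 is best.
    total, pulls = fpl_bandit(lambda t, a: 0.1 if a == 0 else 0.9,
                              n_arms=3, horizon=2000)
    print(total, pulls)
```

Discarding the losses seen in exploitation rounds is what distinguishes this variant from the one that uses all observations. The latter would need the probability with which each arm is selected in order to form unbiased estimates, which is why the abstract requires those sampling probabilities to be provided or approximated.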

Author information

Authors and Affiliations

Grad. School of Inf. Sci. and Tech., Hokkaido University, Japan
Jan Poland

Editor information

Editors and Affiliations

Department of Discrete Mathematics, Leninskie Gory, Moscow State University, 119992, Moscow, Russia
Oleg B. Lupanov & Oktay M. Kasim-Zade
Faculty of Mechanics and Mathematics, Department of Discrete Mathematics, Leninskie Gory, Moscow State University, 119992, Moscow, Russia
Alexander V. Chaskin
Department of Computer Science, King’s College London, WC2R 2LS, London, UK
Kathleen Steinhöfel

About this paper

Cite this paper

Poland, J. (2005). FPL Analysis for Adaptive Bandits. In: Lupanov, O.B., Kasim-Zade, O.M., Chaskin, A.V., Steinhöfel, K. (eds) Stochastic Algorithms: Foundations and Applications. SAGA 2005. Lecture Notes in Computer Science, vol 3777. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11571155_7

DOI: https://doi.org/10.1007/11571155_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29498-6
Online ISBN: 978-3-540-32245-0
eBook Packages: Computer Science (R0)
