Abstract
We study on-line decision problems where the set of actions available to the decision algorithm varies over time. With a few notable exceptions, such problems have remained largely unaddressed in the literature, despite their applicability to many practical problems. Departing from previous work on this “Sleeping Experts” problem, we compare algorithms against the payoff obtained by the best ordering of the actions, a natural benchmark for this type of problem. We study both the full-information (best expert) and partial-information (multi-armed bandit) settings and consider both stochastic and adversarial reward models. For all settings we give algorithms achieving (almost) information-theoretically optimal regret bounds (up to a constant or a sub-logarithmic factor) with respect to the best-ordering benchmark.
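To make the best-ordering benchmark concrete, here is an illustrative sketch (not code from the paper): a fixed ordering of the actions always plays its highest-ranked action among those awake in the current round, and the benchmark is the total payoff of the best such ordering in hindsight. The brute-force search below is only for intuition; the function name and the toy instance are our own.

```python
# Illustrative sketch of the best-ordering benchmark for sleeping experts.
# In each round only a subset of actions is awake; an ordering sigma plays
# its highest-ranked awake action. The benchmark maximizes total payoff
# over all orderings (brute force, so only sensible for a few actions).
from itertools import permutations

def best_ordering_payoff(availability, rewards):
    """availability[t]: set of awake actions in round t.
    rewards[t][a]: reward of action a in round t.
    Returns the maximum total payoff over all orderings."""
    actions = sorted({a for avail in availability for a in avail})
    best = float("-inf")
    for sigma in permutations(actions):
        total = 0.0
        for avail, r in zip(availability, rewards):
            # play the first action in sigma that is awake this round
            chosen = next(a for a in sigma if a in avail)
            total += r[chosen]
        best = max(best, total)
    return best

# toy instance: 3 actions, 3 rounds
availability = [{0, 1}, {1, 2}, {0, 2}]
rewards = [{0: 0.5, 1: 0.9}, {1: 0.2, 2: 0.8}, {0: 0.4, 2: 0.1}]
print(best_ordering_payoff(availability, rewards))  # ordering (2, 1, 0) earns 1.8
```

Note that the benchmark is weaker than the best sequence of awake actions but stronger than any single fixed action, which is what makes it natural when the action set varies.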
Additional information
Editors: Sham Kakade and Ping Li.
Robert Kleinberg was supported by NSF awards CCF-0643934, IIS-0905467, and AF-0910940, a Microsoft Research New Faculty Fellowship, and an Alfred P. Sloan Foundation Fellowship.
Alexandru Niculescu-Mizil was supported by NSF awards 0347318, 0412930, 0427914, and 0612031.
Yogeshwer Sharma was supported by NSF grants CCF-0514628 and CCF-0830519.
Cite this article
Kleinberg, R., Niculescu-Mizil, A. & Sharma, Y. Regret bounds for sleeping experts and bandits. Mach Learn 80, 245–272 (2010). https://doi.org/10.1007/s10994-010-5178-7