Abstract
External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares the loss of an online algorithm to the loss of a modified online algorithm, which consistently replaces one action by another.
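As a concrete illustration (not part of the paper itself), both notions can be computed directly from a played action sequence and a loss matrix; the array sizes and values below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 4                       # rounds and actions (illustrative sizes)
losses = rng.random((T, N))          # losses[t, a] = loss of action a at step t
actions = rng.integers(0, N, T)      # actions played by some online algorithm

alg_loss = losses[np.arange(T), actions].sum()

# External regret: compare to the single best fixed action in hindsight.
external_regret = alg_loss - losses.sum(axis=0).min()

# Internal regret: for each ordered pair (i, j), compare to the modified
# sequence that plays j at every step where the algorithm played i.
internal_regret = 0.0
for i in range(N):
    steps_i = actions == i
    for j in range(N):
        gain = losses[steps_i, i].sum() - losses[steps_i, j].sum()
        internal_regret = max(internal_regret, gain)
```

Note that the external regret is always at most N times the largest per-pair internal regret, since the comparison against any fixed action decomposes across the actions the algorithm actually played.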
In this paper, we give a simple generic reduction that, given an algorithm for the external regret problem, converts it to an efficient online algorithm for the internal regret problem. We provide methods that work both in the full information model, in which the loss of every action is observed at each time step, and the partial information (bandit) model, where at each time step only the loss of the selected action is observed. The importance of internal regret in game theory is due to the fact that in a general game, if each player has sublinear internal regret, then the empirical frequencies converge to a correlated equilibrium.
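A minimal sketch of this style of reduction in the full information model, using Hedge (multiplicative weights) as the external-regret subroutine: run one copy per action, combine the copies' distributions through a stationary distribution, and charge copy i its loss scaled by the probability of playing i. The learning rate and horizon below are illustrative choices, not the paper's tuned parameters:

```python
import numpy as np

def stationary(Q, iters=200):
    """Power iteration for p satisfying p = p @ Q (Q row-stochastic, positive)."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q
        p /= p.sum()
    return p

rng = np.random.default_rng(1)
T, N, eta = 2000, 3, 0.1
W = np.ones((N, N))            # row i: Hedge weights of the i-th copy

swap_gain = np.zeros((N, N))   # running gain of each i -> j modification
for t in range(T):
    loss = rng.random(N)                      # adversary's losses this round
    Q = W / W.sum(axis=1, keepdims=True)      # row i = distribution of copy i
    p = stationary(Q)                         # play p; p = p @ Q ties the copies together
    for i in range(N):
        W[i] *= np.exp(-eta * p[i] * loss)    # copy i sees its loss scaled by p[i]
    swap_gain += p[:, None] * (loss[:, None] - loss[None, :])

internal_regret = swap_gain.max() / T         # per-round internal regret
```

The key step is the fixed point p = pQ: it makes the master's expected loss equal to the combined expected loss of the copies, so each copy's external regret guarantee turns into a guarantee against the corresponding action substitution.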
For external regret we also derive a quantitative regret bound in a very general setting, which includes an arbitrary set of modification rules (each possibly modifying the online algorithm) and an arbitrary set of time selection functions (each giving a different weight to each time step). The regret for a given time selection function and modification rule is the difference between the cost of the online algorithm and the cost of the modified online algorithm, where the costs are weighted by the time selection function. This can be viewed as a generalization of the previously studied sleeping experts setting.
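For example (values illustrative), with a single modification rule that replaces action 0 by action 1 and a time-selection function that is active only on the first half of the rounds, the weighted regret is:

```python
import numpy as np

losses = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4],
                   [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]])
T = len(losses)
actions = np.zeros(T, dtype=int)           # algorithm always plays action 0
I = np.array([1, 1, 1, 1, 0, 0, 0, 0])     # time selection: first half only

modify = np.vectorize(lambda a: 1 if a == 0 else a)   # rule: replace 0 by 1
mod_actions = modify(actions)

t = np.arange(T)
regret = np.sum(I * (losses[t, actions] - losses[t, mod_actions]))
# regret = 0.8 + 0.6 + 0.4 + 0.2 = 2.0 over the selected steps
```

On the unselected second half the modification would have hurt, but the time-selection function zeroes those steps out, which is exactly the flexibility the general setting provides.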
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Blum, A., Mansour, Y. (2005). From External to Internal Regret. In: Auer, P., Meir, R. (eds) Learning Theory. COLT 2005. Lecture Notes in Computer Science, vol 3559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11503415_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26556-6
Online ISBN: 978-3-540-31892-7