Abstract
External regret compares the performance of an online algorithm, selecting among N actions, to the performance of the best of those actions in hindsight. Internal regret compares the loss of an online algorithm to the loss of a modified online algorithm, which consistently replaces one action by another.
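As a concrete illustration (not part of the paper itself), both notions can be computed directly from a played action sequence and a loss matrix; the array sizes and values below are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 1000, 4                       # rounds and actions (illustrative sizes)
losses = rng.random((T, N))          # losses[t, a] = loss of action a at step t
actions = rng.integers(0, N, T)      # actions played by some online algorithm

alg_loss = losses[np.arange(T), actions].sum()

# External regret: compare to the single best fixed action in hindsight.
external_regret = alg_loss - losses.sum(axis=0).min()

# Internal regret: for each ordered pair (i, j), compare to the modified
# sequence that plays j at every step where the algorithm played i.
internal_regret = 0.0
for i in range(N):
    steps_i = actions == i
    for j in range(N):
        gain = losses[steps_i, i].sum() - losses[steps_i, j].sum()
        internal_regret = max(internal_regret, gain)
```

Note that the external regret is always at most N times the largest per-pair internal regret, since the comparison against any fixed action decomposes across the actions the algorithm actually played.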
In this paper, we give a simple generic reduction that, given an algorithm for the external regret problem, converts it to an efficient online algorithm for the internal regret problem. We provide methods that work both in the full information model, in which the loss of every action is observed at each time step, and the partial information (bandit) model, where at each time step only the loss of the selected action is observed. The importance of internal regret in game theory is due to the fact that in a general game, if each player has sublinear internal regret, then the empirical frequencies converge to a correlated equilibrium.
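A minimal sketch of this style of reduction in the full information model, using Hedge (multiplicative weights) as the external-regret subroutine: run one copy per action, combine the copies' distributions through a stationary distribution, and charge copy i its loss scaled by the probability of playing i. The learning rate and horizon below are illustrative choices, not the paper's tuned parameters:

```python
import numpy as np

def stationary(Q, iters=200):
    """Power iteration for p satisfying p = p @ Q (Q row-stochastic, positive)."""
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q
        p /= p.sum()
    return p

rng = np.random.default_rng(1)
T, N, eta = 2000, 3, 0.1
W = np.ones((N, N))            # row i: Hedge weights of the i-th copy

swap_gain = np.zeros((N, N))   # running gain of each i -> j modification
for t in range(T):
    loss = rng.random(N)                      # adversary's losses this round
    Q = W / W.sum(axis=1, keepdims=True)      # row i = distribution of copy i
    p = stationary(Q)                         # play p; p = p @ Q ties the copies together
    for i in range(N):
        W[i] *= np.exp(-eta * p[i] * loss)    # copy i sees its loss scaled by p[i]
    swap_gain += p[:, None] * (loss[:, None] - loss[None, :])

internal_regret = swap_gain.max() / T         # per-round internal regret
```

The key step is the fixed point p = pQ: it makes the master's expected loss equal to the combined expected loss of the copies, so each copy's external regret guarantee turns into a guarantee against the corresponding action substitution.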
For external regret we also derive a quantitative regret bound in a very general setting, which includes an arbitrary set of modification rules (each possibly modifying the online algorithm) and an arbitrary set of time selection functions (each giving a different weight to each time step). The regret for a given time selection function and modification rule is the difference between the cost of the online algorithm and the cost of the modified online algorithm, where the costs are weighted by the time selection function. This can be viewed as a generalization of the previously studied sleeping experts setting.
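For example (values illustrative), with a single modification rule that replaces action 0 by action 1 and a time-selection function that is active only on the first half of the rounds, the weighted regret is:

```python
import numpy as np

losses = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.6, 0.4],
                   [0.1, 0.9], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]])
T = len(losses)
actions = np.zeros(T, dtype=int)           # algorithm always plays action 0
I = np.array([1, 1, 1, 1, 0, 0, 0, 0])     # time selection: first half only

modify = np.vectorize(lambda a: 1 if a == 0 else a)   # rule: replace 0 by 1
mod_actions = modify(actions)

t = np.arange(T)
regret = np.sum(I * (losses[t, actions] - losses[t, mod_actions]))
# regret = 0.8 + 0.6 + 0.4 + 0.2 = 2.0 over the selected steps
```

On the unselected second half the modification would have hurt, but the time-selection function zeroes those steps out, which is exactly the flexibility the general setting provides.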
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Blum, A., Mansour, Y. (2005). From External to Internal Regret. In: Auer, P., Meir, R. (eds) Learning Theory. COLT 2005. Lecture Notes in Computer Science, vol 3559. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11503415_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26556-6
Online ISBN: 978-3-540-31892-7