Boundary Crossing Probabilities for General Exponential Families

Maillard, O.-A.

doi:10.3103/S1066530718010015

Published: 11 May 2018

Volume 27, pages 1–31, (2018)

Mathematical Methods of Statistics

Abstract

We consider parametric exponential families of dimension K on the real line. We study a variant of boundary crossing probabilities coming from the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension K. Formally, our result is a concentration inequality that bounds the probability that B^ψ(θ̂_n, θ*) ≥ f(t/n)/n, where θ* is the parameter of an unknown target distribution, θ̂_n is the empirical parameter estimate built from n observations, ψ is the log-partition function of the exponential family, and B^ψ is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function f is logarithmic, as it enables the analysis of the regret of the state-of-the-art KL-ucb and KL-ucb+ strategies, whose analysis was left open in such generality. Indeed, previous results hold only for the case K = 1, while we provide results for arbitrary finite dimension K, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques to achieve these strong results already existed three decades ago in the work of T. L. Lai, and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques, which we believe are useful beyond the application to stochastic multi-armed bandits.
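To make the event in the abstract concrete, the following is a minimal sketch for the simplest case K = 1, using the Bernoulli family, where the natural parameter is θ = log(p/(1−p)) and ψ(θ) = log(1 + e^θ). It rests on several assumptions that are not taken from the article: the argument order of the Bregman divergence (conventions vary; here B^ψ(θ, θ') = ψ(θ') − ψ(θ) − (θ' − θ)ψ'(θ), which equals KL(P_θ ‖ P_θ')), the choice f(x) = log x, and all function names, which are hypothetical.

```python
import math


def log_partition(theta):
    """psi(theta) = log(1 + exp(theta)) for the Bernoulli family (stable form)."""
    return max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))


def mean_from_natural(theta):
    """psi'(theta): the mean p of the Bernoulli with natural parameter theta."""
    return 1.0 / (1.0 + math.exp(-theta))


def natural_from_mean(p):
    """Natural parameter theta = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def bregman(theta_a, theta_b):
    """Bregman divergence of psi under the ASSUMED argument order:
    psi(b) - psi(a) - (b - a) * psi'(a), which equals KL(P_a || P_b)."""
    return (log_partition(theta_b) - log_partition(theta_a)
            - (theta_b - theta_a) * mean_from_natural(theta_a))


def crosses_boundary(p_hat, p_star, n, t, f=math.log):
    """Check the event B(theta_hat_n, theta_star) >= f(t / n) / n.

    p_hat is the empirical mean of n observations, p_star the true mean;
    f defaults to the logarithm (an assumed boundary function, needs t > n).
    """
    divergence = bregman(natural_from_mean(p_hat), natural_from_mean(p_star))
    return divergence >= f(t / n) / n


if __name__ == "__main__":
    # With few samples the threshold log(t/n)/n is large, so no crossing.
    print(crosses_boundary(0.3, 0.5, n=10, t=100))
    # With more samples the same deviation exceeds the threshold.
    print(crosses_boundary(0.3, 0.5, n=100, t=1000))
```

The sketch only evaluates whether the event occurs for given values; it does not compute or bound its probability, which is the subject of the article.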

References

R. Agrawal, “Sample Mean Based Index Policies with O(log n) Regret for the Multi-Armed Bandit Problem”, Adv. in Appl. Probab. 27 (04), 1054–1078 (1995).
J.-Y. Audibert, R. Munos, and Cs. Szepesvári, “Exploration-Exploitation Trade-Off Using Variance Estimates in Multi-Armed Bandits”, Theoret. Comp. Sci. 410 (19), (2009).
J.-Y. Audibert and S. Bubeck, “Regret Bounds and Minimax Policies under Partial Monitoring”, J. Machine Learning Res. 11, 2635–2686 (2010).
P. Auer, N. Cesa-Bianchi, and P. Fischer, “Finite-Time Analysis of the Multiarmed Bandit Problem”, Machine Learning 47 (2), 235–256 (2002).
L. M. Bregman, “The Relaxation Method of Finding the Common Point of Convex Sets and Its Application to the Solution of Problems in Convex Programming”, USSR Comput. Math. and Math. Phys. (Elsevier) 7 (3), 200–217 (1967).
S. Bubeck, N. Cesa-Bianchi, et al., “Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems”, Foundations and Trends® in Machine Learning 5 (1), 1–122 (2012).
A. N. Burnetas and M. N. Katehakis, “Optimal Adaptive Policies for Markov Decision Processes”, in Mathematics of Operations Research (1997), pp. 222–255.
O. Cappé, A. Garivier, O.-A. Maillard, R. Munos, and G. Stoltz, “Kullback–Leibler Upper Confidence Bounds for Optimal Sequential Allocation”, Ann. Statist. 41 (3), 1516–1541 (2013).
Y. S. Chow and H. Teicher, Probability Theory, 2nd ed. (Springer, 1988).
I. H. Dinwoodie, “Mesures dominantes et théorème de Sanov”, in Annales de l’IHP Probabilités et statistiques (1992), Vol. 28, pp. 365–373.
A. Garivier, P. Ménard, and G. Stoltz, Explore First, Exploit Next: The True Shape of Regret in Bandit Problems, arXiv preprint arXiv:1602.07182 (2016).
J. C. Gittins, “Bandit Processes and Dynamic Allocation Indices”, J. Roy. Statist. Soc., Ser. B 41 (2), 148–177 (1979).
J. Honda and A. Takemura, “An Asymptotically Optimal Bandit Algorithm for Bounded Support Models”, in Conf. Comput. Learning Theory, Ed. by T. Kalai and M. Mohri (Haifa, Israel, 2010).
T. L. Lai and H. Robbins, “Asymptotically Efficient Adaptive Allocation Rules”, Advances in Appl. Math. 6 (1), 4–22 (1985).
T. L. Lai, “Adaptive Treatment Allocation and the Multi-Armed Bandit Problem”, Ann. Statist., 1091–1114 (1987).
T. L. Lai, “Boundary Crossing Problems for Sample Means”, Ann. Probab., 375–396 (1988).
O.-A. Maillard, R. Munos, and G. Stoltz, “A Finite-Time Analysis of Multi-Armed Bandits Problems with Kullback–Leibler Divergences”, in Proc. 24th Conference On Learning Theory (Budapest, Hungary), 497–514 (2011).
H. Robbins, “Some Aspects of the Sequential Design of Experiments”, Bull. Amer. Math. Soc. 58 (5), 527–535 (1952).
H. Robbins, Herbert Robbins Selected Papers (Springer, 2012).
W. R. Thompson, “On the Likelihood That One Unknown Probability Exceeds Another in View of the Evidence of Two Samples”, Biometrika 25 (3/4), 285–294 (1933).
W. R. Thompson, “On a Criterion for the Rejection of Observations and the Distribution of the Ratio of Deviation to Sample Standard Deviation”, Ann. Math. Statist. 6 (4), 214–219 (1935).
A. Wald, “Sequential Tests of Statistical Hypotheses”, Ann. Math. Statist. 16 (2), 117–186 (1945).

Author information

Authors and Affiliations

Inria Lille–Nord Europe, Villeneuve d’Ascq, France
O.-A. Maillard

Corresponding author

Correspondence to O.-A. Maillard.

About this article

Cite this article

Maillard, OA. Boundary Crossing Probabilities for General Exponential Families. Math. Meth. Stat. 27, 1–31 (2018). https://doi.org/10.3103/S1066530718010015

Received: 12 June 2017
Accepted: 22 December 2017
Published: 11 May 2018
Issue Date: January 2018
DOI: https://doi.org/10.3103/S1066530718010015
