Abstract
We study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp PAC-Bayesian risk bounds for aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We then apply these results to derive sparsity oracle inequalities.
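The exponential-weighting scheme the abstract refers to can be illustrated with a minimal sketch: each candidate function receives a weight proportional to the exponential of its negative empirical squared risk, and the aggregate is the resulting convex combination. The function name, the temperature parameter `beta`, and the toy data below are illustrative assumptions, not the paper's exact estimator or tuning.

```python
import numpy as np

def exponential_weights_aggregate(predictions, y, beta=1.0):
    """Aggregate candidate predictions by exponential weighting.

    predictions: (M, n) array; row j holds the j-th candidate's
        predictions at the n design points.
    y: (n,) vector of observed responses.
    beta: temperature parameter; smaller values concentrate the
        weights on the empirically best candidates.
    """
    # empirical squared risk of each candidate
    risks = np.mean((predictions - y) ** 2, axis=1)
    # subtract the minimum risk before exponentiating for numerical stability
    w = np.exp(-(risks - risks.min()) / beta)
    w /= w.sum()
    # convex combination of the candidate prediction vectors
    return w @ predictions

# toy usage: three constant predictors for data centered at 1.0
rng = np.random.default_rng(0)
y = 1.0 + 0.1 * rng.standard_normal(50)
preds = np.vstack([np.full(50, c) for c in (0.0, 1.0, 2.0)])
agg = exponential_weights_aggregate(preds, y, beta=0.05)
```

With a small temperature, nearly all weight falls on the predictor closest to the data, so the aggregate tracks the best candidate; larger `beta` spreads the weight and yields a smoother mixture.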
Additional information
Editors: Claudio Gentile, Nader H. Bshouty.
Research partially supported by the grant ANR-06-BLAN-0194 and by the PASCAL Network of Excellence.
Cite this article
Dalalyan, A., Tsybakov, A.B. Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Mach Learn 72, 39–61 (2008). https://doi.org/10.1007/s10994-008-5051-0