Abstract
A broad class of boosting algorithms can be interpreted as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set. This class includes AdaBoost, LogitBoost, and other widely used and well-studied boosters. In this paper we show that for a broad class of convex potential functions, any such boosting algorithm is highly susceptible to random classification noise. We do this by showing that for any such booster and any nonzero random classification noise rate η, there is a simple data set of examples which is efficiently learnable by such a booster if there is no noise, but which cannot be learned to accuracy better than 1/2 if there is random classification noise at rate η. This holds even if the booster regularizes using early stopping or a bound on the L1 norm of the voting weights. This negative result is in contrast with known branching program based boosters which do not fall into the convex potential function framework and which can provably learn to high accuracy in the presence of random classification noise.
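For concreteness, the convex potential framework referred to above can be written in its standard form; the notation below is illustrative rather than quoted from the paper. Given labeled examples (x_1, y_1), …, (x_m, y_m) with y_i ∈ {−1, +1}, base classifiers h_t, and voting weights α_t, such a booster performs coordinate-wise gradient descent on

$$\Phi(\alpha) \;=\; \sum_{i=1}^{m} \phi\big(y_i F_\alpha(x_i)\big), \qquad F_\alpha(x) \;=\; \sum_{t} \alpha_t\, h_t(x),$$

where φ is a convex potential function of the margin y_i F_α(x_i) (the paper delimits the precise class). AdaBoost corresponds to φ(z) = e^{−z} and LogitBoost to φ(z) = ln(1 + e^{−z}); the negative result stated above applies to any booster of this form whose potential falls in that class.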
Additional information
Editor: Avrim Blum.
About this article
Cite this article
Long, P.M., Servedio, R.A. Random classification noise defeats all convex potential boosters. Mach Learn 78, 287–304 (2010). https://doi.org/10.1007/s10994-009-5165-z