Abstract
A broad class of boosting algorithms can be interpreted as performing coordinate-wise gradient descent to minimize some potential function of the margins of a data set. This class includes AdaBoost, LogitBoost, and other widely used and well-studied boosters. In this paper we show that for a broad class of convex potential functions, any such boosting algorithm is highly susceptible to random classification noise. We do this by showing that for any such booster and any nonzero random classification noise rate η, there is a simple data set of examples which is efficiently learnable by such a booster if there is no noise, but which cannot be learned to accuracy better than 1/2 if there is random classification noise at rate η. This holds even if the booster regularizes using early stopping or a bound on the L1 norm of the voting weights. This negative result is in contrast with known branching program based boosters which do not fall into the convex potential function framework and which can provably learn to high accuracy in the presence of random classification noise.
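For concreteness, the convex potential framework referred to above can be written in its standard form; the notation below is illustrative rather than quoted from the paper. Given labeled examples (x_1, y_1), …, (x_m, y_m) with y_i ∈ {−1, +1}, base classifiers h_t, and voting weights α_t, such a booster performs coordinate-wise gradient descent on

$$\Phi(\alpha) \;=\; \sum_{i=1}^{m} \phi\big(y_i F_\alpha(x_i)\big), \qquad F_\alpha(x) \;=\; \sum_{t} \alpha_t\, h_t(x),$$

where φ is a convex potential function of the margin y_i F_α(x_i) (the paper delimits the precise class). AdaBoost corresponds to φ(z) = e^{−z} and LogitBoost to φ(z) = ln(1 + e^{−z}); the negative result stated above applies to any booster of this form whose potential falls in that class.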
Additional information
Editor: Avrim Blum.
About this article
Cite this article
Long, P.M., Servedio, R.A. Random classification noise defeats all convex potential boosters. Mach Learn 78, 287–304 (2010). https://doi.org/10.1007/s10994-009-5165-z