Abstract
We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is “similar” to a training sample, then the testing error is close to the training error. This provides a novel approach, different from complexity or stability arguments, to study generalization of learning algorithms. One advantage of the robustness approach, compared to previous methods, is the geometric intuition it conveys. Consequently, robustness-based analysis is easy to extend to learning in non-standard setups such as Markovian samples or quantile loss. We further show that a weak notion of robustness is both sufficient and necessary for generalizability, which implies that robustness is a fundamental property that is required for learning algorithms to work.
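In outline, the robustness notion and the flavor of the resulting bound are as follows (a sketch; the precise statement, including the required assumptions and constants, is in the body of the paper). An algorithm $\mathcal{A}$ trained on a sample set $\mathbf{s}$ of size $n$ is $(K,\epsilon(\cdot))$-robust if the sample space $\mathcal{Z}$ can be partitioned into $K$ disjoint sets $C_1,\dots,C_K$ such that a training sample and a testing sample falling in the same set incur similar losses:
\[
s \in \mathbf{s},\; z \in \mathcal{Z},\; s,z \in C_i
\;\Longrightarrow\;
\bigl|\ell(\mathcal{A}_{\mathbf{s}}, s) - \ell(\mathcal{A}_{\mathbf{s}}, z)\bigr| \le \epsilon(\mathbf{s}).
\]
For a loss bounded by $M$ and i.i.d. samples, robustness then yields, with probability at least $1-\delta$,
\[
\bigl|L(\mathcal{A}_{\mathbf{s}}) - L_{\mathrm{emp}}(\mathcal{A}_{\mathbf{s}})\bigr|
\le \epsilon(\mathbf{s}) + M\sqrt{\frac{2K\ln 2 + 2\ln(1/\delta)}{n}},
\]
where $L$ and $L_{\mathrm{emp}}$ denote the expected and empirical losses, respectively.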
Editor: Phil Long.
Cite this article
Xu, H., Mannor, S. Robustness and generalization. Mach Learn 86, 391–423 (2012). https://doi.org/10.1007/s10994-011-5268-1