Abstract
This is a short introduction to three papers on robustness, published by Peter Bickel as single author in the period 1975–1984: “One-step Huber estimates in the linear model” (Bickel 1975), “Parametric robustness: small biases can be worthwhile” (Bickel 1984a), and “Robust regression based on infinitesimal neighbourhoods” (Bickel 1984b). It was the time when fundamental developments and understanding in robustness took place, and Peter Bickel has made deep contributions in this area. I am trying to place the results of the three papers in a new context of contemporary statistics.
2.1 Introduction to Three Papers on Robustness
2.1.1 General Introduction
This is a short introduction to three papers on robustness, published by Peter Bickel as single author in the period 1975–1984: “One-step Huber estimates in the linear model” (Bickel 1975), “Parametric robustness: small biases can be worthwhile” (Bickel 1984a), and “Robust regression based on infinitesimal neighbourhoods” (Bickel 1984b). It was the time when fundamental developments and understanding in robustness took place, and Peter Bickel has made deep contributions in this area. I am trying to place the results of the three papers in a new context of contemporary statistics.
2.1.2 One-Step Huber Estimates in the Linear Model
The paper by Bickel (1975) is about the following procedure. Given a \(\sqrt{n}\)-consistent initial estimator \(\tilde{\theta }\) for an unknown parameter θ, performing one Gauss-Newton iteration with respect to the objective function to be optimized leads to an asymptotically efficient estimator. Interestingly, this result holds even when the MLE is not efficient, and it is equivalent to the MLE if the latter is efficient. Such a result was known for the case where the loss function corresponds to the maximum likelihood estimator (Le Cam 1956). Bickel (1975) extends this result to much more general loss functions and models.
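To illustrate the one-step idea, the following is a minimal numerical sketch (my own illustration, not the construction in Bickel 1975): starting from a crude but \(\sqrt{n}\)-consistent estimator, one Newton-type step with Huber's ψ-function is taken in the linear model. The function names, the scale estimate, and the tuning constant are assumptions made purely for this example.

```python
# Minimal sketch of a one-step Huber-type estimator in the linear model
# y = X theta + error; illustrative only, not Bickel's exact construction.
import numpy as np

def huber_psi(r, c=1.345):
    """Huber's psi function: identity on [-c, c], clipped outside."""
    return np.clip(r, -c, c)

def one_step_huber(X, y, theta_init, scale, c=1.345):
    """One Newton-type step from a sqrt(n)-consistent initial estimator theta_init.

    scale is a robust estimate of the residual scale (e.g., the MAD of the
    initial residuals).
    """
    r = (y - X @ theta_init) / scale
    psi = huber_psi(r, c)
    # Simple estimate of E[psi'(r)]: the fraction of residuals in the linear part.
    psi_prime_mean = np.mean(np.abs(r) <= c)
    # One step: theta_init + (X'X)^{-1} X' psi(r) * scale / E[psi']
    step = np.linalg.solve(X.T @ X, X.T @ psi) * scale / psi_prime_mean
    return theta_init + step

# Example with heavy-tailed errors and a crude least-squares starting value.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.standard_t(df=2, size=n)
theta0 = np.linalg.lstsq(X, y, rcond=None)[0]
s = 1.4826 * np.median(np.abs(y - X @ theta0))
print(one_step_huber(X, y, theta0, s))
```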
The idea of a computational short-cut without sacrificing statistical efficiency was relevant more than 30 years ago (summary point 5 in Sect. 3 of Bickel 1975). Yet, the idea is still very important in large-scale and high-dimensional applications nowadays. Two issues emerge.
In some large-scale problems, one is willing to pay a price in terms of statistical accuracy while gaining substantially with respect to computing power. Peter Bickel has recently co-authored a paper on this subject (Meinshausen et al. 2009): having some sort of guarantee on statistical accuracy is then highly desirable. Results as in Bickel (1975), probably in a weaker form that does not touch on the concept of efficiency, are underdeveloped for large-scale problems.
The other issue concerns the fact that iterations in algorithms correspond to some form of (algorithmic) regularization which is often very effective for large datasets. A prominent example is boosting: instead of a Gauss-Newton step, boosting proceeds with Gauss-Southwell iterations which are coordinatewise updates based on an n-dimensional approximate gradient vector (where n denotes sample size). It is known, at least for some cases, that boosting with such Gauss-Southwell iterations achieves minimax convergence rate optimality (Bühlmann and Yu 2003; Bissantz et al. 2007) while being computationally attractive. Furthermore, in view of robustness, boosting can be easily modified such that each Gauss-Southwell update is performed in a robust way and hence, the overall procedure has desirable robustness properties (Lutz et al. 2008). As discussed in Sect. 3 of Bickel (1975), the starting value (i.e., the initial estimator) matters also in robustified boosting.
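As an illustration of the Gauss-Southwell update rule mentioned above, the sketch below implements plain componentwise L2Boosting for a linear model; it is a simplified stand-in, not the implementation of Bühlmann and Yu (2003). A robustified variant in the spirit of Lutz et al. (2008) would, roughly speaking, replace the residuals by a bounded ψ-transform of the residuals before each coordinatewise fit.

```python
# Sketch of componentwise L2Boosting (Gauss-Southwell updates): at each
# iteration the current residual vector is fitted by the single best
# predictor, and only that coordinate is updated by a small step nu.
import numpy as np

def l2_boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2Boosting for a linear model; returns intercept and coefficients."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept                      # residuals of the intercept-only fit
    col_norm2 = np.sum(X ** 2, axis=0)
    for _ in range(n_steps):
        coefs = (X.T @ resid) / col_norm2      # least-squares fit per coordinate
        rss_drop = coefs * (X.T @ resid)       # decrease in RSS per coordinate
        j = int(np.argmax(rss_drop))           # Gauss-Southwell choice of coordinate
        beta[j] += nu * coefs[j]               # shrunken coordinatewise update
        resid -= nu * coefs[j] * X[:, j]
    return intercept, beta
```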
2.1.3 Parametric Robustness: Small Biases Can Be Worthwhile
The following problem is studied in Bickel (1984a): construct an estimator that performs well for a particular parametric model \({\mathcal{M}}_{0}\) while its risk is upper-bounded for another, larger parametric model \({\mathcal{M}}_{1} \supset {\mathcal{M}}_{0}\). As an interpretation, one believes that \({\mathcal{M}}_{0}\) is adequate but one wants to guard against deviations coming from \({\mathcal{M}}_{1}\). It is shown in the paper that the corresponding optimality problem does not have an explicit solution; however, approximate answers are presented and interesting connections are developed to the Efron-Morris (Efron and Morris 1971) family of translation estimates, i.e., adding a soft-thresholded additional correction term to the optimal estimator under \({\mathcal{M}}_{0}\). (The reference Efron and Morris (1971) appears in the text but is missing from the list of references in Bickel’s paper.)
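A schematic version of such a translation-type estimator (the notation is mine, not the paper's) adds a soft-thresholded correction, pointing from the estimator that is optimal under \({\mathcal{M}}_{0}\) towards the one that would be used under \({\mathcal{M}}_{1}\):

```latex
% Schematic translation-type estimator; notation illustrative, not Bickel's.
\[
  \hat\theta \;=\; \hat\theta_{\mathcal{M}_0}
  \;+\; \operatorname{sign}(\hat\Delta)\,\bigl(|\hat\Delta| - \lambda\bigr)_{+},
  \qquad
  \hat\Delta \;=\; \hat\theta_{\mathcal{M}_1} - \hat\theta_{\mathcal{M}_0} .
\]
% Small estimated deviations from M_0 are ignored (accepting a small bias
% towards M_0), while larger deviations are followed up to a bounded shift.
```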
The notion of parametric robustness could be interesting in high-dimensional problems. Guarding against specific deviations (which may be easier to specify in some applications than in others) can be more powerful than trying to protect nonparametrically against point-mass distributions in any direction. In this sense, this paper is a key reference for developing effective high-dimensional robust inference.
2.1.4 Robust Regression Based on Infinitesimal Neighbourhoods
Robust regression is analyzed in Bickel (1984b) within an elegant mathematical framework where the perturbation lies in a \(1/\sqrt{n}\)-neighbourhood of the uncontaminated ideal model. The results in Bickel (1984b) give a clear (mathematical) interpretation of various procedures and suggest new robust methods for regression.
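In schematic form (again, the notation is mine rather than the paper's), the contamination model is a neighbourhood that shrinks at the rate \(1/\sqrt{n}\): at sample size n the data-generating distribution may be, for example, a mixture of the form

```latex
% Shrinking (infinitesimal) contamination neighbourhood; schematic notation.
\[
  P_n \;=\; \Bigl(1 - \tfrac{\varepsilon}{\sqrt{n}}\Bigr) P_{\theta}
        \;+\; \tfrac{\varepsilon}{\sqrt{n}}\, Q_n ,
  \qquad \varepsilon > 0 \ \text{fixed}, \quad Q_n \ \text{arbitrary},
\]
% so that the bias induced by the contamination and the stochastic error of
% regular estimators are of the same order 1/sqrt(n), which makes an
% asymptotic bias-variance trade-off possible.
```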
A major issue in robust regression is to guard against contaminations in X-space. Bickel (1984b) gives nice insights for the classical case where the dimension of X is relatively small: a new challenge is to deal with robustness in high-dimensional regression problems where the dimension of X can be much larger than sample size. One attempt has been to robustify high-dimensional estimators such as the Lasso (Khan et al. 2007) or L2Boosting (Lutz et al. 2008), in particular with respect to contaminations in X-space. An interesting and different path has been initiated by Friedman (2001) with tree-based procedures which are robust in X-space (in connection with a robust loss function for the error). There is clearly a need for a unifying theory, in the spirit of Bickel (1984b), for robust regression when the dimension of X is large.
References
Begun JM, Hall WJ, Huang W-M, Wellner JA (1983) Information and asymptotic efficiency in parametric–nonparametric models. Ann Stat 11(2):432–452
Beran R (1974) Asymptotically efficient adaptive rank estimates in location models. Ann Stat 2:63–74
Bickel P (1975) One-step Huber estimates in the linear model. J Am Stat Assoc 70:428–434
Bickel PJ (1982) On adaptive estimation. Ann Stat 10(3):647–671
Bickel P (1984a) Parametric robustness: small biases can be worthwhile. Ann Stat 12:864–879
Bickel P (1984b) Robust regression based on infinitesimal neighbourhoods. Ann Stat 12:1349–1368
Bickel PJ, Klaassen CAJ (1986) Empirical Bayes estimation in functional and structural models, and uniformly adaptive estimation of location. Adv Appl Math 7(1):55–69
Bickel PJ, Ritov Y (1987) Efficient estimation in the errors in variables model. Ann Stat 15(2):513–540
Bickel PJ, Ritov Y (1988) Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhyā Ser A 50(3):381–393
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins series in the mathematical sciences. Johns Hopkins University Press, Baltimore
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1998) Efficient and adaptive estimation for semiparametric models. Springer, New York. Reprint of the 1993 original
Birgé L, Massart P (1993) Rates of convergence for minimum contrast estimators. Probab Theory Relat Fields 97(1–2):113–150
Birgé L, Massart P (1995) Estimation of integral functionals of a density. Ann Stat 23(1):11–29
Bissantz N, Hohage T, Munk A, Ruymgaart F (2007) Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J Numer Anal 45:2610–2636
Bühlmann P, Yu B (2003) Boosting with the L2 loss: regression and classification. J Am Stat Assoc 98:324–339
Efron B (1977) The efficiency of Cox’s likelihood function for censored data. J Am Stat Assoc 72(359):557–565
Efron B, Morris C (1971) Limiting the risk of Bayes and empirical Bayes estimators – part I: Bayes case. J Am Stat Assoc 66:807–815
Friedman J (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Hájek J (1962) Asymptotically most powerful rank-order tests. Ann Math Stat 33:1124–1147
Khan J, Van Aelst S, Zamar R (2007) Robust linear model selection based on least angle regression. J Am Stat Assoc 102:1289–1299
Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27:887–906
Klaassen CAJ (1987) Consistent estimation of the influence function of locally asymptotically linear estimators. Ann Stat 15(4):1548–1562
Kosorok MR (2009) What’s so special about semiparametric methods? Sankhyā 71(2, Ser A): 331–353
Laurent B, Massart P (2000) Adaptive estimation of a quadratic functional by model selection. Ann Stat 28(5):1302–1338
Le Cam L (1956) On the asymptotic theory of estimation and testing hypotheses. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, Berkeley, pp 129–156
Lutz R, Kalisch M, Bühlmann P (2008) Robustified L2 boosting. Comput Stat Data Anal 52:3331–3341
Meinshausen N, Bickel P, Rice J (2009) Efficient blind search: optimal power of detection under computational cost constraint. Ann Appl Stat 3:38–60
Murphy SA, van der Vaart AW (1996) Likelihood inference in the errors-in-variables model. J Multivar Anal 59(1):81–108
Neyman J, Scott EL (1948) Consistent estimates based on partially consistent observations. Econometrica 16:1–32
Pfanzagl J (1990a) Estimation in semiparametric models: some recent developments. Lecture notes in statistics, vol 63. Springer, New York
Pfanzagl J (1990b) Large deviation probabilities for certain nonparametric maximum likelihood estimators. Ann Stat 18(4):1868–1877
Pfanzagl J (1993) Incidental versus random nuisance parameters. Ann Stat 21(4):1663–1691
Reiersol O (1950) Identifiability of a linear relation between variables which are subject to error. Econometrica 18:375–389
Ritov Y, Bickel PJ (1990) Achieving information bounds in non and semiparametric models. Ann Stat 18(2):925–938
Robins J, Tchetgen Tchetgen E, Li L, van der Vaart A (2009) Semiparametric minimax rates. Electron J Stat 3:1305–1321
Schick A (1986) On asymptotically efficient estimation in semiparametric models. Ann Stat 14(3):1139–1151
Stein C (1956) Efficient nonparametric testing and estimation. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability 1954–1955, vol. I. University of California Press, Berkeley/Los Angeles, pp 187–195
Stone CJ (1975) Adaptive maximum likelihood estimators of a location parameter. Ann Stat 3:267–284
Strasser H (1996) Asymptotic efficiency of estimates for models with incidental nuisance parameters. Ann Stat 24(2):879–901
Tchetgen E, Li L, Robins J, van der Vaart A (2008) Minimax estimation of the integral of a power of a density. Stat Probab Lett 78(18):3307–3311
van der Vaart AW (1988) Estimating a real parameter in a class of semiparametric models. Ann Stat 16(4):1450–1474
van der Vaart A (1991) On differentiable functionals. Ann Stat 19(1):178–204
van der Vaart A (1996) Efficient maximum likelihood estimation in semiparametric mixture models. Ann Stat 24(2):862–878
van Eeden C (1970) Efficiency-robust estimation of location. Ann Math Stat 41:172–181
Wellner JA, Klaassen CAJ, Ritov Y (2006) Semiparametric models: a review of progress since BKRW (1993). In: Frontiers in statistics. Imperial College Press, London, pp 25–44