Abstract
This is a short introduction to three papers on robustness, published by Peter Bickel as single author in the period 1975–1984: “One-step Huber estimates in the linear model” (Bickel 1975), “Parametric robustness: small biases can be worthwhile” (Bickel 1984a), and “Robust regression based on infinitesimal neighbourhoods” (Bickel 1984b). It was the time when fundamental developments and understanding in robustness took place, and Peter Bickel has made deep contributions in this area. I am trying to place the results of the three papers in a new context of contemporary statistics.
2.1 Introduction to Three Papers on Robustness
2.1.1 General Introduction
This is a short introduction to three papers on robustness, published by Peter Bickel as single author in the period 1975–1984: “One-step Huber estimates in the linear model” (Bickel 1975), “Parametric robustness: small biases can be worthwhile” (Bickel 1984a), and “Robust regression based on infinitesimal neighbourhoods” (Bickel 1984b). It was the time when fundamental developments and understanding in robustness took place, and Peter Bickel has made deep contributions in this area. I am trying to place the results of the three papers in a new context of contemporary statistics.
2.1.2 One-Step Huber Estimates in the Linear Model
The paper by Bickel (1975) is about the following procedure. Given a \(\sqrt{n}\)-consistent initial estimator \(\tilde{\theta }\) for an unknown parameter θ, performing one Gauss-Newton iteration with respect to the objective function to be optimized leads to an asymptotically efficient estimator. Interestingly, this result holds even when the MLE is not efficient, and it is equivalent to the MLE if the latter is efficient. Such a result was known for the case where the loss function corresponds to the maximum likelihood estimator (Le Cam 1956). Bickel (1975) extends this result to much more general loss functions and models.
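To illustrate the one-step idea, the following is a minimal numerical sketch (my own illustration, not the construction in Bickel 1975): starting from a crude but \(\sqrt{n}\)-consistent estimator, one Newton-type step with Huber's ψ-function is taken in the linear model. The function names, the scale estimate, and the tuning constant are assumptions made purely for this example.

```python
# Minimal sketch of a one-step Huber-type estimator in the linear model
# y = X theta + error; illustrative only, not Bickel's exact construction.
import numpy as np

def huber_psi(r, c=1.345):
    """Huber's psi function: identity on [-c, c], clipped outside."""
    return np.clip(r, -c, c)

def one_step_huber(X, y, theta_init, scale, c=1.345):
    """One Newton-type step from a sqrt(n)-consistent initial estimator theta_init.

    scale is a robust estimate of the residual scale (e.g., the MAD of the
    initial residuals).
    """
    r = (y - X @ theta_init) / scale
    psi = huber_psi(r, c)
    # Simple estimate of E[psi'(r)]: the fraction of residuals in the linear part.
    psi_prime_mean = np.mean(np.abs(r) <= c)
    # One step: theta_init + (X'X)^{-1} X' psi(r) * scale / E[psi']
    step = np.linalg.solve(X.T @ X, X.T @ psi) * scale / psi_prime_mean
    return theta_init + step

# Example with heavy-tailed errors and a crude least-squares starting value.
rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.standard_t(df=2, size=n)
theta0 = np.linalg.lstsq(X, y, rcond=None)[0]
s = 1.4826 * np.median(np.abs(y - X @ theta0))
print(one_step_huber(X, y, theta0, s))
```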
The idea of a computational short-cut without sacrificing statistical efficiency was relevant more than 30 years ago (summary point 5 in Sect. 3 of Bickel 1975). Yet, the idea is still very important in large-scale and high-dimensional applications nowadays. Two issues emerge.
In some large-scale problems, one is willing to pay a price in terms of statistical accuracy while gaining substantially with respect to computing power. Peter Bickel has recently co-authored a paper on this subject (Meinshausen et al. 2009): having some sort of guarantee on statistical accuracy is then highly desirable. Results as in Bickel (1975), probably in a weaker form that does not touch on the concept of efficiency, are underdeveloped for large-scale problems.
The other issue concerns the fact that iterations in algorithms correspond to some form of (algorithmic) regularization which is often very effective for large datasets. A prominent example is boosting: instead of a Gauss-Newton step, boosting proceeds with Gauss-Southwell iterations which are coordinatewise updates based on an n-dimensional approximate gradient vector (where n denotes sample size). It is known, at least for some cases, that boosting with such Gauss-Southwell iterations achieves minimax convergence rate optimality (Bühlmann and Yu 2003; Bissantz et al. 2007) while being computationally attractive. Furthermore, in view of robustness, boosting can be easily modified such that each Gauss-Southwell update is performed in a robust way and hence, the overall procedure has desirable robustness properties (Lutz et al. 2008). As discussed in Sect. 3 of Bickel (1975), the starting value (i.e., the initial estimator) matters also in robustified boosting.
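As an illustration of the Gauss-Southwell update rule mentioned above, the sketch below implements plain componentwise L2Boosting for a linear model; it is a simplified stand-in, not the implementation of Bühlmann and Yu (2003). A robustified variant in the spirit of Lutz et al. (2008) would, roughly speaking, replace the residuals by a bounded ψ-transform of the residuals before each coordinatewise fit.

```python
# Sketch of componentwise L2Boosting (Gauss-Southwell updates): at each
# iteration the current residual vector is fitted by the single best
# predictor, and only that coordinate is updated by a small step nu.
import numpy as np

def l2_boost(X, y, n_steps=100, nu=0.1):
    """Componentwise L2Boosting for a linear model; returns intercept and coefficients."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    resid = y - intercept                      # residuals of the intercept-only fit
    col_norm2 = np.sum(X ** 2, axis=0)
    for _ in range(n_steps):
        coefs = (X.T @ resid) / col_norm2      # least-squares fit per coordinate
        rss_drop = coefs * (X.T @ resid)       # decrease in RSS per coordinate
        j = int(np.argmax(rss_drop))           # Gauss-Southwell choice of coordinate
        beta[j] += nu * coefs[j]               # shrunken coordinatewise update
        resid -= nu * coefs[j] * X[:, j]
    return intercept, beta
```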
2.1.3 Parametric Robustness: Small Biases Can Be Worthwhile
The following problem is studied in Bickel (1984a): construct an estimator that performs well for a particular parametric model \({\mathcal{M}}_{0}\) while its risk is upper-bounded for another, larger parametric model \({\mathcal{M}}_{1} \supset {\mathcal{M}}_{0}\). As an interpretation, one believes that \({\mathcal{M}}_{0}\) is adequate but one wants to guard against deviations coming from \({\mathcal{M}}_{1}\). It is shown in the paper that the corresponding optimality problem does not have an explicit solution; however, approximate answers are presented and interesting connections are developed to the Efron-Morris (Efron and Morris 1971) family of translation estimates, i.e., adding a soft-thresholded additional correction term to the optimal estimator under \({\mathcal{M}}_{0}\). (The reference Efron and Morris (1971) appears in the text but is missing from the list of references in Bickel’s paper.)
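A schematic version of such a translation-type estimator (the notation is mine, not the paper's) adds a soft-thresholded correction, pointing from the estimator that is optimal under \({\mathcal{M}}_{0}\) towards the one that would be used under \({\mathcal{M}}_{1}\):

```latex
% Schematic translation-type estimator; notation illustrative, not Bickel's.
\[
  \hat\theta \;=\; \hat\theta_{\mathcal{M}_0}
  \;+\; \operatorname{sign}(\hat\Delta)\,\bigl(|\hat\Delta| - \lambda\bigr)_{+},
  \qquad
  \hat\Delta \;=\; \hat\theta_{\mathcal{M}_1} - \hat\theta_{\mathcal{M}_0} .
\]
% Small estimated deviations from M_0 are ignored (accepting a small bias
% towards M_0), while larger deviations are followed up to a bounded shift.
```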
The notion of parametric robustness could be interesting in high-dimensional problems. Guarding against specific deviations (which may be easier to specify in some applications than in others) can be more powerful than trying to protect nonparametrically against point-mass distributions in any direction. In this sense, this paper is a key reference for developing effective high-dimensional robust inference.
2.1.4 Robust Regression Based on Infinitesimal Neighbourhoods
Robust regression is analyzed in Bickel (1984b) within an elegant mathematical framework where the perturbation lies in a \(1/\sqrt{n}\)-neighbourhood of the uncontaminated ideal model. The results in Bickel (1984b) give a clear (mathematical) interpretation of various procedures and suggest new robust methods for regression.
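In schematic form (again, the notation is mine rather than the paper's), the contamination model is a neighbourhood that shrinks at the rate \(1/\sqrt{n}\): at sample size n the data-generating distribution may be, for example, a mixture of the form

```latex
% Shrinking (infinitesimal) contamination neighbourhood; schematic notation.
\[
  P_n \;=\; \Bigl(1 - \tfrac{\varepsilon}{\sqrt{n}}\Bigr) P_{\theta}
        \;+\; \tfrac{\varepsilon}{\sqrt{n}}\, Q_n ,
  \qquad \varepsilon > 0 \ \text{fixed}, \quad Q_n \ \text{arbitrary},
\]
% so that the bias induced by the contamination and the stochastic error of
% regular estimators are of the same order 1/sqrt(n), which makes an
% asymptotic bias-variance trade-off possible.
```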
A major issue in robust regression is to guard against contaminations in X-space. Bickel (1984b) gives nice insights for the classical case where the dimension of X is relatively small: a new challenge is to deal with robustness in high-dimensional regression problems where the dimension of X can be much larger than sample size. One attempt has been to robustify high-dimensional estimators such as the Lasso (Khan et al. 2007) or L2Boosting (Lutz et al. 2008), in particular with respect to contaminations in X-space. An interesting and different path has been initiated by Friedman (2001) with tree-based procedures which are robust in X-space (in connection with a robust loss function for the error). There is clearly a need for a unifying theory, in the spirit of Bickel (1984b), for robust regression when the dimension of X is large.
References
Begun JM, Hall WJ, Huang W-M, Wellner JA (1983) Information and asymptotic efficiency in parametric–nonparametric models. Ann Stat 11(2):432–452
Beran R (1974) Asymptotically efficient adaptive rank estimates in location models. Ann Stat 2:63–74
Bickel P (1975) One-step Huber estimates in the linear model. J Am Stat Assoc 70:428–434
Bickel PJ (1982) On adaptive estimation. Ann Stat 10(3):647–671
Bickel P (1984a) Parametric robustness: small biases can be worthwhile. Ann Stat 12:864–879
Bickel P (1984b) Robust regression based on infinitesimal neighbourhoods. Ann Stat 12:1349–1368
Bickel PJ, Klaassen CAJ (1986) Empirical Bayes estimation in functional and structural models, and uniformly adaptive estimation of location. Adv Appl Math 7(1):55–69
Bickel PJ, Ritov Y (1987) Efficient estimation in the errors in variables model. Ann Stat 15(2):513–540
Bickel PJ, Ritov Y (1988) Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhyā Ser A 50(3):381–393
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins series in the mathematical sciences. Johns Hopkins University Press, Baltimore
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1998) Efficient and adaptive estimation for semiparametric models. Springer, New York. Reprint of the 1993 original
Birgé L, Massart P (1993) Rates of convergence for minimum contrast estimators. Probab Theory Relat Fields 97(1–2):113–150
Birgé L, Massart P (1995) Estimation of integral functionals of a density. Ann Stat 23(1):11–29
Bissantz N, Hohage T, Munk A, Ruymgaart F (2007) Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J Numer Anal 45:2610–2636
Bühlmann P, Yu B (2003) Boosting with the L2 loss: regression and classification. J Am Stat Assoc 98:324–339
Efron B (1977) The efficiency of Cox’s likelihood function for censored data. J Am Stat Assoc 72(359):557–565
Efron B, Morris C (1971) Limiting the risk of Bayes and empirical Bayes estimators – part I: Bayes case. J Am Stat Assoc 66:807–815
Friedman J (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29:1189–1232
Hájek J (1962) Asymptotically most powerful rank-order tests. Ann Math Stat 33:1124–1147
Khan J, Van Aelst S, Zamar R (2007) Robust linear model selection based on least angle regression. J Am Stat Assoc 102:1289–1299
Kiefer J, Wolfowitz J (1956) Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann Math Stat 27:887–906
Klaassen CAJ (1987) Consistent estimation of the influence function of locally asymptotically linear estimators. Ann Stat 15(4):1548–1562
Kosorok MR (2009) What’s so special about semiparametric methods? Sankhyā 71(2, Ser A): 331–353
Laurent B, Massart P (2000) Adaptive estimation of a quadratic functional by model selection. Ann Stat 28(5):1302–1338
Le Cam L (1956) On the asymptotic theory of estimation and testing hypotheses. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, Berkeley, pp 129–156
Lutz R, Kalisch M, Bühlmann P (2008) Robustified L2 boosting. Comput Stat Data Anal 52:3331–3341
Meinshausen N, Bickel P, Rice J (2009) Efficient blind search: optimal power of detection under computational cost constraint. Ann Appl Stat 3:38–60
Murphy SA, van der Vaart AW (1996) Likelihood inference in the errors-in-variables model. J Multivar Anal 59(1):81–108
Neyman J, Scott EL (1948) Consistent estimates based on partially consistent observations. Econometrica 16:1–32
Pfanzagl J (1990a) Estimation in semiparametric models: some recent developments. Lecture notes in statistics, vol 63. Springer, New York
Pfanzagl J (1990b) Large deviation probabilities for certain nonparametric maximum likelihood estimators. Ann Stat 18(4):1868–1877
Pfanzagl J (1993) Incidental versus random nuisance parameters. Ann Stat 21(4):1663–1691
Reiersol O (1950) Identifiability of a linear relation between variables which are subject to error. Econometrica 18:375–389
Ritov Y, Bickel PJ (1990) Achieving information bounds in non and semiparametric models. Ann Stat 18(2):925–938
Robins J, Tchetgen Tchetgen E, Li L, van der Vaart A (2009) Semiparametric minimax rates. Electron J Stat 3:1305–1321
Schick A (1986) On asymptotically efficient estimation in semiparametric models. Ann Stat 14(3):1139–1151
Stein C (1956) Efficient nonparametric testing and estimation. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability 1954–1955, vol. I. University of California Press, Berkeley/Los Angeles, pp 187–195
Stone CJ (1975) Adaptive maximum likelihood estimators of a location parameter. Ann Stat 3:267–284
Strasser H (1996) Asymptotic efficiency of estimates for models with incidental nuisance parameters. Ann Stat 24(2):879–901
Tchetgen E, Li L, Robins J, van der Vaart A (2008) Minimax estimation of the integral of a power of a density. Stat Probab Lett 78(18):3307–3311
van der Vaart AW (1988) Estimating a real parameter in a class of semiparametric models. Ann Stat 16(4):1450–1474
van der Vaart A (1991) On differentiable functionals. Ann Stat 19(1):178–204
van der Vaart A (1996) Efficient maximum likelihood estimation in semiparametric mixture models. Ann Stat 24(2):862–878
van Eeden C (1970) Efficiency-robust estimation of location. Ann Math Stat 41:172–181
Wellner JA, Klaassen CAJ, Ritov Y (2006) Semiparametric models: a review of progress since BKRW (1993). In: Frontiers in statistics. Imperial College Press, London, pp 25–44