Trimming algorithms for clustering contaminated grouped data and their robustness

Gallegos, María Teresa; Ritter, Gunter

doi:10.1007/s11634-009-0044-9

Regular Article
Published: 02 September 2009

Volume 3, pages 135–167 (2009)

Advances in Data Analysis and Classification

Abstract

We establish an affine equivariant, constrained heteroscedastic model and criterion with trimming for clustering contaminated, grouped data. We show the existence of the maximum likelihood estimator, propose a method for determining an appropriate constraint, and design a strategy for finding reasonable clusterings. Finally, we compute breakdown points of the estimated parameters, thereby showing asymptotic robustness of the method.

Author information

Authors and Affiliations

Institute for Data Analysis, Salzweg, Germany
María Teresa Gallegos
Fakultät für Informatik und Mathematik, Universität Passau, Passau, Germany
Gunter Ritter

Corresponding author

Correspondence to Gunter Ritter.

About this article

Cite this article

Gallegos, M.T., Ritter, G. Trimming algorithms for clustering contaminated grouped data and their robustness. Adv Data Anal Classif 3, 135–167 (2009). https://doi.org/10.1007/s11634-009-0044-9

Received: 23 May 2009
Revised: 08 August 2009
Accepted: 16 August 2009
Published: 02 September 2009
Issue Date: September 2009
DOI: https://doi.org/10.1007/s11634-009-0044-9
