Abstract
Model-based approaches to cluster analysis and mixture modelling often involve maximizing classification and mixture likelihoods. Robust clustering and mixture modelling procedures, which can resist a certain amount of contaminating data, can be obtained by considering trimmed versions of those classification and mixture likelihoods. Without appropriate constraints on the scatter matrices of the components, these trimmed likelihood maximizations are ill-posed problems. Moreover, uninteresting or “spurious” clusters are often detected by unconstrained algorithms aimed at maximizing these trimmed likelihood criteria.
A useful approach to avoiding spurious solutions is to restrict the relative scatter of the components by means of prespecified tuning constants. New methodologies for constrained parsimonious model-based clustering have recently been introduced which include, as limit cases in the untrimmed setting, the 14 parsimonious models that are often applied in model-based clustering when assuming normal components. In this paper we extend this approach to cope with the presence of atypical observations and discuss two viable strategies for automatically estimating the restriction parameters.
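To make the ideas in the abstract concrete, the following is a minimal sketch in Python, not the authors' FSDA implementation. It assumes univariate normal components, so that each "scatter matrix" reduces to a single variance. The function names, the rule of trimming floor(alpha * n) observations, and the toy data are all illustrative assumptions. It evaluates a trimmed classification log-likelihood, shows that the criterion can be inflated without bound by shrinking one variance, and checks a simple variance-ratio restriction with a tuning constant c.

```python
import math


def normal_logpdf(x, mean, var):
    """Log-density of a univariate normal distribution."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def trimmed_classification_loglik(data, weights, means, variances, alpha):
    """Trimmed classification log-likelihood (illustrative convention).

    Each observation contributes the largest value of
    log(weight_j) + log-density under component j. The floor(alpha * n)
    smallest contributions are discarded (trimmed) before summing.
    """
    contributions = [
        max(
            math.log(w) + normal_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)
        )
        for x in data
    ]
    n_trim = math.floor(alpha * len(data))  # assumed trimming rule
    kept = sorted(contributions, reverse=True)[: len(data) - n_trim]
    return sum(kept)


def satisfies_ratio_constraint(variances, c):
    """True if the largest variance is at most c times the smallest."""
    return max(variances) / min(variances) <= c


# Toy data (hypothetical): two groups near 0 and 5, plus one gross outlier.
data = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 50.0]
weights = [0.5, 0.5]
means = [0.0, 5.0]

untrimmed = trimmed_classification_loglik(data, weights, means, [1.0, 1.0], 0.0)
trimmed = trimmed_classification_loglik(data, weights, means, [1.0, 1.0], 0.15)
print("untrimmed:", untrimmed, "trimmed:", trimmed)

# Shrinking one variance around a single point keeps raising the criterion,
# which is why the unconstrained problem is ill-posed.
for var in (1e-3, 1e-6, 1e-12):
    value = trimmed_classification_loglik(data, weights, means, [1.0, var], 0.15)
    print(var, value, satisfies_ratio_constraint([1.0, var], c=100.0))
```

In this sketch the degenerate fits are exactly the ones that violate the ratio restriction, so bounding the ratio by c rules them out. The multivariate methods discussed in the paper restrict scatter matrices rather than scalar variances.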
Acknowledgements
The work benefits from the High Performance Computing (HPC) facility of the University of Parma. We also acknowledge financial support from the “Statistics for fraud detection, with applications to trade data and financial statements” project of the University of Parma. All the calculations in this paper have used the Flexible Statistics and Data Analysis (FSDA) MATLAB toolbox, which is freely downloadable from GitHub at the web address https://github.com/UniprJRC/FSDA.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
García-Escudero, L.A., Mayo-Iscar, A., Morelli, G., Riani, M. (2023). Advances in Robust Constrained Model Based Clustering. In: García-Escudero, L.A., et al. Building Bridges between Soft and Statistical Methodologies for Data Science. SMPS 2022. Advances in Intelligent Systems and Computing, vol 1433. Springer, Cham. https://doi.org/10.1007/978-3-031-15509-3_22
DOI: https://doi.org/10.1007/978-3-031-15509-3_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15508-6
Online ISBN: 978-3-031-15509-3
eBook Packages: Intelligent Technologies and Robotics (R0)