Abstract
Model-based clustering typically involves the development of a family of mixture models and the imposition of these models upon data. The best member of the family is then chosen using some criterion and the associated parameter estimates lead to predicted group memberships, or clusterings. This paper describes the extension of the mixtures of multivariate t-factor analyzers model to include constraints on the degrees of freedom, the factor loadings, and the error variance matrices. The result is a family of six mixture models, including parsimonious models. Parameter estimates for this family of models are derived using an alternating expectation-conditional maximization algorithm and convergence is determined based on Aitken’s acceleration. Model selection is carried out using the Bayesian information criterion (BIC) and the integrated completed likelihood (ICL). This novel family of mixture models is then applied to simulated and real data where clustering performance meets or exceeds that of established model-based clustering methods. The simulation studies include a comparison of the BIC and the ICL as model selection techniques for this novel family of models. Application to simulated data with larger dimensionality is also explored.
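The abstract names two standard ingredients that are easy to make concrete: the Bayesian information criterion for model selection and an Aitken-acceleration-based stopping rule for the AECM iterations. The sketch below is not the authors' implementation; it is a minimal illustration of both ideas using the usual textbook formulas (Schwarz's BIC with the "larger is better" sign convention, and the Aitken asymptotic log-likelihood estimate as a convergence check), with hypothetical function names.

```python
import math

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (Schwarz, 1978).

    Under this sign convention, a larger value indicates a
    better-fitting, more parsimonious model.
    """
    return 2.0 * loglik - n_params * math.log(n_obs)

def aitken_converged(logliks, tol=0.01):
    """Stopping rule based on Aitken's acceleration.

    `logliks` holds three consecutive log-likelihood values
    l(k-1), l(k), l(k+1) from the EM-type iterations.  The
    acceleration a(k) is used to estimate the asymptotic
    log-likelihood l_inf; iteration stops once the current
    log-likelihood is within `tol` of that estimate.
    """
    l0, l1, l2 = logliks
    a = (l2 - l1) / (l1 - l0)            # Aitken's acceleration a(k)
    l_inf = l1 + (l2 - l1) / (1.0 - a)   # asymptotic estimate of the log-likelihood
    return abs(l_inf - l2) < tol
```

In a model-selection loop, `bic` would be evaluated for every member of the model family and every number of components, and the model with the largest value retained; `aitken_converged` would be checked after each AECM iteration once three log-likelihood values are available.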
Cite this article
Andrews, J.L., McNicholas, P.D. Extending mixtures of multivariate t-factor analyzers. Stat Comput 21, 361–373 (2011). https://doi.org/10.1007/s11222-010-9175-2