Abstract
One possible approach to cluster analysis is the mixture maximum likelihood method, in which the data to be clustered are assumed to come from a finite mixture of populations. The method has been well developed, and much used, for the case of multivariate normal populations. Practical applications, however, often involve mixtures of categorical and continuous variables. Everitt (1988) and Everitt and Merette (1990) recently extended the normal model to deal with such data by incorporating the use of thresholds for the categorical variables. The computations involved in this model are so extensive, however, that it is only feasible for data containing very few categorical variables. In the present paper we consider an alternative model, known as the homogeneous Conditional Gaussian model in graphical modelling and as the location model in discriminant analysis. We extend this model to the finite mixture situation, obtain maximum likelihood estimates for the population parameters, and show that computation is feasible for an arbitrary number of variables. Some data sets are clustered by this method, and a small simulation study demonstrates characteristics of its performance.
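As an illustration of the kind of computation the abstract describes, the sketch below evaluates posterior component probabilities for a single observation under a finite mixture of location models. It assumes the usual form of the homogeneous location model: the categorical variables define a cell, each mixture component has its own cell probabilities and cell-specific means for the continuous variables, and one covariance matrix is shared by all cells and components. The function name `location_mixture_posteriors`, the argument names, and the array layouts are hypothetical choices made for this example; this is not the authors' implementation, and the paper's estimation procedure is not reproduced here.

```python
import numpy as np


def location_mixture_posteriors(cell, y, alpha, p, mu, sigma):
    """Posterior component probabilities for one mixed-mode observation.

    Assumed (hypothetical) layout:
      cell  : int, index of the cell defined by the categorical variables
      y     : (q,) continuous measurements
      alpha : (g,) mixing proportions
      p     : (g, m) cell probabilities for each component
      mu    : (g, m, q) cell-specific means for each component
      sigma : (q, q) covariance matrix shared by all cells and components
    """
    q = y.shape[0]
    diff = y - mu[:, cell, :]                       # (g, q) deviations from means
    solved = np.linalg.solve(sigma, diff.T).T       # sigma^{-1} (y - mu) per component
    maha = np.einsum("ij,ij->i", diff, solved)      # squared Mahalanobis distances
    _, logdet = np.linalg.slogdet(sigma)
    log_phi = -0.5 * (q * np.log(2.0 * np.pi) + logdet + maha)

    # log of alpha_i * p_i(cell) * phi(y; mu_i(cell), sigma)
    log_w = np.log(alpha) + np.log(p[:, cell]) + log_phi
    log_w -= log_w.max()                            # stabilise before exponentiating
    w = np.exp(log_w)
    return w / w.sum()


if __name__ == "__main__":
    # Two components, two cells, one continuous variable (made-up values).
    alpha = np.array([0.5, 0.5])
    p = np.array([[0.5, 0.5], [0.5, 0.5]])
    mu = np.array([[[0.0], [0.0]], [[4.0], [4.0]]])
    sigma = np.array([[1.0]])
    print(location_mixture_posteriors(0, np.array([0.0]), alpha, p, mu, sigma))
```

In an EM-style fit these posteriors would serve as the weights in the E-step; the M-step updates are omitted because the landing page does not state them.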
References
Ashford, J. R. and Sowden, R. R. (1970) Multi-variate probit analysis. Biometrics, 26, 535–46.
Cormack, R. M. (1971) A review of classification (with discussion). Journal of the Royal Statistical Society, Series A, 134, 321–67.
Cox, D. R. and Wermuth, N. (1992) Response models for mixed binary and quantitative variables. Biometrika, 79, 441–61.
Day, N. E. (1969) Estimating the components of a mixture of normal distributions. Biometrika, 56, 463–74.
Demers, S., Kim, J., Legendre, P. and Legendre, L. (1992) Analyzing multivariate flow cytometric data in aquatic sciences. Cytometry, 13, 291–8.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977) Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39, 1–38.
Edwards, D. (1990) Hierarchical interaction models (with discussion). Journal of the Royal Statistical Society, Series B, 52, 3–20.
Everitt, B. S. (1988) A finite mixture model for the clustering of mixed mode data. Statistics and Probability Letters, 6, 305–9.
Everitt, B. S. (1993) Cluster Analysis, 3rd Edn. Edward Arnold, London.
Everitt, B. S. and Merette, C. (1990) The clustering of mixed-mode data: a comparison of possible approaches. Journal of Applied Statistics, 17, 283–97.
Gordon, A. D. (1981) Classification. Chapman and Hall, London.
Krzanowski, W. J. (1975) Discrimination and classification using both binary and continuous variables. Journal of the American Statistical Association, 70, 782–90.
Krzanowski, W. J. (1983) Distance between populations using mixed continuous and categorical variables. Biometrika, 70, 235–43.
Krzanowski, W. J. (1993) The location model for mixtures of categorical and continuous variables. Journal of Classification, 10, 25–49.
McLachlan, G. J. (1982) The classification and mixture maximum likelihood approaches to cluster analysis. In P. R. Krishnaiah and L. N. Kanal (eds.), Handbook of Statistics, Vol. 2, pp. 199–208. North-Holland, Amsterdam.
McLachlan, G. J. (1992) Discriminant Analysis and Statistical Pattern Recognition. Wiley, New York.
McLachlan, G. J. and Basford, K. E. (1988) Mixture Models: Inference and Applications to Clustering. Marcel Dekker, New York.
Olkin, I. and Tate, R. F. (1961) Multivariate correlation models with mixed discrete and continuous variables. Annals of Mathematical Statistics, 32, 448–65 (correction 39, 1358–9).
Whittaker, J. (1990) Graphical Models in Applied Multivariate Statistics. Wiley, Chichester.
Wolfe, J. H. (1970) Pattern clustering by multivariate mixture analysis. Multivariate Behavioral Research, 5, 329–50.
Cite this article
Lawrence, C.J., Krzanowski, W.J. Mixture separation for mixed-mode data. Stat Comput 6, 85–92 (1996). https://doi.org/10.1007/BF00161577
DOI: https://doi.org/10.1007/BF00161577