Abstract
In this paper, we consider an entropy criterion to estimate the number of clusters arising from a mixture model. This criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Its performance is investigated through Monte Carlo experiments, and it shows favorable results compared to other classical criteria.
Summary
We propose an entropy criterion for assessing the number of classes in a partition, based on a mixture model of probability distributions. The criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Monte Carlo simulations illustrate its merits relative to more classical criteria.
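The abstract does not spell out the relation it mentions. A minimal sketch, assuming the standard decomposition in which the mixture log-likelihood L(K) equals the classification log-likelihood C(K) plus an entropy term E(K) = -sum over i and k of t_ik log t_ik, where t_ik is the conditional probability that observation i arises from component k. The normalization by L(K) - L(1) is likewise an assumption about how the criterion is formed, and the function names are hypothetical, not taken from the paper.

```python
import math


def mixture_entropy(posteriors):
    """Entropy E(K) = -sum_i sum_k t_ik * log(t_ik) of a fuzzy classification.

    `posteriors` is a list of rows; row i holds the conditional probabilities
    t_ik that observation i belongs to component k (each row sums to 1).
    """
    total = 0.0
    for row in posteriors:
        for t in row:
            if t > 0.0:  # convention: 0 * log(0) = 0
                total -= t * math.log(t)
    return total


def normalized_entropy(posteriors, loglik_k, loglik_1):
    """Entropy divided by the log-likelihood gain over the one-component model.

    Assumed form of the criterion: E(K) / (L(K) - L(1)), defined for K > 1.
    """
    gain = loglik_k - loglik_1
    if gain <= 0.0:
        raise ValueError("L(K) must exceed L(1) for the ratio to be meaningful")
    return mixture_entropy(posteriors) / gain
```

Under these assumptions the entropy is zero when every observation is assigned to a single component with probability one and grows as the components overlap, so a candidate number of clusters with a small ratio would correspond to a well-separated partition.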
Cite this article
Celeux, G., Soromenho, G. An entropy criterion for assessing the number of clusters in a mixture model. Journal of Classification 13, 195–212 (1996). https://doi.org/10.1007/BF01246098