Abstract
This paper presents an efficient approach for clustering discrete data that incrementally builds multinomial mixture models by maximizing the likelihood with the Expectation-Maximization (EM) algorithm. To deal with the initialization problem of EM, the method adds one new multinomial component to the mixture at each step, using a combined scheme of global and local search. In the global search phase, several initial values for the parameters of the new multinomial component are examined. These values are selected from an appropriately defined set of initialization candidates, and two methods, based on the agglomerative and kd-tree clustering algorithms, are proposed to specify the elements of this set. We evaluate the incremental learning technique on a synthetic and a real dataset and compare it with the standard EM-based multinomial mixture model.
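The incremental scheme can be illustrated with a minimal sketch. It assumes each data point is a vector of symbol counts, builds initialization candidates with a hypothetical kd-tree-style partition (split on the feature of largest variance at its median, one candidate per leaf), gives the new component a fixed weight of 1/(k+1), and uses a few EM iterations on the enlarged mixture as the local search. These choices and all function names are illustrative assumptions, not the authors' exact procedure; the agglomerative candidate set is not shown.

```python
import numpy as np

EPS = 1e-10  # floor to avoid log(0) and empty components


def log_component_probs(X, theta):
    # log p(x_n | theta_k) up to the multinomial coefficient, which is
    # constant in the parameters; shape (N, K)
    return X @ np.log(theta + EPS).T


def loglik(X, pi, theta):
    log_joint = log_component_probs(X, theta) + np.log(pi + EPS)
    return np.logaddexp.reduce(log_joint, axis=1).sum()


def em(X, pi, theta, n_iter=200, tol=1e-8):
    # Standard EM for a multinomial mixture, started from (pi, theta)
    ll = loglik(X, pi, theta)
    for _ in range(n_iter):
        log_joint = log_component_probs(X, theta) + np.log(pi + EPS)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        resp = np.exp(log_joint - log_norm)          # E-step: responsibilities
        pi = resp.mean(axis=0)                       # M-step: mixing weights
        counts = resp.T @ X + EPS                    # M-step: expected counts
        theta = counts / counts.sum(axis=1, keepdims=True)
        new_ll = loglik(X, pi, theta)
        done = new_ll - ll < tol
        ll = new_ll
        if done:
            break
    return pi, theta, ll


def kdtree_candidates(X, leaf_size=10):
    # Hypothetical kd-tree-style partition; each leaf's smoothed,
    # normalized count vector becomes one initialization candidate
    leaves, stack = [], [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        sub = X[idx]
        if len(idx) <= leaf_size:
            leaves.append(idx)
            continue
        d = sub.var(axis=0).argmax()
        med = np.median(sub[:, d])
        left, right = idx[sub[:, d] <= med], idx[sub[:, d] > med]
        if len(left) == 0 or len(right) == 0:        # cannot split further
            leaves.append(idx)
            continue
        stack += [left, right]
    cands = [X[i].sum(axis=0) + 1.0 for i in leaves]  # add-one smoothing
    return [c / c.sum() for c in cands]


def incremental_mixture(X, k_max, leaf_size=10, partial_iter=5):
    X = np.asarray(X, dtype=float)
    cands = kdtree_candidates(X, leaf_size)
    # k = 1: the maximum-likelihood single multinomial
    theta = (X.sum(axis=0) + EPS)[None, :]
    theta = theta / theta.sum()
    pi = np.ones(1)
    history = [loglik(X, pi, theta)]
    for k in range(1, k_max):
        a = 1.0 / (k + 1)                            # assumed new-component weight
        best = None
        for c in cands:                              # global search over candidates
            pi_c = np.append((1 - a) * pi, a)
            theta_c = np.vstack([theta, c])
            # local search: a few EM iterations on the enlarged mixture
            pi_c, theta_c, ll = em(X, pi_c, theta_c, n_iter=partial_iter)
            if best is None or ll > best[0]:
                best = (ll, pi_c, theta_c)
        pi, theta, ll = em(X, best[1], best[2])      # full EM on k+1 components
        history.append(ll)
    return pi, theta, history
```

For example, on count vectors drawn from two disjoint blocks of symbols, `incremental_mixture(X, k_max=2)` should return one component concentrated on each block. Trying every candidate for a few iterations and keeping only the best is what makes the search "global"; the subsequent full EM run is the "local" refinement.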
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Blekas, K., Likas, A. (2004). Incremental Mixture Learning for Clustering Discrete Data. In: Vouros, G.A., Panayiotopoulos, T. (eds) Methods and Applications of Artificial Intelligence. SETN 2004. Lecture Notes in Computer Science, vol 3025. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24674-9_23
DOI: https://doi.org/10.1007/978-3-540-24674-9_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-21937-8
Online ISBN: 978-3-540-24674-9
eBook Packages: Springer Book Archive