Abstract
In this paper, we present a new approach for updating clusters incrementally. The proposed approach preserves comprehensive statistical information about the clusters in the form of Gaussian Mixture Models (GMMs). Because each GMM requires the number of Gaussians (components) as an input parameter, we propose a method that determines the number of components automatically by introducing the concept of core points. In the updating phase, instead of processing each new sample individually, we collect the incoming samples and cluster them together. Using core points and GMMs, we build a number of GMMs for the new samples and label each new GMM according to its similarity to the existing GMMs. To measure the similarity between GMMs, we introduce a modified version of the Kullback-Leibler divergence as a distance function. To merge the current and new GMMs, we propose a merging mechanism in which the closest components of the two GMMs are merged to create a new GMM. Because the GMM structure is a compact representation of the clusters, there is no increase in running time in either the clustering phase or the updating phase. We evaluated the quality of the clusters with several clustering validity metrics (Davies-Bouldin (DB), Dunn, SD, and purity), and the results show that our algorithm outperforms other incremental clustering algorithms in terms of the quality of the final clusters.
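The abstract does not give the exact form of the modified Kullback-Leibler distance or of the merging rule, so the sketch below only illustrates the general idea under stated assumptions. It assumes diagonal covariances, uses the plain symmetrised KL divergence as a stand-in for the authors' modified version, treats a component's weight as the number of samples it summarises, and combines two components by moment-preserving merging (a standard technique, not necessarily the one used in the paper). The names `Component`, `sym_kl`, `merge`, `update_gmm`, and the `threshold` that decides whether a new component joins an existing cluster or is kept as a new one are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """A Gaussian component; diagonal covariance is a simplifying assumption."""
    weight: float      # assumed: number of samples the component summarises
    mean: List[float]
    var: List[float]   # per-dimension variances (diagonal of the covariance)


def kl_diag(p: Component, q: Component) -> float:
    """Closed-form KL(p || q) between two diagonal Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(p.mean, p.var, q.mean, q.var):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def sym_kl(p: Component, q: Component) -> float:
    """Symmetrised KL; a stand-in for the paper's modified KL distance."""
    return kl_diag(p, q) + kl_diag(q, p)


def merge(p: Component, q: Component) -> Component:
    """Moment-preserving merge: the result keeps the pair's total weight,
    its weighted mean, and its per-dimension variance."""
    w = p.weight + q.weight
    a, b = p.weight / w, q.weight / w
    mean = [a * mp + b * mq for mp, mq in zip(p.mean, q.mean)]
    var = [
        a * (vp + (mp - m) ** 2) + b * (vq + (mq - m) ** 2)
        for mp, vp, mq, vq, m in zip(p.mean, p.var, q.mean, q.var, mean)
    ]
    return Component(w, mean, var)


def update_gmm(old: List[Component], new: List[Component],
               threshold: float) -> List[Component]:
    """Merge each new component into its closest existing component when the
    distance is within the (hypothetical) threshold; otherwise keep it as a
    new component. The input lists are not modified."""
    result = list(old)
    for c in new:
        if result:
            idx, d = min(((i, sym_kl(c, r)) for i, r in enumerate(result)),
                         key=lambda t: t[1])
            if d <= threshold:
                result[idx] = merge(result[idx], c)
                continue
        result.append(c)
    return result


if __name__ == "__main__":
    old = [Component(100, [0.0, 0.0], [1.0, 1.0]),
           Component(50, [10.0, 10.0], [1.0, 1.0])]
    new = [Component(20, [0.5, 0.0], [1.0, 1.0]),    # close to old[0]: merged
           Component(10, [30.0, -5.0], [2.0, 2.0])]  # far from both: appended
    updated = update_gmm(old, new, threshold=2.0)
    print(len(updated))  # prints 3
```

Moment-preserving merging is used here because the merged component has exactly the mean and (per-dimension) variance of the two-component mixture it replaces, so the summary stays compact without discarding first- and second-order statistics.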
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Bigdeli, E., Mohammadi, M., Raahemi, B., Matwin, S. (2015). Incremental Cluster Updating Using Gaussian Mixture Model. In: Barbosa, D., Milios, E. (eds) Advances in Artificial Intelligence. Canadian AI 2015. Lecture Notes in Computer Science, vol 9091. Springer, Cham. https://doi.org/10.1007/978-3-319-18356-5_23
DOI: https://doi.org/10.1007/978-3-319-18356-5_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18355-8
Online ISBN: 978-3-319-18356-5
eBook Packages: Computer Science, Computer Science (R0)