Abstract
Streaming data clustering is becoming the most efficient way to cluster very large data sets. In this paper we present a new approach, called G-Stream, for topological clustering of evolving data streams. G-Stream discovers clusters of arbitrary shape in a single pass over the data, without any assumption on the number of clusters. The topological structure is represented by a graph in which each node represents a set of “close” data points and neighboring nodes are connected by edges. A reservoir temporarily holds data points that are very distant from the current prototypes; this avoids needless movement of the nearest nodes toward such points and thereby improves the quality of the clustering. The performance of the proposed algorithm is evaluated on both synthetic and real-world data sets.
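The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it describes (a one-pass prototype graph with a reservoir for distant points), not the G-Stream algorithm itself. The class name `StreamGraph`, the parameters `threshold`, `lr` and `reservoir_size`, and the flush policy (the oldest held point seeds a new node, the rest are presented again) are all assumptions made for illustration. Edge aging and node removal, which growing neural gas methods typically use, are omitted.

```python
import math


class StreamGraph:
    """Hypothetical one-pass prototype graph with a reservoir (illustrative only)."""

    def __init__(self, threshold, lr=0.1, reservoir_size=3):
        self.nodes = []        # prototype vectors, one per graph node
        self.edges = set()     # frozenset({i, j}) links between neighboring nodes
        self.reservoir = []    # distant points held back temporarily
        self.threshold = threshold            # assumed distance cut-off
        self.lr = lr                          # assumed learning rate
        self.reservoir_size = reservoir_size  # assumed reservoir capacity

    def _two_nearest(self, x):
        # Indices of the nearest and second-nearest prototypes to x.
        order = sorted(range(len(self.nodes)),
                       key=lambda i: math.dist(x, self.nodes[i]))
        return order[0], order[1]

    def update(self, x):
        """Process one data point from the stream."""
        x = list(x)
        if len(self.nodes) < 2:
            self.nodes.append(x)  # bootstrap: the first two points become nodes
            return
        i, j = self._two_nearest(x)
        if math.dist(x, self.nodes[i]) > self.threshold:
            # Too far from every prototype: hold the point instead of
            # dragging the nearest node toward it.
            self.reservoir.append(x)
            if len(self.reservoir) >= self.reservoir_size:
                self._flush()
            return
        # Move the winning prototype a small step toward x.
        self.nodes[i] = [w + self.lr * (v - w) for w, v in zip(self.nodes[i], x)]
        # Assumption: link the two nearest nodes only if both are close to x.
        if math.dist(x, self.nodes[j]) <= self.threshold:
            self.edges.add(frozenset((i, j)))

    def _flush(self):
        # Assumed policy: the oldest held point seeds a new node, and the
        # remaining held points are presented to the graph again.
        held, self.reservoir = self.reservoir, []
        self.nodes.append(held[0])
        for p in held[1:]:
            self.update(p)


g = StreamGraph(threshold=1.0)
stream = [(0.0, 0.0), (0.5, 0.0), (0.2, 0.1),
          (10.0, 10.0), (10.2, 10.0), (10.0, 10.2)]
for point in stream:
    g.update(point)
print(len(g.nodes), sorted(sorted(e) for e in g.edges), g.reservoir)
```

In this toy stream the three points near (10, 10) are held back until the reservoir fills, so the two prototypes near the origin are not pulled toward them; a separate node is created for the distant group instead.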
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ghesmoune, M., Azzag, H., Lebbah, M. (2014). G-Stream: Growing Neural Gas over Data Stream. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8834. Springer, Cham. https://doi.org/10.1007/978-3-319-12637-1_26
DOI: https://doi.org/10.1007/978-3-319-12637-1_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12636-4
Online ISBN: 978-3-319-12637-1
eBook Packages: Computer Science, Computer Science (R0)