Abstract
We present new algorithms for scalable clustering using graphics processors. Our basic approach is based on k-means. By changing the order of determining object labels, and exploiting the high computational power and pipeline of graphics processing units (GPUs) for distance computing and comparison, we speed up the k-means algorithm substantially. We introduce two strategies for retrieving data from the GPU, taking into account the low bandwidth from the GPU back to the main memory. We also extend our GPU-based approach to data stream clustering. We implement our algorithms in a PC with a Pentium IV 3.4G CPU and a NVIDIA GeForce 6800 GT graphics card. Our comprehensive performance study shows that the common GPU in desktop computers could be an efficient co-processor of CPU in traditional and data stream clustering.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Aggarwal, C.C., Han, J., Wang, J., Yu, P.S.: A framework for clustering evolving data streams. In: Proc. of VLDB (2003)
Babcock, B., Datar, M., Motwani, R., O’Callaghan, L.: Maintaining variance and k-medians over data stream windows. In: Proc. of PODS (2003)
Baciu, G., Wong, S., Sun, H.: Recode: An image-based collision detection algorithm. Visualization and Computer Animation 10(4), 181–192 (1999)
Ester, M., Kriegel, H.-P., Sander, J., Xu, X.: A density-based algorithm fordiscovering clusters in large spatial databases with noise. In: Proc. of KDD (1996)
Govindaraju, N.K., Lloyd, B., Wang, W., Lin, M., et al.: Fast computation of database operations using graphics processors. In: Proc. of SIGMOD (2004)
Govindaraju, N.K., Raghuvanshi, N., Manocha, D.: Fast and approximate stream mining of quantiles and frequencies using graphics processors. In: Proc. Of SIGMOD (2005)
Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams:theory and practice. In: IEEE TKDE, pp. 515–528 (2003)
Guha, S., Rastogi, R., Shim, K.: Cure: An efficient clustering algorithm for large databases. In: Proc. of SIGMOD, pp. 73–84 (1998)
Hall, J.D., Hart, J.C.: Gpu acceleration of iterative clustering. In: Proc. Of SIGGRAPH poster (2004)
Hoff III, K.E., Keyser, J., Lin, M., Manocha, D., Culver, T.: Fast computation of generalized voronoi diagrams using graphics hardware. In: Proc. of SIGGRAPH, pp. 277–286 (1999)
Jain, A., Dubes, R.: Algorithms for clustering data. New Jersey (1998)
Larsen, E.S., McAllister, D.K.: Fast matrix multiplies using graphics hardware. In: Proc. of IEEE Supercomputing (2001)
Sun, C., Agrawal, D., Abbadi, A.E.: Hardware acceleration for spatial selections and joins. In: Proc. of SIGMOD, pp. 455–466 (2003)
Venkatasubramanian, S.: The graphics card as a stream computer. In: SIGMOD Workshop on Management and Processing of Data Streams (2003)
Thompson, C.J., Hahn, S., Oskin, M.: Using modern graphics architectures for general-purpose computing: A framework and analysis. In: Proc. of IEEE/ACM International Symposium on Microarchitectures, pp. 306–317 (2002)
Zhang, T., Ramakrishnan, R., Livny, M.: Birch: An efficient data clustering method for very large databases. In: Proc. of SIGMOD, pp. 103–114 (1996)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cao, F., Tung, A.K.H., Zhou, A. (2006). Scalable Clustering Using Graphics Processors. In: Yu, J.X., Kitsuregawa, M., Leong, H.V. (eds) Advances in Web-Age Information Management. WAIM 2006. Lecture Notes in Computer Science, vol 4016. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11775300_32
Download citation
DOI: https://doi.org/10.1007/11775300_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35225-9
Online ISBN: 978-3-540-35226-6
eBook Packages: Computer ScienceComputer Science (R0)