Abstract
Hierarchical clustering is a clustering method in which each point is initially regarded as a single cluster, and the algorithm then repeatedly merges the two nearest clusters until only one cluster remains. Because the result is presented as a dendrogram, one can easily read off the distances and the inclusion relations between clusters.
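The merge loop described above can be sketched as follows. This is a minimal, naive illustration rather than the paper's implementation: it assumes Euclidean distance and the single linkage rule (cluster distance is the smallest point-to-point distance), and the function name `single_linkage` and the merge-history format are hypothetical choices made for this example.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Naive agglomerative clustering (illustrative sketch).

    Returns the merge history as a list of (cluster_a, cluster_b, distance),
    which is the information a dendrogram displays.
    """
    # Every point starts as its own cluster, identified by its index.
    clusters = {i: [p] for i, p in enumerate(points)}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        # Examine every pair of clusters and keep the nearest pair.
        for a, b in combinations(clusters, 2):
            # Assumed linkage rule: minimum distance between member points.
            d = min(math.dist(p, q) for p in clusters[a] for q in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Merge the nearest pair into a new cluster with a fresh id.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d))
        next_id += 1
    return merges

if __name__ == "__main__":
    for step in single_linkage([(0, 0), (0, 1), (5, 0), (5, 3)]):
        print(step)
```

Each entry of the returned list records which two clusters were joined and at what distance, so the inclusion relation and the merge heights can be recovered from it.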
One drawback of agglomerative hierarchical clustering is its large time complexity of O(n^2), where n is the number of points in the data, which makes the method infeasible for large data sets.
This paper proposes a fast approximation algorithm for single linkage clustering, a well-known agglomerative hierarchical clustering algorithm. Our algorithm reduces the time complexity to O(nB) by using Locality-Sensitive Hashing, a fast algorithm for approximate nearest neighbor search, to quickly find the nearby clusters to be merged. Here B is the maximum number of points hashed into a single hash entry; in practice, for sufficiently large hash tables, B behaves as a small constant compared to n.
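To illustrate why hashing bounds the work by roughly n times B, the sketch below runs one merge pass in which each point is compared only with points that fall into the same hash bucket. It is an assumption-laden illustration, not the authors' algorithm: the abstract does not specify the hash family, so a randomly shifted grid stands in for it, and the names `lsh_merge_phase`, `r`, `num_tables`, and `cell` are hypothetical.

```python
import math
import random
from collections import defaultdict

def lsh_merge_phase(points, r, num_tables=4, cell=None, seed=0):
    """One merge pass: join clusters of points that share a bucket and lie within r.

    Returns a cluster label for every point.
    """
    rng = random.Random(seed)
    cell = cell or 2 * r              # assumed bucket width, not from the paper
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i):
        # Find the cluster representative, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dim = len(points[0])
    for _ in range(num_tables):
        # Stand-in hash function: a grid with a random offset per table.
        shift = [rng.uniform(0, cell) for _ in range(dim)]
        buckets = defaultdict(list)
        for idx, p in enumerate(points):
            key = tuple(math.floor((x + s) / cell) for x, s in zip(p, shift))
            buckets[key].append(idx)
        # Only points in the same bucket are compared, so a point is
        # checked against at most B others per table.
        for members in buckets.values():
            for i in members:
                for j in members:
                    if i < j and math.dist(points[i], points[j]) <= r:
                        ri, rj = find(i), find(j)
                        if ri != rj:
                            parent[rj] = ri
    return [find(i) for i in range(len(points))]

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0)]
    print(lsh_merge_phase(pts, r=1.0))
```

Because nearby points can occasionally land in different buckets, a pass like this may miss some merges that exact single linkage would make, which is the sense in which such a method is an approximation.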
Experiments show that (1) the proposed algorithm obtains clustering results similar to those of the single linkage algorithm and (2) it runs faster than the single linkage algorithm on large data.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Koga, H., Ishibashi, T., Watanabe, T. (2004). Fast Hierarchical Clustering Algorithm Using Locality-Sensitive Hashing. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_9
DOI: https://doi.org/10.1007/978-3-540-30214-8_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive