Abstract
While spectral clustering has recently shown great promise, computational cost makes it infeasible for use with large data sets. To address this computational challenge, this paper considers the problem of approximate spectral clustering, which enables both the feasibility (of approximately clustering in very large and unloadable data sets) and acceleration (of clustering in loadable data sets), while maintaining acceptable accuracy. We examine and propose several schemes for approximate spectral grouping, and make an empirical comparison of those schemes in combination with several sampling strategies. Experimental results on several synthetic and real-world data sets show that approximate spectral clustering can achieve both the goals of feasibility and acceleration.
This work was supported by ARC Discovery Project DP0663196.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Xu, R., Wunsch II, D.: Survey of clustering algorithms. IEEE Trans. Neural Networks 16(3), 645–678 (2005)
Luxburg, U.: A tutorial on spectral clustering. Technical report, Max Planck Institute for Biological Cybernetics, Germany (2006)
Fowlkes, C., Belongie, S., Chung, F., Malik, J.: Spectral grouping using the Nystr\(\ddot{\mathrm{o}}\)m method. IEEE Trans. Pattern Analysis and Machine Intelligence 26(2), 214–225 (2004)
Ning, H., Xu, W., Chi, Y., Gong, Y., Huang, T.: Incremental spectral clustering with application to monitoring of evolving blog communities. In: SIAM Conference on Data Mining (2007)
Miao, G., Song, Y., Zhang, D., Bai, H.: Parallel spectral clustering algorithm for large-scale community data mining. In: International Conference on WWW (2008)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
Ding, C., He, X., Zha, H., Gu, M., Simon, H.: A min-max cut algorithm for graph partitioning and data clustering. In: International Conference on Data Mining, pp. 107–114 (2001)
Chung, F.: Spectral Graph Theory. American Mathematical Society (1997)
Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Advances in Neural Information Processing Systems (2001)
Williams, C., Seeger, M.: Using the Nystr\(\ddot{\mathrm{o}}\)m method to speed up kernel machines. In: Advances in Neural Information Processing Systems, pp. 682–688 (2000)
Zhang, K., Tsang, I.W., Kwok, J.T.: Improved Nystr\(\ddot{\mathrm{o}}\)m low-rank approximation and error analysis. In: International Conference on Machine Learning (2008)
Deshpande, A., Rademacher, L., Vempala, S., Wang, G.: Matrix approximation and projective clustering via volume sampling. In: Symposium on Discrete Algorithms (2006)
Drineas, P., Kannan, R., Mahoney, M.: Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM Journal on Computing 36(1), 158–183 (2006)
Talwalkar, A., Kumar, S., Rowley, H.: Large-scale manifold learning. In: International Conference on Computer Vision and Pattern Recognition (2008)
Pavan, M., Pelillo, M.: Efficient out-of-sample extension of dominant-set clusters. In: Advances in Neural Information Processing Systems (2004)
Bezdek, J., Hathaway, R., Huband, J., Leckie, C., Kotagiri, R.: Approximate clustering in very large relational data. International Journal of Intelligent Systems 21(8), 817–841 (2006)
He, X., Niyogi, P.: Locality preserving projections. In: Advances in Neural Information Processing Systems (2003)
Wang, L., Bezdek, J.C., Leckie, C., Kotagiri, R.: Selective sampling for approximate clustering of very large data sets. International Journal of Intelligence Systems 23(3), 313–331 (2008)
Cai, D., He, X., Han, J.: Document clustering using locality preserving indexing. IEEE Trans. Knowledge and Data Engineering 17(2), 1637–1642 (2005)
Lovasz, L., Plummer, M.: Matching Theory. Akademiai Kiado. North Holland, Budapest (1986)
Georghiades, A., Belhumeur, P., Kriegman, D.: From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Analysis and Machine Intelligence 23(6), 643–660 (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, L., Leckie, C., Ramamohanarao, K., Bezdek, J. (2009). Approximate Spectral Clustering. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, TB. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2009. Lecture Notes in Computer Science(), vol 5476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01307-2_15
Download citation
DOI: https://doi.org/10.1007/978-3-642-01307-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01306-5
Online ISBN: 978-3-642-01307-2
eBook Packages: Computer ScienceComputer Science (R0)