Abstract
Cluster ensembles aim to generate a stable and robust consensus clustering by combining multiple different clustering results of a dataset. Multiple clusterings can be represented either by co-association pairwise relations or by cluster-based features. Traditional cluster ensemble algorithms learn the consensus clustering from one of these two representations, but not both. In this paper, we propose to integrate the two representations in a unified framework by means of weighted graph regularized nonnegative matrix factorization. This integration makes the two representations complementary to each other, so the combined method outperforms either representation alone in clustering accuracy and stability. Extensive experiments on a number of datasets support this claim.
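The two representations named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the function names `cluster_features` and `co_association` and the toy labelings are assumptions made for illustration, and the weighting and factorization steps of the proposed method are not shown. The sketch only builds a cluster-based feature matrix (stacked one-hot cluster indicators) and a co-association matrix (the fraction of base clusterings that place two points together) from the same set of base clusterings.

```python
import numpy as np

def cluster_features(labelings):
    """Stack one-hot cluster indicators from every base clustering.

    Returns an array of shape (n_points, total number of base clusters).
    """
    blocks = []
    for labels in labelings:
        labels = np.asarray(labels)
        ids = np.unique(labels)
        # Column j is 1 where the point belongs to cluster ids[j].
        blocks.append((labels[:, None] == ids[None, :]).astype(float))
    return np.hstack(blocks)

def co_association(labelings):
    """Fraction of base clusterings that put each pair of points together.

    Returns a symmetric array of shape (n_points, n_points).
    """
    labelings = [np.asarray(labels) for labels in labelings]
    n = labelings[0].shape[0]
    S = np.zeros((n, n))
    for labels in labelings:
        S += (labels[:, None] == labels[None, :])
    return S / len(labelings)

# Hypothetical toy ensemble: three base clusterings of four points.
labelings = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
X = cluster_features(labelings)   # cluster-based features
S = co_association(labelings)     # co-association pairwise relations
```

In this unweighted toy setting the two views carry the same pairwise information, since `X @ X.T / len(labelings)` equals `S`. Treating them as separate inputs becomes meaningful once the base clusterings or the pairwise relations are weighted differently, which is the setting the abstract describes.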
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Du, L., Li, X., Shen, YD. (2011). Cluster Ensembles via Weighted Graph Regularized Nonnegative Matrix Factorization. In: Tang, J., King, I., Chen, L., Wang, J. (eds) Advanced Data Mining and Applications. ADMA 2011. Lecture Notes in Computer Science, vol 7120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25853-4_17
DOI: https://doi.org/10.1007/978-3-642-25853-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25852-7
Online ISBN: 978-3-642-25853-4
eBook Packages: Computer Science, Computer Science (R0)