Abstract
Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. Recently, a number of methods have been proposed and demonstrated good performance based on matrix approximation. Despite significant research on these methods, few attempts have been made to establish the connections between them while highlighting their differences. In this paper, we present a unified view of these methods within a general clustering framework where the problem of clustering is formulated as matrix approximations and the clustering objective is minimizing the approximation error between the original data matrix and the reconstructed matrix based on the cluster structures. The general framework provides an elegant base to compare and understand various clustering methods. We provide characterizations of different clustering methods within the general framework including traditional one-side clustering, subspace clustering and two-side clustering. We also establish the connections between our general clustering framework with existing frameworks.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
Aggarwal CC, Wolf JL, Yu PS, Procopiuc C, Park JS (1999) Fast algorithms for projected clustering. In: Proceedings of the 1999 ACM SIGMOD international conference on Management of data (SIGMOD’99). ACM Press, pp 61–72
Ando RK, Lee L (2001) Iterative residual rescaling: an analysis and generalization of LSI. In: Proceedings of the 24th SIGIR, pp 154–162
Baier D, Gaul W, Schader M (1997) Two-mode overlapping clustering with applications to simultaneous benefit segmentation and market structuring. In: Klar R, Opitz O (eds) Classification and knowledge organization. Springer, Heidelberg, pp 577–566
Castillo W, Trejos J (2002) Two-mode partitioning: Review of methods and application and tabu search. In: Jajuga K, Sokolowski A, Bock H-H (eds) Classification, clustering and data analysis. Springer, Heidelberg, pp 43–51
Cho H, Dhillon IS, Guan Y, Sra S (2004) Minimum sum-squared residue co-clustering of gene experssion data. In: Proceedings of the SIAM data mining conference
Dhillon IS, Mallela S, Modha SS (2003) Information-theoretic co-clustering. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining (SIGKDD 2003). ACM Press, pp 89–98
Dhillon IS and Modha DS (2001). Concept decompositions for large sparse text data using clustering. Mach Learn 42(1/2): 143–175
Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proceedings of the twelfth ACM SIGKDD international conference on knowledge discovery and data mining, pp 126–135
Eckes T and Orlik P (1993). An error variance approach to two-mode hierarchical clustering. J Classif 10: 52–74
Gaul W, Schader M (1996) A new algorithm for two-mode clustering. In: Bock H-H, Polasek W (eds). Data analysis and information systems. Springer, Heidelberg, pp 15–23
Golub GH, Van Loan CF (1996) Matrix computations. The Johns Hopkins University Press
Govaert G (1995). Simultaneous clustering of rows and columns. Control Cybernet 24(4): 437–458
Hartigan JA (1975). Clustering algorithms. Wiley, New York
Jain AK and Dubes RC (1988). Algorithms for clustering data. Prentice Hall, Englewood Cliffs
Kleinberg JM (1999). Authoritative sources in a hyperlinked environment. J ACM 46(5): 604–632
Lee DD, Sebastian Seung H (2000) Algorithms for non-negative matrix factorization. In: NIPS, pp 556–562
Li T (2005) A general model for clustering binary data. In: KDD ’05: Proceeding of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp 188–197
Li T, Ma S (2004) IFD: iterative feature and data clustering. In: Proceedings of the 2004 SIAM international conference on data mining (SDM 2004). SIAM
Li T, Ma S, Ogihara M (2004) Document clustering via adaptive subspace iteration. In: Proceedings of twenty-seventh annual international ACM SIGIR conference on research and development in information retrieval (SIGIR 2004), pp 218–225
Long B, Zhang Z, Yu PS (2005) Co-clustering by block value decomposition. In: KDD ’05: Proceeding of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining, pp 635–640
Maurizio V (2001) Double k-means clustering for simultaneous classification of objects and variables. In: Borra S, Rocci R, Vichi M, Schader M (eds) Advances in classification and data analysis. Springer, Heidelberg, pp 43–52
Sha F, Saul LK, Lee DD (2002) Multiplicative updates for nonegative quadratic programming in support vector machines. In: Advances in neural information processing systems, pp 1065–1072
Slonim N, Tishby N (2000) Document clustering using word clusters via the information bottleneck method. In: Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (SIGIR’00). ACM Press, pp 208–215
Soete GD, douglas Carroll J (1994) K-means clustering in a low-dimensional euclidean space. In: New approaches in classification and data analysis. Springer, Heidelberg, pp 212–219
Xu W, Gong Y (2004) Document clustering by concept factorization. In: SIGIR ’04: Proceedings of the 27th annual international conference on Research and development in information retrieval. ACM Press, pp 202–209
Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval(SIGIR’03). ACM Press, pp 267–273
Zha H, He X, Ding C, Simon H (2001) Spectral relaxation for k-means clustering. In: Proceedings of neural information processing systems
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Li, T. Clustering based on matrix approximation: a unifying view. Knowl Inf Syst 17, 1–15 (2008). https://doi.org/10.1007/s10115-007-0116-0
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-007-0116-0