Abstract
The widely used K-means clustering deals with ball-shaped (spherical Gaussian) clusters. In this paper, we extend the K-means clustering to accommodate extended clusters in subspaces, such as line-shaped clusters, plane-shaped clusters, and ball-shaped clusters. The algorithm retains much of the K-means clustering flavors: easy to implement and fast to converge. A model selection procedure is incorporated to determine the cluster shape. As a result, our algorithm can recognize a wide range of subspace clusters studied in various literatures, and also the global ball-shaped clusters (living in all dimensions). We carry extensive experiments on both synthetic and real-world datasets, and the results demonstrate the effectiveness of our algorithm.
Chapter PDF
Similar content being viewed by others
Keywords
- Linear Discriminant Analysis
- Spectral Cluster
- Synthetic Dataset
- Normalize Mutual Information
- Subspace Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
References
Aggarwal, C.C., Wolf, J.L., Yu, P.S., Procopiuc, C., Park, J.S.: Fast algorithms for projected clustering. In: SIGMOD 1999 (1999)
Aggarwal, C.C., Yu, P.S.: Finding generalized projected clusters in high dimensional spaces. In: SIGMOD 2000 (2000)
Agrawal, R., Gehrke, J., Gunopulos, D., Raghavan, P.: Automatic subspace clustering of high dimensional data for data mining applications. In: SIGMOD 1998 (1998)
Bishop, C.M.: Pattern Recognition and Machine Learning. Springer, Heidelberg (2006)
Blake, C.L., Merz, C.J.: UCI repository of machine learning databases (1998)
Cheng, C.-H., Fu, A.W., Zhang, Y.: Entropy-based subspace clustering for mining numerical data. In: SIGKDD 1999 (1999)
Cho, H., Dhillon, I., Guan, Y., Sra, S.: Minimum sum squared residue co-clustering of gene expression data. In: SDM 2004 (2004)
Dhillon, I.S., Guan, Y., Kulis, B.: Kernel k-means: spectral clustering and normalized cuts. In: SIGKDD 2004 (2004)
Dhillon, I.S., Mallela, S., Modha, D.S.: Information-theoretical co-clustering. In: SIGKDD 2003 (2003)
Ding, C., He, X., Simon, H.: On the equivalence of nonnegative matrix factorization and spectral clustering. In: SDM 2005 (2005)
Ding, C., Li, T.: Adaptive dimension reduction using discriminant analysis and k-means c lustering. In: ICML 2007 (2007)
Ding, C., Li, T., Peng, W., Park, H.: Orthogonal nonnegative matrix tri-factorizations for clustering. In: SIGKDD 2006 (2006)
Friedman, J.H., Meulman, J.J.: Clustering objects on subsets of attributes. Journal of the Royal Statistical Society: Series B (Statistical Methodology) (2004)
Goil, S., Nagesh, H., Choudhary, A.: Efficient and scalable subspace clustering for very large data sets. Technical Report CPDC-TR-9906-010, Northwestern Univ. (1999)
Milligan, G.W.: An algorithm for generating artificial test clusters. Psychometrika 50, 123–127 (1985)
Lee, D., Seung, H.: Algorithms for non-negative matrix factorization. In: NIPS 2001 (2001)
Li, T., Ma, S., Ogihara, M.: Document Clustering via Adaptive Subspace Clsutering. In: SIGIR 2004 (2004)
Liu, B., Xia, Y., Yu, P.S.: Clustering through decision tree construction. In: CIKM 2000 (2000)
McCallum, A.K.: Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering (1996), http://www.cs.cmu.edu/~mccallum/bow
Parsons, L., Haque, E., Liu, H.: Subspace clustering for high dimensional data: A review. In: SIGKDD Explorations 2004 (2004)
Procopiuc, C.M., Jones, M., Agarwal, P.K., Murali, T.M.: A monte carlo algorithm for fast projective clustering. In: SIGMOD 2002 (2002)
Strehl, A., Ghosh, J.: Cluster ensembles - a knowledge reuse framework for combining multiple partitions. The Journal of Machine Learning Research (2003)
Tasoulis, D.K., Zeimpekis, D., Gallopoulos, E., Vrahatis, M.N.: Oriented -windows: A pca driven clustering method. In: Advances in Web Intelligence and Data Mining (2006)
Vidal, R., Ma, Y., Sastry, S.: Generalized principal component analysis (GPCA). In: CVPR 2003 (2003)
Woo, K.-G., Lee, J.-H., Kim, M.-H., Lee, Y.-J.: Findit: a fast and intelligent subspace clustering algorithm using dimension voting. Information and Software Technology (2004)
Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: SIGIR 2003 (2003)
Yang, J., Wang, W., Wang, H., Yu, P.: d-clusters: Capturing subspace correlation in a large data set. In: ICDE 2002 (2002)
Yu, S.X., Shi, J.: Multiclass spectral clustering. In: ICCV 2003 (2003)
Zelnik-manor, L., Perona, P.: Self-tuning spectral clustering. In: NIPS 2005 (2005)
Zhang, Q., Liu, J., Wang, W.: Incremental subspace clustering over multiple data streams. In: ICDM 2007 (2007)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, D., Ding, C., Li, T. (2009). K-Subspace Clustering. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science(), vol 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_33
Download citation
DOI: https://doi.org/10.1007/978-3-642-04174-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04173-0
Online ISBN: 978-3-642-04174-7
eBook Packages: Computer ScienceComputer Science (R0)