K-Subspace Clustering

Wang, Dingding; Ding, Chris; Li, Tao

doi:10.1007/978-3-642-04174-7_33


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5782)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases


Abstract

The widely used K-means clustering deals with ball-shaped (spherical Gaussian) clusters. In this paper, we extend K-means clustering to accommodate extended clusters in subspaces, such as line-shaped and plane-shaped clusters, in addition to ball-shaped clusters. The algorithm retains much of the flavor of K-means clustering: it is easy to implement and converges quickly. A model selection procedure is incorporated to determine the cluster shape. As a result, our algorithm can recognize a wide range of subspace clusters studied in the literature, as well as global ball-shaped clusters (living in all dimensions). We carry out extensive experiments on both synthetic and real-world datasets, and the results demonstrate the effectiveness of our algorithm.
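The abstract describes the method only at a high level. The following NumPy sketch illustrates the general idea of extending K-means with subspace-shaped clusters. Each cluster is modeled as an affine subspace fitted by PCA: dimension 0 is a plain centroid as in K-means, 1 is a line, and 2 is a plane. Points are reassigned to the model with the smallest squared perpendicular residual.

Several pieces are assumptions made for illustration. These include all function names, the farthest-first seeding, and the shape-selection rule `choose_dim` with its threshold `tol`. The rule picks the smallest dimension that leaves at most `tol` of a cluster's variance unexplained, and otherwise treats the cluster as a ball. None of this is the paper's own model-selection procedure or objective, which the abstract does not specify.

```python
# Hypothetical sketch of K-means-style clustering with subspace-shaped clusters.
# Names, seeding, and the shape-selection rule are illustrative assumptions.
import numpy as np


def fit_subspace(points, dim):
    """Fit an affine subspace of dimension `dim` by PCA (dim=0 keeps only the mean)."""
    center = points.mean(axis=0)
    if dim == 0:
        return center, np.zeros((0, points.shape[1]))
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(points - center, full_matrices=False)
    return center, vt[:dim]


def residual_sq(X, center, basis):
    """Squared distance from each row of X to the affine set center + span(basis)."""
    diff = X - center
    inside = (diff @ basis.T) @ basis  # component lying inside the subspace
    return np.sum((diff - inside) ** 2, axis=1)


def choose_dim(points, max_dim=2, tol=0.1):
    """Assumed shape rule: the smallest dim whose PCA fit leaves at most `tol`
    of the cluster's variance unexplained; otherwise a ball (dim 0)."""
    s2 = np.linalg.svd(points - points.mean(axis=0), compute_uv=False) ** 2
    total = s2.sum()
    if total == 0:
        return 0
    for d in range(1, max_dim + 1):
        if s2[d:].sum() / total <= tol:
            return d
    return 0


def farthest_first_seeds(X, k):
    """Deterministic seeding: start at X[0], then repeatedly add the point
    farthest from every seed chosen so far."""
    seeds = [0]
    d2 = np.sum((X - X[0]) ** 2, axis=1)
    for _ in range(1, k):
        idx = int(np.argmax(d2))
        seeds.append(idx)
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return X[seeds]


def k_subspace(X, k, max_dim=2, tol=0.1, n_iter=50):
    """Alternate shape-aware model fitting and reassignment, as in K-means.
    Assumes X contains at least k distinct points."""
    seeds = farthest_first_seeds(X, k)
    # Initial assignment: nearest seed, exactly as in K-means.
    labels = np.argmin(((X[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2), axis=1)
    models, dims = [None] * k, [0] * k
    for _ in range(n_iter):
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # an emptied cluster keeps its previous model
                dims[j] = choose_dim(members, max_dim, tol)
                models[j] = fit_subspace(members, dims[j])
        # Each point moves to the cluster model with the smallest squared residual.
        cost = np.stack([residual_sq(X, c, b) for c, b in models], axis=1)
        new_labels = np.argmin(cost, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, dims


# Demo on synthetic data: a line, a ball, and a plane in 3-D, well separated.
rng = np.random.default_rng(0)
n = 200
t = rng.uniform(-5, 5, n)
line = np.column_stack([t, 0.05 * rng.standard_normal((n, 2))])
ball = np.array([0.0, 50.0, 0.0]) + rng.standard_normal((n, 3))
u, v = rng.uniform(-5, 5, (2, n))
plane = np.column_stack([u, v, 50 + 0.05 * rng.standard_normal(n)])
X = np.vstack([line, ball, plane])

labels, dims = k_subspace(X, k=3)
print("labels of first line/ball/plane point:", labels[[0, n, 2 * n]])
print("dimension chosen per cluster:", dims)
```

In this sketch, line and plane clusters are scored by perpendicular distance to an infinite affine subspace rather than by distance to a centroid. This is what lets an elongated cluster stay together where plain K-means can split it. The trade-off is that initialization and cluster separation matter more; the demo deliberately uses well-separated clusters.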



Author information

Authors and Affiliations

School of Computer Science, Florida International Univ., Miami, FL, 33199, USA
Dingding Wang & Tao Li
CSE Department, University of Texas, Arlington, Arlington, TX, 76019, USA
Chris Ding


Editor information

Editors and Affiliations

NICTA, Locked Bag 8001, Canberra, 2601, Australia and Helsinki Institute of IT, Finland
Wray Buntine
Dept. of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Marko Grobelnik & Dunja Mladenić
The Centre for Computational Statistics and Machine Learning, Department of Computer Science, University College London, Gower St., WC1E 6BT, London, UK
John Shawe-Taylor


About this paper

Cite this paper

Wang, D., Ding, C., Li, T. (2009). K-Subspace Clustering. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2009. Lecture Notes in Computer Science, vol 5782. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04174-7_33


DOI: https://doi.org/10.1007/978-3-642-04174-7_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04173-0
Online ISBN: 978-3-642-04174-7
eBook Packages: Computer Science, Computer Science (R0)
