Abstract
Partitioning methods, such as k-means, are popular and useful for clustering. We recently proposed a new partitioning method for clustering categorical data: applying the transfer algorithm to optimize an objective function called within-cluster dispersion. Preliminary experimental results showed that this method outperforms the standard k-modes method in terms of the average quality of clustering results. In this paper, we compare the performance of objective functions for categorical data in greater depth. First, we analytically compare the quality of three objective functions: k-medoids, k-modes, and within-cluster dispersion. Second, we measure how well these objectives recover the true structure of real data sets by finding their global optima, which we argue is a better measure than average clustering results. We conclude that within-cluster dispersion is generally the better objective for discovering cluster structure. Finally, we evaluate how various distance measures affect within-cluster dispersion and report several useful observations.
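To make the comparison concrete, here is a minimal sketch of two of the objectives named above, using the simple matching dissimilarity common in categorical clustering. The exact formulation of within-cluster dispersion used in the paper may differ; this follows the usual pairwise form, where each cluster contributes the sum of its pairwise dissimilarities divided by its size, while k-modes sums each point's dissimilarity to its cluster's attribute-wise mode. The toy data below is purely illustrative.

```python
from collections import Counter

def matching_dissim(x, y):
    """Simple matching: number of attributes on which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def within_cluster_dispersion(clusters):
    """Pairwise form: per-cluster sum of all pairwise dissimilarities,
    normalized by cluster size."""
    total = 0.0
    for c in clusters:
        if not c:
            continue
        total += sum(matching_dissim(x, y) for x in c for y in c) / len(c)
    return total

def kmodes_cost(clusters):
    """k-modes objective: dissimilarity of each point to its cluster mode."""
    cost = 0
    for c in clusters:
        if not c:
            continue
        # The mode takes the most frequent value in each attribute column.
        mode = tuple(Counter(col).most_common(1)[0][0] for col in zip(*c))
        cost += sum(matching_dissim(x, mode) for x in c)
    return cost

clusters = [
    [("a", "x"), ("a", "x"), ("a", "y")],
    [("b", "z"), ("b", "z")],
]
print(within_cluster_dispersion(clusters))  # 1.333...
print(kmodes_cost(clusters))                # 1
```

Note the difference in what each objective rewards: within-cluster dispersion penalizes every differing pair inside a cluster, while k-modes only penalizes disagreement with the single mode, which is one reason the two objectives can prefer different partitions of the same data.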
The second author would like to thank the Faculty of Business Compact Fund R4 P55 at Charles Sturt University, Australia.
© 2014 Springer International Publishing Switzerland
Xiang, Z., Islam, M.Z. (2014). The Performance of Objective Functions for Clustering Categorical Data. In: Kim, Y.S., Kang, B.H., Richards, D. (eds) Knowledge Management and Acquisition for Smart Systems and Services. PKAW 2014. Lecture Notes in Computer Science(), vol 8863. Springer, Cham. https://doi.org/10.1007/978-3-319-13332-4_2
DOI: https://doi.org/10.1007/978-3-319-13332-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13331-7
Online ISBN: 978-3-319-13332-4