Abstract
Traditional centroid-based classifiers cannot be applied directly to categorical data, because the centroid of a categorical class is undefined and there is no effective distance measure for categorical objects. In this paper, two centroid-based classifiers are proposed for categorical data classification. We propose a new formulation for the centroid of categorical classes to address the first problem, and define two weighted distance measures to address the second. Experiments on real-world data sets show the effectiveness of the proposed methods.
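To make the idea in the abstract concrete, the following is a minimal sketch of a centroid-style classifier for categorical data. It is not the formulation proposed in the paper, which the abstract does not spell out. It assumes that a class "centroid" can be represented by the relative frequency of each category per attribute, and that the distance from an object to a centroid is a weighted sum of one minus the frequency of the object's category. The function names (`fit_centroids`, `distance`, `predict`), the uniform default weights, and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_centroids(rows, labels):
    """Build one frequency-based 'centroid' per class (an assumed representation).

    Returns {label: [ {category: relative frequency}, ... one dict per attribute ]}.
    """
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)

    centroids = {}
    for label, members in grouped.items():
        n = len(members)
        # zip(*members) iterates over attribute columns of the class members.
        centroids[label] = [
            {cat: count / n for cat, count in Counter(column).items()}
            for column in zip(*members)
        ]
    return centroids


def distance(row, centroid, weights=None):
    """Weighted dissimilarity between an object and a frequency centroid.

    Each attribute contributes weight * (1 - frequency of the object's category
    in the class). Uniform weights are an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(row)
    return sum(
        w * (1.0 - freqs.get(value, 0.0))
        for w, value, freqs in zip(weights, row, centroid)
    )


def predict(row, centroids, weights=None):
    """Assign the object to the class whose centroid is nearest."""
    return min(centroids, key=lambda label: distance(row, centroids[label], weights))


# Toy data, invented for illustration only.
rows = [("red", "small"), ("red", "large"), ("blue", "large"), ("blue", "large")]
labels = ["a", "a", "b", "b"]
centroids = fit_centroids(rows, labels)
first = predict(("red", "small"), centroids)
second = predict(("blue", "large"), centroids)
```

In this sketch, per-attribute weights are passed in by the caller. A weighting scheme learned from the training data could be substituted without changing the rest of the code.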
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Chen, L., Guo, G. (2014). Centroid-Based Classification of Categorical Data. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_50
DOI: https://doi.org/10.1007/978-3-319-08010-9_50
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)