Abstract
Feature clustering has evolved into a powerful method for clustering text documents. In this paper we propose a hybrid similarity-based clustering algorithm for feature clustering. Documents are represented by keywords, and these words are grouped into clusters using efficient similarity computations; documents with related words are then grouped together. The clusters are characterised by similarity equations, graph-based similarity measures and Gaussian parameters. As words are fed into the system, clusters are generated automatically. The hybrid mechanism works with membership algorithms to identify documents that match one another and can be grouped into clusters. The method aims to find the real distribution of words in the text documents. Experimental results show that the proposed method performs better than several other clustering methods. Each cluster is identified by a unique group of top keywords obtained from its documents.
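The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea of generating word clusters automatically with Gaussian membership, not the authors' method. It assumes each word is already represented by a numeric feature vector; the names `gaussian_membership` and `incremental_cluster`, the `threshold` and `sigma0` parameters, and the choice to keep each cluster's spread fixed are all hypothetical simplifications.

```python
import math


def gaussian_membership(x, mean, sigma):
    """Gaussian-shaped membership of vector x in a cluster (1.0 at the mean)."""
    return math.exp(-sum(((xi - mi) / si) ** 2
                         for xi, mi, si in zip(x, mean, sigma)))


def incremental_cluster(vectors, threshold=0.5, sigma0=0.25):
    """Assign vectors to clusters one at a time, creating clusters on demand.

    Assumption: sigma stays fixed at sigma0 for every cluster; only the
    mean is updated as members are added.
    """
    clusters = []  # each cluster: {"mean": [...], "sigma": [...], "members": [...]}
    for idx, x in enumerate(vectors):
        scores = [gaussian_membership(x, c["mean"], c["sigma"]) for c in clusters]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            # No existing cluster is similar enough: start a new one.
            clusters.append({"mean": list(x),
                             "sigma": [sigma0] * len(x),
                             "members": [idx]})
        else:
            # Join the best-matching cluster and update its running mean.
            c = clusters[best]
            c["members"].append(idx)
            n = len(c["members"])
            c["mean"] = [m + (xi - m) / n for m, xi in zip(c["mean"], x)]
    return clusters


# Hypothetical word vectors: two words close to one pattern, two close to another.
word_vectors = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.8]]
result = incremental_cluster(word_vectors)
members = [c["members"] for c in result]
```

In this sketch the number of clusters is not fixed in advance: a new cluster appears whenever an incoming word's best membership falls below `threshold`, which mirrors the abstract's statement that clusters are generated as words arrive.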
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Dev, D.D., Jebaruby, M. (2014). Text Clustering Using Novel Hybrid Algorithm. In: Nguyen, N.T., Attachoo, B., Trawiński, B., Somboonviwat, K. (eds) Intelligent Information and Database Systems. ACIIDS 2014. Lecture Notes in Computer Science, vol 8397. Springer, Cham. https://doi.org/10.1007/978-3-319-05476-6_2
DOI: https://doi.org/10.1007/978-3-319-05476-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-05475-9
Online ISBN: 978-3-319-05476-6
eBook Packages: Computer Science (R0)