Abstract
Text categorization plays a significant role in many information management tasks. With the growing volume of documents on the Internet, automated text categorization has received increasing attention as a way to classify documents into predefined categories. A major problem in text categorization is the high dimensionality of the feature space: most features are irrelevant or redundant, which degrades classifier performance. Feature selection is therefore used to reduce the dimensionality of the feature space and improve classification efficiency. In this paper, we propose a hybrid two-stage method for text feature selection based on the Relative Discrimination Criterion (RDC) and Ant Colony Optimization (ACO). In the first stage, RDC is applied to rank the features by their RDC values, and features whose values fall below a threshold are removed from the feature set. In the second stage, an ACO-based feature selection method is applied as a wrapper to eliminate redundant or irrelevant features that were not removed in the first stage. To assess the proposed method, we conducted several experiments on different datasets, which indicate the superiority of the proposed algorithm. Our aim is a hybrid approach that is both computationally more efficient and more accurate than other embedded or wrapper methods. The results confirm that the proposed method performs remarkably well in text feature selection.
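The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring function is a simplified, presence-based stand-in for RDC (the published criterion also accounts for term counts), the second stage is a generic pheromone-guided subset search rather than the exact ACO variant used in the paper, and every name, parameter value, and the toy corpus are assumptions made for illustration only.

```python
import random

def discrimination_score(docs, labels, term, eps=1e-3):
    """Simplified stand-in for RDC: |tpr - fpr| / (min(tpr, fpr) + eps),
    computed from document-level term presence only (assumption)."""
    pos = [d for d, y in zip(docs, labels) if y == 1]
    neg = [d for d, y in zip(docs, labels) if y == 0]
    tpr = sum(term in d for d in pos) / len(pos)
    fpr = sum(term in d for d in neg) / len(neg)
    return abs(tpr - fpr) / (min(tpr, fpr) + eps)

def filter_stage(docs, labels, vocab, threshold):
    """Stage 1 (filter): keep terms whose score reaches the threshold."""
    return [t for t in vocab if discrimination_score(docs, labels, t) >= threshold]

def loo_accuracy(docs, labels, features):
    """Wrapper fitness: leave-one-out accuracy of a 1-nearest-neighbour rule
    whose similarity is the number of shared selected terms."""
    feats = set(features)
    correct = 0
    for i, d in enumerate(docs):
        nearest = max((j for j in range(len(docs)) if j != i),
                      key=lambda j: len(d & docs[j] & feats))
        correct += labels[nearest] == labels[i]
    return correct / len(docs)

def aco_select(docs, labels, candidates, n_ants=8, n_iters=15,
               subset_size=3, rho=0.2, seed=0):
    """Stage 2 (wrapper): generic pheromone-guided subset search.
    All hyperparameters here are hypothetical defaults."""
    rng = random.Random(seed)
    pheromone = {t: 1.0 for t in candidates}
    best_subset = list(candidates)
    best_fit = loo_accuracy(docs, labels, candidates)
    k = min(subset_size, len(candidates))
    for _iteration in range(n_iters):
        for _ant in range(n_ants):
            # Each ant draws k distinct terms, weighted by pheromone.
            pool, subset = list(candidates), []
            for _step in range(k):
                weights = [pheromone[t] for t in pool]
                choice = rng.choices(pool, weights=weights, k=1)[0]
                subset.append(choice)
                pool.remove(choice)
            fit = loo_accuracy(docs, labels, subset)
            # Prefer higher fitness; on ties prefer the smaller subset.
            if fit > best_fit or (fit == best_fit and len(subset) < len(best_subset)):
                best_subset, best_fit = subset, fit
        # Evaporate pheromone, then reinforce the best subset found so far.
        for t in pheromone:
            pheromone[t] *= (1 - rho)
        for t in best_subset:
            pheromone[t] += best_fit
    return sorted(best_subset), best_fit

# Toy corpus (hypothetical): each document is a set of terms, label 1 or 0.
docs = [
    {"goal", "team", "the"},
    {"goal", "match", "the"},
    {"team", "match", "the", "of"},
    {"stock", "price", "the"},
    {"stock", "market", "the", "of"},
    {"price", "market", "the"},
]
labels = [1, 1, 1, 0, 0, 0]
vocab = sorted(set().union(*docs))

kept = filter_stage(docs, labels, vocab, threshold=1.0)
selected, fitness = aco_select(docs, labels, kept)
print("after filter stage:", kept)
print("after wrapper stage:", selected, "fitness:", fitness)
```

The design point the sketch tries to convey is the division of labour: the cheap filter discards terms that occur equally often in both classes before the more expensive wrapper search evaluates candidate subsets with a classifier.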
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Hemmati, M., Mousavirad, S.J., Bojnordi, E., Shaeri, M. (2022). A New Hybrid Method for Text Feature Selection Through Combination of Relative Discrimination Criterion and Ant Colony Optimization. In: Kim, J.H., Deep, K., Geem, Z.W., Sadollah, A., Yadav, A. (eds) Proceedings of 7th International Conference on Harmony Search, Soft Computing and Applications. Lecture Notes on Data Engineering and Communications Technologies, vol 140. Springer, Singapore. https://doi.org/10.1007/978-981-19-2948-9_16
DOI: https://doi.org/10.1007/978-981-19-2948-9_16
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2947-2
Online ISBN: 978-981-19-2948-9
eBook Packages: Intelligent Technologies and Robotics (R0)