Abstract
The large volume of text data available today makes classification a difficult text processing challenge. High dimensionality is the primary difficulty, and feature selection is a common method for reducing the number of dimensions. The most crucial factors in text classification are a strong text representation and an accurate classifier, so choosing the right features is essential for using machine learning algorithms effectively. Several optimization techniques, including the Genetic Algorithm (GA), have been used effectively for dimensionality reduction in text classification. To evaluate the GA for Feature Selection (FS) and assess its efficiency, we compared it with other filter methods, using the Naive Bayes (NB) classifier and three benchmark document collections: SMS, BBC, and 20Newsgroups.
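As a rough illustration of GA-based feature selection with an NB classifier, the sketch below evolves bit masks over a vocabulary and scores each mask by leave-one-out Naive Bayes accuracy. The abstract does not specify the encoding, operators, or fitness function, so everything here is an assumption: the bit-mask chromosome, tournament selection, one-point crossover, bit-flip mutation, the size penalty of 0.01 per feature, all parameter values, and the toy corpus `DOCS` (which is hypothetical and unrelated to the SMS, BBC, or 20Newsgroups collections).

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus (not from the paper): (tokens, label) pairs.
DOCS = [
    ("win cash prize now".split(), "spam"),
    ("free prize claim now".split(), "spam"),
    ("cash offer free win".split(), "spam"),
    ("meeting at noon today".split(), "ham"),
    ("lunch today at noon".split(), "ham"),
    ("see you at the meeting".split(), "ham"),
]
VOCAB = sorted({w for tokens, _ in DOCS for w in tokens})


def nb_accuracy(mask, train, test):
    """Accuracy of a multinomial Naive Bayes that only sees masked-in words."""
    selected = {w for w, bit in zip(VOCAB, mask) if bit}
    if not selected:
        return 0.0
    class_docs = Counter(label for _, label in train)
    word_counts = defaultdict(Counter)
    for tokens, label in train:
        word_counts[label].update(t for t in tokens if t in selected)
    correct = 0
    for tokens, label in test:
        scores = {}
        for c in class_docs:
            total = sum(word_counts[c].values())
            score = math.log(class_docs[c] / len(train))  # class prior
            for t in tokens:
                if t in selected:
                    # Laplace-smoothed word likelihood
                    score += math.log(
                        (word_counts[c][t] + 1) / (total + len(selected))
                    )
            scores[c] = score
        correct += max(scores, key=scores.get) == label
    return correct / len(test)


def fitness(mask):
    """Leave-one-out accuracy minus an assumed penalty per selected feature."""
    hits = sum(
        nb_accuracy(mask, DOCS[:i] + DOCS[i + 1:], [DOCS[i]])
        for i in range(len(DOCS))
    )
    return hits / len(DOCS) - 0.01 * sum(mask)


def select_features(generations=30, pop_size=20, mutation_rate=0.05, seed=0):
    """Evolve feature masks; return the best word subset and its fitness."""
    rng = random.Random(seed)
    n = len(VOCAB)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_pop = ranked[:2]  # elitism: carry over the two best masks
        while len(next_pop) < pop_size:
            # Tournament selection (size 3) for each parent
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_pop.append(child)
        population = next_pop
    best = max(population, key=fitness)
    return [w for w, b in zip(VOCAB, best) if b], fitness(best)


if __name__ == "__main__":
    features, score = select_features()
    print(features, round(score, 3))
```

On a real collection, the fitness function would typically use a held-out split or cross-validation rather than leave-one-out over six documents, and a filter method would be run on the same split for comparison.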
Supported by LIPIM.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Belkarkor, S., Hafidi, I., Nachaoui, M. (2023). Feature Selection for Text Classification Using Genetic Algorithm. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_7
DOI: https://doi.org/10.1007/978-3-031-29313-9_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28845-6
Online ISBN: 978-3-031-29313-9
eBook Packages: Intelligent Technologies and Robotics (R0)