Feature Selection for Text Classification Using Genetic Algorithm

  • Conference paper
Advances in Machine Intelligence and Computer Science Applications (ICMICSA 2022)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 656)

Abstract

Today’s large volume of text data makes text classification a difficult text-processing task. High dimensionality is the primary challenge, and feature selection is a common method for reducing it. The most crucial factors in text classification are a strong text representation and a highly accurate classifier, so choosing the right features is essential for applying machine learning algorithms effectively. Optimization techniques such as the Genetic Algorithm (GA) have been used successfully for dimensionality reduction in text classification. To evaluate the performance of GA for Feature Selection (FS), we compared it with several filter methods, using a Naive Bayes (NB) classifier and three benchmark document collections: SMS, BBC, and 20Newsgroups.
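The abstract does not specify the GA configuration, so the following is only a minimal sketch of the general GA-based feature selection idea, under assumed settings that are not the authors': each chromosome is a binary mask over the vocabulary; fitness is the validation accuracy of a multinomial Naive Bayes classifier trained on the selected terms, minus a small hypothetical penalty on subset size; and the population evolves by binary tournament selection, one-point crossover, bit-flip mutation, and elitism. The function names, parameter values (`pop_size`, `mutation_rate`, `penalty`), and the toy corpus are illustrative assumptions.

```python
import math
import random


def train_nb(docs, labels, mask, alpha=1.0):
    """Fit multinomial Naive Bayes (Laplace smoothing) on terms where mask[j] == 1."""
    classes = sorted(set(labels))
    selected = [j for j, bit in enumerate(mask) if bit]
    n = len(labels)
    prior, cond = {}, {}
    for c in classes:
        rows = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(rows) / n)
        totals = [sum(r[j] for r in rows) for j in selected]
        denom = sum(totals) + alpha * len(selected)
        cond[c] = {j: math.log((t + alpha) / denom) for j, t in zip(selected, totals)}
    return classes, prior, cond


def predict_nb(model, doc):
    """Return the class maximising log P(c) + sum_j count_j * log P(term_j | c)."""
    classes, prior, cond = model
    return max(classes, key=lambda c: prior[c] + sum(doc[j] * lp for j, lp in cond[c].items()))


def fitness(mask, train, valid, penalty=0.01):
    """Hypothetical fitness: validation accuracy minus a penalty proportional to subset size."""
    if not any(mask):
        return 0.0
    model = train_nb(train[0], train[1], mask)
    docs, labels = valid
    acc = sum(predict_nb(model, d) == y for d, y in zip(docs, labels)) / len(labels)
    return acc - penalty * sum(mask) / len(mask)


def genetic_feature_selection(train, valid, n_features, pop_size=20, generations=30,
                              crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Evolve binary feature masks; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)

    def fit(mask):
        return fitness(mask, train, valid)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fit)
    for _ in range(generations):
        scored = [(fit(m), m) for m in pop]

        def tournament():
            a, b = rng.sample(scored, 2)  # binary tournament selection
            return a[1] if a[0] >= b[0] else b[1]

        children = [max(scored, key=lambda s: s[0])[1][:]]  # elitism: copy the best unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # one-point crossover
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for child in (c1, c2):
                for j in range(n_features):
                    if rng.random() < mutation_rate:  # bit-flip mutation
                        child[j] = 1 - child[j]
                children.append(child)
        pop = children[:pop_size]
        candidate = max(pop, key=fit)
        if fit(candidate) > fit(best):
            best = candidate
    return best, fit(best)


def toy_corpus(n, seed=42):
    """Hypothetical data: 6-term vocabulary; terms 0-1 lean 'spam',
    terms 2-3 lean 'ham', terms 4-5 are noise shared by both classes."""
    rng = random.Random(seed)
    docs, labels = [], []
    for i in range(n):
        label = "spam" if i % 2 == 0 else "ham"
        vocab = (0, 1, 4, 5) if label == "spam" else (2, 3, 4, 5)
        doc = [0] * 6
        for _ in range(8):
            doc[rng.choice(vocab)] += 1
        docs.append(doc)
        labels.append(label)
    return docs, labels


if __name__ == "__main__":
    docs, labels = toy_corpus(60)
    train, valid = (docs[:40], labels[:40]), (docs[40:], labels[40:])
    mask, score = genetic_feature_selection(train, valid, n_features=6)
    print("selected terms:", [j for j, m in enumerate(mask) if m], "fitness:", round(score, 3))
```

A filter method, by contrast, scores each term once (for example with chi-square or information gain) and keeps the top-ranked terms without consulting the classifier. A GA wrapper of this kind retrains the classifier for every candidate subset, which costs more computation but evaluates terms jointly rather than one at a time.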

Supported by LIPIM.

Author information

Correspondence to Salma Belkarkor.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Belkarkor, S., Hafidi, I., Nachaoui, M. (2023). Feature Selection for Text Classification Using Genetic Algorithm. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_7
