Meta-heuristic Algorithms for Text Feature Selection Problems

  • Conference paper
Advances in Machine Intelligence and Computer Science Applications (ICMICSA 2022)

Abstract

The growing volume of unstructured text on the Internet has become a significant obstacle to meeting users' information needs. Text clustering is used in text mining to divide a collection of documents into a predetermined number of clusters. Clustering algorithms suffer from the growing number of non-informative words in the corpus, so non-informative text features are removed from each document to improve clustering performance and reduce computation. Here, several particle swarm optimization (PSO) variants, such as inertia weight and constriction factor, are compared with the aim of improving particle exploration. PSO is also compared with other commonly used metaheuristics (i.e., the genetic algorithm and the harmony search algorithm). In addition, various exploration and initialization strategies are integrated with PSO to improve its performance. The experiments were carried out on four standard datasets: Reuters-21578, 20Newsgroups, Classic4, and WebKB. The experimental results show that PSO outperforms the competing approaches in terms of clustering accuracy, precision, recall, and F-measure. Moreover, adjusting the PSO parameters improves its performance.
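The two PSO variants named in the abstract differ only in how a particle's velocity is updated. The sketch below is not the authors' implementation; it uses the textbook inertia-weight rule and the Clerc-Kennedy constriction coefficient, followed by the usual sigmoid rule of binary PSO for turning velocities into a keep/drop mask over terms. The function names, the default values (w = 0.7, c1 = c2 = 2.05) and the velocity clamp v_max = 6 are assumptions chosen for illustration.

```python
import math
import random


def constriction_coefficient(c1=2.05, c2=2.05):
    """Clerc-Kennedy constriction coefficient; defined only for c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4.0:
        raise ValueError("c1 + c2 must be greater than 4")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def update_velocity(v, x, pbest, gbest, rng, variant="inertia",
                    w=0.7, c1=2.05, c2=2.05, v_max=6.0):
    """Return the new velocity vector for one particle.

    v, x, pbest, gbest are equal-length sequences; x, pbest and gbest
    hold 0/1 feature-selection bits. All defaults are illustrative.
    """
    chi = constriction_coefficient(c1, c2) if variant == "constriction" else None
    new_v = []
    for vi, xi, pi, gi in zip(v, x, pbest, gbest):
        # Attraction toward the personal best and the swarm best.
        pull = c1 * rng.random() * (pi - xi) + c2 * rng.random() * (gi - xi)
        if variant == "inertia":
            value = w * vi + pull          # inertia weight scales the old velocity only
        else:
            value = chi * (vi + pull)      # constriction scales the whole update
        # Clamp so the sigmoid below stays numerically safe.
        new_v.append(max(-v_max, min(v_max, value)))
    return new_v


def sample_mask(v, rng):
    """Binary PSO position rule: keep feature i with probability sigmoid(v[i])."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-vi)) else 0 for vi in v]
```

A particle here is a bit vector with one entry per term in the vocabulary; a fitness function (not shown, and not specified in this excerpt) would score each mask, for example by the quality of the clustering obtained on the retained terms.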

Author information

Corresponding author

Correspondence to Issam Lakouam.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lakouam, I., Hafidi, I., Nachaoui, M. (2023). Meta-heuristic Algorithms for Text Feature Selection Problems. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_5
