Abstract
The growing volume of unstructured text on the Internet has become a significant obstacle to meeting user needs. Text clustering is used in text mining to divide a collection of texts into a predetermined number of clusters. Clustering algorithms suffer from the growing number of non-informative words in the corpus, so non-informative text features are removed from each document to improve clustering performance and computation. Here, several particle swarm optimization (PSO) variants, such as inertia weight and constriction factor, are compared to improve the particle exploration experience. PSO is also compared with other commonly used metaheuristics (i.e., the genetic algorithm and the harmony search algorithm). In addition, various exploration and initialization characteristics are integrated with PSO to improve its performance. The experiments were carried out on four standard datasets: Reuters-21578, 20Newsgroups, Classic4, and WebKB. The experimental results show that PSO outperforms the competing approaches in terms of clustering accuracy, precision, recall, and F-measure. Moreover, adjusting the PSO parameters improves its performance.
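The two PSO variants named in the abstract differ only in how a particle's velocity is updated. The abstract does not give the chapter's equations, so the following is a minimal sketch using the commonly cited textbook forms of the inertia-weight and constriction-factor rules, combined with a sigmoid mapping that is often used to turn velocities into binary keep/drop decisions for features. All function names, the velocity clamp `v_max`, and the parameter values are illustrative assumptions, not the authors' settings.

```python
import math
import random


def constriction_coefficient(c1, c2):
    """Constriction coefficient chi; the usual formula requires c1 + c2 > 4."""
    phi = c1 + c2
    if phi <= 4:
        raise ValueError("c1 + c2 must exceed 4 for the constriction rule")
    return 2.0 / abs(2.0 - phi - math.sqrt(phi * phi - 4.0 * phi))


def update_velocity(v, x, pbest, gbest, c1, c2, rng, w=None, v_max=6.0):
    """Update one particle's velocity.

    If w is given, the inertia-weight rule is used:
        v <- w * v + pull
    otherwise the constriction-factor rule is used:
        v <- chi * (v + pull)
    where pull is the random attraction toward the personal and global bests.
    The clamp to [-v_max, v_max] is an assumption added for this sketch.
    """
    chi = None if w is not None else constriction_coefficient(c1, c2)
    new_v = []
    for vj, xj, pj, gj in zip(v, x, pbest, gbest):
        pull = c1 * rng.random() * (pj - xj) + c2 * rng.random() * (gj - xj)
        vj_new = w * vj + pull if w is not None else chi * (vj + pull)
        new_v.append(max(-v_max, min(v_max, vj_new)))
    return new_v


def sample_position(v, rng):
    """Binary position: keep feature j with probability sigmoid(v[j])."""
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-vj)) else 0 for vj in v]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1, 0, 1, 0, 0]       # current feature mask (1 = feature kept)
    pbest = [1, 1, 0, 0, 0]   # particle's best mask so far
    gbest = [1, 1, 1, 0, 0]   # swarm's best mask so far
    v = [0.0] * len(x)

    v_inertia = update_velocity(v, x, pbest, gbest, 2.0, 2.0, rng, w=0.7)
    v_constr = update_velocity(v, x, pbest, gbest, 2.05, 2.05, rng)
    print(sample_position(v_inertia, rng))
    print(sample_position(v_constr, rng))
```

In a full feature-selection loop, each sampled mask would be scored by a fitness function computed on the retained terms, and `pbest` and `gbest` would be refreshed after every iteration; that part is omitted here because the abstract does not specify the fitness function.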
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lakouam, I., Hafidi, I., Nachaoui, M. (2023). Meta-heuristic Algorithms for Text Feature Selection Problems. In: Aboutabit, N., Lazaar, M., Hafidi, I. (eds) Advances in Machine Intelligence and Computer Science Applications. ICMICSA 2022. Lecture Notes in Networks and Systems, vol 656. Springer, Cham. https://doi.org/10.1007/978-3-031-29313-9_5
DOI: https://doi.org/10.1007/978-3-031-29313-9_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-28845-6
Online ISBN: 978-3-031-29313-9
eBook Packages: Intelligent Technologies and Robotics (R0)