Abstract
The volume of digital textual data grows every day, and text classification is widely used to organize it. Efficient text classification depends on data preprocessing, which prepares the data for machine learning models. A central difficulty in text classification is the high dimensionality of the feature space. Feature selection is a data-preprocessing technique widely applied to high-dimensional data: it mitigates the dimensionality problem and improves classification efficiency. Feature selection addresses how to choose the subset of features used to build text classification models. Its goals include reducing dimensionality, removing uninformative features, reducing the amount of data classifiers must learn from, and improving classifiers' predictive performance. This paper surveys the main feature selection methods and discusses their advantages and limitations.
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Dogra, V., Singh, A., Verma, S., Kavita, Jhanjhi, N.Z., Talib, M.N. (2021). Understanding of Data Preprocessing for Dimensionality Reduction Using Feature Selection Techniques in Text Classification. In: Peng, SL., Hsieh, SY., Gopalakrishnan, S., Duraisamy, B. (eds) Intelligent Computing and Innovation on Data Science. Lecture Notes in Networks and Systems, vol 248. Springer, Singapore. https://doi.org/10.1007/978-981-16-3153-5_48
Print ISBN: 978-981-16-3152-8
Online ISBN: 978-981-16-3153-5
eBook Packages: Intelligent Technologies and Robotics