Abstract
The rapid growth in the consumption of digital data has made unorganized textual data a major challenge. To address it, we perform document classification, which is divided into four major phases: pre-processing, feature selection, model training, and model testing. In this paper, we select four types of feature vectors, three built with weighting techniques and one without weighting. Performance is tested on these four feature vectors with five variants of the Naïve Bayes classifier, of which two could not be trained because of the sparseness of the dataset, while the remaining three performed well. In the reported experimental results, Complement Naïve Bayes with the term frequency weighting scheme achieves a macro-F1 score of 0.807901362 and an accuracy of 0.821163038, outperforming every other combination of feature set and Naïve Bayes variant. In terms of time consumed for training and testing, Complement Naïve Bayes with the term frequency weighting scheme was also found to perform best.
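To make the best-performing configuration concrete, the following is a minimal sketch of a Complement Naïve Bayes classifier over raw term-frequency features, together with the two reported metrics (accuracy and macro-F1). It is not the authors' implementation: the function names, the whitespace tokenizer, the smoothing value `alpha=1.0`, and the toy documents are all assumptions made for illustration, and the sketch omits refinements such as weight normalization.

```python
import math
from collections import Counter, defaultdict


def term_frequencies(text):
    """Lowercase, split on whitespace, and count raw term frequencies."""
    return Counter(text.lower().split())


def train_complement_nb(docs, labels, alpha=1.0):
    """Estimate per-class log-weights from the documents NOT in each class."""
    vocab = sorted({t for d in docs for t in term_frequencies(d)})
    per_class = defaultdict(Counter)  # class -> summed term counts
    for doc, label in zip(docs, labels):
        per_class[label].update(term_frequencies(doc))
    total = Counter()  # term counts over the whole training set
    for counts in per_class.values():
        total.update(counts)
    weights = {}
    for c in per_class:
        # Complement counts: every occurrence of a term outside class c.
        comp = {t: total[t] - per_class[c][t] for t in vocab}
        denom = sum(comp.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((comp[t] + alpha) / denom) for t in vocab}
    return weights


def predict(weights, text):
    """Pick the class whose complement explains the document worst."""
    tf = term_frequencies(text)

    def score(c):
        # Terms unseen in training contribute 0 to every class.
        return sum(n * weights[c].get(t, 0.0) for t, n in tf.items())

    return min(weights, key=score)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented for illustration only.
train_docs = [
    "goal match team win",
    "team score goal",
    "election vote party",
    "party vote government policy",
]
train_labels = ["sport", "sport", "politics", "politics"]
model = train_complement_nb(train_docs, train_labels)

test_docs = ["team goal", "vote party policy"]
test_labels = ["sport", "politics"]
predictions = [predict(model, d) for d in test_docs]
print(accuracy(test_labels, predictions), macro_f1(test_labels, predictions))
```

Estimating each class's parameters from all other classes is what distinguishes the complement variant from plain Multinomial Naïve Bayes; macro-F1 averages per-class F1 without weighting by class size, so it penalizes poor performance on small classes more than accuracy does.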
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Abbas, A., Jaiswal, M., Agarwal, S., Jha, P., Siddiqui, T.J. (2024). Performance Based Comparative Analysis of Naïve Bayes Variants for Text Classification. In: Tavares, J.M.R.S., Rodrigues, J.J.P.C., Misra, D., Bhattacherjee, D. (eds) Data Science and Communication. ICTDsC 2023. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-99-5435-3_20
DOI: https://doi.org/10.1007/978-981-99-5435-3_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5434-6
Online ISBN: 978-981-99-5435-3
eBook Packages: Intelligent Technologies and Robotics (R0)