Performance Based Comparative Analysis of Naïve Bayes Variants for Text Classification

Abbas, Ali; Jaiswal, Manish; Agarwal, Shreya; Jha, Prajna; Siddiqui, Tanveer J.

doi:10.1007/978-981-99-5435-3_20

Part of the book series: Studies in Autonomic, Data-driven and Industrial Computing (SADIC)

Included in the following conference series:

International Conference on Data Science and Communication

Abstract

The rapid growth in digital data consumption has made unorganized textual data a major challenge. To address it, we perform document classification in four major phases: pre-processing, feature selection, model training, and model testing. In this paper, we select four types of feature vectors, three built with weighting techniques and one without weighting. We test performance on these four feature vectors with five variants of the Naïve Bayes classifier; two of them could not be trained because of the sparseness of the dataset, and the remaining three performed well. In the reported results, Complement Naïve Bayes with the term frequency weighting scheme achieves an F1-macro score of 0.807901362 and an accuracy of 0.821163038, outperforming every other combination of feature set and Naïve Bayes variant. Complement Naïve Bayes with the term frequency weighting scheme also performs best in terms of training and testing time.
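As a rough illustration of the kind of comparison the abstract describes, the sketch below trains a Multinomial and a Complement Naïve Bayes classifier on raw term counts and scores them with accuracy and F1-macro. It is not the authors' implementation: the toy corpus, the whitespace tokenizer, the Laplace smoothing value, and every function name are assumptions made for illustration, and raw counts stand in for the term frequency weighting scheme. The complement weights follow the basic form in Rennie et al. (2003) without weight normalization.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real pre-processing)."""
    return text.lower().split()


def fit_counts(docs, labels):
    """Collect raw term counts per class, class sizes, and the vocabulary."""
    term_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        term_counts[label].update(tokenize(doc))
    vocab = sorted({t for counts in term_counts.values() for t in counts})
    return term_counts, Counter(labels), vocab


def multinomial_weights(term_counts, vocab, alpha=1.0):
    """log P(term | class) with additive (Laplace) smoothing."""
    weights = {}
    for c, counts in term_counts.items():
        total = sum(counts.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((counts[t] + alpha) / total) for t in vocab}
    return weights


def complement_weights(term_counts, vocab, alpha=1.0):
    """-log P(term | every class except c), so a higher score favors class c."""
    overall = Counter()
    for counts in term_counts.values():
        overall.update(counts)
    grand_total = sum(overall.values())
    weights = {}
    for c, counts in term_counts.items():
        comp_total = grand_total - sum(counts.values()) + alpha * len(vocab)
        weights[c] = {
            t: -math.log((overall[t] - counts[t] + alpha) / comp_total)
            for t in vocab
        }
    return weights


def predict(doc, weights, log_prior=None):
    """Return the class with the highest term-frequency-weighted score."""
    tf = Counter(tokenize(doc))
    scores = {}
    for c, w in weights.items():
        score = log_prior[c] if log_prior else 0.0
        # Terms never seen in training are skipped.
        score += sum(freq * w[t] for t, freq in tf.items() if t in w)
        scores[c] = score
    return max(scores, key=scores.get)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    f1_scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy corpus; the chapter's dataset is not described in the abstract.
TRAIN_DOCS = [
    "cheap pills buy now",
    "buy cheap watches now",
    "meeting agenda for monday",
    "project meeting notes attached",
    "lunch on monday",
]
TRAIN_LABELS = ["spam", "spam", "ham", "ham", "ham"]
TEST_DOCS = ["buy cheap pills", "monday project meeting"]
TEST_LABELS = ["spam", "ham"]

term_counts, class_sizes, vocab = fit_counts(TRAIN_DOCS, TRAIN_LABELS)
log_prior = {c: math.log(n / len(TRAIN_LABELS)) for c, n in class_sizes.items()}

mnb = multinomial_weights(term_counts, vocab)
cnb = complement_weights(term_counts, vocab)
mnb_pred = [predict(d, mnb, log_prior) for d in TEST_DOCS]
cnb_pred = [predict(d, cnb) for d in TEST_DOCS]  # class prior omitted in this sketch

for name, pred in [("Multinomial NB", mnb_pred), ("Complement NB", cnb_pred)]:
    print(name, "accuracy:", accuracy(TEST_LABELS, pred),
          "F1-macro:", f1_macro(TEST_LABELS, pred))
```

On this tiny, cleanly separable corpus both variants label the two test documents correctly, so the sketch only shows the mechanics. The differences reported in the abstract would appear only on a realistic dataset.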

Author information

Authors and Affiliations

Department of Electronics and Communication, University of Allahabad, Prayagraj, India
Ali Abbas, Manish Jaiswal, Shreya Agarwal, Prajna Jha & Tanveer J. Siddiqui

Corresponding author

Correspondence to Ali Abbas.

Editor information

Editors and Affiliations

Faculty of Engineering, University of Porto (FEUP), Porto, Portugal
João Manuel R. S. Tavares
National Institute of Telecommunications (INATEL), Santa Rita do Sapucaí, Minas Gerais, Brazil
Joel J. P. C. Rodrigues
Siliguri Institute of Technology, Siliguri, West Bengal, India
Debajyoti Misra
Siliguri Institute of Technology, Siliguri, West Bengal, India
Debasmriti Bhattacherjee

About this paper

Cite this paper

Abbas, A., Jaiswal, M., Agarwal, S., Jha, P., Siddiqui, T.J. (2024). Performance Based Comparative Analysis of Naïve Bayes Variants for Text Classification. In: Tavares, J.M.R.S., Rodrigues, J.J.P.C., Misra, D., Bhattacherjee, D. (eds) Data Science and Communication. ICTDsC 2023. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-99-5435-3_20

DOI: https://doi.org/10.1007/978-981-99-5435-3_20
Published: 03 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-5434-6
Online ISBN: 978-981-99-5435-3
eBook Packages: Intelligent Technologies and Robotics (R0)
