Performance Based Comparative Analysis of Naïve Bayes Variants for Text Classification

  • Conference paper
Data Science and Communication (ICTDsC 2023)

Abstract

With the rapid growth in digital data consumption, unorganized textual data has become a major challenge. To address it, we perform document classification, which is divided into four major phases: pre-processing, feature selection, model training, and model testing. In this paper, we select four types of feature vectors, three built with weighting techniques and one without weighting. Performance is tested on these four feature vectors with five variants of the Naïve Bayes classifier; two of them could not be trained because of the sparseness of the dataset, and the remaining three performed well. In the reported results, Complement Naïve Bayes with the term frequency weighting scheme achieves an F1-macro score of 0.807901362 and an accuracy of 0.821163038, outperforming every other combination of feature set and Naïve Bayes variant. In terms of training and testing time, Complement Naïve Bayes with the term frequency weighting scheme is again found to perform best.
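To make the best-performing combination concrete, the following is a minimal Python sketch of a Complement Naïve Bayes classifier over raw term-frequency vectors, evaluated with accuracy and macro-averaged F1. It is not the authors' implementation: the abstract does not state their toolkit, dataset, or pre-processing, so the toy corpus, the whitespace tokenizer, the smoothing value `alpha=1.0`, and all function names are assumptions made for illustration. The classifier follows the basic complement formulation of Rennie et al. (2003) without weight normalization.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Raw term-frequency vector: lower-cased whitespace tokens mapped to counts."""
    return Counter(text.lower().split())


def train_complement_nb(docs, labels, alpha=1.0):
    """For each class, estimate smoothed term probabilities from all OTHER classes."""
    vocab = sorted({term for doc in docs for term in doc})
    weights = {}
    for c in sorted(set(labels)):
        complement = Counter()
        for doc, label in zip(docs, labels):
            if label != c:
                complement.update(doc)
        total = sum(complement.values()) + alpha * len(vocab)
        weights[c] = {t: math.log((complement[t] + alpha) / total) for t in vocab}
    return weights


def predict(weights, doc):
    """Pick the class whose complement explains the document worst (lowest score)."""
    def score(c):
        # Terms never seen in training are skipped.
        return sum(n * weights[c][t] for t, n in doc.items() if t in weights[c])
    return min(sorted(weights), key=score)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy corpus, invented purely for illustration.
train_texts = [
    "the match ended with a late goal",
    "the striker scored a goal",
    "the team won the match",
    "the market fell as stocks dropped",
    "investors sold stocks",
    "the bank raised interest rates",
]
train_labels = ["sport", "sport", "sport", "finance", "finance", "finance"]
test_texts = ["a late goal won the match", "stocks and interest rates fell"]
test_labels = ["sport", "finance"]

model = train_complement_nb([term_frequencies(t) for t in train_texts], train_labels)
predictions = [predict(model, term_frequencies(t)) for t in test_texts]
print(predictions, accuracy(test_labels, predictions), macro_f1(test_labels, predictions))
```

Macro-averaging gives every class the same weight regardless of its size, which is why it is commonly reported next to accuracy when class sizes differ. Substituting a weighted feature vector for `term_frequencies` would correspond to the other feature sets compared in the paper, whose exact weighting schemes the abstract does not name.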

Author information

Corresponding author

Correspondence to Ali Abbas.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Abbas, A., Jaiswal, M., Agarwal, S., Jha, P., Siddiqui, T.J. (2024). Performance Based Comparative Analysis of Naïve Bayes Variants for Text Classification. In: Tavares, J.M.R.S., Rodrigues, J.J.P.C., Misra, D., Bhattacherjee, D. (eds) Data Science and Communication. ICTDsC 2023. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-99-5435-3_20
