An Application of Transfer Learning: Fine-Tuning BERT for Spam Email Classification

Bhopale, Amol P.; Tiwari, Ashish

doi:10.1007/978-3-030-82469-3_6

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 256)

Included in the following conference series:

International Conference on Machine Learning and Big Data Analytics

Abstract

The increased use of cyberspace and social media has led to a rise in unsolicited bulk e-mail, necessitating a reliable system for filtering out such messages. In recent years, several deep learning-based word representation techniques have been devised. These advances in word representation can provide a robust solution to such problems. In this paper, we apply a transfer learning technique: a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model is fine-tuned on the target datasets for spam email classification. The classification results are compared with those of other state-of-the-art classification techniques such as logistic regression, SVM, Naïve Bayes, Random Forest, and LSTM. To evaluate the performance of the proposed technique, experiments are carried out on two well-known datasets: the Enron spam dataset with 33,716 email messages and Kaggle's SMSSpamCollection dataset with 5,574 messages. The proposed model shows significant improvements over the other models.

Notes

1. https://github.com/flairNLP/flair

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Visvesvaraya National Institute of Technology, Nagpur, 440010, India
Amol P. Bhopale & Ashish Tiwari

Editor information

Editors and Affiliations

Indian Institute of Technology Patna, Patna, India
Rajiv Misra
Indian Institute of Technology Bombay, Mumbai, India
Rudrapatna K. Shyamasundar
Indian Institute of Technology (BHU), Varanasi, India
Amrita Chaturvedi
Cardiff University, Cardiff, UK
Rana Omer

About this paper

Cite this paper

Bhopale, A.P., Tiwari, A. (2022). An Application of Transfer Learning: Fine-Tuning BERT for Spam Email Classification. In: Misra, R., Shyamasundar, R.K., Chaturvedi, A., Omer, R. (eds) Machine Learning and Big Data Analytics (Proceedings of International Conference on Machine Learning and Big Data Analytics (ICMLBDA) 2021). ICMLBDA 2021. Lecture Notes in Networks and Systems, vol 256. Springer, Cham. https://doi.org/10.1007/978-3-030-82469-3_6

DOI: https://doi.org/10.1007/978-3-030-82469-3_6
Published: 30 September 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-82468-6
Online ISBN: 978-3-030-82469-3
eBook Packages: Intelligent Technologies and Robotics (R0)
