A Comprehensive Comparative Study of Machine Learning Classifiers for Spam Filtering

Gupta, Saksham; Chhabra, Amit; Agrawal, Satvik; Singh, Sunil K.

doi:10.1007/978-3-031-22018-0_24

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 599)

Included in the following conference series:

International Conference on Cyber Security, Privacy and Networking

Abstract

In July 2021, the global daily spam count reached 283 billion, constituting 84.12% of total email volume. The surge in spam, or unsolicited email, which can hamper communication, has created a need for robust and reliable antispam filters. In recent years, spam filtering and monitoring have become significant concerns for mail and other internet services, and machine learning strategies are increasingly employed as safeguards against internet spam. This study provides a systematic survey of spam filtering methods that use machine learning techniques. Logistic Regression, Random Forest, Naive Bayes, and Decision Tree classifiers are compared on precision, recall, and accuracy using a dataset composed of Twitter tweets, Facebook posts, and YouTube comments. The study first reviews related work on spam filtering and the research gaps in the current literature, and then discusses each method in detail. Our experimental results indicate that Decision Trees provide the best accuracy (97.02%) and precision (98.83%), while Logistic Regression has the highest recall (99.89%).
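As a reading aid, the three evaluation metrics named in the abstract can be sketched as follows. This is a minimal illustration of the standard definitions, not the authors' evaluation code: the function name `binary_metrics` and the toy labels are hypothetical, and the values produced have no relation to the results reported in the chapter.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for a binary classification task.

    `positive` marks the spam class. Labels here are assumed to be 1 for
    spam and 0 for non-spam; this encoding is an assumption of the sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Precision: of the messages flagged as spam, how many really were spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the real spam messages, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all messages classified correctly.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Toy labels for illustration only (1 = spam, 0 = not spam).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
precision, recall, accuracy = binary_metrics(y_true, y_pred)
print(precision, recall, accuracy)
```

A classifier can score high on one metric and lower on another, which is why the abstract reports a different best model for recall than for accuracy and precision.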

Author information

Authors and Affiliations

Chandigarh College of Engineering and Technology, Sector 26, Chandigarh, India
Saksham Gupta, Amit Chhabra & Satvik Agrawal
Kalinga Institute of Industrial Technology, Patia, Bhubaneswar, Odisha, India
Sunil K. Singh

Corresponding author

Correspondence to Saksham Gupta.

Editor information

Editors and Affiliations

Electronics Engineering and Telecom, State University of Rio de Janeiro, Rio de Janeiro, Brazil
Nadia Nedjah
University of Murcia, Murcia, Spain
Gregorio Martínez Pérez
Asia University, Taichung, Taiwan
B. B. Gupta

Cite this paper

Gupta, S., Chhabra, A., Agrawal, S., Singh, S.K. (2023). A Comprehensive Comparative Study of Machine Learning Classifiers for Spam Filtering. In: Nedjah, N., Martínez Pérez, G., Gupta, B.B. (eds) International Conference on Cyber Security, Privacy and Networking (ICSPN 2022). ICSPN 2021. Lecture Notes in Networks and Systems, vol 599. Springer, Cham. https://doi.org/10.1007/978-3-031-22018-0_24

DOI: https://doi.org/10.1007/978-3-031-22018-0_24
Published: 21 February 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22017-3
Online ISBN: 978-3-031-22018-0
eBook Packages: Intelligent Technologies and Robotics (R0)