Abstract
In July 2021, the daily global spam count reached 283 billion messages, constituting 84.12% of total email volume. The steady rise in spam, or unsolicited email, which can hamper communication, has created a pressing need for robust and reliable antispam filters. In recent years, spam filtering and monitoring have become significant concerns for email and other internet services, and machine learning strategies are increasingly employed as safeguards against internet spam. This study provides a systematic survey of spam filtering methods based on machine learning techniques. Logistic Regression, Random Forest, Naive Bayes, and Decision Tree classifiers for spam filtering are compared on precision, recall, and accuracy using a dataset composed of Twitter tweets, Facebook posts, and YouTube comments. The preliminary discussion presents a background study of related work on spam filtering and identifies research gaps in the current literature, followed by a detailed discussion of each method. Our experimental results indicate that Decision Trees achieve the best accuracy (97.02%) and precision (98.83%), while Logistic Regression achieves the highest recall (99.89%).
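The classifiers are compared on precision, recall, and accuracy, and the results show that the best model on one metric is not necessarily the best on another. As a reminder of how these metrics relate, here is a minimal Python sketch, assuming spam is the positive class. The function names (`confusion_from_labels`, `precision`, `recall`, `accuracy`) and the toy labels are hypothetical illustrations, not the study's code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary filter where spam is the positive class."""
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate message (ham) wrongly flagged as spam
    fn: int  # spam that slipped through as ham
    tn: int  # ham correctly passed through


def confusion_from_labels(y_true: Sequence[str], y_pred: Sequence[str],
                          positive: str = "spam") -> Confusion:
    """Tally true/false positives/negatives from parallel label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def precision(c: Confusion) -> float:
    """Share of messages flagged as spam that really are spam: TP / (TP + FP)."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0  # 0.0 by convention when nothing is flagged


def recall(c: Confusion) -> float:
    """Share of actual spam that the filter catches: TP / (TP + FN)."""
    actual_spam = c.tp + c.fn
    return c.tp / actual_spam if actual_spam else 0.0  # 0.0 by convention when there is no spam


def accuracy(c: Confusion) -> float:
    """Share of all messages classified correctly: (TP + TN) / total."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical toy labels for illustration only (not data from the study).
EXAMPLE_TRUE = ["spam", "spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham", "ham"]
EXAMPLE_PRED = ["spam", "spam", "spam", "ham", "spam", "spam", "ham", "ham", "ham", "ham"]

if __name__ == "__main__":
    c = confusion_from_labels(EXAMPLE_TRUE, EXAMPLE_PRED)
    print(c)  # prints: Confusion(tp=3, fp=2, fn=1, tn=4)
    print(f"precision={precision(c):.2f} recall={recall(c):.2f} accuracy={accuracy(c):.2f}")
    # prints: precision=0.60 recall=0.75 accuracy=0.70
```

With these toy labels the filter catches 3 of the 4 spam messages (recall 0.75), but 2 of its 5 spam flags are false alarms (precision 0.60). This is why a classifier can lead on recall, as Logistic Regression does in the reported results, without also leading on precision.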
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gupta, S., Chhabra, A., Agrawal, S., Singh, S.K. (2023). A Comprehensive Comparative Study of Machine Learning Classifiers for Spam Filtering. In: Nedjah, N., Martínez Pérez, G., Gupta, B.B. (eds) International Conference on Cyber Security, Privacy and Networking (ICSPN 2022). ICSPN 2021. Lecture Notes in Networks and Systems, vol 599. Springer, Cham. https://doi.org/10.1007/978-3-031-22018-0_24
DOI: https://doi.org/10.1007/978-3-031-22018-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22017-3
Online ISBN: 978-3-031-22018-0
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)