Abstract
More than half of worldwide email traffic, amounting to several billion messages per day, consists of spam, causing considerable disruption in telecommunications. This enormous volume of unwanted messages creates a pressing need for reliable and robust spam filters. Conventional filtering methods have largely failed to cope with the adaptive nature of spam. Machine learning methods, by contrast, can learn to detect and filter spam intelligently. Here, we present a spam message prediction system based on lexical analysis, including tokenization, stop-word removal, stemming, lemmatization, and feature extraction. In several experiments on the UCI spam collection dataset, the system achieved over 97% accuracy with a random forest classifier. We have also hosted the developed spam-detection system on Heroku, a platform-as-a-service (PaaS).
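The preprocessing stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the suffix-stripping stemmer, and all function names are assumptions made for illustration, and a real system would more likely use an established stemmer or lemmatizer. Lemmatization and the random forest classifier are omitted here.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper does not give its own).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and",
              "you", "your", "for", "in", "on", "now", "at"}

# Hypothetical suffix list for a crude stemmer, standing in for a real one.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text):
    """Lowercase the message and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [stem(t) for t in remove_stop_words(tokenize(text))]


def build_vocabulary(messages):
    """Map every distinct preprocessed token to a column index."""
    vocab = sorted({tok for msg in messages for tok in preprocess(msg)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocabulary):
    """Bag-of-words feature extraction: one token count per vocabulary entry."""
    counts = Counter(preprocess(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Two made-up messages, not taken from the UCI dataset.
messages = [
    "Free rewards waiting, claim now",
    "Are you coming to the meeting at noon",
]
vocabulary = build_vocabulary(messages)
features = [vectorize(m, vocabulary) for m in messages]
```

The resulting count vectors are the kind of input a classifier such as a random forest could be trained on; term weighting (for example TF-IDF) would be a natural substitute for raw counts.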
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sartaj, S., Mollah, A.F. (2021). An Intelligent System for Spam Message Detection. In: Sheth, A., Sinhal, A., Shrivastava, A., Pandey, A.K. (eds) Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-2248-9_37
DOI: https://doi.org/10.1007/978-981-16-2248-9_37
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-2247-2
Online ISBN: 978-981-16-2248-9
eBook Packages: Intelligent Technologies and Robotics (R0)