An Intelligent System for Spam Message Detection

  • Conference paper
Intelligent Systems

Abstract

More than half of worldwide email traffic, amounting to several billion messages per day, consists of spam, causing considerable disruption in telecommunications. This overwhelming volume of unwanted messages creates a pressing need for reliable and robust spam filters. Conventional filtering methods have largely failed to cope with the adaptive nature of spam messages. Machine learning methods, by contrast, may be able to detect and filter spam intelligently. Here, we present a spam message prediction system based on lexical processing steps such as tokenization, stop-word removal, stemming, and lemmatization, followed by feature extraction. In several experiments on the UCI SMS Spam Collection dataset, the system achieved over 97% accuracy with a random forest classifier. We have also hosted the developed spam-detection system on Heroku, a platform-as-a-service (PaaS).
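The preprocessing and feature-extraction steps named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the stop-word list, the suffix-stripping `crude_stem` function (a stand-in for a real stemmer or lemmatizer), the bag-of-words features, and the sample messages are all assumptions made for the example. It uses only the Python standard library.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given here).
STOP_WORDS = {"a", "an", "the", "to", "is", "you", "your",
              "for", "and", "of", "in", "at", "now"}


def tokenize(text):
    """Lowercase the message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stop words, then stem what remains."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def build_vocabulary(messages):
    """Map every distinct preprocessed token to a column index (sorted order)."""
    tokens = sorted({tok for msg in messages for tok in preprocess(msg)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn one message into a bag-of-words count vector over the vocabulary."""
    vector = [0] * len(vocabulary)
    for tok, count in Counter(preprocess(text)).items():
        if tok in vocabulary:  # tokens unseen during vocabulary building are ignored
            vector[vocabulary[tok]] = count
    return vector


if __name__ == "__main__":
    # Hypothetical sample messages, not taken from the dataset.
    train = ["Free prizes waiting, claim now", "Are you coming to the meeting"]
    vocab = build_vocabulary(train)
    print(vocab)
    print(vectorize("Claim free prizes", vocab))
```

Count vectors of this kind are what a classifier such as a random forest would be trained on. A practical system would more likely rely on an established NLP toolkit for stemming and lemmatization and on a weighted scheme such as TF-IDF in place of raw counts.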

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sartaj, S., Mollah, A.F. (2021). An Intelligent System for Spam Message Detection. In: Sheth, A., Sinhal, A., Shrivastava, A., Pandey, A.K. (eds) Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-16-2248-9_37
