
The Role of Feature Selection in Machine Learning for Detection of Spam and Phishing Attacks

Conference paper in Advanced Technologies, Systems, and Applications III (IAT 2018)

Abstract

With the growth of Internet use throughout the world, stronger network security is indispensable, since it decreases the chances of privacy violations, spoofing, identity or information theft, and bank fraud. Two of the most frequent network security breaches involve phishing and spam emails, as they are an easy way to deliver a virus or a link to a malicious site, which can lead to extensive fraud. Despite the abundance of tools for detecting and blocking these types of messages and websites, society is still struggling to overcome this problem. The purpose of this paper is to exclude the human factor from security breaches executed in this manner by applying various machine learning algorithms. To train and test the most successful algorithms (Random Forest, k-Nearest Neighbor, Artificial Neural Network, Support Vector Machine, Logistic Regression, Naive Bayes), the paper used two separate datasets, UCI's Phishing Websites Data Set and the Spam Emails Dataset, together with the Weka software, and found that the best results for both are achieved with the Random Forest algorithm. However, the datasets responded differently to feature selection algorithms: the best result for phishing (97.33% accuracy) was achieved through Ranker + Principal Components optimization, and the best result for spam (94.24% accuracy) through BestFirst + CfsSubsetEval optimization in Weka. These findings provide a base platform for future work towards faster and more accurate online fraud detection.
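The best spam result above came from pairing Weka's BestFirst search with the CfsSubsetEval evaluator. As a rough illustration of what that pairing does, the following is a minimal Python sketch that scores feature subsets with the correlation-based feature selection (CFS) merit, k * mean(feature-class correlation) / sqrt(k + k(k-1) * mean(feature-feature correlation)), and explores subsets with a simplified forward best-first search. It is not Weka's implementation: Weka's CfsSubsetEval discretizes numeric attributes and measures correlation with symmetric uncertainty, whereas this sketch assumes numeric features and uses absolute Pearson correlation. The function names, the toy data and the `max_stale` parameter (loosely mirroring BestFirst's search-termination setting) are hypothetical.

```python
import heapq
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit: k * mean feature-class corr / sqrt(k + k(k-1) * mean feature-feature corr)."""
    k = len(subset)
    if k == 0:
        return 0.0
    mean_cf = sum(r_cf[i] for i in subset) / k
    pairs = list(combinations(subset, 2))
    mean_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * mean_cf / math.sqrt(k + k * (k - 1) * mean_ff)


def best_first_cfs(X, y, max_stale=5):
    """Simplified forward best-first search over feature subsets, scored by CFS merit.

    Stops after `max_stale` consecutive expansions that fail to improve the best merit
    (a hypothetical parameter, loosely mirroring Weka BestFirst's search termination).
    """
    n_features = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_features)]
    # Absolute correlations: feature-to-class (relevance) and feature-to-feature (redundancy).
    r_cf = [abs(pearson(c, y)) for c in cols]
    r_ff = [[abs(pearson(a, b)) for b in cols] for a in cols]

    start = frozenset()
    best_subset, best_merit = start, 0.0
    # Max-heap via negated merit; the counter breaks ties so subsets are never compared.
    open_list = [(0.0, 0, start)]
    visited = {start}
    counter, stale = 1, 0
    while open_list and stale < max_stale:
        _, _, subset = heapq.heappop(open_list)
        improved = False
        for f in range(n_features):
            child = subset | {f}
            if f in subset or child in visited:
                continue
            visited.add(child)
            merit = cfs_merit(child, r_cf, r_ff)
            heapq.heappush(open_list, (-merit, counter, child))
            counter += 1
            if merit > best_merit + 1e-12:
                best_subset, best_merit = child, merit
                improved = True
        stale = 0 if improved else stale + 1
    return sorted(best_subset), best_merit


# Toy data (hypothetical): feature 0 matches the label, feature 1 is unrelated noise,
# and feature 2 is an exact copy of feature 0, i.e. fully redundant.
y = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = list(y)
f1 = [0, 1, 0, 1, 0, 1, 0, 1]
X = [[a, b, a] for a, b in zip(f0, f1)]

selected, merit = best_first_cfs(X, y)
print(selected, merit)  # expected: [0] 1.0
```

The redundant copy in column 2 is never added: pairing it with column 0 raises the average inter-feature correlation enough to cancel the gain in relevance. This redundancy penalty is what distinguishes a subset evaluator such as CFS from scoring features one at a time.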



Author information

Correspondence to Ina Salihovic.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Salihovic, I., Serdarevic, H., Kevric, J. (2019). The Role of Feature Selection in Machine Learning for Detection of Spam and Phishing Attacks. In: Avdaković, S. (eds) Advanced Technologies, Systems, and Applications III. IAT 2018. Lecture Notes in Networks and Systems, vol 60. Springer, Cham. https://doi.org/10.1007/978-3-030-02577-9_47
