The Role of Feature Selection in Machine Learning for Detection of Spam and Phishing Attacks

Salihovic, Ina; Serdarevic, Haris; Kevric, Jasmin

doi:10.1007/978-3-030-02577-9_47

Part of the book series: Lecture Notes in Networks and Systems ((LNNS,volume 60))

Included in the following conference series:

International Symposium on Innovative and Interdisciplinary Applications of Advanced Technologies

Abstract

As Internet use grows worldwide, stronger network security is indispensable, since it reduces the risk of privacy violations, identity or information theft, and bank fraud. Two of the most frequent network security breaches involve phishing and spam emails, which are an easy way to deliver a virus or a link to a malicious site and can lead to extensive fraud. Although many tools exist for detecting and blocking these messages and websites, the problem persists. The purpose of this paper was to remove the human factor from security breaches of this kind by applying various machine learning algorithms. To train and test the most successful algorithms (Random Forest, k-Nearest Neighbor, Artificial Neural Network, Support Vector Machine, Logistic Regression, Naive Bayes), the paper used two separate datasets, UCI's Phishing Websites Data Set and the Spam Emails Dataset, together with the Weka software, and found that the best results on both were achieved with the Random Forest algorithm. However, the two datasets responded differently to feature selection: the best result for phishing (97.33% accuracy) was obtained with Ranker + Principal Components, and the best result for spam (94.24% accuracy) was obtained with BestFirst + CfsSubsetEval in Weka. These findings provide a basis for future work towards faster and more accurate online fraud detection.
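The correlation-based subset evaluation named in the abstract can be illustrated with a small sketch. It is not the authors' Weka configuration. It assumes absolute Pearson correlation as the relevance and redundancy measure, and it replaces Weka's best-first search with a plain greedy forward search. The function names and the toy data are hypothetical. The merit of a subset of k features is taken as k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff is the mean feature-feature correlation.

```python
import math


def pearson(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / math.sqrt(vx * vy))


def cfs_merit(columns, labels, subset):
    """Merit of a feature subset: high class relevance, low redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    r_cf = sum(pearson(columns[i], labels) for i in subset) / k
    pairs = [(i, j) for a, i in enumerate(subset) for j in subset[a + 1:]]
    if pairs:
        r_ff = sum(pearson(columns[i], columns[j]) for i, j in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def greedy_forward_selection(columns, labels):
    """Add one feature at a time while the merit strictly improves.

    Assumption: this is a simplification; a best-first search would also
    keep a queue of alternatives and could backtrack.
    """
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        best_idx, best_merit = None, best
        for i in remaining:
            m = cfs_merit(columns, labels, selected + [i])
            if m > best_merit:
                best_idx, best_merit = i, m
        if best_idx is None:
            break  # no candidate improves the merit
        selected.append(best_idx)
        remaining.remove(best_idx)
        best = best_merit
    return selected, best


# Hypothetical toy data: 8 samples, binary class label.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 0: informative
    [0, 0, 0, 1, 1, 1, 1, 1],  # feature 1: exact copy of feature 0 (redundant)
    [0, 1, 0, 1, 0, 1, 0, 1],  # feature 2: uncorrelated with the label
]

selected, merit = greedy_forward_selection(columns, labels)
print(selected, round(merit, 4))
```

On this toy data the search keeps only feature 0: the duplicate adds no merit because its redundancy cancels its relevance, and the uncorrelated feature lowers the merit. The reduced feature set would then be passed to a classifier such as Random Forest.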

Author information

Authors and Affiliations

International Burch University, Sarajevo, 71000, Bosnia and Herzegovina
Ina Salihovic, Haris Serdarevic & Jasmin Kevric

Corresponding author

Correspondence to Ina Salihovic.

Editor information

Editors and Affiliations

Faculty of Electrical Engineering, University of Sarajevo, Sarajevo, Bosnia and Herzegovina
Samir Avdaković

About this paper

Cite this paper

Salihovic, I., Serdarevic, H., Kevric, J. (2019). The Role of Feature Selection in Machine Learning for Detection of Spam and Phishing Attacks. In: Avdaković, S. (eds) Advanced Technologies, Systems, and Applications III. IAT 2018. Lecture Notes in Networks and Systems, vol 60. Springer, Cham. https://doi.org/10.1007/978-3-030-02577-9_47

DOI: https://doi.org/10.1007/978-3-030-02577-9_47
Published: 04 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02576-2
Online ISBN: 978-3-030-02577-9
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
