Abstract
As Internet use grows worldwide, stronger network security is indispensable, since it reduces the chances of privacy spoofing, identity or information theft, and bank fraud. Phishing and spam emails are two of the most frequent network security threats, as they are an easy way to deliver a virus or a link to a malicious site, which can lead to extensive fraud. Although tools for detecting and blocking such messages and websites are abundant, the problem persists. The purpose of this paper was to remove the human factor from security breaches carried out in this manner by using various machine learning algorithms. To train and test the most successful algorithms (Random Forest, k-Nearest Neighbor, Artificial Neural Network, Support Vector Machine, Logistic Regression, Naive Bayes), the paper used two separate datasets, UCI's Phishing Websites Data Set and Spam Emails Dataset, together with the Weka software, and found that Random Forest achieved the best results on both. However, the datasets responded differently to feature selection algorithms: the best result for phishing (97.33% accuracy) was obtained with Ranker + Principal Components optimization, and the best result for spam (94.24% accuracy) was obtained with BestFirst + CfsSubsetEval optimization in Weka. These findings provide a basis for future work towards faster and more accurate online fraud detection.
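To make the idea of feature selection before classification concrete, the following is a minimal sketch in plain Python. It is not the Weka workflow described in the abstract: it swaps in a simple correlation-based ranking filter (a rough stand-in for a ranker-style search) and a small k-Nearest Neighbor classifier, and it runs on synthetic toy data rather than the UCI datasets. All function names and the toy data generator are hypothetical and exist only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(vx * vy)


def rank_features(X, y):
    """Feature indices sorted by |correlation with the label|, best first."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select(X, keep):
    """Keep only the columns listed in `keep`."""
    return [[row[j] for j in keep] for row in X]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training rows (squared Euclidean distance)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = sum(train_y[i] for i in order[:k])
    return 1 if votes * 2 > k else 0


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
    return hits / len(test_y)


def make_toy_data(n, seed):
    """Synthetic data (an assumption): features 0 and 1 track the label, 2..5 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [label + rng.gauss(0, 0.3), label + rng.gauss(0, 0.3)]
        noise = [rng.gauss(0, 1) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)
    return X, y


train_X, train_y = make_toy_data(200, seed=0)
test_X, test_y = make_toy_data(100, seed=1)

ranking = rank_features(train_X, train_y)
top2 = ranking[:2]  # keep the two highest-ranked features

acc_all = accuracy(train_X, train_y, test_X, test_y)
acc_top2 = accuracy(select(train_X, top2), train_y, select(test_X, top2), test_y)

print("feature ranking:", ranking)
print("accuracy, all features:", acc_all)
print("accuracy, top-2 features:", acc_top2)
```

Comparing the two printed accuracies shows how a classifier can respond to a reduced feature set. Which selection method helps, and by how much, depends on the dataset, which is consistent with the abstract's observation that the phishing and spam datasets favored different feature selection approaches.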
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Salihovic, I., Serdarevic, H., Kevric, J. (2019). The Role of Feature Selection in Machine Learning for Detection of Spam and Phishing Attacks. In: Avdaković, S. (eds) Advanced Technologies, Systems, and Applications III. IAT 2018. Lecture Notes in Networks and Systems, vol 60. Springer, Cham. https://doi.org/10.1007/978-3-030-02577-9_47
DOI: https://doi.org/10.1007/978-3-030-02577-9_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02576-2
Online ISBN: 978-3-030-02577-9
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)