Comparative Analysis of Machine Learning Algorithms for Phishing Website Detection

  • Conference paper
Inventive Computation and Information Technologies

Abstract

The Internet became the most effective medium for social interaction during the COVID-19 pandemic. Users' heavy dependence on digital platforms increases the opportunity for fraud. Phishing attacks are the most common form of attack in the digital world. Any communication channel can be used to target an individual and trick them into leaking confidential data in a fake environment; that data can later be used to harm the individual victim or even an entire business, depending on the attacker's intent and the type of data leaked. Researchers have developed numerous anti-phishing tools and techniques, such as whitelists, blacklists, and antivirus software, to detect web phishing. Classification is one of the techniques used to detect phishing websites. This paper proposes a model for detecting phishing attacks using several machine learning (ML) classifiers: k-nearest neighbors, random forest, support vector machines, and logistic regression. The dataset was obtained from the public online repository Mendeley and comprises 48 features extracted from 5000 phishing websites and 5000 legitimate websites. The model was evaluated using F1 scores, which take both precision and recall into account. The work concludes that the random forest classifier achieved the highest performance, with 98% accuracy.
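As an illustration of the evaluation metric named in the abstract, the following minimal sketch computes precision, recall, and F1 for the phishing class from true and predicted labels. It is not the authors' code: the function name `precision_recall_f1` and the toy label lists are hypothetical, and the sketch assumes the usual definition of F1 as the harmonic mean of precision and recall.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels (1 = phishing, 0 = legitimate); not data from the paper.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Here one phishing site is missed and one legitimate site is flagged, so precision and recall are both 3/4 and F1 is 0.75. Unlike accuracy alone, F1 drops when either false positives or false negatives grow, which is why it is a common choice for comparing classifiers.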

Corresponding author

Correspondence to Dhiman Sarma.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Sarma, D., Mittra, T., Bawm, R.M., Sarwar, T., Lima, F.F., Hossain, S. (2021). Comparative Analysis of Machine Learning Algorithms for Phishing Website Detection. In: Smys, S., Balas, V.E., Kamel, K.A., Lafata, P. (eds) Inventive Computation and Information Technologies. Lecture Notes in Networks and Systems, vol 173. Springer, Singapore. https://doi.org/10.1007/978-981-33-4305-4_64
