URL Classification on Extracted Feature Using Deep Learning

Sahoo, Vishal Kumar; Singh, Vinayak; Gourisaria, Mahendra Kumar; Acharya, Anuja Kumar

doi:10.1007/978-981-19-7867-8_33

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 586)

Abstract

The widespread adoption of the World Wide Web (WWW) has brought about a monumental transition toward e-commerce, online banking, and social media. This popularity has given attackers new opportunities to scam unsuspecting users, and malicious URLs are among the most common forms of attack. These URLs host unsolicited content and are used to perpetrate cybercrimes. Hence, distinguishing malicious URLs from benign ones is crucial for a secure browsing experience. Blacklists have traditionally been used to classify URLs; however, they are not exhaustive and do not perform well against unknown URLs. This necessitates the use of machine learning and deep learning, which generalize better to URLs that have not been seen before. In this paper, we employ a novel feature extraction algorithm using the ‘urllib.parse’, ‘tld’, and ‘re’ libraries to extract static and dynamic lexical features from the URL text. IPv4 and IPv6 address groups and the use of shortening services are detected and used as features. Static features such as the protocol used (HTTP or HTTPS) show a high correlation with the target variable. Various machine learning and deep learning algorithms were implemented and evaluated for the binary classification of URLs. Experimentation and evaluation were based on 450,176 unique URLs, where MLP and Conv1D gave the best overall results, with accuracies of 99.73% and 99.72% and F1 scores of 0.9981 and 0.9983, respectively.
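The kind of lexical feature extraction the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the feature names, the `SHORTENERS` set, and the helper functions are hypothetical. It also uses only the standard library, so the third-party ‘tld’ package is left out and `ipaddress` is swapped in as one possible way to detect IPv4 and IPv6 hosts.

```python
import ipaddress
import re
from urllib.parse import urlparse

# Hypothetical, deliberately short list of shortening services (an assumption,
# not the list used in the paper).
SHORTENERS = {"bit.ly", "tinyurl.com", "t.co", "goo.gl", "ow.ly"}


def host_is_ip(host: str) -> bool:
    """Return True when the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str) -> dict:
    """Map a URL string to a small dictionary of illustrative lexical features."""
    # urlparse only fills in the host when a scheme is present, so a default
    # scheme is prepended for bare inputs such as "example.com/login".
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = (parsed.hostname or "").lower()
    return {
        "url_length": len(url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(host_is_ip(host)),
        "uses_shortener": int(host in SHORTENERS),
        "digit_count": len(re.findall(r"\d", url)),
        "dot_count": url.count("."),
        "path_depth": len([part for part in parsed.path.split("/") if part]),
    }


if __name__ == "__main__":
    for example in ("https://bit.ly/3abcd", "http://192.168.0.1/a/b"):
        print(example, extract_features(example))
```

Each dictionary can be turned into one row of a numeric feature matrix and passed to a classifier. Which features the paper actually computes, and how they are encoded, is not stated in the abstract.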

Author information

Authors and Affiliations

School of Computer Engineering, KIIT Deemed to be University, Bhubaneswar, Odisha, 751024, India
Vishal Kumar Sahoo, Vinayak Singh, Mahendra Kumar Gourisaria & Anuja Kumar Acharya

Editor information

Editors and Affiliations

Computer Vision Laboratory, University of Sassari, Alghero, Sassari, Italy
Massimo Tistarelli
Computer Vision and Biometrics Lab, Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, India
Shiv Ram Dubey
Computer Vision and Biometrics Lab, Department of Information Technology, Indian Institute of Information Technology, Allahabad, India
Satish Kumar Singh
University of Münster, Münster, Germany
Xiaoyi Jiang

About this paper

Cite this paper

Sahoo, V.K., Singh, V., Gourisaria, M.K., Acharya, A.K. (2023). URL Classification on Extracted Feature Using Deep Learning. In: Tistarelli, M., Dubey, S.R., Singh, S.K., Jiang, X. (eds) Computer Vision and Machine Intelligence. Lecture Notes in Networks and Systems, vol 586. Springer, Singapore. https://doi.org/10.1007/978-981-19-7867-8_33

DOI: https://doi.org/10.1007/978-981-19-7867-8_33
Published: 06 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-7866-1
Online ISBN: 978-981-19-7867-8
eBook Packages: Intelligent Technologies and Robotics (R0)
