Optimized Decision Forest for Website Phishing Detection

Balogun, Abdullateef O.; Mojeed, Hammed A.; Adewole, Kayode S.; Akintola, Abimbola G.; Salihu, Shakirat A.; Bajeh, Amos O.; Jimoh, Rasheed G.

doi:10.1007/978-3-030-90321-3_47

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 231)

Included in the following conference series:

Proceedings of the Computational Methods in Systems and Software

Abstract

The development of web and internet technology has led to its application in a wide range of services. This growth has been accompanied by an increase in cybersecurity issues over the years, the most prominent of which is the phishing attack, in which hostile websites impersonate genuine websites to acquire unsuspecting users' data for illegal access. Current mitigation measures, including anti-phishing software and machine learning (ML) approaches, have proven successful in identifying phishing operations. Hackers, on the other hand, keep devising new techniques to get around these countermeasures. Given the dynamism of phishing efforts, there is therefore a constant need for novel and efficient website phishing detection solutions. In this study, an optimized decision forest (ODF) method for detecting website phishing is proposed. ODF uses a genetic algorithm (GA) to select optimal, diverse individual trees in a forest to generate an efficient sub-forest. Specifically, accurate and diverse trees from a decision forest are passed into the GA as an initial population to generate a more robust forest with high efficacy. The performance of the proposed ODF is evaluated using three phishing datasets from the UCI repository. The experimental results show that ODF performed better than the selected baseline classifiers. In particular, ODF recorded high detection accuracy (98.37%), AUC (0.999), F-measure (0.98), and MCC (0.967) values with a low false-positive rate (0.016). In addition, ODF outperformed some existing ML-based phishing attack models. Consequently, the proposed ODF method is recommended for dealing with sophisticated phishing attacks.
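
The idea of using a GA to pick a sub-forest can be illustrated with a minimal sketch. It is not the authors' implementation: the bit-mask encoding (one bit per tree), the majority-vote accuracy fitness, the elitist selection with one-point crossover, and all names (`toy_forest`, `select_subforest`, `pop_size`, `mutation_rate`) are assumptions made for illustration. The trees are stood in for by synthetic prediction vectors rather than a trained forest, and fitness is measured on the same samples for brevity, where a held-out validation set would normally be used.

```python
import random


def toy_forest(n_trees=15, n_samples=200, seed=1):
    """Hypothetical stand-in for a trained forest: one 0/1 prediction row per tree."""
    rng = random.Random(seed)
    labels = [rng.randint(0, 1) for _ in range(n_samples)]
    pred_rows = []
    for _ in range(n_trees):
        acc = 0.55 + 0.3 * rng.random()  # each tree gets its own accuracy level
        pred_rows.append([y if rng.random() < acc else 1 - y for y in labels])
    return pred_rows, labels


def majority_vote(pred_rows, mask):
    """Majority vote over the trees whose mask bit is 1 (ties go to class 1)."""
    chosen = [row for row, bit in zip(pred_rows, mask) if bit]
    return [1 if 2 * sum(col) >= len(chosen) else 0 for col in zip(*chosen)]


def fitness(pred_rows, labels, mask):
    """Ensemble accuracy of the sub-forest encoded by mask; empty masks score 0."""
    if not any(mask):
        return 0.0
    votes = majority_vote(pred_rows, mask)
    return sum(v == y for v, y in zip(votes, labels)) / len(labels)


def select_subforest(pred_rows, labels, pop_size=20, generations=30,
                     mutation_rate=0.05, seed=0):
    """Elitist GA over bit masks; returns the best mask found."""
    rng = random.Random(seed)
    n_trees = len(pred_rows)
    population = [[rng.randint(0, 1) for _ in range(n_trees)]
                  for _ in range(pop_size)]
    population[0] = [1] * n_trees  # seed with the full forest as a baseline

    def score(mask):
        return fitness(pred_rows, labels, mask)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_trees)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = elite + children
    return max(population, key=score)


if __name__ == "__main__":
    pred_rows, labels = toy_forest()
    best = select_subforest(pred_rows, labels)
    print("trees kept:", sum(best), "of", len(best))
    print("full forest accuracy:", fitness(pred_rows, labels, [1] * len(best)))
    print("sub-forest accuracy:", fitness(pred_rows, labels, best))
```

Because the full forest is placed in the initial population and the best half of each generation is carried over unchanged, the returned sub-forest can never score below the full forest on the data used for fitness. This sketch rewards accuracy only; a fitness that also rewards diversity among the selected trees would be closer to the description in the abstract.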

Author information

Authors and Affiliations

Department of Computer Science, University of Ilorin, Ilorin, PMB 1515, Nigeria
Abdullateef O. Balogun, Hammed A. Mojeed, Kayode S. Adewole, Abimbola G. Akintola, Shakirat A. Salihu, Amos O. Bajeh & Rasheed G. Jimoh
Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Bandar Seri Iskandar, 32610, Perak, Malaysia
Abdullateef O. Balogun

Corresponding author

Correspondence to Abdullateef O. Balogun.

Editor information

Editors and Affiliations

Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Radek Silhavy
Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Petr Silhavy
Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Zdenka Prokopova

Cite this paper

Balogun, A.O. et al. (2021). Optimized Decision Forest for Website Phishing Detection. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Data Science and Intelligent Systems. CoMeSySo 2021. Lecture Notes in Networks and Systems, vol 231. Springer, Cham. https://doi.org/10.1007/978-3-030-90321-3_47

DOI: https://doi.org/10.1007/978-3-030-90321-3_47
Published: 17 November 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90320-6
Online ISBN: 978-3-030-90321-3
eBook Packages: Intelligent Technologies and Robotics (R0)
