Abstract
Today, the reflexive response to almost any question that comes to mind is to ask Google. It is fascinating how quickly Google returns so many answers, covering multiple interpretations of the query, even when the search is typed in broken language. The module that enables a search engine to provide so many responses is known as a web crawler. As the name suggests, it crawls through the web by following the hyperlinks (URLs) present on each page, and it is the most vital unit of a search engine. When a particular topic is to be searched for, a focused web crawler is put to work. As the internet has become a hub of resources, the efficiency of web crawlers is being put to an extensive test. This paper builds on two baseline papers that propose models for focused web crawling: one for bioinformatics and the other for building a database for academicians. A multi-module hybridized approach has been devised to improve the efficiency of the focused web crawler: it first classifies the URL database using a decision tree and then optimizes the result with simulated annealing, a metaheuristic optimization algorithm, together with a domain ontology. The proposed approach achieves an accuracy of 90.12% and an F-measure of 91.62%.
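The abstract names simulated annealing as the optimization step but does not state the objective or neighbourhood used. The following is a minimal, generic sketch of simulated annealing with a Metropolis acceptance rule and geometric cooling, applied to a toy problem of choosing `K` URLs with the highest total relevance. The URLs, the relevance scores in `SCORES`, the function names, and all parameter values are illustrative assumptions, not the paper's actual formulation.

```python
import math
import random


def simulated_annealing(initial, energy, neighbor, t_start=1.0, t_end=1e-3,
                        cooling=0.95, steps_per_temp=50, seed=0):
    """Generic simulated annealing that minimises `energy` from `initial`."""
    rng = random.Random(seed)
    current, current_e = initial, energy(initial)
    best, best_e = current, current_e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            candidate = neighbor(current, rng)
            candidate_e = energy(candidate)
            delta = candidate_e - current_e
            # Always accept improvements; accept a worse state with
            # probability exp(-delta / t) (Metropolis criterion).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_e = candidate, candidate_e
                if current_e < best_e:
                    best, best_e = current, current_e
        t *= cooling  # geometric cooling schedule
    return best, best_e


# Hypothetical relevance scores for candidate URLs (illustrative values only).
SCORES = {
    "http://example.org/a": 0.91,
    "http://example.org/b": 0.15,
    "http://example.org/c": 0.78,
    "http://example.org/d": 0.42,
    "http://example.org/e": 0.66,
    "http://example.org/f": 0.05,
}
K = 3  # assumed number of URLs to keep


def energy(selection):
    # Lower energy means higher total relevance of the selected URLs.
    return -sum(SCORES[url] for url in selection)


def neighbor(selection, rng):
    # Swap one selected URL for one unselected URL.
    # Sorting keeps the random choices reproducible across runs.
    selected = sorted(selection)
    unselected = sorted(set(SCORES) - selection)
    removed = rng.choice(selected)
    added = rng.choice(unselected)
    return frozenset(selection - {removed} | {added})


initial = frozenset(sorted(SCORES)[-K:])  # deliberately arbitrary start
best, best_e = simulated_annealing(initial, energy, neighbor)
print(sorted(best), round(-best_e, 2))
```

This toy objective could be solved by sorting alone. It is used here only to keep the acceptance rule and cooling schedule easy to follow. A real crawler would replace `energy` and `neighbor` with problem-specific definitions.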
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Manaswini, S., Deepak, G. (2021). Towards a Novel Strategic Scheme for Web Crawler Design Using Simulated Annealing and Semantic Techniques. In: Shukla, S., Unal, A., Kureethara, J.V., Mishra, D.K., Han, D.S. (eds) Data Science and Security. Lecture Notes in Networks and Systems, vol 290. Springer, Singapore. https://doi.org/10.1007/978-981-16-4486-3_52
DOI: https://doi.org/10.1007/978-981-16-4486-3_52
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4485-6
Online ISBN: 978-981-16-4486-3
eBook Packages: Intelligent Technologies and Robotics (R0)