Towards a Novel Strategic Scheme for Web Crawler Design Using Simulated Annealing and Semantic Techniques

  • Conference paper
Data Science and Security

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 290)

Abstract

Today, the reflexive response to any question that comes to mind is to ask Google. It is fascinating to see how quickly Google returns many answers, covering multiple interpretations, even when the query is typed in broken language. The module that enables Google to provide so many responses is known as a web crawler. As the name suggests, it crawls through the web by following the hyperlinks, or URLs, present on each page. The web crawler is the most vital unit of a search engine. When a particular topic is to be searched for, a focused web crawler is put to work. The internet has lately become a hub of resources, and the efficiency of web crawlers is being put to an extensive test. This paper builds on two baseline papers that propose models for focused web crawling: one for bioinformatics and the other for building a database of academicians. A multi-module hybridized approach has been devised to improve the efficiency of the focused web crawler: it first classifies the URL database using a decision tree and then optimizes the result with simulated annealing, a metaheuristic optimization algorithm, together with a domain ontology. The proposed approach achieves an accuracy of 90.12% and an F-measure of 91.62%.
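
To make the optimization step in the abstract more concrete, the following is a minimal, generic sketch of simulated annealing in Python. It is not the authors' implementation: the abstract does not state the objective function, the neighbor move, or the cooling schedule, so all of these are assumptions made for illustration. The function name `simulated_annealing`, the relevance scores, and the `example.com` URLs are hypothetical. The sketch selects `k` URLs with a high total relevance score, where the scores stand in for whatever a classifier or ontology-based measure might produce.

```python
import math
import random


def simulated_annealing(scores, k, t_start=1.0, t_end=1e-3,
                        cooling=0.95, steps_per_temp=50, seed=0):
    """Select k URLs with a high total score (toy sketch, assumed objective).

    scores: dict mapping URL -> relevance score (hypothetical values).
    Requires 0 < k < len(scores) so that a swap move is always possible.
    """
    rng = random.Random(seed)
    urls = sorted(scores)
    current = set(rng.sample(urls, k))
    current_value = sum(scores[u] for u in current)
    best, best_value = set(current), current_value

    t = t_start
    while t > t_end:
        for _ in range(steps_per_temp):
            # Neighbor move: swap one selected URL for one unselected URL.
            out_url = rng.choice(sorted(current))
            in_url = rng.choice(sorted(set(urls) - current))
            delta = scores[in_url] - scores[out_url]
            # Metropolis rule: always accept improvements; accept a worse
            # move with probability exp(delta / t), which shrinks as t drops.
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current.remove(out_url)
                current.add(in_url)
                current_value += delta
                if current_value > best_value:
                    best, best_value = set(current), current_value
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, best_value


# Hypothetical relevance scores for candidate URLs.
candidate_scores = {
    "https://example.com/a": 0.91,
    "https://example.com/b": 0.15,
    "https://example.com/c": 0.78,
    "https://example.com/d": 0.40,
    "https://example.com/e": 0.66,
    "https://example.com/f": 0.05,
}
selected, total = simulated_annealing(candidate_scores, k=3)
print(sorted(selected), round(total, 2))
```

Accepting occasional worse moves at high temperature is what lets the search escape local optima; as the temperature falls, the procedure behaves increasingly like greedy hill climbing. For a problem this small an exhaustive search would do, so the example only illustrates the mechanics.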

References

  1. Quinlan JR (1986) Induction of decision trees. Mach Learn 1(1):81–106

  2. Sobecki A, Szymański J, Gil D, Mora H (2019) Deep learning in the fog. Int J Distrib Sensor Netw 15(8):1–17

  3. Ali A, Alfayez F, Alquhayz H (2018) Semantic similarity measures between words: a brief survey. Sci Int Lahore 30(6):907–914

  4. Ahmad SR, Bakar AA, Yaakub MR (2015) Metaheuristic algorithms for feature selection in sentiment analysis. In: Science and information conference (2015)

  5. Kumar M, Bhatia R, Ohri A, Kohli A (2016) Design of focused crawler for information retrieval of Indian origin academicians. In: International conference on advances in computing, communication, and automation (ICACCA)

  6. Mani Sekhar SR, Siddesh GM, Manvi SS, Srinivasa KG (2019) Optimized focused web crawler with natural language processing based relevance measure in bioinformatics web sources. Cybern Inf Technol 19(2):146–158

  7. Gupta A, Anand P (2015) Focused web crawlers and its approaches. In: 1st international conference on futuristic trend in computational analysis and knowledge management (ABLAZE)

  8. Wang W, Chen X, Zou Y, Wang H, Dai Z (2010) A focused crawler based on Naive Bayes classifier. In: Third international symposium on intelligent information technology and security informatics (2010)

  9. Taylan D, Poyraz M, Akyokus S, Ganiz MC (2011) Intelligent focused crawler: learning which links to crawl. In: International symposium on innovations in intelligent systems and applications (2011)

  10. Pant G, Srinivasan P, Menczer F (2004) Crawling the web. In: Web dynamics, pp 153–177. Springer, Heidelberg https://doi.org/10.1007/978-3-662-10874-1_7

  11. Deepak G, Teja V, Santhanavijayan A (2020) A novel firefly driven scheme for resume parsing and matching based on entity linking paradigm. J Discrete Math Sci Crypt 23(1):157–165

  12. Deepak G, Santhanavijayan A (2020) OntoBestFit: A best-fit occurrence estimation strategy for RDF driven faceted semantic search. Comput Commun 160:284–298

  13. Kumar N, Deepak G, Santhanavijayan A (2020) A novel semantic approach for intelligent response generation using emotion detection incorporating NPMI measure. Procedia Comput Sci 167:571–579

  14. Deepak G, Kumar N, Santhanavijayan A (2020) A semantic approach for entity linking by diverse knowledge integration incorporating role-based chunking. Procedia Comput Sci 167:737–746

  15. Haribabu S, Kumar PSS, Padhy S, Deepak G, Santhanavijayan A, Kumar N (2019) A novel approach for ontology focused inter-domain personalized search based on semantic set expansion. In: Fifteenth international conference on information processing (ICINPRO), pp 1–5. IEEE, December 2019

  16. Deepak G, Kumar N, Bharadwaj GVSY, Santhanavijayan A (2019) OntoQuest: an ontological strategy for automatic question generation for e-assessment using static and dynamic knowledge. In: 2019 fifteenth international conference on information processing (ICINPRO), pp 1–6. IEEE, December 2019

  17. Kaushik IS, Deepak G, Santhanavijayan A (2020) QuantQueryEXP: A novel strategic approach for query expansion based on quantum computing principles. J Discrete Math Sci Crypt 23(2):573–584

  18. Varghese L, Deepak G, Santhanavijayan A (2019) An IoT analytics approach for weather forecasting using raspberry Pi 3 Model B+. In: Fifteenth international conference on information processing (ICINPRO), pp 1–5. IEEE, December 2019

  19. Deepak G, Priyadarshini S (2016) A hybrid framework for social tag recommendation using context driven social information. Int J Soc Comput Cyber-Phys Syst 1(4):312–325

  20. Deepak G, Priyadarshini JS (2018) A hybrid semantic algorithm for web image retrieval incorporating ontology classification and user-driven query expansion. In: Rajsingh E, Veerasamy J, Alavi A, Peter J (eds) Advances in Big Data and Cloud Computing, vol 645. Springer, Singapore, pp 41–49. https://doi.org/10.1007/978-981-10-7200-0_4

  21. Deepak G, Gulzar Z (2017) OntoEPDS: Enhanced and personalized differential semantic algorithm incorporating ontology driven query enrichment. J Adv Res Dyn Control Syst, 9(Specia):567–582

  22. Shreyas K, Deepak G, Santhanavijayan A (2020) GenMOnto: A strategic domain ontology modelling approach for conceptualisation and evaluation of collective knowledge for mapping genomes. J Stat Manag Syst 23(2):445–452

  23. Deepak G, Kumar AA, Santhanavijayan A, Prakash N (2019) Design and evaluation of conceptual ontologies for electrochemistry as a domain. In: 2019 IEEE international WIE conference on electrical and computer engineering (WIECON-ECE), pp 1–4. IEEE

  24. Deepak G, Priyadarshini JS (2018) Personalized and enhanced hybridized semantic algorithm for web image retrieval incorporating ontology classification, strategic query expansion, and content-based analysis. Comput Electr Eng 72:14–25

  25. Deepak G, Ahmed A, Skanda B (2019) An intelligent inventive system for personalised webpage recommendation based on ontology semantics. Int J Intell Syst Technol Appl 18(1/2):115–132

  26. Deepak G, Kasaraneni D (2019) OntoCommerce: an ontology focused semantic framework for personalised product recommendation for user targeted e-commerce. Int J Comput Aided Eng Technol 11(4/5):449–466

  27. Santhanavijayan A, Naresh Kumar D, Deepak G (2020) A novel hybridized strategy for machine translation of Indian languages. In: Reddy V, Prasad V, Wang J, Reddy K (eds) Soft computing and signal processing, ICSCSP 2019. Advances in intelligent systems and computing, vol 1118, p 363. Springer, Singapore. https://doi.org/10.1007/978-981-15-2475-2_34

  28. Van Laarhoven PJM, Aarts EHL (1987) Simulated annealing: theory and applications MAIA, vol 37. Springer, Dordrecht, pp 7–15

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Manaswini, S., Deepak, G. (2021). Towards a Novel Strategic Scheme for Web Crawler Design Using Simulated Annealing and Semantic Techniques. In: Shukla, S., Unal, A., Kureethara, J.V., Mishra, D.K., Han, D.S. (eds) Data Science and Security. Lecture Notes in Networks and Systems, vol 290. Springer, Singapore. https://doi.org/10.1007/978-981-16-4486-3_52
