Challenges in Crawling the Web

Garcia-Molina, Hector

doi:10.1007/3-540-45073-4_2

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2712)

Included in the following conference series:

British National Conference on Databases

Abstract

The World Wide Web, or simply the Web, is rapidly becoming the world’s collective information store, containing everything from news to entertainment to personal communications to product descriptions. This world information store is distributed across millions of computers, but it is often important to gather significant parts of it at a single site. One reason is to build content indices, such as Google’s. Another reason is to mine the cached Web, looking for trends or data correlations. A third reason for gathering a Web copy is to create a historical record for Web sites that are ephemeral or changing.

Author information

Authors and Affiliations

Computer Science Department, Stanford University, Stanford
Hector Garcia-Molina

Editor information

Editors and Affiliations

School of Mathematical and Information Sciences, Coventry University, Priory Street, Coventry, CV1 5FB, UK
Anne James & Muhammad Younas
Department of Computer Science, University of Exeter, Prince of Wales Road, Exeter, EX4 4PT, UK
Brian Lings

About this paper

Cite this paper

Garcia-Molina, H. (2003). Challenges in Crawling the Web. In: James, A., Younas, M., Lings, B. (eds) New Horizons in Information Management. BNCOD 2003. Lecture Notes in Computer Science, vol 2712. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45073-4_2

DOI: https://doi.org/10.1007/3-540-45073-4_2
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40536-8
Online ISBN: 978-3-540-45073-3
eBook Packages: Springer Book Archive
