Abstract
The World Wide Web, or simply the Web, is rapidly becoming the world’s collective information store, containing everything from news to entertainment, personal communications, and product descriptions. This world information store is distributed across millions of computers, but it is often important to gather significant parts of it at a single site. One reason is to build content indices, such as Google’s. Another is to mine the cached Web, looking for trends or data correlations. A third is to create a historical record of Web sites that are ephemeral or changing.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Garcia-Molina, H. (2003). Challenges in Crawling the Web. In: James, A., Younas, M., Lings, B. (eds) New Horizons in Information Management. BNCOD 2003. Lecture Notes in Computer Science, vol 2712. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45073-4_2
DOI: https://doi.org/10.1007/3-540-45073-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40536-8
Online ISBN: 978-3-540-45073-3
eBook Packages: Springer Book Archive