Abstract
The World Wide Web (“WWW” or “web” for short) has become a major repository of data and documents. Although measurements differ and change, the web has grown at a phenomenal rate. According to two studies in 1998, there were between 200 million [Bharat and Broder, 1998] and upwards of 320 million [Lawrence and Giles, 1998] static web pages. A 1999 study reported the size of the web as 800 million pages [Lawrence and Giles, 1999]. By 2005, the number of pages was reported to be 11.5 billion [Gulli and Signorini, 2005]. Today it is estimated that the web contains over 25 billion pages and is still growing. These are numbers for “static” web pages, i.e., those whose content does not change unless the page owners make explicit changes. The size of the web is much larger when “dynamic” web pages (i.e., pages whose content changes based on the context of user requests) are considered. A 2005 study reported the size to be over 53 billion pages [Hirate et al., 2006]. Additionally, it was estimated that, as of 2001, over 500 billion documents existed in the deep web (which we define below) [Bergman, 2001]. Besides its size, the web is very dynamic and changes rapidly. Thus, for all practical purposes, the web represents a very large, dynamic, and distributed data store, and accessing web data raises the obvious distributed data management issues.
Copyright information
© 2011 Springer Science + Business Media, LLC
About this chapter
Cite this chapter
Özsu, M.T., Valduriez, P. (2011). Web Data Management. In: Principles of Distributed Database Systems, Third Edition. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-8834-8_17
DOI: https://doi.org/10.1007/978-1-4419-8834-8_17
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-8833-1
Online ISBN: 978-1-4419-8834-8
eBook Packages: Computer Science (R0)