Abstract
The pages and hyperlinks of the World-Wide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons — mathematical, sociological, and commercial — for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new sub-field of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Weiner. The Lorel Query language for semistructured data. Intl. J. on Digital Libraries, 1(1):68–88, 1997.
R. Agrawal and R. Srikanth. Fast algorithms for mining association rules. Proc. VLDB, 1994.
G. O. Arocena, A. O. Mendelzon, G. A. Mihaila. Applications of a Web query language. Proc. 6th WWW Conf., 1997.
K. Bharat and A. Broder. A technique for measuring the relative size and overlap of public Web search engines. Proc. 7th WWW Conf., 1998.
K. Bharat and M. R. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. Proc. ACM SIGIR, 1998.
S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Proc. 7th WWW Conf., 1998.
B. Bollobás. Random Graphs, Academic Press, 1985.
J. Carriére and R. Kazman. WebQuery: Searching and visualizing the Web through connectivity. Proc. 6th WWW Conf., 1997.
S. Chakrabarti, B. Dom, D. Gibson, J. Kleinberg, P. Raghavan and S. Rajagopalan. Automatic resource compilation by analyzing hyperlink structure and associated text. Proc. 7th WWW Conf., 1998.
S. Chakrabarti, B. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. SIGIR workshop on hypertext IR, 1998.
S. Chakrabarti and B. Dom and P. Indyk. Enhanced hypertext classification using hyperlinks. Proc. ACM SIGMOD, 1998.
H. T. Davis. The Analysis of Economic Time Series. Principia press, 1941.
R. Downey, M. Fellows. Parametrized Computational Feasibility. In Feasible Mathematics II, P. Clote and J. Remmel, eds., Birkhauser, 1994.
L. Egghe, R. Rousseau, Introduction to Informetrics, Elsevier, 1990.
D. Florescu, A. Levy and A. Mendelzon. Database techniques for the World Wide Web: A survey. SIGMOD Record, 27(3): 59–74, 1998.
E. Garfield. Citation analysis as a tool in journal evaluation. Science, 178:471–479, 1972.
N. Gilbert. A simulation of the structure of academic science. Sociological Research Online, 2(2), 1997.
G. Golub, C. F. Van Loan. Matrix Computations, Johns Hopkins University Press, 1989.
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. AMS-DIMACS series, special issue on computing on very large datasets, 1998.
M. M. Kessler. Bibliographic coupling between scientific papers. American Documentation, 14:10–25, 1963.
J. Kleinberg. Authoritative sources in a hyperlinked environment, J. of the ACM, 1999, to appear. Also appears as IBM Research Report RJ 10076(91892) May 1997.
D. Konopnicki and O. Shmueli. Information gathering on the World Wide Web: the W3QL query language and the W3QS system. Trans. on Database Systems, 1998.
S. R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins. Trawling emerging cyber-communities automatically. Proc. 8th WWW Conf., 1999.
L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian. A declarative approach to querying and restructuring the World Wide Web. Post-ICDE Workshop on RIDE, 1996.
R. Larson. Bibliometrics of the World Wide Web: An exploratory analysis of the intellectual structure of cyberspace. Ann. Meeting of the American Soc. Info. Sci., 1996.
A. J. Lotka. The frequency distribution of scientific productivity. J. of the Washington Acad. of Sci., 16:317, 1926.
A. Mendelzon, G. Mihaila, and T. Milo. Querying the World Wide Web, J. of Digital Libraries 1(1):68–88, 1997.
A. Mendelzon and P. Wood. Finding regular simple paths in graph databases. SIAM J. Comp., 24(6):1235–1258, 1995.
E. Spertus. ParaSite: Mining structural information on the Web. Proc. 6th WWW Conf., 1997.
D. Tsur, J. Ullman, S. Abiteboul, C. Clifton, R. Motwani, S. Nestorov, and A. Rosenthal. Query Flocks: A generalization of association rule mining. Proc. ACM SIGMOD, 1998.
G. K. Zipf. Human behavior and the principle of least effort. New York: Hafner, 1949.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kleinberg, J.M., Kumar, R., Raghavan, P., Rajagopalan, S., Tomkins, A.S. (1999). The Web as a Graph: Measurements, Models, and Methods. In: Asano, T., Imai, H., Lee, D.T., Nakano, Si., Tokuyama, T. (eds) Computing and Combinatorics. COCOON 1999. Lecture Notes in Computer Science, vol 1627. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48686-0_1
Download citation
DOI: https://doi.org/10.1007/3-540-48686-0_1
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66200-6
Online ISBN: 978-3-540-48686-2
eBook Packages: Springer Book Archive