Abstract
Both human users and crawlers face the problem of finding good start pages from which to explore a topic. We show how link-based ranking algorithms can help qualify pages as start nodes. We introduce a class of hub ranking methods based on counting the short search paths of the Web. Somewhat surprisingly, the PageRank scores computed on the reversed Web graph turn out to be a special case of our class of rank functions. Besides query-based examples, we propose graph-based techniques to evaluate the performance of the introduced ranking algorithms. Centrality analysis experiments show that a small portion of Web pages, induced by the top-ranked pages, dominates the Web in the sense that other pages can be reached from them within a few clicks on average; furthermore, removing such nodes rapidly destroys the connectivity of the Web graph. By calculating domination and connectivity decay, we compare and analyze the proposed ranking algorithms solely from the structure of the Web, without the need for human interaction. Apart from ranking algorithms, the existence of central pages is interesting in its own right, providing deeper insight into the Small World property of the Web graph.
Research is supported by grants OTKA T 42559 and T 042706 of the Hungarian National Science Fund.
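To make the special case named in the abstract concrete, the following is a minimal sketch of PageRank computed on the reversed graph, where a page scores highly if it links (directly or indirectly) to many pages. It uses the standard power iteration, not the paper's general class of path-counting rank functions, which the abstract does not specify. The function name `reverse_pagerank`, the damping factor 0.85, the iteration count, and the toy edge list are all illustrative assumptions.

```python
def reverse_pagerank(edges, damping=0.85, iterations=100):
    """Standard PageRank power iteration on the graph with every link reversed.

    edges: iterable of (source, target) hyperlinks in the ORIGINAL graph.
    damping and iterations are assumed defaults, not values from the paper.
    """
    nodes = sorted({page for edge in edges for page in edge})
    n = len(nodes)

    # Reverse each link u -> v into v -> u.
    reversed_out = {page: [] for page in nodes}
    for source, target in edges:
        reversed_out[target].append(source)

    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in nodes}
        dangling_mass = 0.0
        for page in nodes:
            targets = reversed_out[page]
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-links in the reversed graph spreads
                # its rank uniformly over all pages.
                dangling_mass += damping * rank[page]
        for page in nodes:
            new_rank[page] += dangling_mass / n
        rank = new_rank
    return rank


# Hypothetical toy graph: "hub" links to three pages, one of which links back.
toy_edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "hub")]
scores = reverse_pagerank(toy_edges)
best_start = max(scores, key=scores.get)
```

On this toy graph the page with the most out-links in the original graph receives the highest reversed-graph score, which matches the intuition of a good start page; real Web graphs would need a sparse representation rather than Python dictionaries.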
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fogaras, D. (2003). Where to Start Browsing the Web?. In: Böhme, T., Heyer, G., Unger, H. (eds) Innovative Internet Community Systems. IICS 2003. Lecture Notes in Computer Science, vol 2877. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39884-4_6
DOI: https://doi.org/10.1007/978-3-540-39884-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20436-7
Online ISBN: 978-3-540-39884-4
eBook Packages: Springer Book Archive