Abstract
The World Wide Web has emerged as a primary means of storing and structuring information. In this paper, we present a framework for mining implicit associations among Web documents. We focus on the following problem: “For a given set of seed URLs, find a list of Web pages that reflect the association among these seeds.” In the proposed framework, the association between two documents is induced by their connectivity and by the lengths of the link paths between them. Based on this framework, we have developed a random walk-based Web mining technique and validated it through experiments on real Web data. We also discuss how the algorithm can be extended to take document contents into account.
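To make the idea of random walk-based association concrete, the sketch below ranks pages by how likely a random surfer starting from each seed is to visit them. It is a generic random walk with restart (personalized PageRank-style) illustration, not the paper's algorithm. The function names, the restart probability of 0.15, treating every hyperlink as a two-way edge, and combining per-seed scores by a product (so that a page far from any one seed scores low) are all assumptions made for illustration. Short paths and many connecting links both raise a page's visiting probability, which mirrors the "connectivity and path length" intuition in the abstract.

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # page -> pages it links to (hypothetical input format)


def to_undirected(graph: Graph) -> Dict[str, Set[str]]:
    """Treat every hyperlink as a two-way edge (an assumption of this sketch)."""
    adj: Dict[str, Set[str]] = {u: set() for u in graph}
    for u, targets in graph.items():
        for v in targets:
            adj.setdefault(v, set())
            adj[u].add(v)
            adj[v].add(u)
    return adj


def random_walk_with_restart(
    adj: Dict[str, Set[str]], seed: str, restart: float = 0.15, iters: int = 100
) -> Dict[str, float]:
    """Power iteration for the visiting probabilities of a walk that jumps back to `seed`."""
    if seed not in adj:
        raise ValueError(f"seed {seed!r} is not in the graph")
    p = {n: 0.0 for n in adj}
    p[seed] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        nxt[seed] += restart  # restart mass; total probability mass stays 1
        for u, neighbours in adj.items():
            if not neighbours:  # isolated page: send its mass back to the seed
                nxt[seed] += (1 - restart) * p[u]
                continue
            share = (1 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        p = nxt
    return p


def associated_pages(
    graph: Graph, seeds: List[str], restart: float = 0.15, top_k: int = 5
) -> List[str]:
    """Rank non-seed pages by the product of their per-seed walk probabilities."""
    adj = to_undirected(graph)
    per_seed = [random_walk_with_restart(adj, s, restart) for s in seeds]
    scores: Dict[str, float] = {}
    for page in adj:
        if page in seeds:
            continue
        score = 1.0
        for probs in per_seed:
            score *= probs[page]  # stays small if the page is far from ANY seed
        scores[page] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy link graph: "hub" is one hop from both seeds, "x" and "y" lie on the
    # longer a-x-y-b path, and "only_a"/"only_b" each touch just one seed.
    links = {
        "a": ["hub", "x", "only_a"],
        "b": ["hub", "only_b"],
        "x": ["y"],
        "y": ["b"],
    }
    print(associated_pages(links, ["a", "b"]))
```

On this toy graph, "hub" ranks first, followed by "x" and "y" (tied by symmetry), with "only_a" and "only_b" last, since each is close to only one seed. A product is a deliberately strict way to require closeness to all seeds; a sum would instead favour pages near any single seed.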
This work was performed while the author was visiting NEC, CCRL.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Selçuk Candan, K., Li, WS. (2000). Using Random Walks for Mining Web Document Associations. In: Terano, T., Liu, H., Chen, A.L.P. (eds) Knowledge Discovery and Data Mining. Current Issues and New Applications. PAKDD 2000. Lecture Notes in Computer Science, vol 1805. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45571-X_35
DOI: https://doi.org/10.1007/3-540-45571-X_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67382-8
Online ISBN: 978-3-540-45571-4
eBook Packages: Springer Book Archive