Abstract
Motivated by the problem of detecting link-spam, we consider the following graph-theoretic primitive: Given a webgraph G, a vertex v in G, and a parameter δ ∈ (0,1), compute the set of all vertices that contribute to v at least a δ fraction of v’s PageRank. We call this set the δ-contributing set of v. To this end, we define the contribution vector of v to be the vector whose entries measure the contributions of every vertex to the PageRank of v. A local algorithm is one that produces a solution by adaptively examining only a small portion of the input graph near a specified vertex. We give an efficient local algorithm that computes an ε-approximation of the contribution vector for a given vertex by adaptively examining O(1/ε) vertices. Using this algorithm, we give a local approximation algorithm for the primitive defined above. Specifically, we give an algorithm that returns a set containing the δ-contributing set of v and at most O(1/δ) vertices from the δ/2-contributing set of v, and which does so by examining at most O(1/δ) vertices. We also give a local algorithm for solving the following problem: If there exist k vertices that contribute a ρ-fraction to the PageRank of v, find a set of k vertices that contribute at least a (ρ − ε)-fraction to the PageRank of v. In this case, we prove that our algorithm examines at most O(k/ε) vertices.
Access provided by Autonomous University of Puebla. Download to read the full chapter text
Chapter PDF
Similar content being viewed by others
References
Andersen, R., Borgs, C., Chayes, J., Hopcroft, J., Jain, K., Mirrokni, V., Teng, S.: Experimental evaluation of locally computable link-spam features (submitted, 2007)
Andersen, R., Chung, F., Lang, K.: Local graph partitioning using pagerank vectors. In: FOCS 2006: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pp. 475–486. IEEE Computer Society, Washington, DC (2006)
Becchetti, L., Castillo, C., Donato, D., Leonardi, S., Baeza-Yates, R.: Link-based characterization and detection of web spam (2006)
Benczúr, A.A., Csalogány, K., Sarlós, T., Uher, M.: Spamrank - fully automatic link spam detection. In: First International Workshop on Adversarial Information Retrieval on the Web (2005)
Berkhin, P.: Bookmark-coloring algorithm for personalized pagerank computing. Internet Math. 3(1), 41–62 (2006)
Brin, S., Page, L.: The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems 30(1-7), 107–117 (1998)
Chen, Y., Gan, Q., Suel, T.: Local methods for estimating pagerank values. In: Proc. of CIKM, pp. 381–389 (2004)
Fetterly, D., Manasse, M., Najork, M.: Spam, damn spam, and statistics: using statistical analysis to locate spam web pages. In: WebDB 2004: Proceedings of the 7th International Workshop on the Web and Databases, pp. 1–6. ACM Press, New York (2004)
Fogaras, D., Racz, B.: Towards scaling fully personalized pagerank. In: Leonardi, S. (ed.) WAW 2004. LNCS, vol. 3243, pp. 105–117. Springer, Heidelberg (2004)
Gyöngyi, Z., Berkhin, P., Garcia-Molina, H., Pedersen, J.: Link spam detection based on mass estimation. In: Proceedings of the 32nd International Conference on Very Large Databases, ACM, New York (2006)
Gyöngyi, Z., Garcia-Molina, H., Pedersen, J.: Combating web spam with trustrank. In: VLDB, pp. 576–587 (2004)
Gyöngyi, Z., Garcia-Molina, H., Pedersen, J.: Web content categorization using link information. Technical report, Stanford University (2006)
Haveliwala, T.H.: Topic-sensitive pagerank: A context-sensitive ranking algorithm for web search. IEEE Trans. Knowl. Data Eng. 15(4), 784–796 (2003)
Jeh, G., Widom, J.: Scaling personalized web search. In: WWW 2003. Proceedings of the 12th World Wide Web Conference, pp. 271–279 (2003)
Mishne, G., Carmel, D.: Blocking blog spam with language model disagreement (2005)
Naor, M., Stockmeyer, L.: What can be computed locally? SIAM J. Comput. 24(6), 1259–1277 (1995)
Ntoulas, A., Najork, M., Manasse, M., Fetterly, D.: Detecting spam web pages through content analysis. In: WWW 2006: Proceedings of the 15th international conference on World Wide Web, pp. 83–92. ACM Press, New York (2006)
Raj, R., Krishnan, V.: Web spam detection with anti-trust rank. In: Proc. of the 2nd International Worshop on Adversarial Information Retreival on the Web, pp. 381–389 (2006)
Sarlós, T., Benczúr, A.A., Csalogány, K., Fogaras, D.: To randomize or not to randomize: space optimal summaries for hyperlink analysis. In: WWW, pp. 297–306 (2006)
Spielman, D.A., Teng, S.-H.: Nearly-linear time algorithms for graph partitioning, graph sparsification, and solving linear systems. In: ACM STOC-04, pp. 81–90. ACM Press, New York (2004)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Andersen, R., Borgs, C., Chayes, J., Hopcraft, J., Mirrokni, V.S., Teng, SH. (2007). Local Computation of PageRank Contributions. In: Bonato, A., Chung, F.R.K. (eds) Algorithms and Models for the Web-Graph. WAW 2007. Lecture Notes in Computer Science, vol 4863. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77004-6_12
Download citation
DOI: https://doi.org/10.1007/978-3-540-77004-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77003-9
Online ISBN: 978-3-540-77004-6
eBook Packages: Computer ScienceComputer Science (R0)