Abstract
Two randomized methods are considered for finding the PageRank vector; in other words, the solution of the system p T = p T P with a stochastic n × n matrix P, where n ∼ 107–109, is sought (in the class of probability distributions) with accuracy ɛ: ɛ ≫ n −1. Thus, the possibility of brute-force multiplication of P by the column is ruled out in the case of dense objects. The first method is based on the idea of Markov chain Monte Carlo algorithms. This approach is efficient when the iterative process p T t+1 = p T t P quickly reaches a steady state. Additionally, it takes into account another specific feature of P, namely, the nonzero off-diagonal elements of P are equal in rows (this property is used to organize a random walk over the graph with the matrix P). Based on modern concentration-of-measure inequalities, new bounds for the running time of this method are presented that take into account the specific features of P. In the second method, the search for a ranking vector is reduced to finding the equilibrium in the antagonistic matrix game
where S n (1) is a unit simplex in ℝn and I is the identity matrix. The arising problem is solved by applying a slightly modified Grigoriadis-Khachiyan algorithm (1995). This technique, like the Nazin-Polyak method (2009), is a randomized version of Nemirovski’s mirror descent method. The difference is that randomization in the Grigoriadis-Khachiyan algorithm is used when the gradient is projected onto the simplex rather than when the stochastic gradient is computed. For sparse matrices P, the method proposed yields noticeably better results.
Article PDF
Similar content being viewed by others
Avoid common mistakes on your manuscript.
References
S. Brin and L. Page, “The anatomy of a large-scale hypertextual web search engine,” Comput. Network ISDN Syst. 30(1–7), 107–117 (1998).
A. N. Langville and C. D. Meyer, Google’s PageRank and Beyond: The Science of Search Engine Rankings (Princeton Univ. Press, New York, 2006).
P. Baldi, P. Frasconi, and P. Smyth, Modeling the Internet and the Web: Probabilistic Methods and Algorithms (Wiley, New York, 2003); http://ibook.ics.uci.edu/.
M. Franceschet, “PageRank: Standing on the shoulders of giants,” Commun. ACM 54(6), 92–101 (2011); arXiv:1002.2858v3.
A. V. Gasnikov, E. V. Gasnikova, and O. S. Fed’ko, “On possible dynamics in Google’s PageRank model and a modified model of computation correspondence matrix,” Tr. Mosk. Fiz.-Tekh. Inst. 4(2(14)), 101–120 (2012).
R. P. Agaev and P. Yu. Chebotarev, “Convergence and stability in characteristics-matching problems (overview of basic results),” Upr. Bol’sh. Sist. 30(1), 470–505 (2010).
D. A. Levin, Y. Peres, and E. L. Wilmer, Markov Chain and Mixing Times (AMS, 2009); http://pages.uoregon.edu/dlevin/MARKOV/markovmixing.pdf.
A. V. Nazin and B. T. Polyak, “Randomized algorithm to determine the eigenvector of a stochastic matrix with application to the PageRank problem,” Autom. Remote Control 72(2), 342–352 (2011).
Y. E. Nesterov, “Subgradient methods for huge-scale optimization problems,” CORE Discussion Paper; 2012/2, 2012; http://dial.academielouvain.be/handle/boreal:107876.
Y. E. Nesterov, “Efficiency of coordinate descent methods on large scale optimization problem,” SIAM J. Optim. 22(2), 341–362 (2012); http://dial.academielouvain.be/handle/boreal:121612.
A. Juditsky, G. Lan, A. Nemirovski, and A. Shapiro, “Stochastic approximation approach to stochastic programming,” SIAM J. Optim. 19(4), 1574–1609 (2009).
L. G. Khachiyan, Selected Works (MTsNMO, Moscow, 2009), pp. 38–48 [in Russian].
Y. Nesterov and A. Nemirovski, “Finding the stationary states of Markov chains by iterative methods,” CORE Discussion Paper; 2012/58, 2012; http://dial.academielouvain.be/handle/boreal:122163.
B. T. Polyak and A. A. Tremba, “Regularization-based solution of the PageRank problem for large matrices,” Autom. Remote Control 73(11), 1777–1894 (2012).
D. Spielman, http://www.cse.cuhk.edu.hk/~chi/csc5160/notes/L07.pdf.
B. S. Kashin, “Diameters of some finite-dimensional sets and classes of smooth functions,” Math. USSR Izv. 11(2), 317–333 (1977).
G. Pandurangan, P. Raghavan, and E. Upfal, “Using PageRank to characterize web structure,” Internet Math. 3(1), 1–20 (2006).
N. S. Bakhvalov, N. P. Zhidkov, and G. M. Kobel’kov, Numerical Methods (Binom, Moscow, 2002).
H. Nikaido, Convex Structures and Economic Theory (Academic, New York, 1968; Mir, Moscow, 1972).
V. S. Ryaben’kii, Introduction to Computational Mathematics (Fizmatlit, Moscow, 2000) [in Russian].
P. Chebotarev and R. Agaev, “Forest matrices around the Laplacian matrix,” Linear Algebra Appl. 356, 253–274 (2002).
D. K. Faddeev and V. N. Faddeeva, Computational Methods of Linear Algebra (Freeman, San Francisco, 1963; Fizmatgiz, Moscow, 1963).
M. A. Krasnosel’skii and S. G. Krein, “Remark on the distribution of errors in the solution of a system of linear equations by means of an iterative process,” Usp. Mat. Nauk 7(4(50)), 157–161 (1952).
R. A. Horn and C. R. Johnson, Matrix Analysis (Cambridge Univ. Press, Cambridge, 1985; Mir, Moscow, 1989).
S. M. Ermakov, Monte Carlo Method in Computational Mathematics: An Introductory Course (Binom, Moscow, 2009).
A. M. Raigorodskii, Internet Models (Intellekt, Dolgoprudnyi, 2013).
D. Knuth and A. Yao, “The complexity of nonuniform random number generation,” Algorithms and Complexity: New Directions and Recent Results (Academic, New York, 1976), pp. 357–428.
A. Sinclair and M. Jerrum, “Approximate counting, uniform generation and rapidly mixing Markov chains,” Inf. Comput. 82(1), 93–133 (1989).
M. Dyer, A. Frieze, and R. Kannan, “A random polynomial-time algorithm for approximating of the volume of convex bodies,” J. ACM 38(1), 1–17 (1991).
M. Jerrum and A. Sinclair, “The Markov chain Monte Carlo method: An approach to approximate counting and integration,” Approximation Algorithms for NP-Hard Problems, Ed. by D. S. Hochbaum (PWS, Boston, 1996), pp. 482–520; http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.32.9553.
P. Lezaud, “Chernoff-type bound for finite Markov chains,” Ann. Appl. Probab. 8(3), 849–867 (1998).
D. Aldous and J. Fill, Reversible Markov Chains and Random Walks on Graphs (2002); http://stat-www.berkeley.edu/users/aldous/RWG/book.html.
Fan Chung. “Laplacians and the Cheeger inequality for directed graphs,” Ann. Combin., No. 9, 1–19 (2005); http://math.ucsd.edu/~fan/.
P. Diaconis, “The Markov chain Monte Carlo revolution,” Bull. AMS (New Ser.) 49(2), 179–205 (2009); http://math.uchicago.edu/~shmuel/Network-course-readings/MCMCRev.pdf.
D. Spielman, Lecture No. 2, 2009; http://www.cse.cuhk.edu.hk/~chi/csc5160/notes/L02.pdf.
A. Joulin and Y. Ollivier, “Curvature, concentration and error estimates for Markov chain Monte Carlo,” Ann. Probab. 38(6), 2418–2442 (2010).
R. Montenegro and P. Tetali, Mathematical Aspects of Mixing Times in Markov Chains (2006); http://people.math.gatech.edu/~tetali/PUBLIS/survey.pdf.
S. Boyd and L. Vandenberghe, Convex Optimization (Cambridge Univ. Press, Cambridge, 2004).
S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability (Springer, London, 2005); http://probability.ca/MT/.
D. Paulin, “Concentration inequalities for Markov chains by Marton couplings,” e-print, arXiv:1212.2015v2, 2013.
M. Ledoux, Concentration of Measure Phenomenon (Am. Math. Soc., Providence, RI, 2001).
A. V. Gasnikov and E. V. Gasnikova, “On entropy similar functionals arising in stochastic chemical kinetics at invariant measure concentration and as Lyapunov functions of quasi-mean dynamics,” Mat. Zametki 93(6), 816–824 (2013).
M. A. Krasnosel’skii, E. A. Lifshits, and A. V. Sobolev, Positive Linear Systems: Method of Positive Operators (Nauka, Moscow, 1985) [in Russian].
M. Ya. Kel’bert and Yu. M. Sukhov, Probability and Statistics with Examples and Problems (MTsNMO, Moscow, 2010), Vol. 2 [in Russian].
E. Yu. Klochkov, “Parametric estimation in Transportation Problems,” Bachelor’s Thesis (Moscow Inst. Phys. Eng., Dolgoprudnyi, 2013).
V. Spokoiny, “Parametric estimation: Finite sample theory,” Ann. Stat. 40(6), 2877–2909 (2012).
I. N. Sanov, “On the probability of large deviations of random variables,” Mat. Sb. 42(1), 11–44 (1957).
S. Boucheron, G. Lugosi, and P. Massart, Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford Univ. Press, London, 2013).
A. Romashchenko, Expanders: Construction and Applications (Computer Science Club, Mosk. Gos, Univ., Moscow, 2009); http://www.mccme.ru/~anromash/courses/expanders2009.pdf.
A. S. Nemirovski and D. B. Yudin, Complexity of Problems and Efficiency of Optimization Methods (Nauka, Moscow, 1979) [in Russian].
D. Paulin, “Nonasymptotic confidence interval for MCMC,” e-print, arXiv:1212.2016v4, 2013.
A. V. Gasnikov, Yu. E. Nesterov, and V. G. Spokoiny, “On the efficiency of a randomization method for mirror descent in online optimization problems,” Comput. Math. Math. Phys. (2015) [in press].
Y. Mansour, Algorithmic Game Theory and Machine Learning (2011); www.tau.ac.il/~mansour/advancedagt+ml/.
S. Bubeck, Introduction to Online Optimization: Lecture Notes (Princeton University, New York, 2011); http://www.princeton.edu/~sbubeck/BubeckLectureNotes.pdf.
Y. Nesterov, “Smooth minimization of nonsmooth function,” Math. Program. Ser. A 103(1), 127–152 (2005).
Yu. E. Nesterov, Introduction to Convex Optimization (Mosk. Tsentr Neprer. Mat. Obrazovan., Moscow, 2010) [in Russian].
Y. Nesterov, “Primal-dual subgradient methods for convex problems,” Math. Program. Ser. B 120, 221–259 (2009).
G. G. Magaril-Il’yaev and V. M. Tikhomirov, Convex Analysis: Theory and Applications (Am. Math. Soc., Providence, 2003; URSS, Moscow, 2011).
V. V. V’yugin, Mathematical Foundations of Machine Learning Theory and Prediction (Mosk. Tsentr Neprer. Mat. Obrazovan., Moscow, 2013); http://www.iitp.ru/upload/publications/6256/vyugin1.pdf.
G. Lugosi and N. Cesa-Bianchi, Prediction, Learning, and Games (Cambridge Univ. Press, New York, 2006).
A. B. Juditsky, A. V. Nazin, A. B. Tsybakov, and N. Vayatis, “Recursive aggregation of estimators by the mirror descent algorithm with averaging,” Probl. Inf. Transmission 41(4), 368–384 (2005).
A. Juditsky, A. V. Nazin, A. B. Tsybakov, and N. Vayatis, “Gap-free bounds for stochastic multi-armed bandit,” Proceedings of the 17th IFAC World Congress, Seoul, Korea, July 6–11 2008 (IFAC, 2008), pp. 11560–11563.
S. P. Andersen, A. de Palma, and J.-F. Thisse, Discrete Choice Theory of Product Differentiation (MIT Press, Cambridge, 1992).
A. Juditsky and B. Polyak, “Robust eigenvector of stochastic matrix with application to PageRang,” Proceedings of the 51st IEEE Conference on Decision and Control, December 10–13, 2012, Maui, Hawaii, USA (IEEE, 2012), pp. 3171–3176 (arXiv:1206.4897).
A. Tremba and A. Nazin, “Extension of a saddle point mirror descent algorithm with application to robust PageRank,” Proceedings of the 52nd IEEE Conference on Decision and Control, December 10–13, 2013, Florence, Italy (IEEE, 2013), pp. 3691–3696.
R. Tempo, http://staff.polito.it/roberto.tempo/.
Author information
Authors and Affiliations
Corresponding author
Additional information
Original Russian Text © A.V. Gasnikov, D.Yu. Dmitriev, 2015, published in Zhurnal Vychislitel’noi Matematiki i Matematicheskoi Fiziki, 2015, Vol. 55, No. 3, pp. 355–371.
Rights and permissions
About this article
Cite this article
Gasnikov, A.V., Dmitriev, D.Y. On efficient randomized algorithms for finding the PageRank vector. Comput. Math. and Math. Phys. 55, 349–365 (2015). https://doi.org/10.1134/S0965542515030069
Received:
Published:
Issue Date:
DOI: https://doi.org/10.1134/S0965542515030069