1 Introduction

Ranking the results of a web search is an important task. PageRank (Page et al. 1998) is one of the pioneering algorithms for ranking web pages. In the early days of the internet there were only a few pages, so static page-ranking algorithms were sufficient. However, as the WWW (World Wide Web) grew, the calculation of PageRank became increasingly complex and computationally challenging. With the growth of the internet, many subnetworks appeared, and PageRank assigns scores to those networks as well; thus PageRank is not limited to web search. These subnetworks may have dynamic characteristics. Dynamic networks are networks where nodes and links are added or deleted over time. Today's internet is full of dynamic networks: for example, the Twitter retweet network, where nodes (retweets) are added frequently; the following/followers networks of Twitter/Instagram, where nodes and links are added and deleted frequently; the Twitter mentions network, which changes with every tweet; and the citation network, where research papers are added over time. Calculating PageRank in such dynamic networks is an important and challenging research problem. The trivial way to find PageRank in a dynamic network is to rerun a static PageRank method after every update, which is time-consuming and unsustainable when the network is updated rapidly.

Real-time use cases, such as finding the top-k popular products in an online shopping cart or the top-k spreaders of news in a reshare network, need high-speed computation of PageRank. PageRank is also used to suggest other accounts to follow on Twitter (Gupta et al. 2013). PageRank further finds applications in networks outside the internet, for example, identifying new possible drug targets in proteins (Iván and Grolmusz 2011) in biochemistry, or predicting how many pedestrians/vehicles visit individual places or streets (Jiang 2009).

PageRank was initially proposed in Page et al. (1998) and is the classical algorithm for ranking web pages. Over the years, many variations of PageRank (Lempel and Moran 2000; Breyer 2002; Langville and Meyer 2004; Salehi 2007; Zhou 2015; Vargas 2020) have evolved. Randomized algorithms (Breyer 2002; Vargas 2020; Salehi 2007; Zhou 2015; Avrachenkov et al. 2007) for approximating PageRank, called Monte Carlo-based algorithms, were also introduced; they provide a reasonable estimation of PageRank. All these algorithms are designed for static networks. There are also methods in the literature that address dynamic PageRank (Desikan et al. 2005; Chien et al. 2004; Langville and Meyer 2004; Zhan et al. 2019; Isham and Seneta 1983; Liao et al. 2017). In Desikan et al. (2005), Chien et al. (2004), and Langville and Meyer (2004), a subset of the graph is chosen for any update, and PageRank is computed on that subset using static methods. In Zhan et al. (2019), Isham and Seneta (1983), and Liao et al. (2017), a random walk model is used, where random walk segments are adjusted upon updates. However, these algorithms do not provide satisfying accuracy in terms of the relative ranking of nodes or running time for updates, especially for big-data networks. Hence, a simple dynamic PageRank algorithm that runs fast is critical to address these problems.

This paper proposes a simple algorithm, namely fast pessimistic dynamic PageRank (FPPR), to approximate PageRank under changes in network topology. The topology changes considered are (i) adding a new node/link to the network and (ii) deleting a node/link from the network. The present work extends our initial results (Parjanya and Kundu 2022), where only node addition was considered. The proposed algorithm uses the expected value of random surfers to recalculate the score after changes in the topology. By the expected value of random surfers, we mean the estimated number of visits at a node by the random surfers, considering the network static at that point. For example, when a new node (or link) is added, its score is calculated by adding the estimated scores contributed by the in-links to the new node (or target node) and estimating the visits by random surfers through the link chain it joins. The same method is used to update existing nodes through the out-links of the newly added node; these are the links that a random surfer would use to visit (or leave) that node if the static PageRank algorithm were run at that point. FPPR uses a Selective Breadth First Update (SBFU) for node deletion: SBFU updates the scores of the nodes and links traversed during the process, with the appropriate deductions calculated from the expected values. For link deletion, FPPR performs a local update adjusting the PageRank score of the target node only. The proposed algorithm takes \(\mathcal {O}(k\times d_{avg}^{(k)})\) time for k node or link additions, where \(d_{avg}^{(k)}\) is the average degree of the k nodes added to the graph. The time complexities for node deletion and link deletion are \(\mathcal {O}(|V_s|+|E_s|)\) and \(\mathcal {O}(1)\), respectively, where \(V_s\) is the set of updated nodes and \(E_s\) is the set of edges associated with \(V_s\). Further, FPPR takes only \(\mathcal {O}(|V|)\) additional space for addition; specifically, \(4\times |V|\) space is used in the worst case for node addition. Node deletion requires an additional \(\mathcal {O}(|E|)\) space, so FPPR takes \(\mathcal {O}(|V|+|E|)\) additional space in total. The experiments are performed on several synthetic and real-world networks under different dynamic behaviors of the network. Random graph generators from the NetworkX library are used to test FPPR for link addition and deletion. The experimental results show that the ranking of the proposed method is highly correlated with the ranking of the benchmark Power Iteration-based recalculation and better than state-of-the-art methods like Fast Incremental PageRank (FIPR) (Zhan et al. 2019) and Offset Score Propagation (OSP) (Yoon et al. 2018). FPPR is also tested on a simulation of a real-world graph with growth and decay together, where it performs better than FIPR (Zhan et al. 2019). In summary, the major contributions of the paper are

  1.

    We propose the FPPR algorithm for dynamic networks, which requires less computation and space than the comparing methods, including classical and state-of-the-art algorithms. The proposed algorithm can estimate PageRank for both node and link addition and deletion.

  2.

    We experimentally show that the updated PageRanks are highly correlated with those of the Power Iteration (PI) method. Spearman's rank correlation coefficient is used to compare the ranking of the proposed FPPR with that of the comparing methods.

  3.

    We show with experiments that the proposed algorithm handles different network changes in evolving networks; both growing and decaying networks are simulated.

The paper is organized as follows: Sect. 1.1 describes the graph model, Sect. 2 provides a brief literature review, Sect. 3 presents the proposed FPPR method and its rationale, and Sect. 4 reports the experiments performed and the corresponding results. Finally, Sect. 5 concludes the research work.

1.1 The model

In this paper, a directed network is represented by a graph G(V, E), where V is the set of nodes in the network and \(E \subseteq V \times V\) is the set of edges. The graph is directed, i.e., \(e(u,v) \ne e(v,u)\). We also assume that the graph has no self-loops. The symbols used throughout the paper are listed in Table 1 for the reader's reference.

Table 1 Symbol table

2 Related work

Since the inception of the internet in the 1970s, its size has been increasing at a rapid scale, and pages must be ranked to find relevant information in it. This led to the invention of search engines that make web search possible for any user query. As the web grows in size, the ranks of pages need updates. The early search engines used algorithms like HITS (Kleinberg 1999) and PageRank (Page et al. 1998). Over the years, many methods for computing PageRank have appeared; they can be divided into (i) classical PageRank, (ii) static Monte Carlo-based methods, and (iii) PageRank for dynamic networks.

2.1 PageRank

PageRank (Page et al. 1998) defines the importance of web pages based on the link structure of the web and is inspired by the eigenvector centrality measure. The PageRank of a node u is calculated from the set of nodes (\(\Gamma _{in}(u)\)) that point to u (backward links) and the set of nodes (\(\Gamma _{out}(u)\)) that u points to (forward links). The rank of a node u (\(\pi _u\)) is given by \(\pi _u = \sum _{b\in \Gamma _{in}(u)}\frac{\pi _b}{|\Gamma _{out}(b)|}\). The PageRank for nodes with no hyperlinks is given by \(\frac{1}{\text {Total number of nodes}}\).

In simple terms, it uses a random surfer model where the random surfer clicks out-links with probability \(1-\rho\) and terminates its walk to start a new walk at a random page with probability \(\rho\). The PageRank transition matrix (\(P^*\)) is as follows.

$$\begin{aligned} P^* = (1-\rho )P + \rho \frac{1}{n}I \end{aligned}$$
(1)

where I is the \(n\times n\) matrix of all ones and n is the number of nodes in the network. P is the transition probability matrix in which the entry for (u, v) is \(\frac{1}{d_{out}(u)}\) when \((u,v)\in E\) and 0 otherwise. Here \(d_{out}(u) = |\Gamma _{out}(u)|\) is the out-degree of u. Simple pseudocode for calculating the PageRank vector (\(\pi\)) is given below.

$$\begin{aligned}&{\textbf {while}}\,\,{\varepsilon > \delta }\quad {\textbf {do}}\\&\qquad \pi ^{(i+1)} = \pi ^{(i)} P^*\\&\qquad \varepsilon = |\pi ^{(i+1)} - \pi ^{(i)}|\\&{\textbf {end while}} \end{aligned}$$

Here \(\delta\) is the convergence threshold provided by the user, and \((i+1)\) is the current iteration. The converged \(\pi ^{(i+1)}\) is the final PageRank vector.
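For concreteness, the iteration can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used in the experiments; the names power_iteration, P_star, and delta are ours.

```python
import numpy as np

def power_iteration(P_star, delta=1e-8, max_iter=1000):
    """Iterate pi <- pi P* until the L1 change eps drops below delta."""
    n = P_star.shape[0]
    pi = np.full(n, 1.0 / n)           # uniform start vector
    for _ in range(max_iter):
        pi_next = pi @ P_star          # one step of the iteration
        eps = np.abs(pi_next - pi).sum()
        pi = pi_next
        if eps <= delta:               # converged: change below threshold
            break
    return pi

# toy 3-node cycle 0 -> 1 -> 2 -> 0 with rho = 0.2
rho, n = 0.2, 3
P = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
P_star = (1 - rho) * P + rho * np.ones((n, n)) / n
print(power_iteration(P_star))         # approx. [1/3, 1/3, 1/3]
```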

2.2 Static Monte Carlo-based PageRank algorithms

Monte Carlo methods are used to approximate classical PageRank (Page et al. 1998). In Salehi (2007), it is stated that PageRank is nothing but a finite-state Markov chain whose transition matrix has an eigenvalue of 1; hence \(\pi = \pi P^*\), and \(\pi\) is the final rank vector. Substituting \(P^*\) from Eq. (1) gives the following equation, which is interpreted (Zhou 2015) as the distribution of all random walks ending at each node.

$$\begin{aligned} \pi = \frac{\rho }{n}\,\mathbf {1}\,[I-(1-\rho )P]^{-1} \end{aligned}$$
(2)
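For small graphs, Eq. (2) can be evaluated directly. Below is a minimal NumPy sketch of this closed form (our own illustration; the dense matrix inverse is feasible only for small n):

```python
import numpy as np

def pagerank_closed_form(P, rho=0.2):
    """Evaluate Eq. (2): pi = (rho/n) * 1 [I - (1 - rho) P]^{-1},
    where 1 is the all-ones row vector of length n."""
    n = P.shape[0]
    return (rho / n) * np.ones(n) @ np.linalg.inv(np.eye(n) - (1 - rho) * P)
```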

Many variations (Breyer 2002; Vargas 2020; Salehi 2007; Zhou 2015; Avrachenkov et al. 2007) have evolved from the above formulation for approximating PageRank (\(\pi\)). These methods largely follow four different strategies.

  1.

    Monte Carlo end point with a random start: in this type of method, N random walks are simulated, each initiated at a node chosen at random in the graph. The final PageRank of any node u is the number of random walks terminating at u divided by the total number of walks: \(\pi _u = \frac{[\text {random walk terminations at } u]}{N}\).

  2.

    Monte Carlo end point with cyclic start: here, m random walks are initiated at each of the N nodes of the graph. The final PageRank of any node u is defined as \(\pi _u = \frac{[\text {terminations at } u]}{N \times m}\).

  3.

    Monte Carlo complete path: these methods simulate m random walks from every node in the graph and count every visit along each walk. The PageRank of any node u is defined as \(\pi _u = \frac{[visits_u]}{\Sigma ^{n}_{v=1}visits_v}\), where n is the total number of nodes.

  4.

    Monte Carlo complete path stopping at the dangling nodes: this is similar to the Monte Carlo complete path, but the random walk stops at dangling nodes, where a dangling node is a node with no out-links. In this method, R random walks are simulated starting from each node, and a walk terminates when it reaches a dangling node. The PageRank of any node u is defined as \(\pi _u = \frac{[visits_u]}{\Sigma ^{n}_{v=1}visits_v}\), where n is the total number of nodes.

It is shown in Vargas (2020) that, out of all the mentioned methods, the last strategy performs best in terms of execution time.
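A compact sketch of this last strategy on a NetworkX digraph follows. This is a minimal illustration under the description above; R (walks per node) and rho (restart probability) are our assumed parameter names.

```python
import random
import networkx as nx

def mc_complete_path_dangling(G, R=100, rho=0.2):
    """Monte Carlo complete path stopping at dangling nodes: simulate R walks
    from every node, count every visit, continue with probability 1 - rho."""
    visits = dict.fromkeys(G, 0)
    for start in G:
        for _ in range(R):
            u = start
            visits[u] += 1
            while random.random() < 1 - rho:
                succ = list(G.successors(u))
                if not succ:           # dangling node: stop this walk
                    break
                u = random.choice(succ)
                visits[u] += 1
    total = sum(visits.values())
    return {u: v / total for u, v in visits.items()}
```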

2.3 PageRank for dynamic networks

Dynamic networks are networks where nodes and links are added or deleted in real time, which matches today's Web 2.0 applications. Many algorithms for dynamic networks have been proposed in the literature. These are broadly classified into two categories:

  1.

    Aggregation algorithms (Desikan et al. 2005; Chien et al. 2004; Langville and Meyer 2004): in this type of method, the algorithm carefully selects the subset of the graph in the vicinity of the updated node or edge, and the remaining parts of the graph are treated as supernodes. This yields a smaller graph, on which PageRank is computed using static methods. The disadvantages of this approach concern accuracy, approximation error, and execution time: accuracy depends entirely on the selected subset (Bahmani et al. 2010), the approximation error can accumulate over time, and the aggregation itself is computationally heavy, resulting in slower execution.

  2.

    Monte Carlo-based algorithms (Zhan et al. 2019; Liao et al. 2017; Yoon et al. 2018; Ohsaka 2015; Bautista and Latapy 2022): Monte Carlo-based approaches, on the other hand, use the theory of Markov chains (Isham and Seneta 1983). In these methods, the algorithm must store all the random walks made, along with the visits each random walk contributes to each node. A random walk segment is adjusted only if its path contains an updated node or edge (Zhan et al. 2019; Liao et al. 2017). The Offset Score Propagation (OSP) (Yoon et al. 2018) algorithm instead first calculates offset scores around the modified edges, then propagates the offset scores across the updated graph, and finally merges these scores with the current Random Walk with Restart (RWR) (Tong et al. 2006) scores to get the updated RWR scores. In Ohsaka (2015), whenever an update happens, a residual is calculated, and the PageRank vector is updated by adding the residual. The most recent algorithm, proposed in Bautista and Latapy (2022), uses Chebyshev polynomials to approximate PageRank. Monte Carlo methods have two major issues. First, space usage is high and grows as the graph evolves; the random walks that must be re-initiated also get longer. Second, when a random walk segment is removed, the newly simulated replacement segment follows the same distribution but is actually a different segment, which introduces errors.

Fig. 1 Link chain examples (a) without a loop and (b) with a loop

3 Proposed fast pessimistic dynamic PageRank (FPPR) for evolving directed graphs

In this section, we present the FPPR algorithm, which recalculates the PageRanks of a directed network upon topological changes in the network.

3.1 FPPR for node addition and link addition

When a new node is added, FPPR (Parjanya and Kundu 2022) calculates the expected score of random surfers as if the graph were static at that point. That is, it estimates the score that random walks would generate if the Monte Carlo method were executed on the graph after the change in the network. We calculate this score in two phases, explained through the examples in Fig. 1. Let 'x6' be a newly added node. Considering \(\rho =0.2\), a random walk continues along an out-link with probability 0.8, and the expected number of walks reaching 'x6' through its in-links under a static Monte Carlo simulation is what we calculate in the first phase. We call this score Approximate Visits, defined in Definition 1. In the second phase, we assume the incoming link chain is linear. This linearity assumption makes it easy to count the visits carried by random walks initiated at any node in the chain (all connected nodes with the same ID), especially when the chain contains a loop. For example, in Fig. 1a, R random walks starting from node (1), which holds chain ID 'id1', would reach 'x6' in 3 hops as shown in orange, i.e., the expected number of visits is \(R*(0.8)^3\). Similarly, random walks initiated at node (2) reach 'x6' in 2 hops (expected visits \(R*(0.8)^2\), shown in blue), and random walks initiated at node (3) reach 'x6' in 1 hop (expected visits \(R*(0.8)^1\), shown in pink). All of these contribute to the score of the new node. The linearity assumption yields the correct expected value for chain 'id1' in Fig. 1b even though it forms a loop in the network. Accordingly, we define the link sensitivity index in Definition 2. The linearity assumption is taken in a pessimistic way: the literal meaning of 'pessimistic' is to expect the worst case to happen often. While estimating the random walker visits, we treat the existence of a loop as the worst-case scenario, because it is hard to estimate how many times the random walkers traversed the loop and how many visits this contributed to a node. Taking the 'pessimistic' view, we assume a loop may always exist and use the linearity assumption to calculate the baseline values of the update. Note that the same formulation may not be valid for chain 'id4' in Fig. 1b; the algorithm nevertheless pessimistically assumes that every chain is either linear or a loop, and in such cases it is prone to error. For example, in Fig. 1b, chain 'id4' has no loop, but the algorithm still assumes the chain is linear/has a loop and calculates the PageRank score of 'x6' accordingly. However, we expect this error to be very small. A similar process is adopted for the out-links of the new node as well.

A new link addition in a network only modifies the score of the target node. When a new link is added between a source and a target, the procedure discussed above is applied to the target node to estimate its PageRank.

Definition 1

(Approximate visits (AV)) AV estimates the contribution of the incoming links to the newly added node, i.e., the visits that random walks would have delivered if they were executed from scratch. If two nodes have bidirectional edges, the maximum of the two approximate visit values is used for both nodes. It is calculated as:

$$\begin{aligned} AV(v) \leftarrow \sum _{u\in \Gamma _{in}(v)} \frac{1}{d_{out}(u)} *(1-\rho ) *(AV(u)) \end{aligned}$$
(3)

Here, \(\Gamma _{in}(\cdot )\) and \(d_{out}(\cdot )\) return the set of incoming neighbors and the out-degree, respectively. AV(u, v) is the contribution of the score of node u to v iff u and v are neighbors, and it is defined as:

$$\begin{aligned} AV(u,v) \leftarrow \frac{1}{d_{out}(u)} *(1-\rho ) *(AV(u)) \end{aligned}$$
(4)

Definition 2

(Link sensitivity index (LSI)) LSI defines the amount of score a node receives from the whole link chain it joins. In other words, it is the sum of scores received from the random surfers initiated at the nodes of the chain that were added before this node. We assume the link chain is linear, and the LSI is mathematically defined by:

$$\begin{aligned} LSI(v_i) \leftarrow c + c^2 + c^3 + \cdots + c^i = \frac{c(1-c^i)}{1-c} \end{aligned}$$
(5)

Here, \(c=1-\rho\) is the probability that a random surfer moves forward, and i is the length of the chain up to the node \(v_i\). In our experiments, we took \(c=0.8\).
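To make the two definitions concrete, a small sketch of how AV and LSI can be evaluated is given below. This is our own illustration; av and out_deg stand for lookups into the stored approxVisits and outdegree vectors.

```python
def approximate_visits(in_neighbors, av, out_deg, rho=0.2):
    """Eq. (3): AV of a node as the sum of the contributions AV(u, v) of its
    in-neighbors, each scaled by 1/d_out(u) and the forward probability."""
    return sum((1 - rho) * av[u] / out_deg[u] for u in in_neighbors)

def link_sensitivity_index(i, rho=0.2):
    """Eq. (5): closed form of the geometric sum c + c^2 + ... + c^i."""
    c = 1 - rho
    return c * (1 - c ** i) / (1 - c)
```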

FPPR does not keep track of all the random walk segments or aggregations. It takes only \(4\times |V|\) space for node/link addition, as the proposed algorithm uses four vectors of size |V|, namely linkID, approxVisits, LinkIDlength, and outdegree. That is, the space complexity is \(\mathcal {O}(|V|)\). For a network with billions of nodes, this provides a significant improvement.

Algorithms 1 and 2 show the steps of the FPPR algorithm for node and link addition. Each node in the network has a linkID, and all nodes in a chain share the same linkID. The linkID of a node identifies the chain to which it belongs; in other words, linkID keeps track of the distinct chains in the graph. More than one chain may pass through a node, in which case our method assigns the lower linkID to that node. One may also choose it randomly, because this convention has no effect on the scores. As we keep track of every link chain in the graph, we also store the length of each linkID.

3.1.1 FPPR algorithm for node deletion

When a node is deleted, FPPR uses a Selective Breadth-First Update (SBFU) approach (Algorithm 3) starting from the deleted node. During node deletion, the score contribution from the source node to the target node is used to update the target node's approximate PageRank. These contributions can be stored as edge-weight parameters during graph evolution itself; in Algorithm 1, lines 8 and 24 store AV(u, v) as an edge weight. In the first level of SBFU, AV(u, v) is removed from the target node v; in other words, whatever score the source node contributed to the target node during the addition phase is now subtracted from the target node. From the next level on, the edge weights AV(u, v) and the PageRank scores of the nodes are updated accordingly, i.e., all neighbors of the updated nodes (those whose AV(u, v) score was subtracted) are pushed into the queue, and the breadth-first traversal continues until the queue is empty. Over time during the addition phase, it may happen that a source node's contribution AV(u, v) exceeds the target node's score itself. In that case, the target node's score is already undervalued, so no further subtraction is performed. This is why SBFU pushes into the queue only the selected nodes whose incoming edge weights and incoming node's score have been updated.
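A hedged sketch of the SBFU idea on a NetworkX-style digraph follows. It is simplified from Algorithm 3; the edge attribute 'av' holding AV(u, v) and the score dict are our assumed bookkeeping, not the paper's exact data layout.

```python
from collections import deque

def sbfu_delete_node(G, x, score):
    """Selective Breadth-First Update after deleting node x (sketch of
    Algorithm 3). Edge attribute 'av' holds AV(u, v) stored at addition."""
    queue = deque()
    for v in G.successors(x):        # level 1: remove x's stored contribution
        score[v] -= G[x][v]["av"]
        queue.append(v)
    G.remove_node(x)
    visited = set(queue)
    while queue:
        u = queue.popleft()
        for v in G.successors(u):
            av_uv = G[u][v]["av"]
            # selective step: skip targets whose score is already undervalued
            if v not in visited and score[v] > av_uv:
                score[v] -= av_uv
                visited.add(v)
                queue.append(v)
```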

Algorithms 1–4 (pseudocode figures)

3.1.2 FPPR algorithm for link deletion

For link deletion, a simple local update is used in FPPR: the AV(u, v) value is removed from the target node. This local update introduces some approximation error, but the error does not appear significant in the experimental results.
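Under the same stored-edge-weight assumption as above, this constant-time local update can be sketched as:

```python
def fppr_delete_link(G, u, v, score):
    """O(1) local update for link deletion: subtract the stored AV(u, v)
    contribution from the target node, then drop the edge."""
    score[v] -= G[u][v]["av"]
    G.remove_edge(u, v)
```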

4 Experiments and results

4.1 Dataset

Different experiments comparing the performance of the proposed FPPR have been conducted on synthetically generated networks as well as real-world networks, covering node/link addition/deletion. Both random networks and Barabasi–Albert (Albert and Barabási 2002) networks are used to test FPPR for node addition and deletion; for node deletion, up to 50 percent of randomly chosen nodes are deleted. The random graph generators in the NetworkX library, such as Erdos–Renyi (Erdös and Rényi 2011) and GNM graphs (Knuth 2014), are used to test FPPR for link addition and deletion; for link deletion, up to 10 percent of randomly chosen links are deleted. The real-world networks are the Weibo reshare network of Chuai and Zhao (2020); the reptilia-tortoise, aves-weaver, mammalia-voles, bio-mouse-gene, and bio-human-gene2 networks from the NetworkRepository (Rossi and Ahmed 2015); and the Slashdot social network (Leskovec et al. 2009) and Epinions social network (Richardson 2003) from the Stanford Large Network Dataset Collection. The salient features of these graphs are presented in Table 2.

Table 2 Dataset used for different experiments

As the Erdos–Renyi random graph model does not support growth, we use Algorithm 5 to generate the random network. To obtain the dynamic character, we start with one node and add each subsequent node in order of its creation for the Barabasi–Albert and random networks.

Algorithm 5 (pseudocode figure)
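Algorithm 5 is given above only as a pseudocode figure; the sketch below shows one plausible reading of it (our assumption): nodes are added one at a time, and each new node is wired to and from every existing node independently with probability p, yielding an incrementally built Erdos–Renyi-style digraph.

```python
import random
import networkx as nx

def grow_random_digraph(n, p, seed=None):
    """Assumed reading of Algorithm 5: add nodes one at a time, wiring each
    new node to/from every existing node independently with probability p."""
    rng = random.Random(seed)
    G = nx.DiGraph()
    G.add_node(0)
    for v in range(1, n):
        G.add_node(v)
        for u in range(v):
            if rng.random() < p:
                G.add_edge(u, v)
            if rng.random() < p:
                G.add_edge(v, u)
    return G
```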
Fig. 2 PageRank of 10 randomly sampled nodes under the different algorithms

Fig. 3 Spearman correlation between all four approaches over the different datasets for node addition

Fig. 4 Spearman correlation of FPPR vs PI over time. Each time tick denotes the addition of 10 nodes to the network

Fig. 5 Execution time of the different algorithms for node addition

4.2 Comparing methods

The proposed methods have been compared with the following methods.

  • Power Iteration (PI) (Page et al. 1998): we consider this method the benchmark in our experiments. We restart the PI algorithm and recalculate PageRank for the whole graph upon each (depending on the experiment) node/link addition or deletion.

  • Static Monte Carlo (MC) (Vargas 2020): the static Monte Carlo method approximates the PageRanks of the network using random walks. We implemented the complete-path version stopping at dangling nodes. As with PI, we recalculate the PageRanks upon network changes at the appropriate steps. The number of random walks considered in the experiments is 1000.

  • Fast Incremental PageRank on Dynamic Networks (FIPR) (Zhan et al. 2019): this method, proposed in 2019, is designed for dynamic networks. As our algorithm also targets dynamic networks, we include it as related research. The parameter R is set to 16 in the experiments.

  • Offset Score Propagation (OSP) (Yoon et al. 2018): this method, proposed in 2018, was created for dynamic networks. As our algorithm also targets dynamic networks, we include it as related research.

4.3 Comparing parameters

All comparing algorithms were executed on the different graphs. As expected, the absolute PageRank values produced by the different algorithms for a node differ; the results for 10 random nodes on all the datasets with respect to node addition are shown in Fig. 2 for reference. Hence, comparing the algorithms in terms of absolute PageRank values is not fair. Therefore, Spearman's rank correlation coefficient (Spearman 1904) is used to compare the ranking of the proposed FPPR with that of the comparing methods. The Spearman correlation between two vectors is high when observations have similar ranks in the two variables and lower otherwise; a value between 0.8 and 1 is considered strongly correlated (Zar 2005). Apart from the Spearman rank correlation coefficient, the comparing parameters include (i) the change in the Spearman rank correlation coefficient over time and (ii) execution time.
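The coefficient itself can be computed with scipy.stats.spearmanr over two PageRank vectors defined on the same node ordering; a minimal usage sketch with toy numbers:

```python
from scipy.stats import spearmanr

# toy score vectors standing in for the FPPR and PI PageRanks of five nodes
pi_fppr = [0.30, 0.25, 0.20, 0.15, 0.10]
pi_pi = [0.28, 0.27, 0.19, 0.16, 0.10]
coeff, p_value = spearmanr(pi_fppr, pi_pi)
print(f"Spearman rank correlation: {coeff:.2f}")   # 1.00: identical rankings
```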

Fig. 6 Spearman correlation between all four approaches over the different datasets after 50% node deletion

4.4 Results

4.4.1 FPPR for node addition

Accuracy: the results of Spearman's ranking coefficient over all nodes in the network for node addition are shown in Fig. 3. It is evident from the result that the proposed FPPR is highly correlated with the benchmark PI method: the median and mean Spearman's correlations with PI over all experiments are 0.98 and 0.97, respectively. Further, the proposed method is equal to or better than FIPR on all networks except Weibo2, and it performs better than or close to the OSP algorithm. Note that MC and PI are highly correlated because both are recalculated over the full graph each time a new node is added.

Spearman's rank correlation with changes in network: as part of the experiment, we examine how Spearman's correlation coefficient changes as nodes are added. We recalculate Spearman's correlation coefficient of the proposed algorithm against the benchmark Power Iteration after every 10 node additions. The result is plotted in Fig. 4 for 6 datasets. The value dips by around 26% for both random networks, while the Barabasi–Albert networks show a consistent improvement in Spearman's correlation coefficient.

Execution time: we measured the overall execution time of the comparing algorithms and found the proposed algorithm to be much faster than all the methods we experimented with, on all datasets, with respect to node addition. The comparison is presented in Fig. 5.

Fig. 7 Spearman correlation of FPPR vs PI over time. Each time tick denotes the deletion of 10 nodes from the network

Fig. 8 Examples

Fig. 9 Execution time of the different algorithms for node deletion

4.4.2 FPPR for node deletion

Accuracy: to test FPPR's deletion accuracy, we deleted 50 percent of randomly chosen nodes from the network and plotted the heatmap of Spearman rank correlations in Fig. 6, analogous to node addition. FPPR performs well with respect to the benchmark PI method: the median and mean Spearman's correlations with PI over all node-deletion experiments are 0.89 and 0.90, respectively. FPPR is also observed to be equal to or better than FIPR and OSP.

Spearman's rank correlation with changes in network: the Spearman rank correlation coefficient is recorded at regular intervals and plotted in Fig. 7. There is a clear gradual dip in the correlation. The reason for this downhill trend is that FPPR uses the Selective Breadth First Update (SBFU) for deletion. SBFU performs a breadth-first traversal marking visited nodes, so each node is visited only once, i.e., only one update is possible per neighboring node. For the scenario shown in Fig. 8a, when node x is deleted, the scores of the neighbors of x need to be updated more than once: the scores of n1, n2, n3 are updated as x is deleted, but there are still links between n2 and n3 and between n3 and n1 that must be crawled, requiring further score updates at n3 and n1 as shown in Fig. 8b. Since SBFU cannot update a neighboring node more than once, errors accumulate over time.

Execution time: the execution times of all comparing methods with respect to node deletion are plotted in Fig. 9. FPPR takes a slightly higher or equal execution time, as its deletion time depends on the set of updated nodes along with their associated edges.

Fig. 10 Spearman correlation between all four approaches over the different datasets for link addition and deletion

4.4.3 FPPR for link addition and deletion

Accuracy: the accuracy for link addition is shown in Fig. 10a–h. The median and mean Spearman's correlations with the PI method over all experiments are 0.95 and 0.93 for link addition, and 0.94 and 0.90 for link deletion, respectively. This shows that the proposed FPPR is highly correlated with the benchmark PI method. FPPR also performs reasonably well in comparison with FIPR, except on the GNM2 graph.

As a part of the experiment, 10 percent of randomly chosen links are deleted from the graph, and PageRank scores are computed with all comparing algorithms. The accuracy for link deletion is depicted in Fig. 10i–p. For deletion, FPPR performs better than all the comparing methods. Note that on the graphs used for the link addition and deletion experiments, the Monte Carlo method (MC) performed poorly because the generated graphs are highly dense.

Spearman's rank correlation with changes in network: the change in the Spearman rank correlation, recorded at regular intervals (after the addition or deletion of every 10 links), is shown in Fig. 11a–h and i–p. For link addition, the FPPR Spearman rank correlation with PI shows minute fluctuations and decreases gradually. For link deletion, there are sharp transitions in the plot, due to FPPR's local update for link deletion.

Execution time: the execution time of the proposed FPPR is much lower than that of all the comparing methods for link addition and deletion, as evident from Fig. 12.

4.5 Accuracy of FPPR with the change of parameter \(\rho\)

As part of the experiment, the behavior of FPPR is examined with respect to changes in the parameter \(\rho\) (the probability of a random surfer restarting its walk). The parameter \(\rho\) is set to 0.2, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9, and the accuracy of FPPR is checked for each value. The results are captured in Fig. 13. It is evident from the results that the changes in Spearman's correlation coefficient are generally very small (between 0.02 and 0.04), except for the Weibo network, where the change is about 0.12.

4.6 Performance on large-scale data

Four different large-scale temporal datasets have been used to test the performance of the proposed FPPR algorithms. These networks have between 14 K and 77 K nodes, while the number of links ranges from 500 K to 14.5 M. As the datasets are temporal, we added the nodes and edges to the network based on their timestamps and calculated Spearman's correlation at the end of the generation of the full network. The results are shown in Fig. 14. Spearman's correlation coefficient is very close to 1 for 3 out of 4 datasets, and above 0.8 for the remaining dataset (L2). These results show the high accuracy of the proposed method. One of the main objectives of this experiment was to assess the execution time: the proposed algorithm generates results in less than 10 seconds for all the datasets. We also captured the accuracy of FPPR vs the Power method, along with FPPR vs the matrix multiplication method, for a few graphs in Fig. 14c.

Fig. 11 Spearman correlation of FPPR vs PI over time. Each time tick denotes the addition/deletion of 10 links in the network

Fig. 12 Execution time of the different algorithms for link addition and deletion

Fig. 13 PageRank of FPPR with respect to PI under changes in the parameter \(\rho\)

4.7 Approximation error with the exact PageRank

Although Power Iteration is the standard method for practically performing the PageRank calculation, we wish to see how it differs from the matrix multiplication method. We experimented with small datasets and calculated Spearman's correlation coefficient for the proposed FPPR against both the Power Iteration method and the matrix multiplication method. The result is plotted in Fig. 15. As expected, the correlation coefficient is lower against the matrix multiplication method, but not by much; in both cases the results show a high rank correlation, above 0.8 for all datasets.

Fig. 14 Results on large-scale networks

Fig. 15 Spearman correlation of FPPR with the Power Iteration and matrix multiplication methods

Table 3 Computation complexities and space complexities of different PageRank algorithms

4.8 Computation complexity

Computation complexities for all the operations (node addition, node deletion, link addition, and link deletion) of the different algorithms, along with their space complexities, are shown in Table 3. The worst-case complexity of the proposed algorithm for node addition is \(\mathcal {O}(k\times d_{avg}^{(k)})\), where k is the number of nodes/links added to the network, since for each update Algorithm 1 updates all the outgoing links (line 13) and computes over all incoming edges (line 2). Each calculation can be done in linear time with the formulas in Eqs. (3)–(5). The worst-case complexities of the proposed algorithm for node deletion and link deletion are \(\mathcal {O}(|V_s| + |E_s|)\) and \(\mathcal {O}(1)\), respectively, where \(V_s\) is the set of updated nodes and \(E_s\) is the set of edges associated with those nodes. The space required by the proposed algorithm is \(4\times |V|\) for node and link addition, to keep the four vectors linkID, approxVisits, LinkIDlength, and outdegree. To support efficient deletion, the intermediate AV(u, v) values are stored for each link in the network, taking \(\mathcal {O}(|E|)\) space. It follows that the cumulative space requirement is \(\mathcal {O}(|V| + |E|)\).

Algorithm 6 (pseudocode figure)
Fig. 16 Spearman correlation between all four approaches over the different datasets for RW growth and decay

Fig. 17 Spearman correlation of FPPR vs PI over time with node and link growth

Fig. 18 Spearman correlation of FPPR vs PI over time with node and link decay

Fig. 19 Execution time of the different algorithms for RW growth and decay

4.9 FPPR for real-world graph simulation (modeling growth and decay together)

Finally, a real-world graph simulator (Algorithm 6) is used to test FPPR under graph growth and decay together. This simulation incorporates all four operations: node addition, node deletion, link addition, and link deletion. Each of the four operations is performed according to an independent probability (\(\rho _{n+}, \rho _{l+}, \rho _{n-}, \rho _{l-}\)). We capture graph growth and decay by adjusting these probabilities, and we use the simulations to check the accuracy of FPPR against the other mentioned PageRank algorithms. The setup for graph growth and decay is as follows: for the growing graph, \(\rho _{n+}=0.2, \rho _{l+}= 0.2, \rho _{n-}=0.01, \rho _{l-}=0.01\); for the decaying graph, \(\rho _{n+}=0.1, \rho _{l+}=0.01, \rho _{n-}=0.3, \rho _{l-}=0.2\).
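A hedged sketch of one simulation step under these four probabilities is given below. It is simplified from Algorithm 6, which appears above only as a figure; the wiring choices for the added node are our assumption.

```python
import random
import networkx as nx

def simulate_step(G, p_nadd, p_ladd, p_ndel, p_ldel, rng=random):
    """One growth/decay step: each of the four operations fires independently
    with its own probability, matching the growth/decay setups above."""
    if rng.random() < p_nadd:                       # node addition
        v = max(G.nodes, default=-1) + 1
        G.add_node(v)
        if G.number_of_nodes() > 1:                 # attach to a random node
            G.add_edge(rng.choice([u for u in G if u != v]), v)
    if rng.random() < p_ladd and G.number_of_nodes() > 1:   # link addition
        u, v = rng.sample(list(G.nodes), 2)
        G.add_edge(u, v)
    if rng.random() < p_ndel and G.number_of_nodes() > 0:   # node deletion
        G.remove_node(rng.choice(list(G.nodes)))
    if rng.random() < p_ldel and G.number_of_edges() > 0:   # link deletion
        G.remove_edge(*rng.choice(list(G.edges)))
    return G
```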

Accuracy: the accuracy test is performed on the various real-world datasets shown in Table 2 and on graphs generated by the real-world simulator (Algorithm 6). Both graph growth and decay are tested. For each, the number of operations (node add, node delete, link add, link delete) is set to 50, 100, and 200. For growth, FPPR correlates well with the benchmark PI method: the median and mean Spearman's correlations with PI over all real-world growth simulations are 0.90 and 0.86. For decay, FPPR is reasonably correlated with PI, with median and mean values of 0.76 each. On all dataset graphs, FPPR performs better than FIPR. The results are shown in Fig. 16a–f for graph growth and Fig. 16g–l for graph decay.

Spearman's rank correlation with changes in network: the changes in the Spearman rank correlation are captured at regular intervals (after every ten operations) for both graph growth and decay. The results are shown in Fig. 17a–f for node growth, Fig. 17g–l for link growth, Fig. 18a–f for node decay, and Fig. 18g–l for link decay. Each plot shows time on the x-axis, the Spearman rank coefficient of all comparing methods with respect to the benchmark PI method on the y-axis, and a bar plot of the number of nodes/links on the z-axis. In the growth phase, the proposed FPPR performs better than or equal to FIPR, and even matches the static MC method on the graphs RWS1, RWS2, ReW1, and ReW3. In the decay phase, FPPR shows a gradual dip in the Spearman rank correlation but still performs better than or equal to FIPR.

Execution time: the proposed FPPR has faster execution times than all the comparing methods in both the graph growth and decay phases. The results are presented in Fig. 19a–f for graph growth and Fig. 19g–l for graph decay.

5 Conclusion

The present paper proposed a new algorithm for calculating PageRank on dynamic directed graphs. The algorithm estimates PageRank under node addition, link addition, node deletion, and link deletion. We showed through experimental results that the rankings of the proposed algorithm are highly correlated with those of the benchmark Power Iteration method. In particular, the minimum rank correlations found for node addition, node deletion, link addition, and link deletion are 0.95, 0.82, 0.77, and 0.83, respectively. The proposed FPPR also ranks better than the FIPR algorithm and better than or equal to the state-of-the-art OSP algorithm for all the different topological changes in the network, while its execution time is significantly faster and its results remain acceptable.

While the PageRanks of growing and shrinking networks computed with the proposed FPPR method are comparable to or better than those of FIPR, the execution time of the proposed algorithm is far lower. The proposed algorithm performs better in terms of execution time and accuracy for all operations except node deletion.