1 Introduction

How can we obtain personalized rankings for users in signed social networks? Many social networks allow users to express their trust or distrust of other users. For example, in online social networks such as Slashdot [23], a user can explicitly mark other users as friends or foes. The users are represented as nodes, and the expressions are represented as positive and negative edges in graphs called signed networks [37]. Ranking nodes in signed networks has received much interest from the data mining community as a way to reveal trust and distrust between users [23], enabling many useful applications such as link prediction [35], anomaly detection [23], sign prediction [26], and community detection [42] in signed networks.

Traditional ranking models, however, do not provide satisfactory node rankings in signed networks. Existing random walk-based ranking models such as PageRank [31] and Random Walk with Restart [17, 18, 34, 39, 43, 44] assume only positive edges; thus, they are inappropriate for signed networks containing negative edges. Many researchers have proposed heuristics on top of the classical methods to make them applicable to signed networks [23, 33]. However, those heuristic methods still have room for improvement in terms of ranking quality since they do not consider complex social relationships such as the friend of an enemy or the enemy of a friend, as shown in Fig. 2. In addition, most existing ranking models in signed networks focus only on a global node ranking, although personalized rankings are more desirable for individuals in many contexts such as recommendation. Also, fast ranking computation is important for the performance of applications built on rankings in signed networks.

In this paper, we propose Signed Random Walk with Restart (SRWR), a novel model for effective personalized node rankings in signed networks. The main idea of SRWR is to introduce a sign into a random surfer so that the surfer accounts for negative edges based on structural balance theory [5, 26]. Consequently, our model considers complex edge relationships and makes random walks interpretable in signed networks. We devise SRWR-Iter, an iterative method which naturally follows the definition of SRWR and iteratively updates SRWR scores until convergence. Furthermore, we propose SRWR-Pre, a preprocessing method for computing SRWR scores quickly, which is useful for various applications in signed networks. Through extensive experiments, we demonstrate that our proposed approach offers improved performance for personalized rankings compared to alternative methods in signed social networks. Our main contributions are as follows:

Table 1 Table of symbols
  • Novel ranking model We propose Signed Random Walk with Restart (SRWR), a novel model for personalized rankings in signed networks (Definition 1). We show that our model is a generalized version of RWR working on both signed and unsigned networks (Property 2).

  • Algorithm We propose SRWR-Iter and SRWR-Pre for computing SRWR scores. SRWR-Iter is an iterative algorithm which naturally follows the definition of SRWR (Algorithm 2). SRWR-Pre is a preprocessing method which employs a node reordering technique and block elimination to accelerate SRWR computation speed (Algorithms 3 and 4).

  • Experiment We show that SRWR achieves higher accuracy for link prediction (Fig. 7), predicts trolls \(4 \times \) more accurately (Fig. 9), and performs well on sign prediction compared to other ranking models (Fig. 10). In terms of efficiency, SRWR-Pre preprocesses signed networks up to \(4.5 \times \) faster, and requires \(11 \times \) less memory space than baseline preprocessing methods. Furthermore, SRWR-Pre computes SRWR scores up to \(14 \times \) faster than other methods including SRWR-Iter (Fig. 13).

The code of our method and datasets used in this paper are available at http://datalab.snu.ac.kr/srwrpre. The rest of this paper is organized as follows. We first introduce the formal definition of the personalized ranking problem in signed networks in Sect. 2. Then we review related work in Sect. 3. In Sect. 4, we describe our proposed model and algorithms for computing personalized rankings. After presenting experimental results in Sect. 5, we conclude in Sect. 6. Table 1 lists the symbols used in this paper.

Fig. 1

Example of the personalized node ranking problem in Problem 1. Given a signed network and a seed node (in this example, node A is the seed node), our goal is to compute the trustworthiness score vector \(\mathbf {r}\) w.r.t. the seed node. Our proposed model SRWR (see Definition 1 in Sect. 4) aims to compute \(\mathbf {r}\) based on the positive and negative score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\), i.e., \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\)

2 Problem definition

We define the personalized ranking problem in signed networks as follows:

Problem 1

(Personalized Node Ranking in Signed Networks)

  • Input: a signed network \(G={(\mathbf{V },\mathbf{E })}\) and a seed node s where \(\mathbf{V }\) is the set of nodes, and \(\mathbf{E }\) is the set of signed edges.

  • Output: a trustworthiness score vector \(\mathbf {r}\in {\mathbb {R}}^{n}\) w.r.t. seed node s, which is used to rank all other nodes for the seed node. \(\square \)

In signed social networks, users are represented as nodes, and trust or distrust relations between users are represented as positive or negative edges. When a user u considers that a user v is trustworthy, a positive edge \(u \rightarrow v\) is formed. On the contrary, a negative edge \(u \rightarrow v\) is formed when u distrusts v. Given those signed edges between nodes and a seed node s, the personalized ranking problem is to rank all other nodes w.r.t. seed node s in the order of trustworthiness scores represented by \(\mathbf {r}\) where \(\mathbf {r}_{u}\) indicates how much seed node s should trust node u as depicted in Fig. 1. If the score \(\mathbf {r}_{u}\) is high, then s is likely to trust u. Otherwise, s is likely to distrust u.
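
To make the input representation concrete, the following minimal sketch (not from the paper's code release) builds a signed adjacency matrix, formalized later in Definition 2, from a hypothetical signed edge list using scipy.sparse; the node ids, edges, and signs are made up for illustration.

import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical signed edge list: (u, v, sign), where sign = +1 means "u trusts v"
# and sign = -1 means "u distrusts v".
edges = [(0, 1, +1), (1, 2, +1), (2, 3, -1), (3, 0, +1), (0, 2, -1)]
n = 4  # number of nodes

rows = [u for u, v, s in edges]
cols = [v for u, v, s in edges]
vals = [s for u, v, s in edges]

# Signed adjacency matrix A: A[u, v] is +1, -1, or 0 (no edge).
A = csr_matrix((vals, (rows, cols)), shape=(n, n), dtype=float)
print(A.toarray())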

3 Related work

In this section, we review related work, categorized into four parts: (1) ranking in unsigned networks, (2) ranking in signed networks, (3) applications in signed networks, and (4) fast personalized ranking methods.

Ranking in unsigned networks There are various global ranking measures based on link structure and random walks, e.g., PageRank (PR) [31], HITS [21], and SALSA [25]. Furthermore, personalized ranking methods have been explored in terms of node-to-node relevance, such as Random Walk with Restart (RWR) [40], Personalized PageRank (PPR) [13], and Personalized SALSA (PSALSA) [2]. Among these measures, RWR has received much interest and has been applied to many applications [1, 10, 15, 20]. Note that these methods are not applicable to signed graphs because they assume only positive edges; on the contrary, our model works on signed networks as well as on unsigned networks.

Ranking in signed networks Many researchers have made great efforts to design global node rankings in signed networks. Kunegis et al. [23] presented Signed spectral Ranking (SR), which heuristically computes PageRank scores based on a signed adjacency matrix. Wu et al. [41] proposed the Troll-Trust model (TR-TR), a variant of PageRank in which the trustworthiness of an individual user is modeled as a probability that represents the underlying ranking values. Shahriari et al. [33] suggested Modified PageRank (MPR), which computes PageRank in a positive subgraph and a negative subgraph separately, and subtracts negative PageRank scores from positive ones. Although the idea of MPR is easily applicable to other personalized ranking models such as RWR by computing ranking scores on the positive and negative subgraphs, this results in many disconnections between nodes. Note that all those models mainly focus on global node rankings, and they do not consider complex relationships between negative and positive edges such as the friend of an enemy or the enemy of a friend; in contrast, our model SRWR provides an effective personalized ranking while considering complicated relationships between nodes based on a social theory such as structural balance theory [5].

Applications in signed networks Numerous applications in signed social networks such as link prediction, troll detection, and sign prediction have been studied in the literature. Song et al. [35] proposed GAUC (Generalized AUC) to measure the quality of link prediction in signed networks, where the link prediction task is to predict nodes which will be positively or negatively linked by a node in the future. They devised a matrix factorization-based method GAUC-OPT which approximately maximizes GAUC for link prediction. Kunegis et al. [23] analyzed the Slashdot dataset from the perspective of troll detection, and proposed Negative Rank (NR), a variant of PageRank for detecting trolls who behave abnormally in the social network. Leskovec et al. [26] proposed LOGIT, which is specially designed for sign prediction, i.e., classifying the sign between two arbitrary nodes. They exploited a logistic classifier trained on node and edge features such as node degrees and common neighbors between the two nodes. Guha et al. [12] also studied sign prediction and devised TRUST, which measures trustworthiness between a source node and a target node by propagating trust and distrust from the source to the target. Note that our model SRWR shows better performance in link prediction, troll detection, and sign prediction tasks compared to those methods, as demonstrated in Sect. 5.

Fast personalized ranking methods Many researchers have emphasized the importance of fast computation for personalized rankings such as RWR to reduce computational cost and boost the efficiency of ranking-based applications. Tong et al. [40] proposed an approximate method which exploits a low-rank approximation based on matrix decomposition in the preprocessing phase, and computes an RWR query from the decomposed matrices in the query phase. Fujiwara et al. [9] utilized LU factorization with degree ordering to speed up RWR computation. Shin et al. [34] proposed a block elimination approach based on node reordering to accelerate RWR computation speed. Although those approaches significantly improve the running time of ranking computation, they only focus on ranking in unsigned networks. In our previous work [16], we designed a random surfer model for ranking nodes in signed networks, and developed an iterative method for computing trust and distrust scores. However, the iterative method is not appropriate for real-time applications since it is not fast enough on large signed networks, as shown in Fig. 13c. In this work, we also aim to develop an efficient preprocessing method to accelerate the query speed of SRWR in signed networks. We will demonstrate that our preprocessing method SRWR-Pre is the fastest among the baselines for computing SRWR scores, as presented in Fig. 13c.

4 Proposed methods

We propose Signed Random Walk with Restart (SRWR), a novel ranking model for signed networks, in Sect. 4.1. We then develop SRWR-Iter, an iterative algorithm for computing SRWR scores w.r.t. a seed node, in Sect. 4.2, and propose SRWR-Pre, a preprocessing algorithm that accelerates SRWR computation, in Sect. 4.3.

4.1 Signed random walk with restart model

As discussed in Sect. 1, complicated relationships of signed edges are the main obstacles for providing effective rankings in signed networks. Most existing works on signed networks have not focused on personalized rankings. In this work, our goal is to design a novel ranking model which resolves those problems in signed networks. The main ideas of our model are as follows:

  • We introduce a signed random surfer. The sign of the surfer is either positive or negative, which means favorable or adversarial to a node, respectively.

  • When the random surfer encounters a negative edge, she changes her sign from positive to negative, or vice versa. Otherwise, she keeps her sign.

  • We introduce balance attenuation factors into the surfer to consider the uncertainty for friendship of enemies.

Fig. 2

Examples of traditional random walks and signed random walks. Each case represents (1) friend’s friend, (2) friend’s enemy, (3) enemy’s friend, or (4) enemy’s enemy from the top. A random surfer has either a positive (blue) or a negative (red) sign on each node in (b). When the signed surfer traverses a negative edge, she changes her sign from positive to negative or vice versa (color figure online)

There are four cases according to the signs of edges as shown in Fig. 2: (1) friend’s friend, (2) friend’s enemy, (3) enemy’s friend, and (4) enemy’s enemy. Suppose a random surfer starts at node s toward node t. A traditional surfer just moves along the edges as seen in Fig. 2a since there is no way to consider the signs on the edges. Hence, classical models cannot distinguish those edge relationships during her walks. For instance, such a model considers that node s and node t are friends in the second case (friend’s enemy), even though node t is more likely to be an enemy w.r.t. node s.

On the contrary, our model in Fig. 2b has a signed random surfer who considers those complex edge relationships. If the random surfer starting at node s with a positive sign encounters a negative edge, she flips her sign from positive to negative, or vice versa. Our model distinguishes whether node t is the friend of node s or not according to her sign at node t. As shown in Fig. 2b, the results for all cases from our model are consistent with structural balance theory [5]. Thus, introducing a signed random surfer enables our model to discriminate those edge relationships.

Trust or distrust relationships between a specific node s and other nodes are revealed as the surfer is allowed to move around a signed network starting from node s. If the positive surfer visits a certain node u many times, then node u is trustable for node s. On the other hand, if the negative surfer visits node u many times, then node s is not likely to trust node u. Thus, rankings are obtained by revealing a degree of trust or distrust between people based on the signed random walks. Here, we formally define our model on signed networks in Definition 1. Note that Definition 1 involves the concept of restart which provides personalized rankings w.r.t. a user.

Definition 1

(Signed Random Walk with Restart) A signed random surfer has a sign, which is either positive or negative. At the beginning, the surfer starts with \(\mathbf {+}\) sign from a seed node s because she trusts s. Suppose the surfer is currently at node u, and c is the restart probability of the surfer. Then, she takes one of the following actions:

  • Action 1: Signed Random Walk The surfer randomly moves to one of the neighbors from node u with probability \(1-c\). The surfer flips her sign if she encounters a negative edge. Otherwise, she keeps her sign.

  • Action 2: Restart The surfer goes back to the seed node s with probability c. Her sign should become \(\mathbf {+}\) at the seed node s because she trusts s. \(\square \)

We measure two probabilities on each node through Signed Random Walk with Restart (SRWR) starting from the seed node s. The two probabilities are represented as follows:

  • \(\mathbf {r}^{+}_{u}=P(u, {+})\): the probability that the positive surfer visits node u after SRWR from seed node s.

  • \(\mathbf {r}^{-}_{u}=P(u, {-})\): the probability that the negative surfer visits node u after SRWR from seed node s.

Note that \(\mathbf {r}^{+}_{u}\) (or \(\mathbf {r}^{-}_{u}\)) corresponds to the fraction of times the positive (or negative) surfer visits node u during SRWR. If the positive surfer visits node u much more than the negative one, then s is likely to trust u. Otherwise, s is likely to distrust u. In other words, s would consider u as a positive node if \(\mathbf {r}^{+}_{u}\) is greater than \(\mathbf {r}^{-}_{u}\). On the contrary, s would treat u as a negative one if \(\mathbf {r}^{-}_{u}\) is greater than \(\mathbf {r}^{+}_{u}\). Based on this intuition, we define the relative trustworthiness score \(\mathbf {r}_{u} = \mathbf {r}^{+}_{u} - \mathbf {r}^{-}_{u}\) between s and u. For all nodes, \(\mathbf {r}^{+}\) is the positive score vector and \(\mathbf {r}^{-}\) is the negative score vector of SRWR. Then, the trustworthiness score vector of SRWR is represented as \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\), the output of Problem 1. Many researchers have dealt with trust and distrust between nodes through such a representation of trustworthiness [12, 23, 29, 33]. In particular, the interpretation of the resulting values of \(\mathbf {r}_{u} = \mathbf {r}^{+}_{u} - \mathbf {r}^{-}_{u}\) is consistent with the following statement by Kunegis et al.:

  • “The resulting popularity (based on trustworthiness) measure admits both positive and negative values, and represents a measure of popularity in the network, with positive edges corresponding to a positive endorsement and negative edges to negative endorsements. This interpretation is consistent with the semantics of the ‘friend’ and ‘foe’ relationships [23].”

Note that from the viewpoint of measure theory, the relative trustworthiness \(\mathbf {r}_{u}\) is also acceptable as a signed measure [38] if we consider \(\mathbf {r}^{+}_{u}\) and \(\mathbf {r}^{-}_{u}\) as nonnegative measures (i.e., \(\mathbf {r}^{+}_{u} \ge 0\) and \(\mathbf {r}^{-}_{u} \ge 0\)). We discuss this in detail in Appendix A.6.

Fig. 3

Examples of how to interpret positive and negative scores of our model between nodes s and u. The bars on node u depict how many times the signed surfer visits that node, indicating positive and negative scores between s and u. a and c represent trustful and distrustful cases between those nodes: s is likely to trust u in (a), and s is likely to distrust u in (c). However, if those scores are similar as in (b), it is difficult for node s to decide whether to trust node u or not. Hence, s is likely to be neutral about node u in (b)

Discussion on positive and negative SRWR scores We explain how to interpret positive and negative SRWR scores using an example in Fig. 3. Suppose the signed surfer starts at node s, and performs SRWR to measure the trustworthiness between nodes s and u. Note that the trustworthiness score depends on which signed surfer stays at node u more frequently. Then, there would be three cases depending on the link structure between s and u as shown in Fig. 3. For the case in Fig. 3a, s is likely to trust u since the positive surfer visits u much more than the negative surfer through paths from s to u (i.e., the positive score is larger than the negative one at u). For the opposite case in Fig. 3c, s is likely to distrust u because the negative surfer frequently visits u. However, if those scores on u are similar as shown in Fig. 3b, then it is hard for s to determine whether to trust u or not. In this case s is likely to be neutral about node u. Thus, the trustworthiness score \(\mathbf {r}_{u}\) of the trustful case is high (and positive in SRWR), and that of the distrustful case is low (and negative in SRWR). For the neutral case, the score would be in the middle (and around zero between \(-1\) and 1 in SRWR).

Connection to balance theory According to balance theory [5, 8], Fig. 3a, c are balanced networks because each graph is divided into two sets of users with mutual antagonism between the sets. For example, the set of nodes \(\{v_1, v_2, s\}\) and the other set of nodes \(\{w_1, w_2, u\}\) in Fig. 3c are connected with negative edges, and nodes in each set are positively connected. In balanced networks, each node has either a positive score or a negative one. Because the signed surfer changes her sign when walking along the negative edges linking the two groups, the positive surfer stays and walks only in one group and the negative surfer stays and walks only in the other group. However, Fig. 3b is an unbalanced network because it cannot be divided into two sets that are negatively connected to each other. Hence, the positive and negative surfers visit the same nodes, i.e., each node has both positive and negative scores. In this case, the trustworthiness score of a node is determined by which signed surfer visits the node more frequently, which is represented by the difference between positive and negative scores.

Fig. 4

Examples of how \(\mathbf {r}^{+}_{u}\) and \(\mathbf {r}^{-}_{u}\) are defined in SRWR

4.1.1 Formulation for signed random walk with restart

We formulate the probability vectors, \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\), following Signed Random Walk with Restart. First, we explain how to define \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) using the example shown in Fig. 4. In the example, we label a (sign, transition probability) pair on each edge. For instance, the transition probability for the positive edge from node i to node u is 1 / 3 because node i has 3 outgoing edges. This edge is denoted by \((+, 1/3)\). Other pairs of signs and transition probabilities are defined similarly. In order for the random surfer to have a positive sign on node u at time \(t+1\), a positive surfer on one of u’s neighbors at time t must move to node u through a positive edge, or a negative surfer must move through a negative edge, according to the signed random walk action in Definition 1. Considering the restart action of the surfer with probability c, \(\mathbf {r}^{+}_{u}(t+1)\) in Fig. 4a is represented as follows:

$$\begin{aligned} \mathbf {r}^{+}_{u}(t+1) = (1-c)\left( \frac{\mathbf {r}^{+}_{i}(t)}{3} + \frac{\mathbf {r}^{-}_{j}(t)}{2} + \frac{\mathbf {r}^{-}_{k}(t)}{2} \right) + c\mathbf {1}(u = s) \end{aligned}$$

where \(\mathbf {1}(u = s)\) is 1 if u is the seed node s and 0 otherwise. In Fig. 4b, \(\mathbf {r}^{-}_{u}(t+1)\) is defined similarly as follows:

$$\begin{aligned} \mathbf {r}^{-}_{u}(t+1) = (1-c)\left( \frac{\mathbf {r}^{-}_{i}(t)}{3} + \frac{\mathbf {r}^{+}_{j}(t)}{2} + \frac{\mathbf {r}^{+}_{k}(t)}{2} \right) \end{aligned}$$

Note that we do not add the restarting score \(c\mathbf {1}(u = s)\) to \(\mathbf {r}^{-}_{u}(t+1)\) in this case because the surfer’s sign must become positive when she goes back to the seed node s. The recursive equations of our model are defined as follows:

$$\begin{aligned} \begin{aligned} \mathbf {r}^{+}_{u}&= (1-c) \left( \sum _{v \in \overleftarrow{{\mathbf {N}}}_{u}^{+}} \frac{\mathbf {r}^{+}_{v}}{|\overrightarrow{{\mathbf {N}}}_{v}|} + \sum _{v \in \overleftarrow{{\mathbf {N}}}_{u}^{-}}\frac{\mathbf {r}^{-}_{v}}{|\overrightarrow{{\mathbf {N}}}_{v}|} \right) + c\mathbf {1}(u = s) \\ \mathbf {r}^{-}_{u}&= (1-c) \left( \sum _{v \in \overleftarrow{{\mathbf {N}}}_{u}^{-}}\frac{\mathbf {r}^{+}_{v}}{|\overrightarrow{{\mathbf {N}}}_{v}|} + \sum _{v \in \overleftarrow{{\mathbf {N}}}_{u}^{+}} \frac{\mathbf {r}^{-}_{v}}{|\overrightarrow{{\mathbf {N}}}_{v}|} \right) \end{aligned} \end{aligned}$$
(1)

where \(\overleftarrow{{\mathbf {N}}}_{i}\) is the set of in-neighbors of node i, and \(\overrightarrow{{\mathbf {N}}}_{i}\) is the set of out-neighbors of node i. Superscripts of \(\overleftarrow{{\mathbf {N}}}_{i}\) or \(\overrightarrow{{\mathbf {N}}}_{i}\) indicate signs of edges between node i and its neighbors (e.g., \(\overleftarrow{{\mathbf {N}}}_{i}^{+}\) indicates the set of positively connected in-neighbors of node i). We need to introduce several symbols related to an adjacency matrix \(\mathbf {A}\) to vectorize Eq. (1).

Definition 2

(Signed adjacency matrix) The signed adjacency matrix \(\mathbf {A}\) of G is a matrix such that \(\mathbf {A}_{uv}\) is positive or negative when there is a positive or a negative edge from node u to node v, respectively, and zero otherwise. \(\square \)

Definition 3

(Semi-row-normalized matrix) Let \(|\mathbf {A}|\) be the absolute adjacency matrix of \(\mathbf {A}\), and \(\mathbf {D}\) be the out-degree diagonal matrix of \(|\mathbf {A}|\) (i.e., \(\mathbf {D}_{ii} = \sum _{j}|\mathbf {A}|_{ij}\)). Then semi-row-normalized matrix of \(\mathbf {A}\) is \({{\tilde{\mathbf {A}}}}= \mathbf {D}^{-1}\mathbf {A}\). \(\square \)

Definition 4

(Positive or negative semi-row-normalized matrix) The positive semi-row-normalized matrix \({{\tilde{\mathbf {A}}}_{+}}\) contains only positive values in the semi-row-normalized matrix \({{\tilde{\mathbf {A}}}}\). The negative semi-row-normalized matrix \({{\tilde{\mathbf {A}}}_{-}}\) contains absolute values of negative elements in \({{\tilde{\mathbf {A}}}}\). In other words, \({{\tilde{\mathbf {A}}}}= {{\tilde{\mathbf {A}}}_{+}}- {{\tilde{\mathbf {A}}}_{-}}\), and \(|{{\tilde{\mathbf {A}}}}| = {{\tilde{\mathbf {A}}}_{+}}+ {{\tilde{\mathbf {A}}}_{-}}\). \(\square \)
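
As a concrete illustration of Definitions 3 and 4, the sketch below computes \({{\tilde{\mathbf {A}}}_{+}}\) and \({{\tilde{\mathbf {A}}}_{-}}\) from a signed adjacency matrix with scipy.sparse; the function name and the handling of zero out-degree nodes are our own choices, not taken from the paper.

import numpy as np
from scipy.sparse import diags

def semi_row_normalize(A):
    """Return (A_plus, A_minus) such that D^{-1} A = A_plus - A_minus."""
    absA = abs(A)                                   # |A|: absolute adjacency matrix
    deg = np.asarray(absA.sum(axis=1)).ravel()      # out-degrees D_ii = sum_j |A|_ij
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    A_tilde = diags(inv_deg) @ A                    # semi-row-normalized matrix D^{-1} A
    A_plus = A_tilde.multiply(A_tilde > 0)          # keep positive entries
    A_minus = -A_tilde.multiply(A_tilde < 0)        # absolute values of negative entries
    return A_plus.tocsr(), A_minus.tocsr()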

Based on Definitions 3 and 4, Eq. (1) is represented as follows:

$$\begin{aligned} \begin{aligned} \mathbf {r}^{+}&= (1-c) \left( {{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {r}^{+}+ {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{-}\right) + c\mathbf {q} \\ \mathbf {r}^{-}&= (1-c) \left( {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{+}+ {{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {r}^{-}\right) \\ \end{aligned} \end{aligned}$$
(2)

where \(\mathbf {q}\) is a vector whose sth element is 1 and all other elements are 0.

Fig. 5

Examples of balance attenuation factors. a and b represent the uncertainty for “the enemy of my enemy is my friend” with probability \(\beta \), and c and d represent the uncertainty for “the friend of my enemy is my enemy” with probability \(\gamma \)

4.1.2 Balance attenuation factors

The signed surfer measures positive and negative scores of nodes w.r.t. a seed node in terms of trust and distrust according to edge relationships as discussed in Sect. 4.1. Our model in Definition 1 strongly supports the four cases between nodes in Fig. 2b, where those cases represent strong balance theory [5, 14]. However, recent works [27] have argued that strong balance theory does not fully capture real-world signed networks, since unbalanced relationships frequently appear. Thus, this limitation would be naturally inherent in our model. To alleviate this limitation, many researchers have studied weak balance theory [6, 27], which generalizes strong balance theory by allowing several unbalanced cases such as “the enemy of my enemy is my enemy.” Similarly, we adopt the generalization strategy of weak balance theory to make our model flexible on unbalanced networks by dealing with both balanced and unbalanced cases.

We consider that the relationships of the enemies of a seed user are uncertain since the user cannot fully believe information provided by her enemies. We reflect the uncertainty of those relationships in our ranking model by introducing stochastic parameters, \(\beta \) and \(\gamma \), called balance attenuation factors. Note that we assume that the positive and negative relationships of friends of the seed user are reliable since the user trusts her friends. \(\beta \) is a parameter for the uncertainty of “the enemy of my enemy is my friend,” and \(\gamma \) is for “the friend of my enemy is my enemy.” We first explain \(\beta \) using the fourth case (enemy’s enemy) in Fig. 2b. Suppose a surfer with a positive sign starts at node s toward node t and encounters two consecutive negative edges. Based on strong balance theory, her sign becomes negative at the intermediate node m and positive at node t in Fig. 5a. However, some people might think that the enemy of my enemy is my enemy as shown in Fig. 5b. In this case, her sign will be negative at nodes m and t. To consider this uncertainty, we introduce a parameter \(\beta \) so that if the negative surfer at node m encounters a negative edge, her sign becomes positive with probability \(\beta \) or remains negative with probability \(1-\beta \) at node t. The other parameter \(\gamma \) is interpreted similarly to \(\beta \): when the negative surfer at node m encounters a positive edge, her sign remains negative with probability \(\gamma \) or becomes positive with probability \(1-\gamma \) at node t as in Fig. 5c, d. SRWR with the balance attenuation factors is represented as follows:

$$\begin{aligned} \begin{aligned} \mathbf {r}^{+}&= (1-c) \left( {{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {r}^{+}+ \beta {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{-}+ (1-\gamma ){{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {r}^{-}\right) + c\mathbf {q} \\ \mathbf {r}^{-}&= (1-c) \left( {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{+}+ \gamma {{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {r}^{-}+ (1-\beta ){{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{-}\right) \\ \end{aligned} \end{aligned}$$
(3)

Discussion on other balance attenuation factors Note that other parameters for the uncertainties of “enemy of friend” and “friend of friend” could be easily adopted into our model. However, we do not reflect those parameters in our model for the following reasons:

  • As described in this subsection, we assume that the positive and negative relationship of friends of a seed user is reliable and stable. If the seed user’s friends distrust a user, then she is unlikely to believe the user since the user trusts her friends.

  • Introducing the additional parameters could improve the performance of applications in signed networks, but it increases the complexity of our model by considering too many uncertain cases. We consider that introducing \(\beta \) and \(\gamma \) achieves a good trade-off between the model complexity and the performance of each application as shown in Sect. 5.

Discussion on the initial sign In Definition 1, we initialize the signed surfer as positive when she restarts at a seed node s. One might consider that our model is easily extendable to probabilistically initializing the signed surfer as negative for the restart action. Let p denote the probability of being the positive surfer for the restart action. Then, the extended version is established by changing \(c\mathbf {q}\) to \((c \times p)\mathbf {q}\) in the first equation and adding \((c \times (1-p))\mathbf {q}\) to the second equation of Eq. (3). However, we do not consider such a case for the following reason:

  • If the negative surfer starts at s, the surfer becomes positive at nodes negatively connected from s and negative at those positively connected from s. This implies that the surfer recognizes the friends of s as enemies and the enemies of s as friends. Thus, it is hard to interpret the scores measured by the negative initial surfer in terms of trustworthiness for s based on balance theory.

4.2 SRWR-Iter: iterative algorithm for signed random walk with restart

We present an iterative algorithm SRWR-Iter for computing SRWR scores based on Eq. (3). Note that the solution of a linear system with recursive structure is typically and efficiently obtained in an iterative manner such as power iteration or the Jacobi method [36]. We adopt such an iterative strategy to solve the recursive equations in Eq. (3). We describe how SRWR-Iter obtains the trustworthiness SRWR score vector \(\mathbf {r}\) given a signed network and a seed node in Algorithms 1 and 2. Moreover, we prove that the iterative approach in SRWR-Iter converges, and returns a unique solution for the seed node in Theorem 1 of Sect. 4.2.1.

Normalization phase (Algorithm 1) Our proposed algorithm first computes the out-degree diagonal matrix \(\mathbf {D}\) of \(|\mathbf {A}|\), which is the absolute adjacency matrix of \(\mathbf {A}\) (line 1). Then, the algorithm computes the semi-row-normalized matrix \({{\tilde{\mathbf {A}}}}\) using \(\mathbf {D}\) (line 2). We split \({{\tilde{\mathbf {A}}}}\) into two matrices: the positive semi-row-normalized matrix (\({{\tilde{\mathbf {A}}}_{+}}\)) and the negative semi-row-normalized matrix (\({{\tilde{\mathbf {A}}}_{-}}\)) (line 3) satisfying \({{\tilde{\mathbf {A}}}}= {{\tilde{\mathbf {A}}}_{+}}- {{\tilde{\mathbf {A}}}_{-}}\).

Iteration phase (Algorithm 2) Our algorithm computes the SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) for the seed node s with the balance attenuation factors (\(\beta \) and \(\gamma \)) in the iteration phase. We set \(\mathbf {q}\) to the sth unit vector, and initialize \(\mathbf {r}^{+}\) to \(\mathbf {q}\) and \(\mathbf {r}^{-}\) to \(\mathbf {0}\) (lines 1 and 2). Our algorithm iteratively computes Eq. (3) (lines 4 and 5). We concatenate \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) vertically into \(\mathbf {h}\) (line 6). We then compute the error \(\delta \) between \(\mathbf {h}\) and \(\mathbf {h}'\), the result of the previous iteration (line 7). We update \(\mathbf {h}'\) with \(\mathbf {h}\) for the next iteration (line 8). The iteration stops when the error \(\delta \) is smaller than a threshold \(\epsilon \) (line 9). We finally return the trustworthiness score vector \(\mathbf {r}\) used for the personalized ranking w.r.t. s by computing \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) (lines 10 and 11).
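
A minimal sketch of the iteration phase is shown below; it assumes the A_plus and A_minus matrices produced by the semi-row-normalization sketch in Sect. 4.1.1, and the default values of c, \(\beta \), and \(\gamma \) are illustrative rather than prescribed by the paper.

import numpy as np

def srwr_iter(A_plus, A_minus, seed, c=0.15, beta=0.5, gamma=0.5,
              eps=1e-9, max_iters=100):
    """Iterate Eq. (3) until the L1 change of [r+; r-] drops below eps."""
    n = A_plus.shape[0]
    q = np.zeros(n); q[seed] = 1.0                  # restart vector for the seed node
    r_pos, r_neg = q.copy(), np.zeros(n)            # initialize r+ = q and r- = 0
    ApT, AmT = A_plus.T, A_minus.T
    for _ in range(max_iters):
        new_pos = (1 - c) * (ApT @ r_pos + beta * (AmT @ r_neg)
                             + (1 - gamma) * (ApT @ r_neg)) + c * q
        new_neg = (1 - c) * (AmT @ r_pos + gamma * (ApT @ r_neg)
                             + (1 - beta) * (AmT @ r_neg))
        delta = np.abs(new_pos - r_pos).sum() + np.abs(new_neg - r_neg).sum()
        r_pos, r_neg = new_pos, new_neg
        if delta < eps:
            break
    return r_pos - r_neg, r_pos, r_neg              # r = r+ - r-, plus both parts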

The space and time complexities of Algorithms 1 and 2 are analyzed in Lemma 5 of Appendix A.3.

Algorithm 1 (normalization phase) and Algorithm 2 (iteration phase of SRWR-Iter)

4.2.1 Theoretical analysis of iterative algorithm and signed random walk with restart

We theoretically analyze the iterative algorithm SRWR-Iter and the properties of Signed Random Walk with Restart.

Convergence Analysis of SRWR-Iter We show that the iteration in Algorithm 2 converges to the solution of a linear system as described in the following theorem.

Theorem 1

(Convergence of SRWR-Iter) Suppose \(\mathbf {h}= [\mathbf {r}^{+};\mathbf {r}^{-}]^{\top }\) and \(\mathbf {q}_{s} = [\mathbf {q};\mathbf {0}]^{\top }\). Then the iteration for \(\mathbf {h}\) in Algorithm 2 converges to the solution \(\mathbf {h}= c(\mathbf {I} - (1-c){{\tilde{\mathbf {B}}}}^{\top })^{-1}\mathbf {q}_{s}\) where \({{\tilde{\mathbf {B}}}}^{\top } = \begin{bmatrix} {{\tilde{\mathbf {A}}}_{+}}^{\top }&\beta {{\tilde{\mathbf {A}}}_{-}}^{\top } + (1-\gamma ){{\tilde{\mathbf {A}}}_{+}}^{\top } \\ {{\tilde{\mathbf {A}}}_{-}}^{\top }&(1-\beta ){{\tilde{\mathbf {A}}}_{-}}^{\top } + \gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } \\ \end{bmatrix}\).

Proof

Equation (3) is represented as follows:

$$\begin{aligned} \begin{bmatrix} \mathbf {r}^{+}\\ \mathbf {r}^{-}\end{bmatrix} = (1-c)\begin{bmatrix} {{\tilde{\mathbf {A}}}_{+}}^{\top }&\beta {{\tilde{\mathbf {A}}}_{-}}^{\top } + (1-\gamma ){{\tilde{\mathbf {A}}}_{+}}^{\top } \\ {{\tilde{\mathbf {A}}}_{-}}^{\top }&(1-\beta ){{\tilde{\mathbf {A}}}_{-}}^{\top } + \gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } \\ \end{bmatrix}\begin{bmatrix} \mathbf {r}^{+}\\ \mathbf {r}^{-}\end{bmatrix} + c \begin{bmatrix} \mathbf {q} \\ \mathbf {0} \end{bmatrix} \Leftrightarrow \mathbf {h}= (1-c){{\tilde{\mathbf {B}}}}^{\top }\mathbf {h}+ c\mathbf {q}_{s} \end{aligned}$$

where \({{\tilde{\mathbf {B}}}}^{\top } = \begin{bmatrix} {{\tilde{\mathbf {A}}}_{+}}^{\top }&\beta {{\tilde{\mathbf {A}}}_{-}}^{\top } + (1-\gamma ){{\tilde{\mathbf {A}}}_{+}}^{\top } \\ {{\tilde{\mathbf {A}}}_{-}}^{\top }&(1-\beta ){{\tilde{\mathbf {A}}}_{-}}^{\top } + \gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } \\ \end{bmatrix}\), \(\mathbf {h}= \begin{bmatrix} \mathbf {r}^{+}\\ \mathbf {r}^{-}\end{bmatrix}\), and \(\mathbf {q}_{s} = \begin{bmatrix} \mathbf {q} \\ \mathbf {0} \end{bmatrix}\). Thus, the iteration in Algorithm 2 is written as in the following equation:

$$\begin{aligned} \mathbf {h}^{(k)}&= (1-c){{\tilde{\mathbf {B}}}}^{\top }\mathbf {h}^{(k-1)} + c\mathbf {q}_{s} \nonumber \\&= \left( (1-c){{\tilde{\mathbf {B}}}}^{\top } \right) ^{2}\mathbf {h}^{(k-2)} + \left( (1-c){{\tilde{\mathbf {B}}}}^{\top } + \mathbf {I} \right) c\mathbf {q}_{s} \nonumber \\&= \cdots \nonumber \\&= \left( (1-c){{\tilde{\mathbf {B}}}}^{\top } \right) ^{k}\mathbf {h}^{(0)} + \left( \sum _{j=0}^{k-1} \left( (1-c){{\tilde{\mathbf {B}}}}^{\top } \right) ^{j} \right) c\mathbf {q}_{s} \end{aligned}$$
(4)

The spectral radius \(\rho ( (1-c){{\tilde{\mathbf {B}}}}^{\top }) = (1-c) < 1\) when \(0< c < 1\) since \({{\tilde{\mathbf {B}}}}^{\top }\) is a column stochastic matrix and its largest eigenvalue is 1 [36]. Therefore, \(\lim _{k \rightarrow \infty } ((1-c){{\tilde{\mathbf {B}}}}^{\top })^{k}\mathbf {h}^{(0)} = \mathbf {0}\) and \(\lim _{k \rightarrow \infty }\mathbf {h}^{(k)}\) converges as follows:

$$\begin{aligned} \lim _{k \rightarrow \infty } \mathbf {h}^{(k)} = \mathbf {0} + \lim _{k \rightarrow \infty } \left( \sum _{j=0}^{k-1} \left( (1-c){{\tilde{\mathbf {B}}}}^{\top } \right) ^{j} \right) c\mathbf {q}_{s} = c\left( \mathbf {I} - (1-c){{\tilde{\mathbf {B}}}}^{\top }\right) ^{-1}\mathbf {q}_{s}. \end{aligned}$$

In the above equation, \(\sum _{j=0}^{\infty } ( (1-c){{\tilde{\mathbf {B}}}}^{\top })^{j}\) is a geometric series of the matrix \((1-c){{\tilde{\mathbf {B}}}}^{\top }\), and the series converges to \((\mathbf {I} - (1-c){{\tilde{\mathbf {B}}}^{\top }})^{-1}\) since the spectral radius of \((1-c){{\tilde{\mathbf {B}}}}^{\top }\) is less than one. Note that the inverse matrix is a nonnegative matrix whose entries are positive or zero because the matrix is the sum of nonnegative matrices (i.e., \(\sum _{j=0}^{\infty } ( (1-c){{\tilde{\mathbf {B}}}}^{\top })^{j}\)). Hence, each entry of \(\mathbf {h}\) is nonnegative (i.e., \(\mathbf {h}_{u} \ge 0\) for any node u). \(\square \)

Properties of SRWR We discuss the properties of our ranking model SRWR to answer the following questions: (1) Is the signed random surfer able to visit all nodes in a network which is strongly connected? and (2) Does SRWR work on unsigned networks as well? The first question is answered in Property 1, and the second one is answered in Property 2.

Property 1

Suppose a signed network is strongly connected. Then, all entries of \(\mathbf {r}^{+}+\mathbf {r}^{-}\) are positive (i.e., \(\mathbf {r}^{+}+ \mathbf {r}^{-}> 0\)).

Proof

Let \(\mathbf {r}^{+}+\mathbf {r}^{-}\) be \(\mathbf {p}\). By summing the recursive equations on \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) in Eq. (3), \(\mathbf {p}\) is represented as follows:

$$\begin{aligned} \mathbf {p}= (1-c)\left( {{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {p}+ {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p}\right) + c\mathbf {q}\Leftrightarrow \mathbf {p}= (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {q}\Leftrightarrow \mathbf {p}= \mathbf {G}\mathbf {p}\end{aligned}$$

where \(|{{\tilde{\mathbf {A}}}}|={{\tilde{\mathbf {A}}}_{+}}+ {{\tilde{\mathbf {A}}}_{-}}\) by Definition 4, \(\mathbf {G} = (1-c)|{{\tilde{\mathbf {A}}}}|^{\top } + c\mathbf {q}\mathbf {1}^{\top }\), and \(\mathbf {1}^{\top }\mathbf {p}= \sum _{i}\mathbf {p}_{i} = 1\) by Property 3. Note that the graph represented by \(\mathbf {G}\) is also strongly connected since the graph of \(|{{\tilde{\mathbf {A}}}}|\) has the same topology as the original graph, which is strongly connected. Moreover, the graph represented by \(\mathbf {G}\) has a self-loop at the seed node s due to \(c\mathbf {q}\mathbf {1}^{\top }\). Thus, \(\mathbf {G}\) is irreducible and aperiodic. Hence, all entries of \(\mathbf {p}=\mathbf {r}^{+}+\mathbf {r}^{-}\) are positive according to the Perron–Frobenius theorem [24]. \(\square \)

Note that \(\mathbf {r}^{+}_{u}\) (or \(\mathbf {r}^{-}_{u}\)) is the stationary probability that the positive (or negative) surfer visits node u after performing SRWR starting from a seed node. According to Property 1, \(\mathbf {r}^{+}_{u}+\mathbf {r}^{-}_{u}\) for an arbitrary node u is always positive if a given signed network is strongly connected. That is, the signed random surfer is able to visit node u with probability \(\mathbf {r}^{+}_{u}+\mathbf {r}^{-}_{u}\), which is always greater than zero.

Next, we prove that our model SRWR is a generalized version of RWR working on both unsigned and signed networks in the following property.

Property 2

The result of SRWR on networks containing only positive edges is the same as that of RWR.

Proof

\({{\tilde{\mathbf {A}}}_{+}}={{\tilde{\mathbf {A}}}}\) and \({{\tilde{\mathbf {A}}}_{-}}=\mathbf {0}_{n \times n}\) because the adjacency matrix \(\mathbf {A}\) only contains positive edges. Also, \(\mathbf {r}^{-}=\mathbf {0}_{n \times 1}\) at the beginning of Algorithm 2. Equation (3) is represented as follows:

$$\begin{aligned} \begin{aligned} \mathbf {r}^{+}&= (1-c) \left( {{\tilde{\mathbf {A}}}}^{\top }\mathbf {r}^{+}+\beta \mathbf {0}_{n \times n}\times \mathbf {0}_{n \times 1} + (1-\gamma ) {{\tilde{\mathbf {A}}}}^{\top }\mathbf {0}_{n \times 1}\right) + c\mathbf {q} \\ \mathbf {r}^{-}&= (1-c) \left( \mathbf {0}_{n \times n}\times \mathbf {r}^{+}+ \gamma {{\tilde{\mathbf {A}}}}^{\top }\mathbf {0}_{n \times 1} + (1-\beta )\mathbf {0}_{n \times n}\times \mathbf {0}_{n \times 1}\right) \end{aligned} \end{aligned}$$

Therefore, \(\mathbf {r}^{-}=\mathbf {0}_{n \times 1}\) and \(\mathbf {r}^{+}= (1-c){{\tilde{\mathbf {A}}}}^{\top }\mathbf {r}^{+}+ c\mathbf {q}\). The equation of \(\mathbf {r}^{+}\) is exactly the same as that of RWR. \(\square \)
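
As a quick numerical check of Property 2 (a sketch that reuses the semi_row_normalize and srwr_iter functions from the sketches above), the snippet below runs SRWR on a small hypothetical positive-only graph and verifies that \(\mathbf {r}^{-}\) stays zero while \(\mathbf {r}^{+}\) matches plain RWR.

import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (1, 2), (2, 0), (0, 2)]            # unsigned (all-positive) toy graph
rows, cols = zip(*edges)
A = csr_matrix(([1.0] * len(edges), (rows, cols)), shape=(3, 3))

A_plus, A_minus = semi_row_normalize(A)             # A_minus has no nonzero entries
r, r_pos, r_neg = srwr_iter(A_plus, A_minus, seed=0, c=0.15)

# Plain RWR by power iteration on the same row-normalized matrix.
q = np.zeros(3); q[0] = 1.0
rwr = q.copy()
for _ in range(100):
    rwr = 0.85 * (A_plus.T @ rwr) + 0.15 * q

assert np.allclose(r_neg, 0.0)                      # negative scores vanish
assert np.allclose(r_pos, rwr, atol=1e-6)           # SRWR reduces to RWR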

4.3 SRWR-Pre: preprocessing algorithm for signed random walk with restart

We propose SRWR-Pre, a preprocessing algorithm to quickly compute SRWR scores. The iterative approach SRWR-Iter in Algorithm 2 requires multiple matrix-vector multiplications to compute SRWR scores whenever seed node s changes; thus, the iterative method is not fast enough when we require SRWR scores for any pair of nodes in large-scale signed networks. Our goal is to directly compute SRWR scores from precomputed intermediate data without iterations. We exploit the following ideas for our preprocessing method:

  • The positive and negative SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are obtained by solving linear systems (Sect. 4.3.1).

  • The adjacency matrix of real-world graphs is permuted so that it contains a large but easy-to-invert block diagonal matrix as shown in Fig. 6 (Sect. 4.3.2).

  • The block elimination approach efficiently solves a linear system on a matrix if it has an easy-to-invert submatrix (Sect. 4.3.3).

Our preprocessing method comprises two phases: the preprocessing phase (Algorithm 3) and the query phase (Algorithm 4). The preprocessing phase transforms a given signed adjacency matrix into several submatrices required in the query phase to compute SRWR scores w.r.t. seed node s. Note that the preprocessing phase is performed once, and the query phase is run for each seed node. The starting vector \(\mathbf {q}\) in Eq. (3) is called an SRWR query, and \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are the results corresponding to the query \(\mathbf {q}\). The query vector \(\mathbf {q}\) is determined by the seed node s, and \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are distinct for each SRWR query. To exploit the sparsity of graphs, we save all matrices in a sparse matrix format such as compressed column storage [7], which stores only nonzero entries and their locations.

4.3.1 Formulation of signed random walk with restart as linear systems

We first represent linear systems related to \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\). Let \(\mathbf {p}\) be the sum of \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) (i.e., \(\mathbf {p} = \mathbf {r}^{+}+ \mathbf {r}^{-}\)). Then, \(\mathbf {p}\) is the solution of the following linear system:

$$\begin{aligned} \mathbf {|H|}\mathbf {p} = c\mathbf {q} \Leftrightarrow \mathbf {p} = c\mathbf {|H|}^{-1}\mathbf {q} \end{aligned}$$
(5)

where \(\mathbf {|H|} = \mathbf {I} - (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\) and \(|{{\tilde{\mathbf {A}}}}|={{\tilde{\mathbf {A}}}_{+}}+ {{\tilde{\mathbf {A}}}_{-}}\). The proof of Eq. (5) is presented in Lemma 1. The linear system for \(\mathbf {r}^{-}\) is given by the following equation:

$$\begin{aligned} \mathbf {T}\mathbf {r}^{-}= (1-c){{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p} \Leftrightarrow \mathbf {r}^{-}= (1-c)\left( \mathbf {T}^{-1}({{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p})\right) \end{aligned}$$
(6)

where \(\mathbf {T} = \mathbf {I} - (1-c)(\gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } - \beta {{\tilde{\mathbf {A}}}_{-}}^{\top })\), and \(\gamma \) and \(\beta \) are balance attenuation factors. Theorem 2 shows the proof of Eq. (6). Based on the aforementioned linear systems in Eqs. (5) and (6), \(\mathbf {r}^{-}\) and \(\mathbf {r}^{+}\) for a given seed node s are computed as follows:

  1. Set a query vector \(\mathbf {q}\) whose sth element is 1 and all other elements are 0.

  2. Solve the linear system in Eq. (5) to obtain the solution \(\mathbf {p}\).

  3. Compute \(\mathbf {r}^{-}\) by solving the linear system in Eq. (6).

  4. Compute \(\mathbf {r}^{+}= \mathbf {p} - \mathbf {r}^{-}\).
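
The four steps above can be written directly with a sparse solver; the sketch below is an illustrative baseline for Eqs. (5) and (6) using scipy.sparse.linalg.spsolve, not the SRWR-Pre method itself, and it assumes the A_plus and A_minus matrices from Sect. 4.1.1.

import numpy as np
from scipy.sparse import identity
from scipy.sparse.linalg import spsolve

def srwr_solve(A_plus, A_minus, seed, c=0.15, beta=0.5, gamma=0.5):
    n = A_plus.shape[0]
    q = np.zeros(n); q[seed] = 1.0                                   # step 1: query vector
    H = identity(n, format='csc') - (1 - c) * (A_plus.T + A_minus.T)                 # |H|
    T = identity(n, format='csc') - (1 - c) * (gamma * A_plus.T - beta * A_minus.T)  # T
    p = spsolve(H, c * q)                                            # step 2: Eq. (5)
    r_neg = spsolve(T, (1 - c) * (A_minus.T @ p))                    # step 3: Eq. (6)
    r_pos = p - r_neg                                                # step 4
    return r_pos - r_neg, r_pos, r_neg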

Lemma 1

Suppose that \(\mathbf {p}=\mathbf {r}^{+}+\mathbf {r}^{-}\), \(\mathbf {|H|} = \mathbf {I} - (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\) and \(|{{\tilde{\mathbf {A}}}}|={{\tilde{\mathbf {A}}}_{+}}+ {{\tilde{\mathbf {A}}}_{-}}\). Then, \(\mathbf {p}\) is the solution of the following linear system:

$$\begin{aligned} \mathbf {|H|}\mathbf {p}=c\mathbf {q} \Leftrightarrow \mathbf {p} = c\mathbf {|H|}^{-1}\mathbf {q} \end{aligned}$$

Proof

According to the result in Property 1, the recursive equation for \(\mathbf {p}\) is represented as follows:

$$\begin{aligned} \mathbf {p}&= (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p} + c\mathbf {q} \end{aligned}$$

where \(|{{\tilde{\mathbf {A}}}}|={{\tilde{\mathbf {A}}}_{+}}+ {{\tilde{\mathbf {A}}}_{-}}\) is the row-normalized matrix of \(|\mathbf {A}|\). The linear system for \(\mathbf {p}\) is represented by moving \((1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\mathbf {p}\) to the left side as follows:

$$\begin{aligned} \left( \mathbf {I} - (1-c)|{{\tilde{\mathbf {A}}}}|^{\top } \right) \mathbf {p}&= c\mathbf {q} \Leftrightarrow \mathbf {|H|}\mathbf {p} = c\mathbf {q} \end{aligned}$$

where \(\mathbf {|H|}\) is \(\mathbf {I} - (1-c)|{{\tilde{\mathbf {A}}}}|^{\top }\). Note that \(\mathbf {|H|}\) is invertible when \(0< c < 1\) because it is strictly diagonally dominant [11]. Hence, \(\mathbf {p} = c\mathbf {|H|}^{-1}\mathbf {q}\). \(\square \)

Theorem 2

The SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) from Eq. (3) are represented as follows:

$$\begin{aligned} \begin{aligned} \mathbf {r}^{+}&= \mathbf {p} - \mathbf {r}^{-}\\ \mathbf {r}^{-}&= (1-c)\left( \mathbf {T}^{-1}({{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p})\right) \end{aligned} \end{aligned}$$

where \(\mathbf {p} = c\mathbf {|H|}^{-1}\mathbf {q}\), \(\mathbf {T} = \mathbf {I} - (1-c)(\gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } - \beta {{\tilde{\mathbf {A}}}_{-}}^{\top })\), and \(\gamma \) and \(\beta \) are balance attenuation factors which are between 0 and 1 (i.e., \(0< \gamma ,\beta < 1\)).

Proof

Note that \(\mathbf {r}^{-}= (1-c) ( {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{+}+ \gamma {{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {r}^{-}+ (1-\beta ){{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{-})\) by Eq. (3), and \(\mathbf {r}^{+}= \mathbf {p} - \mathbf {r}^{-}\) according to Lemma 1. The equation for \(\mathbf {r}^{-}\) is represented by plugging \(\mathbf {r}^{+}= \mathbf {p} - \mathbf {r}^{-}\) as follows:

$$\begin{aligned} \begin{aligned} \mathbf {r}^{-}&= (1-c) \left( {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p} -{{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{-}+ \gamma {{\tilde{\mathbf {A}}}_{+}}^{\top }\mathbf {r}^{-}+ (1-\beta ){{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {r}^{-}\right) \Leftrightarrow \\ \mathbf {r}^{-}&= (1-c) \left( \gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } - \beta {{\tilde{\mathbf {A}}}_{-}}^{\top } \right) \mathbf {r}^{-}+ (1-c){{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p}\\ \end{aligned} \end{aligned}$$

We move \((1-c) (\gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } - \beta {{\tilde{\mathbf {A}}}_{-}}^{\top } )\mathbf {r}^{-}\) to the left side; then, the above equation is represented as follows:

$$\begin{aligned} \left( \mathbf {I} - (1-c) (\gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } - \beta {{\tilde{\mathbf {A}}}_{-}}^{\top } ) \right) \mathbf {r}^{-}&= (1-c){{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p} \Leftrightarrow \mathbf {T}\mathbf {r}^{-}= (1-c){{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p} \end{aligned}$$

where \(\mathbf {T}\) is \(\mathbf {I} - (1-c) (\gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } - \beta {{\tilde{\mathbf {A}}}_{-}}^{\top })\). Note that the matrix \(\mathbf {T}\) is strictly diagonally dominant when \(0<c<1\) and \(0<\gamma ,\beta <1\); thus, \(\mathbf {T}\) is invertible. Hence, \(\mathbf {r}^{-}= (1-c)(\mathbf {T}^{-1}({{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p}))\). \(\mathbf {r}^{+}\) is obtained by computing \(\mathbf {r}^{+}= \mathbf {p} - \mathbf {r}^{-}\). \(\square \)

One naive approach (Inversion) for computing the SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) based on the linear systems in Eqs. (5) and (6) is to precompute the inverses of the matrices \(\mathbf {|H|}\) and \(\mathbf {T}\). However, this approach is impractical for large-scale graphs since inverting a matrix requires \(O(n^{3})\) time and \(O(n^{2})\) space where n is the dimension of the matrix. Another approach (LU) is to obtain the inverses of the LU factors of \(\mathbf {|H|}\) and \(\mathbf {T}\) after reordering the matrices in the order of node degrees as suggested in [9] (i.e., \(\mathbf {p}= c(\mathbf {U}^{-1}_{\mathbf {p}}(\mathbf {L}^{-1}_{\mathbf {p}}\mathbf {q}))\); \(\mathbf {r}^{-}= (1-c)(\mathbf {U}^{-1}_{\mathbf {r}^{-}}(\mathbf {L}^{-1}_{\mathbf {r}^{-}}({{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p})))\) where \(\mathbf {|H|}^{-1} = \mathbf {U}^{-1}_{\mathbf {p}}\mathbf {L}^{-1}_{\mathbf {p}}\) and \(\mathbf {T}^{-1}=\mathbf {U}^{-1}_{\mathbf {r}^{-}}\mathbf {L}^{-1}_{\mathbf {r}^{-}}\)). Although LU is more efficient than Inversion in terms of time and space as shown in Fig. 13, LU still has a performance issue due to its \(O(n^{3})\) time and \(O(n^{2})\) space complexities. On the other hand, our preprocessing method SRWR-Pre is faster and more memory efficient than Inversion and LU as we will see in Sect. 5.7.
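
As a sketch of the LU baseline just described, the snippet below factorizes \(\mathbf {|H|}\) and \(\mathbf {T}\) once with scipy's sparse LU and answers each query by triangular solves; note that splu applies its own fill-reducing ordering rather than the degree ordering of [9], so this only approximates the baseline, and the matrices H and T are built as in the previous sketch.

import numpy as np
from scipy.sparse.linalg import splu

def preprocess_lu(H, T):
    # One-time factorization of |H| and T (both converted to CSC format).
    return splu(H.tocsc()), splu(T.tocsc())

def query_lu(lu_H, lu_T, A_minus, seed, c=0.15):
    n = A_minus.shape[0]
    q = np.zeros(n); q[seed] = 1.0
    p = lu_H.solve(c * q)                           # p = c |H|^{-1} q
    r_neg = lu_T.solve((1 - c) * (A_minus.T @ p))   # r- = (1-c) T^{-1} A-^T p
    r_pos = p - r_neg
    return r_pos - r_neg, r_pos, r_neg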

4.3.2 Node reordering based on hub-and-spoke structure

SRWR-Pre permutes the matrices \(\mathbf {|H|}\) and \(\mathbf {T}\) using a reordering technique based on the hub-and-spoke structure. Previous work [34] has exploited this reordering technique to reduce the computational cost of graph operations in real-world graphs. We also adopt node reordering based on the hub-and-spoke structure to efficiently solve the linear systems in Eqs. (5) and (6).

Fig. 6

The result of node reordering on each signed network. a–c are the original matrix \(\mathbf {|H|}\) before node reordering in the Wikipedia, the Slashdot, and the Epinions datasets, respectively. d–f present \(\mathbf {|H|}\) reordered by the hub-and-spoke method. Note that \(\mathbf {T}\) is also reordered equivalently to \(\mathbf {|H|}\) since they have the same sparsity pattern. \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\) are block diagonal

The hub-and-spoke structure indicates that most real-world graphs follow a power-law degree distribution with a few hubs (very high-degree nodes) and a majority of spokes (low-degree nodes). The structure has been utilized to concentrate the entries of an adjacency matrix by reordering nodes as shown in Fig. 6. Any reordering method based on the hub-and-spoke structure can be utilized for this purpose; in this paper, we use SlashBurn [19, 28] as the hub-and-spoke reordering method because it shows the best performance in concentrating the entries of an adjacency matrix (see the details in Appendix A.1).

We reorder the nodes of the signed adjacency matrix \(\mathbf {A}\) so that the reordered matrix contains a large but easy-to-invert submatrix such as a block diagonal matrix, as shown in Fig. 6. We then compute \(\mathbf {|H|} = \mathbf {I} - (1-c)({{\tilde{\mathbf {A}}}_{+}}^{\top } + {{\tilde{\mathbf {A}}}_{-}}^{\top })\) and \(\mathbf {T} = \mathbf {I} - (1-c)(\gamma {{\tilde{\mathbf {A}}}_{+}}^{\top } - \beta {{\tilde{\mathbf {A}}}_{-}}^{\top })\). Note that \(\mathbf {|H|}\) and \(\mathbf {T}\) have the same sparsity pattern as the reordered adjacency matrix \(\mathbf {A}^{\top }\) except for the diagonal part. Hence, \(\mathbf {|H|}\) and \(\mathbf {T}\) are partitioned as follows:

$$\begin{aligned} \mathbf {|H|} = \begin{bmatrix} \mathbf {|H|}_{11}&\mathbf {|H|}_{12}\\ \mathbf {|H|}_{21}&\mathbf {|H|}_{22}\\ \end{bmatrix}, \mathbf {T} = \begin{bmatrix} \mathbf {T}_{11}&\mathbf {T}_{12}\\ \mathbf {T}_{21}&\mathbf {T}_{22}\\ \end{bmatrix}. \end{aligned}$$
(7)

Let \(n_1\) and \(n_2\) denote the number of spokes and hubs, respectively (see the details in Appendix A.1). Then \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\) are \(n_1 \times n_1\) matrices, \(\mathbf {|H|}_{12}\) and \(\mathbf {T}_{12}\) are \(n_1 \times n_2\) matrices, \(\mathbf {|H|}_{21}\) and \(\mathbf {T}_{21}\) are \(n_2 \times n_1\) matrices, and \(\mathbf {|H|}_{22}\) and \(\mathbf {T}_{22}\) are \(n_2 \times n_2\) matrices. The linear systems for \(\mathbf {|H|}\) and \(\mathbf {T}\) in Eqs. (5) and (6) are represented as follows:

$$\begin{aligned} \mathbf {|H|}\mathbf {p} = c\mathbf {q}&\Leftrightarrow \begin{bmatrix} \mathbf {|H|}_{11}&\mathbf {|H|}_{12}\\ \mathbf {|H|}_{21}&\mathbf {|H|}_{22}\\ \end{bmatrix} \begin{bmatrix} \mathbf {p}_{1} \\ \mathbf {p}_{2} \end{bmatrix} = c \begin{bmatrix} \mathbf {q}_{1} \\ \mathbf {q}_{2} \end{bmatrix} \end{aligned}$$
(8)
$$\begin{aligned} \mathbf {T}\mathbf {r}^{-}= (1-c)\mathbf {t}&\Leftrightarrow \begin{bmatrix} \mathbf {T}_{11}&\mathbf {T}_{12}\\ \mathbf {T}_{21}&\mathbf {T}_{22}\\ \end{bmatrix} \begin{bmatrix} \mathbf {r}^{-}_{1} \\ \mathbf {r}^{-}_{2} \end{bmatrix} = (1-c) \begin{bmatrix} \mathbf {t}_{1} \\ \mathbf {t}_{2} \end{bmatrix} \end{aligned}$$
(9)

where \(\mathbf {t}={{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p}\) is an \(n \times 1\) vector.

4.3.3 Block elimination for solving linear systems

The solutions of the partitioned linear systems in Eqs. (8) and (9) are obtained by the following equations:

$$\begin{aligned} \mathbf {p}&= \begin{bmatrix} \mathbf {p}_{1}\\ \mathbf {p}_{2}\\ \end{bmatrix}= \begin{bmatrix} \mathbf {|H|}_{11}^{-1}(c\mathbf {q}_{1}-\mathbf {|H|}_{12}\mathbf {p}_{2})\\ c(\mathbf {S}_{\mathbf {|H|}}^{-1}(\mathbf {q}_{2}-\mathbf {|H|}_{21} (\mathbf {|H|}_{11}^{-1}(\mathbf {q}_{1})))) \end{bmatrix} \end{aligned}$$
(10)
$$\begin{aligned} \mathbf {r}^{-}&= \begin{bmatrix} \mathbf {r}^{-}_{1}\\ \mathbf {r}^{-}_{2}\\ \end{bmatrix}= \begin{bmatrix} \mathbf {T}_{11}^{-1}((1-c)\mathbf {t}_{1}-\mathbf {T}_{12}\mathbf {r}^{-}_{2})\\ (1-c)(\mathbf {S}_{\mathbf {T}}^{-1}(\mathbf {t}_{2}-\mathbf {T}_{21}(\mathbf {T}_{11}^{-1}(\mathbf {t}_{1})))) \end{bmatrix} \end{aligned}$$
(11)

where \(\mathbf {S}_{\mathbf {|H|}}=\mathbf {|H|}_{22} - \mathbf {|H|}_{21}\mathbf {|H|}^{-1}_{11}\mathbf {|H|}_{12}\) is the Schur complement of \(\mathbf {|H|}_{11}\) and \(\mathbf {S}_{\mathbf {T}}=\mathbf {T}_{22} - \mathbf {T}_{21}\mathbf {T}^{-1}_{11}\mathbf {T}_{12}\) is the Schur complement of \(\mathbf {T}_{11}\). Equations (10) and (11) are derived by applying block elimination described in Lemma 2 to the partitioned linear systems in Eqs. (8) and (9), respectively. Note that the submatrices \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\) are invertible when \(0< c < 1\) and \(0< \gamma ,\beta < 1\) since they are strictly diagonally dominant. If all matrices in Eqs. (10) and (11) are precomputed, then the SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) are efficiently and directly computed from the precomputed matrices.

Lemma 2

(Block Elimination [4]) Suppose a linear system \(\mathbf {A}\mathbf {x} = \mathbf {b}\) is partitioned as follows:

$$\begin{aligned} \begin{bmatrix} \mathbf {A}_{11}&\mathbf {A}_{12} \\ \mathbf {A}_{21}&\mathbf {A}_{22} \end{bmatrix} \begin{bmatrix} \mathbf {x}_{1} \\ \mathbf {x}_{2} \end{bmatrix} = \begin{bmatrix} \mathbf {b}_{1} \\ \mathbf {b}_{2} \end{bmatrix} \end{aligned}$$

where \(\mathbf {A}_{11}\) and \(\mathbf {A}_{22}\) are square matrices. If the submatrix \(\mathbf {A}_{11}\) is invertible, then the solution \(\mathbf {x}\) is represented as follows:

$$\begin{aligned} \mathbf {x}= \begin{bmatrix} \mathbf {x}_{1}\\ \mathbf {x}_{2}\\ \end{bmatrix}= \begin{bmatrix} \mathbf {A}_{11}^{-1}(\mathbf {b}_{1}-\mathbf {A}_{12}\mathbf {x}_{2})\\ \mathbf {S}^{-1}(\mathbf {b}_{2}-\mathbf {A}_{21}(\mathbf {A}_{11}^{-1}(\mathbf {b}_{1}))) \end{bmatrix} \end{aligned}$$

where \(\mathbf {S}=\mathbf {A}_{22} - \mathbf {A}_{21}\mathbf {A}^{-1}_{11}\mathbf {A}_{12}\) is the Schur complement of \(\mathbf {A}_{11}\). \(\square \)

Lemma 2 implies that a partitioned linear system is efficiently solved if it contains an easy-to-invert submatrix and the dimension of the Schur complement is small. Note that inverting \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\) is trivial because they are block diagonal matrices as shown in Fig. 6. Also, the dimension of \(\mathbf {S}_{\mathbf {|H|}}\) and \(\mathbf {S}_{\mathbf {T}}\) is \(n_2\), where \(n_2\) is the number of hubs, and most real-world graphs have a small number of hubs compared to the number of nodes (see Table 2).
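To make Lemma 2 concrete, the following is a minimal sketch of block elimination for a partitioned sparse linear system using SciPy. The function and variable names (block_eliminate, A11, A12, A21, A22) are ours for illustration, and the small diagonally dominant test system at the end is purely synthetic, not one of the paper's matrices.

```python
# Minimal sketch of block elimination (Lemma 2) with SciPy sparse matrices.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_eliminate(A11, A12, A21, A22, b1, b2):
    """Solve [[A11, A12], [A21, A22]] [x1; x2] = [b1; b2], assuming A11 is invertible."""
    lu11 = spla.splu(A11.tocsc())                         # factor the easy-to-invert block
    S = A22.toarray() - A21 @ lu11.solve(A12.toarray())   # Schur complement of A11
    x2 = np.linalg.solve(S, b2 - A21 @ lu11.solve(b1))    # solve the small n2 x n2 system
    x1 = lu11.solve(b1 - A12 @ x2)                        # back-substitute for x1
    return np.concatenate([x1, x2])

# Tiny usage example on a random diagonally dominant system.
rng = np.random.default_rng(0)
n1, n2 = 6, 2
dense = rng.random((n1 + n2, n1 + n2)) + (n1 + n2) * np.eye(n1 + n2)
A, b = sp.csc_matrix(dense), rng.random(n1 + n2)
x = block_eliminate(A[:n1, :n1], A[:n1, n1:], A[n1:, :n1], A[n1:, n1:], b[:n1], b[n1:])
assert np.allclose(A @ x, b)
```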

[Algorithm 3: SRWR-Pre, preprocessing phase]
[Algorithm 4: SRWR-Pre, query phase]

Preprocessing phase (Algorithm 3) Our preprocessing phase precomputes the matrices exploited for computing SRWR scores in the query phase. Our algorithm first reorders the nodes of a given signed adjacency matrix \(\mathbf {A}\) using the hub-and-spoke reordering method, and performs semi-normalization on \(\mathbf {A}\) to obtain \({{\tilde{\mathbf {A}}}_{+}}\) and \({{\tilde{\mathbf {A}}}_{-}}\) using Algorithm 1 (lines 1–2). Then our algorithm computes \(\mathbf {|H|}\) and \(\mathbf {T}\) and partitions the matrices as shown in Fig. 6 (lines 3–5). Our algorithm calculates the inverses of \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\), and computes the Schur complements of \(\mathbf {|H|}_{11}\) and \(\mathbf {T}_{11}\) (lines 4–7). When we compute \(\mathbf {S}^{-1}_{\mathbf {|H|}}\) and \(\mathbf {S}^{-1}_{\mathbf {T}}\), we invert the LU factors of \(\mathbf {S}_{\mathbf {|H|}}\) and \(\mathbf {S}_{\mathbf {T}}\) (lines 8 and 9) because this approach is faster and more memory efficient than directly inverting \(\mathbf {S}_{\mathbf {|H|}}\) and \(\mathbf {S}_{\mathbf {T}}\), as in [34].
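The sketch below illustrates the core of this preprocessing phase under simplifying assumptions: it takes the already reordered and semi-normalized matrices \(\mathbf {|H|}\) and \(\mathbf {T}\) as inputs (their construction is omitted), and it keeps LU factorizations of the Schur complements and of \(\mathbf {|H|}_{11}\), \(\mathbf {T}_{11}\) rather than explicitly inverting them as Algorithm 3 does. The helper names (preprocess, Precomputed) are illustrative, not the paper's API.

```python
# Minimal sketch of the preprocessing phase, assuming absH and T are the reordered,
# semi-normalized sparse matrices |H| and T from Eqs. (5)-(6), and n1 is the number of spokes.
from dataclasses import dataclass
import scipy.sparse as sp
import scipy.sparse.linalg as spla

@dataclass
class Precomputed:
    n1: int
    H11_lu: spla.SuperLU      # factorization of |H|_11 (block diagonal, cheap to factor)
    T11_lu: spla.SuperLU      # factorization of T_11
    H12: sp.csc_matrix
    H21: sp.csc_matrix
    T12: sp.csc_matrix
    T21: sp.csc_matrix
    SH_lu: spla.SuperLU       # LU factors of the Schur complement S_|H|
    ST_lu: spla.SuperLU       # LU factors of the Schur complement S_T

def preprocess(absH, T, n1):
    H11, H12 = absH[:n1, :n1].tocsc(), absH[:n1, n1:].tocsc()
    H21, H22 = absH[n1:, :n1].tocsc(), absH[n1:, n1:].tocsc()
    T11, T12 = T[:n1, :n1].tocsc(), T[:n1, n1:].tocsc()
    T21, T22 = T[n1:, :n1].tocsc(), T[n1:, n1:].tocsc()

    H11_lu, T11_lu = spla.splu(H11), spla.splu(T11)
    # Schur complements S_|H| = |H|_22 - |H|_21 |H|_11^{-1} |H|_12 (and analogously S_T);
    # they are only n2 x n2, where n2 is the (small) number of hubs.
    SH = H22.toarray() - H21 @ H11_lu.solve(H12.toarray())
    ST = T22.toarray() - T21 @ T11_lu.solve(T12.toarray())
    SH_lu, ST_lu = spla.splu(sp.csc_matrix(SH)), spla.splu(sp.csc_matrix(ST))
    return Precomputed(n1, H11_lu, T11_lu, H12, H21, T12, T21, SH_lu, ST_lu)
```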

Query phase (Algorithm 4) Our query phase computes the SRWR score vectors \(\mathbf {r}^{+}\) and \(\mathbf {r}^{-}\) for a given seed node s using the matrices precomputed by Algorithm 3. Our algorithm first creates a starting vector \(\mathbf {q}\) whose entry at the index of the seed node s is 1 and 0 elsewhere, and partitions \(\mathbf {q}\) into \(\mathbf {q}_{1}\) and \(\mathbf {q}_{2}\) (line 1). We then compute \(\mathbf {p}_{2}\) and \(\mathbf {p}_{1}\) based on Eq. (10) and concatenate the vectors to obtain \(\mathbf {p}\) (lines 2–4). Our algorithm calculates \(\mathbf {t} = {{\tilde{\mathbf {A}}}_{-}}^{\top }\mathbf {p}\) and partitions \(\mathbf {t}\) into \(\mathbf {t}_{1}\) and \(\mathbf {t}_{2}\) (line 5). We compute \(\mathbf {r}^{-}_{2}\) and \(\mathbf {r}^{-}_{1}\) based on Eq. (11), and concatenate the vectors to obtain \(\mathbf {r}^{-}\) (lines 6–8). After computing \(\mathbf {r}^{+}= \mathbf {p} - \mathbf {r}^{-}\) (line 9), we obtain the final ranking vector \(\mathbf {r}= \mathbf {r}^{+}- \mathbf {r}^{-}\) (line 10).
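A matching sketch of the query phase is given below; it directly evaluates Eqs. (10) and (11) using the Precomputed object from the preprocessing sketch above. The names (query, pre, A_neg_tilde) are again assumptions for illustration; A_neg_tilde denotes the semi-normalized negative matrix \({{\tilde{\mathbf {A}}}_{-}}\), s is the seed index after node reordering, c is the restart probability, and n is the number of nodes.

```python
# Minimal sketch of the query phase (Eqs. (10) and (11)).
import numpy as np

def query(pre, A_neg_tilde, s, c, n):
    n1 = pre.n1
    q = np.zeros(n); q[s] = 1.0
    q1, q2 = q[:n1], q[n1:]

    # Eq. (10): p2 first, then p1 by back-substitution.
    p2 = c * pre.SH_lu.solve(q2 - pre.H21 @ pre.H11_lu.solve(q1))
    p1 = pre.H11_lu.solve(c * q1 - pre.H12 @ p2)
    p = np.concatenate([p1, p2])

    # Eq. (11) with t = A_neg_tilde^T p.
    t = A_neg_tilde.T @ p
    t1, t2 = t[:n1], t[n1:]
    r_neg2 = (1 - c) * pre.ST_lu.solve(t2 - pre.T21 @ pre.T11_lu.solve(t1))
    r_neg1 = pre.T11_lu.solve((1 - c) * t1 - pre.T12 @ r_neg2)
    r_neg = np.concatenate([r_neg1, r_neg2])

    r_pos = p - r_neg                 # line 9 of Algorithm 4
    return r_pos, r_neg, r_pos - r_neg   # r+, r-, and r = r+ - r-
```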

The space and time complexities of Algorithms 3 and 4 are analyzed in Lemmas 6–8 of Appendix A.3.

5 Experiments

We evaluate the effectiveness of SRWR compared to existing ranking methods. Since there is no ground truth for personalized rankings of each node in real-world graphs, we take an indirect approach, examining the performance of applications such as link prediction, troll identification, and sign prediction. We also investigate the performance of our approaches in terms of time and space. Based on these settings, we aim to answer the following questions:

  • Q1. Link prediction (Sect. 5.2) How effective is our proposed SRWR model for the link prediction task in signed networks?

  • Q2. User preference preservation (Sect. 5.3) How well does our model SRWR preserve users’ known preferences in personalized rankings in signed networks?

  • Q3. Troll detection (Sect. 5.4) How well do personalized rankings of SRWR capture trolls who are abnormal users compared to those of other models?

  • Q4. Sign prediction (Sect. 5.5) How helpful are trustworthiness scores of SRWR for predicting missing signs of edges in signed networks?

  • Q5. Effects of balance attenuation factors (Sect. 5.6) How effective are the balance attenuation factors of SRWR for applications in signed networks?

  • Q6. Efficiency (Sect. 5.7) How fast and memory efficient is our preprocessing method SRWR-Pre compared to other baselines?

5.1 Experimental settings

Table 2 Dataset statistics

Machines The experiments on the effectiveness of SRWR in Sects. 5.2, 5.4, 5.5 and 5.6 are conducted on a PC with Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz and 8GB memory. The experiments on the computational performance of SRWR-Pre in Sect. 5.7 are performed on a workstation with a single Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz and 256GB memory.

Datasets The signed networks for our experiments are summarized in Table 2. We use all datasets in the link prediction task, the sign prediction task, and the experiments for evaluating the computational performance of the proposed methods (Sects. 5.2, 5.5, and 5.7). We use the Slashdot dataset in the troll identification task (Sect. 5.4) since a troll list is available only for that dataset.

Methods To answer Q1–4, we compare our proposed model with Random Walk with Restart (RWR) [13], Modified Random Walk with Restart (M-RWR) [33], Modified Personalized SALSA (M-PSALSA) [30], Personalized Signed spectral Rank (PSR) [23], Personalized Negative Rank (PNR) [23], Troll-Trust Model (TR-TR) [41], TRUST [12], LOGIT [26], and GAUC-OPT [35]. Note that RWR is computed on the absolute adjacency matrix of a signed network. For Q5, we compare our model SRWR to H-SRWR, a version of SRWR without the balance attenuation parameters. For Q6, we compare our preprocessing method SRWR-Pre to the baseline methods Inversion and LU mentioned in Sect. 4.3.1, as well as to our iterative method SRWR-Iter.

Parameters There are three hyper-parameters in our ranking model: the restart probability c and the balance attenuation parameters \(\beta \) and \(\gamma \). We set c to 0.15 for all random walk-based approaches including our model for simplicity. To choose \(\beta \) and \(\gamma \), we perform a grid search over the range \(0 \le \beta , \gamma \le 1\) with a step of 0.1 (i.e., we search \((\beta ,\gamma )\) in \(\mathbf {P} = \{(0.1x, 0.1y) | 0 \le x, y \le 10 \text { and } x,y \in {\mathbb {Z}}\}\)); a sketch of this grid search is given after the list below. To select proper parameters, we randomly split each dataset into training, validation, and test sets; we then compute personalized rankings based on the training set, and choose the best parameter combination \((\beta ,\gamma )\) on the validation set with a target metric corresponding to each task. We report results on the test set with the validated parameters. The detailed settings on how to split the dataset and which metric is used are described in each subsection of the corresponding task. The validated parameters of SRWR are summarized as follows:

  • Link prediction task (Sect. 5.2): In the Epinions and the Slashdot datasets, \(\beta =0.5\) and \(\gamma = 0.8\). In the Wikipedia dataset, \(\beta =0.5\) and \(\gamma = 0.5\).

  • Troll identification task (Sect. 5.4): In this task, the Slashdot dataset is used as mentioned above, and in the dataset, \(\beta =0.1\) and \(\gamma =1.0\).

  • Sign prediction task (Sect. 5.5): In the Epinions and the Slashdot datasets, \(\beta =0.5\) and \(\gamma = 0.8\). In the Wikipedia dataset, \(\beta =0.2\) and \(\gamma = 0.6\).
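As referenced above, the following is a minimal sketch of the \((\beta ,\gamma )\) grid search. Here score_fn is a hypothetical callback that runs SRWR with a given parameter pair on the training graph and returns the target validation metric (e.g., average GAUC, MRR, or micro-accuracy, depending on the task).

```python
# Minimal sketch of the (beta, gamma) grid search over {0.0, 0.1, ..., 1.0}^2.
import itertools

def grid_search(score_fn):
    grid = [round(0.1 * i, 1) for i in range(11)]   # beta, gamma in {0.0, 0.1, ..., 1.0}
    # Return the pair with the highest validation metric.
    return max(itertools.product(grid, grid), key=lambda bg: score_fn(*bg))

# Usage (hypothetical): best_beta, best_gamma = grid_search(validation_metric)
```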

Fig. 7

The link prediction performance of ranking models in terms of GAUC and AUC. GAUC indicates how well a model ranks nodes to be positively connected by a seed node at the top and those to be negatively linked at the bottom. AUC indicates how many positive nodes are ranked higher than negative ones (see the details in Appendix A.5.1). Our proposed model SRWR shows the best link prediction performance in terms of GAUC and AUC for all the datasets

5.2 Link prediction task

We evaluate the performance of personalized ranking models on link prediction in signed networks. The link prediction task is defined as follows: given a signed network and a seed node s, predict nodes which will be positively or negatively linked by the seed node in the future. An ideal personalized ranking for this task should place nodes that the seed node s potentially trusts (i.e., positive links) at the top, those that s potentially distrusts (i.e., negative links) at the bottom, and the remaining unknown ones in the middle. GAUC (Generalized AUC), proposed by [35], has been used to evaluate the quality of personalized rankings for link prediction in signed networks, and it assigns a score of 1.0 to such an ideal ranking. We also evaluate the ranking quality in terms of AUC, which indicates how many positive nodes are ranked higher than negative ones (see the details in Appendix A.5.1).

To perform this evaluation, we randomly select 1000 seed nodes, and choose \(20\%\) of the positive and negative edges from each seed node to form a validation set. Then, we randomly select another 1000 seed nodes and choose \(20\%\) of the positive and negative edges from each seed node as a test set. We remove those selected edges and utilize the remaining edges as a training set to compute personalized rankings. For a given parameter combination, we measure GAUC on the personalized ranking w.r.t. each seed node in the validation set, and record the average GAUC over all the seed nodes. Then, we pick the parameter combination that provides the highest average GAUC in the validation set. With the validated parameters, we report the average GAUC over all seed nodes in the test set. We perform the same procedure for AUC. We exclude nodes directly connected to a seed node s in the training set from the personalized ranking list w.r.t. s, since we need to recommend links that are unknown to s.
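As a reference point, the sketch below shows how AUC can be computed from a personalized ranking: the fraction of (positive, negative) test-node pairs in which the positive node receives the higher score, with ties counted as 0.5. GAUC additionally accounts for unknown nodes; its exact definition is in Appendix A.5.1 and is not reproduced here. The reported numbers are averages of this quantity over all test seed nodes; scores is an assumed mapping from node to SRWR score w.r.t. one seed.

```python
# Minimal sketch of per-seed AUC: probability that a positive test node outranks a negative one.
def auc(scores, pos_nodes, neg_nodes):
    pairs = [(scores[p], scores[q]) for p in pos_nodes for q in neg_nodes]
    if not pairs:
        return float("nan")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp, sn in pairs)
    return wins / len(pairs)
```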

Results We compare SRWR to other random walk-based models M-RWR, M-PSALSA, PSR, RWR, and TR-TR on the link prediction task in signed networks. We also compare our method to GAUC-OPT, a matrix factorization-based link prediction method that approximately maximizes GAUC [35]. As demonstrated in Fig. 7, SRWR presents the best link prediction performance in terms of GAUC and AUC among the evaluated models over all the datasets. Compared to RWR, which does not consider negative signs at all, our approach SRWR shows a significant improvement in link prediction accuracy. In particular, the GAUC of every method that considers signed edges is higher than that of RWR, as shown in Fig. 7. This indicates that it is important to consider the sign of edges when computing personalized rankings for link prediction in signed networks. Furthermore, SRWR outperforms the other models, including GAUC-OPT which is specially designed for this task, implying that our signed surfer based on balance theory effectively estimates personalized rankings for link prediction in signed networks.

5.3 User preference preservation task

Since a personalized ranking includes known and unknown users for a seed user (or node), how consistent the ranking is with the seed user's known preferences is another criterion for evaluating the quality of personalized rankings. In signed social networks, we consider the known preferences of a seed user s to be well preserved in a personalized ranking if positive users for s (i.e., users positively connected by s) are at the top and negative ones are at the bottom of the ranking. Hence, an ideal ranking for s (excluding s from the ranking) should produce a GAUC of 1.0 with the known positive and negative links from s in terms of user preference preservation. To evaluate the preference preservation performance of each method, we report the average GAUC over all test seed nodes without removing the selected test edges from the training set.

As shown in Table 3, our ranking model SRWR demonstrates the best GAUC in user preference preservation among all tested methods, indicating that SRWR almost perfectly preserves users’ known preferences within their personalized rankings. The main reason for the result is that our signed surfer occasionally restarts at a seed node s with a positive sign; thus, the positive surfer frequently visits the positive neighbors of s, and the negative surfer frequently visits the negative neighbors of s. Hence, the trustworthiness scores on the positive neighbors are high, and those on the negative neighbors are low compared to those on nodes that are not connected by s.

One might consider a simple approach that arbitrarily places the positive neighbors at the top, the negative ones at the bottom, and the remaining unknown nodes in the middle of a ranking list. This simple approach would produce a GAUC of 1.0 for user preference preservation; however, it cannot work for link prediction since we need to predict target nodes among unknown nodes (i.e., nodes not connected to the seed node). On the contrary, our model SRWR is effective not only for user preference preservation but also for signed link prediction, as shown in Table 3 and Fig. 7.

Table 3 The user preference preservation quality of ranking models in terms of GAUC (Appendix A.5.1)

5.4 Troll identification task

In this section, we investigate the quality of a personalized ranking generated by SRWR in identifying trolls, i.e., users who behave abnormally or irritate normal users. The task is defined as follows: given a signed network and a normal user, identify trolls using a personalized ranking w.r.t. the user. In signed networks, a good personalized ranking of the normal user should capture trolls at the bottom of the ranking since most normal users are likely to dislike those trolls. Thus, we measure how well the personalized ranking of each method captures trolls at the bottom of the ranking to examine the quality of personalized node rankings.

As in the previous work [23], we use the enemies of a user called No-More-Trolls in the Slashdot dataset as trolls. The user is an administrative account created for the purpose of collecting a troll list (i.e., the administrator is negatively connected to each troll). There are 96 trolls in the list. We exclude the edges adjacent to No-More-Trolls from the Slashdot dataset and use the remaining edges as a training set to estimate a personalized ranking. We use the bottom-k of the ranking to search for those trolls. We randomly select 1000 seed nodes as a validation set to search for the hyper-parameters required by each method, and pick the parameter combination that provides the highest Mean Reciprocal Rank (MRR) in the validation set. Then, for each user, we search for trolls within the bottom-k ranking, and evaluate how low those trolls are ranked, measured by MRR. We also measure Mean Average Precision (MAP@k), Normalized Discounted Cumulative Gain (NDCG@k), Precision@k, and Recall@k to check the performance of each method in terms of various metrics (see the details in Appendix A.5.2). Since there are no user-graded scores for the troll list, we set those scores to 1 for NDCG.
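The sketch below illustrates one plausible implementation of the bottom-k retrieval and MRR computation described above: ranks are counted from the bottom of each personalized ranking, and the reciprocal rank of the first troll found is averaged over seed users. The function names and the treatment of rankings without any troll in the bottom-k are assumptions.

```python
# Minimal sketch of bottom-k troll retrieval and MRR; ranking lists nodes from most
# trusted (top) to most distrusted (bottom).
def bottom_k(ranking, k):
    return list(reversed(ranking[-k:]))   # bottom of the ranking, lowest-ranked node first

def mean_reciprocal_rank(rankings, trolls, k=100):
    troll_set, rr = set(trolls), []
    for ranking in rankings:
        hits = [i + 1 for i, node in enumerate(bottom_k(ranking, k)) if node in troll_set]
        rr.append(1.0 / hits[0] if hits else 0.0)   # reciprocal rank of the first troll from the bottom
    return sum(rr) / len(rr)
```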

Fig. 8

MRR of SRWR. The measure indicates how low trolls are ranked in a personalized ranking. SRWR achieves the highest MRR among all tested models

Fig. 9

The performance of ranking models for the troll identification task through various measurements: MAP@k (a), NDCG@k (b), Precision@k (c), and Recall@k (d). SRWR shows the best performance for all the measurements compared to other competitors

Table 4 Troll prediction results of ranking models w.r.t. a normal user “yagu”

Results Our proposed model SRWR significantly outperforms other ranking models on the troll identification task, as shown in Figs. 8 and 9. According to Fig. 8, trolls are placed closer to the bottom of the ranking by our model than by other ranking models, since the MRR of our model is the highest among all competitors. More trolls are captured within the bottom-k ranking produced by our proposed model according to MAP@k, shown in Fig. 9a. Note that Fig. 9c, d indicates that SRWR achieves higher Precision@k and Recall@k for capturing trolls than other methods. SRWR provides \(4\times \) better performance than PNR, the second best method, in terms of Precision@k when \(k=1\). Many trolls tend to be ranked low in our personalized ranking, as indicated by the better MAP@k and NDCG@k of SRWR compared to other ranking models, presented in Fig. 9a, b.

Case study We investigate the top-20 and the bottom-20 of the personalized ranking for a user called “yagu” in Table 4. We list the users in the bottom-20 ranking in the ascending order of the ranking scores in Table 4. According to the result, more trolls are ranked low in the personalized ranking from SRWR, indicating that our model is more sensitive in capturing trolls than other models. Also, the query user is ranked low at the bottom of the ranking from M-PSALSA, while the user is ranked high in the ranking from our model. The query user should trust himself; thus, the user should be ranked at the top in a personalized ranking. This implies our model is more desirable than other models for personalized rankings in signed networks.

5.5 Sign prediction task

We now evaluate the ranking scores produced by each ranking model rather than the order between nodes. Note that a ranking score between a seed node s and a target node t is based on the trustworthiness between those nodes. Hence, it is also important to examine how well those ranking scores reflect trust relationships between nodes. We measure the quality of those ranking scores using the sign prediction task, which is defined as follows: given a signed network and a seed node s where the signs of edges connected from s are missing, predict those signs using the personalized ranking scores of each method with respect to the seed node s.

To construct a validation set, we randomly select 1000 seed nodes and choose \(20\%\) of the positive and negative edges from each seed node. We also randomly select another 1000 seed nodes, and choose \(20\%\) of the positive and negative edges from each seed node to form a test set. Then, we remove each selected edge \((s \rightarrow t)\) and predict the edge's sign based on the personalized ranking scores w.r.t. node s in the graph represented by the remaining edges. Our ranking score vector is \(\mathbf {r}=\mathbf {r}^{+}-\mathbf {r}^{-}\), whose values range from \(-1\) to 1. If \(\mathbf {r}_{t}\) is greater than or equal to 0, then we predict the sign of the edge \((s \rightarrow t)\) as positive; otherwise, we predict it as negative. We pick the parameter combination having the highest micro-accuracy (see below) in the validation set. With the validated parameters, we measure the macro- and micro-accuracy on the test set, defined as follows:

$$\begin{aligned} \text {macro-accuracy}&= \frac{1}{n_{Q}}\sum _{i=1}^{n_Q} \text {accuracy}(i) \\ \text {micro-accuracy}&= \frac{\# \text { of correct predictions}}{\# \text { of total test edges}} \end{aligned}$$

where \(n_{Q}\) is the number of test seed nodes, and \(\text {accuracy}(i)\) is the seed-wise accuracy of the ith test seed node (i.e., the ratio of the number of correct predictions to the number of test edges of the ith seed node).
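The sketch below illustrates the prediction rule and both accuracy measures; the data layout (score_vectors mapping each test seed to its SRWR score vector \(\mathbf {r}\), and test_edges mapping each seed to its held-out (target, sign) pairs) is an assumption for illustration.

```python
# Minimal sketch of sign prediction and macro-/micro-accuracy.
import numpy as np

def predict_sign(r, target):
    return +1 if r[target] >= 0 else -1      # sign of r = r+ - r- at the target node

def macro_micro_accuracy(score_vectors, test_edges):
    seed_acc, correct, total = [], 0, 0
    for seed, edges in test_edges.items():
        r = score_vectors[seed]
        hits = sum(predict_sign(r, t) == sign for t, sign in edges)
        seed_acc.append(hits / len(edges))            # accuracy(i) for this seed
        correct, total = correct + hits, total + len(edges)
    return np.mean(seed_acc), correct / total         # (macro-accuracy, micro-accuracy)
```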

Results We compare the performance of SRWR to that of other random walk-based ranking models M-RWR, M-PSALSA, TR-TR, and PSR on the sign prediction task. We also compare our model SRWR to TRUST [12] and LOGIT [26] which are specially designed for predicting signs between two arbitrary nodes in signed networks. As shown in Fig. 10a, SRWR shows the best macro-accuracy among all tested methods. Although SRWR obtains higher micro-accuracy than LOGIT in the Epinions dataset, the micro-accuracy of LOGIT is better than that of SRWR in other datasets as shown in Fig. 10b.

Fig. 10

The performance of SRWR on the sign prediction task in terms of macro- and micro-accuracies where the macro-accuracy indicates the average seed-wise accuracy, and the micro-accuracy indicates the ratio of the number of correct predictions to the total test edges. While the micro-accuracy of SRWR is the second best, the macro-accuracy of SRWR is the best compared to its competitors

Table 5 The difference between SRWR and LOGIT in terms of macro-accuracies of high- and low-degree groups

Another observation is that LOGIT has a large gap between macro- and micro-accuracy, while SRWR has a small gap, as shown in Fig. 10. A large gap implies that, on average, the deviation between the micro-accuracy and the seed-wise accuracy (i.e., \(\text {accuracy}(i)\)) is large, i.e., \(\text {accuracy}(i)\) for the ith test seed node is likely to deviate substantially from the micro-accuracy. To analyze such deviation, we look into seed-wise accuracies in terms of node degrees. Since real-world graphs have a few high-degree nodes and many low-degree nodes according to the power-law degree distribution [3], we split the test seed nodes into two groups: a high group (top-\(20\%\)) and a low group (bottom-\(80\%\)) in decreasing order of out-degree. Then, we measure the average of the seed-wise accuracies for each group (i.e., the macro-accuracy of the group) and the standard deviation between the accuracies of those groups.
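The following sketch shows one plausible implementation of this grouping; the function name and the interpretation of the group-wise standard deviation (taken over the two group macro-accuracies) are assumptions.

```python
# Minimal sketch of the degree-based grouping: top 20% of test seeds by out-degree form
# the high-degree group, the rest the low-degree group.
import numpy as np

def group_macro_accuracy(seed_accuracy, out_degree, top_ratio=0.2):
    seeds = sorted(seed_accuracy, key=lambda s: out_degree[s], reverse=True)
    cut = int(len(seeds) * top_ratio)
    high = np.mean([seed_accuracy[s] for s in seeds[:cut]])   # macro-accuracy of the high group
    low = np.mean([seed_accuracy[s] for s in seeds[cut:]])    # macro-accuracy of the low group
    return high, low, np.std([high, low])                     # and their standard deviation
```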

According to Table 5, LOGIT tends to predict test edges of a seed node in the high-degree group better than those in the low-degree group. In particular, on the Epinions and the Slashdot datasets, the macro-accuracy of LOGIT in the low-degree group is considerably lower than that in the high-degree group. These results imply that LOGIT is biased toward predicting test edges from high-degree seed nodes. Note that the number of test edges from a high-degree node is larger than that from a low-degree node since \(20\%\) of the edges are randomly extracted from each selected test node. Thus, the total number of correct predictions from LOGIT is large (i.e., the micro-accuracy becomes high). However, the seed-wise accuracies of LOGIT are low in the low-degree group (i.e., the macro-accuracy becomes low) as shown in Table 5, thereby increasing the gap of LOGIT between micro- and macro-accuracy.

On the contrary, the gap of SRWR between micro- and macro-accuracy is smaller than that of LOGIT as shown in Fig. 10, along with the small standard deviation of SRWR as shown in Table 5. Note that SRWR outperforms LOGIT in the low-degree group over all the datasets as shown in Table 5. That is why the macro-accuracy of SRWR is higher than that of LOGIT over all test seed nodes as shown in Fig. 10. SRWR also shows a satisfactory performance in the high-degree group, especially on the Epinions dataset, although the performance of SRWR is not better than that of LOGIT on the Slashdot and the Wikipedia datasets as shown in Table 5. Thus, the standard deviation of SRWR between those groups is smaller than that of LOGIT. These experimental results indicate that SRWR is competitive with models such as LOGIT in the sign prediction task.

Fig. 11

Accuracy maps of SRWR according to \(\beta \) and \(\gamma \) where each color indicates a degree of accuracy. The Epinions and the Slashdot datasets present similar tendencies, while the Wikipedia dataset shows a different result from those of the two datasets

Note that LOGIT is a graph feature-based method which exploits local graph features within 1 hop from the seed and target nodes, such as node degrees, common neighbors, and local wedges, for predicting the sign between the seed and target nodes. A high-degree node is likely to have plentiful features, since it has many connections to other nodes. A low-degree node, with fewer connections, does not have enough such local features; hence, LOGIT is limited in improving the predictive performance for test edges from low-degree nodes based only on local graph features. On the other hand, SRWR's inference is based on information more than 1 hop away from the seed node because the signed random surfer visits the target node via paths of various lengths from the seed node. That is why SRWR predicts test edges of low-degree nodes well compared to LOGIT.

Balance attenuation factors We adjust the balance attenuation factors of SRWR, and evaluate the sign prediction task in terms of micro-accuracy to examine how well balance theory explains signed networks. In this experiment, we use the top-100 highest degree nodes as a test set for each network. The Epinions and the Slashdot datasets show similar results where larger values of \(\beta \) and \(\gamma \) achieve high accuracy, as shown in Fig. 11a, b. Unlike these two datasets, the accuracy is high when \(\beta \) is small in the Wikipedia network, as shown in Fig. 11c. This implies that "an enemy of my enemy is my friend" would not hold in the network, which means balance theory does not apply well to the Wikipedia dataset. The reason is that the Wikipedia network represents votes between users to elect administrators; thus, the dataset is different from the Epinions and the Slashdot networks, which are general social networks. Note that the validated balance attenuation factors for the sign prediction task in Sect. 5.1 are consistent with the tendency demonstrated in Fig. 11. Another observation is that the ideal balance theory does not apply to real-world signed networks, because the accuracy is not the best on any dataset when \(\beta =1\) and \(\gamma =1\) (i.e., the ideal balance theory).

Fig. 12

Effect of the balance attenuation factors of SRWR. The performance of SRWR is better than that of H-SRWR (i.e., SRWR without using balance attenuation factors) in terms of the link prediction, the troll identification, and the sign prediction tasks

5.6 Effectiveness of balance attenuation factors

We examine the effects of the balance attenuation factors of SRWR on the performance of the link prediction, the troll identification, and the sign prediction tasks. In this experiment, we use H-SRWR (\(\beta =1\) and \(\gamma =1\)) and SRWR with validated balance factors for each dataset as mentioned in Sect. 5.1. H-SRWR indicates that we compute SRWR scores using Eq. (2) which does not adopt balance attenuation factors. We measure GAUC for link prediction, MRR for troll prediction, and micro-accuracy for sign prediction to compare SRWR and H-SRWR.

Figure 12 indicates that introducing balance attenuation factors is helpful for improving the performance of each application in signed networks. As shown in Fig. 12a, SRWR obtains higher GAUC than H-SRWR in the link prediction task. Also, Fig. 12b shows that SRWR achieves better MRR than H-SRWR on the troll identification task. Moreover, the accuracy of SRWR is higher than that of H-SRWR over all datasets for the sign prediction task, as presented in Fig. 12c. Although introducing balance attenuation factors increases the complexity of our model and demands an additional step for searching those factors, it makes our model flexible: by adjusting those factors, SRWR resolves the weakness inherent in the strong balance theory as discussed in Sect. 4.1.2, and improves the performance of each application in signed social networks.

5.7 Performance of SRWR-Pre

We investigate the performance of our preprocessing method SRWR-Pre in terms of preprocessing time, memory space for precomputed data, and query time. We compare SRWR-Pre to the baseline preprocessing methods Inversion and LU as well as to our iterative method SRWR-Iter. Preprocessing and query time are measured in wall-clock time, and we measure the average query time for 1000 random seed nodes. We set \(\beta =0.5\), \(\gamma =0.5\), and \(c=0.05\) for all tested methods. In SRWR-Pre, we set the hub selection ratio \(t=0.001\) for the hub-and-spoke reordering method to make the number of hubs \(n_2\) small enough, as in [34]. We also measure how much memory space each preprocessing method needs for the precomputed matrices to compare memory efficiency. We omit the bars for SRWR-Iter in Fig. 13a, b because SRWR-Iter does not involve a heavy preprocessing phase (i.e., the time cost for the normalization phase of SRWR-Iter in Algorithm 1 is trivial, and the memory usage of SRWR-Iter is equal to that of the input graph).

Fig. 13

Performance of SRWR-Pre: a and b show the comparison of the preprocessing time and the memory space for preprocessed data among preprocessing methods; c compares the query time among all tested methods. SRWR-Pre presents the best performance compared to other preprocessing methods in terms of preprocessing time and memory efficiency. SRWR-Pre also computes SRWR scores more quickly than SRWR-Iter and the baseline methods

Figure 13a, b shows that SRWR-Pre provides better performance than LU and Inversion in terms of preprocessing time and memory space for preprocessed data. SRWR-Pre is up to \(4.5 \times \) faster than the second best preprocessing method LU in terms of preprocessing time. Also, SRWR-Pre requires up to \(11 \times \) less memory space than LU. In particular, our method SRWR-Pre uses 2.6GB of memory for the precomputed data in the Epinions dataset, while LU and Inversion require 28GB and 180GB of memory, respectively. These results imply that SRWR-Pre is fast and memory efficient compared to other preprocessing methods. SRWR-Pre also shows the fastest query speed among all competitors including our iterative method SRWR-Iter, as presented in Fig. 13c. SRWR-Pre is up to \(14 \times \) faster than SRWR-Iter, and up to \(15 \times \) faster than the second best preprocessing method LU in the Epinions dataset. Note that SRWR-Pre computes SRWR scores for a given seed node in less than 0.3 seconds over all signed networks. Inversion is the slowest among the tested methods over all datasets. The main reason is that Inversion produces a very large number of nonzeros in the precomputed matrices (e.g., Inversion produces about 11 billion nonzeros in the Epinions dataset, as presented in Table 6). These results indicate that SRWR-Pre is appropriate for serving queries in real-time on these datasets with low memory usage compared to other methods.

Discussion In this work, we propose two methods for SRWR: SRWR-Iter and SRWR-Pre, which are iterative and preprocessing methods for computing SRWR scores, respectively. SRWR-Iter does not require heavy precomputed data to compute SRWR scores. However, SRWR-Iter shows slow query speed, as presented in Fig. 13c, because it performs matrix vector multiplications many times for a given seed node. On the other hand, SRWR-Pre is up to \(14 \times \) faster than SRWR-Iter in terms of query speed since SRWR-Pre directly computes SRWR scores from precomputed data. However, in SRWR-Pre, the values of the parameters c, \(\beta \), and \(\gamma \) of SRWR are fixed during the preprocessing phase (Algorithm 3); thus, SRWR-Pre cannot change the parameters in the query phase (Algorithm 4). To obtain SRWR scores with different parameter values, we need to perform the preprocessing phase again. On the contrary, SRWR-Iter easily handles a change of the parameters in the query phase (Algorithm 2) without additional operations such as preprocessing. One appropriate usage of our methods is to use SRWR-Iter to find proper parameters for a specific application, and then to exploit SRWR-Pre with the discovered parameters to accelerate the query speed in the application.

Table 6 Total number of nonzeros (\(nnz_{t}\)) in precomputed matrices for each preprocessing method

6 Conclusion

We propose Signed Random Walk with Restart, a novel model which provides personalized trust or distrust rankings in signed networks. In our model, we introduce a signed random surfer so that she considers negative edges by changing her sign for surfing on signed networks. Consequently, our model provides personalized trust or distrust rankings reflecting signed edges. Our model is a generalized version of Random Walk with Restart working on both signed and unsigned networks. We also devise SRWR-Iter and SRWR-Pre, iterative and preprocessing methods to compute SRWR scores, respectively. We experimentally show that SRWR achieves the best accuracy for link prediction, predicts trolls \(4\times \) more accurately, and shows a satisfactory performance for inferring missing signs of edges compared to other methods. SRWR-Pre preprocesses a signed network up to \(4.5 \times \) faster, and requires \(11 \times \) less memory space than other preprocessing methods; SRWR-Pre computes SRWR scores up to \(14 \times \) faster than other methods. Future research directions include developing a learning algorithm which automatically learns the balance attenuation factors of our model from a given input graph.