1 Introduction

How can we accurately infer the relation between two arbitrary nodes in edge-labeled graphs? Understanding how nodes are related is crucial for analyzing graph data, and many researchers have designed various relevance measures to effectively identify the relevance between nodes in graphs. Random Walk with Restart (RWR) [38], a random surfer model, has been utilized for measuring relevance scores between nodes, considering the global network structure [10] and multi-faceted connections between nodes [37]. RWR has been extensively utilized in numerous graph mining applications such as personalized ranking [38], link prediction [22], recommender systems [30], anomaly detection [31], and community detection [1].

Although many networks have been modeled with multiple edge labels to represent diverse relationships between nodes [2], RWR has a limitation in inferring the label of the edge between two nodes in edge-labeled graphs. For example, social networks such as Slashdot [15] represent trust or distrust between users as positive or negative edges. In knowledge graphs such as WordNet [27], concepts are associated with predicates. Since RWR does not consider edge labels in its relevance, it cannot identify how nodes are related in terms of edge labels (see Figure 1). For relation inference on multiple edge labels, there are two main approaches: path feature models and translation models. Path feature models [16] exploit paths between two nodes as features for predicting their relation. However, their reasoning focuses on relatively short paths due to the expensive cost of path enumeration. On the other hand, translation models [4, 24, 41] discover latent embeddings for relations and entities under a relational translation scenario; however, they do not take paths between nodes into account in those embeddings. Although several translation models [9, 23] have been proposed that consider paths, they still require complicated path enumeration, similarly to the path feature models. These limitations make the reasoning of those approaches miss the information provided by complex and long paths between nodes.

Figure 1

Limitation of a random surfer in RWR for estimating the relation between s and t. Since the surfer does not consider edge labels, it cannot identify the relation

In this paper, we propose MuRWR (Multi-Labeled Random Walk with Restart), a novel random surfer model which accurately identifies label relevance between two nodes in an edge-labeled graph (see Figure 2). Our approaches are 1) to introduce a labeled random surfer whose label indicates the relation between the starting and visiting nodes, 2) to change the surfer’s label during random walks for a multi-hop reasoning process, and 3) to learn suitable rules on how to change her label from the given edge-labeled graph. We show that letting the labeled surfer move around the graph enables our model to perform accurate multi-hop relational reasoning without explicit path enumeration, as well as to generalize RWR to edge-labeled graphs. Through extensive experiments, we show that MuRWR predicts edge labels between nodes more accurately than existing models. Our main contributions are as follows:

  • Model. We propose MuRWR (Multi-Labeled Random Walk with Restart), a novel and accurate model for relation inference in edge-labeled graphs (Definition 4). MuRWR exploits a labeled random surfer who considers edge labels to compute effective relevance scores between nodes for each edge label. We show that MuRWR is a generalized version of RWR to edge-labeled graphs (Property 1).

  • Algorithm. We propose how to learn the surfer’s label from an edge-labeled graph (Lemma 1) and an iterative algorithm for our model (Algorithm 3). We also theoretically prove the convergence of the algorithm (Theorem 1).

  • Experiment. Our experiments show our model provides the most accurate performance for relation inference (Tables 3 and 4).

Figure 2

Examples of labeled walks and label transitive triangles. (a) shows how our labeled surfer walks along the path from s to t. The blue surfer is for childOf, the red one is for gclOf (grandchildOf), and the black one is unlabeled. In (a), a dashed line indicates that two disconnected nodes are related with a label inferred by the surfer. (b) and (c) are examples of label transitive triangles used for learning the rules on how to change the surfer’s label

The rest of the paper is organized as follows. We first discuss related work and its limitations in Section 2. Then, we provide preliminaries on a traditional random walk based model, and formally define edge-labeled graphs in Section 3. We describe our model and algorithms for computing our model scores in Section 4. After presenting experimental results in Section 5, we conclude in Section 6. The symbols used in this paper are in Table 1.

Table 1 Symbols

2 Related work

There are various traditional graph measures for relevance between nodes in edge-unlabeled graphs based on random walks, e.g., PageRank [29], HITS [13], and RWR [38]. Among these models, RWR has received much attention and has been applied to many graph mining tasks. However, RWR has a limitation in predicting the relation between two nodes in edge-labeled graphs since it does not consider edge labels in its relevance (details in Section 3).

Several techniques [11, 19] have been proposed to compute RWR in heterogeneous networks. These methods focus on how to determine the weights of edges by exploiting attributes in the networks, and then construct a transition matrix with the weights to compute RWR. However, they also cannot infer the relation between the nodes in edge-labeled graphs because they produce only one relevance score between two nodes, similarly to RWR.

For relation inference, we need to obtain K relevance scores for edge labels between two nodes if a graph has K edge labels.

Many researchers have recently made great efforts to apply RWR to measuring relevance between nodes in signed networks, a special type of edge-labeled graph represented by positive (trust) and negative (distrust) edges. Modified RWR (MRWR) [32] computes RWR as trust and distrust scores in the positive and negative subgraphs, respectively. Although the idea is applicable to edge-labeled graphs by computing RWR on each subgraph containing only a specific edge label, this leads to many disconnections between nodes; thus, MRWR is unable to exploit meaningful patterns from multi-hop paths. Signed RWR (SRWR) [12] measures trust and distrust scores between nodes based on random walks in signed networks. However, SRWR is inapplicable to relation inference on multiple edge labels since it is specialized to only two sign labels.

Two major approaches to relation inference for multiple edge labels are path feature models and translation-based models.

Path Ranking Algorithm (PRA) [16] is commonly used as a path feature model in heterogeneous networks. PRA extracts paths connecting two nodes, and exploits a random surfer to measure path probabilities which are used as features when predicting their relation. PRA, however, requires explicit path enumeration, which becomes computationally problematic for long paths. Although the authors presented heuristic pruning techniques, PRA’s inference is still restricted to short paths since path enumeration has a complexity exponential in the path length.

Translation based models such as TransE [4], TransR [24], and DistMult [41] have been widely utilized due to their simplicity and effectiveness for modeling relational data. They formulate the relation between two nodes as a translation between the corresponding node embeddings. However, those models consider only one directed edge at a time in training; hence, their reasoning is likely to miss the information provided by multi-hop paths between nodes.
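To make the translation idea concrete, the scoring form used by TransE can be sketched as follows (the random embeddings and the L1 norm choice here are purely illustrative assumptions, not a trained model):

```python
import numpy as np

# Sketch of the translation scenario behind TransE: a relation r acts as a
# translation vector between head and tail embeddings. The embeddings below
# are random, for illustration only.
rng = np.random.default_rng(0)
dim = 8
h, r, t = (rng.normal(size=dim) for _ in range(3))

def transe_score(h, r, t):
    """Lower ||h + r - t||_1 means the triple (h, r, t) is more plausible."""
    return np.linalg.norm(h + r - t, ord=1)

# A triple whose tail is exactly h + r gets a perfect (zero) score.
print(transe_score(h, r, h + r))  # 0.0
```

Note that the score is computed from one triple at a time, which is exactly why such models miss multi-hop path information.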

To resolve this issue, PTransE [23] and COMP [9] utilize multi-step paths under the translation embedding mechanism. However, these methods also suffer from the path enumeration problem, similarly to PRA; thus, they restrict the length of paths for relational reasoning to at most three steps.

Compared to the above methods, our proposed model MuRWR performs accurate multi-hop relational reasoning without complicated and heuristic path enumeration.

3 Preliminary

We provide preliminaries on relation inference in edge-labeled graphs, our target research task, and RWR in this section. Note that we use the terms edge label and relation interchangeably.

Edge-labeled Graphs

Many researchers from various application areas are confronted with labeled edges which encode diverse relations between entities in graphs [2, 15, 21, 27].

Formally, an edge-labeled graph is defined as follows.

Definition 1 (Edge-labeled Graph and Labeled Adjacency Matrix)

An edge-labeled graph G = (V,E,L) consists of the set V of nodes, the set E of directed edges, and the set L of edge labels. Let L = {l1,⋯ ,lK} where lk is k-th label, and K is the number of edge labels. For each edge \(u \rightarrow v \in \mathbf {E}\) such that u,vV, the edge is associated with an edge label luvL. The labeled adjacency matrix A of G is a sparse matrix such that Auv is luv if there is an luv-labeled edge from u to v, and 0 otherwise.
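As a minimal illustration of Definition 1 (the toy graph, node names, and labels are our own, not from the paper's datasets), the labeled adjacency matrix A can be stored sparsely as a map from edges to labels:

```python
# Sketch of Definition 1 (our own toy graph): an edge-labeled graph stored
# as a sparse labeled adjacency matrix, i.e., a map {(u, v): l_uv}.
L = ["childOf", "spouseOf"]           # edge labels l1, l2

A = {                                 # directed labeled edges, A[u][v] = l_uv
    ("s", "u"): "childOf",
    ("u", "v"): "childOf",
    ("v", "t"): "spouseOf",
}

def edge_label(A, u, v):
    """Return l_uv if the edge u -> v exists, and 0 otherwise (Definition 1)."""
    return A.get((u, v), 0)

print(edge_label(A, "s", "u"))   # childOf
print(edge_label(A, "u", "s"))   # 0 -- edges are directed
```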

Relation Inference Task

We describe the formal problem definition of relation inference handled in this paper.

Problem 1 (Relation Inference [16])

The relation inference task is defined as follows:

  • Given: an edge-labeled graph G = (V,E,L), and two disconnected nodes s and t in V,

  • Infer: the edge label from source node s to target node t among edge labels in L.

Note that Problem 1 differs from that of belief propagation (BP) [7, 14], which aims to classify labels on nodes rather than edges, while the goal of MuRWR is to predict labels on edges.

Random Walk with Restart

Given a graph and a source node s, RWR computes a relevance score between s and each node using a random surfer who starts from s. The surfer performs one of the following actions at each step:

  • Random Walk. The surfer randomly moves to one of its neighbors from the current node with probability 1 − c.

  • Restart. The surfer goes back to s with probability c.

The relevance score between s and node t is the stationary probability that the surfer is at t after moving around the entire graph from s.

Note that RWR considers multi-faceted paths between nodes [37] with restart probability c which controls how long paths affect the score between s and t. If c is high, the surfer frequently restarts from s; consequently, she mainly visits t via relatively short paths. Otherwise, the surfer is also able to reach t through longer paths as well as short ones.
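The two actions above can be sketched as a standard power iteration (the 4-node toy graph and the choice c = 0.15 are our own illustrative assumptions):

```python
import numpy as np

# Power-iteration sketch of plain RWR: random walk with prob. 1-c,
# restart to the source with prob. c. Toy graph and c are made up.
adj = np.array([[0, 1, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 0, 0, 0]], dtype=float)
P = adj / adj.sum(axis=1, keepdims=True)   # row-normalized transition matrix
c, s, n = 0.15, 0, adj.shape[0]

q = np.zeros(n); q[s] = 1.0                # all restart mass on source s
r = q.copy()
for _ in range(200):
    r = (1 - c) * P.T @ r + c * q          # random walk (1-c) + restart (c)

# r[t] approximates the stationary probability that the surfer is at t.
print(round(r.sum(), 6))   # 1.0 -- the scores form a distribution
```

Raising c makes the mass concentrate near s (short paths dominate), matching the discussion above.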

The main challenge is to devise how to reflect multiple edge labels in relevance scores between nodes in edge-labeled graphs. Although RWR is able to identify relevance scores between nodes, it depends only on the link structure of the graph without considering labels; therefore, it cannot estimate how nodes are related in terms of edge labels. Figure 1 shows an example representing family relationships such as childOf and spouseOf. The traditional surfer in RWR does not consider edge labels at all during its walks, even though the labeled edges should be interpreted differently. Thus, the information encoded in the path is ignored when measuring the relevance between nodes s and t; RWR cannot identify which edge label should be associated with nodes s and t. How can we make the surfer recognize edge labels to correctly predict their relation?

4 Proposed Method

We propose MuRWR (Multi-Labeled Random Walk with Restart), a novel random walk based model which measures relevance scores between a source node and other nodes for each edge label in an edge-labeled graph. The technical challenges and our main ideas are as follows:

  • How can we make a surfer consider the edge labels? We introduce a labeled random surfer whose label at a node indicates the relation from the source node to that node.

  • How can the surfer infer the relation between the nodes? We allow the surfer to change her label during random walks according to rules, enabling multi-hop reasoning.

  • How can we discover the rules? We exploit a data-driven approach to extract knowledge from the graph so that the surfer learns the rules.

In MuRWR, the surfer’s movement from source node s to a node t is regarded as a process of reasoning about the relation from s to t. Figure 2 depicts how MuRWR infers the relation from node s to node t. The labeled surfer has one of the edge labels such as childOf (blue-colored) or grandchildOf (red-colored) at each node except at s. Assume we have the following rules for the surfer:

  1. Rule 1

    If childOf-surfer moves along childOf-edge, the surfer’s label changes to grandchildOf.

  2. Rule 2

    If grandchildOf-surfer moves along spouseOf-edge, the surfer’s label remains the same, i.e., grandchildOf.

In Figure 2a, the surfer first starts from source node s without any label, and moves to node u. Then, she has the childOf label at u since nodes s and u are directly connected by a childOf-edge (note that her label indicates the relation between the source node s and the current node). After she moves to node v, her label changes to grandchildOf by the first rule. At this time, MuRWR infers the relation from s to v as grandchildOf as shown in Figure 2a. When the surfer finally arrives at node t, her label is still grandchildOf by the second rule, indicating that the relation from s to t is also inferred as grandchildOf. Thus, introducing a label to the surfer enables her to perform multi-hop relational reasoning by walking around the graph, provided that the surfer knows appropriate rules.

Then, how can we discover the rules for the surfer? For this, we exploit a data-driven approach, which extracts knowledge embraced in the given graph. The knowledge that the surfer learns is represented as labeled transitive triangles as in Figure 2b and c. A transitive triangle is interpreted as a syllogistic knowledge, e.g., Figure 2b implies

$$ \underbrace{\text{\texttt{childOf}(\textit{x}, \textit{y})}}_{\text{current surfer's label}} \!\! \wedge \underbrace{\text{ \texttt{childOf}(\textit{y}, \textit{z})}}_{\text{edge label}} \Rightarrow \underbrace{\text{\texttt{grandchildOf}(\textit{x}, \textit{z})}}_{\text{next surfer's label}}. $$

An intuitive example of the surfer’s learning is as follows. Suppose an untrained surfer (black-colored) is at node x in Figure 2b. The surfer then has the childOf label at node y since x and y are directly related by that label. Similarly, she has the grandchildOf label at node z. This is interpreted as: if a childOf-surfer moves along a childOf-edge, her label changes to grandchildOf. Thus, we enumerate transitive triangles from the graph, and use them as training instances, called label transition observations, for the surfer.

We first describe the details of label transition observations in Section 4.1, and explain how to learn rules for our labeled surfer from the observations in Section 4.2. Then, we formally define and formulate our model MuRWR in Sections 4.3 and 4.4, respectively. We present the algorithms for computing MuRWR in Section 4.5, and prove its convergence.

4.1 Label transition observation

We describe how to collect observations for learning the surfer’s rules from the graph G. We first define label transition observation as follows:

Definition 2 (Label Transition Observation)

A label transition observation \(l_{i} \! \xrightarrow {l_{k}} \! l_{j}\) is that when li-labeled surfer moves along lk-labeled edge, her label changes to lj.

A label transition observation is obtained from a label transitive triangle interpreted as a transition between edge labels. For example, suppose an untrained surfer (black-colored) is at node x in Figure 3a. When the surfer moves to node y, her label should be li because x and y are directly related by label li. Similarly, her label is lj at node z. Then we observe how the surfer’s label changes as in Figure 3c: when the li-labeled surfer moves along an lk-labeled edge, her label changes to lj, yielding the label transition observation \(l_{i} \! \xrightarrow {l_{k}} \! l_{j}\) of Definition 2. To collect such observations, we enumerate all transitive triangles in the graph G using an efficient triangle enumeration algorithm [17].
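A minimal sketch of this collection step follows (the three-node toy graph is our own; the paper uses the efficient enumeration algorithm of [17], whereas this brute-force scan is only for illustration):

```python
from collections import defaultdict

# Brute-force sketch of collecting label transition observations from
# transitive triangles x -> y -> z with a closing edge x -> z.
edges = {            # (u, v) -> edge label index: 0 = childOf, 1 = grandchildOf
    ("x", "y"): 0,
    ("y", "z"): 0,
    ("x", "z"): 1,
}

out = defaultdict(dict)
for (u, v), l in edges.items():
    out[u][v] = l

# N[l_k][(l_i, l_j)] counts observations l_i --l_k--> l_j (Definition 2).
N = defaultdict(lambda: defaultdict(int))
for x in list(out):
    for y, l_i in out[x].items():               # x -> y carries l_i
        for z, l_k in out.get(y, {}).items():   # y -> z carries l_k
            if z in out[x]:                     # closing edge x -> z carries l_j
                N[l_k][(l_i, out[x][z])] += 1

print(dict(N[0]))   # {(0, 1): 1}: childOf --childOf--> grandchildOf
```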

Figure 3

Example of how to obtain label transition observations from label transitive relationships. (a) presents a label transitive relationship. (b) shows how to interpret the triangle to obtain the label transition observation \(l_{i} \! \xrightarrow {l_{k}} \! l_{j}\) in (c)

In this paper, we exploit only transitive triangles for learning the surfer’s rules. Although other types of structures might be used for this purpose, it is not straightforward to extract rules from arbitrary subgraphs. Even if we could discover some rules from such subgraphs, it is not efficient and scalable to enumerate all of them from a graph. On the other hand, triangles are easily interpreted as in Figure 3b, and efficiently enumerated. We leave the problem of considering other types of subgraphs for the surfer’s rules as future work.

4.2 Learning label transition probabilities

We explain how to learn the surfer’s rules from the label transition observations. A label transition probability is a conditional probability P(lj|li,lk) such that when li-labeled surfer moves along lk-labeled edge, her label changes to lj. For brevity, let \(P(l_{i} \! \xrightarrow {l_{k}} \! l_{j})\) denote P(lj|li,lk). We formally define label transition probability matrix as follows:

Definition 3 (Label Transition Probability Matrix)

Let \(\mathbf {S}_{k} \in \mathbb {R}^{K \times K}\) be a label transition probability matrix on edge label lk. The (i,j)-th entry Skij of Sk indicates the label transition probability that the surfer’s label changes from li to lj through edge label lk, i.e., \(\mathbf {S}_{kij}=P(l_{i} \! \xrightarrow {l_{k}} \! l_{j})\). Note that \({\sum }_{j=1}^{K}P(l_{j} | l_{i} , l_{k}) = {\sum }_{j=1}^{K}\mathbf {S}_{kij} = 1\).

Given label transition observations, we aim to learn label transition probabilities Skij maximizing a likelihood function of making the observations. Suppose we are given sets \(\mathcal {D}_{k}\) of label transition observations represented as follows:

$$ \mathcal{D}_{k} = \{\mathbf{x}_{k1}, \mathbf{x}_{k2}, \cdots, \mathbf{x}_{kn_{k}}\} $$

where nk is the number of observations in \(\mathcal {D}_{k}\), and xkh is a label transition observation \(\mathbf {x}_{khs}\xrightarrow {l_{k}}\mathbf {x}_{kht}\) with xkhs,xkht ∈ L, where xkhs is the surfer’s source label and xkht is her destination label. Let P(xkh;Sk) denote the probability of xkh with regard to the parameter Sk, i.e., \(P(\mathbf{x}_{kh};\mathbf{S}_{k}) = \mathbf {S}_{k\mathbf{x}_{khs}\mathbf{x}_{kht}}\). Then, the log-likelihood function \(L(\mathbf {S}_{k};\mathcal {D}_{k})\) is represented as follows:

$$ L(\mathbf{S}_{k};\mathcal{D}_{k}) = \log{\prod\limits_{h=1}^{n_{k}}P(\mathbf{x}_{kh};\mathbf{S}_{k})} = \sum\limits_{h=1}^{n_{k}} \log{P(\mathbf{x}_{kh};\mathbf{S}_{k})} $$

However, maximizing the classical likelihood \(L(\mathbf {S}_{k};\mathcal {D}_{k})\) would be unsatisfactory since it is sensitive to noise and outliers in the observations, and many networks such as knowledge graphs are noisy and incomplete [6]. Considering this issue, we adopt maximum weighted likelihood estimation (MWLE) [40], which uses weights to vary the importance of \(\log {P(\mathbf {x}_{kh};\mathbf {S}_{k})}\) for each observation. For this purpose, we define destination label weights {wj|1 ≤ jK} that weigh the importance of \(\log {P(\mathbf {x}_{kh};\mathbf {S}_{k})}\) according to the destination label xkht. The intuition is that destination labels play a key role in relation inference since the relation between a source node and a destination node u is determined by the surfer’s (destination) label when she arrives at u after starting from the source node. Thus, our weighted log-likelihood function is defined as follows:

$$ WL(\mathbf{S}_{k};\mathcal{D}_{k}) = \sum\limits_{h=1}^{n_{k}}w_{\mathbf{x}_{kht}}\log{P(\mathbf{x}_{kh};\mathbf{S}_{k})} $$
(1)

The following lemma provides the result of MWLE on the weighted log-likelihood function \(WL(\mathbf {S}_{k};\mathcal {D}_{k})\).

Lemma 1 (Maximum Weighted Likelihood Estimation for Label Transition Probability Matrices)

The following estimator \(\hat {\mathbf {S}}_{kij}\) maximizes the weighted log-likelihood function \(WL(\mathbf {S}_{k};\mathcal {D}_{k})\) in (1):

$$ \hat{\mathbf{S}}_{kij} = \frac{w_{j}N_{kij}}{\sum\limits_{z=1}^{K}w_{z}N_{kiz}} $$
(2)

where \(N_{kij} ={\sum }_{h=1}^{n_{k}}\mathbf {1}(\mathbf {x}_{khs} = i, \mathbf {x}_{kht} = j)\) is the count for the label transition observations \(l_{i} \! \xrightarrow {l_{k}} \! l_{j}\), and 1(⋅) returns 1 if a given predicate is true, or 0 otherwise.

Proof

See the proof in Lemma 2 of Appendix A. □

Lemma 1 indicates that \(\hat {\mathbf {S}}_{kij}\) is determined by the label weight wj and the count Nkij of the observations. Suppose all label weights {wj} are fixed to 1 (i.e., no label weights). Then \(\hat {\mathbf {S}}_{kij}\) depends only on the given observations. The label weights {wj} are interpreted as the relative importance of the labels in the label transition probabilities \(\hat {\mathbf {S}}_{kij}\). If we set wj to a high value, the probability that the surfer’s label changes to label lj increases relative to the other labels during her random walk. We select the {wj} that provide the best performance on the validation sets, as described in Section 5.2.
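The estimator in (2) is simply a weighted row normalization of the observation counts; a minimal sketch with made-up counts and weights:

```python
import numpy as np

# Sketch of the MWLE estimator (2): weight the counts N_kij by the
# destination-label weights w_j and row-normalize. Counts and weights
# below are made up for illustration.
N_k = np.array([[4.0, 6.0],    # observations starting from label l1
                [2.0, 8.0]])   # observations starting from label l2
w = np.array([1.0, 1.0])       # uniform weights: MWLE reduces to plain MLE

S_k = (w * N_k) / (w * N_k).sum(axis=1, keepdims=True)

print(S_k[0])                  # [0.4 0.6]
print(S_k.sum(axis=1))         # [1. 1.] -- each row is a distribution
```

Setting, e.g., w = [1, 2] would shift probability mass toward destination label l2, as described above.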

4.3 Multi-Labeled Random Walk with Restart

We describe our proposed model Multi-Labeled Random Walk with Restart (MuRWR) in the following definition:

Definition 4 (Multi-Labeled Random Walk with Restart)

A labeled random surfer has a label li among the K edge labels at each node except at source node s. The surfer starts from source node s without any label. Suppose the surfer is currently at node u, and c is the restart probability with 0 < c < 1. Then, the surfer performs one of the following actions at each step:

  • Multi-Labeled Random Walk. The surfer randomly moves from node u to a neighboring node v with probability 1 − c through luv-labeled edge. If her label was li at node u, then her label changes to label lj at node v according to label transition probability \(P(l_{i} \! \xrightarrow {l_{uv}} \! l_{j})\).

  • Restart. The surfer restarts at source node s with probability c. The surfer becomes unlabeled.

MuRWR measures K probabilities at each node. Let Rui denote the probability that the li-labeled surfer is at node u after MuRWR from s. If Rui is higher than Ruj for ji, the li-labeled surfer frequently visits node u, implying that source node s is highly related to node u by edge label li. Thus, Rui is used as the relevance score between s and u for label li.

Our model MuRWR also considers multi-faceted paths between s and u, which is controlled by the restart probability c. If c is low, the labeled surfer visits u via paths of various lengths since the surfer prefers random walks to restart. On the other hand, if c is high, the surfer mainly visits u through relatively short paths due to frequent restarts. We will empirically study the effect of c on relation inference in the Experiment section (see Figure 6).

Note that when the unlabeled surfer moves from source node s to node u, her label becomes lsu since the two nodes are directly related by label lsu, as depicted in Figure 2a. To make the notation consistent, we add a dummy label ld indicating that the surfer is unlabeled at source node s. Then we set the label transition probability \(P(l_{d} \! \xrightarrow {l_{su}} \! l_{su})\) to 1 for each out-neighbor u of source node s (line 1 in Algorithm 3).

4.4 Formulation for MuRWR

We present formulations of MuRWR before proposing an algorithm for it. We derive a recursive equation for the probabilities Rui (MuRWR scores).

Element-wise Representation

The MuRWR scores at a node are recursively determined by the in-neighbors of the node. Figure 4 shows an example of the formulation of MuRWR probabilities, depicting part of a graph having two edge labels l1 and l2. In Figure 4, we mark an (edge label, probability) pair on each edge, where the probability is that of the surfer choosing the corresponding edge.

Figure 4

Example of the formulation for the probability \(\mathbf {R}_{u1}^{(t)}\) that l1-labeled surfer visits node u at time t

Let \({\mathbf {R}}_{u1}^{(t)}\) denote the probability that the l1-labeled surfer visits node u at time t after starting from source node s. In order for the surfer to visit node u with label l1 at time t, her label should change into l1 when she moves to node u from one of u’s in-neighbors. For example, when she moves from node v to node u, her label changes to label l1 with probability \({\mathbf {R}}_{v1}^{({t-1})} P(l_{1} \! \xrightarrow {l_{1}} \! l_{1}) + {\mathbf {R}}_{v2}^{({t-1})} P(l_{2} \! \xrightarrow {l_{1}} \! l_{1})\). Considering the restart action with probability c, \({\mathbf {R}}_{u1}^{(t)}\) is determined as:

$$ \begin{array}{@{}rcl@{}} {\mathbf{R}}_{u1}^{(t)} &=& (1-c) \left[\frac{1}{2} \left( {\mathbf{R}}_{v1}^{(t-1)} P(l_{1} \! {\xrightarrow{l_{1}}} \! l_{1}) + {\mathbf{R}}_{v2}^{(t-1)} P(l_{2} \! {\xrightarrow{l_{1}}} \! l_{1})\right)+\right. \\ &&\underbrace{\left.\frac{1}{3}\left( {\mathbf{R}}_{w1}^{(t-1)} P(l_{1} \xrightarrow{l_{2}} l_{1}) + {\mathbf{R}}_{w2}^{(t-1)} P(l_{2} \xrightarrow{l_{2}} l_{1})\right)\right]}_{\text{Labeled Random Walk}}\\ &&+ c\underbrace{\mathbf{1}(u=s, l_{1}=l_{d})}_{\text{Restart}} \end{array} $$

where 1(⋅) returns 1 if a given predicate is true, or 0 otherwise. Note that \({\mathbf {R}}_{u2}^{(t)}\) is also determined similarly to the above equation.

The general equation for \({\mathbf {R}}_{ui}^{(t)}\) is represented as:

$$ \begin{array}{ll} {\mathbf{R}}_{ui}^{(t)} &= (1-c) {\sum}_{v \in \overleftarrow{\mathbf{N}}_{u}}\left( \frac{1}{\lvert \overrightarrow{\mathbf{N}}_{v} \rvert} {\sum}_{j=1}^{K} {\mathbf{R}_{vj}^{(t-1)}}P(l_{j} \! \xrightarrow{l_{vu}} \! l_{i})\right) \\ &+ c\mathbf{1}(u=s, l_{i}=l_{d}) \end{array} $$
(3)

where \(\overleftarrow {\mathbf {N}}_{u}\) is the set of in-neighbors of node u, and \(\overrightarrow {\mathbf {N}}_{v}\) is the set of out-neighbors of node v. Note that \({\mathbf {R}}_{ui}^{(t)}\) is the accumulated result of MuRWR up to step t with decaying factor 1 − c, interpreted similarly to PageRank or RWR.

Matrix Representation

We represent (3) in a matrix form using symbols in the following definitions:

Definition 5 (Labeled Semi-adjacency Matrix)

The lk-labeled semi-adjacency matrix Ak is a matrix such that the (u,v)-th entry Akuv of Ak is 1 if the label of the edge \(u \rightarrow v\) is lk, or 0 otherwise.

Definition 6 (Labeled Semi-row-normalized Matrix)

Let D be the out-degree diagonal matrix of a graph G where Duu is the out-degree of node u. Then lk-labeled semi-row-normalized matrix \(\tilde {\mathbf {A}}_{k}\) is defined by \(\tilde {\mathbf {A}}_{k} = \mathbf {D}^{-1}\mathbf {A}_{k}\). In other words, \(\tilde {\mathbf {A}}_{kuv} = |\overrightarrow {\mathbf {N}}_{u}|^{-1}=\mathbf {D}^{-1}_{uu}\) if there is lk-labeled edge from node u to node v in the graph G, or 0 otherwise, where \(\overrightarrow {\mathbf {N}}_{u}\) is the set of out-neighbors of node u.
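Definitions 5 and 6 can be sketched as follows (the three-node, two-label toy graph is our own); note that each \(\tilde{\mathbf{A}}_{k}\) is only "semi"-row-normalized: their sum, not each one alone, is row-stochastic:

```python
import numpy as np

# Sketch of Definitions 5 and 6 on a 3-node, 2-label toy graph (ours).
# label[u, v] holds l_uv as 1 or 2, and 0 when there is no edge.
n, K = 3, 2
label = np.array([[0, 1, 2],
                  [0, 0, 1],
                  [1, 0, 0]])

A = [(label == k + 1).astype(float) for k in range(K)]    # A_1, A_2
deg = sum(A).sum(axis=1, keepdims=True)                   # out-degrees D_uu
A_tilde = [np.divide(Ak, deg, out=np.zeros_like(Ak), where=deg > 0)
           for Ak in A]                                   # A~_k = D^{-1} A_k

# Each A~_k alone is only *semi*-row-normalized; their sum is row-stochastic.
print(sum(A_tilde).sum(axis=1))   # [1. 1. 1.]
```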

Based on Definitions 3, 5 and 6, (3) is rewritten as follows:

$$ \mathbf{R}^{(t)} = (1-c)\underbrace{\sum\limits_{k=1}^{K}\tilde{\mathbf{A}}_{k}^{\top}\mathbf{R}^{(t-1)}\mathbf{S}_{k}}_{\text{Labeled Random Walk}} + c\underbrace{\mathbf{Q}_{s}}_{\text{Restart}} $$
(4)

where \(\mathbf {R}^{(t)} \in \mathbb {R}^{n \times K}\) is an MuRWR score matrix such that each entry \({\mathbf {R}}_{ui}^{(t)}\) is a score between source node s and node u for edge label li at step t, and \(\mathbf {Q}_{s}\in \mathbb {R}^{n \times K}\) is a single entry matrix whose (s,d)-th entry is 1, and other entries are 0, where the index d is for the dummy label ld. We provide the detailed derivation from (3) to (4) in Lemma 3 of Appendix B.


Note that MuRWR is a generalized version of RWR, i.e., MuRWR works on edge-labeled graphs as well as plain graphs without edge labels, which is proved in the following.

Property 1

MuRWR produces the same result as RWR on a plain graph.

Proof

This case is equivalent to the number K of edge labels being 1. Then, R(t) and Qs are written as vectors r(t) and qs in \(\mathbb {R}^{n \times 1}\), respectively, where n is the number of nodes. Since K = 1, there is only one type of labeled transitive triangle; thus, \(\mathbf {S}_{1} = 1 \in \mathbb {R}^{1 \times 1}\). Then, (4) is exactly the same as \(\mathbf {r}^{(t)} = (1-c) \tilde {\mathbf {A}}^{\top } \mathbf {r}^{(t-1)} + c\mathbf {q}_{s}\), the recursive equation of RWR [38]. □

4.5 Algorithm for MuRWR

Learning Phase (Algorithm 1)

Given a labeled adjacency matrix A, the learning phase learns the label transition probability matrices Sk. We enumerate all transitive triangles from the graph represented by A (line 1) using a triangle enumeration algorithm [17]. Based on the enumerated transitive triangles, we estimate the label transition matrices Sk using (2) (lines 2\(\sim \)8).

Normalization Phase (Algorithm 2).

This phase produces the semi-row-normalized matrices \(\tilde {\mathbf {A}}_{k}\) for 1 ≤ kK from the labeled adjacency matrix A after building semi-adjacency matrices Ak by Definitions 5 and 6 (lines 1\(\sim \)3).


Iteration Phase (Algorithm 3)

This phase computes the MuRWR relevance score matrix R w.r.t. source node s. We first set \(P(l_{d} \! \xrightarrow {l_{su}} \! l_{su})=1\) for each \(u \in \overrightarrow {\mathbf {N}}_{s}\) for the dummy label ld (line 1). After setting the starting matrix Qs and initializing R(0) to Qs (line 2), we repeat the update of R(t) based on (4) until convergence (lines 4\(\sim \)8). We compute the residual δ(t) between R(t) and R(t− 1), the result of the previous iteration (line 7), where δ(t) is measured by the entry-wise L1 matrix norm, i.e., \(\| \mathbf {A} \|_{1,1} = {\sum }_{i,j}|\mathbf {A}_{ij}|\). The iteration stops when δ(t) is smaller than the error tolerance 𝜖.
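Putting the pieces together, the iteration phase (4) with the residual-based stopping rule can be sketched on made-up inputs (the toy matrices below are ours; a real run would use the matrices produced by Algorithms 1 and 2):

```python
import numpy as np

# End-to-end sketch of the iteration phase (4). K = 3 counts two edge
# labels plus the dummy label l_d; all matrices are illustrative.
n, K, c, eps = 3, 3, 0.15, 1e-9
s = 0                                    # source node

A_t = [np.array([[0, .5, 0], [0, 0, 1], [1, 0, 0.]]),   # A~_1
       np.array([[0, 0, .5], [0, 0, 0], [0, 0, 0.]]),   # A~_2
       np.zeros((n, n))]                                # no edge has label l_d

# Rows of each S_k sum to 1; the last row encodes P(l_d --l_k--> l_k) = 1.
S = [np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0.]]),
     np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0.]]),
     np.eye(K)]

Q = np.zeros((n, K)); Q[s, 2] = 1.0      # unlabeled (dummy) surfer at s
R = Q.copy()
for _ in range(1000):
    R_new = (1 - c) * sum(A_t[k].T @ R @ S[k] for k in range(K)) + c * Q
    delta = np.abs(R_new - R).sum()      # entry-wise L1 norm ||.||_{1,1}
    R = R_new
    if delta < eps:                      # stop below error tolerance
        break

print(round(R.sum(), 6))   # 1.0 -- the K scores per node form a distribution
```

Because the rows of each S_k sum to 1 and the summed A~_k is row-stochastic here, the total probability mass stays 1 at every step.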

Convergence Analysis

We prove the convergence guarantee of the iterative method (Algorithm 3) of MuRWR in Theorem 1.

Theorem 1 (Convergence of MuRWR)

Suppose r(t) = vec(R(t)) and qs = vec(Qs) where R(t) is the MuRWR score matrix at step t in Algorithm 3, and vec(⋅) is the vec-operator which converts a matrix into a vector [18]. Then the residual δ(t) ≤ 2(1 − c)t, and r(t) converges to \(\mathbf {r} = c(\mathbf {I} - (1-c)\tilde {\mathbf {B}}^{\top })^{-1}\mathbf {q}_{s}\) where \(\tilde {\mathbf {B}}^{\top } = {\sum }_{k=1}^{K}\mathbf {S}_{k}^{\top } \otimes \tilde {\mathbf {A}}_{k}^{\top }\) and ⊗ is the Kronecker product.

Proof

Equation (4) is vectorized by vec(⋅) as follows:

$$ \begin{array}{@{}rcl@{}} vec(\mathbf{R}^{(t)}) &=& (1-c)\sum\limits_{k=1}^{K}vec({\tilde{\mathbf{A}}^{\top}}_{k}\mathbf{R}^{(t-1)}\mathbf{S}_{k}) + c (vec(\mathbf{Q}_{s}) ) \\ &=& (1-c)\sum\limits_{k=1}^{K}(\mathbf{S}^{\top}_{k} \otimes {\tilde{\mathbf{A}}^{\top}}_{k})vec(\mathbf{R}^{(t-1)}) + c (vec(\mathbf{Q}_{s}) ) \\ \Leftrightarrow \mathbf{r}^{(t)} &=& (1-c)\tilde{\mathbf{B}}^{\top}\mathbf{r}^{(t-1)} + c\mathbf{q}_{s} \end{array} $$
(5)

where \(\tilde {\mathbf {B}}^{\top } = {\sum }_{k=1}^{K} {\mathbf {S}}_{k}^{\top } \otimes {\tilde {\mathbf {A}}}_{k}^{\top }\). The second equation is derived from the identity \(vec(\mathbf {ABC}) = (\mathbf {C}^{\top } \otimes \mathbf {A})vec(\mathbf {B})\) [18].

Residual analysis. The residual δ(t) = ∥R(t)R(t− 1)1,1 is equal to ∥r(t)r(t− 1)1 since ∥A1,1 = ∥vec(A)∥1 [18]. Thus, the residual is bounded as follows:

$$ \begin{array}{@{}rcl@{}} \delta^{(t)} &=& \| \mathbf{r}^{(t)} - \mathbf{r}^{(t-1)} \|_{1} = \| (1-c)\tilde{\mathbf{B}}^{\top}\mathbf{r}^{(t-1)}\! - (1-c)\tilde{\mathbf{B}}^{\top}\mathbf{r}^{(t-2)}\|_{1} \\ &\leq& (1-c)\| \tilde{\mathbf{B}}^{\top} \|_{1} \| \mathbf{r}^{(t-1)} - \mathbf{r}^{(t-2)} \|_{1} \\ &\leq& (1-c)\| \mathbf{r}^{(t-1)} - \mathbf{r}^{(t-2)} \|_{1} \leq {\cdots} \\ &\leq& (1-c)^{t-1}\| \mathbf{r}^{(1)} - \mathbf{r}^{(0)} \|_{1} = (1-c)^{t}\|\tilde{\mathbf{B}}^{\top}\mathbf{q}_{s} - \mathbf{q}_{s}\|_{1} \leq 2(1-c)^{t} \end{array} $$

Note that \(\|\tilde {\mathbf {B}}^{\top }\mathbf {q}_{s} - \mathbf {q}_{s}\|_{1} \leq 2\) since \(\|\tilde {\mathbf {B}}^{\top }\mathbf {q}_{s} - \mathbf {q}_{s}\|_{1} \leq \|\tilde {\mathbf {B}}^{\top }\mathbf {q}_{s}\|_{1} + \|\mathbf {q}_{s}\|_{1}\leq \|\tilde {\mathbf {B}}^{\top }\|_{1}\|\mathbf {q}_{s}\|_{1} + 1 \leq 2\) where \(\|\mathbf {q}_{s}\|_{1} = 1\), and \(\| \tilde {\mathbf {B}}^{\top } \|_{1}\leq 1\) by Lemma 6. Hence, the bound for the residual is \(\delta ^{(t)} \leq 2(1-c)^{t}\).

Convergence analysis. Equation (5) is also represented as follows:

$$ \begin{array}{@{}rcl@{}} \mathbf{r}^{(t)} &=& (1-c)\tilde{\mathbf{B}}^{\top}\mathbf{r}^{(t-1)} + c\mathbf{q}_{s} \\ &=& \left( (1-c)\tilde{\mathbf{B}}^{\top}\right)^{2}\mathbf{r}^{(t-2)} + c\left( (1-c)\tilde{\mathbf{B}}^{\top} + \mathbf{I}\right)\mathbf{q}_{s} = {\cdots} \\ &=& \left( (1-c)\tilde{\mathbf{B}}^{\top}\right)^{t}\mathbf{r}^{(0)} + c\sum\limits_{j=0}^{t-1}\left( (1-c)\tilde{\mathbf{B}}^{\top}\right)^{j}\mathbf{q}_{s} \end{array} $$

The spectral radius \(\rho ((1-c)\tilde {\mathbf {B}}^{\top }) \leq (1-c) < 1\) when 0 < c < 1 since \(\rho (\tilde {\mathbf {B}}^{\top }) \leq 1\) according to Lemma 6 of Appendix C. Hence, \(\lim _{t \rightarrow \infty }((1-c)\tilde {\mathbf {B}}^{\top })^{t}\mathbf {r}^{(0)} = \mathbf {0}\) [35] and \(\lim _{t \rightarrow \infty }\mathbf {r}^{(t)}\) converges as:

$$ \underset{t \rightarrow \infty}{\lim} \mathbf{r}^{(t)} = c\sum\limits_{j=0}^{\infty}\left( (1-c)\tilde{\mathbf{B}}^{\top}\right)^{j}\mathbf{q}_{s} = c(\mathbf{I} - (1-c)\tilde{\mathbf{B}}^{\top})^{-1}\mathbf{q}_{s} $$
(6)

\({\sum }_{j=0}^{\infty }((1-c)\tilde {\mathbf {B}}^{\top })^{j}\) is a geometric series of the matrix \((1-c)\tilde {\mathbf {B}}^{\top }\), and it converges to \((\mathbf {I} - (1-c)\tilde {\mathbf {B}}^{\top })^{-1}\) since the spectral radius of \((1-c)\tilde {\mathbf {B}}^{\top }\) is less than 1 [35]. Note that \(\mathbf {I}-(1-c)\tilde {\mathbf {B}}^{\top }\) is invertible when the spectral radius \(\rho ((1-c)\tilde {\mathbf {B}}^{\top }) \leq (1-c) < 1\) [35]. □

Theorem 1 implies that as step t increases, the residual δ(t) in Algorithm 3 monotonically decreases, and converges to zero since 0 < c < 1 as specified in Definition 4. Also, r(t) converges to a unique solution \(\mathbf {r} = c(\mathbf {I} - (1-c)\tilde {\mathbf {B}}^{\top })^{-1}\mathbf {q}_{s}\).
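To make the iteration in (5) concrete, below is a minimal NumPy sketch of the fixed-point update \(\mathbf {R}^{(t)} = (1-c){\sum }_{k}{\tilde {\mathbf {A}}}_{k}^{\top }\mathbf {R}^{(t-1)}\mathbf {S}_{k} + c\mathbf {Q}_{s}\) with the entrywise L1 residual check. This is an illustrative sketch, not the paper's Algorithm 3; the names (`murwr_iterate`, `A_tilde`, `S`) are ours.

```python
import numpy as np

def murwr_iterate(A_tilde, S, Q_s, c=0.15, eps=1e-14, max_iter=1000):
    """Iterate R <- (1-c) * sum_k A_k^T @ R @ S_k + c * Q_s until the
    residual ||R^(t) - R^(t-1)||_{1,1} drops below eps.

    A_tilde: list of K (n x n) label-wise transition matrices
    S:       list of K (K x K) label score matrices
    Q_s:     (n x K) restart matrix for the source node
    """
    R = Q_s.copy()                            # r^(0) = q_s
    for t in range(max_iter):
        R_new = c * Q_s
        for A_k, S_k in zip(A_tilde, S):
            R_new += (1 - c) * (A_k.T @ R @ S_k)
        delta = np.abs(R_new - R).sum()       # entrywise L1 residual
        R = R_new
        if delta < eps:
            break
    return R, t + 1
```

Each iteration only needs sparse matrix-vector-style products, which is what makes this loop scale linearly in the graph size.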

According to Theorem 1, MuRWR scores in r = vec(R) can be directly obtained from (6). However, the matrix inversion of \(\mathbf {I} - (1-c)\tilde {\mathbf {B}}^{\top }\) requires \(O((Kn)^{3})\) time because the dimension of \(\tilde {\mathbf {B}}^{\top }\) is Kn × Kn, and matrix inversion takes cubic time (i.e., \(O(d^{3})\) for a d × d matrix) [35] where K is the number of edge labels and n is the number of nodes. Thus, the matrix inversion is not scalable, especially when n is large. On the other hand, our iterative approach in Algorithm 3 shows linear scalability w.r.t. m and n, i.e., \(O(T(Km + K^{3}n))\) where T is the number of iterations and m is the number of edges, which is much more efficient than the matrix inversion. We analyze the detailed time complexities of the algorithms of MuRWR in Appendix D.
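For a small toy graph, the closed form in (6) can be materialized directly, which also makes the scalability issue visible: the linear system below is Kn × Kn. A sketch under the same notation (the helper name `murwr_closed_form` is ours):

```python
import numpy as np

def murwr_closed_form(A_tilde, S, Q_s, c=0.15):
    """Solve r = c (I - (1-c) B~^T)^{-1} q_s directly, where
    B~^T = sum_k kron(S_k^T, A_k^T) is a (Kn x Kn) matrix.
    Cubic in Kn, so this is only feasible for small graphs."""
    n, K = Q_s.shape
    B_T = sum(np.kron(S_k.T, A_k.T) for A_k, S_k in zip(A_tilde, S))
    q_s = Q_s.flatten(order="F")             # vec-operator stacks columns
    r = np.linalg.solve(np.eye(K * n) - (1 - c) * B_T, c * q_s)
    return r.reshape((n, K), order="F")      # back to the score matrix R
```

The returned R satisfies the fixed-point equation \(\mathbf {R} = (1-c){\sum }_{k}{\tilde {\mathbf {A}}}_{k}^{\top }\mathbf {R}\mathbf {S}_{k} + c\mathbf {Q}_{s}\), so it agrees with the limit of the iterative method.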

5 Experiment

We perform experiments to answer the following questions:

  • Q1. Performance of Relation Inference (Section 5.2). How accurately does MuRWR predict edge labels between nodes compared to other existing methods?

  • Q2. Effects of Label Weights (Section 5.3). How do the label weights wj in MuRWR affect its predictive performance for the relation inference task?

  • Q3. Effects of Restart Probability (Section 5.4). How does the restart probability c in MuRWR affect the inference performance of MuRWR?

  • Q4. Convergence (Section 5.5). Does the iterative algorithm for MuRWR converge? How does the restart probability c affect the convergence behavior of the algorithm?

5.1 Experimental settings

Datasets

The datasets used for our experiments are summarized in Table 2. In the Epinions [25] and the Slashdot [15] datasets, users rate each other positively or negatively. The WN11 [33] and WN18 [3] datasets are from WordNet [27], a knowledge graph of words where an edge label is a relation between words. The WikiVote dataset is a signed network where users vote positively or negatively on their candidates [20]. The Advogato dataset is a social network where an edge label indicates a level of trust [26].

Table 2 Dataset statistics. n is the number of nodes, m is the number of edges, and K is the number of edge labels

Competitors

We compare our proposed model MuRWR to other existing methods which are categorized as follows:

  • Random: Random predicts the label of a test edge randomly, which provides the worst performance of each dataset.

  • LINE [36] and node2vec [8]: We exploit well-known embedding models LINE and node2vec as baseline methods, although they are not designed for relation inference. Since they are designed for plain networks, we first extract an embedding vector of each node using those methods in a given network without edge labels. Next, we convert each li-edge between two nodes into a training instance, i.e., the feature is the concatenation of the embeddings of those nodes, and the label is li. We perform multinomial logistic regression based on a softmax function to predict the edge label.

  • MRWR [32] and SRWR [12]:

    We compare MuRWR to MRWR and SRWR, which are RWR-based variants. Although MRWR is originally designed for signed networks, it is easy to make it work on edge-labeled graphs by computing RWR on each subgraph containing only a specific edge label. Note that SRWR cannot work on general edge-labeled graphs since it is designed only for signed networks (i.e., K = 2); hence, the results of SRWR for other edge-labeled graphs (i.e., K > 2) are omitted in Tables 3 and 4.

  • PRA [16]: PRA is a path feature model used for relation inference in edge-labeled graphs. PRA extracts path features between two nodes using a random surfer, and exploits those features with logistic regressors.

  • TransE [4] and TransR [24]: TransE is a translation-based method which considers the relation l between nodes s and t as a translation between the corresponding node embeddings. Specifically, TransE discovers embeddings s, l, and t minimizing f(s,l,t) = ∥s + l − t∥. Given nodes s and t, it predicts their relation l that minimizes f(s,l,t). TransR models entities and relations in distinct spaces, i.e., \(\mathbf {s}_{l} = \mathbf {s}\mathbf {M}_{l}\) and \(\mathbf {t}_{l} = \mathbf {t}\mathbf {M}_{l}\) where Ml is the projection matrix for relation l. Then, TransR minimizes \(f(s,l,t) = \|\mathbf {s}_{l} + \mathbf {l} - \mathbf {t}_{l}\|\).

  • DistMult [41]: DistMult is a general framework modeling triples (s,l,t); it unifies existing neural tensor and translating models for relational learning.

  • PTransE [23]: PTransE extends TransE, and models relational paths between entities. Suppose entities s and t are connected by a path \(s \xrightarrow {l_{1}}e_{1} \xrightarrow {l_{2}}t\). PTransE regards this path as another relation composed of l1 and l2 (i.e., p = (l1,l2)), and minimizes f(s,p,t) = ∥s + (l1 ∘ l2) − t∥ where ∘ is an operation such as addition or multiplication to join these relations. PTransE exploits a network flow technique to extract relational paths to be trained.

  • COMP [9]: COMP is a unified framework to model relational paths by generalizing existing embedding approaches such as TransE and bilinear models. COMP samples relational paths by performing random walks on an edge-labeled graph.
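As a concrete illustration of the translation-based scoring used by the competitors above, here is a minimal TransE-style relation prediction. The embeddings are placeholders; in practice they are learned by minimizing f(s,l,t) over training triples, which this sketch omits.

```python
import numpy as np

def predict_relation(s_emb, t_emb, rel_embs):
    """TransE-style inference: choose the relation l minimizing
    f(s, l, t) = ||s + l - t|| over all candidate relation embeddings."""
    scores = [np.linalg.norm(s_emb + l_emb - t_emb) for l_emb in rel_embs]
    return int(np.argmin(scores))
```

Note that this scoring looks only at the two endpoint embeddings, which is why plain TransE does not exploit the paths between the nodes.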

Table 3 Performance of relation inference in terms of accuracy. The best method is in bold, and the second best method is in italic. Our proposed model MuRWR (marked ‡) shows the best performance in accuracy
Table 4 Performance of relation inference in terms of macro F1-score. The best method is in bold, and the second best method is in italic. Our proposed model MuRWR (marked ‡) shows better inference performance than other methods on most datasets in terms of macro F1-score

5.2 Relation inference task

We evaluate our proposed model MuRWR on a relation inference task defined as follows: given an edge-labeled graph containing edges whose labels are missing, predict those edge labels. We randomly select 500 source nodes and choose 20% of out-going edges from each source node as a validation set, which is used for selecting proper hyper-parameters of each method. For a test set, we randomly select another 500 source nodes and choose 20% of out-going edges from each source node. We then remove each selected edge \(s \rightarrow t\), and predict the edge’s label using relevance scores w.r.t. source node s, i.e., the predicted label \(\hat {l}\) is decided as follows: \(\hat {l}=\arg \max \limits _{1\leq k \leq K}\mathbf {R}_{tk}\). We measure the prediction accuracy for the test dataset, which is defined as follows: accuracy = # of correct predictions / # of test edges. This task can also be considered as multi-class classification (i.e., classifying the label of a test edge); thus, we also measure macro F1-score, a standard metric for multi-class classification [34]. We repeat the above procedure 10 times, and report the average accuracy over the multiple runs for each method.
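The prediction rule and metrics described above can be sketched as follows; `macro_f1` is our own straightforward re-implementation of the standard definition, not the authors' evaluation code.

```python
import numpy as np

def predict_labels(R, test_targets):
    """Predict the label of each removed edge s -> t as argmax_k R[t, k]."""
    return [int(np.argmax(R[t])) for t in test_targets]

def accuracy(pred, true):
    """# of correct predictions / # of test edges."""
    return sum(p == y for p, y in zip(pred, true)) / len(true)

def macro_f1(pred, true, K):
    """Unweighted mean of per-class F1 over the K edge labels."""
    f1s = []
    for k in range(K):
        tp = sum(p == k and y == k for p, y in zip(pred, true))
        fp = sum(p == k and y != k for p, y in zip(pred, true))
        fn = sum(p != k and y == k for p, y in zip(pred, true))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / K
```

Macro averaging weights each edge label equally, so rare labels count as much as frequent ones, unlike plain accuracy.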

For the experiment, we select label weights wj which provide the best inference performance on the validation set. To search for the label weights, we adopt the forward stepwise selection strategy [28]. Suppose we consider three weights w1, w2, and w3, initialized to 1. For w1, we fix the other weights w2 and w3, and choose the value of w1 which provides the best accuracy on the validation set when varying w1 in the range [0,2] with step size 0.2. With the selected value of w1, we fix w3, and repeat the above procedure for w2. We finally perform the stepwise search for w3 with the selected values of w1 and w2. For the other methods, we select proper hyper-parameters via a grid search, choosing those which provide the best accuracy on the validation set. For PRA, PTransE, and COMP, we set the maximum path length to 3. Note that we cannot perform relation inference using relevance scores measured by standard RWR, as described in Section 1.
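The forward stepwise weight search can be sketched as a greedy coordinate search. In this sketch, `validate` is a hypothetical callback (our own abstraction, not the authors' code) that runs MuRWR with the candidate weights and returns validation accuracy.

```python
import numpy as np

def stepwise_select(validate, K, lo=0.0, hi=2.0, step=0.2):
    """Greedy forward stepwise search: optimize each weight w_j in turn
    on the grid [lo, hi] while holding all other weights fixed."""
    w = [1.0] * K                           # all weights start at 1
    grid = np.arange(lo, hi + 1e-9, step)   # include the upper endpoint
    for j in range(K):
        best = max(grid, key=lambda v: validate(w[:j] + [v] + w[j + 1:]))
        w[j] = float(best)
    return w
```

This requires only K passes over an 11-point grid instead of an exhaustive 11^K grid search, at the cost of ignoring interactions between weights.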

Tables 3 and 4 show the performance of MuRWR compared to other methods. Our model MuRWR provides the best accuracy for predicting edge labels: MuRWR obtains \(0.7\sim 6.1\%\) relative improvement in accuracy over the second best method. Also, the performance of MuRWR is higher than that of other methods in terms of macro F1-score on most datasets. MuRWR outperforms the second best method by \(0.5 \sim 4\%\) on all datasets except the Slashdot and Epinions datasets. For these two datasets, although the F1-score of DistMult is higher than that of MuRWR, our method exhibits the second best performance.

These results indicate that relevance scores computed by MuRWR are effective for predicting edge labels.

5.3 Effects of label weights in MuRWR

We examine the impact of label weights in MuRWR on the performance of the relation inference task. We use MuRWR-F (a version of MuRWR where all weights {wj} are fixed to 1) and MuRWR using the forward stepwise strategy as described above.

We measure the performance of the relation inference task in terms of accuracy and F1-score.

As seen in Figure 5, the inference performance of MuRWR is better than that of MuRWR-F in terms of both accuracy and F1-score.

Figure 5
figure 5

Effect of the label weights in MuRWR. The performance of MuRWR is better than that of MuRWR-F (i.e., MuRWR where all label weights are fixed to 1), verifying that introducing the weights improves the performance of MuRWR

This indicates that adjusting the label weights is helpful for the performance of MuRWR in the relation inference task.

5.4 Effects of restart probability in MuRWR

We investigate the effect of the restart probability c of MuRWR. We measure the inference performance of MuRWR in terms of accuracy, varying c from 0.01 to 0.99. As shown in Figure 6, the performance of MuRWR with \(c = 0.15 \sim 0.3\) is better than that of MuRWR when c is too low or too high. Note that c controls how far the surfer walks from the source node s. If the value of c is high, the surfer frequently jumps back to the source node; thus, the relevance score between node s and a target node t is mainly determined by short paths from s to t, severely restricting the model's expressiveness. Hence, an extremely high value of c such as 0.99 has a bad influence on the performance of MuRWR. On the other hand, a very low value of c such as 0.01 also does not have a positive impact on relational reasoning since such a low value effectively ignores the source node. With a proper value of c between 0.15 and 0.3, MuRWR provides the best performance, and outperforms the other methods as shown in Tables 3 and 4.

Figure 6
figure 6

Effect of the restart probability c in MuRWR. The inference performance of MuRWR with \(c=0.15 \sim 0.3\) is better than that of MuRWR when c is too low or high

5.5 Convergence of MuRWR

We explore the convergence behavior of MuRWR described in Theorem 1. We vary the restart probability c between 0.1 and 0.9, and measure the residual δ(t) in Algorithm 3 until convergence with error tolerance 𝜖 = 10− 14. As seen in Figure 7, the residual δ(t) monotonically decreases for 0 < c < 1 as step t increases, and then, δ(t) finally becomes less than 𝜖. Another observation is that a high value of c accelerates the convergence rate for the residual δ(t), e.g., c = 0.9 makes MuRWR converge faster than c = 0.1. The reason is that by Theorem 1, δ(t) ≤ 2(1 − c)t; thus, the higher c is, the faster the residual δ(t) goes toward zero. Note that an extremely high value of c such as 0.99 degrades the performance of MuRWR as shown in Figure 6, while a too low value of c requires many iterations to converge as shown in Figure 7. A value of c between 0.15 and 0.3 provides a good trade-off between inference performance and the number of iterations to converge.
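The bound δ(t) ≤ 2(1 − c)t from Theorem 1 also gives a closed-form estimate of how many iterations suffice for a given tolerance, which matches the observation that a larger c converges faster. A small sketch:

```python
import math

def iterations_for_tolerance(c, eps=1e-14):
    """Smallest t with 2 * (1 - c)**t <= eps, from the residual bound
    delta^(t) <= 2 (1 - c)^t in Theorem 1."""
    return math.ceil(math.log(eps / 2) / math.log(1 - c))
```

For example, the estimate for c = 0.9 is far smaller than for c = 0.1, mirroring the convergence curves in Figure 7; since the bound is an upper bound, the actual iteration counts may be smaller.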

Figure 7
figure 7

Convergence of MuRWR. The residual δ(t) in Algorithm 3 monotonically decreases in the datasets for 0 < c < 1 where c is the restart probability in MuRWR

6 Conclusion

We propose MuRWR (Multi-Labeled Random Walk with Restart), a novel random walk based model which accurately infers edge labels between nodes by computing relevance scores between a source node and other nodes for each edge label in edge-labeled graphs. We introduce a labeled random surfer to consider labeled edges for multi-hop relational reasoning. We provide a learning procedure based on label transitive relationships inherent in a given graph, and theoretically analyze our method including its convergence. We also show that MuRWR is a generalized version of RWR.

Experiments show that MuRWR gives the best accuracy in the relation inference task. Future work includes learning the label weights of MuRWR from a given edge-labeled graph, and extending the method to graphs with complex node labels.