Keywords

1 Introduction

Various disciplines are now able to capture different level of interactions between entities of their interest, which promotes multiple types of relationships within data. Examples include social networks [8, 17], biological networks [3, 9], transportation networks [1, 4], etc. Multi-relational graphs are convenient for representing such complex network-structured data. Recent years have witnessed a strong line of relational learning studies focusing on the inference of node-level and graph-level categorical features [6]. Most of these are working on simple graphs and there has been little interest in the regression of continuous node features across the graph. In particular, node regression on multi-relational graphs still remains unexplored.

In this study, we present a multi-relational node regression framework. Given multi-relational structure of data and partially observed continuous features belonging to the data entities, we aim at completing missing features. It is possible to encode intrinsic structure of the data by a graph accommodating multiple types of directed edges between graph’s nodes that represent the data entities. Accordingly, we establish the main research question we address: How can we achieve node-value imputation on a multi-relational and directed graph? For this purpose, we propose an algorithm which propagates observed set of node features towards missing ones across a multi-relational and directed graph. We take inspiration from the well-known label propagation algorithm [20] aiming at completing categorical features across a simple, weighted graph. We see that simple neighborhood aggregations operated on a given relational structure hold the basis for many iterative graph algorithms including the label propagation. Thus, we first break down the propagation framework by the neighborhood aggregations derived through a simple local generative model. Later, we extend this by incorporating a multi-relational neighborhood and suggest a relational local generative model. Then, we build our algorithm, which we call multi-relational propagation (MrP), by iterative neighborhood aggregation steps originating from this new model. We provide the derivation of the parameters of the proposed model, which can be estimated over the observed set of node features. Our method can be considered as a sophisticated version of the standard propagation algorithm by enabling regression of continuous node features over a multi-relational and directed graph. We compare our multi-relational propagation method against the standard propagation in several node regression scenarios. At each, our approach enhances the results considerably by integrating multi-relational structure of data into the regression framework.

Fig. 1.
figure 1

A fragment of a multi-relational and directed social network

Comparison to Existing Schemes. The node regression problem has been studied on simple graphs for signal inpainting [7, 14] and node representation learning [10, 12, 13, 18]. Many of these approaches implicitly employ a smoothness prior which promotes similar representations at the neighboring nodes of the graph [19]. The smoothness prior exploited in node representation learning studies broadly prescribes minimizing the Euclidean distance between features at the connected nodes. Throughout the paper, we refer to such prior as \(\ell _2\) sense smoothness. Despite its practicality, \(\ell _2\) sense smoothness prior suffers from several major limitations that might mislead regression on a multi-relational and directed graph. First, it treats all neighbors of a node equally while reasoning about the node’s state even though the neighbors connected via different types of relations might play different roles in the inference task. For instance, Fig. 1 illustrates multiple types of relationships that might arise between people. Here, each relation type presumably relies on a different affinity rule or a different level of importance depending on the node regression task. Second, some relation types are inherently symmetric, and some others are asymmetricFootnote 1. Euclidean distance minimization broadly assumes that values at neighboring nodes are as close as possible, which may not always be the case for asymmetric relationships. We thus depart from the straightforward \(\ell _2\) sense of smoothness and augment the prior with a relational local generative model.

Contributions. In this study, (i) we provide a breakdown of propagation algorithm on simple graphs from the Bayesian perspective, (ii) we introduce a relational local generative model, which permits neighborhood aggregation operation on a multi-relational, directed neighborhood, (iii) we propose a novel propagation framework MrP, which properly handles propagating observed continuous node features across a multi-relational directed graph and complete missing ones.

2 Propagation on Simple Graphs

We denote a simple, undirected graph by \(\mathcal {G}(\mathcal {V},\mathcal {E})\) with set of nodes \(\mathcal {V}\) and set of edges \(\mathcal {E}\). Also, we denote \(x_i \in \mathbb {R}\) as the continuous node featureFootnote 2 held by node-i.

Local Generative Model. We recall the smoothness prior prescribing the neighboring node representations to be as close as possible in terms of \(\ell _2\)-norm. Consequently, we write a simple local generative model which relates two neighboring nodes as \(x_i = x_j + \epsilon \) where \((i,j) \in \mathcal {E}\) and \(\epsilon \sim \mathcal {N}(0, \sigma ^2_{ij})\).

First-Order Bayesian Estimate of Node’s Value. The local generative model can be used to obtain an approximation of the node’s state in terms of its local neighborhood. This can be achieved by maximizing the expectation of the node’s feature given that of its first-hop neighbors:

$$\begin{aligned} \mathop {\mathrm {argmax}}\limits _{x_i} \quad \mathrm {p}(x_i| \{x_j : (i,j) \in \mathcal {E}\}) = \mathop {\mathrm {argmax}}\limits _{x_i} \quad \frac{\mathrm {p}(\{x_j : (i,j) \in \mathcal {E}\}|x_i)\mathrm {p}(x_i)}{\mathrm {p}(\{x_j : (i,j) \in \mathcal {E}\})}, \end{aligned}$$
(1)

where Bayes’ rule applies. Here, we make two assumptions. First, we assume that the prior distribution on the node features, \(\mathrm {p}(x_i) \, \forall i \in \mathcal {V}\), is uniform. Second, we only consider the partial correlations between the central node—whose state is to be estimated—and its first-hop neighbors while we neglect any partial correlation among the neighbor set—conditionally independence assumption. Accordingly, we reformulate the problem as

$$\begin{aligned} \mathop {\mathrm {argmax}}\limits _{x_i} \, \prod _{(i,j) \in \mathcal {E}}\mathrm {p}(x_j|x_i) \quad = \mathop {\mathrm {argmin}}\limits _{x_i} \, - \sum _{(i,j) \in \mathcal {E}}\mathrm {log}(\mathrm {p}(x_j|x_i), \end{aligned}$$
(2)

and rewrite it as minimization of negative log-likelihood. Next, we plug in the local generative model and obtain the following problem:

$$\begin{aligned} \mathop {\mathrm {argmin}}\limits _{x_i} \quad \sum _{(i,j) \in \mathcal {E}}\frac{\Vert x_j - x_i\Vert _2^2}{\sigma _{ij}^2}. \end{aligned}$$
(3)

Neighborhood Aggregation. The first-order Bayesian estimate boils down to minimizing the Euclidean distance between node’s feature to that of its neighbors, i.e., suggesting a least squares problem in (3). Its solution is simply found by setting the gradient of the objective to zero:

$$\begin{aligned} \hat{x}_i = \frac{\sum _{(i,j) \in \mathcal {E}} \omega _{ij} x_j}{\sum _{(i,j) \in \mathcal {E}} \omega _{ij}}, \end{aligned}$$
(4)

where \(\omega _{ij} = 1/\sigma _{ij}^2\). As seen, it is a linear combination of the neighbors’ features. The first-order Bayesian estimation in the conditions considered above clarifies the neighborhood aggregation operation accomplished in one iteration of a propagation algorithm [20]. Estimating the node states across the whole graph iteratively, a propagation algorithm expands the scope of the approximation beyond the first-order until a stopping criterion is satisfied. Hence, we summarize the pipeline for developing a propagation algorithm as given in Fig. 2.

Fig. 2.
figure 2

Overview of the pipeline for the development of a propagation algorithm

3 Multi-relational Model

We now introduce a multi-relational and directed graph as \(\mathcal {G}(\mathcal {V},\mathcal {E}, \mathcal {P})\), where \(\mathcal {V}\) is the set of nodes, \(\mathcal {P}\) is the set of relation types, \(\mathcal {E} \subseteq \mathcal {V} \times \mathcal {P} \times \mathcal {V}\) is the set of multi-relational edges. The function \(\mathtt {r}(i,j)\) returns the relation type \(\mathtt {p} \in \mathcal {P}\) that is pointed from node j to node i. If such a relation exists between them, yet pointed from the node i to the node j, the function returns the reverse as \(\mathtt {p}^{-1}\).

Relational Local Generative Model. It is required to diversify the simple local generative model by the set of relationships existing on a multi-relational graph. To this end, we propose the following local generative model for the node’s state given its multi-relational and directed neighbors:

$$\begin{aligned} x_i = \Bigg \{ \begin{array}{l} \eta _{\mathtt {p}} x_j + \tau _{\mathtt {p}} + \epsilon , \quad \forall \mathtt {r}(i,j) = \mathtt {p} \text { where } \epsilon \sim \mathcal {N}(0,\sigma _{\mathtt {p}}^2)\\ \frac{x_j}{\eta _{\mathtt {p}}} - \frac{\tau _{\mathtt {p}}}{\eta _{\mathtt {p}}} + \epsilon , \quad \forall \mathtt {r}(i,j) = \mathtt {p}^{-1} \text { where } \epsilon \sim \mathcal {N}(0,\frac{\sigma _{\mathtt {p}}^2}{\eta _{\mathtt {p}}^2}). \end{array} \end{aligned}$$
(5)

This model builds a linear relationship between neighboring nodes by introducing relation-dependent scaling parameter \(\eta \) and a shift parameter \(\tau \). The latter case in (5) indicates the generative model yielded by the reverse relation, where the direction of the edge is reversed. Such a linear model conforms both symmetric and asymmetric relationships. This is because it can capture any bias over a certain relation through parameter \(\tau \) or any change in scale through parameter \(\eta \). We note that the default set for these parameters are suggested as \(\tau = 0, \eta = 1\), which boils down to the simple local generative model.

First-Order Relational Bayesian Estimate. We now estimate the node’s state by its first-hop neighbors connected via multiple types relationships. We repeat the same assumptions as in (1), which casts the problem as maximizing the likelihood of node’s first-hop neighbors. Once the likelihood of relational neighbors is expressed through the model in (5), the estimation can be found by minimizing the negative log-likelihood as in (2). Consequently, we obtain the following objective:

$$\begin{aligned} \mathop {\mathrm {argmin}}\limits _{x_i} \, \sum _{\mathtt {p} \in \mathcal {P}}\Bigg ( \sum _{\mathtt {r}(i,j) = \mathtt {p}} \frac{\omega _\mathtt {p}}{2}\Big (x_i - \eta _\mathtt {p}x_j -\tau _\mathtt {p}\Big )^2 +\sum _{\mathtt {r}(i,j) = \mathtt {p}^{-1}} \frac{\omega _\mathtt {p}\eta _\mathtt {p}^2}{2}\Big (x_i - \frac{x_j}{\eta _\mathtt {p}} +\frac{\tau _\mathtt {p}}{\eta _\mathtt {p}}\Big )^2 \Bigg ), \end{aligned}$$
(6)

where we apply a change of parameter \(\omega _p =1/\sigma ^2_{\mathtt {p}}\).

Relational Neighborhood Aggregation. For an arbitrary node \(i \in \mathcal {V}\), we denote the loss to be minimized as \(\mathcal {L}_i\). Such a loss leads to a least squares problem whose solution satisfies \(\frac{\partial \mathcal {L}_i}{\partial x_i}(\hat{x}_i) = 0\). Accordingly, the estimate can be found as

$$\begin{aligned} \hat{x}_i = \frac{ \sum _{\mathtt {p} \in \mathcal {P}}\Big ( \sum _{\mathtt {r}(i,j) = \mathtt {p}} \, \omega _\mathtt {p}\Big (\eta _\mathtt {p}x_j +\tau _\mathtt {p}\Big )+ \sum _{\mathtt {r}(i,j) = \mathtt {p}^{-1}} \, \omega _\mathtt {p}\eta ^2_\mathtt {p}\Big (\frac{x_j}{\eta _\mathtt {p}} -\frac{\tau _\mathtt {p}}{\eta _\mathtt {p}}\Big )\Big )}{\sum _{\mathtt {p} \in \mathcal {P}}\Big ( \sum _{\mathtt {r}(i,j) = \mathtt {p}}\, \omega _\mathtt {p}+ \sum _{\mathtt {r}(i,j) = \mathtt {p}^{-1}}\, \omega _\mathtt {p}\eta ^2_\mathtt {p}\Big )}. \end{aligned}$$
(7)

3.1 Estimation of Relational Parameters

The parameters of the local generative model associated with relation type \(\mathtt {p} \in \mathcal {P}\) are introduced as \(\{\tau _\mathtt {p}, \eta _\mathtt {p}\, \omega _\mathtt {p}\}\). These parameters can be estimated over the set of node pairs connected to each other by relation \(\mathtt {p}\), i.e., \(\big \{(x_i,x_j) \, \forall i,j \in \mathcal {V}\, | \, \mathtt {r}(i,j)=\mathtt {p}\big \}\). For this purpose, we carry out the maximum likelihood estimation over the parameters:

$$\begin{aligned} \mathop {\mathrm {argmax}}\limits _{\tau _\mathtt {p}, \eta _\mathtt {p}\, \omega _\mathtt {p}} \quad \mathrm {p}\Big ( \big \{(x_i,x_j) \, \forall i,j \in \mathcal {V}\, | \, \mathtt {r}(i,j)=\mathtt {p} \big \} \, \big | \, \tau _\mathtt {p}, \eta _\mathtt {p}\, \omega _\mathtt {p} \Big ). \end{aligned}$$
(8)

Then, we conduct an approximation over the node pairs that are connected by a given relation type while neglecting any conditional dependency that might exist among these node pairs. Hence, we can write the likelihood on each node pair in a product as follows:

$$\begin{aligned} \mathop {\mathrm {argmax}}\limits _{\tau _\mathtt {p}, \eta _\mathtt {p}\, \omega _\mathtt {p}} \quad \prod _{\mathtt {r}(i,j)=\mathtt {p}} \mathrm {p}\Big ( (x_i,x_j)\, \big | \, \tau _\mathtt {p}, \eta _\mathtt {p}\, \omega _\mathtt {p} \Big ). \end{aligned}$$
(9)

We proceed with the minimization of negative log-likelihood to solve (9). The reader might recognize that its solution is equivalent to the parameters of a linear regression model [15]. This is simply because we introduce linear generative models (5) for the relationships existing on the graph. Therefore, the parameters can be found as follows (\(\mu = \mathrm {mean}(\mathbf {x})\) is the mean of node values):

(10)

Local Generative Model and Local Operation. We summarize the local generative model, the associated loss and the first order estimate in simple and multi-relational cases in Table 1. In the multi-relational case, the neighborhood aggregation is not directly a weighted average of the neighbors but the neighbors are subject to a transformation with respect to the type and the direction of their relation to the central node. The relational transformation is controlled by the parameters \(\eta \) and \(\tau \). For this reason, in Table 1 we use the following functions as shortcuts for the transformations applied on the neighbors: \(f(x) = x\) in simple case—no actual transformation applied, and \(f_\mathtt {p}(x) = \eta _\mathtt {p}x + \tau _\mathtt {p}\) in relational case for type \(\mathtt {p}\). In addition, \(\mathcal {P}^{-1} = \{\mathtt {p}^{-1}, \forall \mathtt {p} \in \mathcal {P}\}\) denotes the set relation types where the edge direction is reversed. For the reversed relationships, the set of parameters can be simply set as \(\eta _{\mathtt {p}^{-1}} = 1/{\eta _{\mathtt {p}}}\), \(\tau _{\mathtt {p}^{-1}} = \tau _{\mathtt {p}}/{\eta _{\mathtt {p}}}\), \(\omega _{\mathtt {p}^{-1}} = \eta _\mathtt {p}^2\omega _\mathtt {p}\). Subsequent to the transformations, the estimation is computed by a weighted average that is controlled by the parameter \(\omega \). It is worth to notice that \(\omega \) is set as the inverse of the error variance of the relational local generative model (5). Therefore, the estimate can be interpreted as the outcome of an aggregation with precision that ranks the relational information.

Table 1. Local generative model and operation

3.2 Multi-relational Propagation Algorithm

In Fig. 2, the propagation algorithm is depicted as an iterative neighborhood aggregation method where each iteration computes the solution of a first-order Bayesian estimation problem. Similarly, we propose a propagation algorithm that relies on the first-order relational Bayesian estimate that is introduced in (6). The algorithm operates iteratively where the relational neighborhood aggregation (7) is accomplished at each node of the graph simultaneously. Thus, we denote a vector \(\mathbf {x}^{(k)} \in \mathbb {R}^N\) composing the values at iteration-k over the set of nodes for \(|\mathcal {V}| = N\). Next, we express the iterations in matrix-vector multiplication format.

Iterations in Matrix Notation. We denote matrix \(\mathbf {A}_\mathtt {p}\) for encoding the adjacency pattern of relation type \(\mathtt {p}\). It is \((N \times N)\) asymmetric matrix storing incoming edges on its rows and outgoing edges on its columns. One can compile aggregations in (7) simultaneously over the entire graph using a matrix notation. Then, the relational local operations at iteration-k can be expressed as

$$\begin{aligned}&\mathbf {x}^{(k)} = \Bigg ( \sum _{\mathtt {p}\in \mathcal {P}} \bigg ( \omega _\mathtt {p}\big (\eta _\mathtt {p}\mathbf {A}_\mathtt {p}\mathbf {x}^{(k-1)}+ \tau _\mathtt {p}\mathbf {A}_\mathtt {p}\mathbf {1}\big )+ \omega _\mathtt {p}\eta _\mathtt {p} \big (\mathbf {A}_\mathtt {p}^\top \mathbf {x}^{(k-1)}- \tau _\mathtt {p}\mathbf {A}_\mathtt {p}^\top \mathbf {1}\big ) \bigg ) \Bigg ) \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \odot \Bigg (\sum _{\mathtt {p}\in \mathcal {P}} \bigg ( \omega _\mathtt {p}\mathbf {A}_\mathtt {p}\mathbf {1} + \omega _\mathtt {p}\eta _\mathtt {p}^2\mathbf {A}_\mathtt {p}^\top \mathbf {1} \bigg ) \Bigg )^{-1}, \end{aligned}$$
(11)

where \(\mathbf {1}\) is the vector of ones, \(\odot \) stands for element-wise multiplication. In addition, the inversion on the latter sum term is applied element-wise. This part, in particular, arranges the denominator in Equation (7) in vector format. Thus, it can be seen as the normalization factor over the neighborhood aggregation. For the purpose of simplification, we re-write (11) as

$$\begin{aligned} \mathbf {x}^{(k)} = (\mathbf {T}\mathbf {x}^{(k-1)} + \mathbf {S}\mathbf {1}) \odot (\mathbf {H}\mathbf {1})^{-1}, \end{aligned}$$
(12)

by introducing the auxiliary matrices

$$\begin{aligned} \mathbf {T} = \sum _{\mathtt {p} \in \mathcal {P}} \eta _\mathtt {p}\omega _\mathtt {p} (\mathbf {A}_\mathtt {p} + \mathbf {A}_\mathtt {p}^\top ),\, \mathbf {S} = \sum _{\mathtt {p} \in \mathcal {P}} \tau _\mathtt {p}\omega _\mathtt {p} (\mathbf {A}_\mathtt {p} - \eta _\mathtt {p} \mathbf {A}_\mathtt {p}^\top ),\, \mathbf {H} = \sum _{\mathtt {p} \in \mathcal {P}} \omega _p(\mathbf {A}_\mathtt {p} + \eta _\mathtt {p}^2 \mathbf {A}_\mathtt {p}^\top ). \end{aligned}$$
(13)

Algorithm. Given the iterations above, we now formalize the Multi-relational Propagation algorithm (MrP). MrP targets a node-level completion task where the multi-relational graph \(\mathcal {G}\) is a priori given and the nodes are partially labeled at \(\mathcal {U} \subseteq \mathcal {V}\). To manage propagation of continuous values at the labeled set of nodes towards the unlabeled, we introduce an indicator vector \(\mathbf {u} \in \mathbb {R}^N\), which encodes the labeled nodes. It is initialized as \(\mathbf {u}^{(0)}_i = 1, \text { if } i \in \mathcal {U}, \text { else } 0\). Then, the vector \(\mathbf {x}\) stores the node values throughout the iterations. It is initialized with the values over \(\mathcal {U}\), and zero-padded at the unlabeled nodes, i.e., \(\mathbf {x}^{(0)}_i = 0 \text { if } i \in \mathcal {V} \setminus \mathcal {U}\).

Similar to the label propagation [20], our algorithm fundamentally consists of aggregation and normalization steps. In order to encompass multi-relational transformations during aggregation, we formulate an iteration of MrP by the steps of aggregation, shift and normalization. In addition, similar to the Page-rank algorithm [5], we employ a damping factor \(\xi \in [0,1]\) to update a node’s state by combining its value from the previous iteration. We provide a pseudocode for MrP in Algorithm 1. The propagation parameters for each relation type, \(\{\tau _\mathtt {p}, \eta _\mathtt {p}, \omega _\mathtt {p}\}\) are estimated over the labeled set of nodes \(\mathcal {U}\), as described in Sect. 3.1 and given to the algorithm as input together with the adjacency matrices encoding the multi-relational, directed graph. Steps 1–4 in Algorithm 1 are essentially responsible for the multi-relational neighborhood aggregation. At Step-5, nodes’ states are updated based on the collected information from the neighbors. If valid information collected from neighbors and the node is labeled, then we employ the damping ratio, \(\xi \), to update node’s state. This adjusts amount of trade-off between the neighborhood aggregation and the previous state of the node. We distinguish whether an arbitrary node is currently labeled or not by the indicator vector, \(\mathbf {u}^{(k)}\), which keeps track of propagated nodes throughout the iterations and ensures that the normalization complies with valid collected neighborhood information. Hence, at Step 6, we update it as well. Finally, at Step 7, we clamp labeled set of nodesFootnote 3 by leaving their values unchanged, simply because they store the governing information for completing the missing features. The algorithm terminates when all the nodes are propagated and the difference between two consecutive iterations is under a certain threshold. Accordingly, the number of iterations is related to the choice of hyperparameter \(\xi \) and the stopping criterion. MrP is implemented using PyTorch-scatter packageFootnote 4, which efficiently computes neighborhood aggregation on a sparse relational structure. Thus, the aggregation steps require \(2|\mathcal {E}|\) operations, then, normalization and update steps require \(|\mathcal {V}|\) operations at each iteration. Therefore, MrP scales linearly with the number of edges in the graph, similar to the standard label propagation algorithm (LP). We finally note that by setting \(\tau _\mathtt {p} = 0, \, \eta _\mathtt {p}=1,\, \omega _\mathtt {p}=1 \, \forall \mathtt {p} \in \mathcal {P}\), MrP drops down to LPFootnote 5 as if we operate on a simple graph regardless of the relation types and directions.

figure a

4 Experiments

We now present a proof of concept of the proposed multi-relational propagation method in several different node regression scenarios in different settings. We first test MrP in estimating weather measurements on a multi-relational and directed graph that connects the weather stations. Then, we evaluate the performance in predicting people’s date of birth, where people are connected to each other on a social network composing different types relationships.

In the experiments, the damping factor is set as \(\xi = 0.5\), then the threshold for terminating the iterations is set as 0.001 of the range of given values. As evaluation metrics, we use root mean square error (RMSE), mean absolute percentage error (MAPE) and a normalized RMSE (nRMSE) with respect to the range of groundtruth values. We calculate them over the estimation error on the unlabeled set of nodes. In the experiments, we leave the parameter \(\eta \) in MrP as default by 1 since we do not empirically observe a scale change over the relation types given by the datasets we work on. Then, we estimate the parameter \(\tau \) and \(\omega \) for each relation type based on the observed set of node values as described in Sect. 3.1.

4.1 Multi-relational Estimation of Weather Measurements

We test our method on a meteorological dataset provided by MeteoSwiss, which compiles various types of weather measurements on 86 weather stations between years 1981–2010Footnote 6. In particular, we use yearly averages of weather measurements in our experiments.

Fig. 3.
figure 3

Distribution of change in temperature measurements between weather stations that are related via altitude proximity in ascend and descend direction.

Construction of Multi-relational Directed Graph. To begin with, we prepare a multi-relational graph representation \(\mathcal {G}(\mathcal {V},\mathcal {E},\mathcal {P})\) of the weather stations, i.e., \(|\mathcal {V}| = 86\), where we relate them based on two types of relationships, i.e., \(|\mathcal {P}| = 2\). First, we connect them based on geographical proximity by inserting an edge between a pair of weather stations if the Euclidean distance between their GPS coordinates is below a threshold. The geographical proximity leads to a symmetric relationship where we acquire 372 edges. Second, we relate them based on the altitude proximity in a similar logic yet we anticipate an asymmetric relationship where the direction of an edge indicates an altitude ascend from one station to another. For both relation types, we adjust the threshold for building connections so that there is not any disconnected node. Consequently, we acquire 1144 edges for the altitude relationship. In the experiments, we randomly sample labeled set of nodes, \(\mathcal {U}\), from the entire node set, \(\mathcal {V}\), with a ratio of \(80\%\). Then, we repeat the experiment in this setting for 50 times in Monte Carlo fashion. The evaluation metrics are then averaged over the series of simulations.

Table 2. Temperature and snowfall prediction performances
Table 3. Precipitation prediction performances

Predicting Temperature and Snowfall on Directed Altitude Graph. We first conduct experiments on a simple scenario where we target predicting temperature and snowfall measurements using altitude relations. We compare the proposed method to the standard label propagation algorithm, LP, which overlooks asymmetric relational reasoning. Thus, we aim at evaluating the directional transformation utility of MrP during the neighborhood aggregation, which is mainly gained by the parameter \(\tau \) and \(\eta \). We visualize the distribution of measurement changes on the altitude edges in Fig. 3. Here, the parameter \(\tau \) directly corresponds to the mean measurement difference computed along the directed altitude edges since \(\eta = 1\). We fit radial basis function (RBF) to the distribution since the residual error in local generative model (5) is assumed to be normal. Then, the parameter \(\omega \) is simply associated with the inverse of its variance. We see that the temperature differences in the ascend direction, i.e., \(\big \{(x_i - x_j) \, \forall \mathtt {r}(i,j) = \mathtt {altitude\_ascend}\big \}\), has a mean in the negative region. This signifies an expected decrease in temperature values along altitude ascend.

As seen in Table 2, even in the case of single relation type—altitude proximity, incorporating the directionality, MrP manages to enhance predictions over the regression realized by the label propagation, LP.

Predicting Precipitation on Directed, Multi-relational Graph. We test MrP in another scenario where we integrate both altitude and geographical proximity relations to predict precipitation measurements on the weather stations. The prediction performance is compared to the regression by LP, that is accomplished over the altitude relations and GPS relations separately. Since MrP handles both of the relation types and the direction of the edges simultaneously, it achieves a better performance than LP, as seen in Table 3.

4.2 Predicting People’s Date of Birth in a Social Network

We also conduct experiment on a small subset of a relational database called Freebase [16]. We work on a graph \(\mathcal {G}(\mathcal {V},\mathcal {E}, \mathcal {P})\) composing 830 people, i.e., \(|\mathcal {V}|= 830\), connected via 8 different types of relationship, i.e., \(|\mathcal {P}| = 8\). Here, the task is to predict people’s date of birth while it is only known for a subset of people. A fragment of the multi-relational graph is illustrated in Fig. 1, where two asymmetric relationships exist: influenced_by and parent. Table 4 summarizes the statistics for each. For symmetric relationships, the date of birth difference is counted along both directions of an edge, which sets the mean to zero. Also, the distributions over certain relation types are visualized in Fig. 4.

Table 4. Statistics for each relationship, columns respectively: number of edges, mean and variance of ‘date of birth’ difference over the associated relation type.
Table 5. Date of birth prediction performances
Fig. 4.
figure 4

Distribution of ‘date of birth’ difference (year) over different relationships. An RBF is fitted to each.

In the experiments, we randomly select the set of people whose date of birth is initially known, \(\mathcal {U}\), with a ratio of \(50\%\) in \(\mathcal {V}\). We again report the evaluation metrics that are averaged over a series of experiments repeated for 50 times. We compare performance of MrP to the regression of date of birth values obtained with label propagation LP. We run LP over the edges of each relation type separately and also at the union of those. The results are given in Table 5. Based on the results, we can say that the most successful relation types for predicting the date of birth seems to be influenced_by and spouse using LP. Nonetheless, when LP operates on the union of the edges provided by different type of relationships, it performs better than any single type. Moreover, MrP is able to surpass this record by enabling a relational neighborhood aggregation over different types of edges. Once again, we argue that its success is due to the fact that it regards asymmetric relationships, here encountered as influenced_by and parent. In addition, it assigns different level of importance to the predictions collected through different type of relationships based on the uncertainty estimated over the observed data.

5 Conclusion

In this study, we proposed MrP, a propagation algorithm working on multi-relational and directed graphs for regression of continuous node features and we show its superior performance on multi-relational data compared to standard propagation algorithm. It is possible to generalize the proposed approach for node embedding learning and then for the node classification tasks. The augmentation of the computational graph of the propagation algorithm using multiple types of directed relationships provided by the domain knowledge permits anisotropic operations on graph, which is claimed to be promising for future directions in graph representation learning [11].