
1 Introduction

Knowledge graphs play an important role in natural language processing [27], recommendation systems [21], and question answering [8]. However, existing knowledge graphs are incomplete, so relation prediction is required for reasoning and completion. Relation prediction on knowledge graphs is divided into transductive and inductive settings. Transductive relation prediction [1, 3, 16] learns and operates on latent representations (i.e., embeddings) of entities and relations in a knowledge graph. However, such methods can only predict relations for entities that appear in the training set and cannot represent unseen entities. In contrast, inductive relation prediction [6, 17, 19] is entity-independent and can predict relations for entities that are not present in the training set. Since real-world knowledge graphs cannot cover all entities, relation prediction for unseen entities has received increasing attention from researchers.

Existing models for inductive relation prediction mainly predict missing relations by learning logical rules in knowledge graphs. There are currently two main approaches to learning such rules. Rule-based learning explicitly mines logical rules from co-occurrence patterns of relations. Subgraph-based inductive relation prediction, such as GraIL [17], implicitly learns the logical rules in a local subgraph using Graph Neural Networks (GNNs) [7, 12, 14]. More recently, TACT [2] classifies relation pairs in subgraphs into several patterns and incorporates this information into the representation of relations.

Although subgraph-based models have shown inductive learning capability on unseen nodes, they have some disadvantages. First, many inference paths are disconnected due to the incompleteness of knowledge graphs, so the subgraph misses a lot of neighboring relation information. Taking Fig. 1 as an example, for the entity “GSW”, the neighboring relations “coach_of” and “belong_to” are not connected to the tail node, so the representation of “GSW” lacks this relation information. Moreover, existing methods do not take into account the connection structures between relations in entity representations. For example, in Fig. 1, the relation “part_of” has three connection structures, “parallel”, “tail-to-head”, and “tail-to-tail”, with the predicted relation “located_in”, and these connection structures have different effects on the representation of “GSW”. In this way, the representations of the nodes “GSW” and “California” with topology information are obtained, respectively. Then the embedding of “located_in” is combined with them in the scoring function to obtain the likelihood of the triple.

Fig. 1. An example in knowledge graphs.

To address these disadvantages, we propose a novel entity representation based on the Neighboring Relations Topology Graph (NRTG) for inductive relation prediction. Specifically, the NRTG extracts all neighboring triples and divides the connection structures between relations into six topological patterns. Our method can thus capture both the relation information of neighboring triples and the connection structures between relations.

For a predicted triple, our model consists of the following stages: (1) constructing NRTGs via the relations topology module; (2) obtaining the head and tail entity representations of the predicted triple via the information aggregation module based on GNNs [7]; (3) feeding the head and tail entity representations and the embedding of the predicted relation into the scoring network to obtain the score of the predicted triple.

Our contributions are as follows. First, we propose a novel framework that uses two graph structures to represent the head and tail entities of a predicted triple separately. This framework can more completely mine the logical information implied by the head and tail entities in the knowledge graph. Second, we design the Neighboring Relations Topology Graph (NRTG) to capture the semantic information of connection structures among relations. Finally, our model significantly outperforms existing inductive relation prediction methods on benchmark datasets.

The remainder of this article is structured as follows. Related work is introduced in Sect. 2. The specific details of our method are presented in Sect. 3. Experiments that analyze and verify the effectiveness of our method are reported in Sect. 4. Section 5 concludes this article and proposes future work.

2 Related Work

At present, there are two main families of methods for relation prediction on knowledge graphs: rule learning-based methods and embedding-based reasoning methods.

Rule Learning-Based Methods. Rule-based methods [4] learn logical rules from relational co-occurrence patterns in knowledge graphs. Because these logical rules are independent of entities, such methods can predict relations between unseen entities. Although inherently inductive, these methods are difficult to scale to large datasets. Recently, NeuralLP [23] proposed an end-to-end framework to address the scalability issue. Building on NeuralLP, DRUM [13] can mine more correct logical rules. However, these logical rules cannot capture the complex topological structure between relations.

Embedding-Based Methods. Most existing methods are embedding-based, such as TransE [1], ConvE [3], ComplEx [20], and RotatE [16], which learn a low-dimensional embedding vector for each entity and relation in a knowledge graph. In recent years, more and more researchers have applied graph neural networks (GNNs) [7, 12, 14] to relation prediction, as knowledge graphs naturally have graph structures. Schlichtkrull et al. [15] propose a relational graph neural network that considers the connected relations to represent entities. Afterward, GAT [11] introduced an attention-based graph neural network for learning entity representations, which effectively exploits the knowledge of neighboring triples. More recently, Zhang et al. [26] proposed a relational graph neural network with hierarchical attention to effectively utilize the neighborhood information of entities in knowledge graphs.

To predict the relation between unseen entities, GraIL [17] reasons via entity-independent local subgraph structures. On the basis of GraIL, TACT [2] considers semantic correlations between relations, and models correlation coefficients of the different semantic correlations into relation representation. Moreover, there are some neural networks [24, 25] that learn topology.

However, these methods have limitations. The incompleteness of the graph can lead to insufficient learning of neighboring relations. Furthermore, they model entity representations too simplistically, since they do not take into account the topological structure between neighboring relations and predicted relations.

3 Methods

In this section, we introduce our proposed model. The task of our model is inductive relation prediction, i.e., predicting the relation between unseen entities, which requires representing entities that were not seen in the training set. Therefore, our model uses two Neighboring Relations Topology Graphs (NRTGs), in which the nodes represent relations and the edges represent the connection structures between relations, to represent the head and tail entity, respectively. Our model then scores the predicted triple using the head entity representation, the tail entity representation, and the embedding of the predicted relation. The model consists of the following parts: (1) the relations topology module; (2) the information aggregation module based on GNNs [7]; (3) the scoring network and loss function. Figure 2 gives an overview of our model.

Fig. 2. An overview of our model. The framework consists of two modules. The blue vector represents the initial predicted relation embedding. We use a scoring network to score a triple. (Color figure online)

3.1 Relations Topology Module

To address the problems that existing models neither capture the complete set of neighboring relations nor consider the connection structures between relations, we design this module to fully mine the logical rules implied by the predicted triple in the knowledge graph, in two steps: neighboring relations extraction and the Neighboring Relations Topology Graph (NRTG).

Neighboring Relations Extraction. Existing subgraph-based methods assume that the paths connecting the head and tail entity contain the logical information that represents the predicted triple. Differing from these models, we assume that the relations of all neighboring triples imply the logical rules for relation prediction. Because the knowledge graph is incomplete, many reasoning paths are disconnected; therefore, neighboring relations that do not lie on a reasoning path can also provide evidence for relation prediction. Furthermore, we extract two subgraphs from the knowledge graph to represent the head and tail entity of the predicted triple, respectively. Compared with using an enclosing subgraph to represent triples, our method can better emphasize the logical information implied by the entities.

In this module, we extract all n-hop neighboring triples of the head and tail entity, respectively. For example, given a predicted triple \(\left( u, r_{t}, v \right) \), we iteratively obtain the n-hop neighboring triples of nodes u and v through the breadth-first search (BFS) algorithm. Let \(\mathcal {N} _{n}\left( u \right) \) and \(\mathcal {N} _{n}\left( v \right) \) be the sets of triples in the n-hop neighborhoods of nodes u and v in the KG. Existing subgraph-based methods compute the enclosing subgraph by taking the intersection, \(\mathcal {N} _{n}\left( u \right) \cap \mathcal {N} _{n}\left( v \right) \), of these n-hop neighborhood sets; however, such models miss many neighboring triples. Therefore, we use \(\mathcal {N} _{n}\left( u \right) \) and \(\mathcal {N} _{n}\left( v \right) \) to represent nodes u and v, respectively, which fully captures the logical rules implied by the neighboring triples of the head and tail entity.
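The following Python sketch illustrates this extraction step, assuming the KG is given as a list of (head, relation, tail) triples; the index structure and function names are our own, not the paper's implementation.

```python
from collections import defaultdict, deque

def build_index(kg):
    """Map each entity to the triples it participates in (on either side)."""
    index = defaultdict(list)
    for h, r, t in kg:
        index[h].append((h, r, t))
        index[t].append((h, r, t))
    return index

def extract_neighborhood(index, node, n_hops):
    """Return N_n(node): all triples within n hops of `node`, via BFS."""
    triples, visited = set(), {node}
    frontier = deque([(node, 0)])
    while frontier:
        entity, depth = frontier.popleft()
        if depth == n_hops:
            continue
        for h, r, t in index[entity]:
            triples.add((h, r, t))
            neighbor = t if h == entity else h
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return triples
```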

Neighboring Relations Topology Graph. Since the n-hop neighboring triples extracted from the KG do not capture the connection structures between relations, we design the Neighboring Relations Topology Graph (NRTG) to address this problem. Inspired by TACT [2], to model the connection structures between the relations of neighboring triples, we categorize relation pairs, consisting of neighboring and predicted relations, into six topological patterns. As illustrated in Fig. 3, there are six connection structures for connected relations in the knowledge graph: “head-to-tail”, “tail-to-tail”, “head-to-head”, “tail-to-head”, “parallel”, and “loop”. We call these connection structures topological patterns and name them “H-T”, “T-T”, “H-H”, “T-H”, “PARA”, and “LOOP”, respectively.

Fig. 3. An illustration of the transition from connection structures between relations to topological patterns.

Based on the definitions of the different topological patterns, we can convert the n-hop neighboring triples into the NRTG, where the nodes represent relations and the edges indicate the topological patterns between neighboring relations and predicted relations. For example, the triples \(\left( e_{1}, r_{1}, e_{2} \right) \) and \(\left( e_{2}, r_{2}, e_{3} \right) \) are connected by \(e_{2}\), and their topological pattern is “H-T”, so we construct a new triple \(\left( r_{1}, H-T, r_{2} \right) \) in the NRTG. Given the n-hop neighboring triples of the head and tail entity, \(\mathcal {N} _{n}\left( u \right) \) and \(\mathcal {N} _{n}\left( v \right) \), we convert the n-hop neighboring triples of entities u and v into NRTGs in this way, respectively.
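As a hedged illustration, the classifier below follows the paper's worked example for “H-T”; the remaining cases are our reading of Fig. 3, and the exact definitions may differ in the original.

```python
def topological_pattern(t1, t2):
    """Classify the connection structure between two triples, or return
    None if they share no entity. Two-entity cases are checked first."""
    h1, _, tl1 = t1
    h2, _, tl2 = t2
    if h1 == h2 and tl1 == tl2:
        return "PARA"   # parallel: same head and same tail
    if h1 == tl2 and tl1 == h2:
        return "LOOP"   # the two triples form a two-step cycle
    if tl1 == h2:
        return "H-T"    # e.g. (e1, r1, e2) and (e2, r2, e3)
    if h1 == tl2:
        return "T-H"
    if h1 == h2:
        return "H-H"    # shared head entity
    if tl1 == tl2:
        return "T-T"    # shared tail entity
    return None
```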

In this module, we extract the n-hop neighboring triples of the head and tail entity, respectively, and then convert these neighboring triples into NRTGs. The NRTGs not only contain the neighboring relations of the entities but also take into account the connection structures between relations. Therefore, entity representation by NRTG can better mine the logical rules implied by the entities of the predicted triple in the KG. The detailed procedure is presented in Algorithm 1.

Algorithm 1. Construction of the Neighboring Relations Topology Graph (NRTG).
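Since the listing of Algorithm 1 is not reproduced here, the sketch below is a hedged reconstruction of the construction procedure from the description above, reusing `extract_neighborhood` and `topological_pattern` from the earlier sketches.

```python
from itertools import combinations

def build_nrtg(index, entity, n_hops):
    """Convert the n-hop neighborhood of `entity` into NRTG triples
    (r1, pattern, r2): nodes are relations, edges are topological patterns."""
    neighborhood = extract_neighborhood(index, entity, n_hops)
    nrtg = set()
    for t1, t2 in combinations(neighborhood, 2):
        pattern = topological_pattern(t1, t2)
        if pattern is not None:
            nrtg.add((t1[1], pattern, t2[1]))
    return nrtg
```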

3.2 Information Aggregation Module

Based on the Neighboring Relations Topology Graphs (NRTGs) of the head and tail entity, we design a module that aggregates the neighboring relations and the topological patterns between relations in the NRTGs into entity representations. Specifically, the information aggregation module is based on the Relational Graph Convolutional Network (R-GCN) [15] and uses the message passing mechanism [5] of graph neural networks to update node representations. Finally, we use average pooling over all latent node representations to obtain the representations of the head and tail entities of the predicted triple, respectively. As a result, the entity representations contain the neighboring relations and topological patterns. In this module, the message passing mechanism for node updates consists of a message function and an aggregation function.

Message Function. The purpose of the message function is to pass the information used to update each node in the NRTG. A target node may receive messages from multiple nodes. Inspired by R-GCN [15], we define the message function of the k-th layer as:

$$\begin{aligned} m_{t}^{k} = \sum _{p=1}^{P} \sum _{s\in \mathcal {N} _{r} } a_{t,s}^{k} W_{p}^{k} n_{s}^{k-1}, \end{aligned}$$
(1)

where \(\mathcal {N} _{r}\) is the set of neighboring relations of the predicted triple and P is the number of topological patterns between relations. \(n_{s}^{k-1}\) denotes the node representation from the previous layer; at the input layer it is the embedding of the relation. \(W_{p}^{k}\) is the transformation matrix for topological pattern p at the k-th layer. \(a_{t,s}^{k}\) is the edge attention weight at the k-th layer for the edge connecting nodes s and t via topological pattern p. The attention weights of the k-th layer are computed as follows:

$$\begin{aligned} a_{t,s}^{k} = \sigma \left( W_{1}^{a} \cdot ReLU\left( W_{2}^{a} \left[ n_{s}^{k-1} \oplus n_{t}^{k-1} \oplus n_{p}^{a} \right] \right) \right) . \end{aligned}$$
(2)

Here \(W_{1}^{a}\) and \(W_{2}^{a}\) are the weight parameters of the attention mechanism, and \(n_{p}^{a}\) is the attention vector of topological pattern p. \(\sigma \left( \cdot \right) \) and \(ReLU\left( \cdot \right) \) are activation functions.

Aggregation Function. The purpose of the aggregation function is to update the representation of a node according to its neighboring messages. After obtaining the message vector \(m_{t}^{k}\), we update the nodes in the NRTG. The aggregation function of the k-th layer is:

$$\begin{aligned} n_{t}^{k} = \sigma \left( W_{0}^{k} n_{t}^{k-1} + m_{t}^{k} \right) , \end{aligned}$$
(3)

where \(W_{0}^{k}\) is a weight matrix.

We acquire the node representations of the NRTG through the message function and aggregation function. Finally, the representation of the entity is obtained by average pooling of all the latent node representations in the NRTG:

$$\begin{aligned} e^{k} = \frac{1}{\left| \mathcal {V} \right| } \sum _{i\in \mathcal {V} } n_{i}^{k}, \end{aligned}$$
(4)

where \(\mathcal {V}\) denotes the set of vertices in the graph.

In this module, based on the two NRTGs, we adopt two identical R-GCNs [15] to obtain the representations of the head and tail entities, respectively.
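Below is a minimal PyTorch sketch of one information aggregation layer implementing Eqs. (1)–(3), together with the average pooling of Eq. (4). The class and variable names are our own, and the paper's actual implementation may batch and sparsify these operations differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NRTGLayer(nn.Module):
    def __init__(self, dim, num_patterns=6):
        super().__init__()
        self.W_p = nn.Parameter(torch.randn(num_patterns, dim, dim) * 0.1)  # W_p^k
        self.W_0 = nn.Linear(dim, dim, bias=False)                          # W_0^k
        self.n_p = nn.Parameter(torch.randn(num_patterns, dim) * 0.1)       # n_p^a
        self.W_2 = nn.Linear(3 * dim, dim)                                  # W_2^a
        self.W_1 = nn.Linear(dim, 1)                                        # W_1^a

    def forward(self, nodes, edges):
        """nodes: (N, dim) relation-node states; edges: list of (s, p, t)
        with source node s, pattern index p, and target node t."""
        msgs = [torch.zeros_like(nodes[0]) for _ in range(nodes.size(0))]
        for s, p, t in edges:
            # Eq. (2): attention over the concatenated node and pattern features.
            feat = torch.cat([nodes[s], nodes[t], self.n_p[p]], dim=-1)
            a = torch.sigmoid(self.W_1(F.relu(self.W_2(feat))))
            # Eq. (1): pattern-specific transformation, weighted by attention.
            msgs[t] = msgs[t] + a * (self.W_p[p] @ nodes[s])
        # Eq. (3): aggregate the messages with a self-connection.
        return torch.sigmoid(self.W_0(nodes) + torch.stack(msgs))

def entity_embedding(nodes):
    """Eq. (4): average pooling over all node states in the NRTG."""
    return nodes.mean(dim=0)
```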

3.3 Scoring Network and Loss Function

Scoring Network. The final step in our framework is to score the likelihood of the predicted triple. For the predicted triple \(\left( u, r_{t}, v \right) \), the representations of entities u and v are obtained from the information aggregation module, and a scoring network then outputs the score. The scoring function is defined as:

$$\begin{aligned} f\left( u,r_{t},v \right) = W^{T} [e_{u}^{k} \oplus v_{r_{t} } \oplus e_{v}^{k} ]. \end{aligned}$$
(5)

In the scoring network, we obtain the score via a single linear layer, where \(v_{r_{t}}\) denotes the embedding of the predicted relation \(r_{t}\).

Loss Function. For each triple in the training graph, we sample a negative triple by replacing the head (or tail) entity. We then train our model to score positive triples higher than negative ones using the noise-contrastive hinge loss [1]. The loss function is as follows:

$$\begin{aligned} \mathcal {L} = \sum _{i=1}^{\left| \varepsilon \right| } \textrm{max} \left( 0, f\left( {u}'_{i}, {r}'_{t}, {v}'_{i} \right) -f\left( u, r_{t}, v \right) +\gamma \right) , \end{aligned}$$
(6)

where \(\gamma \) is the margin hyperparameter, \(\varepsilon \) is the set of all triples in the training graph, and \(\left( {u}'_{i}, {r}'_{t}, {v}'_{i} \right) \) denotes the i-th negative triple of the ground-truth triple \(\left( u, r_{t}, v \right) \).
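A hedged PyTorch sketch of the scoring network (Eq. 5) and the hinge loss (Eq. 6) follows; the margin value shown is a placeholder, not the paper's reported hyperparameter.

```python
import torch
import torch.nn as nn

class ScoringNetwork(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(3 * dim, 1, bias=False)  # plays the role of W^T

    def forward(self, e_u, v_r, e_v):
        # Eq. (5): a linear layer over the concatenated representations.
        return self.linear(torch.cat([e_u, v_r, e_v], dim=-1))

def hinge_loss(pos_scores, neg_scores, margin=10.0):
    # Eq. (6): each negative should score at least `margin` below its positive.
    return torch.clamp(neg_scores - pos_scores + margin, min=0.0).sum()
```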

4 Experiments

This section is organized as follows. First, we introduce the experimental setup: datasets, training protocol, and evaluation protocol. Second, we compare our model with other approaches on several benchmark datasets. Third, we present ablation studies to verify the effectiveness of our method. Finally, we analyze the effect of the number of hops on our model.

4.1 Experimental Setup

Table 1. Statistics of inductive benchmarks. We use #E, #R, and #TR to denote the number of entities, relations, and triples, respectively.

Datasets. To enable inductive testing, the test set must contain entities not seen in the training set. Therefore, we use the benchmark datasets for inductive relation prediction proposed in GraIL [17], which are derived from WN18RR [3], FB15k-237 [18], and NELL-995 [22]. Specifically, each dataset consists of a pair of graphs: train-graph and ind-test-graph. We randomly select 10% of the edges/tuples in ind-test-graph as test edges. Details of the datasets are summarized in Table 1. The distribution of the six topological patterns in WN18RR and FB15k-237 is relatively uniform, and there are enough training examples. NELL-995 is a dataset with very sparse relations, in which “PARA” and “LOOP” are relatively rare. Furthermore, the same relation pairs can have different topological patterns in each dataset.

Training Protocol. During training, we set the batch size to 32 and the number of epochs to 100. We set the size of the relation embeddings to 32. To represent an entity, we convert its 2-hop (or 3-hop) neighboring triples into a Neighboring Relations Topology Graph (NRTG) and then use a one-layer R-GCN [15] to represent the entity. We use Adam [9] to optimize all parameters with an initial learning rate of 0.01.
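The snippet below collects these hyperparameters into a hedged training setup, reusing the `NRTGLayer` and `ScoringNetwork` sketches from Sect. 3; the paper's exact code is not available, so the wiring here is our own.

```python
import torch

# Hyperparameters as reported above.
BATCH_SIZE, EPOCHS, REL_DIM, LR = 32, 100, 32, 0.01

layer = NRTGLayer(dim=REL_DIM, num_patterns=6)  # one-layer R-GCN variant
scorer = ScoringNetwork(dim=REL_DIM)
params = list(layer.parameters()) + list(scorer.parameters())
optimizer = torch.optim.Adam(params, lr=LR)
```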

Evaluation Protocol. In the relation prediction task, the aim is to predict a triple \(\left( u, r_{t}, v \right) \) with u or v missing. We use the area under the precision-recall curve (AUC-PR) and Hits@10 to evaluate our models. To calculate AUC-PR, we sample one negative triple per positive triple by replacing the head or tail entity with a random entity, and score the positive triples together with an equal number of negative triples. To evaluate Hits@10, we rank each test triple among 50 negative triples and compute the proportion of test triples ranked in the top 10.
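The two metrics can be computed as in the sketch below, where scikit-learn's `average_precision_score` serves as the AUC-PR estimate; the input format is our own assumption.

```python
import numpy as np
from sklearn.metrics import average_precision_score

def auc_pr(pos_scores, neg_scores):
    """AUC-PR with one sampled negative per positive triple."""
    scores = np.concatenate([pos_scores, neg_scores])
    labels = np.concatenate([np.ones(len(pos_scores)), np.zeros(len(neg_scores))])
    return average_precision_score(labels, scores)

def hits_at_10(pos_scores, neg_scores_per_pos):
    """Rank each positive triple among its 50 negatives; a hit means at
    most 9 negatives score higher (i.e., rank <= 10)."""
    hits = [(np.asarray(negs) > pos).sum() < 10
            for pos, negs in zip(pos_scores, neg_scores_per_pos)]
    return float(np.mean(hits))
```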

4.2 Results and Analysis

We validate the models on a classification metric (AUC-PR) and a ranking metric (Hits@10), respectively. We then compare our method with several state-of-the-art methods on these metrics: NeuralLP [23], DRUM [13], RuleN [10], GraIL [17], and TACT [2].

Table 2. AUC-PR results on the inductive benchmark datasets extracted from WN18RR, FB15k-237, and NELL-995. The best score is in bold and the second best score is underlined.
Table 3. Hits@10 results on the inductive benchmark datasets extracted from WN18RR, FB15k-237, and NELL-995. The best score is in bold and the second best score is underlined.

Table 2 shows the mean AUC-PR results, averaged over 5 runs. The results show that our model achieves improvements on WN18RR and FB15k-237. On FB15k-237 in particular, the accuracy improves by an average of 5%. Competitive results are achieved on NELL-995. Table 3 shows the mean Hits@10 results, averaged over 5 runs. Our model achieves state-of-the-art results on WN18RR and FB15k-237, and competitive results on NELL-995.

As we can see, our model achieves large improvements on all metrics on WN18RR and FB15k-237. This indicates that our model successfully captures neighboring relations as well as the topological patterns between relations in entity representations. The improvement is particularly significant on FB15k-237, which suggests that our method can better model complex topological structures between relations. A possible reason why there is no improvement on NELL-995 is that, compared with the other two datasets, its relational connection structures are relatively sparse, which makes it difficult for our method to learn the topological patterns.

4.3 Ablation Study

In this part, we conduct ablation experiments to verify the effectiveness of our model. We evaluate two ablated variants: (1) our work w/o NR and (2) our work w/o TP.

Table 4. Ablation results on the inductive benchmark datasets extracted from WN18RR, FB15k-237, and NELL-995. The best score is in bold.

Our Work w/o NR. To learn the logical rules between the target nodes of predicted triples, existing methods extract paths between the head and tail nodes. However, these methods have an obvious drawback: the incompleteness of the knowledge graph leads to missing paths that are disconnected from the target nodes, so the model lacks many useful neighboring relations. To verify the effectiveness of the neighboring relations extraction in the relations topology module, we perform an ablation experiment that uses the subgraph construction method proposed by GraIL [17] instead of ours. We call this ablation “our work w/o NR”.

Our Work w/o TP. In the relations topology module, we classify the connection structures between neighboring relations and predicted relations into six topological patterns, and use the information aggregation module based on R-GCN [15] to represent the head and tail entities. To verify that the entity representations capture the topological patterns between neighboring relations and the predicted relation, we collapse all topological patterns between relations into a single pattern. We call this ablation “our work w/o TP”.

Table 4 shows the performance of our method on the three datasets. The results show that our model performs better than the two ablated models on all three datasets. The experiments demonstrate that our method more completely captures the relational logic rules for predicting triples and better represents entities.

4.4 Performance with Different Number of Hops

In this part, on WN18RR_v1 and NELL-995_v1, we extract the 1-hop, 2-hop, 3-hop, and 4-hop neighboring triples of the head and tail entities, respectively, to construct NRTGs for inductive relation prediction, and we report the mean MRR averaged over 5 runs. As shown in Fig. 4, the performance of the model improves as the number of hops increases at first, but after reaching 2 or 3 hops it stops improving or even declines. The results show that more hops allow more complete logical information to be learned from the knowledge graph; however, more hops also introduce a lot of noise, which reduces the performance of the model. Furthermore, as the number of hops increases, the fluctuation of the MRR value also increases. The experiments demonstrate that our model best learns the logical information implicit in the knowledge graph topology with 2-hop (or 3-hop) NRTGs.

Fig. 4. The effect of different numbers of hops on MRR.

5 Conclusion

We propose a novel entity representation method for inductive relation prediction. This method is based on the Neighboring Relations Topology Graph (NRTG), in which the nodes represent relations and the edges represent topological patterns between relations. The NRTG not only encodes the logical rules of neighboring relations but is also entity-independent. Thus, our model is able to make relation predictions in an inductive setting. Experiments demonstrate that our method significantly outperforms several existing state-of-the-art methods on benchmark datasets for the inductive link prediction task. In the future, we plan to extend our model to capture the implicit logical rules of few-shot relations.