
1 Introduction

A knowledge graph (KG) [1, 2] is a multi-relational graph consisting of entities (represented as nodes) and relations (represented as edges) between entities. The facts in a KG are normally expressed as RDF triples. KGs like YAGO [3] and DBpedia [4, 5] have been widely used in knowledge-based applications such as question answering.

KG embedding [1, 2] aims to embed entities and relations into a continuous, low-dimensional vector space so that downstream tasks such as link prediction and triple classification can be performed efficiently. According to the type of scoring function, KG embedding models fall into two categories: translational distance models and semantic matching models [1, 2]. TransE [6] is a representative translational distance model, but it has difficulty dealing with multi-mapping relations (one-to-many, many-to-one, and many-to-many). Many improved models have been built on TransE, for example, TransH [7], TransR [8], and TransD [9]. HolE [10], DistMult [11], and ComplEx [12] are typical semantic matching models.

Lv et al. [13] pointed out that existing KG embedding models fail to distinguish between concepts and instances, which leads to several problems. Hence, they proposed a new model called TransC, which distinguishes between concepts and instances and divides the triples in a KG into three disjoint subsets: the instanceOf triple set, the subClassOf triple set, and the inter-instance relation triple set. However, TransC embeds concepts, instances, and various relations into the same vector space, which leads to the following problems: (1) The same instance in different triples that model different inter-instance relations is represented as the same vector, resulting in an improper representation of the different properties possessed by this instance; (2) Multiple instances not belonging to a concept may be located in the sphere representing this concept, resulting in an inaccurate modeling of the instanceOf relations between these instances and the concept.

Based on TransC, this paper proposes a fine-grained KG embedding model called TransFG. TransFG embeds concepts, instances, and relations into different vector spaces and projects the instance vectors from the instance space to the concept space and the relation spaces through dynamic mapping matrices. As a result, the projected vectors of the same instance in different triples have different representations, and the projected vectors of multiple instances belonging to the same concept are spatially close to each other, while those of instances not belonging to the concept are far away. We used two typical KG downstream tasks, triple classification and link prediction, to compare and evaluate TransFG, TransC, and several other KG embedding models (TransE, TransH, TransR, TransD, HolE, DistMult, and ComplEx) on the YAGO39K and M-YAGO39K datasets [13]. The experimental results show that on the triple classification task, TransFG outperforms TransC and the other models in terms of accuracy, precision, recall, and F1-score in most cases, and on the link prediction task, TransFG outperforms these models in terms of MRR (the mean reciprocal rank of all correct instances) and Hits@N (the proportion of correct instances among the top-N ranked instances) in most cases.

2 TransFG: A Fine-Grained Model

In this section, we present our proposed TransFG model. We first briefly explain the basic idea of the model, then describe the model in detail, and finally explain how the model is trained.

2.1 Basic Idea of TransFG

Like TransC, TransFG divides KG triples into three types: the instanceOf triples, the subClassOf triples, and inter-instance relation triples, and defines a different loss function for each type of triple. The main difference between TransFG and TransC lies in the improved representations of the instanceOf triples and inter-instance relation triples, achieved mainly by embedding concepts, instances, and the various relations between them into different spaces and applying the corresponding mapping matrices.

For the representation of the instanceOf triples, TransFG projects instance vectors from the instance space to the concept space through dynamic mapping matrices. Let us use Fig. 1 to explain the representation of this type of triple. The meanings of the mathematical symbols in the figure are listed in Table 1. As shown in the figure, triangles, such as \( {\mathbf{e}} \) and \( {\mathbf{f}} \), and pentagrams, such as \( {\mathbf{b}} \), denote instance vectors belonging to two different concepts \( c_{i} \) and \( c_{j} \), which are represented by the concept spheres \( s_{i} ({\mathbf{p}}_{i} ,\;m_{i} ) \) and \( s_{j} ({\mathbf{p}}_{j} ,\;m_{j} ) \), respectively. For three instanceOf triples \( (e,\;r_{e} ,\;c_{i} ) \), \( (f,\;r_{e} ,\;c_{i} ) \), and \( (b,\;r_{e} ,\;c_{j} ) \), TransFG projects \( {\mathbf{e}} \), \( {\mathbf{f}} \), and \( {\mathbf{b}} \) from the instance space to the concept space through the mapping matrices \( {\mathbf{M}}_{ei} \), \( {\mathbf{M}}_{fi} \), and \( {\mathbf{M}}_{bj} \) and obtains the projected vectors \( {\mathbf{e}}_{ \bot } \), \( {\mathbf{f}}_{ \bot } \), and \( {\mathbf{b}}_{ \bot } \). If the three triples are positive triples (i.e., they exist in the KG), \( {\mathbf{e}}_{ \bot } \) and \( {\mathbf{f}}_{ \bot } \) are located in the sphere \( s_{i} ({\mathbf{p}}_{i} ,\;m_{i} ) \), and \( {\mathbf{b}}_{ \bot } \) is located in the sphere \( s_{j} ({\mathbf{p}}_{j} ,\;m_{j} ) \). The loss functions for the instanceOf triples are then defined using the relative positions between the projected vectors and the concept spheres.

Fig. 1. Representation of the instanceOf triples in TransFG.

Table 1. Mathematical symbols introduced in [13] and in our paper.

For the representation of the subClassOf triples, TransFG directly uses the corresponding method in TransC [13], that is, the loss functions for the subClassOf triples are defined using the relative positions between the two concept spheres.

For the representation of inter-instance relation triples, just like TransD [9], TransFG projects instance vectors from the instance space to the relation spaces through the corresponding mapping matrices. The loss functions for inter-instance relation triples are then defined using the projected vectors.

2.2 The TransFG Model

In this subsection, we describe the TransFG model in detail. We first list the mathematical symbols used to describe the model, and then define the loss functions for the three different types of triples: the instanceOf triples, the subClassOf triples, and inter-instance relation triples.

The mathematical symbols introduced in TransC’s paper [13] and in our paper are listed in Table 1, where the symbols in the first eleven rows are introduced in [13].

InstanceOf Triple Representation.

For an instanceOf triple \( (i,\;r_{e} ,\;c) \), TransFG learns the instance vector \( {\mathbf{i}} \) of instance \( i \), the projection vector \( {\mathbf{i}}_{p} \) for \( i \), the center vector \( {\mathbf{p}} \) of the sphere representing concept \( c \), and the projection vector \( {\mathbf{p}}_{p} \) for \( c \). The vectors \( {\mathbf{i}}_{p} \) and \( {\mathbf{p}}_{p} \) are used to construct the mapping matrix \( {\mathbf{M}}_{ic} \in {\mathbb{R}}^{k \times n} \). TransFG projects \( {\mathbf{i}} \) from the instance space to the concept space through the mapping matrix, which is defined as Eq. (1).

$$ {\mathbf{M}}_{ic} = {\mathbf{p}}_{p} {\mathbf{i}}_{p}^{{ \top }} + {\mathbf{E}}^{k \times n} $$
(1)

where \( {\mathbf{E}}^{k \times n} \) is a \( k \times n \) identity matrix (ones on the main diagonal, zeros elsewhere) used to initialize the mapping matrix. As can be seen from Eq. (1), each mapping matrix is determined by an instance and a concept. Hence, TransFG uses different matrices to project the same instance in different instanceOf triples, and the projected vectors are also different. The projected vector \( {\mathbf{i}}_{ \bot } \) of \( {\mathbf{i}} \) is defined as Eq. (2).

$$ {\mathbf{i}}_{ \bot } = {\mathbf{M}}_{ic} {\mathbf{i}} $$
(2)

In TransFG, the instanceOf relations are represented using the relative positions between the projected vectors and the concept spheres. For an instanceOf triple \( (i,\;r_{e} ,\;c) \), if it is a positive triple, then \( {\mathbf{i}}_{ \bot } \) should be inside the concept sphere of \( c \) to represent the instanceOf relation between \( i \) and \( c \). If \( {\mathbf{i}}_{ \bot } \) is outside the concept sphere, the instance embedding and concept embedding need to be optimized. The loss function is defined as Eq. (3).

$$ f_{e} (i,\;c) = \;||{\mathbf{i}}_{ \bot } - {\mathbf{p}}||_{2} - m $$
(3)
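To make the construction concrete, below is a minimal NumPy sketch of Eqs. (1)–(3): building the dynamic mapping matrix, projecting an instance vector into the concept space, and computing the instanceOf score. The dimensions, random initialization, and all variable names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

n, k = 100, 100  # instance-space and concept-space dimensions (assumed)

rng = np.random.default_rng(0)
i_vec, i_p = rng.normal(size=n), rng.normal(size=n)  # instance vector and its projection vector
p_vec, p_p = rng.normal(size=k), rng.normal(size=k)  # sphere center and concept projection vector
m = 0.5                                              # sphere radius (assumed)

def instance_of_score(i_vec, i_p, p_vec, p_p, m):
    # Eq. (1): M_ic = p_p i_p^T + E^{k x n}; np.eye(k, n) is the rectangular identity
    M_ic = np.outer(p_p, i_p) + np.eye(k, n)
    i_proj = M_ic @ i_vec                       # Eq. (2): project i into the concept space
    return np.linalg.norm(i_proj - p_vec) - m  # Eq. (3): f_e(i, c)

# A score <= 0 means the projected instance lies inside the concept sphere.
print(instance_of_score(i_vec, i_p, p_vec, p_p, m))
```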

SubClassOf Triple Representation.

For a subClassOf triple \( (c_{i} ,\;r_{c} ,\;c_{j} ) \), the concepts \( c_{i} \) and \( c_{j} \) are represented by the spheres \( s_{i} ({\mathbf{p}}_{i} ,\;m_{i} ) \) and \( s_{j} ({\mathbf{p}}_{j} ,\;m_{j} ) \). There are four relative positions between the two spheres, as illustrated in Fig. 2 (this figure is taken from [13]). As shown in Fig. 2(a) and described in [13], if \( (c_{i} ,\;r_{c} ,\;c_{j} ) \) is a positive triple, the sphere \( s_{i} \) should be inside the sphere \( s_{j} \) to represent the inclusion relation between the two concepts, which is the optimization goal.

Fig. 2. Four relative positions between concept spheres \( s_{i} \) and \( s_{j} \). (Source: Figure 2 in [13])

If the two spheres are separate from each other (as shown in Fig. 2(b)) or intersect (as shown in Fig. 2(c)), the two spheres need to get closer via optimization. The loss function is thus defined as Eq. (4) [13].

$$ f_{c} (c_{i} ,\;c_{j} ) = \;||{\mathbf{p}}_{i} - {\mathbf{p}}_{j} ||_{2} + m_{i} - m_{j} $$
(4)

where \( ||{\mathbf{p}}_{i} - {\mathbf{p}}_{j} ||_{2} \) denotes the distance between the centers \( {\mathbf{p}}_{i} \) and \( {\mathbf{p}}_{j} \) of the two spheres.

If \( s_{j} \) is inside \( s_{i} \) as shown in Fig. 2(d), we need to reduce \( m_{i} \) and increase \( m_{j} \). The loss function is therefore defined as Eq. (5) [13].

$$ f_{c} (c_{i} ,\;c_{j} ) = m_{i} - m_{j} $$
(5)
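As a companion to Eqs. (4) and (5), the following sketch scores a subClassOf triple by dispatching on the relative position of the two concept spheres. Returning 0 when \( s_{i} \) is already inside \( s_{j} \) (Fig. 2(a)) is our assumption that the optimization goal is met, and all names are illustrative.

```python
import numpy as np

def sub_class_of_score(p_i, m_i, p_j, m_j):
    """Score a subClassOf triple (c_i, r_c, c_j) given sphere centers and radii."""
    d = np.linalg.norm(p_i - p_j)
    if d + m_i <= m_j:       # Fig. 2(a): s_i inside s_j -- the optimization goal
        return 0.0
    if d + m_j <= m_i:       # Fig. 2(d): s_j inside s_i -> reduce m_i, increase m_j
        return m_i - m_j     # Eq. (5)
    return d + m_i - m_j     # Eq. (4): spheres separate (b) or intersecting (c)
```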

Inter-instance Relation Triple Representation.

Just like TransD [9], for an inter-instance relation triple \( (h,\;r,\;t) \), TransFG learns six vectors: the head instance vector \( {\mathbf{h}} \), relation vector \( {\mathbf{r}} \), tail instance vector \( {\mathbf{t}} \), projection vector \( {\mathbf{h}}_{p} \) for \( h \), projection vector \( {\mathbf{r}}_{p} \) for \( r \), and projection vector \( {\mathbf{t}}_{p} \) for \( t \). The projection vectors \( {\mathbf{h}}_{p} \), \( {\mathbf{r}}_{p} \), and \( {\mathbf{t}}_{p} \) are used to construct the mapping matrices \( {\mathbf{M}}_{rh} \) and \( {\mathbf{M}}_{rt} \), which are defined as Eqs. (6) and (7) [9].

$$ {\mathbf{M}}_{rh} = {\mathbf{r}}_{p} {\mathbf{h}}_{p}^{{ \top }} + {\mathbf{E}}^{z \times n} $$
(6)
$$ {\mathbf{M}}_{rt} = {\mathbf{r}}_{p} {\mathbf{t}}_{p}^{{ \top }} + {\mathbf{E}}^{z \times n} $$
(7)

where \( {\mathbf{E}}^{z \times n} \) is a \( z \times n \) identity matrix. TransFG projects \( {\mathbf{h}} \) and \( {\mathbf{t}} \) from the instance space to the corresponding relation space through the mapping matrices, obtaining the projected vectors \( {\mathbf{h}}_{ \bot } \) and \( {\mathbf{t}}_{ \bot } \), which are defined as Eq. (8) [9]:

$$ {\mathbf{h}}_{ \bot } = {\mathbf{M}}_{rh} {\mathbf{h}},\;\;{\mathbf{t}}_{ \bot } = {\mathbf{M}}_{rt} {\mathbf{t}} $$
(8)

The loss function is then defined as Eq. (9) [9].

$$ f_{r} (h,\;t) = \;||{\mathbf{h}}_{ \bot } + {\mathbf{r}} - {\mathbf{t}}_{ \bot } ||_{2}^{2} $$
(9)

Finally, similar to other embedding models [6,7,8,9, 13], in our experiments we enforce the constraints \( ||{\mathbf{h}}||_{2} \le 1 \), \( ||{\mathbf{t}}||_{2} \le 1 \), \( ||{\mathbf{r}}||_{2} \le 1 \), \( ||{\mathbf{p}}||_{2} \le 1 \), \( ||{\mathbf{h}}_{ \bot } ||_{2} \le 1 \), and \( ||{\mathbf{t}}_{ \bot } ||_{2} \le 1 \).
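The following sketch puts Eqs. (6)–(9) and the norm constraints together for one inter-instance relation triple; dimensions, initialization, and names are assumptions for illustration.

```python
import numpy as np

n, z = 100, 100  # instance-space and relation-space dimensions (assumed)
rng = np.random.default_rng(1)
h, h_p = rng.normal(size=n), rng.normal(size=n)  # head vector and its projection vector
t, t_p = rng.normal(size=n), rng.normal(size=n)  # tail vector and its projection vector
r, r_p = rng.normal(size=z), rng.normal(size=z)  # relation vector and its projection vector

# Rescale vectors so that the norm constraints ||.||_2 <= 1 hold
for v in (h, t, r):
    v /= max(1.0, np.linalg.norm(v))

M_rh = np.outer(r_p, h_p) + np.eye(z, n)        # Eq. (6)
M_rt = np.outer(r_p, t_p) + np.eye(z, n)        # Eq. (7)
h_proj, t_proj = M_rh @ h, M_rt @ t             # Eq. (8): project into the relation space
f_r = np.linalg.norm(h_proj + r - t_proj) ** 2  # Eq. (9)
```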

2.3 Model Training

A KG contains only positive triples, but model training also requires negative triples, which must be generated from the positive ones. For a positive triple \( (s,\;p,\;o) \) in the training set, a negative triple \( (s',\;p,\;o) \) or \( (s,\;p,\;o') \) is generated by replacing \( s \) or \( o \) with a KG element of the same type (instance or concept). Like many existing studies, we use two replacement strategies, “unif” and “bern” [7], to generate negative triples. The “unif” strategy replaces the subjects or the objects in positive triples with equal probability, while “bern” replaces the subjects or the objects with different probabilities so as to reduce false negative labels. Each positive or negative triple is indicated by a label.
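Below is a hedged sketch of the two replacement strategies; the per-relation statistics tph (average tails per head) and hpt (average heads per tail) used by “bern” [7], as well as all names, are assumptions for illustration.

```python
import random

def corrupt(triple, candidates, strategy="unif", tph=None, hpt=None):
    """Generate a negative triple by replacing the subject or the object
    with a randomly chosen KG element of the same type."""
    s, p, o = triple
    if strategy == "unif":
        replace_subject = random.random() < 0.5  # equal probability
    else:  # "bern": replace the subject more often for one-to-many relations
        replace_subject = random.random() < tph[p] / (tph[p] + hpt[p])
    if replace_subject:
        return (random.choice(candidates), p, o)
    return (s, p, random.choice(candidates))
```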

Just like TransC [13], we define the margin-based ranking loss \( {\mathcal{L}}_{e} \) for the instanceOf triples as Eq. (10) [13].

$$ {\mathcal{L}}_{e} = \sum\limits_{{\xi \in {\mathcal{S}}_{e} }} {\sum\limits_{{\xi ' \in {\mathcal{S}}_{e} '}} {\hbox{max} (0,\;\upgamma_{e} + f_{e} (\xi ) - f_{e} (\xi '))} } $$
(10)

Similarly, the margin-based ranking loss \( {\mathcal{L}}_{c} \) for the subClassOf triples and the margin-based ranking loss \( {\mathcal{L}}_{l} \) for inter-instance relation triples are defined as Eqs. (11) and (12) [13].

$$ {\mathcal{L}}_{c} = \sum\limits_{{\xi \in {\mathcal{S}}_{c} }} {\sum\limits_{{\xi ' \in {\mathcal{S}}_{c} '}} {\hbox{max} (0,\;\upgamma_{c} + f_{c} (\xi ) - f_{c} (\xi '))} } $$
(11)
$$ {\mathcal{L}}_{l} = \sum\limits_{{\xi \in {\mathcal{S}}_{l} }} {\sum\limits_{{\xi ' \in {\mathcal{S}}_{l} '}} {\hbox{max} (0,\;\upgamma_{l} + f_{r} (\xi ) - f_{r} (\xi '))} } $$
(12)

The overall ranking loss \( {\mathcal{L}} \) is therefore defined as Eq. (13) [13].

$$ {\mathcal{L}} = {\mathcal{L}}_{e} + {\mathcal{L}}_{c} + {\mathcal{L}}_{l} $$
(13)

The goal of model training is to minimize the overall ranking loss using stochastic gradient descent (SGD).
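As a sketch of the training objective, the margin-based ranking loss of Eqs. (10)–(12) can be written generically as below, with score_fn standing in for \( f_{e} \), \( f_{c} \), or \( f_{r} \); the overall loss of Eq. (13) is then the sum of the three instantiations. All names are illustrative.

```python
def ranking_loss(positives, negatives, score_fn, gamma):
    # Eqs. (10)-(12): hinge loss pushing positive scores below negative ones by gamma
    return sum(max(0.0, gamma + score_fn(pos) - score_fn(neg))
               for pos in positives for neg in negatives)

# Eq. (13): L = L_e + L_c + L_l, minimized with SGD
# total_loss = (ranking_loss(S_e, S_e_neg, f_e, gamma_e)
#               + ranking_loss(S_c, S_c_neg, f_c, gamma_c)
#               + ranking_loss(S_l, S_l_neg, f_r, gamma_l))
```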

3 Experimental Evaluation

3.1 Experimental Design

Evaluation Tasks.

We used two typical KG downstream tasks, triple classification and link prediction, to compare and evaluate our TransFG model against several KG embedding models: TransC [13], TransE [6], TransH [7], TransR [8], TransD [9], HolE [10], DistMult [11], and ComplEx [12]. We used Accuracy, Precision, Recall, and F1-score as the evaluation metrics for the triple classification task, and MRR and Hits@N as the evaluation metrics for the link prediction task.

Implementation.

For TransC, we directly used the C++ code published by the authors of [13] (cf. https://github.com/davidlvxin/TransC). The code of TransFG was obtained by modifying the loss function calculation module and the model training module of TransC’s code. Both programs were used to generate the corresponding experimental results. We copied the results of the other models from [13], as both experiments used the same experimental datasets and parameter settings.

Datasets.

We used the same experimental datasets, YAGO39K and M-YAGO39K, as in [13]. YAGO39K was built in [13] by randomly extracting triples from YAGO; it contains 39 types of relations, including instanceOf, subClassOf, and inter-instance relations. As stated in [13], M-YAGO39K was derived from YAGO39K by generating new triples using the transitivity of the IS-A relations. We trained the TransC and TransFG models on the training set of YAGO39K, and obtained the best parameter configurations on YAGO39K and on M-YAGO39K through the validation sets of the two datasets, respectively. The triple classification task was evaluated on the test sets of both datasets, while the link prediction task was evaluated on the test set of YAGO39K.

3.2 Experimental Results of Triple Classification

Triple classification is a binary classification task that determines whether a given triple is positive or not. To perform triple classification with TransFG, we set a threshold \( \delta_{r} \) for each relation \( r \), obtained by maximizing the classification accuracy on the validation set. For each triple in the test set, TransFG calculates a score using the loss function defined for that type of triple. If the score is less than the threshold, the triple is classified as positive; otherwise, it is classified as negative.
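A minimal sketch of this thresholding procedure is given below; score_fn stands for the loss function of the triple’s type, and the candidate thresholds and all names are assumptions.

```python
def tune_threshold(valid_triples, labels, score_fn, candidates):
    """Pick the per-relation threshold delta_r maximizing validation accuracy."""
    def accuracy(delta):
        return sum((score_fn(t) < delta) == (y == 1)
                   for t, y in zip(valid_triples, labels))
    return max(candidates, key=accuracy)

def classify(triple, score_fn, delta_r):
    # A triple is positive iff its score is below the relation's threshold
    return score_fn(triple) < delta_r
```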

In our experiments, we set the parameters of TransC according to the best configurations given in [13]. For TransFG, we selected the learning rate \( \lambda \) for SGD from {0.1, 0.01, 0.001}, the dimensions \( k,\;n,\;z \) of the concept, instance, and relation vectors from {20, 50, 100}, and the margins \( \upgamma_{e} \), \( \upgamma_{c} \), and \( \upgamma_{l} \) from {0.1, 0.3, 0.5, 1}. The best configurations of TransFG were determined by the classification accuracy on the validation set. The best configurations on both YAGO39K and M-YAGO39K are: \( \lambda = 0.001 \), \( k = 100 \), \( n = 100 \), \( z = 100 \), \( \upgamma_{e} = 0.1 \), \( \upgamma_{c} = 0.1 \), \( \upgamma_{l} = 1 \), taking \( L_{1} \) as the dissimilarity measure. We trained the TransC and TransFG models for 1,000 rounds.

The experimental results of the instanceOf, subClassOf, and inter-instance relation triple classification tasks on the experimental datasets are shown in Tables 2, 3, and 4, respectively, where “P” stands for Precision, “R” for Recall, and “F1” for F1-score, and “unif” and “bern” are the two replacement strategies described earlier. Observing these results, we have the following findings:

Table 2. Results (%) of the instanceOf triple classification on the two datasets.
Table 3. Results (%) of the subClassOf triple classification on the two datasets.
Table 4. Results (%) of inter-instance relation triple classification on YAGO39K.
1. From Table 2, we can see that TransC and TransFG perform slightly worse than the other models on the instanceOf triple classification task on YAGO39K. We agree with the explanation in [13]: since the instanceOf triples account for the majority (53.5%) of YAGO39K, the instanceOf relation is trained far more often in the other models, which yields nearly the best performance on this task but poor performance on the other triple classification tasks. It is worth noting that TransC and TransFG achieve relatively balanced performance across all three triple classification tasks. TransFG performs slightly worse than TransC under the “unif” strategy, but better than TransC under the “bern” strategy.

2. From Tables 2, 3, and 4, we can see that TransFG performs better than TransC on all three triple classification tasks in terms of almost all evaluation metrics, except for the Recall value of the instanceOf triple classification task on M-YAGO39K, the Precision value of the subClassOf triple classification task on M-YAGO39K, and the Precision value of the inter-instance relation triple classification task on YAGO39K. This suggests that TransFG does better than TransC at representing the multiple different properties possessed by an instance and at modeling the instanceOf relations. TransFG also performs better than all other models on all three triple classification tasks in terms of all evaluation metrics, except for the instanceOf triple classification task on YAGO39K.

3. From Tables 2 and 3, we can see that in most cases of the instanceOf and subClassOf triple classification tasks, TransFG performs better on M-YAGO39K than on YAGO39K, which indicates that TransFG handles the transitivity of the IS-A relations very well.

4. From Tables 2, 3, and 4, we can see that in most cases of all three triple classification tasks on both datasets, TransFG performs better under the “bern” strategy than under the “unif” strategy, which indicates that the “bern” strategy reduces false negative labels more effectively than the “unif” strategy.

Based on the above findings, we conclude that TransFG generally performs better on the triple classification task than TransC and the other compared models.

3.3 Experimental Results of Link Prediction

Link prediction aims to predict the missing head or tail of an inter-instance relation triple. To perform link prediction with TransFG, for each triple in the test set we first replace the head instance and the tail instance with every instance in \( {\mathcal{I}} \) in turn, obtaining the so-called corrupted triples (two corrupted triples per replacement instance). We then score the corrupted triples with the corresponding loss function and rank the instances in \( {\mathcal{I}} \) in ascending order of the scores. Note that a corrupted triple may itself exist in the KG; such a triple should be regarded as a correct prediction (a positive triple). Like existing works [6,7,8,9,10, 12, 13], our experiments use the two common evaluation settings “Raw” and “Filter”. In the “Raw” setting, the corrupted but positive triples are not filtered out, while in the “Filter” setting they are.
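The evaluation procedure can be sketched as follows: for each test triple, rank the correct tail (and, symmetrically, the correct head) against all corrupted candidates, optionally filtering out corrupted triples that exist in the KG, then aggregate MRR and Hits@N. All names are illustrative assumptions.

```python
def tail_rank(h, r, t_true, instances, score_fn, known_triples, filtered):
    """Rank of the correct tail among all candidate replacements (lower is better)."""
    true_score = score_fn((h, r, t_true))
    rank = 1
    for t in instances:
        if t == t_true:
            continue
        if filtered and (h, r, t) in known_triples:
            continue  # "Filter" setting: skip corrupted-but-positive triples
        if score_fn((h, r, t)) < true_score:
            rank += 1
    return rank

# MRR = mean(1 / rank); Hits@N = mean(rank <= N) over all test triples.
```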

In our experiments, we set the parameters of TransC according to the best configurations given in [13]. For TransFG, the parameters were selected in the same way as for the triple classification task described in Sect. 3.2. The best configurations of TransFG were determined according to Hits@10 on the validation set. The best configurations on YAGO39K are: \( \lambda = 0.001 \), \( k = 100 \), \( n = 100 \), \( z = 100 \), \( \upgamma_{e} = 0.1 \), \( \upgamma_{c} = 1 \), \( \upgamma_{l} = 1 \), taking \( L_{1} \) as the dissimilarity measure. We trained the TransC and TransFG models for 1,000 rounds.

The experimental results of performing the link prediction task on YAGO39K are shown in Table 5. Observing these results, we have the following findings:

Table 5. Results of link prediction for inter-instance relation triples on YAGO39K. Hits@N is computed in the “Filter” evaluation setting.
1. In terms of MRR in the “Raw” setting, TransFG performs slightly worse than DistMult, the same as TransE, and better than all other models.

2. TransFG outperforms TransC and all other models in terms of MRR in the “Filter” setting and in terms of Hits@N, which indicates that TransFG represents the multiple different properties possessed by an instance very well.

3. TransFG performs better under the “bern” strategy than under the “unif” strategy, which indicates that the “bern” strategy can reduce false negative labels more effectively than the “unif” strategy.

Based on the above findings, we conclude that TransFG generally performs better on the link prediction task than TransC and the other compared models.

4 Conclusions

Based on TransC, this paper proposes a fine-grained KG embedding model, TransFG, which embeds concepts, instances, and relations into different vector spaces and projects instance vectors from the instance space to the concept space and the relation spaces through dynamic mapping matrices. We conducted an experimental evaluation through link prediction and triple classification on the YAGO39K and M-YAGO39K datasets. The results show that TransFG outperforms TransC and other typical KG embedding models in most cases, especially in representing the multiple different properties possessed by an instance and in modeling the instanceOf relations.