1 Introduction

How similar are Coldplay and Snow Patrol? What are the closest entities to Figure Skating? Is Oriental Pearl related to Beijing? With the proliferation of knowledge graphs (KGs) and their wide range of applications, such questions about the relatedness between entities, the basic components of a KG, arise naturally. Entities are unique identifiers of objects and serve as pivots connecting unstructured free-form text with a structured KG. In many KG-related tasks, such as entity linking [1, 2] and KG-based document ranking [3], entity relatedness is an indispensable ingredient for enhancing overall performance.

Exploring the semantic relatedness of different entities is a routine yet deceptively complex task. Most existing entity relatedness measures are proposed and evaluated in extrinsic tasks, e.g., entity linking [4,5,6], while intrinsic evaluation against human judgements of relatedness has been rare and confined mainly to word pairs instead of entity pairs [7]. On the one hand, treating entity relatedness as a sub-task of extrinsic problems may render it task-specific and less applicable to other scenarios. On the other hand, well-researched word similarity solutions cannot be directly applied to entity relatedness measurement because of the rich semantic information entities carry. Consequently, the study of entity relatedness measures and their intrinsic evaluation is of significance.

Fig. 1 Estimate entity relatedness via text corpus and graph structure

Existing entity relatedness measures can be divided into two main categories: methods based on corpus text and methods leveraging graph structure. Corpus text based approaches model the textual description of an entity with a vector of real numbers, which is then used to estimate relatedness with other vector-represented entities via traditional geometric measures. In graph structure based works, entities are regarded as nodes and entity relatedness is in turn transformed into node-to-node similarity, which is characterized by mutual neighbouring nodes or the whole graph structure. Figure 1 illustrates these two categories of approaches. Note that Wikipedia is normally leveraged as the supporting KG due to its rich textual and graphical information.

The state-of-the-art intrinsic entity relatedness solution [7] implements representative existing measures, selects the frontrunners among them, and combines them into a two-stage framework for computing entity relatedness with higher effectiveness and efficiency. Nevertheless, the recommended configuration of the two-stage framework harnesses only graph-based approaches, neglecting the significance of corpus text. In addition, the best-performing corpus text based method, albeit outperformed by graph structure approaches, is ENTITY2VEC, which extracts the latent semantic meaning of entities via embeddings. Its inferiority can partially be attributed to overlooking entity description text.

In short, the drawback of existing entity relatedness measures is two-fold:

  • Most of the approaches are either adapted from word/document similarity measures, or driven by extrinsic tasks, which might overlook the semantics of entities and fail to cater to broader application scenarios; and

  • The latent semantic meaning of entity descriptions is neglected, and corpus text information has not been fully taken advantage of.

In this work, we offer an intrinsic entity relatedness solution, E5, which measures Entity rElatedness via Entity and tExt joint Embedding. E5 comprises two methods. The primary contributor is the embedding similarity between entities and texts (entity descriptions) computed via the joint embedding network, which aims at making the most of the latent corpus text information. Additionally, graph structure is not overlooked: we adopt M&W  [8], a method utilizing the hyperlink structure of Wikipedia to characterize entity relatedness, whose effectiveness has been demonstrated in many entity linkers. As a linear combination of these two measures, E5 achieves promising results in the experimental evaluation.

Furthermore, since the result of the joint entity and word embedding training serves as input to the joint entity and text embedding network, its quality affects the overall performance. Hence, we propose to enhance the embedding quality by introducing an expanded corpus, and evaluate it through parameter analysis and a case study.

Contributions The main contributions of this article can be summarized in three aspects:

  • A joint entity and text embedding network is leveraged to measure the relatedness of entities according to the embedding similarities between entities and entity descriptions, which highlights and makes full use of entity description text.

  • We propose E5, an entity relatedness measure via entity and text joint embedding, which characterizes entity relatedness in terms of both corpus text and graph structure.

  • The empirical results validate the usefulness of E5 with regard to the intrinsic evaluation of entity relatedness. Additionally, the joint embedding of entities and words using the expanded corpus also proves effective.

Organization Section 2 overviews related work. Joint embedding of entity and text is elaborated in Sect. 3. Section 4 presents the experimental settings and entity relatedness evaluation results, followed by the conclusion in Sect. 5.

2 Related Work

Entity Relatedness is a relatively new task and there is not much previous work directly devoted to measuring entity relatedness. Nonetheless, some existing solutions focused on calculating similarity between words and graph nodes can be adapted for measuring entity relatedness. Hence, we first overview relatedness measures focused on entities, which can further be divided into intrinsic and extrinsic evaluations. Then the extended relatedness methods are introduced, which are similar, but cannot be directly applied, to entity relatedness measurement.

Intrinsic entity relatedness methods The difference between intrinsic and extrinsic entity relatedness evaluations lies in the motivations. While the former aims at developing general methods measuring entity relatedness that can cater to different downstream applications, the latter devises methods merely according to the requirement of specific tasks. Our work strives to offer an effective intrinsic entity relatedness measure.

The initial intrinsic entity relatedness measures can be traced back to [8] and [9], which centred on measuring relatedness between Wikipedia items. In particular, M&W  [8] was established on the hypothesis that the semantic relatedness of two concepts can be defined by the number of incoming links they share. Nevertheless, in these graph structure based works, the notion of entity relatedness was not put forward and the focus was on Wikipedia concepts. The intrinsic entity relatedness task was formally proposed and defined in [10], in which the semantic meaning of an entity was represented by its distribution in the high-dimensional concept space derived from Wikipedia. Zhao et al. [11] incorporated multiple types of relations to measure the semantic relatedness between Wikipedia entities, transforming the task into the completion of a sparse entity-entity association matrix. Still, entity description text was not fully taken advantage of.

The state-of-the-art entity relatedness work [7] presented a thorough study of relatedness measures based on Wikipedia, offered an intrinsic evaluation dataset of entity relatedness, and devised a two-stage framework utilizing the best-performing existing entity relatedness measures. Although the method achieved promising results, the best configuration was still a combination of graph structure based approaches. In our work, we propose a corpus text based entity relatedness measure via joint embeddings, which, in combination with a simple yet effective graph structure based method, can attain superior performance.

Extrinsic entity relatedness methods Entity relatedness serves as a crucial part in many entity-related tasks such as entity linking and entity recommendation. Consequently, a large body of existing entity relatedness methods [4,5,6, 12] were developed in those extrinsic tasks [13,14,15,16,17,18,19], where their intrinsic performances were not evaluated. In particular, Yamada et al. [4] proposed to measure entity relatedness via joint embedding of entities and words, which is similar to E5, but it did not directly model arbitrary-length text and neglected the contribution made by graph-based methods.

Extended relatedness measures There is a large body of studies devoted to measuring similarity between objects other than entities, such as words, documents, and graph nodes, which are similar to the entity relatedness task. Existing methods can also be roughly clustered into two groups: relatedness based on corpus text and relatedness based on graph structure. The former models the textual content of a word/document/graph node with a real-number vector and outputs the relatedness by calculating the cosine similarity between vectors. Representative approaches include the Vector Space Model (VSM), Explicit Semantic Analysis (ESA) [20] and Latent Dirichlet Allocation (LDA) [21]. Meanwhile, the latter places the targeted objects in a graph and computes relatedness via node-to-node similarity. Dominant methods comprise PPR+Cos [22], CoSIMRANK [23] and DEEPWALK [24].

In line with [7], we adapt the aforementioned methods for measuring entity relatedness; the experimental results are reported and discussed in Sect. 4.

3 Methodology

Figure 2 illustrates the joint entity and text embedding process, which starts with the joint embedding of words and entities in text. The entity and word embeddings generated in the first stage are utilized as inputs for the second stage, the joint entity and text embedding network, which projects texts and entities into the same high-dimensional space, so that relatedness scores between two entities, between two entity description texts, and between an entity and an entity description can all be computed via cosine similarity. Eventually, E5 combines the relatedness score generated from the corpus text based joint embedding network with a simple yet effective graph structure based method, M&W, to form the overall entity relatedness measure.

Fig. 2 Workflow of entity and text joint embedding

3.1 Joint Embedding of Entity and Word

Embeddings are n-dimensional vectors of concepts that describe the similarities between these concepts using cosine similarity [25]. It is assumed that concepts are similar if they frequently co-occur with the same other concepts. In the literature, this has already been well researched for words [26] and documents [27]. Take word embeddings as an example: word embedding vectors are designed to capture the semantic similarity between words by placing similar words near one another in the vector space. Likewise, entities can be projected into a high-dimensional vector space so as to better represent their semantic meaning.

In line with recent work [4], we harness an embedding method that jointly embeds entities and words into the same vector space, where similar entities and words are placed close to each other. Note that, different from [4], we construct an expanded corpus for training, which yields embeddings of better quality, as reported in Sect. 4.

The joint embedding method stems from the conventional skip-gram model [26] for learning word embeddings, whose training objective is to generate word representations that can predict the context words of a given word. Formally, let \(w_1, w_2,\ldots ,w_N\) be a sequence of words; the model aims to maximize the average log probability:

$$\begin{aligned} \varTheta _w = \dfrac{1}{N} \sum _{i=1}^N \sum _{-c\le j\le c,j\ne 0} \log P(w_{i+j}|w_{i}). \end{aligned}$$
(1)

where \(w_i\) represents the target word, \(w_{i+j}\) is a context word, and c is the size of the context window. The conditional probability is defined as:

$$\begin{aligned} P(w_{i+j}|w_{i}) = \dfrac{\exp ({\upsilon ^{'}_{w_{i+j}}}^{\top } \upsilon _{w_{i}})}{\sum _{w\in W} \exp ({\upsilon ^{'}_{w}}^{\top } \upsilon _{w_{i}})}. \end{aligned}$$
(2)

where W represents the set of all words in the vocabulary, and \(\upsilon _{w}\) and \(\upsilon ^{'}_{w}\) denote the ‘input’ and ‘output’ vector representations of word w, respectively. After training, the ‘output’ vector \(\upsilon ^{'}_{w}\) is used as the word embedding.
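For concreteness, the following minimal Python sketch (using numpy) computes the conditional probability of Eq. 2 with randomly initialized vectors; the vocabulary size and dimension are illustrative assumptions, not the values used in our experiments.

```python
import numpy as np

# Illustrative sizes; the real vocabulary is far larger.
V, d = 10_000, 300
rng = np.random.default_rng(0)
input_vecs = rng.normal(size=(V, d))    # 'input' vectors  v_w
output_vecs = rng.normal(size=(V, d))   # 'output' vectors v'_w

def skipgram_prob(context_id, target_id):
    """P(w_{i+j} | w_i): softmax over all 'output' vectors (Eq. 2)."""
    scores = output_vecs @ input_vecs[target_id]   # v'_w^T v_{w_i} for every w
    scores -= scores.max()                         # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[context_id]
```

In practice, the normalization over the full vocabulary is avoided via negative sampling or hierarchical softmax [26].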

Fig. 3 Corpus expansion

We then extend the conventional skip-gram model to the joint embedding model. To create the training corpus, we exploit the fact that the text of a Wikipedia page consists of words and anchor texts, and the link associated with each anchor text yields the entity identifier for that anchor. As illustrated in Fig. 3, expanded sentences (Expanded 1) for joint embedding are generated by replacing anchor texts with entity identifiers. In addition, the entity identifiers of each original sentence are extracted to form new inputs so as to better capture the relations between entities (Expanded 2).
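The following Python sketch illustrates the corpus expansion under a simplified sentence representation, where an anchor is a (surface text, entity identifier) pair; the real pipeline parses this information from Wikipedia markup, and the entity identifiers shown are hypothetical.

```python
def expand_sentence(tokens):
    """Produce the original sentence and the two expanded forms of Sect. 3.1.

    `tokens` mixes plain words (strings) with anchors,
    given as (surface_text, entity_identifier) tuples.
    """
    original = [t[0] if isinstance(t, tuple) else t for t in tokens]
    # Expanded 1: replace each anchor text with its entity identifier.
    expanded_1 = [t[1] if isinstance(t, tuple) else t for t in tokens]
    # Expanded 2: keep only entity identifiers, to better capture
    # entity-entity co-occurrence.
    expanded_2 = [t[1] for t in tokens if isinstance(t, tuple)]
    return original, expanded_1, expanded_2

# Toy example with two anchors.
sentence = [("Coldplay", "ENTITY/Coldplay"), "performed", "at",
            ("Glastonbury", "ENTITY/Glastonbury_Festival")]
print(expand_sentence(sentence))
```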

Since an entity identifier can be regarded as a special form of word, Eqs. 1 and 2 can be rewritten as follows:

$$\begin{aligned} \varTheta _{ew}= & {} \dfrac{1}{N} \sum _{i=1}^N \sum _{-c\le j\le c,j\ne 0} \log P(\tau _{i+j}|\tau _{i}). \end{aligned}$$
(3)
$$\begin{aligned} P(\tau _{i+j}|\tau _{i})= & {} \dfrac{\exp ({\upsilon ^{'}_{\tau _{i+j}}}^{\top } \upsilon _{\tau _{i}})}{\sum _{\tau \in \varGamma } \exp ({\upsilon ^{'}_{\tau }}^{\top } \upsilon _{\tau _{i}})}. \end{aligned}$$
(4)

where \({\tau _1, \tau _2,\ldots \tau _N}\) is a sequence of tokens (words or entity identifiers), \(\tau _i\) and \(\tau _{i+j}\) represent the target token and context token, respectively. \(\varGamma \) denotes the set of all tokens in the corpus, \(\upsilon _{\tau }\) and \(\upsilon ^{'}_{\tau }\) represent the ‘input’ and ‘output’ vector representations of token \(\tau \). After training, the ‘output’ \(\upsilon ^{'}_{\tau }\) is used for joint word and entity embedding.

3.2 Joint Embedding of Entity and Text

Although entity relatedness can be estimated by the cosine similarity of the entity embeddings derived from the joint entity and word training process, such an estimate fails to take entity description information into account. Therefore, inspired by [28], we establish a neural network that jointly learns vector representations of texts and KB entities, so that the similarity between two entities, between two entity description texts, and between an entity and a piece of entity description text can be obtained accordingly, all of which contribute to the estimation of entity relatedness.

Similar to the corpus for entity and word joint embedding, we train entity and text joint embedding on annotated Wikipedia pages. The target is to predict entities referred to by anchor links in Wikipedia text. Given a piece of text \(t = \{w_1, w_2,\ldots w_N\}\), which contains entities \(E_t = \{e_1, e_2,\ldots e_n\}\), Eqs. 1 and 2 can be transformed as follows to predict entities that appear in text:

$$\begin{aligned} \varTheta _{et}= & {} \sum _{(t,E_t)\in \Delta } \sum _{e \in E_t} \log P(e|t). \end{aligned}$$
(5)
$$\begin{aligned} P(e|t)= & {} \dfrac{\exp ({\upsilon _e}^{\top } \upsilon _{t})}{\sum _{e^{*}\in E_{K}} \exp ({\upsilon _{e^{*}}}^{\top } \upsilon _{t})}. \end{aligned}$$
(6)

where \(\Delta \) denotes a set of pairs, each of which comprises a text t, as well as the entities \(E_t\) contained in it. P(e|t) represents the probability that a text t contains an entity e. All entities in KB are denoted by \(E_{K}\) and a random entity in \(E_{K}\) is represented by \(e^{*}\). \(\upsilon _e\) and \(\upsilon _t\) are vector representations of entity e and text t respectively.

Note that the vector representation \(\upsilon _t\) of text \(t = \{w_1, w_2,\ldots w_N\}\) is obtained by applying a linear transformation to the \(L_2\)-normalized sum of the word embedding vectors in t:

$$\begin{aligned} \upsilon _t = W \dfrac{\sum _{m=1}^{N} \upsilon _{w_m}}{\Vert \sum _{m=1}^{N} \upsilon _{w_m}\Vert } + b. \end{aligned}$$
(7)

where W is a weight matrix and b is a bias vector, both learned during training, and \(\upsilon _{w_m}\) denotes the embedding vector of word \(w_m\). Both the word embeddings \(\upsilon _{w}\) and the entity embeddings \(\upsilon _{e}\) are derived from the joint embedding of entities and words.
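A minimal numpy sketch of Eqs. 6 and 7 is given below; the word embedding matrix, the entity embedding matrix, and the learned parameters W and b are assumed to be available from the training described in this section.

```python
import numpy as np

def text_vector(word_vecs, W, b):
    """v_t: linear projection of the L2-normalized sum of word vectors (Eq. 7)."""
    s = word_vecs.sum(axis=0)                 # word_vecs: (N_words, d)
    return W @ (s / np.linalg.norm(s)) + b

def entity_given_text(entity_vecs, v_t, e_idx):
    """P(e | t): softmax over all KB entities (Eq. 6)."""
    scores = entity_vecs @ v_t                # entity_vecs: (|E_K|, d)
    scores -= scores.max()                    # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs[e_idx]
```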

3.3 Overall Entity Relatedness Measure

After obtaining the well-trained joint entity and text embedding network, given two entities \(e_i\) and \(e_j\), with their Wikipedia description texts, \(d_i\) and \(d_j\), the corpus text based entity relatedness can be measured by:

$$\begin{aligned} R_T(e_i,e_j) = \alpha _1 sim(\upsilon _{e_i},\upsilon _{e_j}) + \alpha _2 sim(\upsilon _{e_i},\upsilon _{d_j}) + \alpha _3 sim(\upsilon _{e_j},\upsilon _{d_i}) + \alpha _4 sim(\upsilon _{d_i},\upsilon _{d_j}) \end{aligned}$$
(8)

where \(sim(*)\) calculates the cosine similarity between two embedding vectors; accordingly, \(sim(\upsilon _{e_i},\upsilon _{e_j})\) denotes the similarity between the embeddings of the two entities \(e_i\) and \(e_j\), while \(sim(\upsilon _{e_i},\upsilon _{d_j})\), \(sim(\upsilon _{e_j},\upsilon _{d_i})\), and \(sim(\upsilon _{d_i},\upsilon _{d_j})\) represent the similarities between \(e_i\) and \(d_j\), \(e_j\) and \(d_i\), and \(d_i\) and \(d_j\), respectively, computed by utilizing our proposed joint text and entity embedding network. \(\alpha _1,\alpha _2,\alpha _3,\alpha _4\) are parameters balancing the contributions made by the different similarity measures, and they are assigned equal weights in our experiment.
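As a sketch, the corpus text based relatedness \(R_T\) of Eq. 8 can be computed as follows; since we only state that the four weights are equal, setting each \(\alpha \) to 0.25 (so that \(R_T\) stays in the cosine range) is an illustrative assumption.

```python
import numpy as np

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def relatedness_text(v_ei, v_ej, v_di, v_dj, alphas=(0.25, 0.25, 0.25, 0.25)):
    """Corpus text based relatedness R_T (Eq. 8); equal weights assumed to be 0.25."""
    a1, a2, a3, a4 = alphas
    return (a1 * cos(v_ei, v_ej) + a2 * cos(v_ei, v_dj)
            + a3 * cos(v_ej, v_di) + a4 * cos(v_di, v_dj))
```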

Besides the text corpus based method, as pointed out in [7], a simple but effective graph structure based method, M&W, can achieve promising results. M&W is based on the assumption that the semantic relatedness of two concepts can be defined by the number of incoming links they share. Formally, the M&W relatedness between entities \(e_i\) and \(e_j\) is:

$$\begin{aligned} R_G(e_i,e_j) = 1-\dfrac{\log (\max {\{|I(e_i)|,|I(e_j)|\}}) -\log (|I(e_i)\cap I(e_j)|)}{\log (n)-\log (\min {\{|I(e_i)|,|I(e_j)|\}})} \end{aligned}$$
(9)

where I(e) denotes the set of incoming links of entity e, and n represents the total number of entities in Wikipedia.
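A direct implementation of Eq. 9 is sketched below; returning 0 when the two entities share no incoming links is a common convention and an assumption on our part, as the formula is undefined in that case.

```python
import math

def relatedness_graph(in_links_i, in_links_j, n_entities):
    """M&W relatedness from shared incoming links (Eq. 9).

    `in_links_i` / `in_links_j` are the sets of pages linking to e_i / e_j;
    `n_entities` is the total number of entities in Wikipedia.
    """
    common = in_links_i & in_links_j
    if not common or not in_links_i or not in_links_j:
        return 0.0  # assumed convention for the undefined case
    a, b = len(in_links_i), len(in_links_j)
    return 1 - (math.log(max(a, b)) - math.log(len(common))) / \
               (math.log(n_entities) - math.log(min(a, b)))
```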

Hence, E5 generates the eventual relatedness between entities \(e_i\) and \(e_j\) as:

$$\begin{aligned} R(e_i,e_j) = \eta R_T(e_i,e_j) + \theta R_G(e_i,e_j) \end{aligned}$$
(10)

where \(\eta \) and \(\theta \) are two parameters balancing the importance of the corpus text based and graph structure based measures; the two measures are considered to be of equal significance, and both parameters are set to 0.5 in our work.
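Putting the two components together, the final score of Eq. 10 is simply a weighted sum; the sketch below assumes \(R_T\) and \(R_G\) are computed as in the previous snippets, and the example values are made up.

```python
def relatedness_e5(r_text, r_graph, eta=0.5, theta=0.5):
    """Overall E5 relatedness (Eq. 10), with both weights set to 0.5 as in the paper."""
    return eta * r_text + theta * r_graph

# Example: combine a corpus text based score with a graph structure based score.
print(relatedness_e5(0.81, 0.76))
```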

Fig. 4 Workflow of E5

Figure 4 further explains the workflow of E5. Specifically, given two entities and their description texts, the words in the texts and the entities are first transformed into embeddings via the well-trained network. Then we jointly embed the description texts and entities by harnessing the trained parameters. Accordingly, \(R_T\), the corpus text based entity relatedness, is obtained by combining the similarities between the embeddings of texts and entities. In the end, we generate the eventual E5 relatedness by combining \(R_T\) with the graph structure based entity relatedness \(R_G\), where the specific definitions of \(R_T\) and \(R_G\) can be found above.

4 Experiment

In this section, we first elaborate on the experimental settings, followed by the evaluation results and analysis.

4.1 Experimental Settings

4.1.1 Additional Corpus Information

As detailed in Sect. 3.1, the corpus for embedding training is composed of the original form and two expanded forms of each sentence, from which punctuation is removed before training. We use the Wikipedia dump of 20 March 2018 (Footnote 1) as the text source.

The corpus for entity and text joint embedding is derived from the DBpedia abstract corpus [29], which comprises the first introductory sections of all Wikipedia pages with links (entities) preserved. In contrast to the whole Wikipedia dump, which might contain much noise, the first paragraph or section of each Wikipedia article summarizes the main topic and is of relatively higher quality. In addition, the first paragraph of an entity's corresponding Wikipedia page is also taken as its entity description in our work.

4.1.2 Training Settings

For joint entity and word embedding training, we utilized the word2vec implementation in Gensim (Footnote 2). The embedding dimension was set to 300, the window size to 20, and the number of iterations to 1.
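A hedged sketch of this training step with Gensim (version 4 argument names) is shown below; the toy corpus stands in for the expanded corpus of Sect. 3.1, and min_count is set only so the example runs.

```python
from gensim.models import Word2Vec

# Toy stand-in for the expanded corpus of Sect. 3.1: each sentence is a list of
# tokens, with entity identifiers treated as ordinary tokens.
expanded_corpus = [
    ["ENTITY/Coldplay", "performed", "at", "ENTITY/Glastonbury_Festival"],
    ["ENTITY/Coldplay", "ENTITY/Glastonbury_Festival"],   # Expanded 2 form
]

model = Word2Vec(
    sentences=expanded_corpus,
    vector_size=300,   # embedding dimension
    window=20,         # context window size
    sg=1,              # skip-gram, as in Sect. 3.1
    min_count=1,       # keep every token in this toy example
    epochs=1,          # a single iteration, as stated above
)
entity_vec = model.wv["ENTITY/Coldplay"]
```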

As for learning the parameters W and b in the entity and text joint embedding network, we set the dimension of the fully connected neural network layer to 300. The batch size was set to 128. To accelerate the training process, we utilized the negative sampling strategy [26], generating irrelevant entities as negative samples; the number of negative samples for each positive pair was set to 10.
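The following PyTorch sketch shows one way the parameters W and b could be trained with negative sampling under the settings above; the optimizer, learning rate, and the logistic (sigmoid) loss are illustrative assumptions rather than details reported in this article.

```python
import torch
import torch.nn.functional as F

dim, n_entities, n_neg = 300, 50_000, 10            # n_entities is illustrative
proj = torch.nn.Linear(dim, dim)                    # weight matrix W and bias b of Eq. 7
entity_emb = torch.randn(n_entities, dim)           # fixed entity embeddings from Sect. 3.1
opt = torch.optim.Adam(proj.parameters(), lr=1e-3)  # optimizer choice is an assumption

def train_step(text_vecs, pos_ids):
    """One batch: `text_vecs` are L2-normalized word-vector sums (batch, dim),
    `pos_ids` are the gold entity indices for each text."""
    v_t = proj(text_vecs)                                        # Eq. 7
    pos = entity_emb[pos_ids]                                    # positive entities
    neg_ids = torch.randint(0, n_entities, (len(pos_ids), n_neg))
    neg = entity_emb[neg_ids]                                    # 10 random negatives per pair
    pos_score = (pos * v_t).sum(-1)
    neg_score = torch.einsum("bkd,bd->bk", neg, v_t)
    loss = -(F.logsigmoid(pos_score).mean() + F.logsigmoid(-neg_score).mean())
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```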

4.1.3 Dataset

Following the state-of-the-art work, we utilize WIRE [7] as the evaluation dataset, which consists of 503 pairs of named entities in Wikipedia and associated relatedness scores assigned by a group of human assessors. The entity pairs cover different levels of relatedness, and many pairs are related yet far apart in the Wikipedia graph structure, which renders KG distance a less effective entity relatedness measure.

Furthermore, we also consider WIKISIM [8] as an appropriate benchmark for the intrinsic evaluation of entity relatedness. It stems from the WORDSIM-353 dataset comprising 353 word pairs, whose words were manually mapped to their corresponding Wikipedia entities.

4.1.4 Evaluation Metric

The harmonic mean of the Pearson correlation coefficient and the Spearman correlation coefficient is utilized as the evaluation metric in our work. While the former highlights the difference between predicted and correct values, the latter emphasizes the ranking order of the predicted relatedness values. The two coefficients capture different aspects of the predicted results and are of equal significance for evaluating relatedness measures. Consequently, we adopt their harmonic mean as the final indicator, which embodies these two different features.

Specifically, suppose the relatedness scores generated by a specific method are denoted by \(\mathbf {X}\), and the corresponding ground-truth relatedness scores are \(\mathbf {Y}\). Then the Pearson correlation coefficient is calculated by:

$$\begin{aligned} r_p = \dfrac{\sum \mathbf {X} \mathbf {Y} -\dfrac{\sum \mathbf {X} \sum \mathbf {Y}}{N}}{\sqrt{\left( \sum \mathbf {X}^2 - \frac{(\sum \mathbf {X})^2}{N}\right) \left( \sum \mathbf {Y}^2 - \frac{(\sum \mathbf {Y})^2}{N}\right) }}. \end{aligned}$$
(11)

As for the Spearman correlation coefficient, both \(\mathbf {X}\) and \(\mathbf {Y}\) are first converted to ranks (sorted in ascending or descending order). Suppose the length of \(\mathbf {X}\) and \(\mathbf {Y}\) is n, and \(d_i\) denotes the difference between the ranks of \(\mathbf {X}_i\) and \(\mathbf {Y}_i\), \(1\le i \le n\); then the Spearman coefficient is calculated by:

$$\begin{aligned} r_s = 1 - \frac{6 \sum _{i=1}^{n} (d_i)^2}{n(n^2 - 1)} \end{aligned}$$
(12)

Then the harmonic mean of \(r_p\) and \(r_s\) is considered as the evaluation metric.
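For reference, the metric can be computed with scipy as follows, where the toy scores are made up purely to show the call pattern.

```python
from scipy.stats import pearsonr, spearmanr

def harmonic_score(predicted, gold):
    """Harmonic mean of the Pearson and Spearman correlations (Eqs. 11 and 12)."""
    r_p, _ = pearsonr(predicted, gold)
    r_s, _ = spearmanr(predicted, gold)
    return 2 * r_p * r_s / (r_p + r_s)

print(harmonic_score([0.9, 0.1, 0.5, 0.7], [0.8, 0.2, 0.4, 0.9]))
```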

4.1.5 Competitors

We adopt the following measures, as well as the state-of-the-art work [7], as competitors:

  • Vector Space Model (VSM) measures entity relatedness by comparing the similarity of entity description texts, which are represented as sparse vectors over terms in text weighted by tf-idf.

  • Explicit Semantic Analysis (ESA) [20] represents entities by their related Wikipedia articles with tf-idf weights. Then the relatedness is calculated by cosine similarity between sparse vectors of entities over all Wikipedia pages.

  • Latent Dirichlet Allocation (LDA) [21] compares entities by cosine similarity between topic distribution weights of their description texts.

  • ENTITY2VEC (E2V) [4] maps entities and words to the same embedding space and attains relatedness by entity embedding similarity score. This method is similar to our joint entity and word embedding, whereas we utilize the expanded corpus to improve embedding quality.

  • PPR+Cos [22] measures the relatedness of two nodes in KG by cosine similarity between their PageRank vectors in the graph.

  • CoSIMRANK [23] improves PPR+Cos by taking into account the fact that early meetings during the two separate random walks are of more value than later encounters.

  • DEEPWALK (DW) [24] embeds the whole graph and measures the node relatedness via embedding similarity.

  • M&W  [8] is based on the assumption that the semantic relatedness of two concepts can be defined by the number of incoming links they share.

  • Two-Stage Framework (TSF) [7] devises a two-stage framework, which creates a sub-graph of two entities by retrieving their most similar entities via M&W measure. The edge weights are computed by a linear combination of M&W and DW measures. Eventually entity relatedness is obtained by applying CoSIMRANK on the sub-graph.

Table 1 Entity relatedness results on WIRE
Table 2 Entity relatedness results on WIKISIM

4.2 Results and Analysis

4.2.1 Results Against Other Approaches

The full experiment results are presented in Tables 1 and 2. E5 achieves the best performance on all three metrics and outperforms the runner-up by 2% on WIRE and 1% on WIKISIM in terms of the harmonic value, which verifies the effectiveness of combining the joint entity and text embedding network for measuring corpus text based relatedness with M&W for evaluating graph structure based similarity.

Furthermore, to validate the superiority of our joint entity and word training process, we report the results of E5-, which merely utilizes the entity embedding obtained from the joint entity and word training to compute entity relatedness. Compared with E2V, which also measures entity relatedness via entity embedding similarity, E5- improves the results by 2% on both datasets in terms of the Pearson correlation index and harmonic value. The superiority can be mainly attributed to the expanded corpus.

Comparing the corpus text based approaches, mapping entities or words to a higher-dimensional space for measuring text-based similarity attains superior results, since it better captures the semantic meaning underneath text. Among the graph structure based methods, M&W, a simple method harnessing the Wikipedia hyperlink structure, surprisingly outperforms other graph-based solutions, which can be explained by the high quality of human-annotated Wikipedia links. TSF selects the best graph structure based methods to constitute a two-stage framework and further improves the results. Nonetheless, as a combination of two methods focused on different aspects, E5 attains the best overall performance.

Table 3 Parameter optimization results

4.2.2 Parameter Optimization

Note that we assigned equal weights to the parameters in Eqs. 8 and 10, since there are no training/validation datasets and annotating entity relatedness scores between pairs of entities is both time- and labour-consuming.

Nevertheless, to examine the effect of parameter optimization on the final results, we conducted a fivefold cross-validation by randomly splitting the WIRE and WIKISIM datasets, with 80% train and 20% test in each fold, and a linear RankSVM [30] was harnessed for parameter training as well as evaluation. The averaged result of the fivefold cross-validation is denoted as E5+.
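One common way to realise a linear RankSVM is the pairwise transform combined with a linear SVM, sketched below with scikit-learn; this is only an approximation of the setup, since the RankSVM implementation cited in [30] may differ in detail, and `features` (one row of component similarities per entity pair) is a hypothetical input.

```python
import numpy as np
from sklearn.svm import LinearSVC

def ranksvm_weights(features, gold):
    """Learn linear weights over the component similarities via the pairwise
    transform; `features` has one row per entity pair, `gold` holds the
    ground-truth relatedness scores."""
    diffs, labels = [], []
    for i in range(len(gold)):
        for j in range(len(gold)):
            if gold[i] > gold[j]:
                diffs.append(features[i] - features[j]); labels.append(1)
                diffs.append(features[j] - features[i]); labels.append(-1)
    clf = LinearSVC(fit_intercept=False).fit(np.array(diffs), np.array(labels))
    return clf.coef_.ravel()
```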

As revealed in Table 3, performing parameter optimization can improve the final performance. However, it would be unfair to compare E5+ with previous methods since the training set is obtained from the test datasets, i.e., WIRE and WIKISIM. Consequently, we merely aim to show that parameter optimization can improve overall results, and we leave the construction of a large training dataset as future work, which might require techniques such as distant supervision and crowdsourcing.

4.2.3 Time Consumption

It should be noted that the main time consumption of E5 comes from the two joint embedding processes, which can be trained offline, while computing the relatedness score of two entities takes only seconds with the well-trained framework. Since the joint training is performed beforehand, the runtime cost of E5, represented by the latter process, is relatively small.

Table 4 The most relevant entities/words to Figure Skating trained with E5- and E2V

4.2.4 Case Study

Table 4 presents the most relevant entities/words to entity Figure Skating trained with E5- and E2V. The bold items represent words while the rest denote entities. Compared with the results generated by E2V, which contain entities irrelevant to winter sports such as Triathlon and Gymnastics, our proposed training method with expanded corpus achieves superior performance.

In response to the question posed at the beginning of the paper, ‘How similar are Coldplay and Snow Patrol?’, we present the most similar entities/words to entity Coldplay and entity Snow Patrol in Table 5. It is evident that these two entities are closely related, since their similar entities are all bands with an indie/alternative style, and they share the related entity Arctic Monkeys. Nevertheless, if inspected carefully, the difference is also obvious. While the entities related to Coldplay are bands from different nations, the most similar entities of Snow Patrol are mainly Scottish or Northern Irish artists, indicating the wider popularity of Coldplay. The specific relatedness score between them calculated using E5 is 0.788.

Table 5 The most relevant entities to Coldplay and Snow Patrol trained with E5-

5 Conclusion

Entities are unique identifiers of objects and play an increasingly significant role in many natural language processing tasks, where the estimation of entity relatedness is required. Current state-of-the-art methods measure entity relatedness either by merely utilizing the graph structure of KGs, or by harnessing entity embeddings trained from a text corpus, whereas the use of entity description text has been neglected.

In this work, we propose E5, an effective entity relatedness measure which combines text corpus based and graph structure based approaches. The words and entities are first projected into the same high-dimensional vector space, and the outputs are utilized as inputs for the subsequent joint entity and text embedding training. The well-trained entity and text embedding network can then be leveraged to measure the similarity between entities and entity descriptions, which, in combination with a graph structure based method, constitutes the eventual entity relatedness measure. The experimental outcome not only verifies the effectiveness of E5, but also shows the high quality of the word and entity embeddings as an affiliated contribution.

Potential future research directions include applying the proposed measure on downstream tasks such as entity linking and entity recommendation, and creating a large entity relatedness training set by harnessing distant supervision or crowdsourcing.