1 Introduction

Relation extraction (RE) is the task of identifying and classifying relations between named entities (such as persons, locations or organizations) in free text. RE is of great practical interest for various fields including event detection, knowledge base construction and question answering. Figure 1 illustrates a typical RE task. For the first two sentences, RE should identify the semantic relation type birth place between the named entity pairs regardless of the surface pattern used to express the relation, such as "hometown is" or "was born in". RE should also distinguish it from the album production relation between the same named entities in the third sentence.

Distant supervision techniques for RE [1, 4] have proven very effective at this task. However, distant supervision is limited to a fixed set of relations in a given knowledge base, which hinders its adaptation to new domains. Unsupervised approaches [3, 7] can potentially overcome this limitation by extracting open relations, i.e., relations not known to the knowledge base in advance. In this paper, we propose an unsupervised approach to extract and cluster open relations between named entities in free text by re-weighting word embeddings and using the types of named entities as additional features.

Fig. 1. Sentences containing textual relations between named entities.

2 Proposed Method

Our system builds sentence representations from the types of the involved named entities and the terms forming the relation. For the latter, we use pre-trained word embeddings, re-weighted according to the dependency path between the named entities. These representations are clustered so that different expressions of semantically equivalent relations are mapped to the same cluster. Figure 2 presents an overview of our system for unsupervised open relation extraction, consisting of four stages described in the following: preprocessing, feature extraction, sparse feature reduction and relation clustering.

Fig. 2. System overview.

Preprocessing. For each sentence in the dataset, we extract named entities using DBpedia Spotlight and keep all sentences containing at least two entities. For these sentences, we use the Stanford CoreNLP dependency parser to extract the lexicalized dependency path between each pair of named entities.
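This stage can be approximated with off-the-shelf tools. The sketch below assumes the public DBpedia Spotlight REST endpoint and uses a stanza dependency parser as a stand-in for Stanford CoreNLP; the function names and confidence threshold are illustrative, not the exact configuration used in the paper.

```python
# Minimal preprocessing sketch; requires `pip install requests stanza networkx`
# and a one-time stanza.download("en").
import requests
import stanza
import networkx as nx

SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"  # public endpoint

def extract_entities(sentence: str, confidence: float = 0.5):
    """Annotate a sentence with DBpedia Spotlight; returns entity mentions."""
    resp = requests.get(
        SPOTLIGHT_URL,
        params={"text": sentence, "confidence": confidence},
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    return resp.json().get("Resources", [])

nlp = stanza.Pipeline("en", processors="tokenize,pos,lemma,depparse")

def dependency_path(sentence: str, head_idx: int, tail_idx: int):
    """Lexicalized shortest path between two 1-based token indices."""
    sent = nlp(sentence).sentences[0]
    g = nx.Graph()
    for word in sent.words:
        g.add_edge(word.id, word.head)  # head == 0 is the artificial root
    path = nx.shortest_path(g, head_idx, tail_idx)
    return [sent.words[i - 1].text for i in path if i > 0]
```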

Feature Extraction. For each sentence, our method outputs a vector representation of the textual relation between each named entity pair. Features include word embeddings, dependency paths between named entities, and named entity types. Word embeddings provide an estimate of the semantic similarity between terms via vector proximity. Sentence representations are typically built by averaging word vectors. However, not all words in a sentence contribute equally to expressing the relation between two named entities. Therefore, we develop a novel method to re-weight the pre-trained word embeddings: terms that appear within the lexicalized dependency path between the two named entities are given a higher weight. Intuitively, shorter dependency paths are more likely to represent true relationships between the named entities. The vector representation \(s(W,D)\) of each sentence is calculated through the following function:

$$\begin{aligned} s(W,D) = \sum _{w_i \in W} f(w_{i},W,D) \cdot v(w_{i}), \quad f(w_{i},W,D)= {\left\{ \begin{array}{ll} \frac{C_{in} \cdot |W|}{|D|}, & \text {if } w_{i} \in D \\ C_{out}, & \text {otherwise} \end{array}\right. }, \end{aligned}$$

where \(W=\{w_1,...,w_n\}\) is the set of terms in the sentence, \(D \subset W\) is the set of terms in the lexicalized dependency path between the named entities, and \(v(w_{i})\) is the pre-trained word embedding vector of \(w_i\). \(C_{in} \ge 1\) and \(C_{out}\) are constants, experimentally set to 1.85 and 0.02, respectively. We use GloVe word embeddings of size 100. As baselines, we compare these representations with standard sentence representation features: TF-IDF, the sum of word embeddings, and the sum of IDF-re-weighted word embeddings [5]. Intuitively, a relation can only connect entities of certain types. For example, a birth place relation connects a person and a location, although other relations between a person and a location are possible. Therefore, for each named entity, we use its DBpedia types and Stanford NER tags as features.
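For concreteness, the re-weighting above can be implemented in a few lines. This is a minimal sketch assuming the GloVe vectors have been loaded into a plain dict (word → 100-dimensional numpy array); skipping out-of-vocabulary terms is our assumption, not specified in the paper.

```python
import numpy as np

C_IN, C_OUT = 1.85, 0.02  # constants from the paper

def sentence_vector(words, dep_path_words, glove):
    """Compute s(W, D): dependency-path re-weighted sum of word embeddings."""
    W = [w for w in words if w in glove]   # skipping OOV terms is an assumption
    D = set(dep_path_words) & set(W)       # path terms, D subset of W
    dim = len(next(iter(glove.values())))
    s = np.zeros(dim)
    for w in W:
        # f(w, W, D): boost path terms, damp the rest
        weight = C_IN * len(W) / len(D) if w in D else C_OUT
        s += weight * glove[w]
    return s
```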

Sparse Feature Reduction. Some features are sparser than others; naively concatenating them for each relation skews the clustering. In supervised relation extraction this is not an issue, as the learning algorithm is expected to perform feature selection automatically using the training data. In unsupervised relation extraction there is no training data, hence we devise a novel strategy to circumvent this sparse-feature bias: dimensionality reduction is applied to each sparse feature individually before merging it with the rest of the feature vector. For feature reduction, we use Principal Component Analysis (PCA) [2].
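A minimal sketch of this stage, assuming each sparse feature group (e.g., one-hot entity-type vectors, densified first) arrives as its own matrix; the component count is an illustrative placeholder, since the tuned PCA parameters are not listed here.

```python
import numpy as np
from sklearn.decomposition import PCA

def reduce_and_merge(dense_block, sparse_blocks, n_components=50):
    """PCA-reduce each sparse feature block separately, then concatenate
    the reduced blocks with the dense sentence representations."""
    reduced = [PCA(n_components=n_components).fit_transform(block)
               for block in sparse_blocks]
    return np.hstack([dense_block] + reduced)
```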

Relation Clustering. We use Hierarchical Agglomerative Clustering (HAC) with Ward's linkage criterion [6] to cluster the feature representations of each relation, which yields slightly better results than the k-means clustering algorithm.
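With scikit-learn, this stage reduces to a few lines; the sketch below assumes the merged feature matrix from the previous stage and uses k = 100 clusters, as in the evaluation.

```python
from sklearn.cluster import AgglomerativeClustering

def cluster_relations(features, n_clusters=100):
    """Ward-linkage HAC over the merged relation feature vectors."""
    hac = AgglomerativeClustering(n_clusters=n_clusters, linkage="ward")
    return hac.fit_predict(features)  # one cluster id per relation instance
```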

3 Evaluation

To evaluate our system, we use the NYT-FB dataset [3]. This dataset contains approximately 1.8M sentences, divided into an 80%/20% test-validation split and automatically aligned with statements (triples) from Freebase. The alignment between sentences and the properties of the Freebase triples serves as the gold standard for relation clustering.

We use the validation split to tune the parameters of the word-vector re-weighting and the PCA algorithm, and the test split to evaluate relation discovery methods. We compare our method, using the best identified feature combination, with state-of-the-art models for unsupervised relation discovery: the variational autoencoder model [3], Rel-LDA [7], and a HAC baseline with standard features [8]. To make our results comparable, we follow these systems and set the number of relations to induce (the number of clusters k) to 100.

Table 1 shows the performance of the clustering algorithm when relying only on sentence representations as features. The results demonstrate that our word embeddings re-weighted by the dependency path yield a significant improvement over traditional sentence representations. Table 2 shows the performance when the dependency-re-weighted word embeddings are merged with the rest of the proposed features after individual feature reduction. Our method outperforms the state-of-the-art relation discovery algorithm with a pairwise F\(_1\) score of 41.6%.

Table 1. Comparison between different features for clustering.
Table 2. Pairwise F\(_1\) (%) scores of different models on the test set of the NYT-FB dataset.
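For reference, a common definition of the pairwise F\(_1\) metric reported in Table 2 counts pairs of instances that share a cluster in the prediction versus in the gold standard. The quadratic implementation below is a sketch under that standard definition; the exact evaluation script may differ in detail.

```python
from itertools import combinations

def pairwise_f1(gold, pred):
    """Pairwise F1 over instance pairs sharing a cluster (O(n^2) version)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(gold)), 2):
        same_gold, same_pred = gold[i] == gold[j], pred[i] == pred[j]
        tp += same_gold and same_pred      # pair clustered together in both
        fp += same_pred and not same_gold  # together in prediction only
        fn += same_gold and not same_pred  # together in gold only
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```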

4 Conclusion

In this paper, we proposed an approach for unsupervised relation extraction from free text. Our approach is based on a novel method of re-weighting word vectors according to the dependency parse tree of the sentence. As additional features, we use the types of the named entities involved in the relation. A final HAC clustering is applied to the sentence representations so that similar representations of a relation are mapped to the same cluster. Our evaluation results demonstrate that our method outperforms the state-of-the-art relation clustering method by 5.8% in pairwise F\(_1\) score. The code for feature building and dimensionality reduction is publicly available.