1 Introduction

In the modern age, the amount of data being generated has grown rapidly, and most of it is stored in electronic form. The most common form is text, in which information is stored in an unstructured manner. For this data to be useful, we should be able to retrieve the most important information from such text seamlessly.

In this work, we propose a deep-learning-based approach to extract relational triples from text. Relational triples are entities in a sentence that are related in the form subject - predicate - object. These relational triples can then be used for knowledge engineering applications.

The relational triples are extracted from unstructured text using a DistilBERT [1, 2] based transformer language model. One of the major highlights of our transformer-based model is that it is able to capture dependencies in long sentences. Our model is also capable of extracting triples from sentences with overlapping entities, i.e., cases where triples share the same entities or relations. This scenario is explained in detail in the following section. The final important aspect of our model is joint entity-relation extraction: in earlier models, entities and relations were learned separately in a pipelined manner, which resulted in error propagation from one stage to the next.

1.1 Overlapping Entities

Earlier models for this task were not able to handle sentences with overlapping entities, a scenario where entities are shared by multiple triples in the same sentence. This can be categorized into two main types: Single Entity Overlap (SEO) and Entity Pair Overlap (EPO). Single Entity Overlap occurs when multiple triples share the same entity as subject or object. Entity Pair Overlap occurs when multiple triples share the same entity pair. A visual representation of these scenarios is given in Fig. 1.

Fig. 1. Categories of triples in a sentence. Subject, Predicate and Object suffixes are added to the entities

1.2 Transformers

Most of the current state-of-the-art models in the Natural Language Processing domain use transformer-based language models. Transformers are a type of deep learning model [3] that uses an attention mechanism to find global dependencies between input and output. The transformer processes a sentence in a non-sequential manner, which allows it to process the sentence as a whole rather than word by word. These features were not present in earlier deep neural network-based models for the same task. However, most state-of-the-art transformers have a very large number of layers and parameters.

In our work, we mainly use the encoder part of the transformer for language modelling. Our model can be summarized as follows. First, the raw textual input is converted into tokens using a tokenizer. These tokens, along with attention masks, are fed into the encoder module to obtain embeddings. Since we use a DistilBERT-based model, the output embeddings are contextual in nature. These embeddings, along with the embeddings of the triplet labels, are fed into the model for training. The loss is computed with respect to the head and tail positions of the subject and object in the sentence. The final output is the head and tail positions of the subject and object in the sentence, along with the relation.
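
As a minimal sketch of the tokenization and encoding steps described above, the snippet below uses the Hugging Face transformers library; the checkpoint name and variables are illustrative and do not reproduce our exact training code:

```python
# Minimal sketch: tokenize a sentence and obtain contextual token embeddings
# from a pre-trained DistilBERT encoder (checkpoint name is an assumption).
import torch
from transformers import DistilBertTokenizerFast, DistilBertModel

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-cased")
encoder = DistilBertModel.from_pretrained("distilbert-base-cased")

sentence = "Barack Obama was born in Hawaii."
inputs = tokenizer(sentence, return_tensors="pt")   # token ids + attention mask
with torch.no_grad():
    outputs = encoder(**inputs)

# Contextual embedding for every token: shape (1, num_tokens, 768).
token_embeddings = outputs.last_hidden_state
print(token_embeddings.shape)
```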

2 Related Works

In the Information Extraction and Relation Extraction domain, one of the earlier notable works is [4], which extracts features using Support Vector Machines. Later, [5] approached the problem with a two-step solution: first finding all entities using Named Entity Recognition (NER) and then classifying all the extracted entity pairs using relation classification (RC). These pipeline-based approaches, however, suffered from the error propagation problem. To address this issue, joint models [6] have been proposed which learn entities and relations together. The earlier works, however, did not address the problem of overlapping entities in a sentence, i.e. multiple triples in the same sentence sharing the same entities. This problem was only recently addressed using deep neural network-based models in the work of [7], which is based on sequence-to-sequence learning with a copy mechanism using a Bi-directional LSTM. Later, the evaluation scores were improved by [8] using Graph Convolutional Networks and Bi-LSTMs. The recent works of [9] and [10] further improve the evaluation scores using BERT-based transformer language models. Other recent works involving the usage of transformers in knowledge extraction include [11,12,13].

3 Dataset

For training and testing of our relation extraction framework, we use two public datasets, the New York Times (NYT) dataset and the WebNLG dataset. The original NYT dataset [14] was created with a distant supervision approach, and the WebNLG dataset [15] was created for Natural Language Generation. These datasets have been modified as per the requirements of [7]. The resulting NYT dataset consists of 24 relation classes, 56195 training samples, 5000 validation samples and 5000 test samples. The WebNLG dataset consists of 5019 training samples, 500 validation samples and 703 test samples. Detailed information is given in Table 1.

The full training set of each dataset is used for training; testing, however, is done on individual subsets. The test data can be classified into three types: Normal, Entity Pair Overlap and Single Entity Overlap. The test data can be further classified based on the number of relational triples in a single sentence. All the test data without categorization is marked as Main. Tabulated information on the datasets is given in Table 2. In Table 2, for the rows 'Triple-i', i denotes the number of triples in a single sentence.
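
For illustration, the sketch below (not part of the released code) assigns a sentence's gold triples to one of these categories following the definitions in Sect. 1.1; when a sentence would qualify as both EPO and SEO, this simple rule reports EPO:

```python
# Illustrative helper for splitting test sentences into Normal / EPO / SEO.
def overlap_category(triples):
    """triples: list of (subject, relation, object) tuples from one sentence."""
    pairs = [(s, o) for s, _, o in triples]
    entities = [e for pair in pairs for e in pair]
    if len(set(pairs)) < len(pairs):          # same (subject, object) pair reused
        return "EPO"
    if len(set(entities)) < len(entities):    # a single entity reused across triples
        return "SEO"
    return "Normal"

print(overlap_category([("Obama", "born_in", "Hawaii"),
                        ("Obama", "president_of", "USA")]))  # -> SEO
```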

Table 1. Dataset information
Table 2. Categorization of testing data

4 Model Architecture

For the relational extraction model, we followed the work of [9, 10], which was implemented using a BERT-based encoder and a Graph Neural Network. We optimized the size of this model by using a DistilBERT-based transformer framework and by dropping the Graph Network layer of the baseline, since the accuracy gains from the Graph Neural Network were negligible in our experiments when considering the number of trainable parameters it added. This allowed us to significantly reduce the number of trainable parameters, and thus the model training time, without compromising much accuracy.

Our relation extraction framework (Fig. 2) consists of two parts: first, encoding the words of the input sentence into vector embeddings and encoding each relation into a vector; second, relational triple extraction based on subject and object taggers.

The problem can be formulated as follows. Given a sentence x and the set of all triplets (s, r, o) in the training set T, our goal is to maximize the data likelihood of the training set. This can be defined mathematically as shown in Eq. 1:

Fig. 2. Architecture of our Relation Extraction model

$$ \begin{aligned} & \prod\limits_{(s,r,o) \in T} p((s,r,o)|x) \\ & = \prod\limits_{s \in T} p(s|x)\prod\limits_{(r,o) \in T{\mid}s} p((r,o)|x,s) \\ & = \prod\limits_{s \in T} p(s|x)\prod\limits_{r \in T{\mid}s} p(o|x,s,r)\prod\limits_{r \in R\backslash T{\mid}s} p\left( o_{\emptyset} |x,s,r \right) \\ \end{aligned} $$
(1)

where T | s is the set of triplets in T with s as the subject. Similarly, (r,o) ∈ T | s is the set of all relation-object pairs in the triplets of T with subject s. R is the set of all relations, and R\T | s denotes all relations except those appearing in T | s. \(o_{\emptyset}\) represents a null object, meaning that relations not in T | s have no corresponding object.

First, for a given input sentence, a pre-trained DistilBERT encoder is used to obtain a token representation for each word, and for each predefined relation an embedding is created, as shown in Eq. 2.

$$ \begin{aligned} \left[ h_{1}, h_{2}, \ldots, h_{n} \right] & = E_{D}\left( \left[ w_{1}, w_{2}, \ldots, w_{n} \right] \right) \\ \left[ p_{1}, p_{2}, \ldots, p_{m} \right] & = W_{r} E\left( \left[ r_{1}, r_{2}, \ldots, r_{m} \right] \right) + b_{r} \\ \end{aligned} $$
(2)

where \(w_i\) is a word from the input sentence and \(h_i\) is the corresponding output token representation from the DistilBERT encoder \(E_D\). Similarly, \(p_i\) is the output after the relation embedding matrix \(E\) embeds the predefined relation \(r_i\). \(W_r\) and \(b_r\) are trainable parameters.
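
A minimal sketch of the relation-embedding part of Eq. 2 is shown below; the hidden size of 768 corresponds to DistilBERT, the 24 relations correspond to the modified NYT dataset, and all variable names are ours:

```python
# Sketch of p_1..p_m in Eq. 2: each predefined relation id is embedded by E and
# projected with trainable parameters W_r, b_r (h_1..h_n come from DistilBERT).
import torch
import torch.nn as nn

hidden_size, num_relations = 768, 24

relation_embedding = nn.Embedding(num_relations, hidden_size)   # E
relation_proj = nn.Linear(hidden_size, hidden_size)             # W_r, b_r

relation_ids = torch.arange(num_relations)
p = relation_proj(relation_embedding(relation_ids))             # shape (m, hidden_size)
```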

For relation extraction, subject taggers and object taggers are used. The subject tagger, defined in Eq. 4, identifies all possible subjects among the word tokens. More specifically, it tags the head and tail positions of the subject using the sigmoid function defined in Eq. 3.

$$ \sigma (x) = \frac{1}{{1 + e^{ - x} }} $$
(3)

The sigmoid function maps values to the range between 0 and 1.

$$ \begin{aligned} P_{i}^{s\_head} & = \sigma \left( W_{s\_head} {\text{Tanh}}\left( h_{i}^{o} \right) + b_{s\_head} \right) \\ P_{i}^{s\_tail} & = \sigma \left( W_{s\_tail} {\text{Tanh}}\left( h_{i}^{o} \right) + b_{s\_tail} \right) \\ \end{aligned} $$
(4)

where \(P_{i}^{s\_head}\) and \(P_{i}^{s\_tail}\) are the probabilities of identifying the ith word as the head and tail position of the subject respectively, calculated by the sigmoid function σ. The values \(W_{s\_head}, W_{s\_tail}, b_{s\_head}, b_{s\_tail}\) are trainable weights, and \(h_{i}^{o}\) is the encoded representation of the word from the previous stage.
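
A minimal PyTorch sketch of the subject tagger in Eq. 4 is shown below; the module and layer names are illustrative and the hidden size of 768 corresponds to DistilBERT:

```python
# Sketch of the subject tagger (Eq. 4): two sigmoid-activated linear layers
# predict, for every token, the probability of being a subject head or tail.
import torch
import torch.nn as nn

class SubjectTagger(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.head_linear = nn.Linear(hidden_size, 1)   # W_s_head, b_s_head
        self.tail_linear = nn.Linear(hidden_size, 1)   # W_s_tail, b_s_tail

    def forward(self, h):                              # h: (batch, seq_len, hidden)
        h = torch.tanh(h)
        p_head = torch.sigmoid(self.head_linear(h)).squeeze(-1)   # P_i^{s_head}
        p_tail = torch.sigmoid(self.tail_linear(h)).squeeze(-1)   # P_i^{s_tail}
        return p_head, p_tail
```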

Similarly, the object tagger, defined in Eq. 5, uses an encoded word token representation that differs from the one used by the subject tagger.

$$ \begin{aligned} P_{i}^{o\_head} & = \sigma \left( W_{o\_head} \overline{h}_{ijk} + b_{o\_head} \right) \\ P_{i}^{o\_tail} & = \sigma \left( W_{o\_tail} \overline{h}_{ijk} + b_{o\_tail} \right) \\ \end{aligned} $$
(5)

where \(P_{i}^{o\_head}\) and \(P_{i}^{o\_tail}\) are the probabilities of identifying the ith word as the head and tail position of the object respectively, calculated by the sigmoid function σ. The values \(W_{o\_head}, W_{o\_tail}, b_{o\_head}, b_{o\_tail}\) are trainable weights. The term \(\overline{h}_{ijk}\) is the encoded word token representation, which is defined as

$$ \overline{h}_{ijk} = {\text{Tanh}} \left( {W_{h} \left[ {s_{k} ;p_{j}^{0} ;h_{i}^{0} } \right] + b_{h} } \right) $$
(6)

where \(s_k\) is the representation of the kth candidate subject, and \(p_{j}^{0}\) and \(h_{i}^{0}\) are the encoded representations of the pre-defined relation and the word token respectively.
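
For illustration, a minimal sketch of the object tagger of Eqs. 5 and 6 is given below; the fusion of the candidate subject representation, relation embedding and token embedding follows Eq. 6, while the class and layer names are our own:

```python
# Sketch of the object tagger (Eqs. 5-6): fuse subject, relation and token
# representations, then predict object head/tail probabilities per token.
import torch
import torch.nn as nn

class ObjectTagger(nn.Module):
    def __init__(self, hidden_size=768):
        super().__init__()
        self.fuse = nn.Linear(3 * hidden_size, hidden_size)    # W_h, b_h in Eq. 6
        self.head_linear = nn.Linear(hidden_size, 1)            # W_o_head, b_o_head
        self.tail_linear = nn.Linear(hidden_size, 1)            # W_o_tail, b_o_tail

    def forward(self, h_i, p_j, s_k):
        # h_i: (batch, seq_len, hidden); p_j, s_k: (batch, hidden)
        seq_len = h_i.size(1)
        p = p_j.unsqueeze(1).expand(-1, seq_len, -1)
        s = s_k.unsqueeze(1).expand(-1, seq_len, -1)
        h_bar = torch.tanh(self.fuse(torch.cat([s, p, h_i], dim=-1)))   # Eq. 6
        p_head = torch.sigmoid(self.head_linear(h_bar)).squeeze(-1)     # Eq. 5
        p_tail = torch.sigmoid(self.tail_linear(h_bar)).squeeze(-1)
        return p_head, p_tail
```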

Therefore, in line with Eq. 1, we can define the subject tagger and object tagger as Eqs. 7 and 8 respectively:

$$ p_{\theta_{s}}(s|x) = \prod\limits_{t \in \{ s\_head,\, s\_tail \}} \prod\limits_{i = 1}^{N} \left( P_{i}^{t} \right)^{{\text{I}}\{ y_{i}^{t} = 1 \}} \left( 1 - P_{i}^{t} \right)^{{\text{I}}\{ y_{i}^{t} = 0 \}} $$
(7)
$$ p_{\theta_{o}}(o|x,s,r) = \prod\limits_{t \in \{ o\_head,\, o\_tail \}} \prod\limits_{i = 1}^{N} \left( P_{i}^{t} \right)^{{\text{I}}\{ y_{i}^{t} = 1 \}} \left( 1 - P_{i}^{t} \right)^{{\text{I}}\{ y_{i}^{t} = 0 \}} $$
(8)

where θs and θo are the parameters of the subject tagger and object tagger respectively, and I{z} = 1 if z is true and 0 otherwise. \(y_{i}^{s\_head}, y_{i}^{s\_tail}\) and \(y_{i}^{o\_head}, y_{i}^{o\_tail}\) are the binary tags of the subject's and object's head and tail respectively for the ith word in x. For the null object \(o_{\emptyset}\) in Eq. 1, \(y_{i}^{o_{\emptyset}\_head} = y_{i}^{o_{\emptyset}\_tail} = 0\) for all i.

Taking the logarithm of Eq. 1, we get the objective function defined in Eq. 9:

$$ \begin{aligned} L = & \log \prod\limits_{(s,r,o) \in T} p((s,r,o)|x) \\ = & \sum\limits_{s \in T} \log p_{\theta_{s}}(s|x) + \sum\limits_{r \in T{\mid}s} \log p_{\theta_{o}}(o|x,s,r) + \sum\limits_{r \in R\backslash T{\mid}s} \log p_{\theta_{o}}\left( o_{\emptyset} |x,s,r \right) \\ \end{aligned} $$
(9)

The log-likelihood function is then maximized using Stochastic Gradient Descent during training. The learning rate is set to 0.1 for both datasets.
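
In practice, maximizing this log-likelihood amounts to minimizing a binary cross-entropy loss over the head/tail tag sequences. A hedged sketch of such a training step is shown below; the model, data loading and tag construction are omitted, and all names are illustrative:

```python
# Sketch of the training objective implied by Eqs. 7-9: binary cross-entropy
# over predicted head/tail probabilities against the binary gold tags.
import torch
import torch.nn.functional as F

def tagging_loss(p_head, p_tail, y_head, y_tail):
    # p_*: predicted probabilities, shape (batch, seq_len); y_*: binary gold tags.
    return (F.binary_cross_entropy(p_head, y_head.float())
            + F.binary_cross_entropy(p_tail, y_tail.float()))

# Typical update with the SGD optimizer and learning rate described above:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# loss = tagging_loss(sub_head_p, sub_tail_p, sub_head_y, sub_tail_y) \
#        + tagging_loss(obj_head_p, obj_tail_p, obj_head_y, obj_tail_y)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```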

5 Evaluation Metrics

We used precision, recall and F1-score as evaluation metrics, following the baseline approach. A triplet is considered correct only if its predicate and its corresponding subject and object are all correct. Additionally, we also used the number of trainable parameters in the transformer model for comparison, as it helps us assess the efficiency of the model with respect to neural network size and gives an idea of the model training time.
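
A minimal sketch of this exact-match evaluation is given below; it assumes the predicted and gold triples are provided as (subject, relation, object) tuples:

```python
# Illustrative computation of precision, recall and F1 under exact triple matching:
# a predicted triple counts only if subject, relation and object all match a gold triple.
def triple_prf(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```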

6 Implementation Details and Results

The model is implemented with the PyTorch library along with CUDA 11. The base DistilBERT model is obtained from Huggingface [16] with transformers library version 4.12. For both datasets, the models are set to run for a maximum of 60 epochs with an early stopping mechanism, which is triggered if there is no improvement in the score for 15 consecutive epochs. Both datasets use the Stochastic Gradient Descent optimizer with a learning rate of 0.1. The training data is further split into training and validation data, and the hyperparameters are determined on this validation data.
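
A hedged sketch of this training schedule is shown below; train_one_epoch and evaluate stand in for the actual training and validation routines, which are not shown:

```python
# Sketch of the schedule above: at most 60 epochs, early stopping with a
# patience of 15 epochs on the validation score.
best_score, patience, stale_epochs = 0.0, 15, 0
for epoch in range(60):
    train_one_epoch(model, optimizer, train_loader)
    score = evaluate(model, val_loader)   # e.g. validation F1
    if score > best_score:
        best_score, stale_epochs = score, 0
    else:
        stale_epochs += 1
        if stale_epochs >= patience:
            break
```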

We were able to significantly reduce the number of trainable parameters. A comparison of trainable parameters with other transformer-based models is given in Table 3.

Table 3. Comparison of trainable parameters

The detailed results of our model on the different categories of test data are tabulated in Table 4. It can be observed that our model performed fairly well in all triple-category scenarios. The slight drop in score on the WebNLG dataset compared with the NYT dataset may be attributed to the fact that the WebNLG Main category has most of its data in SEO and EPO form. For the NYT dataset, our model performed best when there were 4 triples in the sentence, and for the WebNLG dataset, the model performed best when there were 3 triples in the sentence. These results show that our transformer-based model is capable of handling complex scenarios in relational triple extraction.

Table 4. Evaluation results on NYT and WebNLG dataset

7 Conclusion and Future Scope

In this paper, we proposed a lightweight transformer-based model for Relation Extraction based on a joint entity-relation extraction framework. Our model performed well in all triplet overlapping scenarios, such as Entity Pair Overlap (EPO) and Single Entity Overlap (SEO), and can extract multiple triplets from the same sentence while reducing the number of trainable parameters in the transformer. In the future, we aim to reduce the number of trainable parameters further while improving the performance.