Keywords

1 Introduction

With the advent of big data and the growth of the Internet, a large volume of unstructured text has been produced online. Much of this text contains useful information, but it is often ambiguous and fuzzy, which makes it difficult for computers to identify and acquire knowledge from it. How to mine valuable information from unstructured text and present it in a form that is “easy to understand” for computers has therefore become a major challenge in the field of natural language processing (NLP).

As a dynamic semantic unit, the event has attracted increasing attention. It is generally accepted that an unstructured text is composed of multiple events, each of which involves static concepts such as time, place, and participants, and that events are usually linked by semantic relations such as causal and temporal relations. Event relations are important and have great research significance in fields such as medicine, politics, and aviation safety.

Early research on event relations tended to use pattern matching and traditional machine learning methods to classify relations. These methods usually rely on expert experience to obtain features and external resources, which is time-consuming and labor-intensive, and they may miss important implicit features; as a result, although their accuracy is high, their recall is not ideal. In recent years, the accumulation of large-scale annotated data and the development of deep learning have promoted the application of deep neural networks to event relation extraction. Neural network models not only reduce the workload of domain experts but also exploit hidden features. Although a large amount of annotated data has been accumulated, samples of some relation types remain scarce, so neural network methods may fail to identify these types and accuracy cannot be guaranteed. Some researchers have therefore proposed deep weakly supervised learning methods, which combine rule-based methods, feature-based methods, and neural network models, and these have achieved good results. Relation extraction is an important sub-task of information extraction, and the main work of this paper is to summarize the methods of event relation extraction for unstructured text.

2 Evolution of Event Relation Extraction Method

2.1 Temporal Relation Extraction Method

Research on event temporal relation extraction (ETE) began relatively early. With the development of deep learning, various types of neural networks have been applied to ETE tasks in succession, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Long Short-Term Memory networks (LSTM). Dligach et al. [1] proposed ETE model structures that use a CNN with tagged input and a Bi-LSTM with richer semantics. Zhou et al. [2] added an attention mechanism to the Bi-LSTM model to learn long-distance dependencies, while Zhang et al. [3] combined the multi-head attention mechanism with a non-linear network layer to further improve the representation ability of the attention network. Li et al. [4] introduced BERT into the Bi-LSTM structure and integrated multi-dimensional event information to mine cross-sentence temporal information. Compared with traditional ETE methods, ETE methods based on deep learning learn features automatically, without relying on manual features, and achieve higher performance and stronger generalization ability.

2.2 Causal Relation Extraction Method

Traditional research on event causal relation extraction (ECE) mainly uses lexical features, semantic features, manually constructed patterns, and similar methods to extract causal relations between events. Current ECE methods resemble those for ETE and mostly use neural networks based on deep learning. Silva et al. [5] and Li et al. [6] introduced CNN into ECE, and Dasgupta et al. [7] proposed using an LSTM structure to explore the latent semantic information in the text. Zheng et al. [8] used a divide-and-conquer approach to decompose ECE into two sequence labeling tasks, used CRF to complete the sequence labeling, introduced BERT and CNN to enhance the expressive power of event causal features, and introduced residual connections to capture important semantic features of the text. This approach effectively addresses the insufficient semantic representation of causal relations and the weak recognition of boundaries. Li et al. [9] addressed the problem of insufficient data through contextual string embeddings and introduced the multi-head self-attention mechanism into the BiLSTM-CRF structure to capture the interdependence between causal words.

3 Event Relation Extraction Model Based on Deep Learning

Although traditional event relation extraction methods have high accuracy, they are difficult to apply in practice because they are often time-consuming and costly. With the development of deep learning, many deep learning methods have performed well in the field of relation extraction. Relation extraction methods based on deep learning can be divided into strongly supervised methods and weakly supervised methods. At present, strongly supervised deep learning techniques still perform best on the task of event relation recognition.

3.1 Strong Supervision Model Based on Deep Learning

Traditional relation extraction methods based on supervised learning mostly rely on feature engineering. In recent years, a variety of strongly supervised relation extraction models based on deep learning have reduced this dependence on manual work. Strongly supervised deep learning methods usually treat event relation extraction as a classification problem: the model is trained on an existing labeled corpus, and its output is then used to decide the relation class. According to how they learn, event relation extraction methods based on deep strong supervision can currently be divided into two types: the pipeline method and the joint learning method.

Pipeline Method:

Most of the early deep strongly supervised methods used in relation extraction are pipeline-based. For event relations, the pipeline method decomposes the relation extraction task into two sub-tasks: event extraction and relation classification. The two sub-tasks are trained separately and do not interact with each other. The relation classification task classifies the relations between events on the basis of event extraction, so its result depends on the result of the event extraction task.
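The two-stage decomposition can be sketched as follows. This is a minimal illustration rather than any of the cited systems: the trigger lexicon, the cue words, and the function names `extract_events` and `classify_relation` are hypothetical stand-ins for trained models.

```python
import re
from itertools import combinations

# Hypothetical trigger lexicon standing in for a trained event extractor.
TRIGGERS = {"earthquake", "collapsed", "evacuated"}
# Hypothetical cue words standing in for a trained relation classifier.
RELATION_CUES = {"because", "after", "before"}


def tokenize(sentence):
    return re.findall(r"\w+", sentence.lower())


def extract_events(sentence):
    """Stage 1: return (token_index, trigger) pairs found in the sentence."""
    return [(i, tok) for i, tok in enumerate(tokenize(sentence)) if tok in TRIGGERS]


def classify_relation(sentence, event_a, event_b):
    """Stage 2: label one event pair; it only sees what stage 1 produced."""
    between = tokenize(sentence)[event_a[0] + 1:event_b[0]]
    return "RELATED" if any(tok in RELATION_CUES for tok in between) else "NONE"


def pipeline(sentence):
    events = extract_events(sentence)
    # Every pair of extracted events is classified, so a trigger missed in
    # stage 1 can never be recovered in stage 2.
    return [(a[1], b[1], classify_relation(sentence, a, b))
            for a, b in combinations(events, 2)]


print(pipeline("The bridge collapsed after the earthquake."))
```

Because stage 2 consumes only the output of stage 1, the sketch also makes the dependence between the two sub-tasks visible: the classifier has no way to correct an extraction error.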

Since pipeline models are relatively simple to construct, many scholars applied pipeline methods to early NLP problems. The DMCNN model proposed by Chen et al. [10] is a typical pipeline event extraction model, which splits event extraction into two sub-tasks: event recognition and argument role classification. Liu et al. [11] divided the relation extraction problem into two parts, entity extraction and relation classification, and were the first to propose using a convolutional neural network to classify the relation between two given entities. Most of the early pipeline methods extended and optimized convolutional and recurrent neural networks. Zeng et al. [12] applied a CNN model to relation classification, replacing traditional feature engineering with a convolutional deep neural network (DNN) that extracts lexical-level and sentence-level features, and proposed position features (PF) to encode the relative distance between the current word and the target words. Finally, the relation is classified by a hidden layer and a softmax layer. The model learns features automatically, without any external resources or NLP modules, and achieved the best performance at the time.
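The position feature can be illustrated with a short sketch. It assumes that each token is described by its signed distance to the two target words; the clipping range `max_dist` and the helper `to_embedding_index` are illustrative assumptions (a common way to turn distances into embedding-table indices), not details taken from [12].

```python
def position_features(tokens, e1_index, e2_index):
    """Return, for each token, its signed distance to the two target words."""
    return [(i - e1_index, i - e2_index) for i in range(len(tokens))]


def to_embedding_index(distance, max_dist):
    """Clip a distance to [-max_dist, max_dist] and shift it to a non-negative index."""
    clipped = max(-max_dist, min(max_dist, distance))
    return clipped + max_dist


tokens = ["People", "have", "been", "moving", "back", "into", "downtown"]
features = position_features(tokens, e1_index=0, e2_index=6)
print(features[3])  # distances of "moving" to "People" and "downtown": (3, -3)
```

Each of the two distances would then be looked up in its own embedding table and concatenated with the word embedding before the convolution.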

In follow-up research, several CNN-based models were proposed for relation extraction, such as Multi-Window CNN [13] and Multi-Level Attention CNNs [14], and these made good progress. However, CNN cannot handle global features and sequential information well, and its performance degrades in particular on long-distance dependencies in a text sequence. Therefore, Zhang et al. [15] proposed using RNN to learn long-distance semantic information. Compared with CNN, the RNN model can handle long-distance patterns and is well suited to extracting sequential features. In practice, however, a plain RNN can only maintain short-term memory, because gradients explode or vanish as they are propagated through time, which makes long-term associations difficult to capture. LSTM is an improvement of RNN that alleviates the vanishing gradient problem to some extent by introducing a memory cell and gating units that combine short-term memory with long-term memory. Xu et al. [16] proposed the SDP-LSTM model to classify the relation between two entities in a sentence. The model uses the shortest dependency path (SDP) between the two entities to obtain heterogeneous information, and a multi-channel LSTM network integrates the information from these heterogeneous sources along the dependency path.
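How the memory cell and the gates interact can be shown with a single LSTM step. The sketch below uses the standard LSTM equations with scalar states so that it runs without any library; the parameter layout `p` (gate name mapped to input weight, recurrent weight, and bias) is an assumption made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p maps gate name -> (w_x, w_h, bias)."""
    def pre(gate):
        w_x, w_h, b = p[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new content to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate content
    c = f * c_prev + i * g            # additive update of the memory cell
    h = o * math.tanh(c)              # hidden state (short-term memory)
    return h, c


# Illustrative parameters: forget gate almost open, input gate almost closed.
params = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 1.0, 0.0),
}
h, c = 0.0, 1.0
for _ in range(50):
    h, c = lstm_step(0.5, h, c, params)
print(round(c, 3))  # the cell state stays close to its initial value of 1.0
```

The cell state is updated additively and scaled only by the forget gate, so when that gate stays near one the stored value survives many steps, which is the mechanism behind the long-term memory described above.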

Relation classification assigns a relation to a pair of entities, and event relation extraction likewise has to judge whether a relation holds between a temporal or causal entity pair. Since temporal relation extraction resembles the relation classification task, and since dependency paths (DP) and LSTM had proved highly effective in relation extraction, Cheng et al. [17] proposed applying DP and LSTM to ETE. To represent the dependency path between entities in different sentences, they adopted a Bi-LSTM model along the dependency paths and proposed a “common root” hypothesis that extends the DP representation to cross-sentence links: both sentences are represented as dependency paths that share a “common root”. Ning et al. [18] examined whether ETE can benefit from external resources. Observing that event words themselves carry temporal information that can serve as prior knowledge, they constructed a probabilistic knowledge base named TemProb and divided the ETE task into two steps: extracting event words and extracting relations.

The pipeline method first identifies event trigger words and their arguments and then classifies event relations. It ignores the internal correlation between the two sub-tasks and therefore easily loses information. Errors made in the first task readily affect the subsequent relation classification task, resulting in error propagation, and pairing unrelated events creates information redundancy. All of these problems degrade the performance of relation classification.

Joint Learning Method:

Event relations and event information interact closely. The joint learning method combines the two sub-tasks and optimizes them together in a unified model. It can therefore exploit the latent correlation between the two sub-tasks to address the problems of the pipeline method.

In the study of relation extraction, Miwa et al. [19] were the first to propose an end-to-end neural network model that jointly models entities and relations. Although this model introduced the idea of joint learning, its learning process was still similar to the pipeline method. Zheng et al. [20] proposed a hybrid neural network model to alleviate the long-distance dependency between entity tags that Miwa’s method left unresolved. Both models share parameters between the sub-tasks, which effectively mitigates the error propagation and the neglect of the inherent correlation between sub-tasks found in the pipeline method and improves the robustness of the model. However, they still tend to produce information redundancy.

To address the information redundancy of the shared-parameter method, Zheng et al. [21] proposed a new sequence labeling scheme that converts the joint extraction problem into a sequence labeling problem. Katiyar et al. [22] proposed a new LSTM model based on the attention mechanism to jointly identify entities and relations: once the entity at the current position is identified, the attention mechanism compares it with the entities at all previous positions. This is considered true joint learning. In response to the heavy reliance of previous joint learning models on manual features and external NLP tools [19,20,21,22], Bekoulis et al. [23] proposed modeling entity recognition with CRF and treating relation classification as a multi-head selection problem. The final results showed that the model outperformed the automatic feature extraction models of the time.
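Treating joint extraction as sequence labeling can be sketched with a small decoder. The tag format assumed here, `position-relation-role` with positions B/I/E/S, roles 1 and 2, and `O` for all other tokens, is a simplified assumption for illustration, as are the relation code `FB`, the sentence, and the nearest-partner pairing rule; the exact scheme of [21] may differ in detail.

```python
def decode_spans(tokens, tags):
    """Collect one span per tagged entity: start index, words, relation, role."""
    spans, current = [], None
    for idx, (token, tag) in enumerate(zip(tokens, tags)):
        if tag == "O":
            current = None
            continue
        # Assumes relation codes contain no hyphen, e.g. "B-FB-1".
        position, relation, role = tag.split("-")
        if position in ("B", "S") or current is None:
            current = {"start": idx, "words": [token],
                       "relation": relation, "role": role}
            spans.append(current)
        else:  # "I" or "E" continues the open span
            current["words"].append(token)
        if position in ("E", "S"):
            current = None
    return spans


def decode_triples(tokens, tags):
    """Pair each role-1 entity with the nearest role-2 entity of the same relation."""
    spans = decode_spans(tokens, tags)
    tails = [s for s in spans if s["role"] == "2"]
    triples = []
    for head in (s for s in spans if s["role"] == "1"):
        candidates = [t for t in tails if t["relation"] == head["relation"]]
        if candidates:
            tail = min(candidates, key=lambda t: abs(t["start"] - head["start"]))
            triples.append((" ".join(head["words"]), head["relation"],
                            " ".join(tail["words"])))
    return triples


tokens = ["Acme", "Corp", "was", "founded", "by", "Jane", "Doe"]
tags = ["B-FB-1", "E-FB-1", "O", "O", "O", "B-FB-2", "E-FB-2"]
print(decode_triples(tokens, tags))
```

Since every token receives exactly one tag in this sketch, an entity cannot take part in two relations at once, which illustrates why labeling schemes of this kind struggle with overlapping relations.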

Inspired by joint learning models of entities and relations, Han et al. [24] were the first to propose a Neural SSVM (Neural Structured Support Vector Machine) model that extracts events and the temporal relations between them simultaneously. The model is an end-to-end structured joint learning model. In the bottom layer, word representations obtained from the BERT model are fed into a Bi-LSTM layer for encoding. The embeddings are shared between the E-E module and the relation extraction module, and the SSVM judges whether events and event relations exist, which realizes the joint identification of events and their temporal relations. In view of the advantages of sequence-labeling-based joint extraction in relation extraction, Li et al. [9] designed a causality labeling scheme to extract event causality directly and proposed the SCITE (Self-attentive BiLSTM-CRF wIth Transferred Embeddings) model, which introduces a self-attention mechanism to capture long-range dependencies between causes and effects. Experiments show that the SCITE model based on the causal labeling scheme is effective.

The joint learning method combines the two sub-tasks of event identification and relation classification, emphasizes the interaction and association between them, and effectively addresses the problems of the pipeline method. Sharing parameters alleviates error propagation and information loss, and sequence labeling effectively resolves the entity redundancy that arises with shared parameters. However, neither the shared-parameter approach nor the sequence labeling approach handles overlapping relations well.

3.2 Weak Supervision Model Based on Deep Learning

At present, deep learning based on strongly supervised learning has achieved great success in the field of event relation extraction, with high accuracy and recall. However, strongly supervised models rely on a large amount of manually labeled training data, which is costly, time-consuming, and laborious to produce. In contrast, weakly supervised learning methods, whose labeling cost is low, have attracted increasing attention from researchers and have begun to be applied in the field of relation extraction. Methods based on deep weak supervision can be divided into three categories: the semi-supervised method, the remote supervision method, and the unsupervised method.

Semi-supervision Method:

Compared with the strongly supervised learning method, the semi-supervised method needs only a small number of labeled samples together with a large number of unlabeled samples, which makes it better suited to the current era of big data.

Commonly used methods for semi-supervised relation extraction include bootstrapping, co-training, and label propagation. At present, the most widely used semi-supervised learning method in the field of relation extraction is bootstrapping, which trains the model on a small number of labeled seed samples in order to extract more entity pairs that hold the relation, and then repeats the training with the newly extracted pairs. Brin et al. [25] first introduced the bootstrapping-based semi-supervised method to relation extraction and built the DIPRE system to obtain new relation instances automatically from the World Wide Web. Kipf et al. [26] proposed that graph convolutional networks can be used for semi-supervised classification, since they effectively learn hidden-layer representations of graph structure and node features. As the joint learning method based on deep strong supervision fails to solve the overlapping relation problem well, Phi et al. [27] proposed treating automatic seed selection in bootstrapping and noise reduction for remotely supervised data as ranking tasks.
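The bootstrapping loop can be illustrated with a toy example in the spirit of pattern-based systems such as DIPRE. It is a simplified sketch under strong assumptions, not the original algorithm: a pattern is just the text between the two members of a known pair, entities are runs of capitalised words, and the corpus and seed pair are invented for illustration.

```python
import re

# Assumption: an entity is one or more capitalised words.
NAME = r"((?:[A-Z]\w+)(?: [A-Z]\w+)*)"


def learn_patterns(corpus, pairs):
    """Collect the text found between the two members of each known pair."""
    patterns = set()
    for sentence in corpus:
        for a, b in pairs:
            match = re.search(re.escape(a) + r"(.+?)" + re.escape(b), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(corpus, patterns):
    """Find new pairs of capitalised phrases joined by a learned pattern."""
    found = set()
    for sentence in corpus:
        for pattern in patterns:
            for match in re.finditer(NAME + re.escape(pattern) + NAME, sentence):
                found.add((match.group(1), match.group(2)))
    return found


def bootstrap(corpus, seeds, rounds=2):
    pairs = set(seeds)
    for _ in range(rounds):
        patterns = learn_patterns(corpus, pairs)   # pairs -> patterns
        pairs |= apply_patterns(corpus, patterns)  # patterns -> more pairs
    return pairs


corpus = [
    "Isaac Asimov wrote Foundation in 1951.",
    "Frank Herbert wrote Dune in 1965.",
    "Frank Herbert is the author of Dune.",
    "Mary Shelley is the author of Frankenstein.",
]
print(sorted(bootstrap(corpus, {("Isaac Asimov", "Foundation")})))
```

Starting from one seed, the first round learns the pattern " wrote " and finds a second pair; the second round learns " is the author of " from that pair and finds a third. The same mechanism explains semantic drift: a single over-general pattern would admit wrong pairs, and those pairs would generate further bad patterns in later rounds.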

The semi-supervised method only requires the initial seed set to be constructed manually, which reduces the dependence of event relation extraction on annotated corpora to some extent. However, it demands high-quality initial seeds, the first iterations cannot be guaranteed to be fully accurate, and accuracy inevitably declines as the iterations proceed, a phenomenon known as semantic drift.

Remote Supervision Method:

As early as 2009, Mintz et al. [28] first proposed applying remote supervision (also known as distant supervision) to relation extraction. The method aligns relation instances in a knowledge base with unstructured text, based on the assumption that if a pair of entities holds a certain relation, then every sentence containing that pair expresses the relation. A training corpus is constructed automatically in this way and a classifier is trained on it, which removes the dependence of traditional methods on manual annotation. However, the assumption behind remote supervision is too strong: it easily leads to incorrect labels and introduces a lot of noisy data. The existing literature has proposed a variety of effective solutions to the noise and wrong-label problems of remote supervision, such as multi-instance learning and attention mechanisms.
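The alignment step, and the noise it introduces, can be sketched in a few lines. The knowledge base triple, the sentences, and the simple substring matching below are illustrative assumptions; real systems use entity linking rather than string containment.

```python
from collections import defaultdict


def distant_label(sentences, knowledge_base):
    """Label every sentence that mentions both entities of a KB triple."""
    labelled = []
    for sentence in sentences:
        for head, relation, tail in knowledge_base:
            if head in sentence and tail in sentence:
                labelled.append((sentence, head, tail, relation))
    return labelled


def group_into_bags(labelled):
    """Group sentences by triple, as used in multi-instance learning."""
    bags = defaultdict(list)
    for sentence, head, tail, relation in labelled:
        bags[(head, relation, tail)].append(sentence)
    return dict(bags)


knowledge_base = [("Steve Jobs", "founder_of", "Apple")]
sentences = [
    "Steve Jobs co-founded Apple in 1976.",   # expresses the relation
    "Steve Jobs left Apple in 1985.",         # mentions the pair, different meaning
    "Apple released a new phone.",            # only one entity, not labelled
]
for example in distant_label(sentences, knowledge_base):
    print(example)
```

The second sentence receives the label `founder_of` although it does not express that relation, which is exactly the wrong-label noise described above. Multi-instance learning responds by treating all sentences of a pair as one bag and assuming only that at least one sentence in the bag expresses the relation.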

To address the noise problem, Zeng et al. extended their earlier method [12] to remote supervision and proposed the PCNN model [29]. The piecewise convolutional neural network (PCNN) replaces the original global max pooling with piecewise (local) max pooling and uses multi-instance learning to address mislabeling. In addition, previous methods applied supervised models to hand-designed features when learning from data labeled through remote supervision. These features usually come from existing NLP tools, and because such tools inevitably make errors, the errors propagate or accumulate through the extraction process. Most deep learning methods also require sentence-level labels. In this regard, Lin et al. [30] added an attention mechanism to the PCNN model to reduce the negative impact of wrong labels and used CNN to represent the relation as a semantic composition of sentence embeddings, so as to make full use of the information in the training knowledge base, and achieved good results.
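The difference between global and piecewise max pooling can be shown on the output of a single convolution filter. In this sketch the sentence is cut into three segments at the two entity positions; the exact boundary convention and the default value used for an empty segment are assumptions made for illustration.

```python
def piecewise_max_pool(feature_map, e1_index, e2_index, empty_value=0.0):
    """Max-pool one filter's outputs separately over three segments."""
    first, second = sorted((e1_index, e2_index))
    segments = [
        feature_map[:first + 1],            # up to and including the first entity
        feature_map[first + 1:second + 1],  # up to and including the second entity
        feature_map[second + 1:],           # after the second entity
    ]
    # empty_value is an assumed default for a segment with no tokens.
    return [max(segment, default=empty_value) for segment in segments]


feature_map = [0.1, 0.9, 0.3, 0.2, 0.8, 0.4, 0.7]
print(max(feature_map))                                         # global pooling: 0.9
print(piecewise_max_pool(feature_map, e1_index=1, e2_index=4))  # [0.9, 0.8, 0.7]
```

Global pooling keeps one value per filter, whereas the piecewise variant keeps one value per segment and therefore retains coarse information about where, relative to the two entities, the strongest response occurred.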

Unsupervised Method:

Unsupervised learning requires no labeled training corpus at all. Unsupervised learning was applied to relation extraction as early as 2004 [31], using a bottom-up extraction approach. Vo et al. [32] used an Open IE system to extract relational triples automatically and build an event network, and then identified potential temporal and causal relations between two nodes in an unsupervised manner by performing a specific form of traversal on the constructed network.

Unsupervised event relation extraction depends neither on manually annotated corpora nor on predefined relation types. It can extract events automatically from unstructured text, adapts well to event relation extraction across disciplines, and transfers well between domains. However, the overall accuracy and recall of current unsupervised event relation extraction models are low, the extraction rate for low-frequency relation instances is poor, and it is difficult to quantify and unify the evaluation standard for event relation extraction.

3.3 Comparative Analysis of Event Relation Extraction Models

At present, both deep strongly supervised and deep weakly supervised learning methods can achieve good results, with the strongly supervised methods performing best on the task of event relation extraction. CNN, RNN, LSTM, and later graph neural networks and hybrid neural networks have become common structures for event relation extraction models, and introducing attention mechanisms and shortest dependency paths into these structures has also become common practice. The pipeline method based on deep strong supervision suffers from information loss, error propagation, and information redundancy because it ignores the internal correlation between the two sub-tasks. The joint learning method, adopted from entity relation extraction, combines event extraction and event relation classification to strengthen the interaction between the two sub-tasks; it can be divided into methods based on shared parameters and methods based on sequence labeling. The shared-parameter method addresses the information loss and error propagation of the pipeline, and sequence labeling alleviates the information redundancy of the shared-parameter method.

Compared with the deep strongly supervised method, the weakly supervised method needs only a small annotated corpus to approach the performance of the strongly supervised method, which saves considerable cost in practical applications. However, the semi-supervised method is prone to semantic drift because of its continuous iteration, so the quality of the seed set must be improved. The remote supervision method easily leads to mislabeling and noise, which attention mechanisms and multi-instance learning have been introduced to reduce. The unsupervised method has a low extraction rate for some low-frequency relation instances. Compared with the strongly supervised method, the weakly supervised method is still immature in the field of event relation extraction and remains at an early stage of exploration. Its accuracy and recall are difficult to measure precisely, and more effective evaluation methods are still needed.

4 Conclusion and Future Work

In recent years, event relation extraction has achieved good results, but it is still at an early stage of exploration and several difficulties remain to be solved. At present, there is no mature solution for cross-domain, cross-language, or cross-dataset event relation extraction. In the future, transfer learning can be introduced into the field of event relation extraction, and aligning knowledge bases with unstructured text can improve the generalization ability of models. In addition, compared with the strongly supervised method, methods based on deep weak supervision require only a small annotated corpus or none at all, which is of great practical significance. Future work should therefore strengthen both the application of weakly supervised methods to event relation extraction and the accuracy of the methods used to evaluate them.