1 Introduction

Event detection mines the necessary information from unstructured data to represent events structurally with the 5W1H framework [21]. An event is composed of a trigger, which indicates the existence of the event, and several arguments, which constitute the detailed information [9]. Thus, the goal of event detection is to detect whether a trigger exists in a sentence and to determine which event type it belongs to.

Event detection is conducive to the storage, retrieval, representation and analysis of event information. Although event detection has been widely applied in various areas [17], such as abnormal public opinion detection [11, 20], news recommendation systems [7, 19], risk analysis applications [1], monitoring systems [14] and decision support tools [4], it still faces tremendous difficulties and challenges. In particular, it is difficult to handle words with multiple meanings using only the information contained in the local context. For example, in the field of financial analysis, an investment event should be detected from the sentence "The charity was prospected of giving money from a bank." The difficulty can be attributed to three problems in event detection tasks.

First, \(\mathbf{Polysemy}\). In different languages, a word may represent different types of entities in different contexts. Thus, the meaning of "bank" cannot be judged directly in the sentence from the above example. A wrong judgment means the polysemous word cannot be extracted as the correct trigger; consequently, the event element related to investment would be misjudged, which may cause the loss of important elements in further analysis and eventually lead to unwise investment decisions.

Second, \(\mathbf{Synonym\ Association}\). A word may appear only once in a corpus, yet have several synonyms in the same corpus. Hence, it is difficult to establish a relationship between synonyms through the local corpus alone. For example, the word "giving" in the above sentence is synonymous with the verb "investing", which expresses investment behavior. Since these words do not appear in the same context, "investing" cannot be directly associated with "giving". Moreover, "giving" does not always denote an investment-related behavior, but may carry other meanings such as "present" and "supply". Therefore, unless it is associated with words like "investing", it is nearly impossible to classify the word as the trigger of an investment event.

Third, \(\mathbf{Lack\ of\ Information}\). The local context usually provides word information, word position information and the like, but it typically lacks part-of-speech information. Part-of-speech information is nonetheless crucial for identifying and classifying triggers in the event detection task: if it is not used sufficiently, triggers, which tend to hide in nouns, verbs and adjectives, become difficult to extract, reducing the accuracy of trigger extraction.

Therefore, the above three problems need to be solved simultaneously to detect events completely and accurately.

2 Related Works

In recent years, with the increasingly wide application of event detection, a number of related studies have emerged. The existing studies fall into three main categories according to the problem they address: polysemy [3, 6], synonym association [8, 12] and lack of information [16].

The problem of polysemy mainly arises at the word representation stage. Various word embedding models, such as CBOW and skip-gram, have been proposed, but these models cannot generate different vectors for the different meanings of the same word. At present, the methods addressing polysemy include: 1) clustering the meanings of the same word in different contexts to distinguish them [3]; 2) using cross-language information, translating the different meanings of a word into another language so that each meaning corresponds to one word in the target language [6]. These methods, however, share the disadvantage that a word is fixed to one word vector no matter how many meanings it has. Accordingly, the meanings of a word cannot be distinguished, and the problem is not well solved.

The methods aimed at solving the problem of synonym association mainly describe the association between synonyms either through rules derived directly from dictionaries or through external corpus information. Synonym association plays an important role in event detection, especially in the event type classification mentioned above. Synonyms can be associated on the basis of event-trigger dictionaries [8] or synonym sets drawn from semantic networks [12]. However, these methods need a word list constructed in advance to support event detection, and the list must be updated whenever the corpus changes. Consequently, they can only be used in limited situations and cannot solve the problem of synonym association in broad areas.

To alleviate the lack of information, many features such as vocabulary, syntax and semantics can be used as input. For example, Liu et al. [16] argue that triggers and arguments should receive more attention than other words, so they construct gold attention vectors which encode only each trigger, argument and context word. Nevertheless, the construction of gold attention vectors relies heavily on the syntactic and domain knowledge available for event detection.

In summary, owing to the limited scope of prior knowledge, existing studies solve the problems in event detection only partially. Moreover, none of them addresses polysemy, synonym association and lack of information at once within a single model.

3 Method

In this section, we present our framework for the proposed EDEEI algorithm. The proposed framework is illustrated in Fig. 1.

Fig. 1. The EDEEI model, which includes word vector representation, feature extraction, feature optimization, feature selection and a classifier.

The word vector representation module generates three kinds of vectors: Content Word Features (CWF), Semantic Features (SF) and Position Features (PF). The newly generated word vectors are used for feature extraction, feature optimization and feature selection. Feature optimization refines the raw features from feature extraction and provides the processed features to feature selection, which captures high-quality features. Finally, the selected features are fed into the classifier to obtain the triggers and their classification.

Word Vector Representation. We derive a new word vector structure based on BERT [5] and wnet2vec [18]. To solve the problem of polysemy, the proposed framework utilizes BERT to generate the CWF, which identifies the different meanings of a word according to a variable external corpus. Wnet2vec is exploited in our model to generate the SF, which better expresses the semantic relationship between synonyms; it generates word vectors through a transformation from the semantic network to a semantic space. Finally, the PF is defined as the relative distance between the current word and the candidate trigger; to encode the PF, each distance value is represented by an embedded vector. Let the size of the CWF be \(d_{\text{CWF}}\), the size of the SF be \(d_{\text{SF}}\), and the size of the position code be \(d_{\text{PF}}\). The word vector of the i-th word in the sentence is \(x_{i} \in R^{d}\), where \(d = d_{\text{CWF}} + d_{\text{SF}} + 2\,d_{\text{PF}}\). A sentence of length n is represented as \(x_{1:n} = x_{1} \oplus x_{2} \oplus \ldots \oplus x_{n}\), where \(\oplus\) is the concatenation operator. The vectors are stacked to form a matrix \(X \in R^{n \times d}\).
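To make the construction concrete, the following minimal sketch (Python with NumPy) assembles the sentence matrix X from placeholder CWF/SF vectors and a position-feature lookup. The vector sizes, the distance cap, and the use of two relative-distance embeddings (our reading of the factor of 2 in d) are assumptions for illustration, not the paper's exact configuration; the random rows stand in for real BERT and wnet2vec outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d_cwf, d_sf, d_pf = 768, 300, 5   # assumed sizes of CWF, SF and PF
max_dist = 50                     # assumed cap on relative distances
pf_table = rng.normal(size=(2 * max_dist + 1, d_pf))  # PF embedding table

def position_feature(i, ref_pos):
    """Embed the relative distance between word i and a reference position."""
    dist = int(np.clip(i - ref_pos, -max_dist, max_dist)) + max_dist
    return pf_table[dist]

def word_vector(cwf, sf, i, trig_pos, ref_pos):
    """Concatenate CWF, SF and two PFs, so d = d_CWF + d_SF + 2 * d_PF."""
    return np.concatenate([cwf, sf,
                           position_feature(i, trig_pos),
                           position_feature(i, ref_pos)])

n = 12  # sentence length
# Placeholder CWF/SF rows stand in for real BERT / wnet2vec encodings.
X = np.stack([word_vector(rng.normal(size=d_cwf), rng.normal(size=d_sf),
                          i, trig_pos=4, ref_pos=7) for i in range(n)])
assert X.shape == (n, d_cwf + d_sf + 2 * d_pf)  # X in R^{n x d}
```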

Feature Extraction. The convolution layer captures and compresses the semantics of the whole sentence, extracting the valuable semantics into feature maps. Each convolution operation involves a convolution kernel \(w \in R^{h \times d}\) with window size h. From the window \(x_{i:i+h-1}\), the module generates the feature \(c_{i} = f\left(w \cdot x_{i:i+h-1} + b\right)\), where b is the bias and f is a nonlinear activation function. Applying this computation across the sentence \(x_{1:n}\) generates the feature map \(C = \left\{c_{1}, c_{2}, \ldots, c_{n-h+1}\right\}\). With m convolution kernels \(W = \left\{w_{1}, w_{2}, \ldots, w_{m}\right\}\), the resulting feature maps are \(\left\{C_{1}, C_{2}, \ldots, C_{m}\right\}\).
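A minimal NumPy sketch of this convolution step follows; the toy sizes, small random kernels and the tanh activation are illustrative assumptions (the paper does not fix f).

```python
import numpy as np

def conv_feature_map(X, w, b, f=np.tanh):
    """c_i = f(w . x_{i:i+h-1} + b) for every length-h window of X (n x d)."""
    n, _ = X.shape
    h = w.shape[0]
    return np.array([f(np.sum(w * X[i:i + h]) + b) for i in range(n - h + 1)])

rng = np.random.default_rng(1)
n, d, h, m = 12, 20, 3, 4                          # toy sizes
X = rng.normal(size=(n, d))
kernels = [rng.normal(size=(h, d)) * 0.1 for _ in range(m)]
# m kernels -> m feature maps {C_1, ..., C_m}, each of length n - h + 1.
feature_maps = [conv_feature_map(X, w_k, b=0.0) for w_k in kernels]
assert all(C.shape == (n - h + 1,) for C in feature_maps)
```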

Feature Optimization. This module uses the POS tagger provided by Stanford CoreNLP to annotate the sentences. One-hot coding is applied to the POS tags, yielding coding vectors of length \(k_1\). From the coding of each POS tag, a POS matrix \(M_{POS} \in R^{k_1 \times n}\) is generated for a sentence of length n.
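As a small illustration, the one-hot POS matrix can be built as below; the tag subset is hypothetical, not the full Stanford CoreNLP tag inventory.

```python
import numpy as np

TAGS = ["NN", "VB", "JJ", "IN", "DT"]   # illustrative subset; k1 = 5 here
tag_index = {t: i for i, t in enumerate(TAGS)}

def pos_matrix(tags):
    """Build M_POS in R^{k1 x n} from a sentence's POS tag sequence."""
    M = np.zeros((len(TAGS), len(tags)))
    for j, tag in enumerate(tags):
        M[tag_index[tag], j] = 1.0      # one-hot column per word
    return M

M_pos = pos_matrix(["DT", "NN", "VB", "IN", "DT", "NN"])  # shape (5, 6)
```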

The crucial parts corresponding to specific POS tags are emphasised by an attention mechanism. The feature optimization module takes the POS tags and the feature maps as the inputs of the attention mechanism. Each feature map generated by the feature extraction module is a vector of length \(n-h+1\); the feature maps, denoted K, serve as the keys. The attention computation proceeds as follows. A random matrix \(W_Q\) of length w is created, and its product with the POS representation yields the matrix Q. A random matrix \(W_K\) of width w and length \(k_1\) is created, and the product of \(W_K\) and \(W_Q\) produces the matrix \(W_V\). The product of \(W_V\) and the feature maps yields the matrix V.

Based on the three generated matrices K, Q and V, an attention matrix Z is calculated as \(Z = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{n-h+1}}\right) V\). The matrices \(W_K\), \(W_Q\) and \(W_V\) are trained with the scoring function \(f_{\text{score}} = \frac{Q \cdot K^{T}}{\sqrt{n-h+1}}\). The matrix Z is then compressed by max pooling into a vector z. Based on the updated \(W_K\), \(W_Q\) and \(W_V\), the product of z and K constructs a new attention map of size \(n-h+1\).
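The core of this computation is scaled dot-product attention with the feature-map length \(n-h+1\) as the scale. The description above leaves the exact shapes of Q, K and V open, so the dimensions in this sketch are assumptions for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def pos_attention(Q, K, V, L):
    """Z = softmax(Q K^T / sqrt(L)) V with L = n - h + 1; the scores
    Q K^T / sqrt(L) correspond to the paper's scoring function f_score."""
    scores = Q @ K.T / np.sqrt(L)
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(2)
L_len, w = 10, 8                      # L_len = n - h + 1; w is assumed
Q, K, V = (rng.normal(size=(L_len, w)) for _ in range(3))
Z = pos_attention(Q, K, V, L_len)     # attention matrix Z
z = Z.max(axis=1)                     # max pooling compresses Z to vector z
```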

Feature Selection. The feature selection module employs dynamic multi-pooling to further extract the valuable features, which are then concatenated to produce lexical-level vectors containing information about the role the current word plays in an event. Dynamic multi-pooling is computed as \(\left[y_{1, p_{t}}\right]_{i} = \max\left\{\left[C_{1}\right]_{i}, \ldots, \left[C_{p_{t}}\right]_{i}\right\}\) and \(\left[y_{p_{t}+1, n}\right]_{i} = \max\left\{\left[C_{p_{t}+1}\right]_{i}, \ldots, \left[C_{n}\right]_{i}\right\}\), where \([y]_{i}\) is the i-th value of the vector y, \(p_t\) is the position of the trigger t, and \([C_j]_i\) denotes the j-th value of the i-th attention map. We use the maximum pooling result of each segment to form the sentence-level feature vector L.
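A minimal sketch of dynamic multi-pooling over one attention map, assuming the map is a vector of length \(n-h+1\) and p_t indexes the trigger position (map count and lengths are toy values):

```python
import numpy as np

def dynamic_multi_pool(C, p_t):
    """Max-pool the two segments of C split at the trigger position p_t:
    y_1 = max(C[1..p_t]), y_2 = max(C[p_t+1..n])."""
    return np.array([C[:p_t].max(), C[p_t:].max()])

rng = np.random.default_rng(3)
maps = [rng.normal(size=10) for _ in range(4)]   # 4 attention maps
# Sentence-level feature vector L: pooled segments of every map.
L_feat = np.concatenate([dynamic_multi_pool(C, p_t=4) for C in maps])
assert L_feat.shape == (8,)                      # 2 pooled values per map
```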

Classifier. This module concatenates the CWFs of the current word and of the words to its left and right, yielding a vector P of length \(3\,d_{\text{CWF}}\). The learned sentence-level features and word features are concatenated into a vector \(F = [L, P]\). To calculate the confidence of each event type for a trigger, the feature vector is fed into the classifier \(O = W_{s} F + b_{s}\), where \(W_{s}\) is the transformation matrix of the classifier, \(b_s\) is the bias, and O is the final output of the network; the number of output classes equals the total number of event types plus one for the "not a trigger" tag.
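The classification step thus reduces to an affine map over the concatenated features, as in the sketch below; the dimensions are toy values, with 33 standing in for the ACE 2005 event subtypes and one extra class for "not a trigger".

```python
import numpy as np

def classify(L_feat, P, W_s, b_s):
    """O = W_s F + b_s with F = [L, P]; one score per event type
    plus one for the 'not a trigger' tag."""
    F = np.concatenate([L_feat, P])
    return W_s @ F + b_s

rng = np.random.default_rng(4)
n_types = 33                               # ACE 2005 event subtypes
L_feat = rng.normal(size=8)                # sentence-level features
P = rng.normal(size=12)                    # 3 * d_CWF with toy d_CWF = 4
W_s = rng.normal(size=(n_types + 1, 20)) * 0.1
O = classify(L_feat, P, W_s, np.zeros(n_types + 1))
predicted = int(O.argmax())                # index of the most confident class
```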

4 Experiment

The ACE 2005 dataset is utilized as the experimental benchmark. The test set contains 40 newswire articles and 30 other documents randomly selected from different genres; the remaining 529 documents are used as the training set.

Evaluation of Event Detection Methods. To demonstrate how the proposed algorithm improves the performance over the state-of-the-art event detection methods, we compare it with the following representative methods from the literature:

(1) Li's baseline [13]: Li et al. proposed a feature-based system which uses artificially designed lexical features, basic features and syntactic features.

(2) Liao's cross-event [15]: The cross-event detection method proposed by Liao and Grishman uses document-level information to improve performance.

(3) Hong's cross-entity [10]: Hong et al. proposed a method that extracts events through cross-entity reasoning.

(4) Li's joint model [13]: Li et al. also developed an event extraction method based on event structure prediction.

(5) DMCNN [2]: A framework based on a dynamic multi-pooling convolutional neural network.

Table 1. Overall performance on the ACE 2005 blind test data

Among all the methods, the EDEEI model achieves the best performance; compared with the state-of-the-art methods, the F score of trigger identification is significantly improved. The results in Table 1 illustrate three important facts about the method. Firstly, it is necessary to solve the problems of polysemy, synonym association and lack of information at the same time. Secondly, variable external knowledge can effectively improve the accuracy of event detection. Thirdly, the hierarchical detection method with feature optimization makes event detection more complete and precise.

Analysis of Different Word Vectors. This section presents a detailed comparison of the word vectors generated by word2vec, BERT, wnet2vec and BERT+wnet2vec, in order to assess the advantages and disadvantages of the BERT+wnet2vec approach against the other word vectors on the event detection task.

Table 2. Performance with different word vectors

The advantages of using BERT+wnet2vec can be observed quantitatively in Table 2: the combination of BERT and wnet2vec achieves the best performance on both trigger identification and trigger classification.

In conclusion, traditional methods such as word2vec rely on a small corpus to generate word vectors and cannot solve the problems of polysemy and synonym association. Compared with word2vec, the combination of BERT and wnet2vec obtains the best experimental results, which shows that BERT+wnet2vec can effectively address polysemy and synonym association.

5 Conclusion

This paper addresses three important problems in event detection: polysemy, synonym association and lack of information. To solve these problems, we propose a new Event Detection model based on Extensive External Information (EDEEI), a novel method which leverages an external corpus, a semantic network, part-of-speech information and attention maps to detect events completely and accurately, addressing all three problems within a single framework. An attention mechanism with part-of-speech information is designed to optimize the extracted features and make trigger-related features easier to capture. Experiments on the widely used ACE 2005 benchmark dataset confirm that the proposed method significantly outperforms existing state-of-the-art event detection methods. Furthermore, we present qualitative and quantitative analyses of the experimental results. In light of this performance and these analyses, we believe the proposed algorithm can be a useful tool for event detection.