Abstract
Extracting events from news is essential for accurate financial decision-making, and the task has long been studied in many languages. However, to the best of our knowledge, no study has addressed Turkish financial and economic text mining. To fill this gap, we created an ontology and present a well-defined, high-quality, company-specific event corpus of Turkish economic and financial news. Using this dataset, we conducted a preliminary evaluation of an event extraction model to serve as a baseline for further work. Most event extraction approaches rely on machine learning and require large amounts of labeled data, yet building a training corpus with manually annotated events is time-consuming and labor-intensive. To address this problem, we applied active learning and weak supervision to reduce human effort and to produce more labeled data automatically without degrading model performance. Experiments on our dataset show that both methods are useful. Furthermore, training the model on the manually annotated dataset combined with the automatically labeled dataset improved performance by 2.91% for event classification and 13.76% for argument classification.
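The abstract names weak supervision and active learning without describing how they were implemented. The following is only a minimal sketch of the two general ideas, not the authors' method: weak labels are produced by majority vote over keyword-based labeling functions, and sentences are chosen for manual annotation by predictive entropy. The event type names, the keywords, and all function names are hypothetical and chosen for illustration.

```python
import math
from collections import Counter

ABSTAIN = None  # returned by a labeling function that has no opinion


# Hypothetical labeling functions; keywords and event types are assumptions,
# not taken from the corpus ontology.
def lf_acquisition(sentence):
    return "Acquisition" if "satın al" in sentence.lower() else ABSTAIN


def lf_dividend(sentence):
    return "Dividend" if "temettü" in sentence.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_acquisition, lf_dividend]


def weak_label(sentence, labeling_functions=LABELING_FUNCTIONS):
    """Weak supervision: majority vote over non-abstaining labeling functions.

    Ties are resolved in favor of the label that was voted first.
    """
    votes = [lf(sentence) for lf in labeling_functions]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


def entropy(probs):
    """Shannon entropy (natural log) of a class probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(pool_probs, k):
    """Active learning: indices of the k most uncertain unlabeled items.

    pool_probs holds one predicted class distribution per unlabeled sentence,
    as produced by whatever classifier is currently trained.
    """
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    print(weak_label("Şirket rakibini satın aldı."))
    print(select_for_annotation([[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]], k=2))
```

In a setup like this, the weakly labeled sentences would be added to the training set alongside the manual annotations, while the selected indices would be sent to human annotators. Note that Python's `str.lower()` does not apply Turkish casing rules (uppercase `I` becomes `i`, not `ı`), so a real pipeline would need locale-aware normalization.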
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kaynak, K.Ş., Tantuğ, A.C. (2023). TFEEC: Turkish Financial Event Extraction Corpus. In: Machado, J.M., et al. Distributed Computing and Artificial Intelligence, Special Sessions, 19th International Conference. DCAI 2022. Lecture Notes in Networks and Systems, vol 585. Springer, Cham. https://doi.org/10.1007/978-3-031-23210-7_5
DOI: https://doi.org/10.1007/978-3-031-23210-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23209-1
Online ISBN: 978-3-031-23210-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)