Abstract
Automatic Terminology Extraction (ATE) is a technique for extracting the phrases that characterize a dataset. It is needed when translating specialized books and documents. An existing method exploits the fact that terms tend to be compounds of two or more single nouns. However, it considers only co-occurrence relations among single nouns and ignores modification relations. A new approach should also take into account that a phrase defined as a term tends to be explained in another sentence. In this study, we propose a method for extracting terms from a dataset that takes into account the modification relations obtained by dependency analysis. In particular, we show how to extract, from the dependency structure of a sentence, features that distinguish whether or not a phrase is a term.
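The idea of reading candidate phrases and features off a dependency structure can be illustrated with a minimal sketch. It is not the method of the paper: the toy parse is hand-written in English rather than produced by a parser, and the function names (`collect`, `candidate_phrases`) and the two features (`n_modifiers`, `head_governor_pos`) are hypothetical choices made only for illustration.

```python
from collections import defaultdict

# Toy dependency parse: (index, token, part of speech, head index); -1 marks the root.
# Hand-written for illustration; a real pipeline would obtain this from a parser.
# Assumption: each token's index equals its position in the list.
PARSE = [
    (0, "automatic", "ADJ", 2),
    (1, "terminology", "NOUN", 2),
    (2, "extraction", "NOUN", 3),
    (3, "requires", "VERB", -1),
    (4, "dependency", "NOUN", 5),
    (5, "analysis", "NOUN", 3),
]

NOMINAL = {"NOUN", "ADJ"}  # parts of speech allowed inside a candidate phrase


def collect(head, modifiers):
    """Return the head index plus every index that transitively modifies it."""
    indices = [head]
    for mod in modifiers.get(head, []):
        indices.extend(collect(mod, modifiers))
    return indices


def candidate_phrases(parse):
    """Build candidate phrases from modification relations and attach features."""
    modifiers = defaultdict(list)  # noun head -> indices of its nominal modifiers
    modifying = set()              # indices that modify some noun
    for idx, _token, pos, head in parse:
        if head >= 0 and pos in NOMINAL and parse[head][2] == "NOUN":
            modifiers[head].append(idx)
            modifying.add(idx)

    results = []
    for head in sorted(modifiers):
        if head in modifying:
            continue  # this noun is part of a larger phrase; skip it as a head
        span = sorted(collect(head, modifiers))
        governor = parse[head][3]
        results.append({
            "phrase": " ".join(parse[i][1] for i in span),
            # hypothetical feature: how many words modify the head
            "n_modifiers": len(span) - 1,
            # hypothetical feature: part of speech of the word the head depends on
            "head_governor_pos": parse[governor][2] if governor >= 0 else "ROOT",
        })
    return results


if __name__ == "__main__":
    for features in candidate_phrases(PARSE):
        print(features)
```

Unlike a pure co-occurrence count, this grouping follows the modifier-to-head links, so two nouns that merely appear in the same sentence are not merged into one candidate. Feature dictionaries of this kind could then be passed to any classifier or ranking function.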
Notes
1. Java Automatic Term Extraction: https://github.com/ziqizhang/jate.
2. Research Purpose Use of NTCIR Test Collections or Data Archive/User Agreement: http://research.nii.ac.jp/ntcir/permission/perm-en.html#ntcir-1 (accessed on October 20th, 2020).
3. Gensen Web: http://gensen.dl.itc.u-tokyo.ac.jp/gensenweb_eng.html (accessed on November 27th, 2020).
4. Japanese Dependency and Case Structure Analyzer KNP: http://nlp.ist.i.kyoto-u.ac.jp/EN/?KNP (accessed on November 27th, 2020).
5. Kurohashi, Kawahara, and Murawaki Laboratory (2018), “Japanese Morphological Analysis System JUMAN++”, http://nlp.ist.i.kyoto-u.ac.jp/index.php?JUMAN++ (accessed on November 28th, 2020).
6. Kurohashi, Kawahara, and Murawaki Laboratory (2018), “Japanese Syntactic, Case, and Linguistic Analysis System KNP”, http://nlp.ist.i.kyoto-u.ac.jp/?KNP (accessed on November 28th, 2020).
7. Hiroshi Nakagawa, Akira Maeda, and Hiroyuki Kojima (2003), “Gensen Web”, http://gensen.dl.itc.u-tokyo.ac.jp/ (accessed on November 28th, 2020).
Acknowledgement
This study was partly supported by a JSPS Research Grant (JP19H01138) and a grant for research promotion from the Graduate School of Culture and Information Studies, Doshisha University. The NTCIR-1 test collection for terminology extraction research was provided by the National Institute of Informatics. We hereby express our gratitude.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kimura, Y., Kusu, K., Hatano, K., Baba, T. (2021). Automatic Terminology Extraction Using a Dependency-Graph in NLP. In: Abraham, A., Sasaki, H., Rios, R., Gandhi, N., Singh, U., Ma, K. (eds) Innovations in Bio-Inspired Computing and Applications. IBICA 2020. Advances in Intelligent Systems and Computing, vol 1372. Springer, Cham. https://doi.org/10.1007/978-3-030-73603-3_38
DOI: https://doi.org/10.1007/978-3-030-73603-3_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-73602-6
Online ISBN: 978-3-030-73603-3
eBook Packages: Intelligent Technologies and Robotics (R0)