Yet Another Ranking Function for Automatic Multiword Term Extraction

Lossio-Ventura, Juan Antonio; Jonquet, Clement; Roche, Mathieu; Teisseire, Maguelonne

doi:10.1007/978-3-319-10888-9_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8686)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure combines linguistic and statistical information. The second measure is graph-based and assesses the importance of a multiword term within a domain. Existing measures often address some, but not all, of the problems related to term extraction, e.g., noise, silence, low frequency, large corpora, and the complexity of the multiword term extraction process. Instead, we focus on managing the entire set of problems, e.g., detecting rare terms and overcoming the low-frequency issue. By comparing the two proposed measures with state-of-the-art reference measures, we show that they outperform the precision results previously reported for automatic multiword term extraction.

Author information

Authors and Affiliations

University of Montpellier 2, LIRMM, CNRS - Montpellier, France
Juan Antonio Lossio-Ventura, Clement Jonquet, Mathieu Roche & Maguelonne Teisseire
Irstea, CIRAD, TETIS - Montpellier, France
Mathieu Roche & Maguelonne Teisseire

Editor information

Editors and Affiliations

Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5, 01-248, Warsaw, Poland
Adam Przepiórkowski & Maciej Ogrodniczuk

Cite this paper

Lossio-Ventura, J.A., Jonquet, C., Roche, M., Teisseire, M. (2014). Yet Another Ranking Function for Automatic Multiword Term Extraction. In: Przepiórkowski, A., Ogrodniczuk, M. (eds) Advances in Natural Language Processing. NLP 2014. Lecture Notes in Computer Science, vol 8686. Springer, Cham. https://doi.org/10.1007/978-3-319-10888-9_6

DOI: https://doi.org/10.1007/978-3-319-10888-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10887-2
Online ISBN: 978-3-319-10888-9
eBook Packages: Computer Science, Computer Science (R0)
