Abstract
Mining terminology translations from large amounts of Web data has applications in many fields, such as reading and writing assistance, machine translation, and cross-language information retrieval. The main challenges are finding more comprehensive results on the Web, determining the boundaries of candidate translations, removing irrelevant noise, and ranking the remaining candidates. In this paper, after reviewing and analyzing the possible methods of acquiring translations, we propose a feasible statistics-based method for mining terminology translations from the Web. The method starts from an analysis of the different forms in which term translations are distributed. Character-based string frequency estimation is then used to construct translation candidates, which uncovers more translations and their boundaries. The estimation process produces two kinds of redundant information: subset redundancy, which is handled by sort-based subset deletion, and prefix/suffix redundancy, which is handled by a mutual information method. Extensive experiments on two test sets of 401 and 3511 English terms show that the proposed system achieves better performance.
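The first two stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the n-gram length limits, the `min_freq` threshold, and the `ratio` rule for deciding when a shorter string is redundant are all assumptions made for illustration, and the mutual information step for prefix/suffix redundancy is not covered.

```python
from collections import Counter


def substring_frequencies(snippets, min_len=2, max_len=4):
    """Count every character n-gram of length min_len..max_len in the snippets.

    Hypothetical helper: the length limits are illustrative assumptions.
    """
    counts = Counter()
    for text in snippets:
        for n in range(min_len, max_len + 1):
            for i in range(len(text) - n + 1):
                counts[text[i:i + n]] += 1
    return counts


def delete_subsets(counts, min_freq=2, ratio=0.8):
    """Drop candidates that are substrings of an already kept, longer candidate.

    Candidates are sorted longest first. A shorter string is treated as
    redundant when a kept longer string contains it and occurs at least
    `ratio` times as often (an assumed rule, not taken from the paper).
    """
    candidates = [(s, f) for s, f in counts.items() if f >= min_freq]
    candidates.sort(key=lambda sf: (-len(sf[0]), -sf[1], sf[0]))
    kept = []
    for s, f in candidates:
        redundant = any(s in longer and lf >= ratio * f for longer, lf in kept)
        if not redundant:
            kept.append((s, f))
    # Return survivors ordered by frequency, most frequent first.
    return sorted(kept, key=lambda sf: (-sf[1], sf[0]))


# Toy target-language snippets for the English term "machine translation".
snippets = ["机器翻译系统", "机器翻译方法", "统计机器翻译"]
print(delete_subsets(substring_frequencies(snippets)))
# [('机器翻译', 3)]
```

In this toy input, every shorter fragment such as 机器 or 器翻译 occurs only inside 机器翻译, so sorting by length and deleting contained strings leaves a single candidate with a clean boundary. On real Web snippets the thresholds would need tuning.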
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fang, G., Yu, H., Nishino, F. (2005). Web-Based Terminology Translation Mining. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_87
DOI: https://doi.org/10.1007/11562214_87
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)