Abstract
Although many approaches exist for solving the out-of-vocabulary (OOV) term translation problem, they cannot handle different types of OOV terms, especially hybrid translations such as “Kenny-Caffey syndrome (Kenny-Caffey 氏症候群)”. We propose a novel integrated ranking approach that considers the type of an OOV term before translating it, so that different types of OOV terms can be translated differently. Furthermore, the translations mined in other languages are themselves OOV terms, and none of the existing approaches offer context information or definitions for them, so users without specialized knowledge cannot easily understand what these terms mean. Our integrated ranking approach therefore also extracts monolingual definitions and multilingual context information for OOV terms. Moreover, we propose a novel adaptive rules approach using a Bayesian network and AdaBoost to handle hybrid translations. Experiments show that our approach performs better than existing approaches.
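The following minimal Python sketch makes the idea of "predict the type first, then translate each type differently" concrete for the hybrid case. It is not the authors' method. The paper's integrated ranking and its Bayesian network and AdaBoost adaptive rules are not reproduced here. Instead, a hand-written regular expression stands in for type prediction. The names `HEAD_LEXICON`, `HYBRID_PATTERN`, `predict_type`, and `translate` are hypothetical, and the one-entry lexicon comes only from the abstract's example.

```python
import re
from typing import Optional

# Hypothetical toy lexicon for the translatable (generic) head of a hybrid term.
# The single entry mirrors the abstract's example; it is not a mined resource.
HEAD_LEXICON = {"syndrome": "氏症候群"}

# Hypothetical pattern for a hybrid term: capitalized, hyphen-joined name tokens
# followed by one lowercase head noun, e.g. "Kenny-Caffey syndrome".
HYBRID_PATTERN = re.compile(r"^([A-Z][a-z]+(?:-[A-Z][a-z]+)*) ([a-z]+)$")


def predict_type(term: str) -> str:
    """Toy stand-in for type prediction: 'hybrid' if the term is a name plus a known head noun."""
    match = HYBRID_PATTERN.match(term)
    if match and match.group(2) in HEAD_LEXICON:
        return "hybrid"
    return "other"


def translate(term: str) -> Optional[str]:
    """Route a term by its predicted type; only the hybrid branch is sketched."""
    if predict_type(term) == "hybrid":
        match = HYBRID_PATTERN.match(term)
        name, head = match.group(1), match.group(2)
        # Keep the proper-name part as-is and translate only the generic head.
        return f"{name} {HEAD_LEXICON[head]}"
    # Other term types would be sent to different components (not sketched here).
    return None
```

For example, `translate("Kenny-Caffey syndrome")` returns `"Kenny-Caffey 氏症候群"`. The eponym stays in Latin script and only the head noun is translated. A term that is not predicted to be hybrid returns `None`, which stands in for the other translation branches.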
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Qu, J., Shimazu, A., Le Nguyen, M. (2012). OOV Term Translation, Context Information and Definition Extraction Based on OOV Term Type Prediction. In: Isahara, H., Kanzaki, K. (eds) Advances in Natural Language Processing. JapTAL 2012. Lecture Notes in Computer Science, vol 7614. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33983-7_8
DOI: https://doi.org/10.1007/978-3-642-33983-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33982-0
Online ISBN: 978-3-642-33983-7
eBook Packages: Computer Science, Computer Science (R0)