Abstract
This paper focuses on Web-based translation of Chinese-English Out-of-Vocabulary (OOV) terms, emphasizing translation selection based on multiple feature fusion and candidate ranking based on a Ranking Support Vector Machine (Ranking SVM). Experiments using the SIGHAN2005 corpus for the Chinese Named Entity Recognition (NER) task, together with a set of selected new terms, show consistent results across the different data sources. When the model is combined with Chinese-English Cross-Language Information Retrieval (CLIR) on TREC data sets, clear performance improvements are obtained for both query translation and CLIR.
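To make the ranking step concrete, the following is a minimal sketch of pairwise learning to rank in the spirit of Ranking SVM. It is not the authors' implementation: it trains a linear scorer with a pairwise hinge loss by plain subgradient descent, and the three features (Web co-occurrence, transliteration similarity, length ratio), the toy values, and all function names are assumptions made for illustration only.

```python
from itertools import combinations


def pairwise_differences(candidates):
    """Turn one term's labeled candidates into (better - worse) feature vectors."""
    diffs = []
    for (xa, ya), (xb, yb) in combinations(candidates, 2):
        if ya == yb:
            continue  # equally relevant pairs carry no ordering constraint
        better, worse = (xa, xb) if ya > yb else (xb, xa)
        diffs.append([p - q for p, q in zip(better, worse)])
    return diffs


def train_ranker(groups, epochs=200, lr=0.1, reg=0.01):
    """Fit w so that w . (better - worse) >= 1, using an L2-regularized hinge loss."""
    dim = len(groups[0][0][0])
    w = [0.0] * dim
    # Pairs are formed only within a group (one group per source OOV term).
    diffs = [d for group in groups for d in pairwise_differences(group)]
    for _ in range(epochs):
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            violated = margin < 1.0
            for k in range(dim):
                grad = reg * w[k] - (d[k] if violated else 0.0)
                w[k] -= lr * grad
    return w


def rank_candidates(w, named_candidates):
    """Sort (name, features) pairs by descending linear score."""
    def score(features):
        return sum(wi * fi for wi, fi in zip(w, features))
    return sorted(named_candidates, key=lambda c: score(c[1]), reverse=True)


# Hypothetical toy data: each group holds the candidates for one OOV term,
# as (features, relevance) with relevance 1 for a correct translation.
# Assumed feature order: [web co-occurrence, transliteration similarity, length ratio]
TRAINING_GROUPS = [
    [([0.9, 0.8, 0.7], 1), ([0.4, 0.2, 0.6], 0), ([0.1, 0.3, 0.2], 0)],
    [([0.7, 0.9, 0.8], 1), ([0.6, 0.1, 0.3], 0)],
]

W = train_ranker(TRAINING_GROUPS)

if __name__ == "__main__":
    unseen = [("candidate_a", [0.3, 0.2, 0.4]), ("candidate_b", [0.8, 0.9, 0.7])]
    for name, _ in rank_candidates(W, unseen):
        print(name)
```

Forming preference pairs only among candidates of the same source term is what separates this ranking formulation from ordinary classification: the model learns a relative order per term, not an absolute accept/reject threshold.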
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhao, Y., Zhu, Q., Jin, C., Zhang, Y., Huang, X., Zhang, T. (2014). Chinese-English OOV Term Translation with Web Mining, Multiple Feature Fusion and Supervised Learning. In: Sun, M., Liu, Y., Zhao, J. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2014. Lecture Notes in Computer Science, vol 8801. Springer, Cham. https://doi.org/10.1007/978-3-319-12277-9_21
DOI: https://doi.org/10.1007/978-3-319-12277-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12276-2
Online ISBN: 978-3-319-12277-9
eBook Packages: Computer Science, Computer Science (R0)