Abstract
We present an approach for automatically detecting synonyms between the simplified Chinese used in mainland China and the traditional Chinese used in Taiwan, based on large-scale corpora. After a pre-processing step (word segmentation and POS tagging of the corpora), all words are classified into three categories according to their frequency: words used exclusively in mainland China, words used exclusively in Taiwan, and words commonly used on both sides. We use word vectors to represent the meanings of words, calculate semantic similarities between words from the two sides, and extract synonyms. The experiment shows that our approach can find synonyms that are not present in a handcrafted dictionary.
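The pipeline outlined in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (classify, cosine, synonym_candidates), the min_count and threshold parameters, the toy frequency counts, and the two-dimensional toy vectors. The abstract does not name the similarity measure, so cosine similarity is swapped in as a common choice. The sketch also assumes that both corpora have been converted to one script and that all word vectors live in a single shared space.

```python
import math

def classify(mainland_counts, taiwan_counts, min_count=1):
    """Split the vocabulary into mainland-only, Taiwan-only, and shared words.

    min_count is a hypothetical frequency cutoff; the paper's actual
    criterion is not given in the abstract.
    """
    mainland_only, taiwan_only, shared = set(), set(), set()
    for word in set(mainland_counts) | set(taiwan_counts):
        in_mainland = mainland_counts.get(word, 0) >= min_count
        in_taiwan = taiwan_counts.get(word, 0) >= min_count
        if in_mainland and in_taiwan:
            shared.add(word)
        elif in_mainland:
            mainland_only.add(word)
        elif in_taiwan:
            taiwan_only.add(word)
    return mainland_only, taiwan_only, shared

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def synonym_candidates(mainland_only, taiwan_only, vectors, threshold=0.8):
    """Pair each mainland-only word with Taiwan-only words above the threshold."""
    pairs = []
    for m in sorted(mainland_only):
        for t in sorted(taiwan_only):
            if m in vectors and t in vectors:
                score = cosine(vectors[m], vectors[t])
                if score >= threshold:
                    pairs.append((m, t, score))
    # Highest-scoring candidates first.
    return sorted(pairs, key=lambda p: -p[2])

# Toy frequency counts (invented); Taiwan words are shown in simplified script.
mainland_counts = {"软件": 50, "电脑": 40, "出租车": 30}
taiwan_counts = {"软体": 45, "电脑": 42, "计程车": 28}

# Toy 2-D vectors (invented); real vectors would come from a trained model.
vectors = {
    "软件": [0.90, 0.10],
    "软体": [0.88, 0.12],
    "出租车": [0.10, 0.90],
    "计程车": [0.12, 0.95],
    "电脑": [0.50, 0.50],
}

mainland_only, taiwan_only, shared = classify(mainland_counts, taiwan_counts)
pairs = synonym_candidates(mainland_only, taiwan_only, vectors)
for m, t, score in pairs:
    print(m, t, round(score, 3))
```

In this toy setup, only words exclusive to one side are compared against words exclusive to the other, which keeps the number of pairwise comparisons far below a full vocabulary cross-product. How the paper treats the shared category is not stated in the abstract, so the sketch leaves it out of the comparison.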
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, B., Shi, X. (2015). On Detection of Synonyms Between Simplified Chinese of Mainland China and Traditional Chinese of Taiwan: A Semantic Similarity Method. In: Lu, Q., Gao, H. (eds) Chinese Lexical Semantics. CLSW 2015. Lecture Notes in Computer Science, vol 9332. Springer, Cham. https://doi.org/10.1007/978-3-319-27194-1_10
DOI: https://doi.org/10.1007/978-3-319-27194-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27193-4
Online ISBN: 978-3-319-27194-1
eBook Packages: Computer Science (R0)