Study on Tibetan Word Segmentation as Syllable Tagging

Li, Yachao; Yu, Hongzhi

doi:10.1007/978-3-642-41644-6_34

Part of the book series: Communications in Computer and Information Science (CCIS, volume 400)

Included in the following conference series:

CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

Tibetan word segmentation (TWS) is a fundamental problem in Tibetan natural language processing. This paper reformulates segmentation as a syllable tagging problem and studies the performance of TWS with different sequence labeling models. Experimental results show that, with the current 4-tag set, the TWS system based on conditional random fields (CRFs) achieves the best performance, while the other models also achieve good results. These results show that treating segmentation as syllable tagging is an efficient approach to TWS.
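The syllable tagging reformulation can be illustrated with a short Python sketch that converts segmented text into per-syllable tags and back. It rests on several assumptions not confirmed by the abstract: the 4-tag set is taken to be B/M/E/S (Begin, Middle, End, Single-syllable word), a scheme common in Chinese word segmentation; syllables are assumed to be delimited by the tsheg mark (U+0F0B); and gold training text is assumed to separate words with spaces. The helper names `encode`, `decode`, `syllables` and `word_to_tags` are hypothetical, and abbreviated syllables (where a case marker is fused into the preceding syllable) are not handled.

```python
"""Sketch: Tibetan word segmentation recast as syllable tagging.

Assumptions (not taken from the paper): a B/M/E/S 4-tag set, syllables
delimited by the tsheg mark U+0F0B, and space-separated words in gold data.
"""

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)


def syllables(word: str) -> list[str]:
    """Split a word into syllables on the tsheg, dropping empty pieces
    (e.g. the empty string left by a word-final tsheg)."""
    return [s for s in word.split(TSHEG) if s]


def word_to_tags(n: int) -> list[str]:
    """Tag a word of n syllables with the assumed B/M/E/S scheme."""
    if n == 1:
        return ["S"]
    return ["B"] + ["M"] * (n - 2) + ["E"]


def encode(segmented: str) -> tuple[list[str], list[str]]:
    """Turn a space-segmented sentence into parallel syllable and tag lists."""
    sylls: list[str] = []
    tags: list[str] = []
    for word in segmented.split():
        parts = syllables(word)
        if not parts:  # skip tokens that were only tsheg marks
            continue
        sylls.extend(parts)
        tags.extend(word_to_tags(len(parts)))
    return sylls, tags


def decode(sylls: list[str], tags: list[str]) -> list[str]:
    """Rebuild words from syllables and (predicted) tags.

    A word is closed whenever an E or S tag is seen; syllables are
    rejoined with a single tsheg, so any word-final tsheg is not restored.
    """
    words: list[str] = []
    current: list[str] = []
    for syl, tag in zip(sylls, tags):
        current.append(syl)
        if tag in ("E", "S"):
            words.append(TSHEG.join(current))
            current = []
    if current:  # tolerate an ill-formed tag sequence ending in B or M
        words.append(TSHEG.join(current))
    return words
```

In this framing, any sequence labeler (a CRF, or another of the models compared in the paper) would be trained on the syllable and tag pairs produced by `encode`, and its predicted tags for new text would be passed to `decode`. Keeping the encoding separate from the model is what allows different sequence labeling models to be compared on the same tag set.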

Author information

Authors and Affiliations

Key Lab of Chinese National Linguistic Information Technology, Northwest University for Nationalities, Lanzhou, China, 730030
Yachao Li & Hongzhi Yu

Editor information

Editors and Affiliations

Soochow University, 1 Shizi Street, 215006, Suzhou, China
Guodong Zhou
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Juanzi Li
Institute of Computer Science & Technology, Peking University, 100871, Beijing, China
Dongyan Zhao & Yansong Feng

About this paper

Cite this paper

Li, Y., Yu, H. (2013). Study on Tibetan Word Segmentation as Syllable Tagging. In: Zhou, G., Li, J., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2013. Communications in Computer and Information Science, vol 400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41644-6_34

DOI: https://doi.org/10.1007/978-3-642-41644-6_34
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41643-9
Online ISBN: 978-3-642-41644-6
eBook Packages: Computer Science, Computer Science (R0)
