Abstract
We present a method for chunking Korean text using conditional random fields (CRFs), a recently introduced probabilistic model for labeling and segmenting sequence data. In agglutinative languages such as Korean and Japanese, rule-based chunking is predominantly used for its simplicity and efficiency. A hybrid of rule-based and machine learning methods has also been proposed to handle exceptions to the rules. In this paper, we show how CRFs can be applied to chunking Korean text. Experiments on the STEP 2000 dataset show that the proposed method significantly improves performance and outperforms previous systems.
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Lee, YH., Kim, MY., Lee, JH. (2005). Chunking Using Conditional Random Fields in Korean Texts. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_14
DOI: https://doi.org/10.1007/11562214_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)