Abstract
Named entity recognition (NER) is an essential component of text mining applications. Because Chinese sentences contain no explicit delimiters between words, incorporating word segmentation information into an NER model can improve its performance. Based on the framework of dynamic conditional random fields (DCRFs), we propose a novel labeling format, called semi-joint labeling, which partially integrates word segmentation information and named entity tags for NER. Compared with traditional approaches, the model strengthens the interaction between segmentation tags and NER tags. Moreover, it allows us to capture interactions between multiple label chains within a linear-chain model. We evaluate the proposed model on data from the SIGHAN 2006 NER bakeoff. The experimental results demonstrate that our approach outperforms state-of-the-art systems.
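The abstract does not spell out the semi-joint label format, so the sketch below illustrates only one common baseline it can be contrasted with: a fully joint (cross-product) labeling, where per-character segmentation tags and NER tags are fused into a single label so that a plain linear-chain model sees both chains at once. The tag inventories (`SEG_TAGS`, `NER_TAGS`), the `"|"` separator, and the helper names are hypothetical choices for illustration, not the paper's definitions.

```python
from itertools import product

# Hypothetical tag inventories (assumptions, not the paper's exact tag sets).
SEG_TAGS = ["B", "I"]  # character begins / continues a word
NER_TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]


def joint_labels(seg, ner):
    """Fuse two aligned per-character tag chains into one chain of product labels."""
    if len(seg) != len(ner):
        raise ValueError("segmentation and NER chains must have the same length")
    return [f"{s}|{n}" for s, n in zip(seg, ner)]


def split_labels(joint):
    """Recover the segmentation and NER chains from a product-label chain."""
    seg, ner = [], []
    for label in joint:
        s, n = label.split("|", 1)
        seg.append(s)
        ner.append(n)
    return seg, ner


def joint_label_space(seg_tags, ner_tags):
    """Every combination of a segmentation tag and an NER tag."""
    return [f"{s}|{n}" for s, n in product(seg_tags, ner_tags)]


if __name__ == "__main__":
    chars = list("王小明在北京工作")  # 王小明 / 在 / 北京 / 工作
    seg = ["B", "I", "I", "B", "B", "I", "B", "I"]
    ner = ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
    for c, label in zip(chars, joint_labels(seg, ner)):
        print(c, label)
    space = joint_label_space(SEG_TAGS, NER_TAGS)
    # Transition parameters: one joint chain vs. two independent chains.
    print(len(space) ** 2, len(SEG_TAGS) ** 2 + len(NER_TAGS) ** 2)
```

With these hypothetical tag sets the joint chain has 2 × 7 = 14 labels and 14² = 196 transition weights, versus 2² + 7² = 53 for two separate chains. This growth in the label space is one reason to prefer a format that integrates the two tag sets only partially, or a factorial model such as a DCRF that keeps the chains separate while still coupling them.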
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, CW., Tsai, R.TH., Hsu, WL. (2008). Semi-joint Labeling for Chinese Named Entity Recognition. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_11
DOI: https://doi.org/10.1007/978-3-540-68636-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68633-0
Online ISBN: 978-3-540-68636-1
eBook Packages: Computer Science, Computer Science (R0)