Abstract
Biomedical named entity recognition is a critical task for automatically mining knowledge from biomedical literature. In this paper, we apply a Conditional Random Fields (CRF) model to recognize biomedical named entities in biomedical literature. The model uses rich features, including literal, contextual, and semantic features. Shallow syntactic features are introduced into the CRF model for the first time, and the model performs boundary detection and semantic labeling at the same time, which effectively improves its performance. Experiments show that our method achieves an F-measure of 71.2% on the JNLPBA test data, which is better than most state-of-the-art systems.
This work is supported by the National Natural Science Foundation of China (60504021) and the 863 High Technology Research and Development Programme of China (2002AA117010-09).
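The abstract does not list the exact features or the tag set, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a BIO-style tag set in which each tag encodes both the entity boundary and its semantic class (for example `B-protein`, `I-protein`, `O`), which is one common way to do boundary detection and semantic labeling in a single pass. The feature names (`shape`, `suffix3`, `chunk`, and so on), the helper functions, and the exact-match scoring are all hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Set, Tuple

# (word, POS tag, chunk tag). The tag values used here are illustrative only.
Token = Tuple[str, str, str]


def word_shape(word: str) -> str:
    """Collapse characters into a coarse shape, e.g. 'IL-2' -> 'AA-0'."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("A")
        elif ch.islower():
            shape.append("a")
        elif ch.isdigit():
            shape.append("0")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(sent: List[Token], i: int) -> Dict[str, str]:
    """Build a hypothetical feature dict for token i of a sentence."""
    word, pos, chunk = sent[i]
    feats = {
        "word": word.lower(),        # literal feature
        "shape": word_shape(word),   # literal feature
        "suffix3": word[-3:].lower(),
        "pos": pos,                  # shallow syntactic feature
        "chunk": chunk,              # shallow syntactic feature
    }
    # Context features from the neighbouring tokens.
    if i > 0:
        feats["prev_word"] = sent[i - 1][0].lower()
        feats["prev_chunk"] = sent[i - 1][2]
    else:
        feats["BOS"] = "1"
    if i < len(sent) - 1:
        feats["next_word"] = sent[i + 1][0].lower()
        feats["next_chunk"] = sent[i + 1][2]
    else:
        feats["EOS"] = "1"
    return feats


def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Decode BIO tags into (start, end, type) spans, end exclusive."""
    found = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):
        # Close the open span when the current tag does not continue it.
        if start is not None and tag != "I-" + etype:
            found.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
    return found


def f_measure(gold: List[str], pred: List[str]) -> float:
    """Exact-match F-measure: a span counts only if boundary and type agree."""
    g, p = spans(gold), spans(pred)
    tp = len(g & p)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the class name is part of each tag, a sequence labeler trained on such tags predicts where an entity starts and ends and what kind of entity it is in one decoding step. The feature dicts could be passed to any CRF toolkit that accepts per-token features.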
Keywords
- Natural Language Processing
- Conditional Random Field
- Biomedical Literature
- Named Entity Recognition
- Entity Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Sun, C., Guan, Y., Wang, X., Lin, L. (2006). Biomedical Named Entities Recognition Using Conditional Random Fields Model. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_159
DOI: https://doi.org/10.1007/11881599_159
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45916-3
Online ISBN: 978-3-540-45917-0
eBook Packages: Computer Science, Computer Science (R0)