Abstract
This paper uses the semantic relationships between a predicate and its arguments, expressed as semantic roles, to improve lexical-based named entity recognition (NER) in the molecular biology domain. The semantic roles were realized as various sets of syntactic features for a machine learning model, in order to explore which representation lets this knowledge benefit NER the most. The empirical results show that the best feature set consists of the predicate’s surface form, the predicate’s lemma, voice, and a united feature that combines the subject-object head’s lemma with the transitive-intransitive sense. The performance improvement obtained with these features indicates that predicate-argument semantic knowledge benefits NER. There is still room to enhance NER with this semantic knowledge, e.g. by employing semantic roles other than agent and theme and by extending the rules for efficiently identifying an argument’s boundary.
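As a rough illustration of the best-performing feature set named in the abstract, the sketch below assembles the four features for one predicate. It is not the authors' implementation: the `Predicate` class, the feature names, the `NONE` placeholder, and the `|` separator are all hypothetical, and joining the subject head lemma, object head lemma, and transitivity into one string is only one possible reading of the "united feature".

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Predicate:
    """Hypothetical container for what a parser might report about a predicate."""
    surface: str                               # predicate as written in the sentence
    lemma: str                                 # base form of the predicate
    voice: str                                 # "active" or "passive"
    transitive: bool                           # transitive vs. intransitive sense
    subject_head_lemma: Optional[str] = None   # lemma of the subject's head word
    object_head_lemma: Optional[str] = None    # lemma of the object's head word


def predicate_features(pred: Predicate) -> Dict[str, str]:
    """Build the four features listed in the abstract (assumed encoding)."""
    sense = "transitive" if pred.transitive else "intransitive"
    subj = pred.subject_head_lemma or "NONE"   # placeholder for a missing argument
    obj = pred.object_head_lemma or "NONE"
    return {
        "pred_surface": pred.surface,
        "pred_lemma": pred.lemma,
        "voice": pred.voice,
        # Assumed form of the united feature: head lemmas joined with the sense.
        "args_sense": f"{subj}|{obj}|{sense}",
    }


# Made-up example predicate, e.g. from "The protein activates the gene."
example = Predicate(
    surface="activates",
    lemma="activate",
    voice="active",
    transitive=True,
    subject_head_lemma="protein",
    object_head_lemma="gene",
)
features = predicate_features(example)
```

A dictionary of string-valued features like this could be passed to any classifier that accepts categorical features; how the paper's model actually encodes them is not stated in the abstract.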
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wattarujeekrit, T., Collier, N. (2005). Exploring Predicate-Argument Relations for Named Entity Recognition in the Molecular Biology Domain. In: Hoffmann, A., Motoda, H., Scheffer, T. (eds) Discovery Science. DS 2005. Lecture Notes in Computer Science, vol 3735. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11563983_23
DOI: https://doi.org/10.1007/11563983_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29230-2
Online ISBN: 978-3-540-31698-5
eBook Packages: Computer Science, Computer Science (R0)