Abstract
Concept mapping is a fundamental task in biomedical text mining in which textual mentions of concepts of interest are annotated with specific entries of the lexicons, terminologies, ontologies, or databases that represent these concepts. Despite a significant amount of research, only a limited number of practical, publicly available tools perform concept mapping of user-specified biomedical text as an independent task. This chapter first presents several tools that can automatically map biomedical text to concepts from a wide range of terminological resources, followed by tools that map to more restricted sets of these resources. The presentation is intended as a guide for researchers without a background in biomedical concept mapping, to help them select an appropriate tool based on usability, scalability, configurability, balance between precision and recall, and the desired set of terminological resources with which to annotate the text. Only with effective automatic concept-mapping tools will systems be able to analyze the biomedical literature and other large document sets at scale, as a fundamental part of more complex text-mining tasks such as information extraction and hypothesis evaluation and generation.
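To make the task definition concrete, the following is a minimal sketch of concept mapping by plain dictionary lookup. It is not the method of any tool covered in the chapter; the lexicon, the `EX:` identifiers, and the function name `map_concepts` are hypothetical and chosen only for illustration. Real tools typically add tokenization, lexical variant handling, and disambiguation on top of this idea.

```python
import re

# Hypothetical toy lexicon: surface form -> concept identifier.
# The identifiers are illustrative placeholders, not entries of a real resource.
LEXICON = {
    "breast cancer": "EX:0001",
    "cancer": "EX:0002",
    "tumor suppressor": "EX:0003",
}


def map_concepts(text, lexicon=LEXICON):
    """Return (start, end, mention, concept_id) tuples for lexicon terms in text."""
    # Put longer terms first in the alternation so that "breast cancer"
    # is preferred over the shorter, nested term "cancer".
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sentence = "BRCA1 is a tumor suppressor linked to breast cancer."
    for start, end, mention, concept_id in map_concepts(sentence):
        print(start, end, mention, concept_id)
```

Exact matching of this kind tends to favor precision over recall, since spelling variants, abbreviations, and reordered terms are missed. That trade-off is one of the selection criteria named above.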
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Bada, M. (2014). Mapping of Biomedical Text to Concepts of Lexicons, Terminologies, and Ontologies. In: Kumar, V., Tipney, H. (eds) Biomedical Literature Mining. Methods in Molecular Biology, vol 1159. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-0709-0_3
DOI: https://doi.org/10.1007/978-1-4939-0709-0_3
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-0708-3
Online ISBN: 978-1-4939-0709-0
eBook Packages: Springer Protocols