Abstract
In contemporary electronic medical records, much of the clinically important data (signs and symptoms, symptom severity, disease status, etc.) is not captured in structured data fields but is instead encoded in clinician-generated narrative text. Natural language processing (NLP) provides a means of unlocking this important data source for applications in clinical decision support, quality assurance, and public health. This chapter provides an overview of representative NLP systems in biomedicine based on a unified architectural view. The general architecture of an NLP system consists of two main components: background knowledge, which includes biomedical knowledge resources, and a framework that integrates NLP tools to process text. Systems differ in both components; we briefly review each. A major challenge facing current research in biomedical NLP is the paucity of large, publicly available annotated corpora, although initiatives that facilitate data sharing, system evaluation, and collaboration among clinical NLP researchers are starting to emerge.
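To make the two-component architecture concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the design of any system reviewed in this chapter. The lexicon, severity words, and negation triggers are hypothetical toy stand-ins for background knowledge resources such as the UMLS. The names `Document`, `Finding`, `tokenize`, `extract_findings`, and `Pipeline` are likewise hypothetical. The `Pipeline` class only loosely mimics how frameworks such as UIMA or GATE chain processing components; it does not use their APIs. The negation check is a deliberately crude trigger-word heuristic.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# --- Background knowledge (hypothetical toy resources) ----------------------
# A tiny term-to-semantic-type lexicon standing in for a resource like the UMLS.
LEXICON: Dict[str, str] = {
    "chest pain": "Sign or Symptom",
    "fever": "Sign or Symptom",
    "cough": "Sign or Symptom",
    "pneumonia": "Disease or Syndrome",
}
SEVERITY_TERMS = {"mild", "moderate", "severe"}
NEGATION_TRIGGERS = {"no", "denies", "without"}  # crude, assumed trigger list


@dataclass
class Finding:
    term: str
    semantic_type: str
    severity: Optional[str] = None
    negated: bool = False


@dataclass
class Document:
    text: str
    sentences: List[List[str]] = field(default_factory=list)  # tokenized sentences
    findings: List[Finding] = field(default_factory=list)


# --- NLP tools: each component reads and annotates a shared Document --------
def tokenize(doc: Document) -> Document:
    """Split text on sentence punctuation, then into lowercase word tokens."""
    for sent in re.split(r"[.!?]+", doc.text):
        tokens = re.findall(r"[a-z]+", sent.lower())
        if tokens:
            doc.sentences.append(tokens)
    return doc


def extract_findings(doc: Document) -> Document:
    """Dictionary lookup of lexicon terms, plus simple severity/negation modifiers."""
    for tokens in doc.sentences:
        for term, sem_type in LEXICON.items():
            words = term.split()
            n = len(words)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != words:
                    continue
                # Severity: a severity word immediately before the term.
                severity = tokens[i - 1] if i > 0 and tokens[i - 1] in SEVERITY_TERMS else None
                # Negation: any trigger word earlier in the same sentence.
                negated = any(t in NEGATION_TRIGGERS for t in tokens[:i])
                doc.findings.append(Finding(term, sem_type, severity, negated))
    return doc


# --- Framework: runs the configured components in order ---------------------
class Pipeline:
    def __init__(self, components: List[Callable[[Document], Document]]):
        self.components = components

    def run(self, text: str) -> Document:
        doc = Document(text)
        for component in self.components:
            doc = component(doc)
        return doc


if __name__ == "__main__":
    pipeline = Pipeline([tokenize, extract_findings])
    result = pipeline.run("Patient reports severe chest pain. Denies fever.")
    for finding in result.findings:
        print(finding)
```

Keeping the knowledge (the dictionaries) separate from the processing components means either side can change independently: a larger terminology could replace the toy lexicon without touching the pipeline, and a different recognizer could replace the dictionary lookup without touching the knowledge resources.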
Abbreviations
- BNF: Backus–Naur form
- cTAKES: Clinical Text Analysis and Knowledge Extraction System
- EMR: Electronic medical record
- GATE: General Architecture for Text Engineering
- LSP: Linguistic String Project
- MedLee: Medical Language Extraction and Encoding System
- MLP: Medical language processor
- NER: Named entity recognition
- NLP: Natural language processing
- POS: Part of speech
- UIMA: Unstructured Information Management Architecture
- UMLS: Unified Medical Language System
Acknowledgements
S.D. and L.O.M. were funded in part by NIH grants U54HL108460 and UH3HL108785.
Copyright information
© 2014 Springer Science+Business Media New York
About this protocol
Cite this protocol
Doan, S., Conway, M., Phuong, T.M., Ohno-Machado, L. (2014). Natural Language Processing in Biomedicine: A Unified System Architecture Overview. In: Trent, R. (eds) Clinical Bioinformatics. Methods in Molecular Biology, vol 1168. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-0847-9_16
DOI: https://doi.org/10.1007/978-1-4939-0847-9_16
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-0846-2
Online ISBN: 978-1-4939-0847-9
eBook Packages: Springer Protocols