Abstract
Recent advances in technology have led to exponential growth of the scientific literature in the biomedical sciences. This rapid increase in information has outpaced manual curation efforts, necessitating text mining approaches in the life sciences. One application of text mining is in supporting in silico drug discovery, including drug target screening, pharmacogenomics, and adverse drug event detection. This chapter introduces the applications of various text mining approaches in drug discovery. It is divided into two parts: the first half gives an overview of text mining in the biosciences, and the second half reviews strategies and methods for four distinct applications of text mining in drug discovery.
Acknowledgments
This research was supported by the NIH Intramural Research Program, National Library of Medicine, and the NIH Medical Research Scholars Program, a public-private partnership supported jointly by the NIH and generous contributions to the Foundation for the NIH from the Doris Duke Charitable Foundation, the Howard Hughes Medical Institute, the American Association for Dental Research, the Colgate-Palmolive Company, and other private donors. No funds from the Doris Duke Charitable Foundation were used to support research that used animals. This work was also supported by the National Natural Science Foundation of China (Grant No. 81601573), the National Key Research and Development Program of China (Grant No. 2016YFC0901901), the National Population and Health Scientific Data Sharing Program of China, and the Knowledge Centre for Engineering Sciences and Technology (Medical Centre) and the Key Laboratory of Knowledge Technology for Medical Integrative Publishing.
Copyright information
© 2019 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Zheng, S., Dharssi, S., Wu, M., Li, J., Lu, Z. (2019). Text Mining for Drug Discovery. In: Larson, R., Oprea, T. (eds) Bioinformatics and Drug Discovery. Methods in Molecular Biology, vol 1939. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-9089-4_13
DOI: https://doi.org/10.1007/978-1-4939-9089-4_13
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-9088-7
Online ISBN: 978-1-4939-9089-4
eBook Packages: Springer Protocols