Abstract
The drug discovery pipeline consists of a number of processes, and determining drug–target interactions is one of its salient steps. Computational prediction of drug–target interactions can help narrow the search space for experimental, wet-lab verification, considerably reducing the time and other resources devoted to the drug discovery pipeline. While machine learning-based methods are more widespread for drug–target interaction prediction, network-centric methods are also evolving. In this chapter, we focus on drug–target interaction prediction from the perspective of machine learning algorithms and on the various stages involved in developing an accurate predictor.
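The abstract names the stages of building a machine learning predictor only in general terms. The following is a minimal, dependency-free sketch of such stages under loudly stated assumptions, not the chapter's own method. Each drug–target pair is encoded by concatenating hypothetical drug and target descriptor vectors. Pairs without a known interaction are treated as negatives, which is a common simplification because an unobserved interaction is not a confirmed non-interaction. Class imbalance is reduced by random undersampling of negatives, a k-nearest-neighbour classifier stands in for the random forests, boosting, or deep models used in practice, and performance is summarised by leave-one-out ROC AUC. All drug and target names, descriptor values, and function names (`pair_features`, `undersample`, `knn_score`, `roc_auc`, `loo_auc`) are hypothetical.

```python
import math
import random


def pair_features(drug_vec, target_vec):
    """Represent a drug-target pair by concatenating drug and target descriptors."""
    return list(drug_vec) + list(target_vec)


def undersample(X, y, ratio=1.0, seed=0):
    """Random undersampling: keep all positives and at most ratio * n_pos negatives."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    kept_neg = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    keep = sorted(pos + kept_neg)
    return [X[i] for i in keep], [y[i] for i in keep]


def knn_score(train_X, train_y, x, k=3):
    """Interaction score = fraction of known interactions among the k nearest pairs."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def roc_auc(scores, labels):
    """AUC = probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loo_auc(X, y, k=3):
    """Leave-one-out: score each pair with a model built from all the other pairs."""
    scores = [knn_score(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) for i in range(len(X))]
    return roc_auc(scores, y)


# Hypothetical toy descriptors; real pipelines would use, e.g., fingerprints for
# drugs and sequence-derived descriptors for targets.
drugs = {"D1": [1.0, 0.0], "D2": [0.9, 0.2], "D3": [0.0, 1.0],
         "D4": [0.1, 0.9], "D5": [0.5, 0.5]}
targets = {"T1": [1.0, 0.1], "T2": [0.1, 1.0]}
known = {("D1", "T1"), ("D2", "T1"), ("D3", "T2"), ("D4", "T2")}

# Assumption: every pair not in `known` is treated as a negative (non-interacting).
pairs = [(d, t) for d in drugs for t in targets]
X = [pair_features(drugs[d], targets[t]) for d, t in pairs]
y = [1 if p in known else 0 for p in pairs]

X_bal, y_bal = undersample(X, y)  # 4 positives + 4 of the 6 negatives
print("Leave-one-out AUC (balanced set):", round(loo_auc(X_bal, y_bal), 3))

# Rank candidate pairs for a new, hypothetical target T3 to prioritise wet-lab tests.
t3 = [0.0, 0.9]
scores = {d: knn_score(X_bal, y_bal, pair_features(v, t3)) for d, v in drugs.items()}
for d in sorted(scores, key=scores.get, reverse=True):
    print(f"{d}-T3: {scores[d]:.3f}")
```

Ranking the candidate pairs by score reflects the abstract's point about reducing the search space: only the top-ranked pairs would go forward to wet-lab verification. k-nearest neighbours was chosen here only because it needs no training step or third-party libraries; replacing `knn_score` with another classifier leaves the rest of the sketch unchanged.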
Copyright information
© 2024 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Nath, A., Chaube, R. (2024). Mining Chemogenomic Spaces for Prediction of Drug–Target Interactions. In: Gore, M., Jagtap, U.B. (eds) Computational Drug Discovery and Design. Methods in Molecular Biology, vol 2714. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3441-7_9
DOI: https://doi.org/10.1007/978-1-0716-3441-7_9
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3440-0
Online ISBN: 978-1-0716-3441-7
eBook Packages: Springer Protocols