DeepBSRPred: deep learning-based binding site residue prediction for proteins

Nikam, Rahul; Yugandhar, Kumar; Gromiha, M. Michael

doi:10.1007/s00726-022-03228-3


Original Article
Published: 27 December 2022

Amino Acids, Volume 55, pages 1305–1316 (2023)

Rahul Nikam^1, Kumar Yugandhar^1,2 & M. Michael Gromiha^1,3


Abstract

Motivation

Protein–protein interactions (PPIs) govern many cellular activities. Amino acid residues located at the interface are known as binding sites, and information about these sites helps us understand the binding affinities and functions of protein–protein complexes.

Results

We have developed a deep neural network-based method, DeepBSRPred, that predicts binding sites from protein sequence information and from structures predicted by AlphaFold2. The sequence- and structure-based features include the position-specific scoring matrix (PSSM), solvent accessible surface area, conservation score, amino acid properties and residue depth. DeepBSRPred predicted binding sites with an average F1 score of 0.73 on a dataset of 1236 proteins. We also compared its performance with existing methods on four benchmark datasets, and DeepBSRPred outperformed them.
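The sketch below makes the reported metrics concrete. It shows, in plain Python, how per-residue F1 and Matthews correlation coefficient (MCC) can be computed from binary binding-site labels, where 1 marks a binding site and 0 a non-binding residue. The abstract does not say how the "average F1 score" was averaged, so `average_f1` assumes a mean of per-protein F1 values. All function names are illustrative and are not taken from the DeepBSRPred code.

```python
import math
from typing import List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for per-residue labels (1 = binding site)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label and prediction lengths differ")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); 0.0 when there are no positives at all."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def average_f1(proteins: List[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Assumed averaging scheme: mean of per-protein F1 over (labels, predictions) pairs."""
    scores = [f1_score(t, p) for t, p in proteins]
    return sum(scores) / len(scores)
```

For example, take true labels [1, 1, 0, 0, 1, 0] and predictions [1, 0, 0, 1, 1, 0]. This gives 2 true positives, 1 false positive, 2 true negatives and 1 false negative. F1 is therefore 4/6 ≈ 0.67 and MCC is 3/9 ≈ 0.33.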

Availability and implementation

The DeepBSRPred web server can be found at https://web.iitm.ac.in/bioinfo2/deepbsrpred/index.html, along with all datasets used in this study. The trained models, the DeepBSRPred standalone source code, and the feature computation pipeline are freely available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html.


Data availability

The data used in this work are available at https://web.iitm.ac.in/bioinfo2/deepbsrpred/download.html.

Abbreviations

An-Ab:: Antigen–antibody
EC:: Enzyme containing
GP:: G-protein containing
IN:: Inhibitor containing
RC:: Receptor containing
MS:: Miscellaneous
ASA:: Accessible surface area
AUROC:: Area under the receiver operating characteristic curve
AUPRC:: Area under the precision–recall curve
PSSM:: Position-specific scoring matrix
F1:: F1-score
MCC:: Matthews correlation coefficient
Polar real:: ASA of polar residues
BIOV880102:: Information value for accessibility (Biou et al. 1988)
NADH010102:: Hydropathy scale based on self-information values in the two-state model (Naderi-Manesh et al. 2001)
VALDAR:: Protein conservation metric (Valdar and Thornton 2001)
dASA:: Change in solvent accessible surface area upon protein unfolding
PONJ960101:: Average volumes of residues (Pontius et al. 1996)
FASG760101:: Molecular weight (Fasman 1976)
GRAR740103:: Volume (Grantham 1974)
HB acceptor:: Hydrogen bond acceptor
ASAD:: Solvent accessible surface area for denatured protein (Gromiha et al. 1999)
ASAN:: Solvent accessible surface area for native protein (Gromiha et al. 1999)
TAYLOR_GAPS:: Conservation score (Taylor 1986)
PSSM sum:: Summation of PSSM values
Residue depth:: Residue depth computed using Python
SMERFS:: Conservation score from the AACon tool (Manning et al. 2008)
Contact count:: Number of contacts of the residue
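The abbreviation list defines a contact-count feature but gives neither the distance cutoff nor the atoms used. The sketch below shows one common way to compute such a feature from a predicted structure, such as an AlphaFold2 model: for each residue, count the other residues whose representative points lie within a cutoff distance. Several choices here are assumptions for illustration, not the parameters used in DeepBSRPred. These are the 8 Å cutoff, one representative point per residue (for example the Cα atom or the side-chain centre of mass; cf. Viloria et al. 2017), and the hypothetical `min_seq_sep` option.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_counts(coords: Sequence[Coord], cutoff: float = 8.0,
                   min_seq_sep: int = 1) -> List[int]:
    """Count, for each residue, the other residues within `cutoff` angstroms.

    coords: one representative (x, y, z) point per residue, in sequence order.
    cutoff: assumed distance threshold in angstroms (illustrative default).
    min_seq_sep: hypothetical option; pairs fewer than this many positions
        apart in sequence are ignored (1 keeps every pair).
    """
    n = len(coords)
    counts = [0] * n
    step = max(1, min_seq_sep)
    for i in range(n):
        # Visit each unordered pair once and credit both residues.
        for j in range(i + step, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                counts[i] += 1
                counts[j] += 1
    return counts
```

The double loop is O(n²). That is adequate for single chains of a few hundred residues. For large structures, a spatial index such as a k-d tree would be the usual replacement.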

References

Abadi M, Agarwal A et al (2016) TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467
Agnieszka G, Peter V et al (2018) AACon: a fast amino acid conservation calculation service. https://www.compbio.dundee.ac.uk/aacon/
Al-Rfou R, Alain G et al (2016) Theano: a Python framework for fast computation of mathematical expressions. arXiv:1605.02688
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25(17):3389–3402. https://doi.org/10.1093/nar/25.17.3389
Amos-Binks A, Patulea C et al (2011) Binding site prediction for protein-protein interactions and novel motif discovery using re-occurring polypeptide sequences. BMC Bioinform 12:225
Asadabadi EB, Abdolmaleki P (2013) Predictions of protein-protein interfaces within membrane protein complexes. Avicenna J Med Biotechnol 5:148–157
Asgari E, Mofrad MR (2015) Continuous distributed representation of biological sequences for deep proteomics and genomics. PLoS ONE 10:e0141287
Asgari E, McHardy AC et al (2019) Probabilistic variable-length segmentation of protein sequences for discriminative motif discovery (DiMotif) and sequence embedding (ProtVecX). Sci Rep 9:3577
Biou V, Gibrat JF et al (1988) Secondary structure prediction: combination of three different methods. Protein Eng Des Sel 2(3):185–191
Branco P, Torgo L (2016) A survey of predictive modeling on imbalanced domains. ACM Comput Surv 49(2):1–50
Cao B, Porollo A et al (2006) Enhanced recognition of protein transmembrane domains with prediction-based structural profiles. Bioinformatics 22:303–309
Chakravarty S, Varadarajan R (1999) Residue depth: a novel parameter for the analysis of protein structure and stability. Structure 7:723–732
Chen X, Jeong JC (2009) Sequence-based prediction of protein interaction sites with an integrative method. Bioinformatics 25:585–591
Chen P, Li J (2010) Sequence-based identification of interface residues by an integrative profile combining hydrophobic and evolutionary information. BMC Bioinformatics 11:402
Chollet F (2015) Keras: deep learning library for Theano and TensorFlow. https://keras.io
Clark JJ, Orban ZJ et al (2020) Predicting binding sites from unbound versus bound protein structures. Sci Rep 10(1):15856
Dhole K, Singh G et al (2014) Sequence-based prediction of protein-protein interaction sites with L1-logreg classifier. J Theor Biol 348:47–54
Du X, Cheng J, Song J (2009) Improved prediction of protein binding sites from sequences using genetic algorithm. Protein J 28(6):273–280. https://doi.org/10.1007/s10930-009-9192-1
Fasman GD (1976) Handbook of Biochemistry and Molecular Biology. Proteins. CRC Press, Cleveland
Geng H, Lu T et al (2015) Prediction of protein-protein interaction sites based on naive Bayes classifier. Biochem Res Int 2015:1–7
Grantham R (1974) Amino acid difference formula to help explain protein evolution. Science 185(4154):862–864
Gromiha MM, Oobatake M et al (1999) Important amino acid properties for enhanced thermostability from mesophilic to thermophilic proteins. Biophys Chem 82(1):51–67
Gromiha MM, Yokota K et al (2009) Identification and analysis of binding site residues in protein-protein complexes. Int J Biol Biomed 3(9):415–420
Gromiha MM, Saranya N et al (2011) Sequence and structural features of binding site residues in protein-protein complexes: comparison with protein-nucleic acid complexes. Proteome Sci 9(Suppl 1):S13
Heinzinger M, Elnaggar A et al (2019) Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 20:723
Hubbard SJ, Thornton JM (1993) NACCESS, computer program. Department of Biochemistry and Molecular Biology, University College London
Hwang H, Petrey D et al (2016) A hybrid method for protein–protein interface prediction. Protein Sci 25:159–165
Jia J, Liu Z et al (2016) iPPBS-Opt: a sequence-based ensemble classifier for identifying protein-protein binding sites by optimizing imbalanced training datasets. Molecules 21:95
Jones DT, Buchan DW et al (2012) PSICOV: precise structural contact prediction using sparse inverse covariance estimation on large multiple sequence alignments. Bioinformatics 28:184–190
Jumper J, Evans R et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature 596(7873):583–589
Kabsch W, Sander C (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22(12):2577–2637
Kawashima S, Pokarowski P et al (2008) AAindex: amino acid index database, progress report 2008. Nucleic Acids Res 36(Database issue):D202–D205
Konc J, Janezic D (2007) Protein-protein binding-sites prediction by protein surface structure conservation. J Chem Inf Model 47(3):940–944
Laine E, Carbone A (2015) Local geometry and evolutionary conservation of protein surfaces reveal the multiple recognition patches in protein-protein interactions. PLoS Comput Biol 11:e1004580
Li Y, Golding GB et al (2021) DELPHI: accurate deep ensemble model for protein interaction sites prediction. Bioinformatics 37(7):896–904
Liang S, Zhang J et al (2004) Prediction of the interaction site on the surface of an isolated protein structure by analysis of side chain energy scores. Proteins 57(3):548–557
Lijnzaad P, Berendsen HJ, Argos P (1996) Hydrophobic patches on the surfaces of protein structures. Proteins 25(3):389–397
Lise S, Archambeau C et al (2009) Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods. BMC Bioinform 10:365
Liu GH, Shen HB et al (2016) Prediction of protein–protein interaction sites with machine-learning-based data-cleaning and post-filtering procedures. J Membr Biol 249:141–153
London N, Movshovitz-Attias D et al (2010) The structural basis of peptide-protein binding strategies. Structure 18:188–199
Ma B, Elkayam T et al (2003) Protein-protein interactions: structurally conserved residues distinguish between binding sites and exposed protein surfaces. Proc Natl Acad Sci USA 100(10):5772–5777
Maheshwari S, Brylinski M (2015) Prediction of protein–protein interaction sites from weakly homologous template structures using meta-threading and machine learning. J Mol Recognit 28:35–48
Maheshwari S, Brylinski M (2016) Template-based identification of protein–protein interfaces using eFindSitePPI. Methods 93:64–71
Manning JR, Jefferson ER et al (2008) The contrasting properties of conservation and correlated phylogeny in protein functional residue prediction. BMC Bioinform 9:51
McDonald IK, Thornton JM (1994) Satisfying hydrogen bonding potential in proteins. J Mol Biol 238(5):777–793
Murakami Y, Mizuguchi K (2010) Applying the Naïve Bayes classifier with kernel density estimation to the prediction of protein-protein interaction sites. Bioinformatics 26:1841–1848
Naderi-Manesh H, Sadeghi M et al (2001) Prediction of protein surface accessibility with information theory. Proteins 42(4):452–459
Neuvirth H, Raz R et al (2004) ProMate: a structure-based prediction program to identify the location of protein-protein binding sites. J Mol Biol 338(1):181–199
Ofran Y, Rost B (2007) ISIS: interaction sites identified from sequence. Bioinformatics 23:e13–e16
Pedregosa F, Varoquaux G et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Pontius J, Richelle J et al (1996) Deviations from standard atomic volumes as a quality measure for protein crystal structures. J Mol Biol 264(1):121–136
Porollo A, Meller J (2007) Prediction-based fingerprints of protein-protein interactions. Proteins 66:630–645
Saito T, Rehmsmeier M (2015) The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3):e0118432
Singh G, Dhole K et al (2014) SPRINGS: prediction of protein-protein interaction sites using artificial neural networks. Technical report, PeerJ PrePrints, PPR39858
Sun Y, Wong AKC, Kamel MS (2009) Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell 23(4):687–719. https://doi.org/10.1142/S0218001409007326
Taherzadeh G, Yang Y, Zhang T, Liew AW, Zhou Y (2016) Sequence-based prediction of protein-peptide binding sites using support vector machine. J Comput Chem 37(13):1223–1229. https://doi.org/10.1002/jcc.24314
Taylor WR (1986) The classification of amino acid conservation. J Theor Biol 119(2):205–218
Thomas CN, Anja B et al (2018) IntPred: a structure-based predictor of protein–protein interaction sites. Bioinformatics 34:223–229
Valdar WS, Thornton JM (2001) Conservation helps to identify biologically relevant crystal contacts. J Mol Biol 313(2):399–416. https://doi.org/10.1006/jmbi.2001.5034
Valdar WS (2002) Scoring residue conservation. Proteins 48:227–241
Varadi M, Anyango S et al (2022) AlphaFold protein structure database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res 50(D1):D439–D444
Viloria SJ, Allega MF, Lambrughi M, Papaleo E (2017) An optimal distance cutoff for contact-based protein structure networks using side-chain centers of mass. Sci Rep 7:1–11
Wang G, Dunbrack RL (2003) PISCES: a protein sequence culling server. Bioinformatics 19(12):1589–1591
Wang X, Yu B (2019) Protein–protein interaction sites prediction by ensemble random forests with synthetic minority oversampling technique. Bioinformatics 35:2395–2402
Wang DD, Wang R et al (2014) Fast prediction of protein–protein interaction sites based on extreme learning machines. Neurocomputing 128:258–266
Wei Z, Han K et al (2016) Protein–protein interaction sites prediction by ensembling SVM and sample-weighted random forests. Neurocomputing 193:201–212
Wei ZS, Yang JY, Shen HB, Yu DJ (2015) A cascade random forests algorithm for predicting protein-protein interaction sites. IEEE Trans Nanobiosci 14(7):746–760. https://doi.org/10.1109/TNB.2015.2475359
Xie Z, Deng X et al (2020) Prediction of protein–protein interaction sites using convolutional neural network and improved data sets. Int J Mol Sci 21:467
Xingyu G, Zhenyu C et al (2016) Adaptive weighted imbalance learning with application to abnormal activity recognition. Neurocomputing 173:1927–1935
Xue LC, Dobbs D et al (2011) HomPPI: a class of sequence homology-based protein-protein interface prediction methods. BMC Bioinformatics 12:244
Zardecki C, Dutta S et al (2022) PDB-101: educational resources supporting molecular explorations through biology and medicine. Protein Sci 31(1):129–140
Zeng M, Zhang F et al (2019) Protein–protein interaction site prediction through combining local and global features with deep neural networks. Bioinformatics 36:1114–1120
Zhang J, Kurgan L (2019) SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences. Bioinformatics 35:i343–i353
Zhang B, Li J et al (2019) Sequence-based prediction of protein-protein interaction sites by simplified long short-term memory network. Neurocomputing 357:86–100

Acknowledgements

We thank the Indian Institute of Technology Madras and the High-Performance Computing Environment (HPCE) for computational facilities. The work is partially supported by the Department of Science and Technology, Government of India (No. DST/INT/SWD/P-05/2016).

Author information

Authors and Affiliations

Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
Rahul Nikam, Kumar Yugandhar & M. Michael Gromiha
Department of Computational Biology, Cornell University, New York, NY, USA
Kumar Yugandhar
Department of Computer Science, Tokyo Institute of Technology, Yokohama, Japan
M. Michael Gromiha

Contributions

Conceptualization: MMG; methodology: MMG; software/code: RN; investigation: RN, KY; discussion: RN, KY, MMG; writing (original draft): RN; writing (review and editing): MMG, KY; supervision: MMG. All authors read and approved the manuscript.

Corresponding author

Correspondence to M. Michael Gromiha.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare.

Additional information

Handling editor: F. Eisenhaber.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Nikam, R., Yugandhar, K. & Gromiha, M.M. DeepBSRPred: deep learning-based binding site residue prediction for proteins. Amino Acids 55, 1305–1316 (2023). https://doi.org/10.1007/s00726-022-03228-3


Received: 09 June 2022
Accepted: 15 December 2022
Published: 27 December 2022
Issue Date: October 2023
DOI: https://doi.org/10.1007/s00726-022-03228-3
