Exploiting Complex Protein Domain Networks for Protein Function Annotation

  • Conference paper
Complex Networks and Their Applications VII (COMPLEX NETWORKS 2018)

Part of the book series: Studies in Computational Intelligence (SCI, volume 813)

Abstract

Huge numbers of protein sequences are now available in public databases. To exploit these valuable biological data more fully, the sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology terms. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. In the March 2018 release of UniProtKB, some 556,000 sequences have been manually curated, but over 111 million sequences still lack functional annotations. The ability to annotate these sequences automatically would represent a major advance for the field of bioinformatics. Here, we present a novel network-based approach called GrAPFI for the automatic functional annotation of protein sequences. The underlying assumption of GrAPFI is that proteins may be related to each other by the protein domains, families, and super-families that they share. Several protein domain databases exist, such as InterPro, Pfam, SMART, CDD, Gene3D, and Prosite. Our approach uses InterPro domains because the InterPro database contains information from several other major protein family and domain databases. Our results show that GrAPFI achieves better EC number annotation performance than several other previously described approaches.
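The idea that proteins sharing domains may share function can be illustrated with a small sketch. The code below is not the GrAPFI implementation, which the abstract does not specify. It assumes, purely for illustration, that two proteins are neighbours when they share at least one domain, that the edge weight is the Jaccard similarity of their domain sets, and that EC numbers are ranked by similarity-weighted voting. The function names `jaccard` and `predict_ec`, the domain labels, the protein identifiers, and the EC assignments are all hypothetical toy values.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of domain identifiers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_ec(query_domains, annotated):
    """Rank EC numbers for a query protein by similarity-weighted voting.

    annotated maps a protein id to (set of domain ids, EC number).
    Only annotated proteins sharing at least one domain with the query
    (similarity > 0) contribute a vote. This scoring rule is an
    assumption made for illustration, not the published method.
    """
    scores = defaultdict(float)
    for domains, ec in annotated.values():
        sim = jaccard(query_domains, domains)
        if sim > 0.0:
            scores[ec] += sim
    # Highest score first; ties broken by EC string for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy data: domain labels and EC assignments are made up.
annotated = {
    "protA": ({"domX", "domY"}, "1.1.1.1"),
    "protB": ({"domX"}, "1.1.1.1"),
    "protC": ({"domZ"}, "2.7.11.1"),
}

ranking = predict_ec({"domX", "domY"}, annotated)
print(ranking)
```

In this toy case the query shares both domains with protA and one of its two domains with protB, so only their EC number receives votes; protC shares no domain and is ignored. A real system would draw the domain sets from a tool such as InterProScan and would need a threshold or normalisation on the scores.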

Notes

  1. Protein Data Bank, https://www.rcsb.org/.

Acknowledgements

This work was partially supported by the CNRS-INRIA/FAPs project “TempoGraphs” (PRC2243). Bishnu Sarker is a doctoral student funded by an INRIA CORDI-S contract.

Author information

Corresponding author

Correspondence to Bishnu Sarker.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Sarker, B., Ritchie, D.W., Aridhi, S. (2019). Exploiting Complex Protein Domain Networks for Protein Function Annotation. In: Aiello, L., Cherifi, C., Cherifi, H., Lambiotte, R., Lió, P., Rocha, L. (eds) Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 813. Springer, Cham. https://doi.org/10.1007/978-3-030-05414-4_48
