Exploiting Complex Protein Domain Networks for Protein Function Annotation

Sarker, Bishnu; Ritchie, David W.; Aridhi, Sabeur

doi:10.1007/978-3-030-05414-4_48

Part of the book series: Studies in Computational Intelligence (SCI, volume 813)

Included in the following conference series:

International Conference on Complex Networks and their Applications

Abstract

Huge numbers of protein sequences are now available in public databases. To exploit these valuable biological data more fully, the sequences need to be annotated with functional properties such as Enzyme Commission (EC) numbers and Gene Ontology terms. The UniProt Knowledgebase (UniProtKB) is currently the largest and most comprehensive resource for protein sequence and annotation data. In the March 2018 release of UniProtKB, some 556,000 sequences have been manually curated, but over 111 million sequences still lack functional annotations. The ability to annotate these sequences automatically would represent a major advance for the field of bioinformatics. Here, we present a novel network-based approach called GrAPFI for the automatic functional annotation of protein sequences. The underlying assumption of GrAPFI is that proteins may be related to each other by the protein domains, families, and super-families that they share. Several protein domain databases exist, including InterPro, Pfam, SMART, CDD, Gene3D, and Prosite. Our approach uses InterPro domains because the InterPro database integrates information from several other major protein family and domain databases. Our results show that GrAPFI achieves better EC number annotation performance than several previously described approaches.
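
The idea of relating proteins through the domains they share can be illustrated with a minimal sketch. The abstract does not state how GrAPFI weights edges or scores candidate labels, so the Jaccard edge weight and the weighted neighbour vote below are assumptions chosen for illustration, not the authors' method. The protein names, InterPro accessions, EC assignments, and the function names `build_domain_graph` and `predict_ec` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: protein ID -> set of InterPro domain accessions.
# "Q" is an unannotated query protein; all IDs here are made up.
DOMAINS = {
    "P1": {"IPR000001", "IPR000002"},
    "P2": {"IPR000001", "IPR000002", "IPR000003"},
    "P3": {"IPR000003"},
    "P4": {"IPR000004"},
    "Q": {"IPR000001", "IPR000003"},
}
# Hypothetical EC labels for the annotated proteins only.
EC_LABELS = {"P1": "1.1.1.1", "P2": "1.1.1.1", "P3": "2.7.11.1", "P4": "3.4.21.4"}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_domain_graph(domains):
    """Link every pair of proteins that share at least one domain.

    Returns an adjacency dict: protein -> {neighbour: edge weight}.
    The Jaccard edge weight is an assumption of this sketch.
    """
    # Invert the mapping so that only proteins sharing a domain are compared.
    by_domain = defaultdict(set)
    for protein, doms in domains.items():
        for d in doms:
            by_domain[d].add(protein)

    graph = defaultdict(dict)
    for members in by_domain.values():
        for p in members:
            for q in members:
                if p != q:
                    graph[p][q] = jaccard(domains[p], domains[q])
    return graph


def predict_ec(query, graph, labels):
    """Rank EC numbers by the summed edge weight of labelled neighbours."""
    scores = defaultdict(float)
    for neighbour, weight in graph.get(query, {}).items():
        if neighbour in labels:
            scores[labels[neighbour]] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

In this toy example, calling `predict_ec("Q", build_domain_graph(DOMAINS), EC_LABELS)` ranks the EC number carried by the two neighbours that share the most domains with `Q` first, while `P4`, which shares no domain with any other protein, stays disconnected and contributes nothing.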

Notes

1. Protein Data Bank, https://www.rcsb.org/.

Acknowledgements

This work was partially supported by the CNRS-INRIA/FAPs project “TempoGraphs” (PRC2243). Bishnu Sarker is a doctoral student funded by an INRIA CORDI-S contract.

Author information

Authors and Affiliations

University of Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
Bishnu Sarker, David W. Ritchie & Sabeur Aridhi

Corresponding author

Correspondence to Bishnu Sarker.

Editor information

Editors and Affiliations

Nokia Bell Labs, Cambridge, UK
Luca Maria Aiello
IUT Lumière, University of Lyon, Bron Cedex, France
Chantal Cherifi
LE2I UMR CNRS 6306 9, University of Burgundy, Dijon Cedex, France
Hocine Cherifi
Mathematical Institute, University of Oxford, Oxford, UK
Renaud Lambiotte
Department of Computer Science and Technology, The Computer Laboratory, University of Cambridge, Cambridge, UK
Pietro Lió
Center for Complex Networks and Systems Research, School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, USA
Luis M. Rocha

Cite this paper

Sarker, B., Ritchie, D.W., Aridhi, S. (2019). Exploiting Complex Protein Domain Networks for Protein Function Annotation. In: Aiello, L., Cherifi, C., Cherifi, H., Lambiotte, R., Lió, P., Rocha, L. (eds) Complex Networks and Their Applications VII. COMPLEX NETWORKS 2018. Studies in Computational Intelligence, vol 813. Springer, Cham. https://doi.org/10.1007/978-3-030-05414-4_48

DOI: https://doi.org/10.1007/978-3-030-05414-4_48
Published: 05 December 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05413-7
Online ISBN: 978-3-030-05414-4
eBook Packages: Intelligent Technologies and Robotics (R0)
