InterPro Protein Classification

McDowall, Jennifer; Hunter, Sarah

doi:10.1007/978-1-60761-977-2_3

Part of the book series: Methods in Molecular Biology ((MIMB,volume 694))

Abstract

Improvements in nucleotide sequencing technology have resulted in an ever-increasing number of nucleotide and protein sequences being deposited in databases. Unfortunately, the ability to manually classify and annotate these sequences cannot keep pace with their rapid generation, resulting in an increased bias toward unannotated sequence. Automatic annotation tools can help redress the balance. There are a number of different groups working to produce protein signatures that describe protein families, functional domains or conserved sites within related groups of proteins. Protein signature databases include CATH-Gene3D, HAMAP, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY, and TIGRFAMs. Their approaches range from characterising small conserved motifs that can identify members of a family or subfamily, to the use of hidden Markov models that describe the conservation of residues over entire domains or whole proteins. To increase their value as protein classification tools, protein signatures from these 11 databases have been combined into a single, powerful annotation tool: the InterPro database (http://www.ebi.ac.uk/interpro/) (Hunter et al., Nucleic Acids Res 37:D211–D215, 2009). InterPro is an open-source protein resource used for the automatic annotation of proteins, and is scalable to the analysis of entire new genomes through the use of a downloadable version of InterProScan, which can be incorporated into an existing local pipeline. InterPro provides structural information from PDB (Kouranov et al., Nucleic Acids Res 34:D302–D305, 2006), its classification in CATH (Cuff et al., Nucleic Acids Res 37:D310–D314, 2009) and SCOP (Andreeva et al., Nucleic Acids Res 36:D419–D425, 2008), as well as homology models from ModBase (Pieper et al., Nucleic Acids Res 37:D347–D354, 2009) and SwissModel (Kiefer et al., Nucleic Acids Res 37:D387–D392, 2009), allowing a direct comparison of the protein signatures with the available structural information.
This chapter reviews the signature methods found in the InterPro database, and provides an overview of the InterPro resource itself.
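
As an illustration of the simplest kind of signature mentioned above, a small conserved motif, the following sketch translates a PROSITE-style pattern into a regular expression and scans a sequence with it. It is a minimal sketch under stated assumptions, not the method used by PROSITE or InterProScan: the function names `prosite_to_regex` and `find_motifs` are hypothetical, the test sequence is invented, only a subset of the pattern syntax is handled, and the example pattern (the widely quoted N-glycosylation site pattern) is not taken from this chapter.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex.

    Supported subset (an assumption of this sketch): single residues,
    x (any residue), [ABC] (allowed set), {ABC} (forbidden set),
    repeats such as x(2) or x(2,4), and the < / > terminal anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = re.fullmatch(r"(<?)([^()<>]+)(?:\((\d+)(?:,(\d+))?\))?(>?)", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = m.groups()
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("[") and core.endswith("]"):
            token = core                      # allowed residues
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # forbidden residues
        elif len(core) == 1 and core.isupper():
            token = core                      # one specific residue
        else:
            raise ValueError(f"unsupported pattern element: {element!r}")
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + token + ("$" if end else ""))
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched residues) for every hit, overlaps included."""
    # A lookahead wrapper lets finditer report overlapping occurrences.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # Invented sequence: one site matches, the second N is followed by P.
    print(find_motifs("N-{P}-[ST]-{P}.", "MKNGSAAANPTQ"))
```

Short patterns of this kind match by chance fairly often, which is one reason the profile and hidden Markov model methods described in the abstract are used for whole domains and families.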

References

Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, Finn RD, Gough J, Haft D, Hulo N, Kahn D, Kelly E, Laugraud A, Letunic I, Lonsdale D, Lopez R, Madera M, Maslen J, McAnulla C, McDowall J, Mistry J, Mitchell A, Mulder N, Natale D, Orengo C, Quinn AF, Selengut JD, Sigrist CJ, Thimma M, Thomas PD, Valentin F, Wilson D, Wu CH, Yeats C. (2009) InterPro: the integrative protein signature database. Nucleic Acids Res. 37, D211–D215.
Kouranov A, Xie L, de la Cruz J, Chen L, Westbrook J, Bourne PE, Berman HM. (2006) The RCSB PDB information portal for structural genomics. Nucleic Acids Res. 34, D302–D305.
Cuff AL, Sillitoe I, Lewis T, Redfern OC, Garratt R, Thornton J, Orengo CA. (2009) The CATH classification revisited--architectures reviewed and new ways to characterize structural divergence in superfamilies. Nucleic Acids Res. 37, D310–D314.
Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. (2008) Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. 36, D419–D425.
Pieper U, Eswar N, Webb BM, Eramian D, Kelly L, Barkan DT, Carter H, Mankoo P, Karchin R, Marti-Renom MA, Davis FP, Sali A. (2009) MODBASE, a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res. 37, D347–D354.
Kiefer F, Arnold K, Künzli M, Bordoli L, Schwede T. (2009) The SWISS-MODEL Repository and associated resources. Nucleic Acids Res. 37, D387–D392.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. (1990) Basic local alignment search tool. J Mol Biol. 215, 403–410.
Pearson WR. (1990) Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 183, 63–98.
UniProt Consortium. (2009) The Universal Protein Resource (UniProt) 2009. Nucleic Acids Res. 37, D169–D174.
Servant F, Bru C, Carrère S, Courcelle E, Gouzy J, Peyruc D, Kahn D. (2002) ProDom: automated clustering of homologous domains. Brief Bioinform. 3(3), 246–251.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402.
Sigrist CJA, Cerutti L, Hulo N, Gattiker A, Falquet L, Pagni M, Bairoch A, Bucher P. (2002) PROSITE: a documented database using patterns and profiles as motif descriptors. Brief Bioinform. 3, 265–274.
Gribskov M, Lüthy R, Eisenberg D. (1990) Profile analysis. Methods Enzymol. 183, 146–159.
Lima T, Auchincloss AH, Coudert E, Keller G, Michoud K, Rivoire C, Bulliard V, de Castro E, Lachaize C, Baratin D, Phan I, Bougueleret L, Bairoch A. (2009) HAMAP: a database of completely sequenced microbial proteome sets and manually curated microbial protein families in UniProtKB/Swiss-Prot. Nucleic Acids Res. 37, D471–D478.
Attwood TK. (2002) The PRINTS database: a resource for identification of protein families. Brief Bioinform. 3(3), 252–263.
Krogh A, Brown M, Mian IS, Sjölander K, Haussler D. (1994) Hidden Markov models in computational biology. Applications to protein modeling. J Mol Biol. 235(5), 1501–1531.
Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, Bateman A. (2008) The Pfam protein families database. Nucleic Acids Res. 36, D281–D288.
Heger A, Wilton CA, Sivakumar A, Holm L. (2005) ADDA: a domain database with global coverage of the protein universe. Nucleic Acids Res. 33, D188–D191.
Letunic I, Doerks T, Bork P. (2009) SMART 6: recent updates and new developments. Nucleic Acids Res. 37, D229–D232.
Haft DH, Selengut JD, White O. (2003) The TIGRFAMs database of protein families. Nucleic Acids Res. 31(1), 371–373.
Wu CH, Nikolskaya A, Huang H, Yeh LS, Natale DA, Vinayaka CR, Hu ZZ, Mazumder R, Kumar S, Kourtesis P, Ledley RS, Suzek BE, Arminski L, Chen Y, Zhang J, Cardenas JL, Chung S, Castro-Alvear J, Dinkov G, Barker WC. (2004) PIRSF: family classification system at the Protein Information Resource. Nucleic Acids Res. 32, D112–D114.
Mi H, Lazareva-Ulitsky B, Loo R, Kejariwal A, Vandergriff J, Rabkin S, Guo N, Muruganujan A, Doremieux O, Campbell MJ, Kitano H, Thomas PD. (2005) The PANTHER database of protein families, subfamilies, functions and pathways. Nucleic Acids Res. 33, D284–D288.
Wilson D, Pethica R, Zhou Y, Talbot C, Vogel C, Madera M, Chothia C, Gough J. (2009) SUPERFAMILY – sophisticated comparative genomics, data mining, visualization and phylogeny. Nucleic Acids Res. 37, D380–D386.
Yeats C, Lees J, Reid A, Kellam P, Martin N, Liu X, Orengo C. (2008) Gene3D: comprehensive structural and functional annotation of genomes. Nucleic Acids Res. 36, D414–D418.
Quevillon E, Silventoinen V, Pillai S, Harte N, Mulder N, Apweiler R, Lopez R. (2005) InterProScan: protein domains identifier. Nucleic Acids Res. 33, W116–W120.
Haider S, Ballester B, Smedley D, Zhang J, Rice P, Kasprzyk A. (2009) BioMart Central Portal – unified access to biological data. Nucleic Acids Res. 37, W23–W27.
Jones P, Côté RG, Cho SY, Klie S, Martens L, Quinn AF, Thorneycroft D, Hermjakob H. (2008) PRIDE: new developments and new datasets. Nucleic Acids Res. 36, D878–D883.
Joshi-Tope G, Gillespie M, Vastrik I, D’Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, Lewis S, Birney E, Stein L. (2005) Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. 33, D428–D432.
Reference Genome Group of the Gene Ontology Consortium. (2009) The Gene Ontology’s Reference Genome Project: a unified framework for functional annotation across species. PLoS Comput Biol. 5(7), e1000431.
Kerrien S, Alam-Faruque Y, Aranda B, Bancarz I, Bridge A, Derow C, Dimmer E, Feuermann M, Friedrichsen A, Huntley R, Kohler C, Khadake J, Leroy C, Liban A, Lieftink C, Montecchi-Palazzi L, Orchard S, Risse J, Robbe K, Roechert B, Thorneycroft D, Zhang Y, Apweiler R, Hermjakob H. (2007) IntAct – open source resource for molecular interaction data. Nucleic Acids Res. 35, D561–D565.
Fleischmann A, Darsow M, Degtyarenko K, Fleischmann W, Boyce S, Axelsen KB, Bairoch A, Schomburg D, Tipton KF, Apweiler R. (2004) IntEnz, the integrated relational enzyme database. Nucleic Acids Res. 32, D434–D437.
Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res. 37, D233–D238.
Harmar AJ, Hills RA, Rosser EM, Jones M, Buneman OP, Dunbar DR, Greenhill SD, Hale VA, Sharman JL, Bonner TI, Catterall WA, Davenport AP, Delagrange P, Dollery CT, Foord SM, Gutman GA, Laudet V, Neubig RR, Ohlstein EH, Olsen RW, Peters J, Pin JP, Ruffolo RR, Searls DB, Wright MW, Spedding M. (2009) IUPHAR-DB: the IUPHAR database of G protein-coupled receptors and ion channels. Nucleic Acids Res. 37, D680–D685.
Degtyarenko K, Contrino S. (2004) COMe: the ontology of bioinorganic proteins. BMC Struct Biol. 4, 3.
Rawlings ND, Morton FR, Kok CY, Kong J, Barrett AJ. (2008) MEROPS: the peptidase database. Nucleic Acids Res. 36, D320–D325.
Whelan S, de Bakker PI, Quevillon E, Rodriguez N, Goldman N. (2006) PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees. Nucleic Acids Res. 34, D327–D331.
Golovin A, Henrick K. (2008) MSDmotif: exploring protein sites and motifs. BMC Bioinformatics. 9, 312.
Petryszak R, Kretschmann E, Wieser D, Apweiler R. (2005) The predictive power of the CluSTr database. Bioinformatics. 21(18), 3604–3609.
Haft DH, Selengut JD, Brinkac LM, Zafar N, White O. (2005) Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics. 21(3), 293–306.
Jimenez RC, Quinn AF, Garcia A, Labarga A, O’Neill K, Martinez F, Salazar GA, Hermjakob H. (2008) Dasty2, an Ajax protein DAS client. Bioinformatics. 24(18), 2119–2121.
Prlić A, Down TA, Hubbard TJ. (2005) Adding some SPICE to DAS. Bioinformatics. 21(Suppl 2), ii40–ii41.
Hartshorn MJ. (2002) AstexViewer: a visualisation aid for structure-based drug design. J Comput Aided Mol Des. 16(12), 871–881.

Author information

Authors and Affiliations

EMBL Outstation, European Bioinformatics Institute (EBI), Cambridge, UK
Jennifer McDowall & Sarah Hunter

Editor information

Editors and Affiliations

Delaware Biotechnology Institute, Dept. Computer & Information Sciences, University of Delaware, Innovation Way 15, Newark, 19711, Delaware, USA
Cathy H. Wu & Chuming Chen

About this protocol

Cite this protocol

McDowall, J., Hunter, S. (2011). InterPro Protein Classification. In: Wu, C., Chen, C. (eds) Bioinformatics for Comparative Proteomics. Methods in Molecular Biology, vol 694. Humana Press. https://doi.org/10.1007/978-1-60761-977-2_3

DOI: https://doi.org/10.1007/978-1-60761-977-2_3
Published: 01 November 2010
Publisher Name: Humana Press
Print ISBN: 978-1-60761-976-5
Online ISBN: 978-1-60761-977-2
eBook Packages: Springer Protocols
