Abstract
This chapter describes how the data in the CATH-Gene3D online resource are generated and how the resource can be used to study protein domains and their evolutionary relationships. Methods are presented for comparing protein structures, recognizing homologs, predicting domain structures within protein sequences, and subclassifying superfamilies into functionally pure families, together with a guide to using the webpages.
Acknowledgments
N.L.D. acknowledges funding from the Wellcome Trust (Award number: 104960/Z/14/Z). I.S. acknowledges funding from the BBSRC (Award number: BB/K020013/1). J.G.L. acknowledges funding from the BBSRC (Award number: BB/L002817/1). S.D.L. acknowledges funding from the Malaysian Ministry of Education.
Copyright information
© 2017 Springer Science+Business Media LLC
About this protocol
Cite this protocol
Dawson, N.L., Sillitoe, I., Lees, J.G., Lam, S.D., Orengo, C.A. (2017). CATH-Gene3D: Generation of the Resource and Its Use in Obtaining Structural and Functional Annotations for Protein Sequences. In: Wu, C., Arighi, C., Ross, K. (eds) Protein Bioinformatics. Methods in Molecular Biology, vol 1558. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6783-4_4
DOI: https://doi.org/10.1007/978-1-4939-6783-4_4
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6781-0
Online ISBN: 978-1-4939-6783-4
eBook Packages: Springer Protocols