Abstract
At the nexus of supercomputing and the biotech revolution, an era of predictive biology through systems biology may be at hand. Modern omics capabilities enable examination of the state of a biological system in exquisite detail. The genome, transcriptome, proteome, and metabolome may all be largely knowable, at least for some model systems, providing a basis for modeling and simulating the molecular mechanisms, or pathways, that could capture a biological system's emergent properties. However, significant challenges still impede the realization of this vision, perhaps the most significant being the missing functional annotation of genes and gene products. Even for the most well-studied organisms, as many as a third of the genes called in a given genome are not annotated, and the annotations of more than half may be tenuous. Homology inferred from sequence similarity is the basis for much of genome annotation; homology inferred from structural similarity could be a powerful complement to sequence-based annotation methods. Structural biology and structural informatics can be used to assign molecular function, and they may have increasing utility with the rapid growth of gene sequence databases and with emerging methods for structure determination, such as structure prediction based on coevolution. Here we describe tools, and provide example cases, that use structural similarity at the level of quaternary structure, domain content, domain topology, and small 3D motifs to infer homology and posit function. Ultimately, annotation by similarity, whether by 3D structural homology or by classical primary sequence homology, must be founded on accurate annotation of one ortholog in the group; understanding every function encoded by a genome remains a major challenge for life science.
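Structural similarity can be quantified in many ways, and dedicated structure-comparison and motif-search tools use their own, more sophisticated algorithms. As a minimal, hedged illustration only (not the method of any tool described in this chapter), the sketch below scores structural similarity with distance RMSD (dRMSD), which compares internal inter-atom distances and therefore needs no superposition, and uses it for a brute-force search for a small 3D motif such as a few-residue catalytic site. The function names, the one-coordinate-per-residue representation, the 0.5 Å default cutoff, and the toy coordinates are all assumptions made for illustration.

```python
import itertools
import math


def pairwise_distances(coords):
    """Return all inter-point distances d_ij (i < j) for a list of (x, y, z) points."""
    return [math.dist(a, b) for a, b in itertools.combinations(coords, 2)]


def drmsd(coords_a, coords_b):
    """Distance RMSD between two equal-length coordinate lists (in Å).

    Compares internal distances only, so the score is unchanged by rotating
    or translating either structure and no superposition step is needed.
    """
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two equal-length lists of at least two points")
    da = pairwise_distances(coords_a)
    db = pairwise_distances(coords_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


def find_motif(template, residues, cutoff=0.5):
    """Exhaustively search for residue tuples that match a small 3D motif.

    template: list of (residue_name, (x, y, z)), one representative atom per residue.
    residues: list of (residue_id, residue_name, (x, y, z)) from the query structure.
    Returns (residue_ids, score) pairs with dRMSD <= cutoff, best match first.
    """
    names = [name for name, _ in template]
    template_xyz = [xyz for _, xyz in template]
    hits = []
    for combo in itertools.permutations(residues, len(template)):
        # Require identical residue types, in template order, before scoring geometry.
        if [res_name for _, res_name, _ in combo] != names:
            continue
        score = drmsd(template_xyz, [xyz for _, _, xyz in combo])
        if score <= cutoff:
            hits.append((tuple(res_id for res_id, _, _ in combo), score))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Toy, made-up coordinates (not from any PDB entry): a three-residue
    # Ser/His/Asp template and a query in which the same geometry is shifted.
    template = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 4.0, 0.0))]
    query = [
        ("A10", "SER", (10.0, 10.0, 10.0)),
        ("A42", "HIS", (13.0, 10.0, 10.0)),
        ("A77", "ASP", (13.0, 14.0, 10.0)),
        ("A90", "HIS", (20.0, 20.0, 20.0)),  # decoy: right residue type, wrong place
    ]
    print(find_motif(template, query))  # [(('A10', 'A42', 'A77'), 0.0)]
```

Because dRMSD uses only distances, it cannot distinguish a motif from its mirror image, and the exhaustive permutation search grows roughly as n^k for n residues and a k-residue motif, so in practice it would only suit small motifs or a prefiltered set of candidate residues.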
Acknowledgments
Molecular graphics and analyses were performed with UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH P41-GM103311.
This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Copyright information
© 2022 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Segelke, B.W. (2022). Functional Annotation from Structural Homology. In: Navid, A. (eds) Microbial Systems Biology. Methods in Molecular Biology, vol 2349. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1585-0_11
DOI: https://doi.org/10.1007/978-1-0716-1585-0_11
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1584-3
Online ISBN: 978-1-0716-1585-0
eBook Packages: Springer Protocols