Inferring Function from Homology

Giles, Tom C.; Emes, Richard D.

doi:10.1007/978-1-4939-6613-4_2

Part of the book series: Methods in Molecular Biology (MIMB, volume 1526)

Abstract

Recent technological advances in sequencing and high-throughput DNA cloning have resulted in the generation of vast quantities of biological sequence data. Ideally, the functions of individual genes and proteins predicted by these methods should be assessed experimentally within the context of a defined hypothesis. However, if no hypothesis is known a priori, or if the number of sequences to be assessed is large, bioinformatics techniques may be useful in predicting function.

This chapter proposes a pipeline of freely available Web-based tools to analyze protein-coding DNA and peptide sequences of unknown function. Accumulated information obtained during each step of the pipeline is used to build a testable hypothesis of function.

The following methods are described in detail:

1. Annotation of gene function through protein domain detection (SMART and Pfam).
2. Sequence similarity methods for homolog detection (BLAST and DELTA-BLAST).
3. Comparing sequences to whole genome data.

Author information

Authors and Affiliations

School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington Campus, Leicestershire, LE12 5RD, UK
Tom C. Giles & Richard D. Emes
Advanced Data Analysis Centre, University of Nottingham, Leicestershire, LE12 5RD, UK
Tom C. Giles & Richard D. Emes

Corresponding author

Correspondence to Richard D. Emes.

Editor information

Editors and Affiliations

Monash University, Melbourne, Victoria, Australia
Jonathan M. Keith

About this protocol

Cite this protocol

Giles, T.C., Emes, R.D. (2017). Inferring Function from Homology. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1526. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6613-4_2

DOI: https://doi.org/10.1007/978-1-4939-6613-4_2
Published: 29 November 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6611-0
Online ISBN: 978-1-4939-6613-4
eBook Packages: Springer Protocols
