Abstract
Studies of molecular evolution can reveal the relationships among sets of homologous sequences and the patterns of change that occur during their evolution. An important aspect of these studies is the inference of a phylogenetic tree, which explicitly describes the evolutionary relationships between homologous sequences. This chapter introduces evolutionary trees and shows how to infer them from sequence data using commonly used inferential methods. It focuses on statistical methods for inferring trees and on how to assess the confidence one should have in any resulting tree, with particular emphasis on the underlying assumptions of the methods and how they might affect the tree estimate. It also discusses the algorithms used to perform tree search and offers recommendations regarding the performance of different algorithms. Finally, it gives a few practical guidelines, including how to combine multiple software packages to improve inference, and a comparison between Bayesian and maximum likelihood phylogenetics.
Acknowledgments
SW is funded by Uppsala University. DAM is funded by Akademikernas A-kassa and Trygghetsstiftelsen.
Copyright information
© 2017 Springer Science+Business Media New York
About this protocol
Cite this protocol
Whelan, S., Morrison, D.A. (2017). Inferring Trees. In: Keith, J. (eds) Bioinformatics. Methods in Molecular Biology, vol 1525. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6622-6_14
DOI: https://doi.org/10.1007/978-1-4939-6622-6_14
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-6620-2
Online ISBN: 978-1-4939-6622-6
eBook Packages: Springer Protocols