Abstract
Molecular sequences in a phylogenetic analysis can differ in composition, which shows that the process of evolution can change over time. However, models of evolution in common use are homogeneous over the tree, and when used in a phylogenetic analysis of compositionally tree-heterogeneous datasets these models can recover incorrect trees. The Node-Discrete Compositional Heterogeneity (NDCH) model can model such data by accommodating differences in composition over the tree. Usage, problems, and limitations of this model are discussed, and a modification, the NDCH2 model, is described that can ameliorate some of these problems and limitations. Using these models can greatly increase the fit of the model to the data and can find better tree topologies. These models and various statistical tests are illustrated using a bacterial SSU rRNA dataset. These models are implemented in the software P4, and files for the analyses described here are made available.
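To make "differing in composition" concrete, the following is a minimal Python sketch, not the P4 implementation and not a method taken from the chapter. It computes per-sequence nucleotide proportions and a Pearson chi-square statistic on the taxon-by-nucleotide count table. The function names and the toy sequences are hypothetical and chosen only for illustration. Because sequences related by a tree are not independent, the statistic is best read as a rough descriptive indicator rather than a formal test.

```python
from collections import Counter


def composition(seq, alphabet="ACGT"):
    """Proportion of each nucleotide in seq, ignoring gaps and ambiguity codes."""
    counts = Counter(c for c in seq.upper() if c in alphabet)
    total = sum(counts.values())
    return {c: counts[c] / total for c in alphabet}


def chi_square_homogeneity(seqs, alphabet="ACGT"):
    """Pearson chi-square statistic for a taxon-by-nucleotide table of counts."""
    table = [[s.upper().count(c) for c in alphabet] for s in seqs.values()]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            if expected > 0:  # skip nucleotides absent from every sequence
                stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy alignment: two GC-rich sequences and one AT-rich sequence.
toy = {
    "taxon_A": "GGCCGCGGCCGA",
    "taxon_B": "GCGGCCGCGCTA",
    "taxon_C": "ATTATAATTAGC",
}

for name, seq in toy.items():
    print(name, composition(seq))
print("chi-square statistic:", chi_square_homogeneity(toy))
```

A statistic of zero means every sequence has the same nucleotide proportions, and larger values indicate stronger compositional differences among taxa, which is the kind of data a tree-homogeneous model cannot describe.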
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Foster, P.G. (2022). Phylogenetic Analysis That Models Compositional Heterogeneity over the Tree. In: Luo, H. (eds) Environmental Microbial Evolution. Methods in Molecular Biology, vol 2569. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2691-7_6
DOI: https://doi.org/10.1007/978-1-0716-2691-7_6
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2690-0
Online ISBN: 978-1-0716-2691-7
eBook Packages: Springer Protocols