Recommendations for Further Reading

Bickel, David

doi:10.1007/978-3-031-11958-3_7

Part of the book series: SpringerBriefs in Systems Biology (BRIEFSBIOSYS)

Abstract

The last chapter is a guide to further reading on molecular phylogenetics, uncertainty quantification, and other topics encountered in the book.

…a lot of people prefer Bayesian support values to other measures of support, such as the bootstrap …because they have a likeable tendency to give you higher numbers, making you feel happier about your tree.

– Lindell Bromham¹

¹ An Introduction to Molecular Evolution and Phylogenetics (Oxford University Press) [26, p. 431]. © Lindell Bromham 2016. Reproduced with permission of the Licensor through PLSclear.

This chapter is a guide for readers who want to delve deeper into some of the topics of previous chapters.

7.1 Molecular Phylogenetics Books

Section 7.1.1 describes books requiring no more mathematical preparation than the present book. Section 7.1.2 briefly comments on books for readers with more knowledge of mathematics.

7.1.1 Phylogenetics Books with Less Mathematics

7.1.1.1 Phylogenetic Trees Made Easy: A How-To Manual [53]

Hall [53] shows step by step how to use phylogenetics software, especially MEGA 7. Nearly all the MEGA 7 instructions apply with little modification to MEGA X, the version described in Kumar et al. [74] and Stecher et al. [108] and mentioned above in Sects. 3.3, 5.4, and 5.6.

In addition to serving as a software tutorial for the topics introduced above in Chaps. 3–5, Hall [53] exposes the readers to specialized topics such as evolutionary networks and the detection of selection pressures. Like the present book (see the Preface), Hall [53] keeps the mathematics simple, including equations when needed for understanding.

7.1.1.2 An Introduction to Molecular Evolution and Phylogenetics [26]

Bromham [26] places molecular phylogenetics in the context of background topics such as mutation, replication, genomics, genetics, and mechanisms of evolution. Bromham [26] makes heavy use of graphical explanations, much as does Chap. 1, above. Due to its warnings about uncertainty in the output of phylogenetics software, Bromham [26] is cited above in Chaps. 3–4 to motivate their corrections of unquantified uncertainty.

To appeal to a wide audience of biologists, Bromham [26] avoids formulas as a matter of principle. For example, Bromham [26, p. 430] translated Bayes’s theorem into English sentences much as Laplace had translated probability formulas into French sentences [78]. The principle is not followed slavishly: Bromham [26, p. 418] resorts to an equation in the discussion of evolution rates.

7.1.2 Phylogenetics Books with More Mathematics

Yang [133] gives details of many statistical methods of analyzing sequence data for molecular phylogenetics. The equations and notation are complex enough to describe the methods without recourse to the idealized simplifications seen in Chaps. 3 and 5 above. The author nonetheless made special efforts to make the book accessible to biologists [133, Preface]. Many uncertainties involved in statistical inferences about molecular evolution are thoroughly discussed in Yang [133, chapters 10–11]. The emphasis is on maximum likelihood estimation and Bayesian inference.

Drummond and Bouckaert [42] specifically focus on Bayesian inference about molecular evolution. The authors are the leading developers of BEAST 2 [24], which is currently the most popular software suite dedicated to Bayesian phylogenetics. The work Drummond and Bouckaert [42] is organized into three parts:

(1) The “Theory” part, like much of Yang [133], will appeal to the readers comfortable with calculus, linear algebra, and mathematical notation.
(2) The “Practice” part assists biologists with the use of BEAST 2 without requiring knowledge of the more technical parts.
(3) The “Programming” part, going under the hood of BEAST 2, will interest the readers with coding skills.

An earlier guide to BEAST is chapter 18 in Salemi et al. [104], a book written by experts in specific areas of molecular phylogenetics, including the preliminary steps of finding and aligning sequences. A separate chapter of the same book describes MrBayes, another software tool for Bayesian inference.

The work Nei and Kumar [92] is written by the creators of the MEGA software mentioned in Sect. 7.1.1.1. Clearly explaining many statistical tests and bootstrap methods, it remains widely cited. Another classic text is Felsenstein [46].

Chapters 13 and 14 of Ewens and Grant [43] describe statistical methods of extracting information on molecular evolution from biological sequence data. Those chapters complement Nei and Kumar [92] in large part by providing a more concise treatment of the topics. Previous chapters of Ewens and Grant [43] give an overview of other statistical methods of analyzing DNA and protein sequences, with an emphasis on the statistics behind BLAST theory. Those methods are used to select and align sequences before the tree estimation methods can be applied. For practical guidance in that use of BLAST, see Hall [53], the book recommended in Sect. 7.1.1.1.

Xia [132] explains much of the mathematics involved in methods of reconstructing phylogenetic trees from sequence data, with an emphasis on distance-based methods and maximum likelihood estimation. The work Xia [132, chapter 2] is cited above in Sect. 3.4.1 on alignment as a source of uncertainty.

7.2 Bioinformatics and Genomics Books

The introductory book by Lesk [82] sets molecular phylogenetics in the context of other methods of computational biology. It is cited in Sect. 3.1.2.1 on distance-based estimation. Lesk [82] displays and discusses the three sequences behind Fig. 3.7.

Abu-Jamous et al. [1] describe many methods of cluster analysis in the context of bioinformatics problems. Distance-based tree estimation, while mathematically a form of hierarchical cluster analysis, has an evolutionary interpretation when homology is accepted as a working hypothesis (Sect. 1.2.1).
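
The link between distance-based tree estimation and hierarchical cluster analysis can be made concrete with a minimal sketch. It assumes average-linkage clustering (UPGMA), which is only one of several distance-based methods and is not necessarily the one used in the cited books. The taxon labels and distances are invented for illustration, and the function name `upgma` is hypothetical.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering; returns the tree as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between taxa a and b.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of taxa it contains
    d = dict(dist)                       # working copy of cluster distances
    while len(sizes) > 1:
        # Find the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        # The distance from the merged cluster to any other cluster is the
        # size-weighted average of the distances from its two parts.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Made-up pairwise distances among four hypothetical taxa.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("B", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "D")): 10.0,
    frozenset(("C", "D")): 10.0,
}
tree = upgma(names, dist)
print(tree)
```

With these made-up distances, A and B are joined first, then C, then D. Read as a cluster analysis, the result is only a nested grouping; reading it as a tree of descent requires the working hypothesis of homology.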

Using dice games, Bickel [12] explains statistical methods of analyzing data from genome-wide association studies and from measurements of gene expression and related proteomics and metabolomics data. The empirical Bayes tools (mentioned in Sect. 5.4) primarily apply to simultaneously testing multiple hypotheses. Multiple testing occurs not only in the types of data used in that book but also in an adjusted bootstrap proportion [102] (cf. Sect. 4.1) and also more generally, as seen in Sect. 2.5. Empirical Bayes methods are designed to guard against false positives like those leading to the replication crisis in many fields of science [see 8]. Motivated by that problem, chapter 7 of Bickel [12] explains how such methods scale down to testing a single hypothesis. The book uses a result obtained by applying confidence theory to propagate the uncertainty in estimating prior distributions [17].

7.3 Imprecise Probability Books

You may recall that Sects. 4.2.3, 4.3, and 5.5.2 explain how to correct a confidence level or posterior probability by multiplying it by the estimated probability that the prior distributions and other model assumptions are adequate. That correction factor, the proportion of uncertainty quantified by the models, is one minus the proportion of unquantified uncertainty [19].

Strict Bayesians would raise an objection: if all the probabilities are multiplied by that factor, then the total probability is only that factor times 100% instead of 100%. True, but “it’s not a bug, it’s a feature” [31], for such estimates honestly reflect the extent of unquantified uncertainty. (A less conservative method [15] is summarized above in Exercise 6b of Sect. 4.4.)

While the corrected probability function cannot be a probability distribution, it qualifies mathematically as a lower probability function in the theory of imprecise probability. The value of such a function may be interpreted as the sufficiency of the evidence (Appendix A) or as a lower bound on standard probability. Under the latter interpretation, the correction involves a “discounting coefficient” in the linear-vacuous [122, §2.9.2] or ε-contamination model of uncertainty described by Augustin et al. [6, §4.7]. Discounting coefficients that do not generate lower probabilities have also been considered [13, 14].
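
The correction and its reading as a lower probability can be illustrated with a short sketch. The posterior probabilities and the proportion of unquantified uncertainty are invented for illustration, and the names `u`, `lower_probability`, and `upper_probability` are hypothetical. The upper bound follows the linear-vacuous model as it is usually defined for an event that is neither impossible nor certain; it is an assumption added here rather than something stated in this chapter.

```python
def lower_probability(p, u):
    """Corrected probability: the model-based probability p scaled by 1 - u."""
    return (1.0 - u) * p


def upper_probability(p, u):
    """Upper bound under the linear-vacuous model: in the extreme case, all of
    the unquantified proportion u is added to the hypothesis."""
    return (1.0 - u) * p + u


# Hypothetical posterior probabilities of three competing clades (sum to 1).
posterior = {"clade_1": 0.70, "clade_2": 0.25, "clade_3": 0.05}
u = 0.2  # assumed proportion of unquantified uncertainty

corrected = {name: lower_probability(p, u) for name, p in posterior.items()}
total = sum(corrected.values())  # 1 - u rather than 1

for name, p in posterior.items():
    low = lower_probability(p, u)
    high = upper_probability(p, u)
    print(f"{name}: between {low:.2f} and {high:.2f}")
print(f"total corrected probability: {total:.2f}")
```

The corrected values add up to the correction factor rather than to 1, which is the objection raised above. Under the lower-bound interpretation the shortfall is not lost: it is the width of the interval between each lower and upper bound.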

With that in mind, these books on imprecise probability theory are recommended to statisticians and scientists not averse to theorems:

- The work Augustin et al. [6], a unified collection of chapters by various experts, is mentioned out of chronological order since it provides an accessible entry to the topic.
- Walley [122] launched the field, providing mathematical and intuitive arguments for representing uncertainty with imprecise probability.
  - Walley [123] presents later developments by the same author.
- The work Troffaes and de Cooman [118] includes many of the theorems in Walley [122] as well as more recent results.
- 2021 saw the publication of two books with unique perspectives on imprecise probability:
  - Cuzzolin [37] not only presents two decades of research on a geometric approach but also reviews all major flavors of imprecise probability, citing over 2000 works. Discounting is discussed briefly [37, §4.3.6].
  - Weirich [126] puts emphasis on utility functions.

7.4 Power Law Books

Power laws form the core of the big-picture models of self-similar fluctuations seen in Appendices B-C. Such models have been used to describe fluctuating rates of molecular evolution (Sect. 2.8). The recommended starting point is Taleb [111], which offers a lively introduction to power laws.

Lowen and Teich [84] use fractal stochastic processes to define point processes in order to model count data such as the substitutions plotted in Fig. 1.1. A special case of such point processes is the class of intermittent point processes of Appendix B, below. Appendix C provides examples of other cases (Sect. C.2).

West et al. [129] introduce much of the mathematical modeling sketched in Sect. C.1.

References

[1] Abu-Jamous, B., R. Fa, and A.K. Nandi. 2015. Integrative Cluster Analysis in Bioinformatics. West Sussex: John Wiley & Sons.
[6] Augustin, T., F. Coolen, G. de Cooman, and M. Troffaes, eds. 2014. Introduction to Imprecise Probabilities. Wiley Series in Probability and Statistics. Hoboken: Wiley.
[8] Bausell, R.B. 2021. The Problem with Science: The Reproducibility Crisis and What to Do About It. Oxford: Oxford University Press.
[12] Bickel, D.R. 2019. Genomics Data Analysis: False Discovery Rates and Empirical Bayes Methods. New York: Chapman and Hall/CRC. https://davidbickel.com/genomics/.
[13] Bickel, D.R. 2020. Departing from Bayesian inference toward minimaxity to the extent that the posterior distribution is unreliable. Statistics & Probability Letters 164: 108802.
[14] Bickel, D.R. 2021a. Moderating probability distributions for unrepresented uncertainty: Application to sentiment analysis via deep learning. Communications in Statistics - Theory and Methods. https://doi.org/10.1080/03610926.2020.1863988.
[15] Bickel, D.R. 2021b. Propagating uncertainty about molecular evolution models and prior distributions to phylogenetic trees. Working paper. https://doi.org/10.5281/zenodo.5810696.
[17] Bickel, D.R. 2022a. Confidence distributions and empirical Bayes posterior distributions unified as distributions of evidential support. Communications in Statistics - Theory and Methods 51: 3142–3163.
[19] Bickel, D.R. 2022c. Propagating clade and model uncertainty to confidence intervals of divergence times and branch lengths. Molecular Phylogenetics and Evolution 167: 107357.
[24] Bouckaert, R., J. Heled, D. Kühnert, T. Vaughan, C.H. Wu, D. Xie, M.A. Suchard, A. Rambaut, and A.J. Drummond. 2014. BEAST 2: A software platform for Bayesian evolutionary analysis. PLoS Computational Biology 10.
[26] Bromham, L. 2016. An Introduction to Molecular Evolution and Phylogenetics. Oxford: Oxford University Press.
[31] Carr, N. 2018. It’s not a bug, it’s a feature. Wired 26: 26.
[37] Cuzzolin, F. 2021. The Geometry of Uncertainty: The Geometry of Imprecise Probabilities. Artificial Intelligence: Foundations, Theory, and Algorithms. Cham: Springer.
[42] Drummond, A.J., and R.R. Bouckaert. 2015. Bayesian Evolutionary Analysis with BEAST. Cambridge: Cambridge University Press.
[43] Ewens, W.J., and G.R. Grant. 2001. Statistical Methods in Bioinformatics: An Introduction. Statistics for Biology and Health. Berlin: Springer.
[46] Felsenstein, J. 2004. Inferring Phylogenies. New York: Sinauer Associates.
[53] Hall, B. 2018a. Phylogenetic Trees Made Easy: A How-To Manual. New York: Sinauer Associates.
[74] Kumar, S., G. Stecher, M. Li, C. Knyaz, and K. Tamura. 2018. MEGA X: Molecular evolutionary genetics analysis across computing platforms. Molecular Biology and Evolution 35: 1547.
[78] de Laplace, M. 2009. Essai philosophique sur les probabilités. Cambridge: Cambridge University Press.
[82] Lesk, A. 2019. Introduction to Bioinformatics. Oxford: Oxford University Press.
[84] Lowen, S., and M. Teich. 2005. Fractal-Based Point Processes. New York: John Wiley & Sons Ltd.
[92] Nei, M., and S. Kumar. 2000. Molecular Evolution and Phylogenetics. Oxford: Oxford University Press.
[102] Kliman, R.M. 2016. Phylogenetic tree comparison. In Encyclopedia of Evolutionary Biology, 277–284. Oxford: Academic Press.
[104] Salemi, M., A. Vandamme, and P. Lemey (eds.) 2009. The Phylogenetic Handbook: A Practical Approach to Phylogenetic Analysis and Hypothesis Testing. Cambridge: Cambridge University Press.
[108] Stecher, G., K. Tamura, and S. Kumar. 2020. Molecular evolutionary genetics analysis (MEGA) for macOS. Molecular Biology and Evolution 37: 1237–1239.
[111] Taleb, N. 2020. Statistical Consequences of Fat Tails: Real World Preasymptotics, Epistemology, and Applications. Austin: Scribe Media.
[118] Troffaes, M., and G. de Cooman. 2014. Lower Previsions. Wiley Series in Probability and Statistics. New York: Wiley.
[122] Walley, P. 1991. Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.
[123] Walley, P. 2015. BI Statistical Methods, Vol. 1: Foundations. Dublin: Prescience Press.
[126] Weirich, P. 2021. Rational Choice Using Imprecise Probabilities and Utilities. Cambridge: Cambridge University Press.
[129] West, B.J., M. Bologna, and P. Grigolini. 2003. Physics of Fractal Operators. New York: Springer.
[132] Xia, X. 2020. A Mathematical Primer of Molecular Phylogenetics. New York: Chapman and Hall/CRC.
[133] Yang, Z. 2014. Molecular Evolution: A Statistical Approach. Oxford: Oxford University Press.

Author information

Authors and Affiliations

Informatics and Analytics, University of North Carolina at Greensboro, Greensboro, NC, USA
David Bickel

7.1 Electronic Supplementary Materials

Supplemental 1

ancestor uncertainty (xlsx 1021 kb)

Cite this chapter

Bickel, D. (2022). Recommendations for Further Reading. In: Phylogenetic Trees and Molecular Evolution. SpringerBriefs in Systems Biology. Springer, Cham. https://doi.org/10.1007/978-3-031-11958-3_7

DOI: https://doi.org/10.1007/978-3-031-11958-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-11957-6
Online ISBN: 978-3-031-11958-3
eBook Packages: Biomedical and Life Sciences (R0)
