Simple diagnostic statistical tests of models for DNA substitution

Goldman, Nick

doi:10.1007/BF00182751

Published: December 1993

Volume 37, pages 650–661 (1993)

Journal of Molecular Evolution

Abstract

The accuracy of models for DNA substitution used in phylogenetic analyses is becoming more important with the increasing availability and analysis of molecular sequence data. It is natural to look for ways of improving these models; to do this in a planned manner, it is useful to be able to identify features of sequences that the models may not describe adequately. In this paper, I describe three statistics that may give useful diagnostic information about departures from the models' predictions. The distributions of these statistics are discussed, and simple significance tests are derived. These tests are based on the (estimated) phylogeny of the sequences and so have the advantage of using the information contained in this tree. The new tests are illustrated by applying them to Markov chain models describing the evolution of primate pseudogene sequences and of small-subunit RNA sequences.

Abbreviations

b(N, p):: binomial distribution of N trials, each with probability p of success
m(N, p₁, p₂, ..., pᵣ):: multinomial distribution of N trials, with r possible outcomes having probabilities p₁, p₂, ..., pᵣ, respectively
N(μ, σ²):: Normal distribution with mean μ and variance σ²
p(λ):: Poisson distribution with mean λ
bp:: base pairs
cdf:: cumulative distribution function
i.i.d.:: independent and identically distributed

Author information

Authors and Affiliations

Laboratory of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, England
Nick Goldman

About this article

Cite this article

Goldman, N. Simple diagnostic statistical tests of models for DNA substitution. J Mol Evol 37, 650–661 (1993). https://doi.org/10.1007/BF00182751

Received: 01 August 1992
Revised: 27 May 1993
Accepted: 19 June 1993
Issue Date: December 1993
DOI: https://doi.org/10.1007/BF00182751