Abstract
Previous molecular phylogeny algorithms rely mainly on multiple sequence alignments of carefully selected characteristic sequences and are therefore not directly applicable to whole-genome phylogeny, where events such as rearrangements make full-length alignments impossible. We introduce here the concept of the Complete Information Set (CIS) and its implementation as a measure of evolutionary distance without reference to sequence size. As a proof of concept, the 16S rRNA sequences of 22 completely sequenced Bacteria and Archaea species are used to reconstruct a phylogenetic tree, which is generally consistent with the commonly accepted one. Extending the analysis to whole genomes yields a highly robust whole-genome phylogenetic tree, which supports separate monophyletic clusters of species with similar phenotypes, as well as the early evolution of thermophilic Bacteria and the late divergence of Eukarya. The purpose of this work is not to contradict or confirm previous phylogeny standards but rather to bring a new algorithm and tool to the phylogeny research community. The software for estimating sequence distance and the materials used in this study are available upon request from the corresponding author.
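The abstract does not define the CIS measure itself, so the following is only a minimal sketch of the general idea of an alignment-free distance that does not depend on sequence length. It assumes k-mer frequency profiles compared with the Jensen-Shannon distance, which is a stand-in chosen for illustration and not the authors' measure. The names `kmer_profile`, `js_distance`, and `distance_matrix`, and the default `k=3`, are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2, sqrt

ALPHABET = "ACGT"  # assumption: DNA alphabet; windows with other symbols are ignored


def kmer_profile(seq, k):
    """Return relative frequencies of all 4**k k-mers over ACGT in seq."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts[m] for m in kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    # Normalising by the total makes the profile independent of sequence length.
    return [counts[m] / total for m in kmers]


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base 2), in [0, 1]."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))


def distance_matrix(seqs, k=3):
    """Pairwise stand-in distances between sequences of arbitrary lengths."""
    profiles = [kmer_profile(s, k) for s in seqs]
    n = len(profiles)
    return [[js_distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]
```

A symmetric matrix of this kind could then be passed to a distance-based tree-building method such as neighbor joining. Because each profile is normalised, two sequences with the same composition but different lengths are assigned a distance of zero at `k=1`.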
About this article
Cite this article
Li, W., Fang, W., Ling, L. et al. Phylogeny Based on Whole Genome as inferred from Complete Information Set Analysis. Journal of Biological Physics 28, 439–447 (2002). https://doi.org/10.1023/A:1020316706928
DOI: https://doi.org/10.1023/A:1020316706928