Abstract
Let R_n denote the set of rooted trees with n leaves in which: the leaves are labeled by the integers in {1, ..., n}; and among interior vertices only the root may have degree two. Associated with each interior vertex v in such a tree is the subset, or cluster, of leaf labels in the subtree rooted at v. Cluster {1, ..., n} is called trivial. Clusters are used in quantitative measures of similarity, dissimilarity and consensus among trees. For any k trees in R_n, the strict consensus tree C(T_1, ..., T_k) is that tree in R_n containing exactly those clusters common to every one of the k trees. Similarity between trees T_1 and T_2 in R_n is measured by the number S(T_1, T_2) of nontrivial clusters in both T_1 and T_2; dissimilarity, by the number D(T_1, T_2) of clusters in T_1 or T_2 but not in both. Algorithms are known to compute C(T_1, ..., T_k) in O(kn^2) time, and S(T_1, T_2) and D(T_1, T_2) in O(n^2) time. I propose a special representation of the clusters of any tree T ∈ R_n, one that permits testing in constant time whether a given cluster exists in T. I describe algorithms that exploit this representation to compute C(T_1, ..., T_k) in O(kn) time, and S(T_1, T_2) and D(T_1, T_2) in O(n) time. These algorithms are optimal in a technical sense. They enable well-known indices of consensus between two trees to be computed in O(n) time. All these results apply as well to comparable problems involving unrooted trees with labeled leaves.
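The abstract does not spell out the cluster representation, so the following Python sketch is only one plausible reading of it, not the paper's own procedure. It assumes that leaves are renumbered by their left-to-right order in T_1, so that every cluster of T_1 becomes a contiguous interval (lo, hi) of ranks. A cluster of T_2 then matches a cluster of T_1 exactly when its ranks fill an interval that is stored in the table. The nested-tuple tree encoding, the function names, and the use of a hash set (expected rather than worst-case constant-time lookup) are all assumptions made for illustration.

```python
def cluster_intervals(tree):
    """Rank leaves left to right; record one (lo, hi) interval per interior vertex."""
    rank, intervals = {}, set()

    def visit(node):
        if isinstance(node, int):            # a leaf, identified by its integer label
            rank[node] = len(rank)
            return rank[node], rank[node]
        spans = [visit(child) for child in node]
        lo, hi = spans[0][0], spans[-1][1]   # children cover consecutive ranks
        intervals.add((lo, hi))
        return lo, hi

    visit(tree)
    return rank, intervals


def compare(t1, t2):
    """Return (S, D) for two trees given as nested tuples over the same leaf labels."""
    rank, table = cluster_intervals(t1)
    shared = 0       # clusters found in both trees, trivial cluster included
    interior2 = 0    # number of clusters (interior vertices) of t2

    def visit(node):
        nonlocal shared, interior2
        if isinstance(node, int):
            return rank[node], rank[node], 1
        spans = [visit(child) for child in node]
        lo = min(s[0] for s in spans)
        hi = max(s[1] for s in spans)
        size = sum(s[2] for s in spans)
        interior2 += 1
        # The leaves fill the interval lo..hi exactly, and t1 has that interval.
        if hi - lo + 1 == size and (lo, hi) in table:
            shared += 1
        return lo, hi, size

    visit(t2)
    similarity = shared - 1                            # drop the trivial cluster
    dissimilarity = len(table) + interior2 - 2 * shared
    return similarity, dissimilarity


T1 = ((1, 2), (3, (4, 5)))
T2 = ((1, 2), ((3, 4), 5))
print(compare(T1, T2))   # (2, 2)
```

Here T_1 and T_2 share the nontrivial clusters {1, 2} and {3, 4, 5}, giving S = 2, while {4, 5} and {3, 4} each occur in only one tree, giving D = 2. Each tree is traversed once, so the work is linear in n under the hash-set assumption. The recursive traversal is chosen for brevity and would need an explicit stack for very deep trees.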
Additional information
The Natural Sciences and Engineering Research Council of Canada partially supported this work with grant A-4142.
Cite this article
Day, W.H.E. Optimal algorithms for comparing trees with labeled leaves. Journal of Classification 2, 7–28 (1985). https://doi.org/10.1007/BF01908061
DOI: https://doi.org/10.1007/BF01908061