1 Introduction and motivation

In the late 1980s, Cavender and Felsenstein (1987) and Lake (1987) introduced the idea of phylogenetic invariants: a class of polynomials useful in the study of phylogenetic trees. In subsequent years, these polynomials have proven useful for studying analytical questions of identifiability (Allman and Rhodes 2003) and for identifying local maximum likelihood optima (Chor et al. 2000). However, beginning with the earliest simulation studies (Hillis et al. 1994), there has been doubt as to the statistical effectiveness of phylogenetic invariants for inferring phylogenetic trees from data sets.

Allman and Rhodes (2003, 2008) renewed interest in phylogenetic invariants. They took the point of view of algebraic geometry to give a comprehensive description of these polynomials and lay out several open questions [some of which have subsequently been solved (Allman et al. 2014; Bates and Oeding 2011; Friedland 2013; Friedland and Gross 2012)]. Concurrently, Sumner et al. (2008) suggested an alternative perspective on algebraic methods as applied to phylogenetics. From this perspective, group representation theory (symmetries and transformations) takes center stage, leading to the study of a different set of polynomials of special interest, the Markov invariants. In contrast to phylogenetic invariants, the definition of Markov invariants is detached from the notion of a phylogenetic tree; rather, they are the polynomial invariants for the matrix group induced by the action of Markov matrices. As such, the application of Markov invariants to the context of phylogenetics comes only after consideration of the specific tree structures underlying phylogenetic models. In this vein, Sumner and Jarvis (2009) showed how leaf permutation symmetries on a quartet tree, for example, can be used to bring Markov invariants into phylogenetics proper, effectively showing that there are phylogenetic invariants lurking within the ring of Markov invariants applicable to this case. Recently, both perspectives have been applied to inferring phylogenetic trees by Casanellas and Fernández-Sánchez (2010) and Holland et al. (2013), with further promising results given by Fernández-Sánchez and Casanellas (2015).

Most likely due to the disjointed historical development of these polynomial functions, there is some confusion, already clear in the paragraph above, regarding the use of “invariant” as applied to both phylogenetic and Markov invariants. In the literature, “phylogenetic invariant” is used to refer to any polynomial which vanishes on all distributions arising from a subset of phylogenetic tree topologies (understood as leaf-labelled trees). If the subset is proper, the phylogenetic invariant is referred to as “tree informative”. We, however, prefer to use “invariant” in the more mathematically traditional sense to mean invariant under an invertible transformation [cf. classical invariant theory (Olver 2003)]. We argue that in the phylogenetic context, the relevant transformations are adjustments of model parameters and leaf permutations of trees. To avoid confusion, we follow Draisma and Kuttler (2008) and refer to any polynomial which is useful for identifying tree topology as a phylogenetic identity. In contrast, we say a polynomial is a Markov invariant (Sumner et al. 2008) if the polynomial itself (rather than its particular value on subsets of distributions) is invariant under adjustment of model parameters on a phylogenetic tree (the precise meaning of this distinction is made clear in Sect. 2). Formally, these polynomials are invariant under a specific action of a group of invertible transformations (at least “relatively”, that is, they may attract a transformation constant). Clearly distinguishing Markov invariants from phylogenetic identities is crucial in what follows.

Given that phylogenetic identities arise solely from algebraic conditions on phylogenetic probability distributions, we argue it is also essential to consider the statistical structure of inference methods constructed using these polynomials more carefully than has previously appeared in the literature. Toward this end, we provide a comprehensive discussion, including both analytical and statistical arguments and a comparison of the algebraic geometry and representation theory perspectives, of using the phylogenetic identities for the inference of phylogenetic trees. To simplify the discussion, we focus on the most elementary case: quartet trees with a binary state space. We argue that the representation theoretic point of view and the ideas underlying Markov invariants provide significant guidance as to how to construct statistically powerful methods of phylogenetic inference.

Binary state spaces have long been of theoretical interest in the study of phylogenetic methods as the mathematical properties of two-state models are often more tractable, and yet the results are still illuminating about general phylogenetic principles. We also note that recently there has been increased interest in binary data from an applied point of view due to the widespread availability of bi-allelic single nucleotide polymorphism (SNP) datasets derived from modern genome-wide sequencing technologies (Davey et al. 2011; Lemmon and Lemmon 2013).

Our discussion is unified through two notions of symmetry that naturally arise in phylogenetics. In Sect. 2 we develop these and refer to them as “leaf symmetries” and the “Markov action”. In Sect. 3 we argue that any inference method that seeks to infer tree topology alone (as is typical of phylogenetic identity methods) should respect both of these symmetries. We show that respect for leaf permutation symmetry is something that can (and should) be imposed upon any tree inference method based on phylogenetic identities. Additionally, demanding the method respect the Markov action symmetry leads directly to the definition of Markov invariants, with our main example constructed in Sect. 4. An ideal situation arises in the quartet case: we show that imposing the leaf permutation symmetry upon the Markov invariants identifies a specific subset of phylogenetic identities, which in turn leads to a unique choice of identities to apply to quartet tree inference.

In Sect. 5 we discuss the properties of the edge identities, especially in relation to the three statistical properties given in Sect. 3. We provide a detailed examination of the behaviour of the edge identities under leaf permutation symmetries and, as for the squangles, derive semi-algebraic constraints on their behaviour under the assumption of a continuous-time Markov chain.

Along with our theoretical arguments for considering polynomials which respect these two symmetries, we also use these symmetries to develop a statistical decision rule for tree inference (via residual sums of squares). In Sect. 6, we provide simulation studies which illustrate both the practical importance of these ideas and that the naive application of phylogenetic identities (like that given by Cavender and Felsenstein (1987)) can be statistically biased and not nearly as powerful as our approach motivated by the symmetries inherent to the problem.

In Sect. 7, we conclude with a discussion of how these ideas apply directly to models with more than binary states, with specific results presented for the four-state (DNA) case. In particular, we find that it is only in the binary case that the Markov invariants (squangles) lie in the same space of polynomials as the phylogenetic identities (edge identities). Thus, in the case of models with more than two states, the attractive transformation properties of the Markov invariants become a missed opportunity if one restricts attention to edge identities (as is advocated by Casanellas and Fernández-Sánchez (2010)). This result is derived using representation-theoretic techniques [particularly group characters (Jarvis and Sumner 2014)] for which full derivations are provided in the Appendix (Online Resource 4).

2 Background

In phylogenetic inference, the topology of the evolutionary tree is difficult to determine correctly and is often the unknown parameter which is the most biologically important. It is well known that it is enough to correctly identify all the quartet trees corresponding to all subsets of four taxa in order to determine the overall phylogenetic tree. Thus correct identification of a single quartet topology remains a point of considerable mathematical interest, and is the focus of the work presented here.

Remark 1

Throughout this paper we will exclusively consider phylogenetic quartet inference methods that, given aligned sequence data on four taxa as input, solely return confidence in each of the three possible quartet tree topologies. For methods (such as maximum likelihood) that usually also return estimates of evolutionary divergence times or other model parameters, we will consider the topology to be the only output.

2.1 Taxon permutations and leaf symmetries

When discussing four general taxa we label them A, B, C, D; when we want to discuss a fixed order on the taxa we label them 1, 2, 3, 4. This gives us a natural way to talk about both the three distinct quartet trees and their equivalent representations using the common split notation. In this notation, the three distinct quartet trees are \(T_1 = 12|34\), \(T_2 = 13|24\) and \(T_3 = 14|23\), where formally \(ij|kl\equiv \{\{i,j\},\{k,l\}\}\) is a bipartition of the set \(\{1,2,3,4\}\). Each quartet has symmetries under leaf permutations which are captured by the equalities \(12|34 = 21|34 = 34|12\), etc. These different representations of the same quartet are of practical importance if one considers the application of a phylogenetic method (usually via some computer software) on the taxon set \(\{A,B,C,D\}\) with output one of the quartets \(T_1,T_2,T_3\). For instance, if the list of taxa in the ordering ABCD leads to \(T_1\), we would expect the alternative input ordering ACBD to return the quartet \(T_2\) (since B now corresponds to 3, and C to 2), and the alternative input order DCAB to also return \(T_1\) via the correspondence \(12|34 = 43|12\).

Such changes in taxon ordering can be understood as the symmetric group \(\mathfrak {S}_4\) permuting the four taxa in the natural way, thereby inducing permutations of the three possible quartet trees. For example, the taxon permutation \((13)\in \mathfrak {S}_4\) fixes \(T_2\) and interchanges \(T_1\leftrightarrow T_3\). From the perspective of phylogenetic quartet inference, we account for this redundancy by considering the subgroup of \(\mathfrak {S}_4\) that fixes a given quartet. For example, \(T_1\) is invariant under the action of the subgroup of \(\mathfrak {S}_4\) consisting of the permutations which we refer to as the stabilizer of \(T_1\):

$$\begin{aligned} \text {Stab}(T_1)=\{e,(12),(34),(12)(34),(13)(24),(14)(23),(1324),(1423)\}. \end{aligned}$$

It is an easy exercise to write down the stabilizer subgroups for \(T_2\) and \(T_3\).
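These stabilizers can be enumerated by brute force; a minimal Python sketch (the split encoding and function names are our own):

```python
from itertools import permutations

# encode each quartet as a bipartition {{i,j},{k,l}} of {1,2,3,4}
T1 = frozenset([frozenset({1, 2}), frozenset({3, 4})])
T2 = frozenset([frozenset({1, 3}), frozenset({2, 4})])
T3 = frozenset([frozenset({1, 4}), frozenset({2, 3})])

def act(sigma, split):
    # sigma is a tuple (sigma(1), ..., sigma(4)); relabel the leaves in the split
    return frozenset(frozenset(sigma[x - 1] for x in pair) for pair in split)

def stabilizer(T):
    # all permutations in S_4 fixing the split
    return [s for s in permutations((1, 2, 3, 4)) if act(s, T) == T]

for name, T in (("T1", T1), ("T2", T2), ("T3", T3)):
    print(name, sorted(stabilizer(T)))
```

Each stabilizer has order 8, and for \(T_1\) the enumeration reproduces the set displayed above.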

To understand the importance of these observations, consider the “black box” view of a phylogenetic quartet method, where the black box (in the form of a computer program) takes an ordered set of taxon sequences ABCD, and returns one of the three possible quartets \(T_1,T_2\) or \(T_3\). To say that the method “respects” the permutation symmetries explained above is to demand that the method behaves in the appropriate way given a permutation of the input sequences such as BACD corresponding to (12), or CDAB corresponding to the permutation (13)(24). We ensure that the phylogenetic methods we develop in this paper respect these quartet tree leaf permutation symmetries.

2.2 Tensors and group actions

The data we consider are frequency arrays \(F=\left( f_{ijkl}\right) \) arising from an alignment of four binary \(\{0,1\}\) sequences, where \(f_{ijkl}\) is the number of times we observe the pattern of states ijkl for sequence 1, 2, 3, 4, respectively.

We model this data by assuming F arises under multinomial (independent) sampling from a distribution \(P=(p_{ijkl})\) which itself is constructed from a binary Markov chain on a quartet tree, where \(p_{ijkl}\) is the probability of observing the binary states \(i,j,k,l\in \{0,1\}\) at the leaves 1, 2, 3, 4 of the tree, respectively. In the next section we discuss the construction of such P in detail; for the moment we wish to consider the generic structural properties of P irrespective of whether P arises as a probability distribution on a tree or not.

Considering \(P=(p_{ijkl})\) as a \(2\times 2\times 2\times 2\) array of numbers, and taking \(\{e_1=\left[ {\begin{matrix}1\\ 0\end{matrix}} \right] ,e_2=\left[ {\begin{matrix}0\\ 1\end{matrix}}\right] \}\) as a basis for \(\mathbb {C}^2\), we may treat P more formally as belonging to the \(2^4=16\) dimensional tensor product space

$$\begin{aligned} U:= \mathbb {C}^2\otimes \mathbb {C}^2\otimes \mathbb {C}^2\otimes \mathbb {C}^2 =\left\{ \sum _{i,j,k,l\in \{0,1\}}p_{ijkl}e_i\otimes e_j\otimes e_k\otimes e_l: p_{ijkl}\in \mathbb {C}\right\} . \end{aligned}$$

Of course, P has all real and non-negative components so P actually belongs to a stochastic subset of this space. However, algebraically it is convenient to work over the complex numbers in what follows. When speaking abstractly we refer to a general member of U as a tensor, and when we want to emphasize that the components in the array should be considered as probabilities, we will refer to it as a distribution.

The taxon permutations discussed in the previous section act naturally on tensors \(P\in U\) via permutation of the indices of \(p_{ijkl}\). To be concrete, if \(\sigma \in \mathfrak {S}_4\) is a permutation, then we have the action \(P\mapsto \sigma P\) defined via the coordinate transformation \(p_{i_1i_2i_3i_4}\mapsto p_{i_{\sigma (1)}i_{\sigma (2)}i_{\sigma (3)}i_{\sigma (4)}}\).
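This coordinate action can be sketched directly (NumPy assumed; the function name is our own):

```python
import numpy as np

def leaf_permute(P, sigma):
    """Return sigma.P with (sigma.P)_{i1 i2 i3 i4} = p_{i_sigma(1) i_sigma(2) i_sigma(3) i_sigma(4)};
    sigma is the tuple (sigma(1), ..., sigma(4))."""
    Q = np.empty_like(P)
    for idx in np.ndindex(*P.shape):
        Q[idx] = P[tuple(idx[sigma[m] - 1] for m in range(4))]
    return Q

P = np.arange(16, dtype=float).reshape(2, 2, 2, 2)   # a generic 2x2x2x2 array
Q = leaf_permute(P, (2, 1, 3, 4))                    # the transposition (12)
assert Q[0, 1, 0, 0] == P[1, 0, 0, 0]
assert np.array_equal(leaf_permute(Q, (2, 1, 3, 4)), P)  # (12) is an involution
```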

Another key mathematical feature of working with tensor product spaces, essential to our derivations, is the natural action of the general linear group \(\text {GL}(2)\) on each factor of the tensor product space, described as follows. Recall that \(\text {GL}(2)\) is the group of \(2\times 2\) invertible matrices with entries taken from \(\mathbb {C}\), that is

$$\begin{aligned} \text {GL}(2)= \left\{ A= \left[ \begin{array}{cc} a_{11} &{} a_{12} \\ a_{21} &{} a_{22} \end{array} \right] {:}\;a_{11},a_{12},a_{21},a_{22}\in \mathbb {C},\det (A)\ne 0 \right\} . \end{aligned}$$

Recall also that \(\text {GL}(2)\) acts on column vectors \(v=[v_1,v_2]^T\in \mathbb {C}^2\) via \(v\mapsto Av\) or, equivalently, in component form: \(v_i\mapsto \sum _{i'\in \{0,1\}} a_{ii'}v_{i'}\). This action extends to U by taking four matrices \(A,B,C,D\in \text {GL}(2)\) and defining an analogous rule for tensor component transformations:

$$\begin{aligned} p_{ijkl}\mapsto \sum _{ i',j',k',l'\in \{0,1\}} a_{ii'}b_{jj'}c_{kk'}d_{ll'}p_{i'j'k'l'}. \end{aligned}$$

This provides an action of the direct product group \(\times ^4 \text {GL}(2)\equiv \text {GL}(2) \times \text {GL}(2) \times \text {GL}(2) \times \text {GL}(2)\) expressed in tensor form as the mapping \(P\mapsto A\otimes B\otimes C\otimes D\cdot P\).
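The component transformation above is a fourfold tensor contraction and is conveniently expressed with `einsum`; a sketch (NumPy assumed, names our own):

```python
import numpy as np

def gl_action(A, B, C, D, P):
    # p_{ijkl} -> sum_{i'j'k'l'} a_{ii'} b_{jj'} c_{kk'} d_{ll'} p_{i'j'k'l'}
    return np.einsum('ia,jb,kc,ld,abcd->ijkl', A, B, C, D, P)

rng = np.random.default_rng(0)
P = rng.random((2, 2, 2, 2))
A, B, C, D, A2 = (rng.random((2, 2)) for _ in range(5))
I = np.eye(2)

assert np.allclose(gl_action(I, I, I, I, P), P)   # the identity acts trivially
# acting factor-by-factor composes as matrix multiplication on each factor
assert np.allclose(gl_action(A @ A2, B, C, D, P),
                   gl_action(A, I, I, I, gl_action(A2, B, C, D, P)))
```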

In what follows, we consider the actions of both \(\mathfrak {S}_4\) and \(\times ^4\text {GL}(2)\) on tensors \(P\in U\). For the former with \(\sigma \in \mathfrak {S}_4\) we will generically write \(P\mapsto \sigma \cdot P\), and for the latter with \(g=A\otimes B\otimes C\otimes D\in \times ^4\text {GL}(2)\) we will generically write \(P\mapsto g\cdot P\). Although this notation does not distinguish the two group actions, the intended action will always be clear from context.

2.3 Tree tensors, clipped tensors

We will say that M is a (\(2 \times 2\)) Markov matrix if

$$\begin{aligned} M= \left[ \begin{array}{cc} 1-a_{21} &{} a_{12} \\ a_{21} &{} 1-a_{12} \end{array} \right] , \end{aligned}$$

where \(0 \le a_{12},a_{21} \le 1\) are the probabilities of state changes \(0\rightarrow 1\) and \(1\rightarrow 0\), respectively. We consider the rooted version of the quartet tree \(T_1\) obtained by placing an additional vertex (the “root”) on the internal edge of \(T_1\). We label each edge of the tree by the subset of leaves descendant to the edge. Let \(\pi =[\pi _1,\pi _2]^T\) be a probability distribution (that is, \(\pi _i> 0\) and \(\pi _1+\pi _2=1\)), and let \(M_{e}=(m_{ij}^{(e)})\) be a collection of Markov matrices indexed by the edges \(e\in T_1\). We set

$$\begin{aligned} p^{(1)}_{ijkl}=\sum _{x,y,r\in \{0,1\}} m^{(1)}_{ix}m^{(2)}_{jx}m^{(3)}_{ky}m^{(4)}_{ly}m^{(12)}_{xr}m^{(34)}_{yr}\pi _r. \end{aligned}$$

Under this construction, the tensor \(P_1=(p^{(1)}_{ijkl})\) corresponds to the standard construction of a probability distribution arising from the Markov process on \(T_1\) [as described in textbooks such as Felsenstein (2004)]. Additionally, a well-known result [a generalization of Felsenstein’s “pulley-principle” (Felsenstein 1981)] shows it is possible to adjust the free parameters in this expression such that we can move the root of \(T_1\) to anywhere we please, whilst fixing the distribution \(P_1\). Motivated by this:

Definition 2.1

We say that a tensor \(P_1\) is a tree tensor corresponding to the quartet \(T_1=12|34\) if \(P_1\) arises under the construction just given, for any choice of Markov matrices, root distribution, and root placement. Similarly, we say that \(P_2\) and \(P_3\) are tree tensors corresponding to the quartets \(T_2=13|24\) and \(T_3=14|23\) if they arise in the analogous way on the remaining two quartets.

We now connect this construction to our description of the natural action of \(\times ^4\text {GL}(2)\) on U described in the previous section. We do this by defining, for any fixed tree tensor \(P_i\), the clipped tensor \(\widetilde{P}_i\), which is obtained by setting each Markov matrix on the leaf edges of the quartet to be equal to the identity matrix. In this way, generically we have (for example):

$$\begin{aligned} \widetilde{p}^{(1)}_{ijkl}= \left\{ \begin{array}{ll} \sum _{r\in \{0,1\}} m^{(12)}_{ir}m^{(34)}_{kr}\pi _r,&{} \text { if }i=j\text { and }k=l;\\ 0,&{} \text { otherwise}. \end{array} \right. \end{aligned}$$
(2.1)

From the definitions given in the previous section, we can now write

$$\begin{aligned} P_1=M_1\otimes M_2\otimes M_3\otimes M_4\cdot \widetilde{P}_1, \end{aligned}$$

and consider \(P_1\) as arising from the clipped tensor \(\widetilde{P}_1\) under the action of \(\times ^4\text {GL}(2)\) (provided we make the additional assumption that each of the Markov matrices \(M_1,M_2,M_3, M_4\) is invertible and hence belongs to \(\text {GL}(2)\)). This motivates:

Definition 2.2

The Markov group \(\mathcal {M}_2\) is the set of matrices:

$$\begin{aligned} \mathcal {M}_2= \left\{ M= \left[ \begin{array}{cc} 1-a_{21} &{} a_{12} \\ a_{21} &{} 1-a_{12} \end{array} \right] : a_{12},a_{21}\in \mathbb {C}, \det (M)\ne 0 \right\} . \end{aligned}$$

Notice we have removed the stochastic constraints on the matrix entries so that \(\mathcal {M}_2\) is a proper subgroup of \(\text {GL}(2)\) (as is easy to verify).
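The subgroup property can be verified numerically; a minimal sketch (NumPy assumed, example entries our own) checks that products and inverses of such matrices retain unit column sums:

```python
import numpy as np

def markov(a12, a21):
    # 2x2 Markov matrix as defined above (each column sums to 1)
    return np.array([[1 - a21, a12], [a21, 1 - a12]])

M, N = markov(0.2, 0.3), markov(0.1, 0.4)

# products and inverses of invertible Markov matrices keep unit column sums,
# so M_2 is closed under the group operations
assert np.allclose((M @ N).sum(axis=0), 1.0)
assert np.allclose(np.linalg.inv(M).sum(axis=0), 1.0)
# note the inverse generally has negative entries: only the stochastic
# constraints fail, consistent with allowing complex entries above
```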

While this perspective excludes tree tensors constructed using non-invertible Markov matrices, this is not a serious restriction since, from a modelling perspective, we prefer to take the point of view of continuous-time Markov chains, where all relevant Markov matrices are invertible (since they occur as matrix exponentials). In any case, within the set of Markov matrices the subset with zero determinant is of measure zero, and hence we may assume that any Markov matrix occurring in practice (in a sufficiently random way) will indeed belong to \(\mathcal {M}_2\). Thus we may consider tree tensors \(P_i\) as arising under the action of \(\times ^4 \mathcal {M}_2\), as a subgroup of \(\times ^4 \text {GL}(2)\), on clipped tensors \(\widetilde{P}_i\).
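The factorisation \(P_1=M_1\otimes M_2\otimes M_3\otimes M_4\cdot \widetilde{P}_1\) can be checked numerically. The sketch below (NumPy assumed; the parameter values are our own illustrative choices) builds \(P_1\) directly from the sum over internal states and compares it with the leaf action applied to the clipped tensor of Eq. (2.1):

```python
import numpy as np

def markov(a12, a21):
    return np.array([[1 - a21, a12], [a21, 1 - a12]])

pi = np.array([0.6, 0.4])                                # root distribution
M1, M2, M3, M4 = markov(0.1, 0.2), markov(0.3, 0.1), markov(0.2, 0.2), markov(0.15, 0.3)
M12, M34 = markov(0.15, 0.25), markov(0.05, 0.1)         # internal edge matrices

# direct construction: p_{ijkl} = sum_{x,y,r} m1_ix m2_jx m3_ky m4_ly m12_xr m34_yr pi_r
P1 = np.einsum('ix,jx,ky,ly,xr,yr,r->ijkl', M1, M2, M3, M4, M12, M34, pi)

# clipped tensor of Eq. (2.1): leaf matrices replaced by the identity
Pc = np.zeros((2, 2, 2, 2))
for i in range(2):
    for k in range(2):
        Pc[i, i, k, k] = sum(M12[i, r] * M34[k, r] * pi[r] for r in range(2))

# P1 = M1 (x) M2 (x) M3 (x) M4 . clipped tensor
P1_leaf = np.einsum('ia,jb,kc,ld,abcd->ijkl', M1, M2, M3, M4, Pc)
assert np.allclose(P1, P1_leaf)
assert np.isclose(P1.sum(), 1.0)   # P1 is a probability distribution
```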

2.4 Markov action

Conceptually, we can extend the notion of the action of \(\times ^4\mathcal {M}_2\) on clipped tensors \(\widetilde{P}_i\) to an action on all tensors in U. Of particular importance is the following: if \(P_1\in U\) is a tree tensor and \(M_1,M_2,M_3,M_4\in \mathcal {M}_2\) are Markov matrices, we can interpret the action

$$\begin{aligned} P_1\mapsto M_1\otimes M_2\otimes M_3\otimes M_4\cdot P_1 \end{aligned}$$

as corresponding to lengthening the leaves of the phylogenetic tree. Of course this interpretation works for any tensor \(P\in U\) (whether P is a tree tensor or otherwise).

Definition 2.3

The Markov action is the group action of \(\times ^4\mathcal {M}_2\) on U obtained by restricting each copy of \(\text {GL}(2)\) in \(\times ^4\text {GL}(2)\) to the Markov group \(\mathcal {M}_2\).

Importantly, this action encodes the conditional independence of Markov evolution across lineages; and, if P happens to be a tree tensor, this action preserves the underlying tree topology. In other words, the Markov action provides a symmetry on the set of quartet tree tensors. Connecting this with our previously discussed black box view, where a quartet method is assumed to estimate tree topology only, we see that the Markov action is essentially a nuisance parameter that ideally the method should be insensitive to.

2.5 Markov invariants

With the Markov action in hand we can now formally define the polynomials that are our main interest in this paper. This class of polynomials was first defined and explored by Sumner et al. (2008).

Definition 2.4

Take q(P) to be a multivariate polynomial function on the indeterminates \(P=(p_{ijkl})\). We say that q is a Markov invariant if q transforms as a one-dimensional representation under the Markov action.

In the language of classical invariant theory, this is equivalent to saying q is a “relative invariant” under the Markov action so, for all \(P\in U\) and all \(g=M_1\otimes M_2\otimes M_3\otimes M_4\in \times ^4\mathcal {M}_2\):

$$\begin{aligned} q(g\cdot P)=q(M_1\otimes M_2\otimes M_3\otimes M_4\cdot P)=\lambda _gq(P), \end{aligned}$$

where \(\lambda _g\in \mathbb {C}\) satisfies, for all \(g,g'\in \times ^4 \mathcal {M}_2\), the multiplicative property: \(\lambda _{gg'}=\lambda _g\lambda _{g'}\). In the language of group representation theory, this means that \(\lambda _g\) provides a one-dimensional representation of \(\times ^4 \mathcal {M}_2\). In the examples we discuss, \(\lambda _g\) is simply a power of the determinant \(\det (g)\) (from which the multiplicative property follows easily).
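A minimal two-taxon analogue (our own toy illustration, not one of the quartet invariants discussed later) exhibits the relative-invariance property with \(\lambda _g=\det (M_1)\det (M_2)\):

```python
import numpy as np

def markov(a12, a21):
    return np.array([[1 - a21, a12], [a21, 1 - a12]])

# a 2x2 joint distribution on two binary taxa (illustrative values)
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])
M1, M2 = markov(0.2, 0.1), markov(0.3, 0.25)

# q(P) = det(P) transforms with lambda_g = det(M1) det(M2) under P -> M1 P M2^T
lhs = np.linalg.det(M1 @ P @ M2.T)
rhs = np.linalg.det(M1) * np.linalg.det(M2) * np.linalg.det(P)
assert np.isclose(lhs, rhs)
```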

As alluded to in the previous section, our interest in Markov invariants is motivated by the desire to control the behaviour, under the Markov action, of a quartet phylogenetic method founded on the evaluation of a set of polynomials. The Markov invariants represent the optimal case where we have complete understanding of what is happening under the Markov action. As we will see, the situation is quite different for the classically constructed phylogenetic identities.

2.6 Flattenings, minors, and edge identities

Here we derive the so-called “edge identities”. In their most general form, these are phylogenetic identities for phylogenetic trees, which can be used to detect the presence or absence of a particular edge in the phylogenetic tree. These identities were first derived using the general concepts of tensor flattenings and associated rank conditions developed by Allman and Rhodes (2008). Here we specialize to the case of binary states and quartet trees and take an approach which focuses on the role of the Markov action.

Definition 2.5

Suppose \(P=(p_{i_1i_2i_3i_4})\in U\) is a generic tensor and suppose \(\alpha \beta |\gamma \delta \) is a bipartition of \(\{1,2,3,4\}\). The flattening of P corresponding to the bipartition \(\alpha \beta |\gamma \delta \) is the \(2^2\times 2^2\) matrix containing the entries \(p_{i_1i_2i_3i_4}\) with rows indexed by \( i_\alpha i_\beta =00,01,10,11 \) and columns indexed by \( i_\gamma i_\delta =00,01,10,11\).

Up to row and column permutations, there are only three distinct flattenings of a tensor \(P\in U\), each corresponding to one of the possible quartet trees \(T_1,T_2\) or \(T_3\). Concretely, we denote the “12|34” flattening of P as the \(4\times 4\) matrix \(\text {Flat}_1(P)\) with entries

$$\begin{aligned} {\text {Flat}_1(P)}_{i_1i_2,i_3i_4}=p_{i_1i_2i_3i_4}. \end{aligned}$$

Similarly we define the “13|24” and “14|23” flattenings as the \(4\times 4\) matrices \(\text {Flat}_2(P)\) and \(\text {Flat}_3(P)\) with entries

$$\begin{aligned} {\text {Flat}_2(P)}_{i_1i_3,i_2i_4}= p_{i_1i_2i_3i_4},\qquad {\text {Flat}_3(P)}_{i_1i_4,i_2i_3}= p_{i_1i_2i_3i_4}, \end{aligned}$$

respectively.

The action of \(\times ^4\text {GL}(2)\) discussed in Sect. 2.2, \(P\rightarrow A\otimes B\otimes C\otimes D \cdot P\), can be shown to be expressed on the 12|34 flattening as

$$\begin{aligned} \text {Flat}_1(P)\rightarrow (A\otimes B)\cdot \text {Flat}_1(P) \cdot (C\otimes D)^T, \end{aligned}$$
(2.2)

where \({}^T\) indicates matrix transpose.

Using the flattenings, it is not too hard to derive some phylogenetic identities for quartet trees. Consider a clipped tensor \(\widetilde{P}_1\) arising from the quartet tree \(T_1\) and its flattening

$$\begin{aligned} \text {Flat}_1(\widetilde{P}_1)= \left[ \begin{array}{cccc} x &{} 0 &{} 0 &{} y \\ 0 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 0\\ z &{} 0 &{} 0 &{} w \end{array} \right] , \end{aligned}$$
(2.3)

where x, y, z, w label the non-zero probabilities given in (2.1). From (2.2) we see that

$$\begin{aligned} \text {Flat}_1(P_1)=M_1\otimes M_2 \cdot \text {Flat}_1(\widetilde{P}_1)\cdot (M_3\otimes M_4)^T, \end{aligned}$$

and hence, assuming each \(M_i\in \mathcal {M}_2\) is non-singular, we conclude that \(\text {rank}\left( \text {Flat}_1(P_1)\right) \le 2\). Therefore the 16 cubic polynomials obtained by taking 3-minors of this flattened matrix form a set of phylogenetic identities for the quartet 12|34. We refer to these minors as edge identities. (We will see in Sect. 4 that these minors are actually tree-informative since they do not vanish on the other two quartets, at least generically.)
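To make the flattening-and-minors construction concrete, the following sketch (NumPy assumed; the tree parameters are our own illustrative choices) builds a tree tensor on \(T_1\), forms the three flattenings, and checks that the sixteen 3-minors vanish on \(\text {Flat}_1\) but not, generically, on the other two flattenings:

```python
import numpy as np
from itertools import combinations

def markov(a12, a21):
    return np.array([[1 - a21, a12], [a21, 1 - a12]])

# a generic tree tensor on T1 = 12|34 (same construction as Sect. 2.3)
pi = np.array([0.6, 0.4])
M1, M2, M3, M4 = markov(0.1, 0.2), markov(0.3, 0.1), markov(0.2, 0.2), markov(0.15, 0.3)
M12, M34 = markov(0.15, 0.25), markov(0.05, 0.1)
P1 = np.einsum('ix,jx,ky,ly,xr,yr,r->ijkl', M1, M2, M3, M4, M12, M34, pi)

def flat1(P): return P.reshape(4, 4)                        # rows i1 i2, cols i3 i4
def flat2(P): return P.transpose(0, 2, 1, 3).reshape(4, 4)  # rows i1 i3, cols i2 i4
def flat3(P): return P.transpose(0, 3, 1, 2).reshape(4, 4)  # rows i1 i4, cols i2 i3

def minors3(M):
    # the 16 3-minors of a 4x4 matrix
    return [np.linalg.det(M[np.ix_(r, c)])
            for r in combinations(range(4), 3) for c in combinations(range(4), 3)]

assert np.linalg.matrix_rank(flat1(P1), tol=1e-10) <= 2
assert max(abs(m) for m in minors3(flat1(P1))) < 1e-12  # edge identities vanish on T1
assert max(abs(m) for m in minors3(flat2(P1))) > 1e-8   # generically non-zero on 13|24
assert max(abs(m) for m in minors3(flat3(P1))) > 1e-8   # generically non-zero on 14|23
```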

These observations generalize to:

Theorem 2.1

(Allman and Rhodes 2008) For each case \(i = 1,2,3\), the 16 polynomial functions in the indeterminates \(P_i=(p^{(i)}_{i_1i_2i_3i_4})\) produced by taking the cubic 3-minors of the flattening \(\text {Flat}_i(P_i)\) form phylogenetic identities for probability distributions \(P_i\) arising from the quartet tree \(T_i\).

An attractive feature of this process of taking flattenings and minors is that the construction can be generalized to phylogenetic tensors with any number of taxa, and Markov chains with arbitrary state spaces (beyond the binary case discussed here). This observation was first presented by Allman and Rhodes (2008) and generalized to a wider class of models by Draisma and Kuttler (2008) and Casanellas and Fernández-Sánchez (2010).

3 Quartet inference measures

We now describe some desirable properties of any quartet method which returns tree topology only. We suppose the pattern frequency array \(F=(f_{ijkl})\in U\) for four taxa arose as N independent samples from some fixed distribution \(P\in U\). (In particular one may like to consider the case where \(P=P_i\) arose on the tree \(T_i\), but this is not necessary for the discussion in this section.) We interpret N as the sequence length of the alignment, and denote this situation as \(F\sim \text {MultiNom}(P,N)\), noting this implies F has (componentwise) expectation value \(E[F]=NP\).

Definition 3.1

A triple \(\varDelta (F)=(R_1,R_2,R_3)\) is called a quartet inference measure (or simply a measure) for F if each of \(R_1,R_2,R_3\) is a (statistically interpretable) confidence in the respective statements \(F \sim \text {MultiNom}(P_1,N)\), \(F \sim \text {MultiNom}(P_2,N)\), \(F \sim \text {MultiNom}(P_3,N)\), for some \(P_1,P_2,P_3\) arising in turn from the quartets \(T_1,T_2,T_3\).

Later, we set each \(R_i\) equal to a residual sum of squares under the quartet hypothesis \(T_i\), but for the moment we assume, without loss of generality, that \(\varDelta \) is designed so that small values of \(R_i\) correspond to greater confidence in quartet \(T_i\). Given this, we assume the quartet inference measure \(\varDelta \) ranks the statistical confidence in the three quartet trees \(T_1\), \(T_2\) and \(T_3\) using the relative ordering of \(R_1\), \(R_2\) and \(R_3\).

Considering quartet inference measures \(\varDelta \) in the abstract sense, in Table 1 we describe three theoretical statistical properties a measure may, or may not, satisfy. On the practical side, in the simulation study (Sect. 6), we apply several specific examples of quartet measures \(\varDelta \) constructed from polynomial functions (both phylogenetic identities and Markov invariants) on the tensor product space U. The results of the simulations clearly establish the importance of each of the properties given in Table 1.

Presently, we illustrate the three properties by showing:

Table 1 Proposed desirable statistical properties of quartet inference measures

Theorem 3.1

The neighbor-joining algorithm (Saitou and Nei 1987) together with an additive estimator of pairwise distance consistent with a fixed Markov model provides a quartet inference measure satisfying Property I, Property II (strong), and Property III.

Note: Suppose the pairwise distance estimator between taxa i and j input to neighbor-joining is denoted \(d_{ij}\). By “additive” and “consistent with a given Markov model” we mean the following:

  1. A specific continuous-time Markov model on quartet trees is fixed;

  2. The associated Markov matrices produce a matrix group (Sumner et al. 2012) so the “Markov action” on the leaves is well defined (as in Definition 2.3);

  3. The expectation value \(E[d_{ij}]\) is equal to the sum of the branch lengths on the path from leaf i to j.

Examples of Markov models where these conditions can be achieved include the binary-symmetric and Jukes–Cantor models, together with their unbiased pairwise distance estimators [see, for example, Felsenstein (2004)]. In the following we give an outline of a proof.

Proof

For quartets, the neighbor-joining algorithm returns the quartet corresponding to the minimum of the three-tuple \(\varDelta =(R_1,R_2,R_3):=(d_{12}+d_{34},d_{13}+d_{24},d_{14}+d_{23})\). Under this definition, it is clear that \(\varDelta \) satisfies Property I, as required.

Further, if each \(d_{ij}\) is additive and consistent with a Markov model on the tree (as described above), then under the Markov action it follows that \(E[R_i]\rightarrow E[R_i]+\lambda _g\), where \(\lambda _g:=t_1'+t_2'+t_3'+t_4'\) and each \(t_i'\) is the extension of branch length on leaf i of the quartet. Setting \(\varDelta (F)=(R_1,R_2,R_3)\) and writing \(F'\) for the data arising after the leaf action, we have: \(E[\varDelta (F')]=E[\varDelta (F)]+(\lambda _g,\lambda _g,\lambda _g)\), and \(\lambda _g+\lambda _{g'}=\lambda _{gg'}\) (since the branch lengths are additive under further extension of the leaves). This establishes that, under these conditions, neighbor-joining satisfies Property II (strong), as required.

Finally, Property III is built into our assumption on the pairwise distance estimator. (For example, the Jukes–Cantor distance estimator for DNA sequences will fail to return a finite answer when the proportion of sites that differ in a pairwise sequence alignment is greater than 0.75; this is a structural feature resulting from a continuous-time assumption.) \(\square \)
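The argument of the proof can be illustrated with exactly additive distances (the edge lengths and leaf extensions below are our own illustrative values):

```python
# additive pairwise distances on T1 = 12|34 with leaf lengths t1..t4 and internal length t0
t1, t2, t3, t4, t0 = 0.3, 0.5, 0.2, 0.4, 0.1
d = {(1, 2): t1 + t2, (3, 4): t3 + t4,
     (1, 3): t1 + t0 + t3, (1, 4): t1 + t0 + t4,
     (2, 3): t2 + t0 + t3, (2, 4): t2 + t0 + t4}

def delta(d):
    # (R1, R2, R3) = (d12 + d34, d13 + d24, d14 + d23)
    return (d[(1, 2)] + d[(3, 4)], d[(1, 3)] + d[(2, 4)], d[(1, 4)] + d[(2, 3)])

R = delta(d)
assert R.index(min(R)) == 0    # minimum at R1: quartet T1 recovered

# lengthening leaf i by s_i shifts every R_j by the same lambda = s1 + s2 + s3 + s4
s = {1: 0.11, 2: 0.07, 3: 0.05, 4: 0.2}
d2 = {(i, j): d[(i, j)] + s[i] + s[j] for (i, j) in d}
lam = sum(s.values())
assert all(abs(delta(d2)[j] - (R[j] + lam)) < 1e-12 for j in range(3))
```

The second assertion is exactly the Property II (strong) behaviour: the ranking of \(R_1,R_2,R_3\) is unchanged by the leaf action.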

Thinking in a continuous-time formulation of Markov chains, in general one would expect that such \(\lambda _g\) would have some monotonicity property with respect to time so that, up to a fixed amount of statistical noise, our ability to discriminate quartets using a measure \(\varDelta \) satisfying Property II decreases in time. This is indeed the case for the example of neighbor-joining just given, and more generally corresponds to the biological fact that the ability to detect homology between extant taxa (that is, the “phylogenetic signal”) degrades as the divergence of common ancestry is pushed further backwards in time. In our application of Markov invariants, we will see that this is also the case where \(\lambda _g\) is multiplicative and \(\lambda _g\sim e^{-\gamma t}\), with \(\gamma >0\).

Previous work has discussed applying Property I (Eriksson 2008; Sumner and Jarvis 2009; Rusinko and Hipp 2012) in the context of phylogenetic identities. To our knowledge, Property II has never been explicitly discussed before. We will however show in Sect. 4 that Property II (weak) is implicit in the quartet method based on Markov invariants presented by Holland et al. (2013).

We are convinced that these properties of a quartet measure \(\varDelta \) are natural, given that the purpose of \(\varDelta \) is to deliver confidence in the choice of quartet from observed data. We will explain how the Markov invariants are ideally tailored to the task of constructing quartet measures that satisfy Property II in its strong version. As we will see, this is contingent upon the construction of unbiased estimators of the Markov invariants; a problem we solve completely in the binary quartet case but which otherwise remains open (see Sect. 7).


The next two sections contain the derivations of Markov invariants and the related discussion of Properties I and II.

4 The squangles

As previously noted in Sect. 2.4, whether a phylogenetic pattern distribution F arises as a sample from a specific quartet \(T_i\) depends only on the internal structure of the tree, not on the lengths or model parameters on the leaf edges. This motivates Definition 2.4 of Markov invariants which, for historical reasons arising from the quartet case for four-state (DNA) models, we call “squangles” (stochastic quartet tangle; see Sumner et al. (2008)). We work with an analogous construction in the binary case and, when needed, refer to these polynomials as “binary squangles” or, whenever there is no risk of ambiguity, simply as “squangles”.

In this section, we first derive the (binary) squangles, then use them to build a quartet measure \(\varDelta \) which satisfies Properties I, II (weak), and III. We then consider issues of statistical bias to build a second measure that satisfies Properties I, II (strong), and III.

4.1 Construction

To motivate and construct the squangles, we use an alternative basis for \(\mathbb {C}^2\). Our choice of basis is motivated by the simple observation that a linear change of coordinates on the probability vectors \([p_0,p_1]^T\) makes probability conservation, \(p_0+p_1 = 1\), an explicitly conserved quantity under the action of the Markov matrices \(\mathcal {M}_2\).

To this end, we use the orthogonal similarity transformation \(h=\frac{1}{\sqrt{2}}\left[ {\begin{matrix} 1 &{} 1 \\ -1 &{} 1 \end{matrix}}\right] \) with inverse \(h^{-1} = h^T\), so that \(2\times 2\) Markov matrices \(M=\left[ {\begin{matrix}1-a &{} b \\ a &{} 1-b\end{matrix}}\right] \) are transformed to \( M'=h^TMh=\left[ {\begin{matrix} \lambda &{} v \\ 0 &{} 1 \end{matrix}} \right] \), where \(\lambda = 1 - a - b\) and \(v = b - a\), and the second row explicitly manifests probability conservation. In what is to come, we will have additional recourse to consider only parameters that arise under a continuous-time formulation of a Markov chain, so that \(M=e^{Qt}\) for some \(2\times 2\) “rate” (zero-column sum) matrix Q. In this case we have the constraints \(0\le a,b< \frac{1}{2}\) which, in particular, implies \(0< \lambda \le 1\).
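The stated form of \(M'=h^TMh\) is easily checked numerically; the following sketch does so for a random Markov matrix (the parameters \(a,b\) are drawn at random from \([0,\tfrac{1}{2})\)).

```python
import random

# Sketch: check that h^T M h has the upper-triangular form
# [[lambda, v], [0, 1]] with lambda = 1 - a - b and v = b - a,
# for the 2x2 Markov matrix M = [[1-a, b], [a, 1-b]].
s = 2 ** -0.5
h  = [[s, s], [-s, s]]
hT = [[s, -s], [s, s]]   # h^{-1} = h^T since h is orthogonal

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

random.seed(0)
a, b = random.uniform(0, 0.5), random.uniform(0, 0.5)
M = [[1 - a, b], [a, 1 - b]]
Mp = matmul(matmul(hT, M), h)   # M' = h^T M h
```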

Let \(P\in U\) be a distribution with components \(p_{ijkl}\). Following the notation set out in Sect. 2.6 we let \(\text {Flat}_1(P)\) be the 12|34 flattening of P, which under the Markov action transforms as

$$\begin{aligned} \text {Flat}_1(P)\rightarrow (M_1\otimes M_2)\cdot \text {Flat}_1(P)\cdot (M_3\otimes M_4)^T. \end{aligned}$$

In the alternative basis we have the \(4\times 4\) form

$$\begin{aligned} M'_1\otimes M'_2= \left[ \begin{array}{cccc} \lambda _1\lambda _2 &{} \lambda _1v_2 &{} v_1\lambda _2 &{} v_1v_2 \\ 0 &{} \lambda _1 &{} 0 &{} v_1 \\ 0 &{} 0 &{} \lambda _2 &{} v_2 \\ 0 &{} 0 &{} 0 &{} 1 \end{array} \right] , \end{aligned}$$

and a similar expression for \(M'_3\otimes M'_4\). Commensurately, we let \(\text {Flat}_1'(P)\) denote the 12|34 flattening in the alternate basis:

$$\begin{aligned} \text {Flat}_1'(P):=\left( h^T\otimes h^T\right) \cdot \text {Flat}_1(P)\cdot \left( h\otimes h\right) . \end{aligned}$$

This formulation allows us to identify the bottom right \(3\times 3\) sub-matrix \(\widehat{\text {Flat}}'_1(P)\) of \(\text {Flat}'_1(P)\) as providing an invariant subspace for the Markov action, that is

$$\begin{aligned} \widehat{\text {Flat}}'_1(P)\rightarrow \widehat{(M'_1\otimes M'_2)}\cdot \widehat{\text {Flat}}'_1(P)\cdot \widehat{(M'_3\otimes M'_4)}^T, \end{aligned}$$

where

$$\begin{aligned} \widehat{M'_1\otimes M'_2}:= \left[ \begin{array}{ccc} \lambda _1 &{} 0 &{} v_1 \\ 0 &{} \lambda _2 &{} v_2 \\ 0 &{} 0 &{} 1 \end{array} \right] , \end{aligned}$$

and similarly for \(\widehat{M'_3\otimes M'_4}\).

Further, this construction leads to a cubic Markov invariant using nothing more than the multiplicative property of the determinant:

$$\begin{aligned} \det (\widehat{\text {Flat}}'_1(P))\rightarrow & {} \det \left( \widehat{(M'_1\otimes M'_2)}\cdot \widehat{\text {Flat}}'_1(P)\cdot (\widehat{M'_3\otimes M'_4})^T\right) \nonumber \\= & {} \det (\widehat{M'_1\otimes M'_2})\det (\widehat{\text {Flat}}_1'(P))\det (\widehat{M'_3\otimes M'_4})\nonumber \\= & {} \lambda _1\lambda _2\lambda _3\lambda _4\det (\widehat{\text {Flat}}'_1(P)). \end{aligned}$$
(4.1)

As a polynomial on U, we set \(q_1(P):=\det (\widehat{\text {Flat}}'_1(P))\) and \(q_1\) is our first example of a Markov invariant on the space of tensors U since, for all \(P\in U\) and \(g = M_1\otimes M_2\otimes M_3\otimes M_4\in \times ^4\mathcal {M}_2\), we have:

$$\begin{aligned} q_1(g\cdot P)=\det (g)q_1(P), \end{aligned}$$

where \(\det (g) = \det (M_1)\det (M_2)\det (M_3)\det (M_4)\equiv \lambda _1\lambda _2\lambda _3\lambda _4\). Thus:

Theorem 4.1

The polynomial \(q_1\) defined as \(q_1(P):=\det (\widehat{\text {Flat}}'_1(P))\) is a Markov invariant accompanied by the one-dimensional representation of \(\times ^4\mathcal {M}_2\) given by \(\lambda _g=\det (g)\) for all \(g\in \times ^4\mathcal {M}_2\).

For the reasons explained at the start of this section, we refer to \(q_1\) as the “squangle”.

The reader should note that the squangle \(q_1\) is defined via (and depends absolutely upon) both the 12|34 flattening and our particular choice of basis for \(\mathbb {C}^2\). On the other hand, \(q_1(P)\) is perfectly well defined for all tensors \(P\in U\), and occurs as a homogeneous, cubic polynomial in the indeterminates \(p_{i_1i_2i_3i_4}\) with 96 terms (the explicit polynomial form is provided in Online Resource 1).

It is also important to note that (4.1) is valid only under the action of \(2\times 2\) Markov matrices and certainly fails for more general \(2\times 2\) matrices in \(\text {GL}(2)\). Thus the squangles are very much tailored for the probabilistic setting of Markov chains.
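Both points can be confirmed numerically. The following sketch evaluates \(q_1\) on a random array standing in for \(\text {Flat}_1(P)\): the relation \(q_1(g\cdot P)=\det (g)\,q_1(P)\) holds when the leaf matrices are Markov, and fails for generic invertible \(2\times 2\) matrices (all matrices here are randomly generated).

```python
import random

# Sketch: q1(g.P) = det(g) q1(P) for Markov leaf matrices, but not for
# generic matrices in GL(2). P is handled through its 12|34 flattening.
S = 2 ** -0.5
H, HT = [[S, S], [-S, S]], [[S, -S], [S, S]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):   # Kronecker product of two 2x2 matrices
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [list(r) for r in zip(*A)]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def q1_of_flat(F):
    """q1 = det of the bottom-right 3x3 block of the flattening in the alternative basis."""
    Fp = matmul(matmul(kron(HT, HT), F), kron(H, H))
    return det3([row[1:] for row in Fp[1:]])

def leaf_action(F, M1, M2, M3, M4):
    """Flat_1(g.P) = (M1 (x) M2) . Flat_1(P) . (M3 (x) M4)^T."""
    return matmul(matmul(kron(M1, M2), F), transpose(kron(M3, M4)))

random.seed(1)
F = [[random.random() for _ in range(4)] for _ in range(4)]

Ms = []   # four random Markov matrices
for _ in range(4):
    a, b = random.uniform(0, 0.5), random.uniform(0, 0.5)
    Ms.append([[1 - a, b], [a, 1 - b]])
Gs = [[[random.uniform(0.5, 1.5), random.random()],
       [random.random(), random.uniform(0.5, 1.5)]] for _ in range(4)]

markov_lhs = q1_of_flat(leaf_action(F, *Ms))
markov_rhs = q1_of_flat(F)
for M in Ms:
    markov_rhs *= det2(M)

gl_lhs = q1_of_flat(leaf_action(F, *Gs))
gl_rhs = q1_of_flat(F)
for G in Gs:
    gl_rhs *= det2(G)
```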

Having constructed \(q_1\) we now evaluate \(q_1\) specifically on a tensor \(P_1\) arising from the quartet tree \(T_1\) with the goal of producing a quartet inference measure \(\varDelta \). As observed in Sect. 2.3, if \(P_1\) arises from a quartet we can certainly write \(P_1=M_1\otimes M_2\otimes M_3\otimes M_4 \cdot \widetilde{P}_1\), where \(\widetilde{P}_1\) is the so-called clipped tensor. In particular, in the original probability basis, this tensor has components \(\widetilde{p}^{(1)}_{ijkl}= 0\) whenever \(i \ne j\) or \(k \ne l\).

We also saw in (2.3) that, under the 12|34 flattening, \(\text {Flat}_1(\widetilde{P}_1)\) generically has rank at most 2. Hence, working in the alternative basis, \(\widehat{\text {Flat}}'_1(\widetilde{P}_1)\) also has rank at most 2 and, its determinant being a cubic minor, we obtain

$$\begin{aligned} q_1(\widetilde{P}_1)=\det (\widehat{\text {Flat}}'_1(\widetilde{P}_1)) =0\quad \implies \quad q_1(P_1)=\lambda _1\lambda _2\lambda _3\lambda _4q_1(\widetilde{P}_1)=0, \end{aligned}$$

for all tensors \(P_1\) arising on the quartet tree 12|34 under any choices of parameters. Thus:

Theorem 4.2

The Markov invariant \(q_1\) is a phylogenetic identity for the quartet 12|34.

On the other hand if we suppose a distribution \(P_2\) arises from the quartet tree 13|24 we can write \(P_2=M_1\otimes M_2\otimes M_3\otimes M_4 \cdot \widetilde{P}_2\), where, considered as the 12|34 flattening in the original basis, we have generically:

$$\begin{aligned} \text {Flat}_1(\widetilde{P}_2)= \left[ \begin{array}{cccc} x &{} 0 &{} 0 &{} 0 \\ 0 &{} y &{} 0 &{} 0\\ 0 &{} 0 &{} z &{} 0\\ 0 &{} 0 &{} 0 &{} w \end{array} \right] . \end{aligned}$$

Transforming to the alternative basis and evaluating \(q_1(\widetilde{P}_2)\) we now find

$$\begin{aligned} q_1(\widetilde{P}_2)=\frac{1}{4}(wyz+xyz+wxy+wxz). \end{aligned}$$

Since \(\widetilde{P}_2\) is a distribution we have \(x,y,z,w>0\) and hence \( q_1(\widetilde{P}_2)>0 \). Since \(q_1\) is a Markov invariant, we have \(q_1(P_2)=\lambda _1\lambda _2\lambda _3\lambda _4q_1(\widetilde{P}_2)\), and we conclude \(q_1(P_2)>0\) for all choices of parameters such that \(\widetilde{P}_2\) corresponds to a probability distribution on the quartet 13|24 under a continuous-time formulation of a Markov chain, where \(0<\lambda _i=\det (M_i)=e^{\text {tr}(Q_i t)}\le 1\).

Finally if we suppose \(P_3\) arises from \(T_3\) we get

$$\begin{aligned} P_3=M_1\otimes M_2\otimes M_3\otimes M_4 \cdot \widetilde{P}_3, \end{aligned}$$

and again under the 12|34 flattening in the original basis, we have

$$\begin{aligned} \text {Flat}_1(\widetilde{P}_3)= \left[ \begin{array}{cccc} x &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} y &{} 0\\ 0 &{} z &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} w \end{array} \right] , \end{aligned}$$

which follows simply from the structural property of the components of \(P_3\) in the original basis: \(\widetilde{p}_{ijkl}\ne 0\) if and only if \(i = l\) and \(j = k\). Transforming to the alternative basis and evaluating \(q_1(\widetilde{P}_3)\) we now find

$$\begin{aligned} q_1(\widetilde{P}_3)=-\frac{1}{4}(wyz+xyz+wxy+wxz). \end{aligned}$$

Since \(q_1\) is a Markov invariant, we have \(q_1(P_3)=\lambda _1\lambda _2\lambda _3\lambda _4q_1(\widetilde{P}_3)\), and we conclude \(q_1(P_3)<0\) for sensible choices of parameters, that is, parameters such that \(\widetilde{P}_3\) really does correspond to a probability distribution and, on the leaf edges, \(0<\det (M_i)\le 1\).
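The three clipped-tensor evaluations above can be checked numerically. The following sketch builds the stated flattening patterns from random positive \(x,y,z,w\) and compares \(q_1\) against the closed forms.

```python
import random

# Sketch: q1 vanishes on the 12|34 clipped pattern, equals
# +(wyz + xyz + wxy + wxz)/4 on the 13|24 pattern, and the negative of
# this on the 14|23 pattern.
S = 2 ** -0.5
H, HT = [[S, S], [-S, S]], [[S, -S], [S, S]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):   # Kronecker product of two 2x2 matrices
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def q1_of_flat(F):
    """q1 = det of the bottom-right 3x3 block of Flat_1 in the alternative basis."""
    Fp = matmul(matmul(kron(HT, HT), F), kron(H, H))
    return det3([row[1:] for row in Fp[1:]])

random.seed(2)
x, y, z, w = (random.uniform(0.1, 1.0) for _ in range(4))

# Flat_1 of clipped tensors: nonzero entries occur at (row, col) = (2i+j, 2k+l).
F1 = [[x, 0, 0, y], [0, 0, 0, 0], [0, 0, 0, 0], [z, 0, 0, w]]  # 12|34: i=j, k=l
F2 = [[x, 0, 0, 0], [0, y, 0, 0], [0, 0, z, 0], [0, 0, 0, w]]  # 13|24: i=k, j=l
F3 = [[x, 0, 0, 0], [0, 0, y, 0], [0, z, 0, 0], [0, 0, 0, w]]  # 14|23: i=l, j=k

closed_form = (w * y * z + x * y * z + w * x * y + w * x * z) / 4
```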

We can of course define two additional squangles \(q_2,q_3\) using the other two choices of tensor flattenings 13|24 and 14|23. This can be achieved using an analogous argument to the one we gave for \(q_1\) but it is simpler at this stage to utilize the natural action of \(\mathfrak {S}_4\) on tensors \(P\in U\) to define:

$$\begin{aligned} q_2(P):=-q_1((23)\cdot P),\qquad q_3(P):=-q_1((24)\cdot P); \end{aligned}$$

where the signs are chosen for reasons of elegance that will become apparent.

Clearly \(q_2\) and \(q_3\) also form Markov invariants since:

$$\begin{aligned} q_2(M_1\otimes M_2 \otimes M_3 \otimes M_4 \cdot P)= & {} -q_1((23)\cdot M_1\otimes M_2 \otimes M_3 \otimes M_4 \cdot P) \\= & {} -q_1(M_1\otimes M_3 \otimes M_2 \otimes M_4 \cdot (23)\cdot P) \\= & {} -\lambda _1\lambda _3\lambda _2\lambda _4q_1((23)\cdot P)=\lambda _1\lambda _2\lambda _3\lambda _4q_2(P), \end{aligned}$$

with a similar derivation for \(q_3\).

4.2 Signs for the squangles

A critical part of our construction of a useful measure for tree inference relies on understanding the expected values of the polynomials and, particularly, their expected signs. Thus, we use the invariance property established in the last subsection to infer positivity conditions for \(q_2\) and \(q_3\) on the three possible quartets as follows (note that we have already established these conditions for \(q_1\) in the previous subsection).

Suppose \(P_2\) is a tensor arising from the quartet 13|24. As before we can write \(P_2=M_1\otimes M_2\otimes M_3\otimes M_4\cdot \widetilde{P}_2\). Now taking \(\widetilde{P}_1:=(23)\cdot \widetilde{P}_2\), it is clear that \(\widetilde{P}_1\) is a clipped tensor taken from the quartet 12|34. Thus

$$\begin{aligned} q_2(\widetilde{P}_2)=-q_1((23)\cdot \widetilde{P}_2)=-q_1(\widetilde{P}_1)=0, \end{aligned}$$

since we concluded above that \(q_1(\widetilde{P}_1)=0\) for all tensors from the quartet 12|34. Conversely, choosing any clipped tensor \(\widetilde{P}_1\) from 12|34 and defining \(\widetilde{P}_2:=(23)\cdot \widetilde{P}_1\), we have:

$$\begin{aligned} q_2(\widetilde{P}_1)=-q_1((23)\cdot \widetilde{P}_1)=-q_1(\widetilde{P}_2)<0. \end{aligned}$$

Continuing in this fashion we infer the signs of the evaluations of the squangles on tensors from the three possible quartets.

Before we summarize this information, however, we note that the squangles form a vector space (a linear combination of these invariant functions is again an invariant function), and explicit computation shows that this vector space has dimension only two; that is, there is a linear dependence between the polynomials \(q_1,q_2,q_3\). In fact, this dependency is exhibited by

$$\begin{aligned} q_1+q_2+q_3=0, \end{aligned}$$

as a polynomial identity. Thus only two of the squangles are needed to span the vector space of these invariant functions. In the Appendix (Online Resource 4) we show that this linear dependence follows directly from a representation theoretic argument using group characters. Given this linear dependence, for reasons of symmetry it makes sense to consider the pair \(\{q_2,q_3\}\) as a basis for the squangles when the quartet 12|34 is under consideration, the pair \(\{q_1,q_3\}\) as a basis when the quartet 13|24 is under consideration, and the pair \(\{q_1,q_2\}\) as a basis when the quartet 14|23 is under consideration.
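The linear dependence is easily checked numerically. The following sketch evaluates \(q_1,q_2,q_3\) on random tensors \(P\) (not necessarily arising from any tree), with \(q_2\) and \(q_3\) computed via the leaf permutations \((23)\) and \((24)\) as defined above.

```python
import random
from itertools import product

# Sketch: the polynomial identity q1 + q2 + q3 = 0, checked on random tensors.
S = 2 ** -0.5
H, HT = [[S, S], [-S, S]], [[S, -S], [S, S]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):   # Kronecker product of two 2x2 matrices
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def q1_of_flat(F):
    Fp = matmul(matmul(kron(HT, HT), F), kron(H, H))
    return det3([row[1:] for row in Fp[1:]])

def flat1(p):
    """The 12|34 flattening: rows indexed by (i, j), columns by (k, l)."""
    return [[p[(r // 2, r % 2, c // 2, c % 2)] for c in range(4)]
            for r in range(4)]

def q1(p):
    return q1_of_flat(flat1(p))

def q2(p):   # q2(P) := -q1((23).P)
    return -q1({k: p[(k[0], k[2], k[1], k[3])] for k in p})

def q3(p):   # q3(P) := -q1((24).P)
    return -q1({k: p[(k[0], k[3], k[2], k[1])] for k in p})

random.seed(3)
keys = list(product((0, 1), repeat=4))
residual = 0.0
for _ in range(5):
    p = {k: random.random() for k in keys}
    residual = max(residual, abs(q1(p) + q2(p) + q3(p)))
```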

Table 2 Expectation values of the three squangles \(q_1,q_2,q_3\) when evaluated on a tree tensor \(P_i\) corresponding to quartet \(T_i\)

Putting the information found so far together, we find expected values for the squangles \(q_1\), \(q_2\), and \(q_3\) when evaluated on the three possible quartets as given in Table 2. We use this table of expectation values to design an optimal quartet inference measure \(\varDelta \).

Theorem 4.3

Given a probability tensor \(P_i\in U\) arising from the quartet \(T_i\), when evaluated on a frequency array \(F\sim \text {MultiNom}(P_i,N)\), the Markov invariants \(\{q_1,q_2,q_3\}\) have the signed expectation values given in Table 2.

Proof

Given the above observations regarding the signs of the squangles when evaluated on the three possible quartets, to complete the proof, we need only confirm, for \(i = 1,2,3\), the expectation values \(E[q_i(F)]=N(N-1)(N-2)q_i(P)\) for all probability tensors P and \(F\sim \text {MultiNom}(P,N)\).

Under the multinomial distribution, we have \(E[F]=NP\) which is simply the vector version of \(E[f_{ijkl}]=Np_{ijkl}\). In general, the situation for higher monomial powers in the \(f_{ijkl}\) is not so straightforward. However, the explicit polynomial form given in Online Resource 1 reveals that each monomial term in \(q_1\) is square free (and hence the same result follows for the squangles \(q_2\) and \(q_3\)). Considering a square free cubic monomial \(f_{i_1j_1k_1l_1}f_{i_2j_2k_2l_2}f_{i_3j_3k_3l_3}\), one finds, using the moment generating function for the multinomial distribution (see (4.2) and surrounding discussion below):

$$\begin{aligned} E[f_{i_1j_1k_1l_1}f_{i_2j_2k_2l_2}f_{i_3j_3k_3l_3}]=N(N-1)(N-2)p_{i_1j_1k_1l_1}p_{i_2j_2k_2l_2}p_{i_3j_3k_3l_3}. \end{aligned}$$

We then apply linearity of expectation value to conclude \(E[q_i(F)]=N(N-1)(N-2)q_i(P)\) for \(i=1,2,3\). Defining \(u:=E[q_1(F)]\), \(v:=E[q_2(F)]\), and \(w:=E[q_3(F)]\) completes the proof. \(\square \)
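The cubic moment identity used in the proof can be verified exactly for a small multinomial by direct enumeration. The following sketch (with hypothetical \(N\) and category probabilities) checks \(E[X_1X_2X_3]=N(N-1)(N-2)p_1p_2p_3\) using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Exact check of the square-free cubic moment:
# E[X1 X2 X3] = N(N-1)(N-2) p1 p2 p3 for (X1,...,X4) ~ MultiNom(p, N).
N = 5
p = [Fraction(1, 10), Fraction(1, 5), Fraction(3, 10), Fraction(2, 5)]

def pmf(counts):
    """Exact multinomial probability of an outcome vector."""
    prob = Fraction(factorial(N))
    for n, pi in zip(counts, p):
        prob = prob / factorial(n) * pi ** n
    return prob

moment = Fraction(0)
for counts in product(range(N + 1), repeat=4):
    if sum(counts) == N:
        moment += pmf(counts) * counts[0] * counts[1] * counts[2]
```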

Before using this information to derive a quartet inference measure, we first need to consider the behaviour of the squangles under taxon permutations.

4.3 Taxon permutations for the squangles

From the definition of the flattenings and the action of \(\mathfrak {S}_4\) on U it follows that

$$\begin{aligned} \text {Flat}_1((12)\cdot P)=K\text {Flat}_1(P),\qquad \text {Flat}_1((13)(24)\cdot P)=\text {Flat}_1(P)^T, \end{aligned}$$

where K is the permutation matrix

$$\begin{aligned} K= \left[ \begin{array}{cccc} 1 &{} 0 &{} 0 &{} 0\\ 0 &{} 0 &{} 1 &{} 0\\ 0 &{} 1 &{} 0 &{} 0\\ 0 &{} 0 &{} 0 &{} 1\\ \end{array} \right] . \end{aligned}$$

Further, since the permutations (12) and (13)(24) generate the stabilizer \(\text {Stab}(T_1)\), we see that the action of the eight permutations in \(\text {Stab}(T_1)\) comes from compositions of these two:

$$\begin{aligned} \begin{array}{cccc} \text {Flat}_1(P), &{} K\text {Flat}_1(P),&{} \text {Flat}_1(P)K,&{} K\text {Flat}_1(P)K,\\ \text {Flat}_1(P)^T,&{}K\text {Flat}_1(P)^T,&{}\text {Flat}_1(P)^TK,&{}K\text {Flat}_1(P)^TK. \end{array} \end{aligned}$$

Transforming this result into the alternative basis, it is straightforward to show:

$$\begin{aligned} \text {Flat}'_1((12)\cdot P)=K\text {Flat}'_1(P),\qquad \text {Flat}'_1((13)(24)\cdot P)=\text {Flat}'_1(P)^T, \end{aligned}$$

where, in the first result, we have used \((h^T\otimes h^T)\cdot K\cdot (h\otimes h)=K\). From this we see that

$$\begin{aligned} q_1((12)\cdot P)= & {} \det (\widehat{\text {Flat}}'_1((12)\cdot P))=\det (K)\det (\widehat{\text {Flat}}'_1( P))\\= & {} -\det (\widehat{\text {Flat}}'_1( P))=-q_1(P), \end{aligned}$$

and similarly \(q_1((13)(24)\cdot P)=q_1(P)\). Thus the squangle \(q_1\) spans a one-dimensional subspace under the action of the stabilizer \(\text {Stab}(T_1)\). In particular, \(q_1\) transforms as the \(\texttt {sgn}\) representation of \(\mathfrak {S}_4\) restricted to the stabilizer:

Theorem 4.4

The squangle \(q_1\) transforms as \(\texttt {sgn}\) under the action of the stabilizer \(\text {Stab}(T_1)\) defined by \(q_1(P)\mapsto q_1(\sigma \cdot P)=\text {sgn}(\sigma )q_1(P)\), for all \(\sigma \in \text {Stab}(T_1)\) and \(P\in U\).

Following the approach of Sumner and Jarvis (2009) for the DNA (four-state) case, this result provides an alternative route to establishing that the squangle \(q_1\) is a phylogenetic identity for quartet \(T_1\) (Theorem 4.2) using the notion of the clipped tensor, as follows. If \(P_1\) is a tree tensor corresponding to quartet \(T_1\), then it is clear that the clipped tensor \(\widetilde{P}_1\) is fixed under the (odd) permutation \((12)\in \text {Stab}(T_1) \), that is \((12)\cdot \widetilde{P}_1=\widetilde{P}_1\). Hence,

$$\begin{aligned} q_1(P_1)= & {} \lambda _1\lambda _2\lambda _3\lambda _4q_1(\widetilde{P}_1) =\lambda _1\lambda _2\lambda _3\lambda _4q_1((12)\cdot \widetilde{P}_1)\\= & {} \lambda _1\lambda _2\lambda _3\lambda _4\texttt {sgn}((12))q_1( \widetilde{P}_1)=-q_1(P_1), \end{aligned}$$

and we conclude that \(q_1(P_1)=0\) for all tree tensors \(P_1\) corresponding to quartet \(T_1\).

Now consider the following calculation:

$$\begin{aligned} q_2((12)\cdot P)= & {} -q_1((23)(12)\cdot P)\\= & {} -q_1((132)\cdot P)\\= & {} -q_1((12)(13)\cdot P)=q_1((13)\cdot P)=q_1((13)(24) (24)\cdot P)\\= & {} q_1((24)\cdot P)=-q_3(P). \end{aligned}$$

Here we have used the definition of \(q_2\) in the first equality, Theorem 4.4 in the fourth and sixth equalities, and the definition of \(q_3\) in the final equality. A similar calculation shows:

$$\begin{aligned} q_2((13)(24)\cdot P)=q_2(P). \end{aligned}$$

From this, we conclude:

Theorem 4.5

The squangles \(q_2\) and \(q_3\) transform under the action of the stabilizer \(\text {Stab}(T_1)\) as a signed permutation representation. Specifically, for all \(\sigma \in \text {Stab}(T_1)\) and \(P\in U\):

$$\begin{aligned} q_2(\sigma P)= \left\{ \begin{array}{c} q_2(P),\text { if }{} \texttt {sgn}(\sigma )=1;\\ -q_3(P), \text { if }{} \texttt {sgn}(\sigma )=-1. \end{array} \right. \end{aligned}$$
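Theorems 4.4 and 4.5 are easily checked numerically. The following sketch evaluates the squangles on a random tensor and on its images under the generators \((12)\) and \((13)(24)\) of \(\text {Stab}(T_1)\), using the convention \((\sigma \cdot P)_{i_1i_2i_3i_4}=P_{i_{\sigma (1)}i_{\sigma (2)}i_{\sigma (3)}i_{\sigma (4)}}\).

```python
import random
from itertools import product

# Sketch: stabilizer action on the squangles, checked on a random tensor.
S = 2 ** -0.5
H, HT = [[S, S], [-S, S]], [[S, -S], [S, S]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):   # Kronecker product of two 2x2 matrices
    return [[A[i // 2][j // 2] * B[i % 2][j % 2] for j in range(4)]
            for i in range(4)]

def det3(M):
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def q1_of_flat(F):
    Fp = matmul(matmul(kron(HT, HT), F), kron(H, H))
    return det3([row[1:] for row in Fp[1:]])

def flat1(p):
    return [[p[(r // 2, r % 2, c // 2, c % 2)] for c in range(4)]
            for r in range(4)]

def q1(p):
    return q1_of_flat(flat1(p))

def q2(p):   # q2(P) := -q1((23).P)
    return -q1({k: p[(k[0], k[2], k[1], k[3])] for k in p})

def q3(p):   # q3(P) := -q1((24).P)
    return -q1({k: p[(k[0], k[3], k[2], k[1])] for k in p})

def act12(p):     # (12).P
    return {k: p[(k[1], k[0], k[2], k[3])] for k in p}

def act1324(p):   # (13)(24).P
    return {k: p[(k[2], k[3], k[0], k[1])] for k in p}

random.seed(4)
p = {k: random.random() for k in product((0, 1), repeat=4)}
```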

In the next section, we will apply these results to construct quartet measures which explicitly satisfy Property I.

4.4 The measure and residual sum of squares

We are now ready to discuss specific examples of quartet inference measures \(\varDelta \). Given the expectation values in Table 2, we may construct a naive measure using the squangles as \(\varDelta (F) = (|q_1(F)|^\ell , |q_2(F)|^\ell , |q_3(F)|^\ell )\) for some integer \(\ell >0\). We note that Theorems 4.1, 4.4 and 4.5 suggest that this measure may satisfy Properties I, II (possibly in the strong form), and III. However, we need to consider the statistical situation carefully to establish this formally and, as we will see, this motivates us to consider a more sophisticated measure.

Firstly, we consider the signed expectations given in Table 2 and develop a residual sum of squares measure, analogous to that developed by Holland et al. (2013), that takes these signs into account. To keep the presentation self-contained, we revisit the derivation and then, after considering statistical bias correction under multinomial sampling, modify the quartet inference measure to produce one that satisfies Properties I, II (strong), and III, as described in Sect. 3.

Suppose we are interested in the hypothesis that the array of observed pattern frequencies occurs as a multinomial sample \(F\sim \text {MultiNom}(P_1,N)\) with \(P_1\) arising on quartet \(T_1\) under some fixed set of parameters. Evaluating the squangles on the array F, we see that our best estimate of the parameter \(u\ge 0\) is given by

$$\begin{aligned} \hat{u}= \left\{ \begin{array}{ll} \frac{1}{2}(q_3(F)-q_2(F)),&{}\text {if}\,\,q_3(F)> q_2(F);\\ 0,&{}\text {otherwise}. \end{array} \right. \end{aligned}$$

If \(\hat{u}>0\) then the residual sum of squares is

$$\begin{aligned} (q_2(F)+\hat{u})^2+(q_3(F)-\hat{u})^2=\frac{1}{2} (q_2(F)+q_3(F))^2=\frac{1}{2}q_1^2(F),\nonumber \end{aligned}$$

since \(q_1+q_2+q_3=0\). On the other hand, if \(\hat{u}= 0\), the residual sum of squares is \(q^2_2(F)+q_3^2(F)\). The residuals for the other two quartet hypotheses can be obtained similarly and all are presented in Table 3. These results exactly correspond to those given by Holland et al. (2013) for the DNA squangles case. Presently, we take these ideas further by considering issues of statistical bias to find an inference measure which, in expectation value, satisfies Properties I, II (strong), and III.
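For concreteness, the residual computation for the hypothesis \(T_1=12|34\) (expected signs \((0,-u,+u)\) with \(u\ge 0\)) can be sketched as follows; the squangle values are hypothetical, and in practice would be evaluated on the observed frequency array F.

```python
# Sketch of the least-squares residual for the quartet hypothesis T1.
def rss_T1(q2_val, q3_val):
    """Residual sum of squares for T1, clipping the estimate of u at 0."""
    u_hat = max(0.0, 0.5 * (q3_val - q2_val))
    return (q2_val + u_hat) ** 2 + (q3_val - u_hat) ** 2

# Case u_hat > 0: the residual collapses to q1^2 / 2 since q1 + q2 + q3 = 0.
q2v, q3v = -0.3, 0.5
q1v = -(q2v + q3v)
```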

Table 3 Residual sums of squares for each quartet hypothesis and possible ordering of squangle values

To motivate the discussion, assume \(F\sim \text {MultiNom}(P_1,N)\) where \(P_1\) is a distribution arising from quartet \(T_1\) and suppose \(q_2(F)\le q_1(F)\le q_3(F)\). Then, under the least squares approach, we have the residual sum of squares for each quartet hypothesis \(T_i\):

$$\begin{aligned} \varDelta (F) = (\text {RSS}_1,\text {RSS}_2,\text {RSS}_3)= \left( \textstyle {\frac{1}{2}}q_1^2(F),q_{1}^2(F)+q_{3}^2(F),q_{1}^2(F)+q_{2}^2(F)\right) . \end{aligned}$$

To ensure Property II (strong) we need the expected value of \(\varDelta \) for the situation \(F\sim \text {MultiNom}(P_1,N)\) to be proportional to the expected value of \(\varDelta \) for \(F'\sim \text {MultiNom}(g\cdot P_1,N)\) with \(g\in \times ^4\mathcal {M}_2\). However this is not true as the situation stands since

$$\begin{aligned} E[\varDelta ]&= E\left[ \left( \textstyle {\frac{1}{2}}q_1^2(F),\,q_{1}^2(F)+q_{3}^2(F),\,q_{1}^2(F)+q_{2}^2(F)\right) \right] \\ &\not \propto \left( \textstyle {\frac{1}{2}}q_1^2(P_1),\,q_{1}^2(P_1)+q_{3}^2(P_1),\,q_{1}^2(P_1)+q_{2}^2(P_1)\right) , \end{aligned}$$

given that \(q_{i}^2(F)\) provides a biased estimator of \(q_{i}^2(P_1)\) under multinomial sampling. We can, however, remedy this situation by computing unbiased estimators of the squares of the squangles. We denote these polynomials as \(S_i\), defined through the condition

$$\begin{aligned} E[S_i(F)]=q_i^2(P_1). \end{aligned}$$

Then we redefine our measure to be

$$\begin{aligned} \varDelta (F):=\left( \textstyle {\frac{1}{2}}S_1(F),S_1(F)+S_3(F),S_1(F)+S_2(F)\right) \end{aligned}$$

and it follows that

$$\begin{aligned} E[\varDelta (F')]=\det (g)^2E[\varDelta (F)], \end{aligned}$$

as required by Property II (strong).

We now discuss how to explicitly compute the unbiased forms \(S_i\). To simplify the presentation, we will denote the probabilities of distinct patterns ijkl using the symbols \(x_1,x_2,x_3\ldots \) and the corresponding site pattern counts \(f_{ijkl}\) using the symbols \(X_1,X_2,X_3\ldots \). The moment generating function for the multinomial distribution is then expressed as

$$\begin{aligned} f(s_1,s_2,s_3,\ldots ):=E[e^{s_1X_1+s_2X_2+s_3X_3+\ldots }]=(x_1e^{s_1}+x_2e^{s_2}+x_3e^{s_3}+\ldots )^N, \end{aligned}$$

so the expectation value of a monomial in the site pattern counts can then be computed via

$$\begin{aligned} E[X_{1}^{n_1}X_{2}^{n_2}X_{3}^{n_3}\ldots ]=\left. \frac{\partial ^{n_1}}{\partial s_1^{n_1}}\frac{\partial ^{n_2}}{\partial s_2^{n_2}}\frac{\partial ^{n_3}}{\partial s_3^{n_3}}\ldots f(s_1,s_2,\ldots )\right| _{s_1 = s_2 = s_3 = \ldots = 0}. \end{aligned}$$
(4.2)

As was alluded to in the proof of Theorem 4.3, complications arise when considering the expectation values of polynomials in the counts \(f_{ijkl}\). Even though the squangles are square free (as the explicit form given in Online Resource 1 shows), when we compute residuals according to Table 3 the relevant polynomials \(q_i^2(F)\) are no longer square free. However, we can at least say that each degree-six monomial term in \(q_i^2(F)\) contains no exponent higher than two. Thus we need only consider monomials of the form \(X_1^2X_2^2X_3^2\), \(X_1^2X_2^2X_3X_4\), \(X_1^2X_2X_3X_4X_5\), and \(X_1X_2X_3X_4X_5X_6\). Using (4.2), we found that, in each case, we can obtain bias-corrected forms by the simple replacement \(X_i^2\rightarrow X_i^2-X_i\). Indeed, following this through one finds:

$$\begin{aligned} E[(X_1^2-X_1)(X_2^2-X_2)(X_3^2-X_3)]= & {} N(N-1)\ldots (N-5)x_1^2x_2^2x^2_3,\\ E[(X_1^2-X_1)(X_2^2-X_2)X_3X_4]= & {} N(N-1)\ldots (N-5)x_1^2x_2^2x_3x_4,\\ E[(X_1^2-X_1)X_2X_3X_4X_5]= & {} N(N-1)\ldots (N-5)x_1^2x_2x_3x_4x_5,\\ E[X_1X_2X_3X_4X_5X_6]= & {} N(N-1)\ldots (N-5)x_1x_2x_3x_4x_5x_6. \end{aligned}$$

We then applied this process to each monomial in \(q_i^2(F)\) and divided by the combinatorial factor \(N(N-1)(N-2)\ldots (N-5)\) to produce \(S_i(F)\); inhomogeneous polynomials with the defining property \(E[S_i(F)]=q_i^2(P)\), as required.
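One of these factorial-moment identities can be verified exactly for a small multinomial by direct enumeration. The following sketch (with hypothetical \(N=6\) and category probabilities \(x_1,\ldots ,x_5\)) checks \(E[(X_1^2-X_1)(X_2^2-X_2)X_3X_4]=N(N-1)\ldots (N-5)\,x_1^2x_2^2x_3x_4\) using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Exact enumeration check of a bias-corrected sextic moment for
# (X1,...,X5) ~ MultiNom((x1,...,x5), N).
N = 6
x = [Fraction(1, 10), Fraction(1, 10), Fraction(1, 5),
     Fraction(1, 5), Fraction(2, 5)]

def pmf(counts):
    """Exact multinomial probability of an outcome vector."""
    prob = Fraction(factorial(N))
    for n, xi in zip(counts, x):
        prob = prob / factorial(n) * xi ** n
    return prob

lhs = Fraction(0)
for c in product(range(N + 1), repeat=5):
    if sum(c) == N:
        lhs += pmf(c) * (c[0] ** 2 - c[0]) * (c[1] ** 2 - c[1]) * c[2] * c[3]

falling = 1   # N(N-1)...(N-5)
for k in range(6):
    falling *= N - k
```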

Table 4 Our proposed optimal decision rule for using the squangles to infer quartet trees from binary sequence data

Our computations showed that the expansion of each \(q_i^2\) has 4008 monomial terms, whereas the \(S_i\) each have 6688 terms. This is a significant computational complication, since \(q_i^2(F)\) can be computed efficiently by simply squaring \(q_i(F)\) (recall each \(q_i\) has only 96 monomial terms). However, the explicit polynomial form of each \(S_i\) need only be computed once (we did so in Mathematica (Wolfram Research, Inc 2010)); thereafter, repeated numerical evaluation presents no great computational obstruction. We have included the explicit polynomial form of the \(S_i\) in Online Resource 2.

With the unbiased forms \(S_i\) in hand, we found that the best performing quartet inference method obtainable is as described by the pseudocode in Table 4. We close this section with the conclusion:

Theorem 4.6

The quartet inference measure and decision rule described in Table 4 satisfy Properties I, II (strong), and III (see Table 1 for definitions).

Proof

The result follows from \(E[S_i(F)]=q_i^2(P)\) and Theorems 4.4, 4.5, 4.1, and 4.3. \(\square \)

5 The edge identities

In this section we discuss the behaviour of the edge identities (defined in Sect. 2.6) in terms of Properties I, II and III.

5.1 Property I

As we saw in Sect. 2.6, in the context of quartet trees and binary sequence data, the edge identities are the cubic minors of the three flattenings \(\text {Flat}_1(P)\), \(\text {Flat}_2(P)\) and \(\text {Flat}_3(P)\). For the purpose of this discussion, we denote the (ij) cubic minor of the flattening \(\text {Flat}_1(P)\) as \(m_{ij}(\text {Flat}_1(P))\), or simply as \(m_{ij}\).

We begin our discussion of Property I for the edge identities with a focus on the action of the stabilizer subgroup, \(\text {Stab}(T_1)\), of the quartet \(T_1 = 12|34\). This includes uncovering the exact representation of \(\text {Stab}(T_1)\) acting on the minors.

From the results of Sect. 4.3, we find that \(\text {Stab}(T_1)\) acts on the set of 16 cubic minors by signed permutations. Specifically, for all \(i,j=1,2,3,4\):

$$\begin{aligned} \begin{array}{ll} m_{1j}(K\text {Flat}_1(P))=-m_{1j}(\text {Flat}_1(P));&{} \phantom {++}m_{2j}(K\text {Flat}_1(P))=m_{3j}(\text {Flat}_1(P));\\ m_{3j}(K\text {Flat}_1(P))=m_{2j}(\text {Flat}_1(P));&{} \phantom {++}m_{4j}(K\text {Flat}_1(P))=-m_{4j}(\text {Flat}_1(P)); \end{array} \end{aligned}$$

and

$$\begin{aligned} m_{ij}(\text {Flat}_1(P)^T)=m_{ji}(\text {Flat}_1(P)). \end{aligned}$$

Thus we see that the minors break up into six (signed) orbits under this action:

$$\begin{aligned}&\{m_{11}\}, \{m_{12}, m_{21}, m_{31}, m_{13}\}, \{m_{14}, m_{41}\}, \{m_{22}, m_{32}, m_{23}, m_{33}\},\\&\quad \{m_{24}, m_{34}, m_{42}, m_{43}\}, \{m_{44}\}. \end{aligned}$$

Taking a multinomial sample \(F\sim \text {MultiNom}(P,N)\) and fixing an orbit, we see that the sum of squares \(\sum _{m_{ij}\in \text {orbit}}|m_{ij}(F)|^2\) is explicitly invariant under the action of \(\text {Stab}(T_1)\). If we define \(\varDelta _1\) to be this sum and analogously define \(\varDelta _2\) and \(\varDelta _3\), then \(\varDelta (F) := (\varDelta _1, \varDelta _2,\varDelta _3)\) is a quartet measure that satisfies Property I.

However, we could instead have used certain linear combinations of the minors from each orbit and still produced polynomials invariant under \(\text {Stab}(T_1)\); the analogous polynomials together then form a quartet measure satisfying Property I. For example, we could define \(\varDelta _1\) to be \(|m_{11}(F)+m_{44}(F)|^2+|m_{11}(F)-m_{44}(F)|^2\). In general, it is possible to take linear combinations of the minors which transform as one-dimensional linear representations of the stabilizer \(\text {Stab}(T_1)\) and then take sums of squares thereof as an inference measure (this can be done systematically using the methods developed by Sumner and Jarvis (2009)). We performed this analysis but omit the details here, since we found no particular gain in statistical power over the straightforward sum of squares described above.

The theoretical conclusion is that leaf permutation symmetries alone are not enough to uniquely determine a choice of measure constructed from the minors. Further, there is no reason at all to expect this measure will satisfy Property II in the weak or strong form. This discussion does however raise the natural question of whether there perhaps exists a linear combination of minors which satisfies both Properties I and II. Of course, this linear combination is exactly the (binary) squangle constructed in the previous section. Importantly, this fact is specific to this binary case, and in Sect. 7 and the appendix we discuss why this is a special feature restricted to the binary Markov model.

5.2 Signs for the edge identities

We now turn to exploring Property III in relation to the edge identities. The results presented in this section will only be valid under a continuous-time Markov process.

Explicit computation shows that, as a polynomial in the variables \(P=(p_{ijkl})_{i,j,k,l\in \{0,1\}}\), each minor of the flattening \(\text {Flat}_1(P)\) can be expressed as a linear combination of a minor of \(\text {Flat}_2(P)\) with a minor of \(\text {Flat}_3(P)\). We present these relationships in Table 5. We stress that these are algebraic relationships between the minors as polynomials in the variables \((p_{ijkl})\), valid for all tensors P.

If we fix a tensor \(P_1\) arising from \(T_1\), we see that we may re-express the vanishing of a given minor of \(\text {Flat}_1(P_1)\) as an equality between the corresponding minors in \(\text {Flat}_2(P_1)\) and \(\text {Flat}_3(P_1)\). Importantly, it then turns out that (under mild conditions discussed shortly) there exists a positive parameter u (whose exact value depends on the specific choice of the model parameters defining \(P_1\)) such that the value of the respective minors is either \(+u\) or \(-u\). The relevant signs are also provided in Table 5.

Taking this sign information into account leads to an important modification of the quartet inference measure obtained from the edge identities. This is implemented in the least squares framework, with residual sums of squares exactly analogous to the signed squangles described in Table 3. We now describe the conditions that lead to this additional sign information.

The (mild) condition we impose is that the \(2\times 2\) Markov matrices M on the leaves of the quartet tree have positive determinant: \(\det (M)>0\). This is a biologically reasonable condition, since evolutionary times are generally short enough that the probability of a substitution is smaller than the probability of no substitution. It also holds automatically under a continuous-time implementation of the underlying Markov process, where \(M=e^{Qt}\) for some rate matrix Q and hence \(\det (M)=e^{\text {tr}(Qt)}>0\).

Assuming this condition, the inverses of such Markov matrices have entries with signs given by:

$$\begin{aligned} M^{-1}= \left( \begin{array}{cc} + &{} - \\ - &{} + \end{array} \right) . \end{aligned}$$

Hence, if we take a Kronecker product of two such matrices we have the signed form:

$$\begin{aligned} M^{-1}\otimes M^{-1}= \left( \begin{array}{cccc} + &{} - &{} - &{} + \\ - &{} + &{} + &{} - \\ - &{} + &{} + &{} - \\ + &{} - &{} - &{} + \end{array} \right) . \end{aligned}$$
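These sign patterns are easy to verify numerically; the following sketch (with illustrative parameter values of our choosing) checks them for a generic \(2\times 2\) Markov matrix with positive determinant:

```python
import numpy as np

# A 2x2 Markov matrix with substitution probabilities below 1/2,
# so that det(M) = 1 - a - b > 0 (values are illustrative)
a, b = 0.1, 0.2
M = np.array([[1 - a, a],
              [b, 1 - b]])
assert np.linalg.det(M) > 0

# sign pattern of the inverse: [[+, -], [-, +]]
Minv = np.linalg.inv(M)
E = np.array([[1, -1], [-1, 1]])
assert (np.sign(Minv) == E).all()

# the Kronecker product of two such inverses has the 4x4 pattern above
assert (np.sign(np.kron(Minv, Minv)) == np.kron(E, E)).all()
```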

Arguing as we did in Sect. 4.1, taking a clipped tensor \(\widetilde{P}_1\) arising on \(T_1\), we have

$$\begin{aligned} \text {Flat}_2(\widetilde{P}_1)=\text {Flat}_3(\widetilde{P}_1)= \left( \begin{array}{cccc} x &{} 0 &{} 0 &{} 0 \\ 0 &{} y &{} 0 &{} 0 \\ 0 &{} 0 &{} z &{} 0 \\ 0 &{} 0 &{} 0 &{} w \end{array} \right) , \end{aligned}$$

where \(x,y,z,w>0\). Organizing the minors of these flattenings into the corresponding cofactor matrices (that is, the \(4\times 4\) matrix whose (i, j) entry is \((-1)^{i+j}\) times the (i, j) minor), we have

$$\begin{aligned} \text {Cof}(\text {Flat}_2(\widetilde{P}_1))= \text {Cof}(\text {Flat}_3(\widetilde{P}_1))= \left( \begin{array}{cccc} yzw &{} 0 &{} 0 &{} 0 \\ 0 &{} xzw &{} 0 &{} 0 \\ 0 &{} 0 &{} xyw &{} 0 \\ 0 &{} 0 &{} 0 &{} xyz \end{array} \right) . \end{aligned}$$

Recalling that the cofactor matrix can be expressed as

$$\begin{aligned} \text {Cof}(A)=\det (A){A^{-1}}^{T}, \end{aligned}$$

it follows that the cofactor matrix is multiplicative: \(\text {Cof}(AB)=\text {Cof}(A)\text {Cof}(B)\). From the above expressions we may conclude that the cofactor matrices of the flattenings have, for any \(P_1=M_1\otimes M_2\otimes M_3\otimes M_4\cdot \widetilde{P}_1\) arising on quartet \(T_1\), the signed form:

$$\begin{aligned} \text {Cof}(\text {Flat}_2(P_1))= & {} \text {Cof}(M_1\otimes M_3)\cdot \text {Cof}(\text {Flat}_2(\widetilde{P}_1))\cdot \text {Cof}((M_2\otimes M_4)^T)\\= & {} \left( \begin{array}{cccc} + &{} - &{} - &{} + \\ - &{} + &{} + &{} - \\ - &{} + &{} + &{} - \\ + &{} - &{} - &{} + \end{array} \right) , \end{aligned}$$

and, similarly, the same result holds for the signed form of \(\text {Cof}(\text {Flat}_3(P_1))\).
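The cofactor identity, its multiplicativity, and the resulting sign pattern can all be checked numerically; a short sketch (function names and parameter values are ours, and the clipped-tensor entries x, y, z, w are illustrative):

```python
import numpy as np

def cof(A):
    """Cofactor matrix via Cof(A) = det(A) * (A^{-1})^T (invertible A)."""
    return np.linalg.det(A) * np.linalg.inv(A).T

rng = np.random.default_rng(1)
A, B = rng.random((4, 4)), rng.random((4, 4))
# multiplicativity: Cof(AB) = Cof(A) Cof(B)
assert np.allclose(cof(A @ B), cof(A) @ cof(B))

# cofactor matrix of the diagonal flattening of the clipped tensor
x, y, z, w = 0.2, 0.3, 0.1, 0.4
D = np.diag([x, y, z, w])
assert np.allclose(cof(D), np.diag([y*z*w, x*z*w, x*y*w, x*y*z]))

# sign pattern of Cof(M1 (x) M3) . Cof(D) . Cof((M2 (x) M4)^T) for
# Markov matrices with positive determinant (illustrative parameters)
def markov(p, q):
    return np.array([[1 - p, p], [q, 1 - q]])

M1, M2 = markov(0.10, 0.20), markov(0.05, 0.15)
M3, M4 = markov(0.20, 0.10), markov(0.12, 0.08)
C = cof(np.kron(M1, M3)) @ cof(D) @ cof(np.kron(M2, M4).T)
E = np.array([[1, -1], [-1, 1]])
assert (np.sign(C) == np.kron(E, E)).all()
```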

We have used this result to produce the sign information given in Table 5. The information in this table can be used to produce a measure that we refer to as the “signed minor”, with the specific algorithm described in Table 6.

Table 5 Algebraic relationships between the minors of the three flattenings
Table 6 The “signed minor” method for computing the measure \(\varDelta _1\) for tree \(T_1\)

6 Simulation study

6.1 Phylogenetic methods tested

We conducted a comprehensive simulation study to compare the accuracy of the inference methods described in this paper. To facilitate the discussion we use the following abbreviations:

  • The measure formed from the binary squangles without the residual sum of squares decision rule (Sect. 3). In other words, compute the squangles \(q_1\), \(q_2\), and \(q_3\) and return the tree with the squangle that is closest to zero: “unsigned squangles” or US;

  • The binary squangles using the residual sum of squares decision rule given in Table 3: “signed squangles” or SS;

  • The binary squangles with the residual sum of squares decision rule and corrected for bias in the estimators, as described in Table 4: “bias-corrected signed squangles” or CSS;

  • The measure \(\varDelta _k = \sum _{i,j=1}^{4} m_{ij}(\text {Flat}_k(F))^2\) formed from the sum of squares of the 16 matrix minors (the edge identities) without the residual sum of squares decision rule: “unsigned minors” or UM;

  • The measure formed from the 16 matrix minors (the edge identities) with the residual sum of squares decision rule, as described in Table 6: “signed minors” or SM;

  • Neighbor-joining on distances that have been corrected for multiple substitutions using the formula \(d_{cor} = -0.5 \log (1 - 2d_{obs})\), where \(d_{obs}\) is the proportion of sites that differ between two aligned sequences: “neighbor-joining” or NJ;

  • Neighbor-joining on distances that have not been corrected for multiple substitutions: “uncorrected neighbor-joining” or UNJ;

  • The method proposed by Eriksson (2008) that is based on singular value decomposition of tensor flattenings: “ErikSVD”;

  • The subsequent modification to the ErikSVD method proposed by Fernández-Sánchez and Casanellas (2015) that normalises the tensor flattening before applying singular value decomposition: “Erik+2”.
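For concreteness, the distance correction used by NJ above (the standard two-state symmetric-model formula) can be implemented as a small sketch:

```python
import math

def corrected_distance(d_obs):
    """Two-state symmetric-model correction for multiple substitutions;
    valid only for d_obs < 0.5, where the logarithm is defined."""
    return -0.5 * math.log(1 - 2 * d_obs)

# small observed distances are barely changed, larger ones are stretched
assert abs(corrected_distance(0.01) - 0.01) < 1e-3
assert corrected_distance(0.3) > 0.3
```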

Where known, the statistical properties of these methods are summarised in Table 7.

Table 7 Properties of the quartet inference methods discussed in this work (see Table 1 for definitions)

6.2 Generation of simulated data

All the simulations use a continuous time, symmetric Markov model. Edge length parameters correspond to the probability of a change along an edge (as opposed to the expected number of changes). Data were simulated on one of four types of tree: “Felsenstein”, “Farris”, “balanced”, or “unbalanced star” (Fig. 1). These trees were chosen as they have been widely studied in the literature concerning the accuracy of different phylogenetic methods (Felsenstein 1978; Huelsenbeck and Hillis 1993; Swofford et al. 2001). Many methods are known to be biased on these tree shapes, either (i) negatively, towards inferring an incorrect tree (the Felsenstein shape), or (ii) positively, towards inferring the correct tree (the Farris shape).

Fig. 1
figure 1

Left to right “Felsenstein tree”, “Farris tree”, “balanced tree”, and “unbalanced star”

We conducted three different sets of simulations. For all scenarios we simulated 1000 trees for each parameter combination and recorded how many were correctly inferred.

The first set of simulations explored the effect of sequence length on accuracy of the different methods. Data were simulated on each type of tree. Long branches had a 0.3 probability of a change and short branches had a 0.05 probability of a change. The sequence length was varied from 50 to 1600 in steps of 50.

The second set of simulations explored the effect of internal branch length on accuracy of the different methods. Data were simulated on each type of tree excluding the star tree. Long pendant branches had a 0.3 probability of a change and short pendant branches had a 0.05 probability of a change. The internal branch length was varied from 0 to 0.1 in steps of 0.01. Sequence length was fixed at 800 characters.

The third set of simulations focused solely on the “Felsenstein” tree and explored the effect of varying both the length of the two long branches and the length of the three short branches. The short branch length was varied from 0.01 to 0.1 in steps of 0.01. The long branch length was varied from 0.1 to 0.4 in steps of 0.03. Sequence length was fixed at 400.
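The generating process can be sketched as follows (a minimal illustration under the two-state symmetric model; the tree wiring and all names are ours, not the paper's implementation — here the long branches are placed on opposite sides of the internal edge, as in the “Felsenstein” shape):

```python
import numpy as np

def simulate_quartet(n_sites, p_leaves, p_internal, rng):
    """Simulate binary site patterns on the quartet 12|34 under the
    symmetric model; each p is the probability of a change on that edge."""
    def evolve(states, p):
        # flip each site independently with probability p
        return states ^ (rng.random(n_sites) < p)

    root = rng.integers(0, 2, n_sites)   # uniform root distribution
    other = evolve(root, p_internal)     # second internal node
    leaf1, leaf2 = evolve(root, p_leaves[0]), evolve(root, p_leaves[1])
    leaf3, leaf4 = evolve(other, p_leaves[2]), evolve(other, p_leaves[3])
    return np.stack([leaf1, leaf2, leaf3, leaf4])   # shape (4, n_sites)

rng = np.random.default_rng(0)
# "Felsenstein"-style parameters: long (0.3) and short (0.05) branches
data = simulate_quartet(800, [0.3, 0.05, 0.3, 0.05], 0.05, rng)

# empirical pattern frequency tensor p_{ijkl}
F = np.zeros((2, 2, 2, 2))
for site in data.T:
    F[tuple(site)] += 1
F /= data.shape[1]
```

Repeating such a simulation 1000 times per parameter combination and recording how often each method recovers the generating quartet reproduces the kind of accuracy curves discussed below.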

6.3 Simulation results

We present results for a subset of the methods and generating trees. The full simulation results, including heat maps, are available in Online Resource 3.

The results of the first set of simulations on the “Felsenstein” tree are presented in Fig. 2. The unsigned variants both perform poorly, while SM, SS, and NJ perform roughly equally well, and CSS, the bias-corrected squangles, is the most accurate. The results on the “Farris” tree are presented in Fig. 3. For this tree the unsigned variants are the most accurate, particularly for shorter sequence lengths; UM is more than 80% accurate even for sequence lengths of 50. SM, SS, and NJ perform roughly equally well, and CSS, the bias-corrected squangles, is the least accurate. For the “balanced” tree (Fig. 4), all the signed methods and NJ performed about equally and were much more accurate than the unsigned methods and the methods based on singular value decomposition.

Fig. 2
figure 2

Accuracy of nine different phylogenetic methods for data simulated under varying sequence lengths on the “Felsenstein” tree (short branch lengths 0.05 and long branch lengths 0.3). The performance of NJ and SS is almost indistinguishable

Fig. 3
figure 3

Accuracy of nine different phylogenetic methods for data simulated under varying sequence lengths on the “Farris” tree (short branch lengths 0.05 and long branch lengths 0.3). Note that the performance of NJ and SS is almost indistinguishable. The high accuracy of some methods reflects bias towards inferring the correct tree (cf. the results for the “unbalanced star” in Fig. 5)

Fig. 4
figure 4

Accuracy of nine different phylogenetic methods for data simulated under varying sequence lengths on the balanced tree (internal branch length 0.05 and pendant branch lengths 0.3). Performance of the methods SM, CSS, SS, NJ and UNJ is almost indistinguishable and better than the performance of UM, US, ErikSVD and Erik+2

Fig. 5
figure 5

Performance of nine different phylogenetic methods for data simulated under varying sequence lengths on the unbalanced star tree (short branch lengths 0.05 and long branch lengths 0.3). The dashed horizontal line at 333.3 indicates ideal performance of an unbiased method

The simulations on the “unbalanced star” tree give us a more explicit opportunity to investigate the effect of (positive) bias inherent in the results for the Farris tree (Fig. 3). If a method is unbiased it should have no preference for any one quartet (and hence return each quartet roughly 1/3 of the time). To investigate this, the number of times a method returned the tree grouping the two long edges together was recorded (Fig. 5). UNJ is by far the most biased method, followed by UM, which returns the tree that pairs the two long branches over 80% of the time, and US, which returns this tree about 65% of the time. SM, SS and NJ are all less biased, but still return the tree that pairs the two long branches about 40% of the time. CSS is the only method tested that appears to be unbiased.

Over all the simulation scenarios, the performance of our binary (two-state) implementations of ErikSVD and Erik+2 was relatively poor. This is in contrast to the excellent performance of this approach on four-state data reported by Fernández-Sánchez and Casanellas (2015). It is not immediately obvious why these methods are less effective on binary sequences.

The results of the second set of simulations on the “Felsenstein” tree are presented in Fig. 6. The unsigned variants both perform relatively poorly, SM, SS and NJ perform roughly equally well, and CSS is the most accurate for all internal branch lengths tested.

Fig. 6
figure 6

Accuracy of nine different phylogenetic methods for varying internal edge lengths on the “Felsenstein” tree (short pendant branch lengths 0.05 and long pendant branch lengths 0.3). The internal branch length was varied from 0 to 0.1 in steps of 0.01. Sequence length was fixed at 800 characters

The results of the third set of simulations are presented as a series of heat maps in the Online Resource 3. Averaged over all the combinations of short and long branch lengths tested the methods ranked as follows for accuracy: CSS (81.4%), NJ (77.4%), SS (77.4%), SM (76.7%), Erik+2 (71.8%), US (56.1%), ErikSVD (56.0%), UM (52.1%), UNJ (40.3%).

Overall, our results indicate that using a method (CSS) with Properties I, II and III provides a highly accurate and unbiased method of quartet topology inference. The moderately improved performance of CSS relative to SS shows the benefit of correcting for the bias in the squangles, but also that the squangles can still reasonably be used without this correction. For more on the future of bias-correcting methods based on Markov invariants beyond binary state models, see Sect. 7.

7 Discussion and future work

The above analysis focuses exclusively on the binary case \(k = 2\). However, biologists are usually interested in studies where \(k = 4\), the DNA case. Here we discuss the extension of the above results to \(k = 4\).

7.1 The minors

In the Appendix (Online Resource 4) we establish that, when \(k = 2\), the 48 minors (16 minors from each of the 3 flattenings) form a 32-dimensional invariant subspace under the action of \(\text {GL}(4)\times \text {GL}(4)\) (expressed as left and right matrix multiplication). Further, in this scenario the binary squangles are elements of this invariant subspace and thus occur as certain linear combinations of the minors.

For the DNA case of \(k = 4\), similar representation theoretic arguments (as given in the Appendix, Online Resource 4) establish that the invariant subspace formed from the minors of the flattenings (now \(5\times 5\) minors of \(16\times 16\) matrices) does not contain any Markov invariants. This happens because the rank conditions on flattenings are invariant under the action of \(\text {GL}(16)\times \text {GL}(16)\) (again expressed as left and right matrix multiplication), whereas the Markov invariants are valid only under the two-step subgroup restriction:

  1. \(\text {GL}(16)\times \text {GL}(16)\) to \(\times ^4\text {GL}(4)\equiv \left( \text {GL}(4)\times \text {GL}(4)\right) \times \left( \text {GL}(4)\times \text {GL}(4)\right) \);

  2. each copy of \(\text {GL}(4)\) thereof to the Markov matrices \(\mathcal {M}_4\).

For \(k = 2\), it turns out there are so few possible invariant subspaces that the Markov invariants (binary squangles) happen to be in the subspace of polynomials spanned by the minors, i.e. they are linear combinations of the minors. For \(k = 4\), the minors and DNA squangles lie in distinct \(\times ^4 \text {GL}(4)\) invariant subspaces and it follows there is no linear combination of minors forming a Markov invariant (see the Appendix, Online Resource 4) and hence no chance a quartet inference measure formed from the minors can be made to satisfy Property II (strong).

Theorem 7.1

The DNA squangles \((k = 4)\) do not occur as linear combinations of minors of flattenings. As a consequence, there is no quartet inference measure based on minors and edge identities satisfying Property II (weak or strong).

Proof

See Appendix, Online Resource 4. \(\square \)

On the other hand, for any k the minors will continue to transform as a signed permutation representation of the relevant stabilizer subgroups so it is no problem to ensure Property I by a suitable choice of measure (any sum of squares of an orbit under the stabilizer subgroup will do).

Additionally, assuming that the relevant Markov matrices are sufficiently close to the identity matrix, similar arguments to those given in Sect. 5.2 can be made to determine the signs for the edge identities on quartets for any k (we leave the details of this for future work).

7.2 The squangles

As described by Holland et al. (2013), the theory we presented here to build the basic residual sum of squares rule for \(k = 2\) extends to \(k = 4\), providing a signed residual sum of squares rule for DNA data. However, this derivation does not include the computation of the unbiased forms \(S_i\) of the squares of the DNA squangles. While the representation-theoretic arguments for the existence of the Markov invariants are similar, their construction is more complicated and there is no known way to compute them as minors of a transformed flattening. We emphasize that the representation theory showing our tree measure has Property II (strong) extends to show there is such a measure for \(k = 4\), as does the behaviour under taxon permutations ensuring Property I.

An important addition to this paper over the results given by Holland et al. (2013) is our discussion of unbiased estimators of the parameters involved in the decision rule as discussed in Sect. 3. Since the binary squangles have relatively few terms, computing the unbiased forms of their squares is feasible by explicit squaring and bias correcting term by term. In particular, the binary squangles are cubic and square-free and thus we know that the only correction we need to make is for squared variables. However, for \(k = 4\) the DNA squangles are degree 5 polynomials in 256 variables with 66,744 terms each. Additionally, and most importantly, their explicit polynomial form is only known in a non-standard basis analogous to the change of basis used at the start of Sect. 4 (details are given by Sumner and Jarvis (2009)). Given that these polynomials would need to be squared and transformed to the natural (probability) basis, we consider the development of an unbiased square of the squangles for \(k = 4\) a challenging open problem.
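As a toy illustration of the term-by-term correction used in the binary case (for a single squared proportion, not the squangles themselves; names and values are ours), the standard unbiased estimator of \(p^2\) from a binomial count replaces \(\hat{p}^2 = (x/n)^2\) with \(x(x-1)/(n(n-1))\):

```python
import numpy as np

def unbiased_p_squared(x, n):
    """Unbiased estimator of p^2 from a count x ~ Binomial(n, p),
    using E[x(x-1)] = n(n-1) p^2."""
    return x * (x - 1) / (n * (n - 1))

rng = np.random.default_rng(2)
p, n = 0.3, 50
xs = rng.binomial(n, p, size=200_000)

naive = np.mean((xs / n) ** 2)                  # biased upwards by p(1-p)/n
corrected = np.mean(unbiased_p_squared(xs, n))  # centred on p^2
assert abs(corrected - p**2) < abs(naive - p**2)
```

Correcting the full DNA squangles amounts to doing this kind of replacement simultaneously across tens of thousands of degree-5 monomials, which is why we regard it as an open problem.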

Open Problem: Compute unbiased forms for the DNA squangles.

We close by pointing out that the SVD approach can easily be argued to satisfy Property I, but it certainly does not satisfy Property II, and it is not at all clear how to construct a correspondingly unbiased version of it.