1 Introduction

Most of the commonly implemented models in molecular phylogenetics are based on the continuous-time Markov assumption. For these models, molecular substitution events (along an edge of a phylogenetic tree) are ruled by substitution rates. For DNA models—where the state space consists of the four nucleotides adenine, cytosine, guanine and thymine—12 substitution rates must be specified for each edge of the evolutionary tree, and the precise characteristics of the process are fixed by constraints on these rates. These constraints define a space of parameter values where each point corresponds to unknown evolutionary quantities such as base composition and mutation rate. As in all applied statistics, there is a trade-off between more complex, realistic models, and simpler, tractable models: complex models can provide very close fits to the observed data, but are more vulnerable to random error. A standard assumption in molecular phylogenetics is to work with homogeneous Markov chains, where the substitution rates are assumed to be constant in time.

The motivation behind our previous work (Sumner et al. 2012a) was to consider the consequences of allowing for some change in individual substitution rates which may well have occurred independently across the evolutionary history. With this perspective, the evolutionary process can still be modelled as a continuous-time Markov chain, but we must allow the process to be inhomogeneous, where the rates are allowed to vary as a function of time throughout the evolutionary history. This leads to considering evolutionary model classes that are “locally multiplicatively closed”, that is, models where the product of substitution matrices is still in the model as long as they are sufficiently close to the identity. For such models, it is possible to interpret the time average of their inhomogeneous behaviour as a homogeneous process within the same model class. Many oft-used models, such as the general time-reversible model (Tavaré 1986; Posada and Crandall 1998), do not satisfy this property, and this deficiency poses a problem for phylogenetic analysis both in flexibility of interpretation and as a potential source of model misspecification (Sumner et al. 2012). For a locally multiplicatively closed model, under some time restrictions it is possible to model evolutionary processes homogeneously, by interpreting the fitted substitution rates as an “average” of the true inhomogeneous process occurring on each branch of the tree. Sumner et al. (2012a) presented sufficient conditions for local multiplicative closure of continuous-time Markov chains, and this led directly to the concept of Lie Markov models. These models arise when we demand that the set of rate matrices of the model form a Lie algebra. This is a technical condition guaranteeing that the corresponding set of substitution probability matrices will be locally multiplicatively closed, as desired. Moreover, we will show that this condition is actually necessary (see Theorem 1). 
Mathematically, Lie Markov models can be regarded as a generalisation of other model classes, such as equivariant models introduced by Draisma and Kuttler (2008) or group-based models (Semple and Steel 2003; Michałek 2011; Donten-Bury and Michałek 2012).

Sumner et al. (2012a) discussed the symmetry properties of DNA models to nucleotide permutations, and noted the statistical relevance of these symmetries to likelihood calculations. The main result of that paper was a procedure to generate multiplicatively closed Markov models with a prescribed symmetry, which has desirable properties in terms of model selection. For instance, a biologist may wish that candidate models do not provide any natural groupings of nucleotides, and hence \({\mathfrak {S}}_4\) symmetry—i.e. the symmetry group of all possible nucleotide permutations—is appropriate. It is then a matter of choosing how many free parameters are appropriate for the given data set. The complete hierarchy of Lie Markov models with \({\mathfrak {S}}_4\) symmetry was derived by Sumner et al. (2012a).

In this paper, we deal with the case of locally multiplicatively closed Markov models whose symmetry is consistent with the grouping of nucleotides into purines and pyrimidines, i.e. \(AG\mid CT=\{\{A,G\},\{C,T\}\}\). As will be discussed, this motivates us to produce and examine the Lie Markov models with symmetry governed by the permutation subgroup of \({\mathfrak {S}}_4\) that preserves the purine/pyrimidine grouping:

$$\begin{aligned} {\mathcal {G}}:=\{e,(AG),(CT),(AG)(CT),(AC)(GT),(AT)(CG),(ACGT),(ATGC)\}, \end{aligned}$$

where \(e\) is the identity, or “do nothing”, permutation.

We will also go further than Sumner et al. (2012a) by exploring the definition of these models and investigating the geometrical properties that arise naturally when we deal with the tension between the algebraic formalism of Lie groups, where one works over the complex field, and the stochastic constraints of Markov models, where parameter values are constrained to be real and positive. In particular, we discuss the geometric embedding of the stochastic rate matrices within the vector space of complex rate matrices. These considerations motivate our definition of the stochastic cone of a Lie Markov model. Besides its geometrical interest, the stochastic cone is the set of stochastic rate matrices of the model and in a practical context is actually the main object of interest. We plan to discuss implementation and performance of the models we present here in a future publication.

Although our presentation focuses on the purine/pyrimidine grouping \(AG|CT\), given the appropriate nucleotide permutation, exactly the same hierarchy of models would arise if we were to consider the grouping \(AC|GT\), or the grouping \(AT|GC\). The reader should note that choosing the grouping \(AT|GC\) would give the classification of all Lie Markov models that preserve complementation \(A\leftrightarrow T,\; C\leftrightarrow G\) (see Yap and Pachter 2004). In particular, the “strand-symmetric” model defined by Casanellas and Sullivant (2005) arises in this way from our Model 6.6 (see Table 1). The conversion of our hierarchy of models from the \(AG|CT\) grouping to the \(AT|GC\) grouping would follow by simultaneously permuting the \(G\) and \(T\) rows and columns of the rate matrices in each model.

Table 1 Some Lie Markov models with purine/pyrimidine symmetry that may have special interest for biologists

In Sect. 2 we recall some of the basic definitions and tools introduced by Sumner et al. (2012a). We revisit the definition of Lie Markov models, and introduce the concept of the stochastic cone of a Lie Markov model. We also recall the basic results on group theory and representation theory necessary for the development of our results. In Sect. 3 we recall the idea of a Lie Markov model with prescribed symmetry given by a permutation group \(G\). We introduce the ray-orbits of the corresponding stochastic cone, which are the orbits under the action of \(G\) of the rays of the stochastic cone. In Sect. 4, we take \(G={\mathcal {G}} \), decompose the space of rate matrices as a \({\mathcal {G}}\)-module and provide a basis consistent with this decomposition. We also determine the isomorphism classes of possible \({\mathcal {G}}\)-orbits and the decomposition of their (abstract) span into irreducible modules. In Sect. 5, we give the whole list of Lie Markov models with purine/pyrimidine symmetry. Each model is given by exhibiting a basis of the corresponding space of matrices as well as the ray-orbits of its stochastic cone. Among these models, we obtain the Jukes–Cantor model, the Kimura models with two or three parameters, the general Markov model and a number of new models that may have special interest for biologists. Some of these are shown in Table 1 as an appetizer before the whole list, which itself can be found in explicit form as Supplementary Material online. Finally, in the conclusions we discuss implications and possibilities for future research.

2 Preliminaries

Throughout this section, we will recall some definitions and basic facts from Sumner et al. (2012a), which we also refer to for some proofs. We keep the assumptions and the notation already introduced there. In particular, we work over the complex field \({\mathbb {C}}\), and for simplicity refer to a matrix as “Markov” if the entries in each column sum to one. Later we will discuss how to specialise to the stochastic case where the entries must be real numbers in the range \([0,1]\). This will lead us to considering the stochastic cone of the Lie Markov model, which will be the set of real rate matrices with non-negative entries outside the diagonal.

We define the general Markov model \({\mathfrak {M}}_{GM}\) as the set of \(n\times n\) matrices whose columns sum to one:

$$\begin{aligned} {\mathfrak {M}}_{GM}:=\left\{ M\in {\mathbb {M}}_n({\mathbb {C}}) : \varvec{\theta }^T M =\varvec{\theta }^T \right\} , \end{aligned}$$

where \(\varvec{\theta }\) is the column \(n\)-vector with all its entries equal to \(1\), i.e. \(\varvec{\theta }^T=(1,1,\ldots ,1)\).

Recall that, in a homogeneous continuous-time Markov chain, the corresponding Markov matrices occur as exponentials \(M=e^{Qt}\), where \(Q\) is a “rate matrix” and \(t\) is time elapsed. We write

$$\begin{aligned} {\mathfrak {L}}_{GM}:=\left\{ Q\in {\mathbb {M}}_n({\mathbb {C}}) : \varvec{\theta }^T Q =\mathbf{0}^T \right\} \end{aligned}$$

to indicate the set of all (complex) rate matrices. We refer to a Markov matrix \(M\in {\mathfrak {M}}_{GM}\), or a rate matrix \(Q\in {\mathfrak {L}}_{GM}\), as “stochastic” if its off-diagonal elements are real and non-negative.

Under matrix multiplication, the set

$$\begin{aligned} GL_1(n,{\mathbb {C}}):=\left\{ M\in {\mathbb {M}}_n({\mathbb {C}}) : \varvec{\theta }^T M =\varvec{\theta }^T ,\det (M)\!\ne \!0\right\} , \end{aligned}$$

forms a subgroup of the general linear group of invertible \(n\times n\) matrices with complex entries, i.e. \(GL_1(n,{\mathbb {C}})< GL(n,{\mathbb {C}})\). It contains the matrix exponential of any rate matrix, that is,

$$\begin{aligned} e^{{\mathfrak {L}}_{GM}}:=\left\{ e^Q : Q\in {\mathfrak {L}}_{GM}\right\} \subset GL_1(n, {\mathbb {C}}). \end{aligned}$$

We refer to \(e^{{\mathfrak {L}}_{GM}}\) as the general rate matrix model.
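The inclusion \(e^{{\mathfrak {L}}_{GM}}\subset GL_1(n,{\mathbb {C}})\) can be illustrated numerically. The following is a minimal sketch in plain Python, under stated assumptions: the off-diagonal rates are arbitrary illustrative values (not taken from any data set), the helper names `rate_matrix` and `mat_exp` are hypothetical, and the exponential is approximated by a truncated Taylor series, which is adequate for a small matrix with modest entries but is not a production-quality algorithm.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, terms=40):
    """Truncated Taylor series 1 + q + q^2/2! + ... for exp(q)."""
    n = len(q)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} * q / k
        term = [[x / k for x in row] for row in mat_mul(term, q)]
        result = [[r + t for r, t in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result


def rate_matrix(off_diag):
    """Fill in the diagonal so that every column sums to zero."""
    n = len(off_diag)
    q = [[float(off_diag[i][j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for j in range(n):
        q[j][j] = -sum(q[i][j] for i in range(n) if i != j)
    return q


# Hypothetical non-negative off-diagonal rates for a 4-state chain.
Q = rate_matrix([
    [0.0, 0.3, 0.1, 0.2],
    [0.4, 0.0, 0.2, 0.1],
    [0.1, 0.2, 0.0, 0.5],
    [0.2, 0.1, 0.3, 0.0],
])
M = mat_exp(Q)
column_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
```

Each entry of `column_sums` should agree with one up to floating-point error, reflecting \(\varvec{\theta }^T e^{Q}=\varvec{\theta }^T\) whenever \(\varvec{\theta }^T Q=\mathbf{0}^T\).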

A Markov model \({\mathfrak {M}}\) is some subset \({\mathfrak {M}}\subseteq {\mathfrak {M}}_{GM}\) of the general Markov model containing the identity matrix \(\mathbf {1}\). A Markov model \({\mathfrak {M}}\) is locally multiplicatively closed if for all \(M_1, M _2 \in {\mathfrak {M}}\) in a neighbourhood of \(\mathbf {1}\) we also have the matrix product \(M_1M_2\in {\mathfrak {M}}\). Similarly, given a subset \({\mathfrak {L}}\subseteq {\mathfrak {L}}_{GM}\) of rate matrices containing the null matrix, we refer to \(e^{\mathfrak {L}}\) as a rate matrix model. It is clear that all rate matrix models are Markov models, and we simplify terminology and also refer to \({\mathfrak {L}}\) as a “model”.

We are primarily interested in rate matrix models \({\mathfrak {M}}=e^{{\mathfrak {L}}}\) that are locally multiplicatively closed. For such a model \({\mathfrak {M}}\), suppose that it is a smooth manifold around the identity matrix \(\mathbf {1}\), so there exist differentiable paths \(A(t)\in { \mathfrak {M}}\) with \(A(0)=\mathbf {1}\). Then, we can define the tangent space at the identity: \(T_{\mathbf {1}}({\mathfrak {M}})=\{A'(0): A(t)\in {\mathfrak {M}}, A(0)=\mathbf {1}\}\). The following theorem provides a characterization for these models. Recall that a subset \({\mathfrak {L}}\subset {\mathfrak {L}}_{GM}\) is a Lie algebra if for all \(Q_1, Q _2\in {\mathfrak {L}}\) and \(\lambda \in {\mathbb {C}}\):

  1. \(Q_1+\lambda Q_2\in {\mathfrak {L}}\),

  2. \(\left[ Q_1,Q_2\right] :=Q_1Q_2-Q_2Q_1\in {\mathfrak {L}}\).

The first condition states that \({\mathfrak {L}}\) is a vector space, and the second states that \({\mathfrak {L}}\) is closed under “Lie brackets”.

Theorem 1

(cf. Birkhoff 1938) A model \({\mathfrak {M}}=e^{\mathfrak {L}}\) is locally multiplicatively closed if and only if \(T_{\mathbf {1}}({\mathfrak {M}})\) is a Lie subalgebra of \({\mathfrak {L}}_{GM}\).

Proof

The sufficient condition is a consequence of the Baker–Campbell–Hausdorff formula (Campbell 1897): if \({\mathfrak {L}}\) forms a Lie algebra, there is a small ball \(B=B_{\varepsilon }\) around \(0\) in \({\mathfrak {L}}_{GM}\) such that if \(X,Y\in B\), then the product given by the Baker–Campbell–Hausdorff expansion

$$\begin{aligned} X*Y:=X+Y+\frac{1}{2}[X,Y]+\ldots \end{aligned}$$

is absolutely convergent and associative. Then, for any \(X,Y\in B\), we can write \( e^X e^Y=e^{X*Y}\). Define \(U=\mathrm{exp }(B\cap {\mathfrak {L}})\), which is a neighbourhood of \(\mathbf {1}\) in \({\mathfrak {M}}\), and conclude that if \(M_1,M_2\in U\), then \(M_1M_2\in {\mathfrak {M}}\).

Conversely, assume that \({\mathfrak {M}}=e^{\mathfrak {L}}\) is locally multiplicatively closed, so there is a neighbourhood \(U\) of \(\mathbf {1}\) in \({\mathfrak {M}}\) such that if \(M_1, M_2\in U\), then \(M_1M_2\in {\mathfrak {M}}\). Given \(X, Y\) in the tangent space \(T_{\mathbf {1}}({\mathfrak {M}})\), define \(A(t)=e^{tX},\; B(t)=e^{tY}\), and \(C(s,t)=A(t) B(s) A(t)^{-1}\). There exists some \(\varepsilon >0\) such that if \(0<t,s <\varepsilon \), then \(A(t), B(s), A(t)^{-1}\in U\) and also \(A(t) B(s)\in U\). Because of the assumption, we infer that \(C(s,t)\in {\mathfrak {M}}\). By taking derivatives, we conclude that

$$\begin{aligned} \left. \frac{d}{ds} \left( \left. \frac{dC(s,t)}{dt} \right| _{t=0} \right) \right| _{s=0}=[X,Y]\in T_{\mathbf {1}}({\mathfrak {M}}). \end{aligned}$$

It follows that \(T_{\mathbf {1}}({\mathfrak {M}})\) is a Lie subalgebra.\(\square \)

Presently, we recall the Lie algebra structure of the general Markov model (Johnson 1985; Sumner et al. 2012a). To this aim, consider the set of “elementary” rate matrices \(\{L_{ij}:1 \le i\ne j\le n\}\), where \(L_{ij}\) is the \(n\times n\) matrix with 1 in the \(ij\) entry, \(-\)1 in the \(jj\) entry and 0 everywhere else. The matrices \(\{L_{ij}\}_{i\ne j}\) form a \(\mathbb {C}\)-basis for the tangent space of \(GL_{1}(n,\mathbb {C})\) and, in particular, we can express any rate matrix \(Q\) as a linear combination:

$$\begin{aligned} Q=\sum _{i\ne j}\alpha _{ij}L_{ij}. \end{aligned}$$
(1)

This is a convenient basis for \({\mathfrak {L}}_{GM}\) because the stochastic condition on \(Q\) is simply that the coefficients \(\alpha _{ij}\) are real and non-negative. Moreover, if \(\delta _{ij}\) denotes the Kronecker delta (\(\delta _{ii}=1\) and \(\delta _{ij}=0\) when \(i\ne j\)), the equalities

$$\begin{aligned}{}[L_{ij},L_{kl}]=(L_{il}-L_{jl})(\delta _{jk}-\delta _{jl})-(L_{kj}-L_{lj})(\delta _{il}-\delta _{jl}) \end{aligned}$$

exhibit the Lie algebra structure of \({\mathfrak {L}}_{GM}\), where we adopt the convention \(L_{ii}:=0\).
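The bracket relations among the elementary rate matrices can be checked by brute force. The sketch below, in plain Python with integer arithmetic, compares the matrix commutator \(L_{ij}L_{kl}-L_{kl}L_{ij}\) with the closed form \((L_{il}-L_{jl})(\delta _{jk}-\delta _{jl})-(L_{kj}-L_{lj})(\delta _{il}-\delta _{jl})\) for all pairs of elementary matrices with \(n=4\). It assumes the convention \(L_{ii}=0\) and uses indices \(0,\ldots ,3\) rather than \(1,\ldots ,4\); the function names are hypothetical.

```python
N = 4  # number of states


def elementary(i, j, n=N):
    """L_ij: 1 in entry (i, j), -1 in entry (j, j), 0 elsewhere.
    By convention L_ii is the zero matrix."""
    m = [[0] * n for _ in range(n)]
    if i != j:
        m[i][j] += 1
        m[j][j] -= 1
    return m


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    return [[c * x for x in row] for row in a]


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return mat_sub(mat_mul(a, b), mat_mul(b, a))


def closed_form(i, j, k, l):
    """Right-hand side of the bracket relation for [L_ij, L_kl]."""
    first = scale(delta(j, k) - delta(j, l),
                  mat_sub(elementary(i, l), elementary(j, l)))
    second = scale(delta(i, l) - delta(j, l),
                   mat_sub(elementary(k, j), elementary(l, j)))
    return mat_sub(first, second)


pairs = [(i, j) for i in range(N) for j in range(N) if i != j]
mismatches = [
    (p, q) for p in pairs for q in pairs
    if bracket(elementary(*p), elementary(*q)) != closed_form(*p, *q)
]
```

An empty `mismatches` list indicates that the closed form agrees with the commutator on all \(12\times 12\) pairs of basis elements, so in particular every bracket is again a linear combination of the \(L_{ij}\).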

Given a vector subspace \({\mathfrak {L}} \subset {\mathfrak {L}}_{GM}\), a stochastic generating set for \({\mathfrak {L}}\) is a generating set \(B_{\mathfrak {L}}=\{L_1,L_2,\ldots ,L_d\}\) of \({\mathfrak {L}}\) such that each \(L_k\) is a non-negative linear combination of the \(L_{ij}\), i.e. \(L_{k}=\sum _{i\ne j}\alpha _{ij}L_{ij}\) where \(\alpha _{ij}\ge 0\). A stochastic basis of \({\mathfrak {L}}\) is a stochastic generating set where the vectors are linearly independent.

Definition 1

(cf. Sumner et al. 2012a) A Lie Markov model is a Lie subalgebra \({\mathfrak {L}}\) of \({\mathfrak {L}}_{GM}\) for which there exists a stochastic basis.

Leaving the technical aspects aside, a Lie Markov model is a model for which the product of two substitution matrices is still in the model. The motivation for such models is that non-homogeneous evolutionary processes can be described in a homogeneous fashion. In more concrete terms, if \(M_1, M_2 \in e^{{\mathfrak {L}}}\), then \(M_1 M_2 \in e^{{\mathfrak {L}}}\), i.e. for any inhomogeneous process on an edge where the rate matrices always lie within \({\mathfrak {L}}\), there is an equivalent homogeneous process on that edge, whose rate matrix also lies within \({\mathfrak {L}}\). This is not the case for the general time-reversible model (the reader is referred to the paper by Sumner et al. (2012a) for a detailed proof of the non-closure of GTR).
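As a concrete instance, consider the Kimura three-parameter model, which appears among the models recovered in this paper. The sketch below assumes the state ordering \(A,G,C,T\) and the common parametrisation \(Q=a(K_1-\mathbf {1})+b(K_2-\mathbf {1})+c(K_3-\mathbf {1})\), where \(K_1,K_2,K_3\) are the permutation matrices of \((AG)(CT)\), \((AC)(GT)\) and \((AT)(CG)\); this parametrisation and all names in the code are assumptions made for illustration and are not spelled out in the text above. The code checks that the three spanning matrices have non-negative off-diagonal entries and zero column sums, and that all their Lie brackets vanish.

```python
def perm_matrix(images):
    """Permutation matrix K with K[images[j]][j] = 1 (states indexed 0..3)."""
    n = len(images)
    return [[1 if images[j] == i else 0 for j in range(n)] for i in range(n)]


def mat_sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return mat_sub(mat_mul(a, b), mat_mul(b, a))


# State order A, G, C, T corresponds to indices 0, 1, 2, 3.
IDENTITY = perm_matrix([0, 1, 2, 3])
K_TRANSITION = perm_matrix([1, 0, 3, 2])      # (AG)(CT)
K_TRANSVERSION_1 = perm_matrix([2, 3, 0, 1])  # (AC)(GT)
K_TRANSVERSION_2 = perm_matrix([3, 2, 1, 0])  # (AT)(CG)

BASIS = [mat_sub(k, IDENTITY)
         for k in (K_TRANSITION, K_TRANSVERSION_1, K_TRANSVERSION_2)]
ZERO = [[0] * 4 for _ in range(4)]

brackets_vanish = all(bracket(x, y) == ZERO for x in BASIS for y in BASIS)
columns_sum_to_zero = all(
    sum(q[i][j] for i in range(4)) == 0 for q in BASIS for j in range(4))
off_diagonal_non_negative = all(
    q[i][j] >= 0 for q in BASIS
    for i in range(4) for j in range(4) if i != j)
```

Since every bracket vanishes, the span is an abelian Lie algebra, and for commuting rate matrices the product of exponentials reduces to \(e^{Q_1}e^{Q_2}=e^{Q_1+Q_2}\), which stays in the model.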

Remark 1

By an elementary result in linear algebra, any generating set for a vector space can be reduced to a basis by removing elements, and hence Definition 1 would remain unchanged if “stochastic basis” were replaced with “stochastic generating set”. \(\diamond \)

We are especially interested in the study of the set of stochastic rate matrices of the model. The condition of Definition 1 ensures \({\mathfrak {L}}\) contains enough stochastic rate matrices (see Theorem 2 forthcoming), and it is useful to give some geometrical interpretation of this condition. To this aim, we need to recall some basic definitions on convex polyhedral cones.

Following the book by Alexandrov (2005), a convex polyhedral cone in \(\mathbb {R}^n\) is defined as a set

$$\begin{aligned} C=\{\lambda _1 v_1+\ldots +\lambda _r v_r : \lambda _i\ge 0\} \end{aligned}$$

generated by some finite set of vectors \(v_1,\ldots ,v_r\) in \({\mathbb {R}}^n\). Such vectors are called generators of the cone \(C\). The reader may note that, with this definition, every linear subspace of \({\mathbb {R}}^n\) is a convex polyhedral cone. When a convex polyhedral cone contains no nonzero linear subspaces, it is said to be strongly convex. In this case, which has special interest for us, any minimal system of generators of the cone is unique up to multiplication with positive scalars (Alexandrov 2005). The rays of the cone are the non-negative spans of each vector in a minimal system of generators, and they correspond to the 1-dimensional faces of the cone (see Fig. 1 for an illustration). Farkas’s theorem ensures the polyhedral cones can be equivalently defined as the intersection of finitely many halfspaces. It follows that the intersection of any two convex polyhedral cones in \({\mathbb {R}}^n\) is again a convex polyhedral cone.

Fig. 1 A strongly convex polyhedral cone of dimension 3 with six rays (represented by arrows) and a convex polyhedral cone which is not strongly convex

Note 1

Consider a collection of vectors \(X=\{X_1,\ldots , X_r \}\). In what follows we will use the notation \({\mathbb {F}}X\) or \(\langle X_1,\ldots , X_r \rangle _{\mathbb {F}}\) to indicate the linear span of \(X\) over the field \({\mathbb {F}}\), where \({\mathbb {F}}={\mathbb {R}}\) or \({\mathbb {C}}\). That is,

$$\begin{aligned} {\mathbb {F}}X= \langle X_1,\ldots , X_r \rangle _{\mathbb {F}} :=\{\lambda _1 X_1+\ldots +\lambda _r X_r:\lambda _i \in {\mathbb {F}}\}. \end{aligned}$$

Of course, \({\mathbb {F}}X\) is a vector space. In particular, we can consider \(V:={\mathbb {C}}X\) as a complex vector space with dimension \(r\), or as a real vector space \(V={\mathbb {R}}X+{\mathbb {R}}(\mathbf{i}X)\) with dimension \(2r\). To distinguish these dimensions, we use the notation \( \dim _{{\mathbb {C}}}(V)=r\) and \( \dim _{{\mathbb {R}}}(V)=2r\).\(\diamond \)

The dimension of the cone \(C\) is defined as the dimension of the linear space \({\mathbb {R}} C=C+(-C)\) spanned by \(C\), i.e. \(\dim (C):=\dim ({\mathbb {R}}C)\). Of course, since a set of generators of a cone \(C\) is also a system of generators of the linear space \({\mathbb {R} }C\), we conclude that the number of rays of a cone is at least its dimension.

Returning to our setting, we consider the real vector space \({\mathfrak {L}}_{GM}^{\mathbb {R}}\) of dimension \(n(n-1)\) spanned by the \(n\times n\) elementary rate matrices \(L_{ij}, i\ne j\) defined above. We denote

$$\begin{aligned} {\mathfrak {L}}_{GM}^+:=\Bigg \{Q= \sum _{i\ne j}\alpha _{ij}L_{ij} : \alpha _{ij}\ge 0\Bigg \}, \end{aligned}$$

which is clearly a convex polyhedral cone in \({\mathfrak {L}}^{\mathbb {R}}_{GM}\). Given a (complex) vector subspace \({\mathfrak {L}}\) in \({\mathfrak {L}}_{GM}\), we consider

$$\begin{aligned} {\mathfrak {L}}^+:={\mathfrak {L}}\cap {\mathfrak {L}}^+_{GM}. \end{aligned}$$

Notice that all the off-diagonal entries of each matrix in \({\mathfrak {L}}^+\) are real and non-negative.

Theorem 2

For any (complex) vector subspace \({\mathfrak {L}},\; {\mathfrak {L}}^+=\mathfrak {L}\cap {\mathfrak {L}}^+_{GM}\) is a strongly convex polyhedral cone in \({\mathfrak {L}}_{GM}^{\mathbb {R}}\). The dimension of \({\mathfrak {L}}^+\) as a cone is less than or equal to the complex dimension of \({\mathfrak {L}}\), and equality holds if and only if \({\mathfrak {L}}\) has a stochastic generating set.

Proof

The set \({\mathfrak {L}}^+\) is the intersection of two convex polyhedral cones, so it is also a convex polyhedral cone. Moreover, being contained in \({\mathfrak {L}}^+_{GM}\) it is clear it contains no linear subspaces, so it is strongly convex, as required. Now, to show that the dimension of \({\mathfrak {L}}^+\) is less than or equal to the complex dimension of \({\mathfrak {L}}\), consider the vector space \({\mathbb {C}}{\mathfrak {L}}^+\) and observe it is a subspace: \({\mathbb {C}}{\mathfrak {L}}^+\subset {\mathfrak {L}}\). This implies \( \dim _{{\mathbb {C}}}({\mathbb {C}}{\mathfrak {L}}^+)\le \dim _{{\mathbb {C}}}({\mathfrak {L}})\), and since \({\mathfrak {L}}^+\) contains only real vectors, we have \( \dim _{{\mathbb {C}}}({\mathbb {C}}{\mathfrak {L}}^+)= \dim _{{\mathbb {R}}}({\mathbb {R}}{\mathfrak {L}}^+)=\dim ({\mathfrak {L}}^+)\le \dim _{{\mathbb {C}}}({\mathfrak {L}})\), as required. Now, assume \({\mathfrak {L}}\) has a stochastic generating set \(B_{{\mathfrak {L}}}\) so \(B_{{\mathfrak {L}}}\subset {\mathfrak {L}}^+\) and \({\mathbb {C}}B_{{\mathfrak {L}}}={\mathfrak {L}}\). As \(B_{{\mathfrak {L}}}\) contains only real vectors, we have \( \dim _{{\mathbb {R}}}({\mathbb {R}}B_{{\mathfrak {L}}})= \dim _{{\mathbb {C}}}({\mathbb {C}}B_{{\mathfrak {L}}})= \dim _{{\mathbb {C}}}({\mathfrak {L}})\); and because \({\mathfrak {L}}^+\) contains only real vectors and \({\mathfrak {L}}^+\subset {\mathfrak {L}}\), we have \(B_{\mathfrak {L}}\subset {\mathfrak {L}}^+\subset {\mathbb {R}}B_{\mathfrak {L}}\), so \({\mathbb {R}}B_{\mathfrak {L}}={\mathbb {R}}{\mathfrak {L}}^+\). Together this implies \( \dim _{{\mathbb {R}}}({\mathbb {R}}B_{\mathfrak {L}})= \dim _{{\mathbb {R}}}({\mathbb {R}}{\mathfrak {L}}^+)=\dim ({\mathfrak {L}}^+)= \dim _{{\mathbb {C}}}({\mathfrak {L}})\). Conversely, suppose \( \dim _{{\mathbb {C}}}({\mathfrak {L}})=\dim ({\mathfrak {L}}^+)\). 
Take a generating set for \({\mathbb {R}}{\mathfrak {L}}^+\) composed of vectors in \({\mathfrak {L}}^+\); by removing vectors in this generating set, we can always assume they actually form a basis \(B\subset {\mathfrak {L}}^+\) of \({\mathbb {R}}{\mathfrak {L}}^+\). Now consider the vector subspace \({\mathbb {C}}B\subset {\mathfrak {L}}\) and observe \( \dim _{{\mathbb {C}}}({\mathbb {C}}B)= \dim _{{\mathbb {R}}}({\mathbb {R}}B)= \dim _{{\mathbb {R}}}({\mathbb {R}}{\mathfrak {L}}^+)=\dim ({\mathfrak {L}}^+)= \dim _{{\mathbb {C}}}({\mathfrak {L}})\). Thus \({\mathbb {C}}B={\mathfrak {L}}\), as required. \(\square \)

Remark 2

The previous result implies that the vector subspace \({\mathfrak {L}}\) has a stochastic basis if and only if there is no drop in dimension when we restrict to the intersection with the positive orthant \({\mathfrak {L}}_{GM}^+\). From a practical perspective, it is the cone \({\mathfrak {L}}^+\) that contains the relevant part of the model, as only real matrices with non-negative off-diagonal entries make sense as rate matrices of a continuous-time Markov chain. This observation justifies the requirement of a stochastic basis in Definition 1.\(\diamond \)
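A minimal worked example, chosen here purely for illustration, shows how the dimension can drop. Take \(n=2\) and consider the two one-dimensional subspaces

$$\begin{aligned} {\mathfrak {L}}_1:=\langle L_{12}+L_{21} \rangle _{\mathbb {C}}, \qquad {\mathfrak {L}}_2:=\langle L_{12}-L_{21} \rangle _{\mathbb {C}}. \end{aligned}$$

Both are Lie algebras, since any one-dimensional subspace is closed under brackets. An element \(\lambda (L_{12}+L_{21})\) lies in \({\mathfrak {L}}_{GM}^+\) exactly when \(\lambda \) is real and non-negative, so \({\mathfrak {L}}_1^+\) is a single ray and \(\dim ({\mathfrak {L}}_1^+)=1= \dim _{{\mathbb {C}}}({\mathfrak {L}}_1)\). In contrast, an element \(\lambda (L_{12}-L_{21})\) has coefficients \(\alpha _{12}=\lambda \) and \(\alpha _{21}=-\lambda \), which are both real and non-negative only when \(\lambda =0\). Hence \({\mathfrak {L}}_2^+=\{0\}\) and \(\dim ({\mathfrak {L}}_2^+)=0<1= \dim _{{\mathbb {C}}}({\mathfrak {L}}_2)\), so \({\mathfrak {L}}_2\) has no stochastic basis and is not a Lie Markov model in the sense of Definition 1.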

Definition 2

The dimension of a Lie Markov model is the dimension of \(\mathfrak {L}\) as a complex vector space (which by virtue of Theorem 2 equals the dimension of \({\mathfrak {L}}^+\) as a cone). The stochastic cone of \({\mathfrak {L}}\) is the convex polyhedral cone \({\mathfrak {L}}^+\) and the rays of the model are the rays of \({\mathfrak {L}}^+\).

Remark 3

It is important to note that not every stochastic generating set of \({\mathfrak {L}}\) is a set of generators of the cone \({\mathfrak {L}}^+\). When a stochastic generating set does generate the cone and is minimal among such sets, the non-negative linear span of each generator is a ray of the cone. \(\diamond \)

Background on group representation theory

In what follows we recall basic results from the representation theory of permutation groups \(G\le {\mathfrak {S}}_n\). We recommend the books by Sagan (2001) and James and Liebeck (2001) as excellent introductions to the required material.

A (linear) representation of a group \(G\) is a group homomorphism \(\rho :G\rightarrow GL(V)\cong GL(m,{\mathbb {C}})\), where \(V\) is a \({\mathbb {C}}\)-vector space of dimension \(m\). In this situation, \(\rho \) provides an action of \(G\) on \(V\), and we say that \(V\) forms a \(G\)-module. A representation is said to be irreducible if it does not contain any proper \(G\)-submodules.

Let \(G\le {\mathfrak {S}}_n\) be a permutation group on \(n\) elements. Write \(\{V_i\}_{i=1,\ldots ,\ell }\) for the irreducible \(G\)-modules and \(\rho _i:G\rightarrow GL(V_i)\) for the corresponding group homomorphism. Since \(G\) is finite, any representation \(\rho :G\rightarrow GL(V)\) is completely reducible and there is a decomposition of the corresponding module \(V\) into irreducible parts called isotypic components, so we can write (Maschke’s theorem):

$$\begin{aligned} V\cong \oplus _{i=1}^\ell c_i V_i, \end{aligned}$$
(2)

where the \(c_i\) are non-negative integers specifying the number of copies of the module \(V_i\) in the decomposition of \(V\).

Example 1

The irreducible representations of \({\mathfrak {S}}_n\) are indexed by the integer partitions of \(n\) (Sagan 2001). The defining representation of \({\mathfrak {S}}_n\) is defined on the vector space \({\mathbb {C}}^n={\langle \{e_i\}_{1\le i\le n} \rangle }_{{\mathbb {C}}}\) by \(\sigma : e_i\mapsto e_{\sigma (i)}\). It decomposes as \(\{n\}\oplus \{n-1,1\}\), where \(\{n\}\) is the (one-dimensional) trivial representation and \(\{n-1,1\}\) has dimension \(n-1\). \(\diamond \)

Example 2

After identifying the nucleotides \(A,G,C,T\) with the integers \(1,2,3,4\), consider \({\mathcal {G}}\) as a subgroup of \({\mathfrak {S}}_4\):

$$\begin{aligned} {\mathcal {G}}:=\{e,(12),(34),(12)(34),(13)(24),(14)(23),(1324),(1423)\}. \end{aligned}$$

The group \({\mathcal {G}}\) has five conjugacy classes:

$$\begin{aligned} \left[ e \right]&= \{e\},\\ \left[ (12) \right]&= \{(12),(34)\}, \\ \left[ (12)(34)\right]&= \{(12)(34)\}, \\ \left[ (13)(24)\right]&= \{(13)(24),(14)(23)\}, \\ \left[ (1324)\right]&= \{(1324),(1423)\}. \end{aligned}$$

Recall the number of irreducible representations of a finite group is equal to the number of its conjugacy classes, and the sum of the dimension of each irreducible representation squared is equal to the order of the group (see Sagan 2001 for example). We conclude there are five irreducible representations of \({\mathcal {G}}\), which we denote as \({\mathtt {id}},\; {\mathtt {sgn}},\; \zeta _1,\; \zeta _2\), and \(\xi \); with the corresponding character table presented as Table 2. Notice the first row in the character table gives the dimension of each representation. Notice also there are four one-dimensional representations, namely \({\mathtt {id}}\) (the trivial representation), \({\mathtt {sgn}}\) (each permutation \(\sigma \) is mapped to \({\mathtt {sgn}}(\sigma )\)), \(\zeta _1\) and \(\zeta _2\). Besides these, the representation \(\xi \) is two-dimensional. The rows of Table 2 represent the conjugacy classes of \({{\mathcal {G}}}\). \(\diamond \)
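The group-theoretic claims of Example 2 are easy to confirm by direct enumeration. The following sketch in plain Python encodes each permutation as a tuple of images on the indices \(0,\ldots ,3\) (standing for \(1,\ldots ,4\) in the text), checks that the eight listed elements are closed under composition, and computes the conjugacy classes. The encoding and helper names are choices made for this illustration.

```python
def compose(p, q):
    """(p o q)(i) = p(q(i)); permutations are tuples of images."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


# Cycle notation from the text (on 1..4) mapped to images on 0..3.
GROUP = {
    "e": (0, 1, 2, 3),
    "(12)": (1, 0, 2, 3),
    "(34)": (0, 1, 3, 2),
    "(12)(34)": (1, 0, 3, 2),
    "(13)(24)": (2, 3, 0, 1),
    "(14)(23)": (3, 2, 1, 0),
    "(1324)": (2, 3, 1, 0),
    "(1423)": (3, 2, 0, 1),
}
NAME = {perm: name for name, perm in GROUP.items()}
ELEMENTS = list(GROUP.values())

# Closure under composition (a finite closed set containing e is a subgroup).
is_closed = all(compose(p, q) in NAME for p in ELEMENTS for q in ELEMENTS)

# Conjugacy class of g: all h g h^{-1} with h in the group.
classes = {
    frozenset(NAME[compose(compose(h, g), inverse(h))] for h in ELEMENTS)
    for g in ELEMENTS
}

# Dimensions 1, 1, 1, 1, 2 of the five irreducible representations.
sum_of_squared_dimensions = 4 * 1 ** 2 + 2 ** 2
```

The set `classes` should contain exactly the five classes listed above, and `sum_of_squared_dimensions` should equal the group order.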

Every irreducible module \(V_i\) of \(G\) has a projection operator associated to it:

$$\begin{aligned} \varTheta _{i}:=\textstyle {\frac{1}{|G|}}\sum _{\sigma \in G}\overline{\chi ^{i}(\sigma )}\rho (\sigma ), \end{aligned}$$
(3)

where \(\chi ^i: G\rightarrow \mathbb {C}\) is the character of the irreducible representation \(\rho _i\) defined as \(\chi ^i(\sigma ):=\text {tr}(\rho _i(\sigma ))\), i.e. the trace of the representing matrix \(\rho _i(\sigma )\). These operators project a given \(G\)-module \(V\) onto its isotypic components, i.e. \(\varTheta _{i}(V)=c_i V_{i}\), so they can be used to compute the \(c_i\) as well as to identify generators of the components.
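To make the use of these operators concrete, the sketch below applies them to the defining representation of \({\mathfrak {S}}_4\) on \({\mathbb {C}}^4\) restricted to \({\mathcal {G}}\). Several assumptions are made. Only the characters \({\mathtt {id}}\), \({\mathtt {sgn}}\) and \(\xi \) are used; the values of \(\xi \) (2 on \(e\), \(-2\) on \((12)(34)\), 0 elsewhere) are those of the unique two-dimensional irreducible character of a dihedral group of order 8 and are stated here as an assumption, since Table 2 is not reproduced in this text. The characters \(\zeta _1,\zeta _2\) are omitted because their values depend on the labelling in Table 2. The code also includes the factor \(\chi ^i(e)\) in front of the sum, a normalisation that makes the operator idempotent without changing its image. All names are hypothetical.

```python
from fractions import Fraction

IDENTITY = (0, 1, 2, 3)
CENTRAL = (1, 0, 3, 2)  # (12)(34), the non-trivial central element
ELEMENTS = [
    IDENTITY, (1, 0, 2, 3), (0, 1, 3, 2), CENTRAL,
    (2, 3, 0, 1), (3, 2, 1, 0), (2, 3, 1, 0), (3, 2, 0, 1),
]


def perm_matrix(p):
    """Matrix of p in the defining representation: e_j -> e_{p(j)}."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]


def sign(p):
    """Sign of a permutation via the parity of its inversion count."""
    n = len(p)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def chi_xi(p):
    """Assumed character of the two-dimensional irreducible representation."""
    if p == IDENTITY:
        return 2
    if p == CENTRAL:
        return -2
    return 0


CHARACTERS = {"id": lambda p: 1, "sgn": sign, "xi": chi_xi}


def projector(chi):
    """(chi(e)/|G|) * sum of chi(p) * rho(p); characters here are real,
    so complex conjugation is omitted."""
    n, dim = 4, chi(IDENTITY)
    total = [[Fraction(0)] * n for _ in range(n)]
    for p in ELEMENTS:
        k = perm_matrix(p)
        for i in range(n):
            for j in range(n):
                total[i][j] += Fraction(dim * chi(p) * k[i][j], len(ELEMENTS))
    return total


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


projectors = {name: projector(chi) for name, chi in CHARACTERS.items()}
# The trace of the projector is c_i * dim(V_i), so c_i = trace / dim(V_i).
multiplicities = {
    name: trace(projectors[name]) / CHARACTERS[name](IDENTITY)
    for name in CHARACTERS
}
```

Under these assumptions the trivial and two-dimensional components each occur once in \({\mathbb {C}}^4\) and \({\mathtt {sgn}}\) does not occur, which leaves room for exactly one further one-dimensional component.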

Of course, we can restrict \(\rho \) to any subgroup \(H\le G\), inducing an \(H\)-module structure on \(V\). By virtue of Maschke’s theorem, we can also decompose \(V\) into the irreducible \(H\)-modules. Recall an irreducible representation of \(G\) does not necessarily stay irreducible when restricted to a subgroup \(H\) of \(G\). The branching rule \(G\downarrow H\) applies to describe the decomposition of the irreducible representations of \(G\) in terms of the irreducible representations of \(H\) (see Sagan 2001, Chap. 2.8). By applying orthogonality in the character tables of \({\mathfrak {S}}_{4}\) and \({\mathcal {G}}\) (see Table 2), and concentrating on the conjugacy class \([(12)(34)]\) in \({\mathfrak {S}}_4\) compared to the same class in \({\mathcal {G}}\), it is straightforward to derive the group branching rules shown in Table 3.

Table 2 The character tables of \(\mathfrak {S}_4\) and \({\mathcal {G}}=\{e,(12),(34),(12)(34),(13)(24),(14)(23),(1324),(1423)\}\)
Table 3 The branching rule of \({\mathfrak {S}}_4\downarrow {\mathcal {G}}\) describes the decomposition of the irreducible representations of \({\mathfrak {S}}_4\) when restricted to the subgroup \({\mathcal {G}}\)

Background on discrete group actions

Whenever a group \(G\) acts on a finite set \(B=\{b_1,\ldots ,b_t\}\), there is a group homomorphism

$$\begin{aligned} \rho :G\rightarrow {\mathfrak {S}}_t. \end{aligned}$$
(4)

A \(G\)-orbit in \(B\) is a subset \(\mathcal {B}=\{b_{i_1},b_{i_2},\ldots , b_{i_{l}}\}\subset B\) which is invariant under \(G\) and is minimal. That is,

$$\begin{aligned} \sigma {\mathcal {B}}:=\{b_{i_{\rho (\sigma )(1)}},b_{i_{\rho (\sigma )(2)}},\ldots , b_{i_{\rho (\sigma )(l)}}\}={\mathcal {B}},\quad \text{ for } \text{ all } \sigma \in G, \end{aligned}$$

and \({\mathcal {B}}\) contains no smaller subsets with this property. From this, we can decompose \(B\) as a disjoint union of \(G\)-orbits: \(B=\mathcal {B}_1\cup \mathcal {B}_2 \cup \ldots \cup \mathcal {B}_k\).

If we focus on each orbit, the orbit stabilizer theorem (see Bogopolski 2008) states that, up to bijective correspondence, every \(G\)-orbit has the form of the quotient

$$\begin{aligned} G/H=\{[\sigma _1],\ldots ,[\sigma _q]\}, \qquad [\sigma _i]:=\sigma _i H, \end{aligned}$$

where \(H\) is a subgroup of \(G\) and the \(\sigma _i\in G\) are chosen so that the coset \(\sigma _j H \ne \sigma _i H\) if \(i\ne j\). The group operation of \(G\) induces an action on the finite set \(G/H\) by \(\sigma :\sigma _i H\mapsto (\sigma \sigma _i) H\). Actually, \(H\) is the stabilizer \(G_x:=\{g\in G:gx=x\}\) of some element \(x\in \mathcal {B}\). As \(G_x\le G\), and there are only finitely many subgroups of \(G\), it is thus possible to give a complete list of \(G\)-orbits (up to isomorphism) by simply listing all quotients \(G/H\) with \(H\le G\). We recall we can turn the quotient \(G/H\) into a \(G\)-module by considering the vector space generated by the cosets of \(G/H\):

$$\begin{aligned} {\langle G/H \rangle }_{{\mathbb {C}}}={\langle [e], [\sigma _2], \ldots , [\sigma _q] \rangle }_{{\mathbb {C}}}=\{v=c_1[e]+c_2[\sigma _2]+\ldots +c_q[\sigma _q]:c_i\in \mathbb {C}\}, \end{aligned}$$

with the action \(\sigma :v=\sum c_i[\sigma _i] \mapsto v'=\sum c_i[\sigma \sigma _i]\).

Back to the general case, the action of \(G\) on the set \(\mathcal {B}\) induces a representation of \(G\) in the vector space \(\mathbb {C}\mathcal {B}\). We will refer to these as permutation representations and they will play a key role throughout the paper. Notice they decompose as

$$\begin{aligned} {\mathbb {C}}{\mathcal {B}}\cong \oplus _{i=1}^k \langle G/H_i \rangle _{{\mathbb {C}}} \end{aligned}$$

where \(H_i\le G\) is the stabilizer of some element in the orbit \({\mathcal {B}}_i\). The reader should note this decomposition is not the decomposition into irreducible representations of (2). In fact it is possible to show that any nontrivial permutation representation is actually reducible.

3 Lie Markov models with prescribed symmetry

Sumner et al. (2012a) showed the search for Lie Markov models is significantly simplified by demanding the models to have some non-trivial symmetry since this reduces a potential infinity of models to just a number of special cases. The idea is to rely on imposing symmetry to assist in the search for Lie Markov models. An alternative strategy would be to enumerate all possible Lie Markov models and toss out those without the desired symmetry. However, unless the number of states is equal to 2 or 3, this approach is computationally infeasible. Thus, we are led to deal with the technicalities of this section in order to refine our search of the Lie Markov models with some prescribed symmetry. Of course, it is expected that the larger the symmetry we demand, the easier the analysis will be.

To this aim, recall the symmetric group \(\mathfrak {S}_n\) has an action on \({\mathfrak {L}}_{GM}\) defined on the elementary rate matrices as \(\rho (\sigma ) \cdot L_{ij} := L_{\sigma (i)\sigma (j)}\) for all \(\sigma \in {\mathfrak {S}}_n\), and extended to all of \({\mathfrak {L}}_{GM}\) by linearity. Equivalently, the action can be defined by

$$\begin{aligned} Q=\sum _{i\ne j}\alpha _{ij}L_{ij}\mapsto \sigma \cdot Q := K_{\sigma } Q K_{\sigma }^{-1}=\sum _{i\ne j}\alpha _{ij}L_{\sigma (i)\sigma (j)}, \end{aligned}$$
(5)

where \(K_{\sigma }\) is the permutation matrix associated to \(\sigma \).
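The equivalence of the two descriptions in (5) can be checked numerically. The sketch below is only an illustration under stated assumptions: indices are 0-based, the elementary rate matrix is taken as \(L_{ij}=E_{ij}-E_{jj}\), and \(K_{\sigma }\) is taken to send the basis vector \(e_j\) to \(e_{\sigma (j)}\). These conventions are not fixed in this section and are chosen here for concreteness.

```python
from itertools import permutations

n = 4


def zeros():
    return [[0] * n for _ in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(A):
    return [list(row) for row in zip(*A)]


def elementary(i, j):
    """Assumed convention: L_ij = E_ij - E_jj (0-based indices)."""
    L = zeros()
    L[i][j] = 1
    L[j][j] = -1
    return L


def perm_matrix(s):
    """Assumed convention: K_s maps e_j to e_{s(j)}."""
    K = zeros()
    for j in range(n):
        K[s[j]][j] = 1
    return K


def conjugate(s, Q):
    """K_s Q K_s^{-1}; the inverse of a permutation matrix is its transpose."""
    K = perm_matrix(s)
    return matmul(matmul(K, Q), transpose(K))


# Check K_s L_ij K_s^{-1} = L_{s(i) s(j)} for every s in S_4 and every i != j.
action_ok = all(
    conjugate(s, elementary(i, j)) == elementary(s[i], s[j])
    for s in permutations(range(n))
    for i in range(n) for j in range(n) if i != j
)
```

Since both sides of (5) are linear in \(Q\), agreement on the elementary matrices is enough.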

Definition 3

(cf. Sumner et al. 2012a) We say a Lie Markov model \(\mathfrak {L}\) has the symmetry of the group \(G\le {\mathfrak {S}}_n\) if there is a basis \(B_{\mathfrak {L}}\) of \({\mathfrak {L}}\) invariant under the action of \(G\) induced by (5), that is, a basis \(B_{\mathfrak {L}}=\{L_1,L_2,\ldots , L_d\}\) such that

$$\begin{aligned} \sigma \cdot B_{\mathfrak {L}}:=\left\{ K_{\sigma }L_{1}K^{-1}_{\sigma },K_{\sigma }L_{2}K^{-1}_{\sigma },\ldots ,K_{\sigma }L_{d}K^{-1}_{\sigma }\right\} =B_{\mathfrak {L}},\quad \forall \sigma \in G. \end{aligned}$$

In this case, we will say that \(B_{{\mathfrak {L}}}\) is a permutation basis of \({\mathfrak {L}}\).

Notice a Lie Markov model \({\mathfrak {L}}\) has the symmetry of \(G\) if and only if there is a permutation representation of \(G\) on \({\mathfrak {L}}\), so we have a decomposition \({\mathfrak {L}}\cong \oplus _{i=1}^k \langle G/H_i\rangle _{{\mathbb {C}}}\). A permutation basis for \({\mathfrak {L}}\) is then obtained by collecting a permutation basis \(B_i\) for each \(\langle G/H_i\rangle _{{\mathbb {C}}}\) and putting them together.

Remark 4

Notice if \(\mathfrak {L}\) has the symmetry of a permutation group \(G\), then it also has the symmetry of any subgroup \(H\le G\). \(\diamond \)

The reader is referred to Sumner et al. (2012a) for the statistical motivations for this definition. In a nutshell, parameter estimation under such a model is invariant under nucleotide permutations belonging to \(G\). In particular, we have a group homomorphism

$$\begin{aligned} \rho :G \rightarrow \mathfrak {S}_d, \end{aligned}$$
(6)

where the image of any permutation \(\sigma \in G\) is determined by the equality \(K_{\sigma }L_{i}K^{-1}_{\sigma }=L_{\rho (\sigma )(i)}\). Thus, for any rate matrix \(Q=\sum _{i=1}^d\alpha _iL_i\in \mathfrak {L}\), we have

$$\begin{aligned} \sigma : Q=\sum _{i=1}^d\alpha _iL_i\mapsto \sum _{i=1}^d\alpha _iL_{\rho (\sigma )(i)}=\sum _{i=1}^d\alpha _{\rho (\sigma ^{-1})(i)}L_{i}, \end{aligned}$$

so \(G\) acts by permuting the model parameters, i.e. \(\alpha _i\mapsto \alpha _{\rho (\sigma ^{-1})(i)}\), and hence leaves maximum likelihood estimates invariant.

Example 3

(Sumner et al. 2012a) The list of 4-state Lie Markov models with \({\mathfrak {S}}_4\) symmetry is:

  1. the 1-dimensional model by Jukes and Cantor (1969);

  2. the 3-dimensional model by Kimura (1981);

  3. the 4-dimensional model by Felsenstein (1981);

  4. the Kimura+Felsenstein model or “K3ST+F81”, with dimension 6; see Example 5 below (Sumner et al. 2012);

  5. the General Markov model, with dimension 12.

Presently, we recall the general procedure to obtain Lie Markov models with prescribed symmetry. Suppose we have a Lie Markov algebra \(\mathfrak {L}\) with dimension \(d\) and a permutation group \(G\le {\mathfrak {S}}_n\). We demand that \(\mathfrak {L}\) satisfies the conditions of Definition 3 for the permutation group \(G\). Then, \({\mathfrak {L}}\) is provided with a basis \(B_{{\mathfrak {L}}}\) which is invariant under \(G\). As explained above, we have a decomposition of \(B_{{\mathfrak {L}}}\) into \(G\)-orbits. We can then compare the irreducible \(G\)-modules that occur in the decomposition of \(\mathfrak {L}_{GM}\) to those that occur in the decomposition of \({\langle G/H \rangle }_{{\mathbb {C}}}\) for each \(H\le G\). Finally, we can attempt to construct subalgebras \(\mathfrak {L}\subset \mathfrak {L}_{GM}\) with a basis \(B_\mathfrak {L}\) such that \(B_\mathfrak {L} =\mathcal {B}_1\cup \mathcal {B}_2 \cup \ldots \cup \mathcal {B}_r\) is a plausible union of orbits \(\mathcal {B}_i\) consistent with the linear decomposition of \(\mathfrak {L}_{GM}\) induced by the action of \(G\).

The general procedure is:

  1. Decompose the Lie algebra of the GM model into irreducible modules of \(G\):

    $$\begin{aligned} \mathfrak {L}_{GM}=\oplus _{k}f_kV_{k}, \end{aligned}$$
    (7)

    where \(k\) labels the irreducible \(G\)-module \(V_{k}\) and the \(f_k\) are non-negative integers specifying the number of copies of each irreducible module in the decomposition.

  2. Apply the orbit stabilizer theorem and construct the list of \(G\)-orbits, \(G/H\), by working through the subgroups \(H\le G\). For each subgroup \(H\), extend the orbits linearly over \(\mathbb {C}\) to the \(G\)-module \({\langle G/H \rangle }_{{\mathbb {C}}}\) and decompose this space into irreducible \(G\)-modules:

    $$\begin{aligned} {\langle G/H \rangle }_{{\mathbb {C}}}\cong \oplus _{k}b^{H}_{k}V_{k}, \end{aligned}$$

    where again the \(b^{H}_{k}\) are non-negative integers.

  3. Working up in dimension \(d\), consider all unions of \(G\)-orbits \(S=\bigcup _{i=1}^q(G/H_i)\) such that \(\left| S\right| =\sum _{1\le i\le q}\left| G/H_i\right| = d\) (where \(|\cdot |\) stands for cardinality). For each \(S\), consider its linear decomposition into irreducible \(G\)-modules

    $$\begin{aligned} {\langle S \rangle }_{{\mathbb {C}}}\cong \oplus _k a_k V_{k} \end{aligned}$$

    where \(a_k:=b^{H_1}_{k}+b^{H_2}_{k}+\ldots +b^{H_q}_{k}\), and, in order to exclude unions of \(G\)-orbits that do not occur in the linear decomposition of \(\mathfrak {L}_{GM}\) as a \(G\)-module, check \(a_k\le f_k\), for each \(k\).

  4. For each case thus identified, consider the vector space \(\mathfrak {L}=\oplus _{k}a_{k} V_{k}\) and use explicit computation to check whether \(\mathfrak {L}\) forms a Lie algebra. If so, attempt to show it has a stochastic basis.

This procedure is guaranteed to produce all Lie Markov models with symmetry \(G\). In Sect. 5, we will give a complete presentation of the 4-state models with purine/pyrimidine symmetry.

Remark 5

In our procedure we first look for all possible decompositions into irreducible modules of a permutation representation and we investigate how these decompositions are realised as Lie subalgebras of \({\mathfrak {L}}_{GM}\). A different approach would be to deal first with possible Lie subalgebras of \({\mathfrak {L}}_{GM}\) (up to isomorphism) and then, for each isomorphism class, look for possible subalgebras which are permutation representations of \(G\). Our experience tells us this second approach is infeasible in practice and, in the last section, we adopt the procedure just explained. \(\diamond \)

Remark 6

Equivariant models were first introduced by Draisma and Kuttler (2008) and have also been studied by Casanellas and Fernández-Sánchez (2010), and Casanellas et al. (2012). Sumner et al. (2012a) modified slightly the definition of equivariant models to adapt it to the continuous-time Markov model setting. Under this definition, equivariant models appear as a particular case of Lie Markov models. Actually, the \(G\)-equivariant model is the Lie Markov model with \(G\) symmetry obtained when we take \({\mathfrak {L}}\) to be the isotypic component of \({\mathfrak {L}}_{GM}\) associated to the trivial or identity representation \({\mathtt {id}}\) (which maps each permutation to the identity map): \({\mathfrak {L}}=f_{{\mathtt {id}}} V_{{\mathtt {id}}}\) (see (7)). For example, the \({\mathfrak {S}}_4\)-equivariant model is the Lie Markov model with symmetry \({\mathfrak {S}}_{4}\) and decomposition \({\mathfrak {L}}\cong {\mathtt {id}}\): it is the model by Jukes and Cantor (1969). In a similar way, we will recover the Kimura model with two parameters (Kimura 1980) as the Lie Markov model with symmetry \({\mathcal {G}}\) and decomposition \({\mathfrak {L}}\cong 2 {\mathtt {id}}\) (see Model 2.2b in Sect. 5). \(\diamond \)

The stochastic cone of a Lie Markov model

We want to explore the geometry of the stochastic cone associated to a Lie Markov model with symmetry given by some permutation group \(G\le {\mathfrak {S}}_n\). Since the action of \(G\) on \({\mathfrak {L}}_{GM}\) is as given in (5), we infer that the space \( {\mathfrak {L}}_{GM}^+\) is invariant under this action, i.e. \( G {\mathfrak {L}}_{GM}^+= {\mathfrak {L}}_{GM}^+\). From this, we conclude that if \({\mathfrak {L}}\subset {\mathfrak {L}}_{GM}\) is a vector subspace which is invariant under the action of \(G\), then the stochastic cone \( {\mathfrak {L}}^+={\mathfrak {L}}\cap { \mathfrak {L}}^+_{GM}\) is invariant under \(G\) as well.

Because each permutation in \(G\) induces a linear automorphism in \({\mathfrak {L}}_{GM}\) and the cone \({\mathfrak {L}}^+\) is invariant, the set of rays of the cone must also be invariant under the action of \(G\). We infer that, after giving an ordering to the set of rays, there is a group homomorphism

$$\begin{aligned} G\rightarrow \mathfrak {S}_r, \end{aligned}$$
(8)

where \(r\) is the number of rays of \(\mathfrak {L}^+\). From this, we can decompose the set of rays of \({\mathfrak {L}}^+\) into \(G\)-orbits, which we will refer to as ray-orbits. Notice in general, the above homomorphism is different from the homomorphism arising from a permutation basis, as described in (6).

Example 4

The number of rays of \({\mathfrak {L}}_{GM}^+\) is \(n(n-1)\). These rays are exactly the positive span of the elementary rate matrices \(L _{ij}\). The group homomorphism \(G\rightarrow \mathfrak {S}_r\) of (8) corresponds to the action described in (5).\(\diamond \)

Example 5

We know there is only one six-dimensional Lie Markov model with \({\mathfrak {S}}_4\) symmetry (Sumner et al. 2012a, Result 17). The Lie algebra \({\mathfrak {L}}\) is the vector space sum of the model by Kimura (1981) and the model by Felsenstein (1981). It is generated by

$$\begin{aligned} W_{ij}:=L_{s(ij)}+(R_i+R_j), \qquad i<j, \quad i,j \in \{1,2,3,4\}, \end{aligned}$$

where \(R_i=\sum _{j\ne i}L_{ij}\) and \(L_{s(ij)}=L_{ij}+L_{ji}+L_{kl}+L_{lk}\) with \(i,j,k,l\) all different. The reader may notice, although the six vectors \(W_{ij}\) do form a permutation basis of \({\mathfrak {L}}\), by taking the convex cone generated by them, \( \{\sum \lambda _{ij} W_{ij} : \lambda _{ij}\ge 0 \}\), we are not considering all the stochastic rate matrices in the model. For example, the vector \(R_1\) is in the stochastic cone \({\mathfrak {L}}^+\) but we cannot obtain it as a positive linear combination of the vectors \(W_{ij}\).

The reader may argue this situation occurs because of our particular choice of a permutation basis, but this will be the case no matter which permutation basis of \({\mathfrak {L}}\) we consider. Actually, the stochastic cone \({\mathfrak {L}}^+\) has seven rays \(\{L_{\alpha },L_{\beta },L_{\gamma },R_1,R_2,R_3,R_4 \}\) (with the notation used there: \(L_{\alpha }=L_{s(12)},L_{\beta }=L_{s(13)},L_{\gamma }=L_{s(14)}\)). We will find this model again in Sect. 5 of this paper as Model 6.7a. \(\diamond \)

4 Decomposition of \({\mathfrak {L}}_{GM}\) as a \({\mathcal {G}}\)-module

As we are especially interested in nucleotide evolution, we fix \(n=4\) and deal with the group of permutations that preserves the partitioning of nucleotides into purines and pyrimidines: \(AG|CT:=\{\{A,G\},\{C,T\}\}\).

By identifying nucleotides \(\{A,G,C,T\}\) with numbers \(\{1,2,3,4\}\), this leads to consider the subgroup \({\mathcal {G}}\) of \({\mathfrak {S}}_4\) presented in Example 2:

$$\begin{aligned} {\mathcal {G}}:=\{e,(12),(34),(12)(34),(13)(24),(14)(23),(1324),(1423)\}. \end{aligned}$$

Of course, we expect to recover the Kimura model with three parameters in the list of Lie Markov models with this symmetry: it is Model 3.3a. However, as already noted in Remark 3, this model has a wider symmetry and in fact, it is \(\mathfrak {S}_4\)-symmetric (see Sumner et al. 2012a).

Presently, we use the projection operators to decompose the Lie algebra of the general Markov model into the irreducible representations of \({\mathcal {G}}\).

Remark 7

The reader may notice the irreducible characters of the group \({\mathcal {G}}\) in Table 2 take only real values. As a consequence, irreducible real representations remain irreducible over the complex field and all the representation theory for \({\mathcal {G}}\) can be dealt with over the real field. However, we prefer to keep our study over the complex field, as this is the field where the general theory is developed. For instance, it is important to work over the complex field when computing the full list of \({\mathcal {G}}\)-submodules of \({\mathfrak {L}}_{GM}\) isomorphic to \(\oplus _k a_k V_k\) (see step 4 of the procedure of Sect. 3): to this aim, we use the fact that the only \({\mathcal {G}}\)-endomorphisms of an irreducible module are of the form \(\lambda {\mathbf {1}}\) (Schur’s lemma), which is known to be false if the field is not algebraically closed. \(\diamond \)

From now on, we will consider the restriction of the action of \({\mathfrak {S}}_4\) described in (5) to the group \({{\mathcal {G}}}\). We will denote this action by \(\rho _{{\mathcal {G}}}\):

$$\begin{aligned} \rho _{{\mathcal {G}}}(\sigma ): Q\mapsto K_{\sigma } Q K_{\sigma }^{-1}. \end{aligned}$$
(9)

Sumner et al. (2012a, Result 8) showed the decomposition of the \({\mathfrak {L}}_{GM}\) into the irreducible representations of \({\mathfrak {S}}_4\) (expressed using integer partitions of 4) is \({\mathfrak {L}}_{GM}\cong \{4\}\oplus 2\{31\} \oplus \{2^2\} \oplus \{21^2\}\). By applying the branching rule of \({\mathfrak {S}}_4\) to \({\mathcal {G}}\) (see Table 3) we obtain:

Theorem 3

The decomposition of the 4-state general rate matrix model \(\mathfrak {L}_{GM}\) into irreducible representations of \({{\mathcal {G}}}\) is given by

$$\begin{aligned} \mathfrak {L}_{GM}\cong 2\, {\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _1 \oplus 2\, \zeta _2 \oplus 3\, \xi , \end{aligned}$$
(10)

where the decomposition of the dimension is given by \(12=2\times 1+1+1+2\times 1 +3\times 2\).
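The multiplicities in (10) can be cross-checked with characters. Since \({\mathcal {G}}\) permutes the elementary matrices \(L_{ij}\) via (5), the character of \({\mathfrak {L}}_{GM}\) at \(\sigma \) is the number of pairs \((i,j)\), \(i\ne j\), fixed by \(\sigma \). The sketch below is illustrative only: the character values in `CHARACTERS` are an assumption, taken from the standard character table of the dihedral group of order 8 with labels matched to the decompositions stated in this paper, because Table 2 is not reproduced in this section.

```python
# Elements of G in one-line notation (s[i-1] is the image of i), tagged with
# a conjugacy-class label: e, t = {(12),(34)}, c = {(12)(34)},
# d = {(13)(24),(14)(23)}, q = {(1324),(1423)}.
ELEMENTS = [
    ((1, 2, 3, 4), "e"), ((2, 1, 3, 4), "t"), ((1, 2, 4, 3), "t"),
    ((2, 1, 4, 3), "c"), ((3, 4, 1, 2), "d"), ((4, 3, 2, 1), "d"),
    ((3, 4, 2, 1), "q"), ((4, 3, 1, 2), "q"),
]

# ASSUMPTION: irreducible character values (all real), one entry per class.
CHARACTERS = {
    "id":    {"e": 1, "t": 1,  "c": 1,  "d": 1,  "q": 1},
    "sgn":   {"e": 1, "t": -1, "c": 1,  "d": 1,  "q": -1},
    "zeta1": {"e": 1, "t": -1, "c": 1,  "d": -1, "q": 1},
    "zeta2": {"e": 1, "t": 1,  "c": 1,  "d": -1, "q": -1},
    "xi":    {"e": 2, "t": 0,  "c": -2, "d": 0,  "q": 0},
}


def chi_GM(s):
    """Number of ordered pairs (i, j), i != j, with both i and j fixed by s."""
    fixed = sum(1 for i in range(1, 5) if s[i - 1] == i)
    return fixed * (fixed - 1)


def multiplicity(name):
    """Inner product of chi_GM with the irreducible character `name`."""
    total = sum(chi_GM(s) * CHARACTERS[name][cls] for s, cls in ELEMENTS)
    assert total % len(ELEMENTS) == 0
    return total // len(ELEMENTS)


multiplicities = {name: multiplicity(name) for name in CHARACTERS}
```

Under these assumptions the computed multiplicities agree with the coefficients of (10).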

4.1 Decomposition of the orbits of \({\mathcal {G}}\) in \({\mathfrak {L}}_{GM}\)

Following the general scheme described in Sect. 3, our task now is to identify the Lie Markov models occurring as subalgebras of \({\mathfrak {L}}_{GM}\) and with symmetry \({{\mathcal {G}}}\). In Table 4 we present the decomposition of the orbits of \({{\mathcal {G}}}\). These are computed by using the orbit stabilizer theorem and projecting \({\langle {{\mathcal {G}}}/H \rangle }_{{\mathbb {C}}}\) onto the irreducible modules \(V_i\) of \({{\mathcal {G}}}\) using the projection operators \(\varTheta _i\) defined in (3).

Table 4 Decomposition of the orbits of \({\mathcal {G}}\) into irreducible modules

Example 6

Here we develop the case of \({\mathcal {H}}=\{e,(12)(34)\}\) as an illustrative example. We have \({{\mathcal {G}}} \, / \, {\mathcal {H}}=\{ [e],[(12)],[(13)(24)],[(1324)] \}\), where \([\sigma ]\) represents the coset in \({{\mathcal {G}}}/{\mathcal {H}}\) containing the element \(\sigma \). Namely, \({[e]}= \{e,(12)(34)\}\), \({[(12)]}= \{(12),(34)\}\), \({[(13)(24)]} = \{(13)(24),(14)(23)\}\) and \({[(1324)]} = \{(1324),(1423)\}\). These cosets inherit an action of \({{\mathcal {G}}}\) by taking \(\sigma : [\sigma ']\mapsto [\sigma \sigma ']\), which can be extended linearly to a linear representation of \({{\mathcal {G}}}\) by taking the module \({\langle {{\mathcal {G}}}\, / \,{\mathcal {H}} \rangle }_{{\mathbb {C}}}\cong {\mathbb {C}}^4\). Next, we decompose \({\langle {{\mathcal {G}}}\, / \,{\mathcal {H}} \rangle }_{{\mathbb {C}}}\) into irreducible modules of \({{\mathcal {G}}}\) by applying the projection operators: \(\varTheta _{{\mathtt {id}}},\;\varTheta _{{\mathtt {sgn}}},\; \varTheta _{\zeta _1},\;\varTheta _{\zeta _2}\) and \(\varTheta _{\xi }\). For example:

$$\begin{aligned} \varTheta _{{\mathtt {id}}}[e]= \textstyle {\frac{1}{8}}\sum _{\sigma \in {{\mathcal {G}}}}\sigma \cdot [e] =\textstyle {\frac{1}{4}}\left( [e]+[(12)]+[(13)(24)]+[(1324)]\right) . \end{aligned}$$

As this projection is non-zero, we conclude \({\langle {{\mathcal {G}}}/{\mathcal {H}} \rangle }_{{\mathbb {C}}}\) contains the trivial representation \({\mathtt {id}}\). We can check that the image by \(\varTheta _{{\mathtt {id}}}\) of the other coset elements gives the same projection, so \({\langle {{\mathcal {G}}}/{\mathcal {H}} \rangle }_{{\mathbb {C}}}\) contains \({\mathtt {id}}\) only once. Similarly, referring to the character table of \({\mathcal {G}}\) (see Table 2), we have

$$\begin{aligned} \varTheta _{{\mathtt {sgn}}}[e]= \textstyle {\frac{1}{8}}\sum _{\sigma \in {{\mathcal {G}}}}\chi ^{{\mathtt {sgn}}}(\sigma ) \sigma \cdot [e] = \textstyle {\frac{1}{4}}\left( [e]-[(12)]+[(13)(24)]-[(1324)]\right) , \end{aligned}$$

and we check that \(\varTheta _{{\mathtt {sgn}}}[e]\), \(\varTheta _{{\mathtt {sgn}}}[(12)]\), \(\varTheta _{{\mathtt {sgn}}}[(13)(24)]\) and \(\varTheta _{{\mathtt {sgn}}}[(1324)]\) agree up to sign to learn that \({\langle {{\mathcal {G}}}/{\mathcal {H}} \rangle }_{{\mathbb {C}}}\) contains a single copy of the \({\mathtt {sgn}}\) representation. Similarly, we check \(\langle {{\mathcal {G}}}/{\mathcal {H}}\rangle _{\mathbb {C}}\) contains a copy of each of the \(\zeta _1\) and \(\zeta _2\) representations. On the other hand, we see

$$\begin{aligned} \varTheta _{\xi }[e]&= \textstyle {\frac{1}{8}}\sum _{\sigma \in {{\mathcal {G}}}}\chi ^{\xi }(\sigma )\, \sigma \cdot [e] =\textstyle {\frac{1}{4}}\left( [e]-[(12)(34)]\right) =0, \end{aligned}$$

and we check \(\varTheta _{\xi }[(12)]=\varTheta _{\xi }[(13)(24)]=\varTheta _{\xi }[(1324)]=0\) to learn \({\langle {{\mathcal {G}}}/{\mathcal {H}} \rangle }_{{\mathbb {C}}}\) does not contain a copy of the \(\xi \) representation. Putting this together and counting dimensions, we infer that \({\langle {{\mathcal {G}}}/{\mathcal {H}} \rangle }_{{\mathbb {C}}}\cong {\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _1 \oplus \zeta _2\).\(\diamond \)
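The two projections displayed in this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: permutations are written in one-line notation, products are composed right to left, and \(\chi ^{{\mathtt {sgn}}}\) is taken to be the usual sign of a permutation, which is consistent with the coefficients displayed above. The function names are chosen for this illustration only.

```python
from fractions import Fraction

# One-line notation: s[i-1] is the image of i under the permutation s.
G = {
    "e": (1, 2, 3, 4), "(12)": (2, 1, 3, 4), "(34)": (1, 2, 4, 3),
    "(12)(34)": (2, 1, 4, 3), "(13)(24)": (3, 4, 1, 2),
    "(14)(23)": (4, 3, 2, 1), "(1324)": (3, 4, 2, 1), "(1423)": (4, 3, 1, 2),
}
H = [G["e"], G["(12)(34)"]]


def compose(s, t):
    """Product s*t, applying t first and then s."""
    return tuple(s[t[i] - 1] for i in range(4))


def coset(s):
    """The left coset sH as a set of permutations."""
    return frozenset(compose(s, h) for h in H)


def sign(s):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for a in range(4) for b in range(a + 1, 4) if s[a] > s[b])
    return -1 if inversions % 2 else 1


# Label each coset by the representative used in the text.
LABELS = {coset(G[name]): "[" + name + "]"
          for name in ("e", "(12)", "(13)(24)", "(1324)")}


def project(character, start):
    """(1/|G|) * sum over s of character(s) * [s * start], in coset coordinates."""
    image = {label: Fraction(0) for label in LABELS.values()}
    for s in G.values():
        image[LABELS[coset(compose(s, start))]] += Fraction(character(s), len(G))
    return image


theta_id = project(lambda s: 1, G["e"])
theta_sgn = project(sign, G["e"])
```

Here `theta_id` and `theta_sgn` hold the coefficients of \(\varTheta _{{\mathtt {id}}}[e]\) and \(\varTheta _{{\mathtt {sgn}}}[e]\) with respect to the four cosets.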

Proceeding as in this example, we have produced the results summarised in Table 4. It gives the decomposition of \({\langle {{\mathcal {G}}}/H \rangle }_{{\mathbb {C}}}\) into irreducible representations for each subgroup \(H\le {{\mathcal {G}}}\). The first column shows how many copies of each \(H\) occur as a subgroup in \({{\mathcal {G}}}\), with automorphism classes accounted for with distinct decomposition in the fourth column. For example, there are three automorphism classes of \(\mathbb {Z}_2\) in \({{\mathcal {G}}}: \{e,(12)\} \cong \{e,(34)\},\; \{e,(12)(34)\}\) and \(\{e,(13)(24)\} \cong \{e,(14)(23)\}\), and the corresponding spaces \({\langle {{\mathcal {G}}}/H \rangle }_{{\mathbb {C}}}\) have different decomposition into irreducible modules, as shown in Table 4. Similarly, there are two “types” of \({\mathfrak {S}}_2\times {\mathfrak {S}}_2\): \(\{e,(12),(34),(12)(34)\}\) and \(\{e,(12)(34),(13)(24),(14)(23)\}\). Again, these two types have differing decomposition into irreducible subspaces.

Finally, Table 5 shows all possible decompositions for a \({\mathcal {G}}\)-invariant subspace of \({\mathfrak {L}}_{GM}\) allowed by the decomposition of \({\mathfrak {L}}_{GM}\) of Theorem 3. The list is obtained by adding decompositions of \({\mathcal {G}}\)-orbits (see Table 4) as long as they are allowed by the decomposition of \({\mathfrak {L}}_{GM}\) as a \({\mathcal {G}}\)-module (see Theorem 3). Note the decomposition (10) of \(\mathfrak {L}_{GM}\) has two copies of the trivial representation while the decomposition of each \({{\mathcal {G}}}/H\) has only one copy.
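The filtering step \(a_k\le f_k\) can be sketched as follows. The example is deliberately partial: it uses the multiplicities \(f_k\) of Theorem 3 but only two orbit types, namely the one-point orbit \({\mathcal {G}}/{\mathcal {G}}\) (which spans a copy of \({\mathtt {id}}\)) and the orbit \({\mathcal {G}}/{\mathcal {H}}\) of Example 6. The remaining rows of Table 4 are not reproduced here, and the names `ORBIT_TYPES` and `admissible_unions` are hypothetical.

```python
from itertools import combinations_with_replacement

# Multiplicities f_k of the irreducible modules in L_GM (Theorem 3).
F_GM = {"id": 2, "sgn": 1, "zeta1": 1, "zeta2": 2, "xi": 3}

# Orbit type -> (cardinality, multiplicities b_k). Only two types are listed;
# a full run would need every row of Table 4.
ORBIT_TYPES = {
    "G/G": (1, {"id": 1}),
    "G/H": (4, {"id": 1, "sgn": 1, "zeta1": 1, "zeta2": 1}),
}


def admissible_unions(d):
    """Unions of the listed orbit types with d elements and a_k <= f_k for all k."""
    found = []
    for q in range(1, d + 1):
        for combo in combinations_with_replacement(sorted(ORBIT_TYPES), q):
            if sum(ORBIT_TYPES[name][0] for name in combo) != d:
                continue
            a = {k: 0 for k in F_GM}
            for name in combo:
                for k, b in ORBIT_TYPES[name][1].items():
                    a[k] += b
            if all(a[k] <= F_GM[k] for k in F_GM):
                found.append(combo)
    return found
```

With these two orbit types, a union of two copies of \({\mathcal {G}}/{\mathcal {H}}\) is rejected because it would need two copies of \({\mathtt {sgn}}\), and any union of three or more orbits is rejected because it would need more than two copies of \({\mathtt {id}}\).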

Table 5 Decompositions into irreducible modules of all possible \({\mathcal {G}}\)-permutation subrepresentations of \({\mathfrak {L}}_{GM}\cong 2 {\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _1\oplus 2 \zeta _2 \oplus 3 \xi \) (see Theorem 3)

Referring to Table 5, we conclude:

Theorem 4

There are no Lie Markov models with purine/pyrimidine symmetry of dimension seven or eleven.

Remark 8

Since any ray-orbit \(B=\{Q_1,\ldots ,Q_r\}\) is a \(G\)-orbit, we can consider the abstract vector space it generates, that is, \(\{\sum _{i=1}^r a_i [ Q_i ] : a_i\in {\mathbb {C}}\}\), where the notation \([Q_i]\) is used to emphasise the fact that we are avoiding any reference to matrix addition between the elements of the ray-orbit. The dimension of this vector space equals the number of elements in the orbit, and as a permutation representation, the decomposition into irreducible representations will be one of the decompositions shown in Table 4. On the other hand, we can also consider the vector subspace of \({\mathfrak {L}}_{GM}\) spanned by the matrices \(Q_1,\ldots ,Q_r\). Notice that these matrices may not be linearly independent as vectors of \({\mathfrak {L}}_{GM}\), in which case the dimension of this vector subspace is smaller than the number of matrices. In this case, this vector space is not a permutation representation and its decomposition into irreducible representations does not appear in Table 4. For an example of this, the reader is referred to ray-orbits \((4,\frac{1}{3},\frac{2}{3})d, (4,\frac{1}{3},\frac{2}{3})e,(4,\frac{1}{3},\frac{2}{3})f\) presented in Table 8. \(\diamond \)

4.2 A convenient basis

In this section we derive a basis for the vector space of \(4\times 4\) rate matrices \({\mathfrak {L}}_{GM}\) where the matrices comprising the basis are organised naturally into subsets that span each of the irreducible components of the decomposition of \({\mathfrak {L}}_{GM}\) with respect to \({\mathcal {G}}\) (as given in Theorem 3). This basis is presented in Theorem 5 below. The reader should note these basis vectors play the role of the \(L_{ij}\) when models with \(\mathfrak {S}_4\) symmetry were considered (Sumner et al. 2012a).

Permutation vectors

For each \(\sigma \in {\mathcal {G}}, \sigma \ne e\), a permutation vector is defined as

$$\begin{aligned} L_{\sigma }:=-{\mathbf {1}}+K_{\sigma }=\sum _{1\le j\le 4} L_{j\sigma (j)}. \end{aligned}$$

Notice that each \(L_{\sigma }\) is a rate matrix in \({\mathfrak {L}}_{GM}\). The linear span of these vectors has dimension 5 because of the linear dependencies \(L_{(12)}+L_{(34)}= L_{(12)(34)}\), and \( L_{(1324)}+L_{(1423)}=L_{(13)(24)}+L_{(14)(23)}\). Moreover, the permutation vectors span a Lie algebra (Sumner et al. 2012a, Proposition 4.12):

$$\begin{aligned} \left[ L_\sigma ,L_{\sigma '}\right] =\left[ -{\mathbf {1}}+K_{\sigma },-{\mathbf {1}}+K_{\sigma '}\right] =\left[ K_{\sigma },K_{\sigma '}\right] =K_{\sigma \sigma '}-K_{\sigma '\sigma }=L_{\sigma \sigma '}-L_{\sigma '\sigma }. \end{aligned}$$
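The two linear dependencies and the commutator identity can be verified directly on \(4\times 4\) matrices. The sketch below assumes that \(K_{\sigma }\) sends the basis vector \(e_j\) to \(e_{\sigma (j)}\), so that \(K_{\sigma }K_{\sigma '}=K_{\sigma \sigma '}\) with \(\sigma '\) applied first. The helper names are chosen for this illustration only.

```python
n = 4

# One-line notation: s[i-1] is the image of i under the permutation s.
G = {
    "e": (1, 2, 3, 4), "(12)": (2, 1, 3, 4), "(34)": (1, 2, 4, 3),
    "(12)(34)": (2, 1, 4, 3), "(13)(24)": (3, 4, 1, 2),
    "(14)(23)": (4, 3, 2, 1), "(1324)": (3, 4, 2, 1), "(1423)": (4, 3, 1, 2),
}


def compose(s, t):
    """Product s*t, applying t first and then s."""
    return tuple(s[t[i] - 1] for i in range(n))


def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def perm_vector(s):
    """L_s = K_s - 1, with the assumed convention K_s e_j = e_{s(j)}."""
    K = [[0] * n for _ in range(n)]
    for j in range(n):
        K[s[j] - 1][j] = 1
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return sub(K, identity)


L = {name: perm_vector(s) for name, s in G.items()}

dependencies_ok = (
    add(L["(12)"], L["(34)"]) == L["(12)(34)"]
    and add(L["(1324)"], L["(1423)"]) == add(L["(13)(24)"], L["(14)(23)"])
)

# [L_s, L_t] = L_{st} - L_{ts} for all s, t in G (note that L_e = 0).
bracket_ok = all(
    sub(matmul(perm_vector(s), perm_vector(t)), matmul(perm_vector(t), perm_vector(s)))
    == sub(perm_vector(compose(s, t)), perm_vector(compose(t, s)))
    for s in G.values() for t in G.values()
)
```

Both checks rely only on \(\sigma \mapsto K_{\sigma }\) being a group homomorphism, so they should not depend on which of the two standard permutation-matrix conventions is used.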

The permutation vectors are useful because they provide simple expressions of generators of \({\mathfrak {L}}_{GM}\) consistent with the decomposition of Theorem 3. The action \(\rho _{{\mathcal {G}}}\) of \({{\mathcal {G}}}\) on these permutation vectors is given by \(\tau : L_{\sigma } \mapsto K_{\tau } L_{\sigma } K_{\tau }^{-1}= L_{\tau \sigma \tau ^{-1}}\). Notice this action maps each matrix \(L_{\sigma }\) to \(L_{\sigma '}\), where \(\sigma '\) is some permutation in the conjugacy class of \(\sigma \). It follows that the vectors \(\{L_{\sigma '}: \sigma '\in [\sigma ]\}\) span a \({\mathcal {G}}\)-module, and by applying character theory we can obtain the decomposition of these \({\mathcal {G}}\)-modules into isotypic components. Moreover, a basis for these \({\mathcal {G}}\)-modules consistent with these decompositions can be described with the assistance of the projection operators. The following example illustrates this procedure.

Example 7

Consider the 2-dimensional subspace \(S=\langle L_{(12)}, L_{(34)}\rangle _{\mathbb {C}}\) corresponding to the conjugacy class \([(12)]=\{(12),(34)\}\), and the representation \(\rho _{{\mathcal {G}}}:{\mathcal {G}}\rightarrow GL(S)\) induced by the action just defined. It is straightforward to check \(\tau (12) \tau ^{-1}=(12),\; \tau (34) \tau ^{-1}=(34)\) if \(\tau \in \{e,(12),(12)(34)\}\), while \(\tau (12) \tau ^{-1}=(34),\; \tau (34) \tau ^{-1}=(12)\) if \(\tau \in \{(13)(24),(1324)\}\). Adopting matrix notation, we obtain

$$\begin{aligned}&\rho _{\mathcal {G}}(e)= \rho _{\mathcal {G}}\left( (12) \right) = \rho _{\mathcal {G}} ((12)(34) )=\left( \begin{array}{cc} 1 &{}\quad 0 \\ 0 &{}\quad 1 \end{array} \right) \; \text{ and } \\&\rho _{\mathcal {G}}((13)(24))= \rho _{\mathcal {G}} ((1324))=\left( \begin{array}{cc} 0 &{}\quad 1 \\ 1 &{}\quad 0 \end{array} \right) . \end{aligned}$$

If \(\chi \) denotes the character associated with \(\rho _{{\mathcal {G}}}\), we infer

$$\begin{aligned} \chi (e)=\chi ((12))= \chi ((12)(34))=2, \qquad \text{ and } \qquad \chi ((13)(24))= \chi ((1324))=0. \end{aligned}$$

By virtue of the character table of \({\mathcal {G}}\) (see Table 2), we infer \(S\cong {\mathtt {id}}\oplus \zeta _2\), and applying the projection operators (see (3)):

$$\begin{aligned}&\varTheta _{{\mathtt {id}}}(L_{(12)})=\varTheta _{{\mathtt {id}}}(L_{(34)})=\frac{1}{2}(L_{(12)}+L_{(34)});\\&\varTheta _{\zeta _2}(L_{(12)})=-\varTheta _{\zeta _2}(L_{(34)})=\frac{1}{2}(L_{(12)}-L_{(34)}). \end{aligned}$$

\(\diamond \)

Proceeding in this way for each conjugacy class of \({\mathcal {G}}\) (excluding the trivial class), we identify the following \({\mathcal {G}}\)-modules and decompositions:

$$\begin{aligned} \begin{array}{ll} {\langle L_{(12)},L_{(34)} \rangle }_{{\mathbb {C}}}\cong {\mathtt {id}}\oplus \zeta _2, &{}\quad {\langle L_{(12)(34)} \rangle }_{{\mathbb {C}}} \cong {\mathtt {id}}, \\ {\langle L_{(13)(24)},L_{(14)(23)} \rangle }_{{\mathbb {C}}} \cong {\mathtt {id}}\oplus {\mathtt {sgn}}, &{}\quad {\langle L_{(1324)},L_{(1423)} \rangle }_{{\mathbb {C}}} \cong {\mathtt {id}}\oplus \zeta _1.\\ \end{array} \end{aligned}$$

For future convenience, we keep the vectors obtained by applying the projection operators to these decompositions. From now on, we will use the following notation

$$\begin{aligned} \begin{array}{lll} {\mathrm {B}}^{{\mathtt {id}}}_1 := L_{(12)(34)}, &{}\quad {\mathrm {B}}^{{\mathtt {id}}}_2 : = L_{(13)(24)}+L_{(14)(23)}, \\ {\mathrm {B}}^{{\mathtt {sgn}}} := L_{(13)(24)}-L_{(14)(23)},&{}\quad \,\,{\mathrm {B}}^{\zeta _1} := L_{(1324)}- L_{(1423)}, \\ {\mathrm {B}}^{\zeta _2}_1 := L_{(12)}- L_{(34)};\\ \end{array} \end{aligned}$$

where the superscript indicates which irreducible \({\mathcal {G}}\)-module each vector belongs to.

Cherry vectors

Referring to Table 4 and the permutation representation spanned by the “cherries” \(\{1,2\}\) and \(\{3,4\}\), we introduce the matrices

$$\begin{aligned} {\mathrm {Ch}}_{12}&:= L_{13}+L_{14}+L_{23}+L_{24},\\ {\mathrm {Ch}}_{34}&:= L_{31}+L_{32}+L_{41}+L_{42}, \end{aligned}$$

and obtain \( {\langle {\mathrm {Ch}}_{12},{\mathrm {Ch}}_{34} \rangle }_{{\mathbb {C}}} \cong {\mathtt {id}}\oplus \zeta _2\). The action of \({{\mathcal {G}}}\) on each of these vectors is given by \(\tau : {\mathrm {Ch}}_{ij} \mapsto {\mathrm {Ch}}_{\tau (i)\tau (j)}\). Notice that \({\mathrm {Ch}}_{12}+{\mathrm {Ch}}_{34}={\mathrm {B}}^{{\mathtt {id}}}_2\). By applying the projection operator \(\varTheta _{\zeta _2}\), we see that \({\mathrm {B}}^{\zeta _2}_2:={\mathrm {Ch}}_{12}-{\mathrm {Ch}}_{34}\) accounts for the second copy of \(\zeta _2\).

Row-sum and twisted vectors

Keeping the notation used by Sumner et al. (2012a), define the row-sum vectors

$$\begin{aligned} R_{i}:=\sum _{{j: 1\le i\ne j\le 4}}L_{ij}. \end{aligned}$$

The action \(\rho _{{\mathcal {G}}}\) of \({{\mathcal {G}}}\) on each of these is \(\sigma :R_{i} \mapsto R_{\sigma (i)}\), and it is isomorphic to the restriction of the defining representation of \({\mathfrak {S}}_4\) to \({{\mathcal {G}}}\). Therefore, the (invariant) subspace generated by the row-sum vectors is isomorphic to \({\mathtt {id}}\oplus \{31\}\), restricted to the subgroup \({{\mathcal {G}}}\). By applying the branching rule \({\mathfrak {S}}_4\downarrow {\mathcal {G}}\) given in Table 3, we obtain

$$\begin{aligned} {\langle R_1,R_2,R_3,R_4 \rangle }_{{\mathbb {C}}}\cong {\mathtt {id}}\oplus \zeta _2 \oplus \xi . \end{aligned}$$

Obviously, we have \(R_1+R_{2}+R_{3}+R_{4}=B^{{\mathtt {id}}}_1+B^{{\mathtt {id}}}_2\) and \(( R_1+R_2)-(R_3+R_4) = {\mathrm {B}}^{\zeta _2}_1+{\mathrm {B}}^{\zeta _2}_2\). By applying the projection operator \(\varTheta _{\xi }\) we find that \(\langle R _1-R_2,R_3-R_4 \rangle _{{\mathbb {C}}}\) accounts for a copy of the \(\xi \) representation. We define

$$\begin{aligned} {\mathrm {B}}^{\xi }_1 := R_1-R_2, \qquad {\mathrm {B}}^{\xi }_2 := R_3-R_4. \end{aligned}$$

Next, define the twisted vectors as

$$\begin{aligned} H_i&:= L_{ik}+L_{il}+L_{ji}, \\ V_i&:= L_{ki}+L_{li}+L_{ij}, \end{aligned}$$

where \(\{\{i,j\},\{k,l\}\}=\{\{1,2\},\{3,4\}\}\). For example, \(V_2=L_{21}+L_{32}+L_{42}\) and \(H_3=L_{31}+L_{32}+L_{43}\). The action \(\rho _{{\mathcal {G}}}\) of \({{\mathcal {G}}}\) on these vectors is given by \(\sigma : V_i \mapsto V_{\sigma (i)}\) and \(\sigma : H_i \mapsto H_{\sigma (i)}\), so again we are dealing with the restriction of the defining representation of \({\mathfrak {S}}_{4}\) to \({{\mathcal {G}}}\). As above,

$$\begin{aligned} \langle V_1,V_2,V_3,V_4 \rangle _{{\mathbb {C}}} \cong \langle H_1,H_2,H_3,H_4 \rangle _{{\mathbb {C}}} \cong {\mathtt {id}}\oplus \zeta _2 \oplus \xi . \end{aligned}$$

Notice that \(\sum _{ i} H_{i}=\sum _{ i} V_{i}={\mathrm {B}}^{{\mathtt {id}}}_1+{\mathrm {B}}^{{\mathtt {id}}}_2,\;(H_{1}+H_{2})-(H_{3}+H_{4})={\mathrm {B}}^{\zeta _2}_{1}+{\mathrm {B}}^{\zeta _2} _{2}\) and \((V_{1}+V_{2})-(V_{3}+V_{4})={\mathrm {B}}^{\zeta _2}_{1}-{\mathrm {B}}^{\zeta _2} _{2}\). By applying the projection operator \(\varTheta _{\xi }\) in \( \langle V_1,V_2,V_3,V_4 \rangle _{{\mathbb {C}}} \) and \( \langle H_1,H_2,H_3,H_4 \rangle _{{\mathbb {C}}}\), we find that \( \langle H_1-H_2,H_3-H_4 \rangle _{{\mathbb {C}}}\) and \(\langle V_1-V_2,V_3-V_4 \rangle _{{\mathbb {C}}}\) account for the two other copies of \(\xi \), so we define

$$\begin{aligned} \begin{array}{lll} {\mathrm {B}}^{\xi }_3 := H_1-H_2, &{} \qquad \qquad &{} {\mathrm {B}}^{\xi }_5 := V_1-V_2,\\ {\mathrm {B}}^{\xi }_4 := H_3-H_4, &{} \qquad \qquad &{} {\mathrm {B}}^{\xi }_6 := V_3-V_4. \end{array} \end{aligned}$$

Putting all of these results together:

Theorem 5

The Lie algebra \(\mathfrak {L}_{GM}\) can be expressed as

$$\begin{aligned} \mathfrak {L}_{GM}&= \langle \{L_{ij}\}_{1\le i\ne j\le 4}\rangle _{{\mathbb {C}}}\\&= \langle \{L_{\sigma }\}_{\sigma \in {{\mathcal {G}}}, \sigma \ne e} \cup \{{\mathrm {Ch}}_{12},{\mathrm {Ch}}_{34}\}\cup \{R_i\}_{1\le i\le 4}\cup \{H_j\}_{1\le j\le 4}\cup \{V_k\}_{1\le k \le 4} \rangle _{{\mathbb {C}}}, \end{aligned}$$

with linear dependencies

$$\begin{aligned} L_{(12)}+L_{(34)}&= L_{(12)(34)},\\ L_{(13)(24)}+L_{(14)(23)}&= L_{(1324)}+L_{(1423)}={\mathrm {Ch}}_{12}+{\mathrm {Ch}}_{34}, \\ H_{1}+H_{2}=R_1+R_2&= {\mathrm {Ch}}_{12}+L_{(12)},\\ H_{3}+H_{4}=R_3+R_4&= {\mathrm {Ch}}_{34}+L_{(34)},\\ V_1+V_2&= {\mathrm {Ch}}_{34}+L_{(12)},\\ V_3+V_4&= {\mathrm {Ch}}_{12}+L_{(34)}. \end{aligned}$$

A basis for \({\mathfrak {L}}_{GM}\) consistent with the decomposition of Theorem 3 is given by

$$\begin{aligned} \begin{array}{l@{\quad }l} {\mathrm {B}}^{{\mathtt {id}}}_1 = L_{(12)(34)}, &{} {\mathrm {B}}^{\xi }_1 =R_1-R_2, \\ {\mathrm {B}}^{{\mathtt {id}}}_2 = L_{(13)(24)}+L_{(14)(23)}, &{} {\mathrm {B}}^{\xi }_2 = R_3-R_4, \\ {\mathrm {B}}^{{\mathtt {sgn}}} = L_{(13)(24)}-L_{(14)(23)}, &{} {\mathrm {B}}^{\xi }_3 = H_1-H_2, \\ {\mathrm {B}}^{\zeta _1} = L_{(1324)}-L_{(1423)}, &{} {\mathrm {B}}^{\xi }_4 = H_3-H_4, \\ {\mathrm {B}}^{\zeta _2}_1 = L_{(12)}-L_{(34)}, &{} {\mathrm {B}}^{\xi }_5 = V_1-V_2, \\ {\mathrm {B}}^{\zeta _2}_2 = {\mathrm {Ch}}_{12}-{\mathrm {Ch}}_{34}, &{} {\mathrm {B}}^{\xi }_6 = V_3-V_4; \\ \end{array} \end{aligned}$$

where \({\langle {\mathrm {B}}^{\xi }_1,{\mathrm {B}}^{\xi }_2 \rangle }_{{\mathbb {C}}},\; {\langle {\mathrm {B}}^{\xi }_3,{\mathrm {B}}^{\xi }_4 \rangle }_{{\mathbb {C}}} \) and \({\langle {\mathrm {B}}^{\xi }_5,{\mathrm {B}}^{\xi }_6 \rangle }_{{\mathbb {C}}} \) are the three copies of \(\xi \) in \({\mathfrak {L}}_{GM}\). With respect to this basis, the Lie algebra structure of \({\mathfrak {L}}_{GM}\) is summarised in Table 7.
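As a sanity check on Theorem 5, the sketch below verifies the stated linear dependencies and confirms that the twelve vectors \({\mathrm {B}}^{*}_j\) are linearly independent. It is a minimal illustration under assumptions not fixed in this section: \(L_{ij}\) has \(+1\) at \((i,j)\) and \(-1\) at \((j,j)\), \(R_i=\sum _{j\ne i}L_{ij}\), \(L_{\sigma }=\sum _i L_{\sigma (i),i}\), and \({\mathrm {Ch}}_{12}\), \({\mathrm {Ch}}_{34}\) are read off from the dependencies \(R_1+R_2={\mathrm {Ch}}_{12}+L_{(12)}\) and \(R_3+R_4={\mathrm {Ch}}_{34}+L_{(34)}\). The helper names are hypothetical.

```python
from fractions import Fraction

PARTNER = {1: 2, 2: 1, 3: 4, 4: 3}

def L(i, j):
    # Assumed convention: +1 at (i, j) and -1 at (j, j), so every column sums to zero.
    M = [[0] * 4 for _ in range(4)]
    M[i - 1][j - 1] += 1
    M[j - 1][j - 1] -= 1
    return M

def comb(*terms):
    # Linear combination of matrices given as (coefficient, matrix) pairs.
    return [[sum(c * M[r][k] for c, M in terms) for k in range(4)] for r in range(4)]

def plus(*mats):
    return comb(*[(1, M) for M in mats])

def minus(A, B):
    return comb((1, A), (-1, B))

def L_perm(sigma):
    # Assumed: L_sigma is the sum of L_{sigma(i), i} over the points i moved by sigma.
    return plus(*[L(sigma[i], i) for i in sigma if sigma[i] != i])

def others(i):
    return [k for k in range(1, 5) if k not in (i, PARTNER[i])]

def R(i):
    return plus(*[L(i, j) for j in range(1, 5) if j != i])

def H(i):
    k, l = others(i)
    return plus(L(i, k), L(i, l), L(PARTNER[i], i))

def V(i):
    k, l = others(i)
    return plus(L(k, i), L(l, i), L(i, PARTNER[i]))

def rank(mats):
    # Rank of a list of 4x4 matrices, by exact Gaussian elimination on flattened rows.
    rows = [[Fraction(x) for row in M for x in row] for M in mats]
    r = 0
    for c in range(16):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

L12, L34 = L_perm({1: 2, 2: 1}), L_perm({3: 4, 4: 3})
L12_34 = L_perm({1: 2, 2: 1, 3: 4, 4: 3})
L13_24 = L_perm({1: 3, 3: 1, 2: 4, 4: 2})
L14_23 = L_perm({1: 4, 4: 1, 2: 3, 3: 2})
L1324 = L_perm({1: 3, 3: 2, 2: 4, 4: 1})
L1423 = L_perm({1: 4, 4: 2, 2: 3, 3: 1})
Ch12 = minus(plus(R(1), R(2)), L12)  # read off from R_1 + R_2 = Ch_12 + L_(12)
Ch34 = minus(plus(R(3), R(4)), L34)  # read off from R_3 + R_4 = Ch_34 + L_(34)

dependencies_hold = all([
    plus(L12, L34) == L12_34,
    plus(L13_24, L14_23) == plus(L1324, L1423) == plus(Ch12, Ch34),
    plus(H(1), H(2)) == plus(R(1), R(2)),
    plus(H(3), H(4)) == plus(R(3), R(4)),
    plus(V(1), V(2)) == plus(Ch34, L12),
    plus(V(3), V(4)) == plus(Ch12, L34),
])

basis = [
    L12_34, plus(L13_24, L14_23), minus(L13_24, L14_23), minus(L1324, L1423),
    minus(L12, L34), minus(Ch12, Ch34),
    minus(R(1), R(2)), minus(R(3), R(4)), minus(H(1), H(2)),
    minus(H(3), H(4)), minus(V(1), V(2)), minus(V(3), V(4)),
]
basis_rank = rank(basis)
```

Exact rational arithmetic is used so that the rank computation is not affected by rounding. The sign of \({\mathrm {B}}^{\zeta _1}\) depends on the assumed convention for \(L_{\sigma }\), but the rank does not.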

Table 6 List of Lie Markov models with purine/pyrimidine symmetry for dimension 5, 6, 8, 9, 10 and 12 (the Lie Markov models with dimension 1 to 4 are described within the text)
Table 7 The Lie brackets of the basis \(\{{\mathrm {B}}^{*}_j\}\) of \({\mathfrak {L}}_{GM}\)

5 The list of Lie Markov models with purine/pyrimidine symmetry

We proceed to give the list of Lie Markov models with purine/pyrimidine symmetry, working up in dimension \(d\le 12\). For each \(d\), Table 5 lists all the possible decompositions allowed by Theorem 3. For each decomposition, all possible complex Lie subalgebras \({\mathfrak {L}}\) of \({\mathfrak {L}}_{GM}\) are obtained by direct computation using code written by the authors and implemented in the open-source mathematical software SAGE (Stein et al. 2012) (this code is available online at the website Fernández-Sánchez 2013). For each Lie algebra, we then impose that it has a stochastic basis (see Definition 1). A matrix \({\mathrm {B}}^{{\mathtt {id}}}_{a,b}=a {\mathrm {B}}^{{\mathtt {id}}}_1+b {\mathrm {B}}^{{\mathtt {id}}}_2\) with \(a,b>0\) has all its non-diagonal entries positive, so this condition is guaranteed whenever \({\mathfrak {L}}\) contains such a matrix: in this case, if \(\{B_1,\ldots ,B_t\}\) is a basis for \({\mathfrak {L}}\), then a stochastic basis for \({\mathfrak {L}}\) is given by \(\{B_1+\lambda {\mathrm {B}}^{{\mathtt {id}}}_{a,b},\ldots , B_t+\lambda {\mathrm {B}}^{{\mathtt {id}}}_{a,b}\}\) as long as \(\lambda >0\) is large enough.
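The shifting argument at the end of the previous paragraph can be illustrated with a short Python sketch. It is not the SAGE code referred to above: the function names are hypothetical, the example basis \(\{{\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{{\mathtt {sgn}}}\}\) is written out by hand assuming the column-sum-zero convention, and the choice \(a=b=1\) is arbitrary.

```python
from fractions import Fraction

def rank(mats):
    # Rank of a list of 4x4 matrices, by exact Gaussian elimination on flattened rows.
    rows = [[Fraction(x) for row in M for x in row] for M in mats]
    r = 0
    for c in range(16):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def shift_to_stochastic(basis, B_pos):
    """Return (lam, shifted) with lam the smallest value making all off-diagonal
    entries of B + lam * B_pos non-negative, for every B in the basis.
    B_pos must have strictly positive off-diagonal entries."""
    lam = Fraction(0)
    for B in basis:
        for r in range(4):
            for c in range(4):
                if r != c and B[r][c] < 0:
                    lam = max(lam, Fraction(-B[r][c], B_pos[r][c]))
    shifted = [[[B[r][c] + lam * B_pos[r][c] for c in range(4)] for r in range(4)]
               for B in basis]
    return lam, shifted

# Hand-written example matrices (assumed convention: columns sum to zero).
B_id1 = [[-1, 1, 0, 0], [1, -1, 0, 0], [0, 0, -1, 1], [0, 0, 1, -1]]
B_id2 = [[-2, 0, 1, 1], [0, -2, 1, 1], [1, 1, -2, 0], [1, 1, 0, -2]]
B_sgn = [[0, 0, 1, -1], [0, 0, -1, 1], [1, -1, 0, 0], [-1, 1, 0, 0]]
B_pos = [[B_id1[r][c] + B_id2[r][c] for c in range(4)] for r in range(4)]  # a = b = 1

lam, shifted = shift_to_stochastic([B_id1, B_id2, B_sgn], B_pos)
still_a_basis = rank(shifted) == 3
```

The function returns the smallest admissible shift; linear independence of the shifted vectors is checked separately through `rank`, since non-negativity alone does not guarantee it.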

For each model in the list, we describe a basis for the Lie algebra in terms of the vectors introduced in Sect. 4 and the rays of the stochastic cone arranged in orbits (see Table 8). Both data are required to completely describe the model. The general form of the stochastic rate matrix, as well as a permutation basis (a basis invariant under the action of \({\mathcal {G}}\)), is also shown when it is not too complicated. In particular, stochastic rate matrices are presented as linear combinations of the rays with non-negative coefficients. Since the rays are the generators of the stochastic cone, every stochastic rate matrix in the model can be expressed in this way (the reader should notice that in general, we cannot write down all the stochastic rate matrices of a model in terms of the same permutation basis if we require the coefficients to be non-negative). The name of each model has the form “\(d.r\)”, where \(d\) is the dimension of the model and \(r\) is the number of rays of the corresponding stochastic cone (in particular, \(d \le r\)). In case there is more than one model with a given dimension and number of rays, we will differentiate them by using letters: for example, \(5.7a,\; 5.7b\) and so on.

Table 8 Ray-orbits of the Lie Markov models with \({\mathcal {G}}\) symmetry with the corresponding generators

Note 2

Throughout the following list, we adopt the notation \(X_{ij}^{+}=X_i+X_j\) and \(X_{ij}^{-}=X_i-X_j\), for \(i,j\in \{1,2,3,4\}\) and \(X\in \{R,H,V\}\).\(\diamond \)

Ray-orbits The rays of the stochastic cones of the forthcoming models appear in orbits of cardinality 1, 2, 4 and 8 (as is demanded by the orbit-stabilizer theorem) that we call ray-orbits. A system of generators for any of these ray-orbits is obtained as the \({\mathcal {G}}\)-orbit of a rate matrix \(Q\) in any of the rays of the family. Notice, incidentally, that the action of \({\mathcal {G}}\) preserves the total sums of transition rates and of transversion rates of the rate matrices within the \({\mathcal {G}}\)-orbit.

For the Lie Markov models with symmetry \({\mathcal {G}}\), we explicitly describe the rays of the corresponding stochastic cone arranged in ray-orbits. In order to denote and compare these ray-orbits in a convenient fashion, we first normalize the generators of the rays, and take rate matrices whose trace is equal to \(-1\) (recall the absolute value of the trace of the rate matrix can be understood as the expected number of changes in one unit of time under the Markov process). Then, taking into account that the sum of transition rates and of transversion rates is constant, each ray-orbit is referred to as “\((r,\frac{s}{s+v},\frac{v}{s+v})\)”, where

  • \(r\) is the number of rays in the orbit: 1, 2, 4 or 8;

  • \(s\) is the sum of the transition rates in (any matrix of) the orbit;

  • \(v\) is the sum of the transversion rates in (any matrix of) the orbit.

The reader is referred to Table 8 for the whole list of ray-orbits arising in Lie Markov models with \({\mathcal {G}}\)-symmetry.
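The labelling of ray-orbits can be sketched in a few lines of Python. The sketch assumes the state order \(A,G,C,T\) (so that positions \((1,2),(2,1),(3,4),(4,3)\) hold the transition rates), realises \({\mathcal {G}}\) as the group generated by \((12)\) and \((13)(24)\) acting by simultaneous permutation of rows and columns, and uses hypothetical function names.

```python
from fractions import Fraction

def compose(p, q):
    # Composition of permutations given as tuples of images (0-based): (p o q)(i) = p[q[i]].
    return tuple(p[q[i]] for i in range(4))

def generate(gens):
    # Closure of a set of generators under composition.
    group = {tuple(range(4))}
    frontier = list(group)
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def act(p, Q):
    # Simultaneous permutation of rows and columns: entry (i, j) moves to (p(i), p(j)).
    out = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            out[p[i]][p[j]] = Q[i][j]
    return out

TRANSITIONS = {(0, 1), (1, 0), (2, 3), (3, 2)}  # assumed state order A, G, C, T

def ray_orbit_label(Q, group):
    """Return (orbit size, s/(s+v), v/(s+v)) for the orbit of the rate matrix Q."""
    orbit = {tuple(map(tuple, act(p, Q))) for p in group}
    s = sum(Q[i][j] for (i, j) in TRANSITIONS)
    v = sum(Q[i][j] for i in range(4) for j in range(4)
            if i != j and (i, j) not in TRANSITIONS)
    return (len(orbit), Fraction(s, s + v), Fraction(v, s + v))

G = generate([(1, 0, 2, 3), (2, 3, 0, 1)])  # generators (12) and (13)(24), 0-based

R1 = [[0, 1, 1, 1], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]
L12 = [[-1, 1, 0, 0], [1, -1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
label_R1 = ray_orbit_label(R1, G)
label_L12 = ray_orbit_label(L12, G)
```

Because the ratios \(\frac{s}{s+v}\) and \(\frac{v}{s+v}\) are invariant under rescaling, the normalization of the trace to \(-1\) is not needed to compute the label.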

Note 3

From now on, we write \({\mathrm {B}}^{{\mathtt {id}}}:={\mathrm {B}}^{{\mathtt {id}}}_{1}+{\mathrm {B}}^{{\mathtt {id}}}_{2}\), \({\mathrm {B}}^{\zeta _2}:={\mathrm {B}}^{\zeta _2}_{1}+{\mathrm {B}}^{\zeta _2}_{2}\).\(\diamond \)

Dimension One

From Table 4 we see that there is only one abstract orbit of \({{\mathcal {G}}}\) with cardinality one, and it has decomposition \({\mathtt {id}}\).

\(({\mathtt {id}})\): The general Markov model contains two copies of the trivial representation, so we can consider the subspace generated by any linear combination \(a {\mathrm {B}}^{{\mathtt {id}}}_1+b {\mathrm {B}}^{{\mathtt {id}}}_2\). Moreover, since \( [{\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2]=0\), we see the subspace generated by any \(a {\mathrm {B}}^{{\mathtt {id}}}_1+b {\mathrm {B}}^{{\mathtt {id}}}_2\), is a Lie algebra for any fixed choice \(a,b\in {\mathbb {C}}\). When we request these spaces to have a stochastic basis, we have to restrict to the condition \(a,b\ge 0\). Therefore, we conclude:

Theorem 6

In the 4-state case, there is a continuum of one-dimensional Lie Markov models with \({{\mathcal {G}}}\) symmetry and decomposition \({\mathtt {id}}\). Each model in the family has the form

$$\begin{aligned} {\mathfrak {L}}=\langle B^{{\mathtt {id}}}_{a,b} \rangle _{{\mathbb {C}}}, \end{aligned}$$

where \(a+b=1, a,b\ge 0\) and

$$\begin{aligned} B^{{\mathtt {id}}}_{a,b}:=a {\mathrm {B}}^{{\mathtt {id}}}_1+b{\mathrm {B}}^{{\mathtt {id}}}_2= \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} a &{} b &{} b \\ a &{} * &{} b &{} b\\ b &{} b &{} * &{} a \\ b &{} b &{} a &{} * \end{array}\right) . \end{aligned}$$

where we use \(*\) to indicate the diagonal entry needed for the column to sum to zero.

Remark 9

This result is not completely satisfactory as all these models will appear as 1-dimensional Lie subalgebras of the 2-dimensional Lie Markov model \( \langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2 \rangle _{{\mathbb {C}}}\). This situation is quite general and we will avoid the consequent redundancy in the present list by considering families of Lie Markov models depending on some parameters as submodels of a Lie Markov model with larger dimension. Then, the family of models in Theorem 6 should be regarded as a Lie Markov model with decomposition \(2 {\mathtt {id}}\).

On the other hand, notice that if we expand the symmetry and request the models in the family of Theorem 6 to have the symmetry of \({\mathfrak {S}}_4\), we are led to the constraint \(a=b\), which corresponds to the model by Jukes and Cantor (1969). Of course, this model already appeared as a Lie Markov model with symmetry \({\mathfrak {S}}_4\) (Sumner et al. 2012a).\(\diamond \)

Model 1.1

Take \({\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}} \rangle }_{{\mathbb {C}}}\). The stochastic cone has only one ray, spanned by \({\mathrm {B}}^{{\mathtt {id}}}\), and hence a single ray-orbit, \((1,\frac{1}{3},\frac{2}{3})\) (see Table 8). The generic stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} 1 &{} 1 &{} 1 \\ 1 &{}* &{} 1 &{} 1 \\ 1 &{} 1 &{} * &{} 1 \\ 1 &{} 1 &{} 1 &{} * \end{array}\right) . \end{aligned}$$

Dimension Two

\(({\mathtt {id}}\oplus {\mathtt {sgn}})\): We have \( [{\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {sgn}}}]=[{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{{\mathtt {sgn}}}]=0\), so for any fixed \(a,b\ge 0\) with \(a+b=1\) and \(b\ne 0\), there is a well-defined Lie Markov model:

$$\begin{aligned} {\mathfrak {L}}={\langle B^{{\mathtt {id}}}_{a,b},{\mathrm {B}}^{{\mathtt {sgn}}} \rangle }_{{\mathbb {C}}}\cong {\mathtt {id}}\oplus {\mathtt {sgn}}. \end{aligned}$$

The condition \(b\ne 0\) is needed to ensure that the dimension of the stochastic cone is equal to the dimension of the Lie algebra. As in Remark 9, these models are considered as submodels of the model with decomposition \(2{\mathtt {id}}\oplus {\mathtt {sgn}}\) (see Model 3.3a).

\(({\mathtt {id}}\oplus \zeta _1)\): Since \([{\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{\zeta _1}]=[{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{\zeta _1}]= 0\), we find that, for any choice of \(a,b\ge 0\) with \(a+b=1\) and \(b\ne 0\),

$$\begin{aligned} {\mathfrak {L}}={\langle B^{{\mathtt {id}}}_{a,b},{\mathrm {B}}^{\zeta _1} \rangle }_{{\mathbb {C}}}\cong {\mathtt {id}}\oplus \zeta _1, \end{aligned}$$

provides a 2-dimensional Lie Markov model. These are submodels of the 3-dimensional model with decomposition \(2{\mathtt {id}}\oplus \zeta _1\) (see Model 3.3b).

\(({\mathtt {id}}\oplus \zeta _2)\): We find the same situation for this decomposition. The following Lie algebras will appear as submodels of the 3-dimensional model indicated:

  • \({\mathfrak {L}}={\langle B^{{\mathtt {id}}}_{a,b}, {\mathrm {B}}^{\zeta _2} \rangle }_{{\mathbb {C}}}\) is a submodel of Model 3.4;

  • \({\mathfrak {L}}={\langle B^{{\mathtt {id}}}_{a,b},{\mathrm {B}}^{\zeta _2}_{1} \rangle }_{{\mathbb {C}}}\) is a submodel of Model 3.3c.

As a special case of the last family of models, when we take \(a=1\) and \(b =0\), we obtain the following model:

Model 2.2a

\({\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{\zeta _2}_1 \rangle }_{{\mathbb {C}}}\). The stochastic cone has two rays generated by \(L_{(12)}\) and \(L_{(34)}\), which form the ray-orbit \((2,1,0)\) of Table 8. The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha &{} 0 &{} 0 \\ \alpha &{}* &{} 0 &{} 0 \\ 0 &{} 0 &{} * &{} \beta \\ 0 &{} 0 &{} \beta &{} * \end{array}\right) ,\quad \alpha , \beta \ge 0. \end{aligned}$$

This model gives a reducible Markov chain, that is, it is not possible to get to some states from some other states. We see that the purine states \(A\) and \(G\) communicate with each other, and the same holds for the pyrimidine states \(C\) and \(T\) (transitions), while no replacement between purines and pyrimidines (transversions) is allowed.
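The reducibility of Model 2.2a can be made concrete by computing the communicating classes of the chain. The sketch below is illustrative only: it assumes the state order \(A,G,C,T\) and the column convention in which entry \((i,j)\) is the rate of substitution from state \(j\) to state \(i\); the function names are hypothetical.

```python
STATES = "AGCT"  # assumed state order: purines A, G then pyrimidines C, T

def model_2_2a(alpha, beta):
    # General stochastic rate matrix of Model 2.2a (columns sum to zero).
    return [[-alpha, alpha, 0, 0],
            [alpha, -alpha, 0, 0],
            [0, 0, -beta, beta],
            [0, 0, beta, -beta]]

def communicating_classes(Q):
    n = len(Q)
    # reach[i][j] is True when state j can be reached from state i;
    # with columns summing to zero, the rate from i to j is Q[j][i].
    reach = [[i == j or Q[j][i] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):  # Warshall transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    classes = []
    for i in range(n):
        cls = frozenset(j for j in range(n) if reach[i][j] and reach[j][i])
        if cls not in classes:
            classes.append(cls)
    return ["".join(STATES[j] for j in sorted(c)) for c in classes]

classes_2_2a = communicating_classes(model_2_2a(1, 2))
```

For positive \(\alpha \) and \(\beta \) the chain splits into the purine class and the pyrimidine class, whereas any rate matrix with a positive transversion rate in each direction yields a single class.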

Apart from these models, our analysis of 1-dimensional Lie Markov models produces another 2-dimensional model with decomposition \(2 {\mathtt {id}}\).

\((2 {\mathtt {id}})\): Of course, there is only one possible model with this decomposition. Namely,

Model 2.2b

\({\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2 \rangle }_{{\mathbb {C}}}\). If we focus on the stochastic rate matrices, we find a cone with 2 rays, corresponding to the following ray-orbits (see Table 8): ray-orbit \((1,1,0)=\{L_{(12)(34)}\}\), and ray-orbit \((1,0,1) =\{L_{(13)(24)}+L_{(14)(23)}\}\). The Lie algebra is abelian and the stochastic rate matrices for this model are given by

$$\begin{aligned} Q=\left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha &{} \beta &{} \beta \\ \alpha &{} * &{} \beta &{} \beta \\ \beta &{} \beta &{} * &{} \alpha \\ \beta &{} \beta &{} \alpha &{} * \end{array}\right) , \quad \alpha , \beta \ge 0. \end{aligned}$$

Permutation basis: \(L_{(12)(34)},L_{(13)(24)}+L_{(14)(23)}.\)

This model corresponds to the Kimura model with 2 parameters (Kimura 1980).

Dimension Three

\((2{\mathtt {id}}\oplus {\mathtt {sgn}})\): There is only one model with this decomposition:

Model 3.3a

\( {\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{{\mathtt {sgn}}} \rangle }_{{\mathbb {C}}}\) is an abelian Lie Markov model. The stochastic cone has 3 rays in 2 ray-orbits: ray-orbit \((1,1,0)=\{L_{(12)(34)}\}\), and ray-orbit \((2,0,1)a=\{L_{(13)(24)},L_{(14)(23)}\}\) (see Table 8). The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha &{} \beta &{} \gamma \\ \alpha &{} * &{} \gamma &{} \beta \\ \beta &{} \gamma &{}*&{} \alpha \\ \gamma &{} \beta &{} \alpha &{} * \end{array}\right) , \quad \alpha , \beta , \gamma \ge 0. \end{aligned}$$

Permutation basis: \(L_{(12)(34)},L_{(13)(24)},L_{(14)(23)}.\)

Of course, this is the Kimura model with 3 parameters (Kimura 1981). Note this is the group-based model corresponding to \(\mathbb {Z}_2\times \mathbb {Z}_2\cong \{e,(12)(34),(13)(24),(14)(23)\}\).
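The group-based structure of Model 3.3a can be illustrated by building the rate matrix directly from the three involutions of \(\mathbb {Z}_2\times \mathbb {Z}_2\). In this sketch \(L_{\sigma }\) is taken to be \(P_{\sigma }-I\), with \(P_{\sigma }\) the permutation matrix of \(\sigma \); this is an assumption about notation, and the function names are hypothetical.

```python
def L_sigma(perm):
    # Assumed: L_sigma = P_sigma - I, with perm a tuple of 0-based images.
    return [[(1 if perm[c] == r else 0) - (1 if r == c else 0) for c in range(4)]
            for r in range(4)]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def bracket(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[r][c] - BA[r][c] for c in range(4)] for r in range(4)]

L_12_34 = L_sigma((1, 0, 3, 2))  # (12)(34)
L_13_24 = L_sigma((2, 3, 0, 1))  # (13)(24)
L_14_23 = L_sigma((3, 2, 1, 0))  # (14)(23)

def k3st(alpha, beta, gamma):
    # Non-negative combination of the three rays of the stochastic cone.
    return [[alpha * L_12_34[r][c] + beta * L_13_24[r][c] + gamma * L_14_23[r][c]
             for c in range(4)] for r in range(4)]

ZERO = [[0] * 4 for _ in range(4)]
rays = [L_12_34, L_13_24, L_14_23]
is_abelian = all(bracket(X, Y) == ZERO for X in rays for Y in rays)
Q = k3st(1, 2, 3)
```

With \((\alpha ,\beta ,\gamma )=(1,2,3)\) the off-diagonal pattern of `Q` reproduces the general stochastic rate matrix displayed above, and the vanishing brackets reflect the fact that the underlying group is abelian.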

\((2{\mathtt {id}}\oplus \zeta _1)\): There is only one model with this decomposition:

Model 3.3b

\( {\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{\zeta _1} \rangle }_{{\mathbb {C}}}\) is a 3-dimensional abelian Lie Markov model. The stochastic cone has 3 rays, in 2 ray-orbits: ray-orbit \((1,1,0)=\{L_{(12)(34)}\}\), and ray-orbit \((2,0,1)b=\{L_{(1324)},L_{(1423)}\}\). The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha &{} \beta &{} \gamma \\ \alpha &{}* &{} \gamma &{} \beta \\ \gamma &{} \beta &{}* &{} \alpha \\ \beta &{} \gamma &{} \alpha &{} * \end{array}\right) , \quad \alpha , \beta , \gamma \ge 0. \end{aligned}$$

Permutation basis: \(L_{(12)(34)},L_{(1324)},L_{(1423)}.\)

Note this is the group-based model corresponding to \({\mathbb {Z}}_4\cong \{e,(1324),(12)(34),\) \((1423)\}\). This new model may be regarded as a “twisted” version of the Kimura model with three parameters.

\((2{\mathtt {id}}\oplus \zeta _2)\): There are two models with this decomposition:

Model 3.3c

\({\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{\zeta _2}_1 \rangle }_{{\mathbb {C}}}\) is a 3-dimensional abelian Lie algebra. The stochastic cone has 3 rays, in 2 ray-orbits: ray-orbit \((1,0,1)=\{L_{(13)(24)}+L_{(14)(23)}\}\), and ray-orbit \((2,1,0)=\{L_{(12)},L_{(34)}\}\). The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} *&{} \alpha &{} \beta &{} \beta \\ \alpha &{}* &{} \beta &{} \beta \\ \beta &{} \beta &{} * &{} \gamma \\ \beta &{} \beta &{} \gamma &{}* \end{array}\right) , \quad \alpha , \beta , \gamma \ge 0. \end{aligned}$$

Permutation basis: \(L_{(12)},L_{(34)},L_{(1324)}+L_{(1423)}.\)

Model 3.4

\( {\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{\zeta _2} \rangle }_{{\mathbb {C}}}\) is a 3-dimensional Lie algebra. The stochastic cone has 4 rays, in 3 ray-orbits: ray-orbit \((1,0,1) = \{L_{(13)(24)}+L_{(14)(23)}\}\), ray-orbit \((1,1,0) = \{L_{(12)(34)}\}\), and ray-orbit \((2,1/3,2/3) = \{R_{12}^+,R_{34}^+\}\). This is the first model with \({\mathcal {G}}\) symmetry where the number of rays is larger than the dimension of the model. It is also the first case where the Lie algebra \({\mathfrak {L}}\) is not abelian: the Lie algebra structure is given by

$$\begin{aligned}&{[} L_{(13)(24)}+L_{(14)(23)},R_{ij}^{+} ]=2(R_{kl}^{+}-R_{ij}^{+}), \qquad {[} L_{(12)(34)},R_{ij}^{+} ]=0, \\&{[} L_{(13)(24)}+L_{(14)(23)},L_{(12)(34)} ]=0, \qquad {[} R_{ij}^{+}, R_{kl}^{+} ]=2(R_{ij}^{+}-R_{kl}^{+}), \end{aligned}$$

for \(\{ij,kl\}=\{12,34\}\). The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha + \gamma &{} \beta + \gamma &{} \beta + \gamma \\ \alpha + \gamma &{}*&{} \beta + \gamma &{} \beta + \gamma \\ \beta + \delta &{} \beta + \delta &{} * &{} \alpha + \delta \\ \beta + \delta &{} \beta +\delta &{} \alpha +\delta &{}* \end{array}\right) , \quad \alpha , \beta , \gamma , \delta \ge 0. \end{aligned}$$

Permutation basis: \(L_{(12)(34)},R_{12}^{+},R_{34}^{+}.\)

Dimension Four

Lie Markov models with this dimension appear as special submodels of forthcoming models \(5.7a,\; 5.7b\) and \(5.7c\) with decomposition \(2{\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \xi \), when we restrict the identity component of their Lie algebra \({\mathfrak {L}}\) to a subspace \({\langle {\mathrm {B}}^{{\mathtt {id}}}_{a,b} \rangle }_{{\mathbb {C}}} \) with \(a,b\ge 0\). The reader can check that depending on the values of \(a\) and \(b\) the number of rays of the cones of these models may vary.

\(({\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _1 \oplus \zeta _2)\): The models with this decomposition appear as special cases of Model 5.6a with decomposition \(2 {\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _1 \oplus \zeta _2\) (see Remark 9).

\(({\mathtt {id}}\oplus \zeta _2 \oplus \xi )\): Similarly, these models are special cases of the models \(5.6b,\; 5.11a,\;\) \( 5.11b,\; 5.11c\) and 5.16 with decomposition \(2 {\mathtt {id}}\oplus \zeta _2 \oplus \xi \). As a particular case, if we request these models to have \({\mathfrak {S}}_4\) symmetry, we obtain the restriction \(a=b\), leading to the model by Felsenstein (1981):

Model 4.4a

\( {\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}},{\mathrm {B}}^{\zeta _2}, {\mathrm {B}}^{\xi }_1,{\mathrm {B}}^{\xi }_2 \rangle }_{{\mathbb {C}}}\) is a 4-dimensional Lie algebra. The stochastic cone has 4 rays in one single ray-orbit: \((4,\frac{1}{3},\frac{2}{3})a=\{R_1,R_2,R_3,R_4\}\), and the Lie algebra structure is given by \( [R_i,R_j]=R_i-R_j\). The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha &{} \alpha &{} \alpha \\ \beta &{} *&{} \beta &{} \beta \\ \gamma &{} \gamma &{} * &{} \gamma \\ \delta &{} \delta &{} \delta &{} * \end{array}\right) , \quad \alpha , \beta , \gamma , \delta \ge 0. \end{aligned}$$

Permutation basis: \(R_{1},R_{2},R_{3},R_{4}.\)
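The Lie algebra structure \([R_i,R_j]=R_i-R_j\) of Model 4.4a can be verified directly. The sketch assumes that \(R_i\) is the rate matrix with ones in the off-diagonal positions of row \(i\) and diagonal entries chosen so that columns sum to zero, which matches the general stochastic rate matrix shown above; the function names are hypothetical.

```python
def R(i):
    # Assumed: R_i has +1 in the off-diagonal entries of row i and -1 at (j, j) for j != i.
    return [[(1 if r == i and c != i else 0) - (1 if r == c and r != i else 0)
             for c in range(4)] for r in range(4)]

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def msub(A, B):
    return [[A[r][c] - B[r][c] for c in range(4)] for r in range(4)]

def bracket(A, B):
    return msub(matmul(A, B), matmul(B, A))

def f81(coeffs):
    # General stochastic rate matrix: non-negative combination of R_1, ..., R_4.
    return [[sum(coeffs[i] * R(i)[r][c] for i in range(4)) for c in range(4)]
            for r in range(4)]

structure_holds = all(bracket(R(i), R(j)) == msub(R(i), R(j))
                      for i in range(4) for j in range(4))
Q = f81([1, 2, 3, 4])
```

The identity follows from \(R_iR_j=-R_j\), which holds because every \(R_i\) has the form \(e_i\mathbf {1}^{T}-I\).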

\((2 {\mathtt {id}}\oplus 2 \zeta _2)\): There is only one model with this decomposition.

Model 4.4b

Take \( {\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{\zeta _2}_1,{\mathrm {B}}^{\zeta _2}_2 \rangle }_{{\mathbb {C}}}\). The stochastic cone has 4 rays, in 2 ray-orbits: ray-orbit \((2,0,1)c = \{{\mathrm {Ch}}_{12},{\mathrm {Ch}}_{34}\}\), and ray-orbit \((2,1,0) =\{L_{(12)},L_{(34)}\}\). The Lie algebra is given by

$$\begin{aligned} {[}L_{(12)},L_{(34)}]&= 0, \\ {[}L_{(12)},{\mathrm {Ch}}_{ij}]&= {[}L_{(34)},{\mathrm {Ch}}_{ij}]=0, \quad ij\in \{12,34\} \\ {[}{\mathrm {Ch}}_{12},{\mathrm {Ch}}_{34}]&= 2({\mathrm {Ch}}_{12}-{\mathrm {Ch}}_{34})+2(L_{(12)}-L_{(34)}). \end{aligned}$$

The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha &{} \beta &{} \beta \\ \alpha &{} *&{} \beta &{} \beta \\ \gamma &{} \gamma &{} * &{} \delta \\ \gamma &{} \gamma &{} \delta &{} * \end{array}\right) , \quad \alpha , \beta , \gamma , \delta \ge 0 \end{aligned}$$

Permutation basis: \(L_{(12)},L_{(34)},{\mathrm {Ch}}_{12},{\mathrm {Ch}}_{34}.\)

\((2 {\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _2)\): There is only one model with this decomposition.

Model 4.5a

\( {\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{{\mathtt {sgn}}},{\mathrm {B}}^{\zeta _2} \rangle }_{{\mathbb {C}}}\) is a 4-dimensional Lie algebra. The stochastic cone has five rays, in three ray-orbits: \((1,1,0),\; (2,0,1)a\) and \((2,\frac{1}{3},\frac{2}{3})\). The general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha + \delta &{} \beta + \delta &{} \gamma + \delta \\ \alpha + \delta &{} * &{} \gamma + \delta &{} \beta + \delta \\ \beta + \varepsilon &{} \gamma + \varepsilon &{} * &{} \alpha + \varepsilon \\ \gamma + \varepsilon &{} \beta + \varepsilon &{} \alpha + \varepsilon &{} * \end{array}\right) , \quad \alpha , \beta , \gamma , \delta , \varepsilon \ge 0. \end{aligned}$$

Permutation basis: \(R_{12}^{+},R_{34}^{+},L_{(13)(24)},L_{(14)(23)}.\)

\((2 {\mathtt {id}}\oplus \zeta _1 \oplus \zeta _2)\): There is only one model with this decomposition.

Model 4.5b

\( {\mathfrak {L}}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{\zeta _1},{\mathrm {B}}^{\zeta _2} \rangle }_{{\mathbb {C}}}\) is a 4-dimensional Lie algebra. The stochastic cone has five rays, in three ray-orbits: \((1,1,0),\; (2,0,1)b\) and \((2,\frac{1}{3},\frac{2}{3})\). Then, the general stochastic rate matrix is

$$\begin{aligned} \left( \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} * &{} \alpha + \delta &{} \beta + \delta &{} \gamma + \delta \\ \alpha + \delta &{} * &{} \gamma + \delta &{} \beta + \delta \\ \gamma + \varepsilon &{} \beta + \varepsilon &{} * &{} \alpha + \varepsilon \\ \beta + \varepsilon &{} \gamma + \varepsilon &{} \alpha + \varepsilon &{} * \end{array}\right) , \quad \alpha , \beta , \gamma , \delta , \varepsilon \ge 0. \end{aligned}$$

Permutation basis: \(R_{12}^{+},R_{34}^{+},L_{(1324)},L_{(1423)}.\)

The remaining models, with dimensions 5–12, are presented in Table 6, with a complete list in explicit form available as Supplementary Material online. The first column of the table gives the name of the model and the second column gives a basis for the corresponding Lie subalgebra. The third column gives the ray-orbits for the stochastic cone of that Lie subalgebra.

5.1 Some remarks

We conclude with some remarks and comments about the previous models.

Remark 10

  • Model 5.6b can be regarded as the vector sum of the model by Kimura (1980) and the model by Felsenstein (1981).

  • Model 6.7a already appeared as a model with \({\mathfrak {S}}_4\) symmetry under the name \(K3ST+F81\) (see Example 3). A permutation basis consistent with the symmetry \({\mathfrak {S}}_4\) is given by the vectors \(W_{ij}\) of Example 5 above (see Sumner et al. 2012a). Another permutation basis for this model is \(\{L_{(13)(24)}, L_{(14)(23)},\; R_1, R_2, R_3, R_4\}\), which is not invariant under the action of \({\mathfrak {S}}_4\).

  • Model 9.20b is actually invariant under the action of the whole symmetric group \(\mathfrak {S}_4\), with Schur decomposition \(\{4\}\oplus \{2^2\}\oplus \{31\}\oplus \{21^2\}\). However, it did not appear in the list of Lie Markov models with \(\mathfrak {S}_4\) symmetry given in Sumner et al. (2012a) as it does not have a permutation basis for that group (see Definition 3). This model is obtained by taking the set of all doubly stochastic (but otherwise unrestricted) rate matrices.

  • Of course, Model 12.12 is the general Markov model and we include it in the list for completeness.

\(\diamond \)

Remark 11

The reader may notice the resemblance of Model 5.6b with the model by Hasegawa et al. (1988):

$$\begin{aligned} Q_{5.6b}=\left( \begin{array}{cccc} * &{} a+x &{} b+ x &{} b + x \\ a +y &{} * &{}b +y &{} b +y \\ b + z &{} b + z &{}* &{} a + z \\ b +t &{} b +t &{} a +t &{} * \end{array}\right) \qquad Q_{HKY}=\left( \begin{array}{cccc} * &{} \pi _A \alpha &{} \pi _A \beta &{} \pi _A \beta \\ \pi _G \alpha &{} * &{} \pi _G \beta &{} \pi _G \beta \\ \pi _C \beta &{} \pi _C \beta &{}* &{} \pi _C \alpha \\ \pi _T \beta &{} \pi _T \beta &{} \pi _T \alpha &{} * \end{array}\right) , \end{aligned}$$

where \(\pi _A+\pi _C+\pi _G+\pi _T=1\), and all these parameters are non-negative. Although the rates of these models depend on the parameters in a different way, the rate-matrices \(Q_{HKY}\) and \(Q_{5.6b}\) have the same structure. It is interesting to notice that the form of the non-diagonal entries of \(Q_{5.6b}\) arises from the corresponding entries in \(Q_{HKY}\) just by applying minus the logarithm, producing the following correspondence between the parameters of both models

$$\begin{aligned} x&= -\log (\pi _A), \quad y=-\log (\pi _G), \quad z=-\log (\pi _C), \quad t =-\log (\pi _T), \\ a&= -\log (\alpha ), \quad \;\; b=-\log (\beta ). \end{aligned}$$

Actually, this map induces a bijection between the Lie algebra \({\mathfrak {L}}_{5.6b}={\langle {\mathrm {B}}^{{\mathtt {id}}}_1,{\mathrm {B}}^{{\mathtt {id}}}_2,{\mathrm {B}}^{\zeta _2},{\mathrm {B}}^{\xi }_1,{\mathrm {B}}^{\xi }_2 \rangle }_{{\mathbb {C}}}\) and the set of (not necessarily stochastic) rate matrices of the HKY model. The inverse is given by

$$\begin{aligned} Q=\sum _{i\ne j} q_{ij}L_{ij} \mapsto \sum _{i\ne j} e^{-q_{ij}}L_{ij}. \end{aligned}$$

However, these two models have different essential properties. For instance, while Model 5.6b is given by the linear variety \({\mathfrak {L}}_{5.6b}\), it can be seen that the set of rate matrices of the HKY model describes a variety that is not linear and contains singular points. The deep connection between Lie Markov models and submodels of the general time reversible model appears as a beautiful line of research that will deserve some attention from the authors in the future.\(\diamond \)
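The correspondence of Remark 11 between the off-diagonal entries of \(Q_{HKY}\) and \(Q_{5.6b}\) can be checked numerically. The following sketch assumes the state order \(A,G,C,T\) and strictly positive parameters (so that the logarithms are defined); the parameter values and function names are purely illustrative.

```python
import math

TRANSITIONS = {(0, 1), (1, 0), (2, 3), (3, 2)}  # assumed state order A, G, C, T

def hky_offdiag(pi, alpha, beta):
    """Off-diagonal entries of Q_HKY: entry (i, j) is pi[i]*alpha or pi[i]*beta."""
    return {(i, j): pi[i] * (alpha if (i, j) in TRANSITIONS else beta)
            for i in range(4) for j in range(4) if i != j}

def model_56b_offdiag(a, b, shifts):
    """Off-diagonal entries of Q_5.6b: entry (i, j) is a + shifts[i] or b + shifts[i],
    where shifts = (x, y, z, t)."""
    return {(i, j): (a if (i, j) in TRANSITIONS else b) + shifts[i]
            for i in range(4) for j in range(4) if i != j}

def minus_log(entries):
    # The map q -> -log(q) applied entrywise.
    return {pos: -math.log(q) for pos, q in entries.items()}

def exp_minus(entries):
    # The inverse map q -> exp(-q) applied entrywise.
    return {pos: math.exp(-q) for pos, q in entries.items()}

pi = (0.1, 0.2, 0.3, 0.4)  # illustrative values of pi_A, pi_G, pi_C, pi_T
alpha, beta = 0.5, 0.25    # illustrative values
hky = hky_offdiag(pi, alpha, beta)
lie = model_56b_offdiag(-math.log(alpha), -math.log(beta),
                        [-math.log(p) for p in pi])

forward_ok = all(math.isclose(minus_log(hky)[k], lie[k]) for k in hky)
inverse_ok = all(math.isclose(exp_minus(lie)[k], hky[k]) for k in lie)
```

The map acts entrywise on the off-diagonal rates only; the diagonal entries are then determined in each model by the requirement that columns sum to zero.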

Remark 12

As already noted in Remark 4, a number of models in the above list have more symmetries than those requested by the group \({\mathcal {G}}\), and they already appeared as Lie Markov models with \( {\mathfrak {S}}_4\) symmetry (see Sumner et al. 2012a). For those models, the decomposition into irreducible representations of \({\mathcal {G}}\) can be obtained from the decomposition into irreducible representations of \({\mathfrak {S}}_4\) by applying the branching rule of Table 3 (cf. Sumner et al. 2012a, Table 2). Since there are no subgroups between \({\mathcal {G}}\) and \({\mathfrak {S}}_4\), we can conclude that the rest of the models listed here do not have further symmetries.\(\diamond \)

Remark 13

A vector subspace \({\mathfrak {L}}\) in \({\mathfrak {L}}_{GM}\) is a matrix algebra if it is multiplicatively closed, that is, if the product \(XY\) lies in \({\mathfrak {L}}\) for any pair \(X,Y\in {\mathfrak {L}}\). Of course, this condition is stronger than that of a Lie algebra. The reader may wonder which of the models in the above list are actually algebras. The authors were surprised to find that the only Lie algebras which are not algebras correspond to the Lie Markov models that appear in families depending on some parameters \(a,b\) in the sense of Remark 9, that is, the Lie Markov models corresponding to the following decompositions:

  • \({\mathtt {id}}\) (for dimension 1);

  • \({\mathtt {id}}\oplus {\mathtt {sgn}},\; {\mathtt {id}}\oplus \zeta _1\) and \({\mathtt {id}}\oplus \zeta _2\) (for dimension 2);

  • \({\mathtt {id}}\oplus \zeta _{2} \oplus \xi ,\; {\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \xi ,\; {\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _{1} \oplus \zeta _{2}\) (for dimension 4); and

  • \({\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _{1} \oplus \zeta _{2} \oplus 2\xi \) (for dimension 8).

Notice that these decompositions correspond exactly to the decompositions of Table 4, that is, the irreducible permutation representations \(\langle {\mathcal {G}}/H\rangle _{\mathbb {C}}\) for a subgroup \(H\) of \({\mathcal {G}}\).

\(\diamond \)
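The distinction drawn in Remark 13 between Lie algebras and matrix algebras can be tested computationally: a subspace is closed under an operation exactly when adjoining all pairwise results of basis elements does not increase the rank. The sketch below uses this criterion on three examples; the matrices are written out by hand under the column-sum-zero convention, the value \((a,b)=(\frac{1}{4},\frac{3}{4})\) is an arbitrary illustrative choice, and the function names are hypothetical.

```python
from fractions import Fraction

def rank(mats):
    # Rank of a list of 4x4 matrices, by exact Gaussian elimination on flattened rows.
    rows = [[Fraction(x) for row in M for x in row] for M in mats]
    r = 0
    for c in range(16):
        piv = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def matmul(A, B):
    return [[sum(A[r][k] * B[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def bracket(A, B):
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[r][c] - BA[r][c] for c in range(4)] for r in range(4)]

def is_closed(basis, op):
    # By bilinearity it suffices to test the operation on pairs of basis elements.
    extended = basis + [op(X, Y) for X in basis for Y in basis]
    return rank(extended) == rank(basis)

def B_id(a, b):
    # a * B^id_1 + b * B^id_2, written out entrywise.
    S = [[-1, 1, 0, 0], [1, -1, 0, 0], [0, 0, -1, 1], [0, 0, 1, -1]]
    T = [[-2, 0, 1, 1], [0, -2, 1, 1], [1, 1, -2, 0], [1, 1, 0, -2]]
    return [[a * S[r][c] + b * T[r][c] for c in range(4)] for r in range(4)]

k2p = [B_id(1, 0), B_id(0, 1)]                   # Model 2.2b
generic = [B_id(Fraction(1, 4), Fraction(3, 4))]  # a member of the family of Theorem 6
jukes_cantor = [B_id(1, 1)]                      # Model 1.1, up to scaling

results = {
    "k2p_is_algebra": is_closed(k2p, matmul),
    "generic_is_lie_algebra": is_closed(generic, bracket),
    "generic_is_algebra": is_closed(generic, matmul),
    "jc_is_algebra": is_closed(jukes_cantor, matmul),
}
```

In this example the one-dimensional span of a generic \(B^{{\mathtt {id}}}_{a,b}\) is trivially a Lie algebra but is not closed under matrix multiplication, whereas the special case \(a=b\) and the two-dimensional Model 2.2b are.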

Remark 14

The reader may notice that not all decompositions listed in Table 5 give rise to Lie Markov models. Although there exist Lie subalgebras of the general Markov model that decompose according to \(2{\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _1\) and \(2{\mathtt {id}}\oplus {\mathtt {sgn}}\oplus \zeta _1 \oplus \xi \), their relative position to the positive orthant causes a drop in the dimension of the convex polyhedral cone obtained when imposing the stochastic restrictions (namely, non-diagonal rates have to be non-negative). Such situations are not desired, as explained in Remark 2, and do not correspond to our definition of Lie Markov model.

6 Discussion

Following the ideas of our previous work (Sumner et al. 2012a), in this paper we have discussed Lie Markov models with purine/pyrimidine symmetry. This symmetry was mathematically expressed by taking the group of nucleotide permutations

$$\begin{aligned} {\mathcal {G}}=\{e,(AG),(CT),(AG)(CT),(AC)(GT),(AT)(GC),(ACGT),(ATGC)\}. \end{aligned}$$

Our main motivation is that this symmetry may be of special interest to the biologist who wishes to deal with evolutionary models preserving the specific grouping of nucleotides into purines and pyrimidines. In Sect. 2 we recalled some of the basic definitions on Lie Markov models and the required tools arising from the representation theory of groups. We also showed that any rate-matrix model that is locally multiplicatively closed is necessarily a Lie Markov model. Also in this section, we introduced a new concept, the stochastic cone of a Lie Markov model, which is the set of stochastic rate matrices of the Lie Markov model. In Sect. 3 we explained how to derive Lie Markov models with prescribed symmetry and discussed the geometry of the corresponding cone of stochastic rate matrices. In Sect. 4 we took the permutation group \({\mathcal {G}}\) and decomposed the space of all rate matrices into irreducible modules of \({\mathcal {G}}\) and provided a basis consistent with this decomposition. In Sect. 5 we gave the full list of all Lie Markov models with \({\mathcal {G}}\) symmetry, arranged by their dimension.

From an applied point of view, a natural question is which Lie Markov models are biologically interesting. This is a crucial point that will deserve special attention from the authors. Computer simulations to compare phylogenetic estimation using Lie Markov models with other evolutionary models are being designed and will appear in a future publication. At the same time, this leads to more theoretical questions, such as the connection between well-established evolutionary models and Lie Markov models, for instance between the HKY model and the Lie Markov model 5.6b (see Remark 11).

We have considered models from a rate matrix perspective, as some well-defined subset \({\mathfrak {L}}\) of the space \({\mathfrak {L}}_{GM}\) of all rate matrices. We could instead have adopted a more algebraic perspective and dealt with the substitution matrices directly. Keeping in mind the importance of substitution matrices being multiplicatively closed (see Sumner et al. 2012a), we would then define an “evolutionary model” as some well-defined group \(\mathfrak {M}\) of matrices in \(M_n(\mathbb {R})\). Restricting to the stochastic setting, we would be led to consider the intersection of \(\mathfrak {M}\) with the stochastic polytope:

$$\begin{aligned} \mathbb {P}_{\mathrm{sto }}:=\left\{ M=(m_{ij})\in \mathfrak {M} : m_{ij}\ge 0, \sum _i m_{ij}=1 \right\} . \end{aligned}$$

This is a compact polytope with the identity matrix as one of its vertices. The polytope is cut into several connected components by the algebraic hypersurface with equation \(\det (M)=0\). We are mainly interested in the connected component that contains the identity matrix because, by continuity arguments, this component contains the exponentials of the stochastic rate matrices of the model. In this paper, we have preferred to introduce evolutionary models from the point of view of rate matrices because both the definition of Lie Markov models and the procedure to construct them appear in a natural way in this setting. However, the connection between rate matrices and substitution matrices is not completely clear, and it deserves further attention. For example, it is known that the image of the exponential map restricted to the stochastic cone does not, in general, cover the whole connected component of the identity. These issues are related to the convergence of the Baker–Campbell–Hausdorff formula (Blanes and Casas 2004) and to Elfving’s “embedding problem” (Davies 2010): given a Markov matrix \(M\), does there exist a matrix \(Q\) such that \(M=e^Q\) and \(e^{tQ}\) is a Markov matrix for all \(t\ge 0\)? We want to explore this question in the future to clarify the connection between the substitution matrices and the rate matrices of evolutionary models.
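The continuity argument can be made concrete with the standard identity \(\det (e^{Q})=e^{\mathrm {tr}(Q)}>0\), which shows that the path \(t\mapsto e^{tQ}\) starting at the identity never meets the hypersurface \(\det (M)=0\). The following minimal Python sketch checks this numerically for one rate matrix. The matrix `Q` is a hypothetical example chosen only for illustration (non-negative off-diagonal entries, columns summing to zero, matching the column-sum convention of the polytope above), and the truncated Taylor series is a simple stand-in for a proper matrix-exponential routine.

```python
import math


def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_exp(q, terms=40):
    """Truncated Taylor series for exp(q); adequate for small norms only."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, q)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result


def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, x in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * x * det(minor)
    return total


# Hypothetical rate matrix (illustrative values, not taken from the paper):
# off-diagonal entries are non-negative and every column sums to zero.
Q = [
    [-0.6, 0.1, 0.3, 0.2],
    [0.1, -0.5, 0.1, 0.2],
    [0.3, 0.2, -0.6, 0.1],
    [0.2, 0.2, 0.2, -0.5],
]

M = mat_exp(Q)
column_sums = [sum(M[i][j] for i in range(4)) for j in range(4)]
trace_Q = sum(Q[i][i] for i in range(4))

if __name__ == "__main__":
    print("column sums of exp(Q):", column_sums)
    print("smallest entry of exp(Q):", min(min(row) for row in M))
    print("det(exp(Q)):", det(M), " exp(tr Q):", math.exp(trace_Q))
```

Note that this only illustrates the inclusion of the exponentials in the identity component; it says nothing about the converse question of which Markov matrices in that component are embeddable.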

Although we have kept the original definition of symmetry for a Lie Markov model from Sumner et al. (2012a), an interesting question arises if one tries to expand this definition. Namely, we could investigate evolutionary models that are invariant under the action of some permutation subgroup \(G\) of \({\mathfrak {S}}_4\) without the additional requirement that they have a permutation basis. From an applied point of view, we see no particular reason not to consider this expanded definition, which would lead to a huge number of possible models. For example, under this expanded definition, we would admit the complex span of the ray-orbits \((4,\frac{1}{3},\frac{2}{3})d: {\mathfrak {L}}=\langle R_{13}^+, R_{14}^+, R_{23}^+, R_{24}^+\rangle _{{\mathbb {C}}} ,\;(4,\frac{1}{3},\frac{2}{3})e: {\mathfrak {L}}= \langle H_{13}^+, H_{14}^+, H_{23}^+, H_{24}^+\rangle _{{\mathbb {C}}}\) and \((4,\frac{1}{3},\frac{2}{3})f: {\mathfrak {L}}= \langle V_{13}^+, V_{14}^+, V_{23}^+, V_{24}^+\rangle _{{\mathbb {C}}}\) (see Table 8) as models with symmetry \({\mathcal {G}}\) and decomposition \({\mathtt {id}}\oplus \xi \) (note that this decomposition does not appear in the list of Table 5). More interestingly, as noted in Remark 10, the set of doubly stochastic rate matrices has \({\mathfrak {S}}_4\) symmetry under this expanded definition, and moreover these matrices form a Lie algebra. We leave this line of research for a future publication.
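The Lie algebra property of the doubly stochastic rate matrices can be seen from a short calculation: if \(A\) and \(B\) both have zero row sums and zero column sums, then so do \(AB\) and \(BA\), and hence so does the commutator \([A,B]=AB-BA\). The following minimal Python sketch illustrates this for two example matrices of the form \(P-I\), with \(P\) a permutation matrix. The state ordering (A, C, G, T) and the helper names are assumptions made for this illustration only.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def commutator(a, b):
    """Lie bracket [a, b] = ab - ba."""
    n = len(a)
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def has_zero_row_and_column_sums(m):
    """True if every row and every column of m sums to zero."""
    n = len(m)
    rows_ok = all(sum(m[i]) == 0 for i in range(n))
    cols_ok = all(sum(m[i][j] for i in range(n)) == 0 for j in range(n))
    return rows_ok and cols_ok


def perm_rate_matrix(images):
    """Return P - I, where P sends state j to state images[j].

    Columns index the source state, so P has a 1 in row images[j] of column j.
    """
    n = len(images)
    return [[(1 if images[j] == i else 0) - (1 if i == j else 0)
             for j in range(n)] for i in range(n)]


# Assumed state ordering for this sketch: A=0, C=1, G=2, T=3.
A = perm_rate_matrix([1, 2, 3, 0])  # the 4-cycle (ACGT)
B = perm_rate_matrix([2, 1, 0, 3])  # the transposition (AG)
C = commutator(A, B)

if __name__ == "__main__":
    print("A, B doubly stochastic rate matrices:",
          has_zero_row_and_column_sums(A) and has_zero_row_and_column_sums(B))
    print("[A, B] has zero row and column sums:",
          has_zero_row_and_column_sums(C))
    print("[A, B] is nonzero:", any(x != 0 for row in C for x in row))
```

The commutator generally has negative off-diagonal entries, so closure holds for the linear span of these matrices rather than for the stochastic cone itself.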