1 Introduction

Significant variation in the position of genes in bacterial genomes is observed across individuals even within species (Kawai et al. 2006). This variation is largely due to the process of inversion, which involves successive reversals of segments of the circular chromosome (Eisen et al. 2000). As the inversion process cannot be directly observed, mathematical models are needed to draw inferences from genomic data about these evolutionary processes. Such models can also be used to put a metric on the space of genome arrangements, that is, to establish estimates of evolutionary distance between genomes. These distances in turn can be used to reconstruct phylogenetic trees using distance-based methods such as Neighbour-Joining (Saitou and Nei 1987). In the context of bacterial phylogenetics, inversions are particularly valuable for this purpose because, unlike comparisons via single nucleotide polymorphisms (SNPs), their signal is not affected by horizontal transfer events (Darling et al. 2008).

Several algorithms have been successfully developed to obtain the minimal number of inversions (or other similar operations) required to transform one genome into another. For instance, the breakpoint graph of Bafna and Pevzner (1993) and its successors (for example Hannenhalli and Pevzner 1995) address the inversion distance problem very effectively under a model in which all inversions are equally probable. There have even been first attempts to put this approach into an algebraic context (Meidanis and Dias 2000). An alternative approach has been to define a wider set of operations than just inversions, which has led to an elegant general method called Double-Cut-and-Join (DCJ) for identifying the minimal distance between a pair of genomes (Bergeron et al. 2006), again under the model in which all events are equally probable.

Because the processes involved in large-scale evolution on the bacterial chromosome are complicated and only partially understood, any algorithm to establish distance will have to make a range of simplifying assumptions about the processes in order to make any progress. For instance, the basic approach of using the minimal number of inversions to estimate distance already implies that the rate of evolution is slow or that short evolutionary times are involved (which is justified in some cases, Rocha 2006). Existing methods make further convenient assumptions about the inversion process itself, including the powerful assumption already mentioned that all inversions on a chromosome are equally probable. An immediate consequence of this assumption is that inversions on a circular chromosome can be studied just as well on a linear chromosome, because any inversion on the circle is equivalent to another (equally probable) inversion of the complementary region, and hence we may “cut” our representation of the circular permutation at some point and develop algorithms as though it were a linear chromosome.

Evidence has emerged recently that inversions are not all equally probable, and in particular that there are some restrictions on the length of an inverted region and on its location on the chromosome. Specifically, on the one hand some studies have suggested that shorter inverted segments evolve more frequently (Dalevi et al. 2002; Lefebvre et al. 2003; Darling et al. 2008), and on the other it has been observed that the terminus of replication is always close to the antipode of the origin of replication (Darling et al. 2008; Eisen et al. 2000). The latter observation suggests that there is a fitness cost to having the terminus far from the antipode, and hence that individual inversions themselves do not move the terminus far from this position.

Such additional information is difficult to incorporate into existing solutions to the inversion distance problem, and suggests that a new approach is needed. In particular we give a simple example (Sect. 2.2) that shows cutting the circle to linearize the problem will fail to find the minimal number of inversions when the length of an inverted region is constrained. More generally, in this paper we show how the inversion process may be modelled as a group acting on the set of all possible genome arrangements. Groups are powerful ways of encapsulating the symmetries of objects. While they have been intensively studied over the last century within pure mathematics, they have found many applications in science, especially in theoretical physics and domains such as crystallography where symmetry plays an obvious role.

We first present a group-theoretic framework in which some of the information about the inversion process can be realised in the form of models based on groups (Sect. 2). In this way we are able to translate questions about inversion distance into questions about groups. In the main body of the paper (Sect. 3) we address in detail one model of the inversion process and develop a new method to establish the inversion distance in this case, via the proof of results in group theory. In this model we consider the circular inversions limited to two regions. The key theoretical idea in our approach is to lift the problem from the group of circular permutations to the affine symmetric group, which can be viewed intuitively as unrolling the circle on the infinite number line. This is described in Sects. 3.2 and 3.3. The main theoretical results are given in Sects. 3.4 and 3.5. In particular, Theorem 3.16 shows how a same-length representative of a circular permutation in the affine symmetric group may be found. Length can easily be calculated in the affine symmetric group, and so this resolves the problem with cutting the circular genome, and provides a framework in which other such models can be studied. As an example, we apply this method to the reconstruction of the phylogenetic history of some published Yersinia genomes (Sect. 4).

2 Group theoretic inversion systems

In this section we introduce a general group-theoretic framework that covers a range of models of the chromosomal inversion process.

2.1 Group-theoretic inversion systems

The idea of considering genomes as permutations of a set of regions is not new, and underlies all combinatorial methods. Several other studies also put at least this much in the language of group theory (Meidanis and Dias 2000; Fertin et al. 2009; Moulton and Steel 2012).

For instance, given a pair of genomes for which eight conserved regions have been identified, the regions can be numbered according to the arrangement of one of the genomes so that this “reference” genome is represented by the sequence \([1,2,3,4,5,6,7,8]\). An inversion of the segment between regions 4 and 7 (inclusive) is then either \([1,2,3,7,6,5,4,8]\) (unsigned), or \([1,2,3,-7,-6,-5,-4,8]\) (signed). This notation is based on a common two-line notation for permutations of \(\mathbf n =\{1,2,\dots ,n\}\), in which the top line gives the elements of the domain \(\mathbf n \) and the bottom line gives the images of each element of \(\mathbf n \), so that

$$\begin{aligned} {[}1,2,3,-7,-6,-5,-4,8{]} \text{ is } \text{ shorthand } \text{ for } \left( \begin{array}{cccccccc} 1&{}2&{}3&{}4&{}5&{}6&{}7&{}8\\ 1&{}2&{}3&{}-7&{}-6&{}-5&{}-4&{}8 \end{array}\right) . \end{aligned}$$

The set of all such permutations forms a group, with each genome considered to be a group element (the reference genome being the identity element of the group). The group of unsigned permutations of a set \(\mathbf n \) is a subgroup of the group of signed permutations (the hyperoctahedral group, or Coxeter group of type \(B\)).

In general, we define an inversion system to be a tuple \((G,\mathcal I )\) where \(G\) is a permutation group and \(\mathcal I \) is a set of inversions such that \(\langle \mathcal I \rangle =G\), i.e. \(\mathcal I \) generates \(G\). If we choose a particular genome arrangement to be the reference genome, each possible genome can be considered to be a group element with respect to this reference. For any pair of genome arrangements there is a unique group element that transforms one into the other. For instance if genomes \(G_1\) and \(G_2\) are represented by group elements (permutations \(g_1\) and \(g_2\)), then the group element transforming \(G_1\) into \(G_2\) is \(g_1^{-1}g_2\) (we write our groups acting on the right). This group element is independent of the choice of reference genome. In any inversion system the inversion distance problem is equivalent to finding the minimum length of a group element as a word in the generators. Given the impossibility of searching the entire group when the number of regions is large (for example, genomes with 60 regions can correspond to groups of order \(\sim 10^{100}\)), it is necessary to exploit the algebraic structure.

If all inversions are allowed and we ignore orientation, then we have the group of all permutations of \(\mathbf n \), namely the symmetric group \(S_n\), the generators are all possible inversions of intervals on the circle, and the metric is the word length, up to the action of the dihedral group (see Sect. 2.3 for more details of this action). This is the model considered by Watterson et al. (1982), and for which they obtained bounds on the distance. A significant number of extensions and improvements have followed this path with great success (Kececioglu and Sankoff 1993; Hannenhalli and Pevzner 1995; Bader et al. 2001). The signed version of this model, in which regions are regarded as having orientation, gives rise to the hyperoctahedral group, and is the most widely studied model in the inversion distance literature.

If all inversions are equally probable and are on the circle, an inversion of one region is equivalent to the inversion of the complementary region. One consequence is that one may consider a select region to be fixed, and only consider inversions that do not move it. This enables treatment of the problem as if it were a linear chromosome. This is the basis for many efficient methods currently available, including the use of breakpoints (Kececioglu and Sankoff 1993) and methods using the breakpoint graph due to Bafna and Pevzner (1993).

In this paper we will study a non-uniform model, in which we only permit inversions of two adjacent regions. In any such model in which the length of an inversion is restricted, it is not always possible to find the minimum number of inversions by cutting the genome and treating it as if it were linear. This is best seen with an example.

2.2 Circular permutations with restricted inversion size: an illustrative example

Consider the circular genome with eight regions numbered \(1, \dots , 8\), and arranged around the circle in the order \([1,6,3,8,5,2,7,4]\). This amounts to an arrangement in which each even numbered region has moved forwards two spaces and each odd backwards two spaces. This arrangement can be sorted by 8 inversions of adjacent regions. For instance if we write \(s_i\) for the swap \((i\ i+1)\), with regions numbered mod 8 (so that \(s_8\) means swapping the adjacent regions 8 and 1), this arrangement can be returned to the identity via the action of \(s_8s_6s_4s_2s_7s_5s_3s_1\):

$$\begin{aligned} {[}1,6,3,8,5,2,7,4{]}s_8s_6s_4s_2s_7s_5s_3s_1&= [4,6,3,8,5,2,7,1]s_6s_4s_2s_7s_5s_3s_1\\&= [4,6,3,8,5,7,2,1]s_4s_2s_7s_5s_3s_1\\&= [4,6,3,5,8,7,2,1]s_2s_7s_5s_3s_1\\&= [4,3,6,5,8,7,2,1]s_7s_5s_3s_1\\&= [4,3,6,5,8,7,1,2]s_5s_3s_1\\&= [4,3,6,5,7,8,1,2]s_3s_1\\&= [4,3,5,6,7,8,1,2]s_1\\&= [3,4,5,6,7,8,1,2] \end{aligned}$$

which is the identity arrangement (on the circle), as shown in Fig. 1. The minimum number of inversions required to sort this arrangement by cutting-linearizing is 9.

Fig. 1
figure 1

An arrangement of eight regions whose length with 2-region inversions on the circle is 8 but when calculated by cutting and linearizing is 9

2.3 Circular permutations

The example above raises two important subtleties regarding circular permutations that do not affect linear rearrangements. The first is that for regions on a circle numbered 1 to \(n\), position \(n\) is adjacent to both position \(n-1\) and position 1. So, while an arrangement on the line with \(n\) and \(1\) swapped (that is \([n,2,3,\dots ,n-1,1]\)) requires many short inversions to sort, on a circle it requires just one 2-region inversion swapping \(n\) and \(1\). This feature does not affect matters if all inversions are equally probable, because an inversion of the regions \(2,\dots ,n-1\) is just as likely as one of \(n\) and 1, so the inversions across the boundary between \(n\) and 1 can be ignored. Of course, an inversion of the regions \(2,\dots ,n-1\) results in the regions apparently reversed in order: \([n,n-1,\dots ,2,1]\). However since our genome is in three dimensions this is equivalent to the arrangement \([1,2,\dots ,n-1,n]\). This brings to the fore the second subtlety.

Because the circular chromosome lives in three dimensions, we consider arrangements to be equivalent if they can be obtained from each other by rotating the circle or by turning it over (ignoring topological features such as knotting). This is because these actions on the circle simply correspond to different viewpoints from which to observe the chromosome. The actions of rotating and reflecting a circle form a group (because they can be composed associatively and have inverse operations), and when we identify \(n\) regularly positioned points on the circle the group is called the dihedral group, denoted \(\mathcal D _n\). The set of arrangements equivalent to \(\sigma \) under the action of the dihedral group is the coset \(\sigma \mathcal D _n\).

On the one hand this means our search space—the set of genuinely distinct arrangements—is reduced by a factor of \(2n\) (there are \(2n\) elements of \(\mathcal D _n\)). On the other hand, this means that to find the minimum number of inversions required to generate a given arrangement we need to consider each element of the coset, or in other words, each frame of reference. In what follows, our strategy is to develop a method to find the length of a permutation with respect to any given frame of reference, and then search over the \(2n\) frames of reference for the shortest.

We note that this subtlety is often not considered in the inversion distance literature (the initial paper by Watterson et al. 1982 is an exception). This may be because in a model in which inversions of any length are equally probable, the circular chromosome can be treated as if it were linear: as remarked above, an inversion across the origin of replication is equivalent to one on the complementary region. This feature is not enough, however, to account for the rotation and reflection symmetry, and to find the minimal distance under existing models it is still necessary to evaluate the distance under the different frames of reference (as shown in Fig. 2) obtained by the action of the \(2n\) elements of the dihedral group.

Fig. 2
figure 2

Some of the \(2n\) “frames of reference” obtained by action of the dihedral group \(D_n\) on the identity arrangement. Clockwise rotation is denoted by \(r\) and the reflection in the vertical axis (flip) by \(f\)

3 The two-region inversion distance problem for circular genomes

For the remainder of the paper we focus on a constrained model in which only inversions of two adjacent regions are permitted, and in which we ignore orientation of the regions. The restriction to two-region inversions allows us to exploit the theory of Coxeter groups. In this model the group \(G\) is the symmetric group \(S_n\) and the generating set \(\mathcal I \) is the set of two-region inversions. Note that because we are on the circle this is not a standard generating set for the symmetric group. Notation and a group presentation of \((G,\mathcal I )\) are given in Sect. 3.1.

The key idea we exploit is that the best way to view circular permutations is by lifting the problem to the (extended) affine symmetric group. This is the group of periodic permutations of the integers, and is a natural place to study circular permutations because by virtue of being on a circle, these are also periodic. The affine symmetric group and the extended affine symmetric group are introduced in Sects. 3.2 and 3.3.

An important result that we use is a theorem of Shi (Theorem 3.3) that gives a formula for the length of an element of the affine symmetric group with respect to the standard generators. Unfortunately, it cannot be used directly because despite the connection between circular and affine permutation groups, they are not identical—for one thing the affine symmetric group is infinite and the circular permutation group is finite. In fact the circular permutation group is a quotient of the affine symmetric group, and when we “lift” a circular permutation to the affine situation (effectively looking at its pre-image) we have infinitely many choices. The goal of Sects. 3.4 and 3.5 is to present a way to choose a representative of a circular permutation in the affine symmetric group that has the same length as the circular permutation. In this way we derive a method for finding the inversion-length of the circular permutation: choose a pre-image in the affine symmetric group that has the same length, and then apply Shi’s formula.

The group-theoretic results in Sects. 3.4 and 3.5 are, as far as we are aware, new to algebra. These are examples of answers to group-theoretic questions that arise directly from the biological models. In other words, the group-theoretic questions asked, relating to lifts of circular permutations, are not ones that have arisen in other applications and so it is necessary to prove new claims.

In particular, in Sect. 3.4 we prove that if a lift of a circular permutation is of minimal length out of all possible lifts, then it satisfies some constraints on the “crossings” (Definition 3.4) that are possible. These constraints are summarized in Corollaries 3.10 and 3.11. Finally in Sect. 3.5 we prove the main result of this paper, Theorem 3.16, that in an affine permutation that is a minimal lift of a circular permutation, each element of \(\mathbf n =\{1,2,\dots ,n\}\) is moved a minimal amount. This result (almost) determines the lift from the circular permutation to the affine symmetric group that gives the minimal length, as required.

3.1 The two-region inversion model

The group generators for this model are inversions of two regions, that is, simple transpositions \(s_i=(i\ i+1)\) swapping regions \(i\) and \(i+1\). However because we are on the circle with \(n\) regions we consider regions to be equivalence classes of integers mod \(n\), so that \(s_n=(n\ n+1)\) is the transposition swapping regions \(n\) and 1, and our group is the set of bijections on \(\mathbb Z _n\).

If we were to consider only the generators \(s_1,\dots s_{n-1}\) then this group would be precisely the standard presentation for the symmetric group as a Coxeter group of type \(A\). The relations in this case are

$$\begin{aligned} \begin{array}{ll} s_i^2= 1&{}\quad \text{ for } \text{ each }\quad i=1,\dots ,n-1, \\ s_is_j=s_js_i&{}\quad \text{ if }\quad |i-j|>1,\text{ and }\\ s_is_{i+1}s_i=s_{i+1}s_is_{i+1}&{}\quad \text{ for } \text{ each }\quad i=1,\dots ,n-2. \end{array} \end{aligned}$$

The additional generator \(s_n\) can be written in terms of these generators as the conjugate

$$\begin{aligned} s_n=s_{n-1}s_{n-2}\dots s_2s_1s_2\dots s_{n-2}s_{n-1} \end{aligned}$$

and it behaves in a precisely analogous manner to the above, giving a presentation (via Tietze transformations) for our circular permutation group with generators \(\{s_i\mid i=1,\dots ,n\}\) and relations:

$$\begin{aligned} \begin{array}{l@{\quad }l} s_i^2=1&{} \text{ for } \text{ each }\quad i=1,\dots ,n, \\ s_is_j= s_js_i&{} \text{ if }\quad i-j\mod n\ne \pm 1,\\ s_is_{i+1}s_i= s_{i+1}s_is_{i+1}&{} \text{ for } \text{ each }\quad i=1,\dots ,n-1, \text{ and }\\ s_n=s_{n-1}s_{n-2}\dots s_2s_1s_2\dots s_{n-2}s_{n-1}. \end{array} \end{aligned}$$

While the length function for the symmetric group with respect to the Coxeter-type presentation is well-studied (see for example Humphreys 1990), the additional generator provided by the circle makes the relevant theorems inapplicable. Fortunately, there is another way to view this group using the affine symmetric group \(\tilde{S}_n\).

In the remainder of this section we show how to lift a circular permutation to the affine symmetric group in such a way that the length in the affine symmetric group is the minimal number of 2-region inversions required to sort the circular permutation (see Fig. 3). Because we know how to calculate length in the affine symmetric group (see next section), we can then use this to find the length of the circular permutation. We will in fact be working often in the extended affine symmetric group \(\tilde{S}_n^{ext}\), which we introduce in Sect. 3.3. This group is defined in the same way as the affine symmetric group, but without the restriction on the sum of the images of a permutation given by Eq. (2) below.

Fig. 3
figure 3

The strategy of lifting a circular permutation \(s\) to an affine permutation \(\sigma \) in order to calculate its length. Thick arrows denote the mappings that are constructed in this paper

3.2 The affine symmetric group

Recall that \(\mathbf n :=\{1,2,\dots ,n\}\).

Definition 3.1

(Affine permutation) A bijection \(\sigma :\mathbb Z \rightarrow \mathbb Z \) is an affine permutation if it satisfies the following two conditions:

periodic:

$$\begin{aligned} \sigma (i+n)=\sigma (i)+n\quad \text{ for } \text{ all } i\in \mathbb Z \end{aligned}$$
(1)

balanced:

$$\begin{aligned} \sum _{i\in \mathbf n } \sigma (i) \;=\; \sum _{i\in \mathbf n } i \;=\; \frac{n(n+1)}{2} \end{aligned}$$
(2)

following Lusztig (1983).

The affine symmetric group \(\tilde{S}_n\) can be realized as the group of affine permutations. The symmetric group \(S_n\) can be identified with a subgroup of \(\tilde{S}_n\) by extending each permutation from \(\mathbf n \) to \(\mathbb Z \) obeying the periodicity requirement, and can also be obtained by projecting from \(\tilde{S}_n\) by reducing domain and codomain mod \(n\). In particular the generators \(\tilde{s}_i\) of \(\tilde{S}_n\) are the periodic permutations \(\tilde{s}_i(i+kn)=i+1+kn\) and \(\tilde{s}_i(i+1+kn)=i+kn\) for all \(k\in \mathbb Z \), and \(\tilde{s}_i(j)=j\) if \(j\not \equiv i\) or \(i+1\mod n\).

As with the symmetric group in relation to the Coxeter presentation, there is an easy formula for the minimum number of (affine) transpositions \(\tilde{s}_i\) required to represent an affine permutation, due to Shi (1986) (see also Shi 1994). In our context, using the length function in \(S_n\) would not take into account the additional generator \(s_n\); the key idea here is that by lifting a circular permutation up to the affine symmetric group the length function in \(\tilde{S}_n\) accounts for all \(n\) generators.

The obstacle to using the length function in the affine symmetric group is that a circular permutation doesn’t define a unique affine permutation: we need to choose for each \(s(i)\in \mathbb Z /n\mathbb Z \) a representative \(\sigma (i)\in \mathbb Z \), and we need to do this in such a way that the length of the obtained affine permutation is minimised.

To illustrate the problem, consider the circular permutation

$$\begin{aligned} \sigma =\left( \begin{array}{lllll} 1&{}2&{}3&{}4&{}5\\ 3&{}5&{}4&{}1&{}2\end{array}\right) , \end{aligned}$$

which we denote in “window” notation by \([3,5,4,1,2]\). This has \(3\mapsto 4\), but lifting to \(\tilde{S}_5\) we could have any choice of \(3\mapsto \{4+5j\mid j\in \mathbb Z \}\) (see Fig. 4). All choices give equivalent circular permutations.

Fig. 4
figure 4

Some of the infinitely many alternative ways to lift the mapping \(3\mapsto 4\) to the affine symmetric group

3.3 The extended affine symmetric group

Some choices of how to lift a circular permutation to an affine permutation will result in periodic permutations that do not satisfy condition (2) of Definition 3.1: that is, they are not balanced. Periodic permutations that are not balanced are part of the extended affine symmetric group, as defined in this subsection. We also give the Shi length formula (Theorem 3.3) that holds for both the extended and the non-extended affine symmetric groups, and define the crossing number of an affine permutation (Definition 3.4).

Definition 3.2

(Extended affine permutation) An extended affine permutation is a periodic bijection \(\sigma :\mathbb Z \rightarrow \mathbb Z \).

That is, a balanced extended affine permutation is an affine permutation: an element of \(\tilde{S}_n\). The set of extended affine permutations also forms a group, denoted \(\tilde{S}_n^{ext}\), and it can be generated by the same set \(\tilde{s}_i\) for \(i\in \mathbf n \) together with the additional permutation \(\tau :\mathbb Z \rightarrow \mathbb Z \) given by \(\tau (i)=i+1\) for all \(i\in \mathbb Z \).

Any extended affine permutation \(\sigma \) can be balanced by multiplication with a suitable power of \(\tau \). Importantly, this power of \(\tau \) does not affect the number of transpositions \(\tilde{s}_i\) needed to express \(\sigma \): any extended affine permutation may be written as a product \(\tau ^kw\) where \(w\) is an expression in the \(\tilde{s}_i\) for \(i=1,\dots ,n\), and \(k\) as well as the length of \(w\) are unique. The unique power of \(\tau \) in such an expression is called the winding number of the permutation.

In the context of circular permutations, \(\tau \) corresponds to a rotation of the circle by one position.

Let \(\mathrm P = \big \{ (i,j)\in \mathbf n ^2 \mid i<j \big \}\), i.e. the set of all those pairs of regions that are in strictly ascending order.

Theorem 3.3

(Shi 1986) Given an extended affine permutation \(\sigma \), the length \(\ell (\sigma )\) of \(\sigma \), that is the minimum number of transpositions occurring in any representation of \(\sigma \) as a product of transpositions, is given by

$$\begin{aligned} \ell (\sigma ) = \sum _{(i,j)\in \mathrm P } \left| \left\lfloor \frac{\sigma (j)-\sigma (i)}{n}\right\rfloor \right| . \end{aligned}$$
(3)

Proof

See Lemma 4.2.2 of Shi (1986), pp. 68–70.\(\square \)

Definition 3.4

Given an extended affine permutation \(\sigma \), and \((i,j)\in \mathrm P \), we define the crossing number \(\kappa _{i,j}(\sigma )\) of the positions \(i\) and \(j\) in \(\sigma \) as

$$\begin{aligned} \kappa _{i,j}(\sigma ) = \left\lfloor \frac{\sigma (j)-\sigma (i)}{n} \right\rfloor . \end{aligned}$$

Associating to \(\sigma \) a diagram as in Fig. 4, the crossing number for the pair \((i,j)\), \(\kappa _{i,j}(\sigma )\), is the number of strands \((j+kn)\mapsto \sigma (j+kn)\) (\(k\in \mathbb Z \)) that cross the strand \(i\mapsto \sigma (i)\), where crossings from the left are counted positive and crossings from the right are counted negative.

Definition 3.5

For \(k\in \mathbb Z \) we define the set of all strictly ascending pairs with crossing number exactly \(k\) in \(\sigma \) as

$$\begin{aligned} I_k(\sigma ) = \big \{ (i,j)\in \mathrm P \mid \kappa _{i,j}(\sigma )=k \big \} = \big \{ (i,j)\in \mathrm P \mid kn\le \sigma (j)-\sigma (i)<(k+1)n \big \}\!. \end{aligned}$$

Note that

$$\begin{aligned} \sum _{(i,j)\in \mathrm P } \left| \left\lfloor \frac{\sigma (j)-\sigma (i)}{n}\right\rfloor \right|&= \sum _{(i,j)\in \mathrm P } \left| \kappa _{i,j}(\sigma ) \right| \nonumber \\&= \sum _{k\in \mathbb Z } |k|\cdot |I_k(\sigma )|\nonumber \\&= \sum _{k\in \mathbb N ^+} |k|\cdot \big (|I_k(\sigma )|+|I_{-k}(\sigma )|\big ). \end{aligned}$$
(4)

3.4 Crossing numbers for minimum length representatives

Thanks to Shi’s formula (Theorem 3.3), the crossing numbers are closely related to length. So given a minimum length representative of a circular permutation, there should be some constraints on the crossing numbers. In this section we show that the only crossing numbers that can occur in a minimum length representative of a given circular permutation are \(-1\), \(0\), and \(+1\).

We start by proving some “transitivity” constraints for pairs of crossing numbers involving a common position. For instance we provide constraints on the crossing number of the pair \((i,k)\) when we have crossing numbers for \((i,j)\) and \((j,k)\) (Part (i)).

Lemma 3.6

Let \(\sigma \) be an extended affine permutation. The following hold:

  1. (i)

    If \((i,j)\in I_r(\sigma )\) and \((j,k)\in I_s(\sigma )\) then \((i,k)\in I_{r+s}(\sigma )\cup I_{r+s+1}(\sigma )\).

  2. (ii)

    If \((i,j)\in I_r(\sigma )\) and \((i,k)\in I_s(\sigma )\) then \((j,k)\in I_{s-r-1}(\sigma )\cup I_{s-r}(\sigma )\) or \((k,j)\in I_{r-s-1}(\sigma )\cup I_{r-s}(\sigma )\).

  3. (iii)

    If \((i,k)\in I_r(\sigma )\) and \((j,k)\in I_s(\sigma )\) then \((i,j)\in I_{r-s-1}(\sigma )\cup I_{r-s}(\sigma )\) or \((j,i)\in I_{s-r-1}(\sigma )\cup I_{s-r}(\sigma )\).

Proof

  1. (i)

    As \((i,j)\in I_r(\sigma )\) is equivalent to \(rn\le \sigma (j)-\sigma (i)<(r+1)n\) and \((j,k)\in I_s(\sigma )\) is equivalent to \(sn\le \sigma (k)-\sigma (j)<(s+1)n\), one has \((r+s)n\le \sigma (k)-\sigma (i)<(rs+2)n\), whence \((i,k)\in I_{r+s}(\sigma )\cup I_{r+s+1}(\sigma )\).

  2. (ii)

    As \((i,j)\in I_r(\sigma )\) is equivalent to \(-(r+1)n<\sigma (i)-\sigma (j)\le -rn\) and \((i,k)\in I_s(\sigma )\) is equivalent to \(sn\le \sigma (k)-\sigma (i)<(s+1)n\), one has \((s-r-1)n<\sigma (k)-\sigma (j)<(s-r+1)n\), respectively \((r-s-1)n<\sigma (j)-\sigma (k)<(r-s+1)n\). If \(j<k\), the former implies \((j,k)\in I_{s-r-1}(\sigma )\cup I_{s-r}(\sigma )\), while in the case \(k<j\), the latter implies \((k,j)\in I_{r-s-1}(\sigma )\cup I_{r-s}(\sigma )\).

  3. (iii)

    As \((i,k)\in I_r(\sigma )\) is equivalent to \(rn\le \sigma (k)-\sigma (i)<(r+1)n\) and \((j,k)\in I_s(\sigma )\) is equivalent to \(-(s+1)n<\sigma (j)-\sigma (k)\le -sn\), one has \((r-s-1)n<\sigma (j)-\sigma (i)<(r-s+1)n\), respectively \((s-r-1)n<\sigma (i)-\sigma (j)<(s-r+1)n\). If \(i<j\), the former implies \((i,j)\in I_{r-s-1}(\sigma )\cup I_{r-s}(\sigma )\), while in the case \(j<i\), the latter implies \((j,i)\in I_{s-r-1}(\sigma )\cup I_{s-r}(\sigma )\). \(\square \)

The idea now is the following: Assume that \(\alpha >1\) is the maximal integer such that \(I_{-\alpha }(\sigma )\cup I_{\alpha }(\sigma )\) is non-empty. This means that \(|\alpha |\) is the maximal size of a crossing number in \(\sigma \). We will define two transformations that reduce this, by removing the pairs with crossing numbers \(-\alpha \) (respectively \(\alpha \)) without increasing the absolute value of any crossing number, except from 0 to 1. The proof of this claim uses the transitivity constraints in Lemma 3.6 and the maximality of \(\alpha \).

Given an extended affine permutation \(\sigma \) and a subset \(S\subseteq \mathbf n \), we define the extended affine permutation \(\sigma ^S\) by setting

$$\begin{aligned} \sigma ^S(i) = {\left\{ \begin{array}{ll} \sigma (i) + n &{}\quad \text{ if } i\in S, \\ \sigma (i) &{}\quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$

That is, \(\sigma ^S\) has the image of each element of \(S\) shifted by \(n\). Note that this does not change the circular permutation that they correspond to.

Lemma 3.7

For an extended affine permutation \(\sigma \), if \(\kappa _{i,j}(\sigma )=r\), then one has

$$\begin{aligned} \kappa _{i,j}(\sigma ^S) = {\left\{ \begin{array}{ll} r+1 &{}\quad \text{ if } i\notin S \text{ and } j\in S\text{, } \\ r-1 &{}\quad \text{ if } i\in S \text{ and } j\notin S\text{, } \\ r &{}\quad \text{ otherwise. } \end{array}\right. } \end{aligned}$$

Proof

The claims follow immediately from Definition 3.4.

The following two propositions show how to choose a subset \(S\subseteq \mathbf n \) in such a way that the largest crossing number for \(\sigma ^S\) is strictly lower than that of \(\sigma \).

Proposition 3.8

Let \(\sigma \) be an extended affine permutation, and assume that \(\alpha >1\) is the maximal integer such that \(I_{\alpha }(\sigma )\cup I_{-\alpha }(\sigma )\) is non empty.

For \(S=\big \{ i\in \mathbf n \mid \exists \, j\in \mathbf n \,,\; (i,j)\in I_{\alpha }(\sigma )\big \}\) one has \(I_{\alpha }(\sigma ^S)=\varnothing \). Moreover, for any \((i,j)\in \mathrm P \) with \(\kappa _{i,j}(\sigma )\ne 0\), one has \(|\kappa _{i,j}(\sigma ^S)| \le |\kappa _{i,j}(\sigma )|\).

Proof

Assume that \((i,j)\in I_r(\sigma )\), so that \(\kappa _{i,j}(\sigma )=r\), and suppose

$$\begin{aligned} |\kappa _{i,j}(\sigma ^S)| > |\kappa _{i,j}(\sigma )| = |r| > 0. \end{aligned}$$

Then, by Lemma 3.7, if \(r>0\) we have \(i\notin S\) and \(j\in S\), and if \(r<0\) we have \(i\in S\) and \(j\notin S\).

If \(r>0\) then there exists \(k\in \{j+1,\ldots ,n\}\) with \((j,k)\in S=I_{\alpha }(\sigma )\), so \((i,k)\in I_{\alpha +r}(\sigma )\cup I_{\alpha +r+1}(\sigma )\) by Lemma 3.6 contradicting the maximality of \(\alpha \).

If \(r<0\) then there exists \(k\in \{i+1,\ldots ,n\}\) such that \((i,k)\in S=I_{\alpha }(\sigma )\). By Lemma 3.6, either \(j<k\) and \((j,k)\in I_{\alpha -r-1}(\sigma )\cup I_{\alpha -r}(\sigma )\), or \(k<j\) and \((k,j)\in I_{r-\alpha -1}(\sigma )\cup I_{r-\alpha }(\sigma )\). Since \(r<0\), the only situation compatible with the maximality of \(\alpha \) is \(r=-1\) and \((j,k)\in I_{\alpha }(\sigma )\), but in this case \(j\in S\), which is a contradiction.

By Lemma 3.7, it only remains to show that \(\kappa _{i,j}(\sigma ^S)\ne \kappa _{i,j}(\sigma )\) for any \((i,j)\in I_{\alpha }(\sigma )\). As \((i,j)\in I_{\alpha }(\sigma )\) implies \(i\in S\), one can only have \(\kappa _{i,j}(\sigma ^S)=\kappa _{i,j}(\sigma )\) if \(j\in S\), that is, if there exists \(k\in \{j+1,\ldots ,n\}\) with \((j,k)\in S=I_{\alpha }(\sigma )\). But then, by Lemma 3.6, \((i,k)\in I_{2\alpha }(\sigma )\cup I_{2\alpha +1}(\sigma )\), again contradicting the maximality of \(\alpha \).\(\square \)

A similar argument deals with the case in which \(I_{\alpha }(\sigma )\) is empty:

Proposition 3.9

Let \(\sigma \) be an extended affine permutation, and assume that \(\alpha >1\) is the maximal integer such that \(I_{\alpha }(\sigma )\cup I_{-\alpha }(\sigma )\) is non empty. Moreover assume that \(I_{\alpha }(\sigma )=\varnothing \).

For \(S=\big \{ j\in \mathbf n \mid \exists \, i\in \mathbf n \,,\; (i,j)\in I_{-\alpha }(\sigma )\big \}\) one has \(I_{-\alpha }(\sigma ^S)=\varnothing \). Moreover, for any \((i,j)\in \mathrm P \) with \(\kappa _{i,j}(\sigma )\ne 0\), one has \(|\kappa _{i,j}(\sigma ^S)| \le |\kappa _{i,j}(\sigma )|\).

Proof

The proof is similar to that of Proposition 3.8.\(\square \)

We now translate these results into the context of a circular permutation \(s\). In particular, we have the key conclusion that if \(\sigma \) is a minimal affine representative of \(s\) then its crossing numbers are at most \(\pm 1\).

Corollary 3.10

If \(s\) is a circular permutation and \(\sigma \) is an extended affine permutation representing \(s\), whose length is minimal among all representatives of \(s\), then \(I_k(\sigma )=\varnothing \) for all \(k\in \mathbb Z {\setminus }\{-1,0,+1\}\).

Proof

If \(\sigma \) is a representative of \(s\) and \(I_k(\sigma )\ne \varnothing \) for some \(k\in \mathbb Z \setminus \{-1,0,+1\}\), then application of Proposition 3.8 or Proposition 3.9 produces another representative of \(s\) that has, by Theorem 3.3, smaller length.\(\square \)

Corollary 3.11

If \(s\) is a circular permutation and \(\sigma \) is an extended affine permutation representing \(s\), whose length is minimal among all representatives of \(s\), then \(\max \big \{\sigma (i)\mid i\in \mathbf n \big \}-\min \big \{\sigma (i)\mid i\in \mathbf n \big \} < 2n\).

Proof

Choose \(k\in \mathbf n \) such that \(\sigma (k)=\min \big \{\sigma (i)\mid i\in \mathbf n \big \}\). One has \(\sigma (j)-\min \big \{\sigma (i)\mid i\in \mathbf n \big \}=\sigma (j)-\sigma (k)<2n\) for all \(j\in \mathbf n \) by Corollary 3.10, so the claim holds.\(\square \)

We have now placed significant constraints on an important feature of an affine permutation (its crossing numbers), when it is a minimal length representative of a circular permutation. The question remains of how to choose an affine representative that satisfies these constraints on the crossing numbers. This is addressed in the next section.

3.5 Finding a minimum length representative

In this section we show that a minimal affine representative of a circular permutation must have images for each \(i\in \mathbf n \) that are the minimal possible distance from \(i\) (Theorem 3.16). To begin with, we will need to define the nett crossing number of a position \(i\in \mathbf n \) (Definition 3.12), as well as the winding number of \(\sigma \).

Definition 3.12

Given an extended affine permutation \(\sigma \), and \(i\in \mathbf n \), we define the nett crossing number \(\nu _{i}(\sigma )\) of the position \(i\) to be

$$\begin{aligned} \nu _{i}(\sigma ) = \sum _{j\in \mathbf n } \kappa _{i,j}(\sigma ). \end{aligned}$$

Any vertical line in general position through a diagram in \(\tilde{S}_n^{ext}\) has the same number of nett crossings (crossings from the left minus crossings from the right, or vice versa). This number is the winding number of the permutation, and is the exponent on \(\tau \) in its expression in terms of the \(s_i\) and \(\tau \), as described in Sect. 3.3 (the elements of winding number zero are the affine permutations, that is, the “balanced” ones). This feature follows because each generator \(s_i\) has zero nett crossings, and nett crossings are invariant under Reidemeister moves.

We now have a lemma that is a direct consequence of the results in the previous section, showing that in minimal representatives the images of regions cannot move more than \(n\). This is somewhat intuitive, since if it were not true it would mean that in a minimal sequence of inversions a region could move more than a full circle to its destination, which (intuitively) seems unlikely.

Lemma 3.13

If \(\sigma \) is a minimal balanced representative of a circular permutation, then \(|\sigma (i)-i|<n\).

Proof

If \(|\sigma (i)-i|\ge n\) then there are two alternatives: \(\sigma (i)\ge i+n\) or \(\sigma (i)\le i-n\). We show that either leads to a contradiction.

Suppose that \(\sigma (i)\ge i+n\). Then \(\sigma (i-n)\ge i\) by periodicity. If there was a \(j>i\) that had an image \(\sigma (j)\) less than \(\sigma (i-n)=\sigma (i)-n\), then it would cross the \(i\) strand twice, violating Lemma 3.10. This implies that the winding number of \(\sigma \) is strictly positive, because in the vertical line down from \(i+\varepsilon \) (for \(\varepsilon \) sufficiently small), the only crossings can be from the left. This violates the balance of \(\sigma \) and so is a contradiction.

The second case is symmetric.\(\square \)

The following is a technical consequence of Lemma 3.13 that is needed in the next proof:

Lemma 3.15

Suppose \(j\in \mathbb Z {\setminus }\{0\}\). In a minimal balanced representative,

$$\begin{aligned} |\sigma (i)-i+jn|={\left\{ \begin{array}{ll} \sigma (i)-i+jn&{}\quad \text{ if } j>0\\ -(\sigma (i)-i+jn)&{}\quad \text{ if } j<0. \end{array}\right. } \end{aligned}$$

Proof

This is immediate from Lemma 3.13, since either \(0<\sigma (i)-i<n\) or \(-n<\sigma (i)-i<0\).\(\square \)

Lemma 3.15

If \(\sigma \) is a balanced permutation then \(\nu _i(\sigma )=i-\sigma (i)\). Moreover, if \(\sigma \) has winding number \(k\) then \(\nu _i(\sigma )=i-\sigma (i)+k\).

Proof

If \(\sigma \) is balanced, the element \(\sigma \tau ^{i-\sigma (i)}\) of \(\tilde{S}_n^{ext}\) sends \(i\mapsto i\), and has winding number \(i-\sigma (i)\). A vertical line in general position just to the left or right of \(i\) will have the same number of nett crossings as the strand \(i\mapsto i\), namely the winding number \(i-\sigma (i)\). But the multiplication of \(\sigma \) by \(\tau ^{i-\sigma (i)}\) does not change any crossings, and hence the claim follows. When the winding number of \(\sigma \) is \(k\ne 0\), the winding number after multiplication by \(\tau ^{i-\sigma (i)}\) is \(k+(i-\sigma (i))\) as required.\(\square \)

Theorem 3.16

If \(s\) is a circular permutation and \(\sigma \) is an affine permutation representing \(s\), whose length is minimal among all representatives of \(s\), then for each \(i\in \mathbf n ,\, \sigma \) takes the shortest distance between \(i\) and the equivalence class \(\sigma (i)\mod n\).

Proof

We have a circular permutation \(s\) and a balanced, minimal length lift, \(\sigma \in \tilde{S}_n\). Our claim is that each \(\sigma (i)\) is the minimal distance from \(i\) of all choices \(\{\sigma (i)+jn\mid j\in \mathbb Z \}\).

For an arbitrary \(i\), we consider alternative choices of image from the set \(\{\sigma (i)+jn\mid j\in \mathbb Z {\setminus }\{0\}\}\). We claim that any of these alternatives increases the distance of the image from \(i\).

Given that \(\sigma \) is balanced, a choice of \(i\mapsto \sigma (i)+jn\) results in winding number \(j\), and so the nett number of crossings of the line \(i\mapsto \sigma (i)+jn\)  is  \(i-(\sigma (i)+jn)+j\), by Lemma 3.15. If \(j>0\), all crossings of this edge are from the right, as crossings from the left would cross the edge \(i\mapsto \sigma (i)\) twice, violating minimality by Corollary 3.10 (see Fig. 5). Similarly, if \(j<0\) all crossings are from the left, for the same reason.

It follows that the total number of crossings of the edge \(i\mapsto \sigma (i)+jn\) equals the absolute value of the nett number of crossings, namely

$$\begin{aligned} \text{ total } \text{ crossings } \text{ of } (i\mapsto \sigma (i)+jn)&= {\left\{ \begin{array}{ll} -(i-(\sigma (i)+jn)+j)&{}\quad \text{ if } j>0\\ i-(\sigma (i)+jn)+j&{}\quad \text{ if } j<0. \end{array}\right. }\\&= {\left\{ \begin{array}{ll} (\sigma (i)-i)+jn-j&{}\quad \text{ if } j>0\\ (i-\sigma (i))-jn+j&{}\quad \text{ if } j<0. \end{array}\right. } \end{aligned}$$

For the original strand we have \(i-\sigma (i)\) nett crossings and distance \(|\sigma (i)-i|\). So the total number of crossings of the strand is at least \(|i-\sigma (i)|\). Therefore we have the inequalities:

$$\begin{aligned} |i-\sigma (i)|&\le \text{ total } \text{ crossings } \text{ of } (i\mapsto \sigma (i))\\&\le \text{ total } \text{ crossings } \text{ of } (i\mapsto \sigma (i)+jn) \text{ by } \text{ minimality } \text{ of } \sigma \text{, }\\&= {\left\{ \begin{array}{ll} (\sigma (i)-i)+jn-j&{}\quad \text{ if } j>0\\ (i-\sigma (i))-jn+j&{}\quad \text{ if } j<0, \end{array}\right. }\\&< {\left\{ \begin{array}{ll} (\sigma (i)-i)+jn&{}\quad \text{ if } j>0\\ (i-\sigma (i))-jn&{}\quad \text{ if } j<0, \end{array}\right. }\\&= |\sigma (i)+jn-i|\quad \text{ by } \text{ Lemma } \text{3.14 }\\&= \text{ the } \text{ distance } \text{ from }\, i\, to\, \sigma (i)+jn. \end{aligned}$$

In other words, if \(\sigma \) is a balanced, minimum length representative of the circular permutation \(s\), then each \(\sigma (i)\) is the minimum distance from \(i\) of all the alternatives \(\{\sigma (i)+jn\mid j\in \mathbb Z \}\), as required. \(\square \)

Fig. 5
figure 5

The dotted arrows in the left panel indicate all possible sets of connections that cross the strand \(i\mapsto \sigma (i)\) and a hypothetical alternative strand \(i\mapsto \sigma (i)+n\) (shown dashed). If \(\sigma \) is minimal then no pair of equivalence classes of strands cross more than once, by Corollary 3.10. This means one of the sets of connections drawn on the left is empty and the only possible non-empty sets are shown on the right panel. Note that the panel on the right shows crossings of the hypothetical alternative strand that are all from the same side (right to left)

Note: the distance between \(i\) and \(\sigma (i)\) is strictly less than that between  \(i\) and \(\sigma (i)+n\) when \(\sigma \) is drawn with the winding number zero. If \(\sigma (i)+n\) is instead chosen as the image of \(i\) the winding number becomes 1, as noted in the proof. If the permutation is rebalanced (changing the frame of reference) then the bottom axis is moved one to the left, resulting in the distance \(i\) to \(\sigma (i)\) increasing by one and the distance \(i\) to \(\sigma (i)+n\) decreasing by 1. Consequently, some choices \(\sigma (i)\) or \(\sigma (i)+n\) could be equivalent. This occurs when \(\sigma (i)=(n-1)/2\) (\(n\) odd). Here is the argument.

Lemma 3.17

If two choices \(i\mapsto \sigma (i)\) and \(i\mapsto \sigma (i)+n\) both result in minimal length elements, and the permutation is balanced with \(i\mapsto \sigma (i)\), then \(i-\sigma (i)=\frac{1}{2}(n-1)\).

Proof

As in the proof of Theorem 3.16, if \(i\mapsto \sigma (i)\) gives a minimal length element then the number of crossings of a line \(i\mapsto \sigma (i)+n\) is \(\sigma (i)+n-i-1\). Similarly, if \(i\mapsto \sigma (i)+n\) also gives a minimal length element then the number of crossings of a line \(i\mapsto \sigma (i)\) is \(i-\sigma (i)\). Since both choices have the same total number of crossings and all else remains fixed, these lines must have the same number of crossings, namely \(\sigma (i)+n-i-1=i-\sigma (i)\). The lemma follows.\(\square \)

Note that in the above lemma, and as mentioned prior to it, the permutation is balanced with \(i\mapsto \sigma (i)\) but not balanced with \(i\mapsto \sigma (i)+n\). So the distances when balanced are not the same thing as the number of crossings. The above scenario arises when \(n\) is odd, so for a given frame of reference there is only one choice: the distances \(i\) to \(\sigma (i)\) and \(i\) to \(\sigma (i)+n\) are different.

The results in this section show that, for each frame of reference, a minimal representative may be found by choosing shortest distances for each image. Taking the shortest representative over all frames of reference will yield the minimal number of inversions required for the given circular permutation.

4 Implementation and application

In this section we explain how the results may be implemented algorithmically to compute the inversion length in the two-region inversion model, and then apply the method to some published Yersinia pestis genomes.

4.1 Computational implementation

The method arising from these results breaks into three natural algorithmic components:

  1. 1.

    minimizing paths for the lifting process,

  2. 2.

    calculating the length for an affine permutation, and

  3. 3.

    sorting a circular permutation (finding an explicit sequence of inversions).

For lifting a circular permutation into an extended affine permutation we have to make a decision which way to route each path, i.e. choosing minimal-distance images for each \(i\in \mathbf n \). A straight-forward method checks for each \(i\in \mathbf n \) whether the image in the previous or in the next window has a shorter distance or not (Algorithm 1).

For calculating the length of an affine permutation we simply count the number of crossings. The sum in Eq. (3) can be translated into a for loop easily. Then for calculating the circular length we have to go through all frames of reference to find the minimal length (Algorithm 2).

Additionally, we can sort the permutation, producing a geodesic. The sorting algorithm (Algorithm 3) operates by comparing consecutive pairs and swaps them if needed, hence doing uncrossings. This algorithm always chooses the lowest index pair to be swapped and thus produces a single geodesic. However, a systematic exploration of all possible swaps (e.g. a backtrack algorithm) can enumerate geodesics.

Since we combine quadratic and linear algorithms, the overall sorting algorithm is also polynomial. In particular we can easily deal with real-world genome data with approximately 80 regions. The algorithms in this subsection were implemented using the GAP (GAP 2012) computer algebra system and the source code is available upon request.

figure a
figure b

4.2 Application to Yersinia genomes

We apply the method summarised in Algorithm 2 to calculate inversion-based distances among eight Yersinia genomes. The input data are in the form of a set of permutations of regions that are conserved across all genomes. We obtained these permutations from Darling et al. (2008) by using the Mauve software package (Darling et al. 2010). The resulting matrix of minimal inversion distances is given in Table 1. This matrix of distances can be used to generate a phylogenetic tree using distance-based methods such as neighbour-joining (Saitou and Nei 1987). We applied this method using the phylip package (Felsenstein 1989). The two genomes of Yersinia pseudotuberculosis can be used as outgroups, as done for example by Bos et al. (2011). The resulting phylogeny is shown in Fig. 6.

Fig. 6
figure 6

Phylogeny from data published in Darling et al. (2008), based on distances obtained by applying Algorithm 2

Table 1 Matrix of minimal inversion distances among Yersinia genomes calculated by Algorithm 2

This evolutionary reconstruction can be compared to the results of Darling et al. (2008), who also used inversion information. While Darling et al. used a network visualisation of the relationships among the genomes, it is possible to see the similarities with our phylogeny. Namely, Y. pestis Pestoides and Y. pestis Microtus 91001 join near the root, the remaining Y. pestis isolates group together, and the two Yersinia pseudotuberculosis outgroups also group together. Another point of comparison is the phylogeny based on 1,694 variable positions across the whole genome  (Bos et al. 2011). Again, Pestoides and Microtus 91001 diverged early while the remaining genomes evolved more recently. Bos et al. distinguish two clades that arose since the Black Death: one with Nepal516 and KIM and the other with CO92 and Antiqua. Although there are slight differences within these recent clades, our methods produced a tree that is broadly consistent with the tree of Bos et al. (2011) which uses sequence variation—a completely different source of information.

figure c

5 A general modelling framework

We have studied a model in which only certain inversions are permitted, specifically those of two adjacent regions. As remarked in Sect. 2.1, this is but one example of an inversion system, in which the set of inversions \(\mathcal I \) is constrained in some way. In this setting, we define a metric \(\ell \) on the group relative to \(\mathcal I \) and according to parsimony, so that \(\ell (g)\) is the word length of \(g\) in the generators \(\mathcal I \). Then the distance between genomes \(G_1\) and \(G_2\), represented by group elements \(g_1\) and \(g_2\) in the model \((G,\mathcal I )\) with the metric \(\ell \) is simply \(\ell (g_1^{-1}g_2)\). The model in which all inversions are permitted, the uniform model, simply removes constraints on \(\mathcal I \) completely.

A more realistic model than either the uniform or a constrained model is one in which inversions may be given different weights according to experimental data, such as the frequencies of different inversion lengths. Estimates of these frequencies have been obtained for instance in Darling et al. (2008) (see Fig. 7 of that paper). In the group-theoretic setting, we have the same group and the same generators as in the uniform model, since all inversions are permitted. The variation in the frequencies of different inversions is accounted for by manipulating the length function in the group, for instance by assigning weights to inversions depending on the number of regions involved (Swidan et al. 2004). In the uniform and constrained models, and in most combinatorial group theory, each generator is defined to have length 1, and the length of a product of generators is the sum of the lengths. However, length is used as a proxy for evolutionary distance, and if inversions are not equally probable then their length should be different. In the light of the parsimony assumption that the most likely evolutionary path is one of minimal distance, the weighting (or length) of a single inversion needs to be adjusted to account for the difference in probability.

Fig. 7
figure 7

Different biological models. The widely used uniform model and any constrained model are special cases of the general model, in which all inversions are assigned a weight value. The weights can be interpreted as probabilities the inversions to occur

For example, let \(\mathcal I \) be the set of all possible inversions, and suppose \(\omega :\mathcal I \rightarrow \mathbb R ^{\ge 0}\) is a weight function assigned to the inversions. If we assume short inversions are more probable than longer inversions, we may have a weight function that is order-preserving with respect to inversion length (if \(s\) is longer than \(t\) then \(\omega (s)>\omega (t)\)). Then our metric on the group may be defined by first defining length additively on any word in the generators, setting \(\ell (s_{i_1}\dots s_{i_k})=\omega (s_{i_1})+\dots +\omega (s_{i_k})\). Then to define the length of a group element \(g\in G\) one needs to take the minimum over all words \(w\) in the generators \(\mathcal I \) representing \(g\): \(\ell (g)=\min \{\ell (w)\mid w=g\}\). This establishes the minimal weight path in the Cayley graph from the identity arrangement to \(g\). In practice this is a significant problem, however, as there are infinitely many words representing \(g\). Even eliminating paths that double back on themselves the search space is potentially very large. Applying this model in generality will require some clever new ideas or a statistical approach [some computational approaches have been taken in Pinter and Skiena (2002), Swidan et al. (2004)].

While the general version of this model seems difficult to work with, special cases are clearly not intractable, as we have shown in this paper. The two-region inversion model we study simply employs a special weight function in which \(\omega (s)=1\) if \(s\) is an inversion of two regions and 0 otherwise. Indeed, any model that restricts the set of inversions \(\mathcal I \) but treats them all as equivalent follows a similar pattern, setting \(\omega (s)=1\) if \(s\in \mathcal I \) and 0 otherwise. This also applies to models in which inversions are not restricted according to length but by location, such as models only permitting inversions that do not move the terminus of replication, or even that are symmetric about the terminus (Ohlebusch et al. 2005). Similarly the uniform model generally studied has the even simpler weight function \(\omega (s)=1\) for all inversions \(s\).

6 Discussion

In this article, we have introduced a group theoretic framework for modelling the process of bacterial genome rearrangement due to inversions. Using this framework, we outlined a range of alternative models. We focused on a specific model in which inversions act locally on two genomic regions at a time. Based on this model, the group theoretic framework has enabled us to derive a new algorithm to obtain the minimum number of inversion events connecting two genomes under comparison. The key conceptual step has been to find a way to lift circular permutations to the affine symmetric group in such a way that the inversion distance on the circular genome can be found using results on length of elements in the affine symmetric group.

The combinatorial group theory of permutation groups has a long history of development and therefore presents a potent opportunity to examine in a new light the processes underlying bacterial genome evolution. There is potential to introduce more realism into models of genome evolution by generalising the model studied here within the group theoretic framework. This represents an important advance over existing methods of comparative genomics based on fairly coarse assumptions, most particularly the assumption that all inversion events are equally probable. In addition, while the questions in group theory that arise in the algebraic models in this paper are new, they are related to other questions under ongoing consideration by mathematicians. It is to be hoped that this connection between evolutionary biology and algebra will drive further theoretical development of related group theory.

Our approach extends preceding studies that applied group theory to genome evolution. While the innovative study by Watterson et al. (1982) described the inversion distance problem, its interpretation as a problem in group theory was first noted by Kececioglu and Sankoff (1993) a decade later (and followed by Meidanis and Dias (2000)). These models, along with most other approaches to the problem, assume a uniform distribution of inversion lengths; something we have addressed in this paper by allowing only inversions of two adjacent regions. A wider issue is that of whether the minimal length is the best measure of evolutionary distance at all, given evolution may not have taken a shortest path to the observed arrangements, regardless of the metric used to define minimal. Recently, Moulton and Steel (2012) pursued this challenge to parsimony, using group theoretic principles to consider the effect on length of a small change to an inversion sequence, and obtained “worst-case” bounds on the difference between lengths of elements when an additional generator is used. In general, it is clear that the application of group-theoretic methods to genomics problems holds great promise for a fertile exchange between algebraists and evolutionary biologists.