1 Introduction

Evolutionary theory searches for the general principles and patterns of evolution. This field typically progresses by analyzing models of evolution, such as the Wright-Fisher process. In these models, the fundamental mechanisms of reproduction, competition, and heritable variation are abstracted and simplified, so that the evolutionary process can be studied through mathematical analysis and simulation. Analysis of these models has yielded great insight. However, this approach is limited by the fact that results from one model are not directly transferable to others. General principles of evolution are revealed only through comparisons across many models. In some cases, intense work is needed to distinguish robust patterns from artifacts of particular models.

This limitation can be overcome through a more general approach, in which the objects of study are not individual models but classes of models. A class of models is defined by its foundational assumptions. These assumptions place limits on how the biological events that drive evolution (birth, death, mutation, etc.) operate in this class.

The goal of this class-based approach is not specific predictions, but general theorems that apply to all models within the class. Through this approach, broad evolutionary principles can be discovered and proven in a single argument. A number of recent studies have planted the seeds of such an approach (Metz and de Roos 1992; Champagnat et al. 2006; Diekmann et al. 2007; Durinx et al. 2008; Rice 2008; Simon 2008; Nathanson et al. 2009; Rice and Papadopoulos 2009; Tarnita et al. 2009b; Nowak et al. 2010a; Nowak et al. 2010b; Tarnita et al. 2011).

Here we introduce and investigate a particular class of evolutionary models. In this class, reproduction is asexual, and population size and spatial structure are fixed. Evolution proceeds as a Markov chain. Each transition corresponds to the replacement of some individuals by the offspring of others. The probabilities of transition depend on the current population state; however, we leave the nature of this dependence unspecified for the sake of generality.

This class encompasses many well-known evolutionary models. In particular, it encompasses a variety of evolutionary processes on graphs, including voter models (Holley and Liggett 1975; Cox 1989; Cox et al. 2000; Sood and Redner 2005; Sood et al. 2008), invasion processes (Sood et al. 2008), and evolutionary game dynamics (Hauert and Doebeli 2004; Lieberman et al. 2005; Santos and Pacheco 2005; Ohtsuki et al. 2006; Taylor et al. 2007a; Szabó and Fáth 2007; Santos et al. 2008; Szolnoki et al. 2008; Roca et al. 2009; Broom et al. 2010; Allen et al. 2012; Shakarian et al. 2012). This class also includes standard models of well-mixed populations, such as the Wright-Fisher model (Fisher 1930), the Moran (1958) model, and the Cannings (1974) exchangeable model, along with frequency-dependent extensions of these (Nowak et al. 2004; Taylor et al. 2004; Imhof and Nowak 2006; Lessard and Ladret 2007; Traulsen et al. 2007).

We focus on two varieties of results. The first concerns the asymptotic properties of the evolutionary process—that is, its behavior over long periods of time. With mutation, we show that the evolutionary Markov chain is ergodic, and therefore its time-averaged behavior converges to a mutation-selection stationary distribution. Without mutation, one of the competing types inevitably fixes. Linking these two cases is the limit of rare mutation, in which long periods of fixation are punctuated by sporadic episodes of competition. To analyze this limit, we introduce a new probability distribution, the rare-mutation dimorphic distribution, which characterizes the likelihood of states arising during these competition episodes.

Second, we ask how one might quantify success in evolutionary competition. Reasonable choices include fixation probability, fitness (survival probability plus expected offspring number), and time-averaged frequency. We provide a new definition of fixation probability (the probability that a type, starting with a single individual, will eventually dominate the population), taking into account the various ways a mutation could arise. We then compare these measures of success. We show that success conditions based on fixation probability, fitness, and average frequency coincide in the low-mutation limit. However, the relationships between these measures become more intricate with nonzero mutation because, even if mutation is symmetric, differences in birth rate can induce mutational bias toward one type.

As part of our comparison of success measures, we derive stochastic versions of the Price equation (Price 1970, 1972; van Veelen 2005) and the replicator equation (Taylor and Jonker 1978; Hofbauer and Sigmund 1998, 2003). Unlike the traditional Price equation and replicator equation, which describe deterministic evolutionary change from a given state, our versions of these equations are based on expectations over the entire evolutionary process, using probability distributions associated to the evolutionary Markov chain.

We begin in Sect. 2 by introducing two illustrative examples of models to which our results apply. We then, in Sect. 3, introduce our fundamental definitions and axioms, first informally and then rigorously. Section 4 derives results on the asymptotic behavior of the evolutionary process. Section 5 defines measures of evolutionary success and proves relationships among them. We provide a concise summary of our results in Sect. 6.

2 Two example models

To motivate and provide intuition for our approach, we first introduce two established evolutionary models encompassed by our formalism: the frequency-dependent Moran process (Nowak et al. 2004; Taylor et al. 2004) with mutation, and the birth-death process on graphs (Lieberman et al. 2005). We will revisit these examples throughout the text, to illustrate the applications of our definitions and results in the context of these models.

2.1 Frequency-dependent Moran process with mutation

The Moran process is a simple model of evolutionary competition between two types in a finite well-mixed population. It was originally formulated (Moran 1958) in the case of constant selection, but later extended by Nowak et al. (2004) and Taylor et al. (2004) to incorporate game-theoretic interactions and other forms of frequency dependence.

This model describes a well-mixed population of constant size \(N\). Within this population there are two types, which we label 0 and 1. An individual’s reproductive rate depends on the individual’s type as well as the current frequencies of the two types. We denote the reproductive rates of type 0 and type 1 individuals, respectively, by \(f_0(x_1)\) and \(f_1(x_1)\), where \(x_1\) is the current frequency of type 1. In general, \(f_0\) and \(f_1\) may be arbitrary nonnegative functions. Much attention, however, is focused on the case that reproductive rate is equal to the payoff obtained from interacting with the whole population according to some \(2\times 2\) matrix game

$$\begin{aligned} \begin{pmatrix} a_{00} & a_{01} \\ a_{10} & a_{11} \end{pmatrix}. \end{aligned}$$

Above, \(a_{XY}\) denotes the payoff to a type \(X\) individual interacting with type \(Y\), for \(X,Y \in \{0,1\}\). In this case, \(f_0\) and \(f_1\) are given by the linear functions

$$\begin{aligned} f_0(x) = a_{01} x + a_{00} (1-x), \quad f_1(x) = a_{11} x + a_{10} (1-x). \end{aligned}$$

(These expressions describe the case where self-interaction is included in the model; otherwise they become slightly more complicated.)

The Moran process proceeds as a Markov chain. At each time step, one individual is randomly selected to die, with uniform probability \(1/N\) per individual. Independently, one individual is selected, with probability proportional to reproductive rate, to produce an offspring. The new offspring inherits the type of the parent with probability \(1-u\), where \(u \in [0,1]\) is the mutation rate; otherwise the offspring’s type is determined at random with equal probability for the two types.
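To make this concrete, here is a minimal simulation sketch in Python (the function name moran_step and its encoding of states as 0/1 lists are our own illustrative conventions, not part of the model specification):

    import random

    def moran_step(s, f0, f1, u):
        # s: list of types (0 or 1), one entry per individual
        # f0, f1: reproductive-rate functions of the type-1 frequency
        # u: mutation rate
        N = len(s)
        x1 = sum(s) / N  # current frequency of type 1
        rates = [f1(x1) if si == 1 else f0(x1) for si in s]
        dead = random.randrange(N)  # death: uniform over individuals
        parent = random.choices(range(N), weights=rates)[0]  # birth: rate-proportional
        if random.random() < u:
            offspring_type = random.choice([0, 1])  # mutation: type chosen at random
        else:
            offspring_type = s[parent]  # no mutation: inherit parent's type
        s = list(s)
        s[dead] = offspring_type
        return s

Iterating this function traces a single trajectory of the process; in the game-theoretic case, f0 and f1 would be the linear payoff functions given above.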

2.2 Birth–death with constant selection on graphs

Evolutionary graph theory (Lieberman et al. 2005; Ohtsuki et al. 2006; Taylor et al. 2007a; Szabó and Fáth 2007) is a framework for studying spatial evolution. Spatial structure is represented by a graph with \(N\) vertices and edge weights \(e_{ij}\), for \(i,j \in \{1, \ldots , N\}\), satisfying \(e_{ij} \ge 0\) and \(\sum _j e_{ij} = 1\). Individuals occupy vertices of the graph, and replacement occurs along edges.

The basic model introduced by Lieberman et al. (2005) can be described as birth–death (BD; see Ohtsuki et al. 2006) with constant selection. In this model, each of two competing types is defined by a constant reproductive rate. We label the types 0 and 1, and assign them reproductive rates 1 and \(r>0\), respectively. Evolution again proceeds as a Markov chain. At each time step, one individual is chosen, with probability proportional to reproductive rate, to reproduce. The offspring of the parent at position \(i\) replaces the occupant of position \(j\) with probability \(e_{ij}\). Offspring inherit the type of the parent.
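A corresponding sketch for a single step of this process (again with illustrative names; the edge weights e are assumed to be given as a row-stochastic matrix):

    import random

    def bd_step(s, e, r):
        # s: list of types (0 or 1); e[i][j]: weight of edge from i to j
        # r: reproductive rate of type 1 (type 0 has rate 1)
        N = len(s)
        rates = [r if si == 1 else 1.0 for si in s]
        parent = random.choices(range(N), weights=rates)[0]  # birth: rate-proportional
        j = random.choices(range(N), weights=e[parent])[0]   # replacement along an edge
        s = list(s)
        s[j] = s[parent]  # offspring inherits the parent's type (no mutation here)
        return s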

We focus in particular on the example of star-shaped graphs (Fig. 1). Star graphs are noteworthy in that they amplify the force of selection relative to drift (Lieberman et al. 2005).

Fig. 1
figure 1

The star graph, shown for population size \(N=9\). The center is indexed \(i=1\), and the \(N-1\) leaves are indexed \(i=2, \ldots , N\). Edge weights are given by \(e_{1j}=1/(N-1)\) and \(e_{j1} = 1\) for \(j=2, \ldots , N\); all other \(e_{ij}\) are zero. In the birth–death (BD) process, at each time step, an individual is randomly chosen to reproduce, with probability proportional to reproductive rate (1 for type 0, \(r\) for type 1). If the center individual reproduces, the offspring replaces a random leaf individual, chosen with uniform probability. If a leaf individual reproduces, the offspring replaces the center individual
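The edge weights described in the caption can be constructed mechanically; a small sketch (using 0-based indexing with the center at index 0, in contrast to the 1-based indexing of the caption):

    def star_edge_weights(N):
        # center at index 0; leaves at indices 1, ..., N-1
        e = [[0.0] * N for _ in range(N)]
        for j in range(1, N):
            e[0][j] = 1.0 / (N - 1)  # center -> each leaf, uniformly
            e[j][0] = 1.0            # each leaf -> center
        return e

A matrix built this way can be passed directly to the bd_step sketch above.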

The simple model described above can be generalized to incorporate local frequency-dependent interactions (i.e. games; Ohtsuki et al. 2006), other schemes for determining births and deaths (update rules; Ohtsuki et al. 2006), and nonzero mutation rates (Allen et al. 2012). All these generalizations fall within the class of models presented here.

2.3 Remarks on examples

The two examples above illustrate important general features of the class of models presented here: in both, evolution proceeds as a Markov chain, with birth and death probabilities depending on the current state. However, they are also quite special in several ways. Both have the feature that exactly one birth and one death occur per time step. Also, in both examples, the population structure can naturally be represented as a graph (a complete graph, in the case of the Moran process). In contrast, our class of models allows any number of offspring to be produced per time step, and includes models for which there is no obvious graph representation of the population structure.

3 Mathematical framework

The aim of this work is to capture the general properties of these and other models, without specifying any particular representation of population structure or interactions. Here we present the class of models under consideration. We begin in Sect. 3.1 with a verbal description of our framework and notation. We then formally state our basic definitions in Sect. 3.2 and assumptions in Sect. 3.3. The evolutionary Markov chain is defined in Sect. 3.4.

3.1 Verbal description

Since the formal definition of our class of models requires some specialized notation, we begin with a verbal description, along with illustrations using the evolutionary models introduced above.

In the class of models we consider, population size and structure (spatial structure, social structure, etc.) are fixed. Each model in this class has a fixed number \(N \ge 2\) of positions. Each position is always occupied by a single individual. Individuals do not change positions—they remain in position until replaced by new offspring (as described below). The positions are indexed \(i = 1, \ldots , N\). We will sometimes write “individual \(i\)” as a shorthand for “the current occupant of position \(i\)”.

There are two competing types, labeled 0 and 1. (The case of more than two types will be considered in future work.) Individuals are distinguished only by their type and position. Consequently, the state of the evolutionary system is fully characterized by specifying which type occupies each position. We therefore represent the state of the system by the binary vector \(\mathbf{s}=(s_1, \ldots , s_N)\), where \(s_i\) denotes the type occupying position \(i\).

Evolution proceeds by replacement events. In each replacement event, the occupants of some positions are replaced by the offspring of others. We let \(R\) denote the set of positions that are replaced. For each replaced position \(j \in R\), the parent of the new occupant of \(j\) is denoted \(\alpha (j) \in \{1, \ldots , N\}\). In this notation, \(\alpha \) is a mapping from the set \(R\) to \(\{1, \ldots , N\}\). Together, the pair \((R,\alpha )\) contains all the information necessary to specify a replacement event. Figure 2 illustrates a replacement event, along with mutation (see below).

Fig. 2
figure 2

Illustration of an evolutionary transition. The replacement event is represented by the pair \((R, \alpha )\). The set \(R=\{2,4,5\}\) indicates the positions that are replaced, and the mapping \(\alpha \) indicates the parent of the new offspring filling each replaced position. Thus the new occupant of position 2 is the offspring of 1, \(\alpha (2)=1\), while the new occupants of 4 and 5 are the offspring of 2, \(\alpha (4)=\alpha (5)=2\). The offspring in positions 2 and 4 inherit their parents’ types, while the offspring in position 5 is a mutant

The probability of each replacement event \((R,\alpha )\) depends on the current state \(\mathbf{s}\). We denote this probability by \(p_\mathbf{s}(R, \alpha )\). Thus in each state \(\mathbf{s}\) there is a probability distribution \(\{ p_\mathbf{s}(R, \alpha ) \}_{(R, \alpha )}\) over the set of possible replacement events. We call the mapping from the state \(\mathbf{s}\) to the probability distribution \(\{ p_\mathbf{s}(R, \alpha ) \}_{(R, \alpha )}\) the “replacement rule”, which we represent with the symbol \(\mathfrak{R }\).

This abstract notion of a replacement rule allows our framework to encompass a wide variety of evolutionary models. The replacement rule implicitly represents many processes that would be explicitly represented in the context of particular models. For example, the replacement rule for the frequency-dependent Moran process reflects the reproductive rate functions \(f_1(x)\) and \(f_0(x)\), while the replacement rule for the BD process on graphs reflects the graph structure. In the general case, the replacement rule is simply the mapping

Type of the occupant of each position \(\longrightarrow \) Probabilities of births and deaths.

Any intermediate processes or structural features affecting this mapping are left unspecified. Our framework is therefore not tied to any particular way of representing population structure or interactions.

Example: Moran process The replacement rule \(\mathfrak{R }\) for the frequency-dependent Moran process has the property that \(p_\mathbf{s}(R,\alpha )=0\) unless \(R\) has exactly one element. (This expresses the fact that exactly one individual is replaced per time step in this process.) Using this fact, we can express the replacement rule as

$$\begin{aligned} p_\mathbf{s}\big ( \{i\}, \alpha \big ) = \frac{1}{N} \frac{f_{s_{\alpha (i)}} \big (x_1(\mathbf{s}) \big )}{\sum _{j=1}^N f_{s_j} \big (x_1(\mathbf{s}) \big )}. \end{aligned}$$

The first factor on the right-hand side, \(1/N\), represents the probability that \(i\) is replaced, while the second factor represents the probability that \(\alpha (i)\) is chosen to reproduce—that is, the reproductive rate of \(\alpha (i)\) divided by the total reproductive rate.
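In computational terms, this replacement rule maps a state to a probability distribution over events. A minimal sketch (our own encoding: since exactly one position is replaced, each event is stored as the pair \((\{i\}, \alpha )\) with \(\alpha (i)=a\)):

    def moran_replacement_rule(s, f0, f1):
        # returns {(R, alpha): probability}, with R = frozenset({i}) and
        # alpha encoded as the tuple ((i, parent),)
        N = len(s)
        x1 = sum(s) / N
        rates = [f1(x1) if si == 1 else f0(x1) for si in s]
        total = sum(rates)
        return {(frozenset({i}), ((i, a),)): (1.0 / N) * rates[a] / total
                for i in range(N) for a in range(N)}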

Example: BD on graphs The replacement rule \(\mathfrak{R }\) for the BD process on graphs also has the property that \(p_\mathbf{s}(R,\alpha )=0\) unless \(R\) has exactly one element. The replacement rule can be expressed as

$$\begin{aligned} p_\mathbf{s}\big ( \{i\}, \alpha \big ) = \frac{1 + (r-1)s_{\alpha (i)}}{\sum _{j=1}^N (1+ (r-1)s_j)} \; e_{\alpha (i) \, i}. \end{aligned}$$

Note that \(1+ (r-1)s_j\) is precisely the reproductive rate of individual \(j\). The first factor on the right-hand side is the probability that \(\alpha (i)\) is chosen to reproduce—the reproductive rate of \(\alpha (i)\) divided by the total reproductive rate. The second factor, \(e_{\alpha (i) \, i}\), is the probability that \(i\) is replaced given that \(\alpha (i)\) reproduces.
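The analogous sketch for BD on graphs differs only in the two factors (same event encoding as the Moran sketch above):

    def bd_replacement_rule(s, e, r):
        # returns {(R, alpha): probability} with R = frozenset({i})
        N = len(s)
        rates = [r if si == 1 else 1.0 for si in s]
        total = sum(rates)
        return {(frozenset({i}), ((i, a),)): (rates[a] / total) * e[a][i]
                for i in range(N) for a in range(N)}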

Each new offspring born during a replacement event has probability \(u\) of being born with a mutation. If there is no mutation, the offspring inherits the type of the parent. If a mutation occurs, we use the convention that the mutant offspring has a 50% chance of being of either type (0 or 1).

Overall, evolution is described by a stochastic process called the evolutionary Markov chain. In each step of this process, a replacement event \((R, \alpha )\) is chosen, with probability depending on the current state \(\mathbf{s}\) according to the replacement rule \(\mathfrak{R }\). Possible mutations are then resolved, resulting in a new state \(\mathbf{s}^{\prime }\). This process repeats indefinitely.

Our analysis will focus on the long-term behavior of the evolutionary Markov chain, with particular emphasis on the question of which of the two types, 0 or 1, is favored by evolution.

3.2 Definitions

Definition 1

The state of the evolutionary process is the binary vector \(\mathbf{s}=(s_1, \ldots , s_N) \in \{0,1\}^N\) indicating the type of the occupant of each position.

Definition 2

A replacement event is a pair \((R, \alpha )\), where

  • \(R \subset \{1, \ldots , N\}\) is the set of replaced positions, and

  • \(\alpha : R \rightarrow \{1, \ldots , N\}\) is a map indicating the parent of each new offspring.

Definition 3

For a given replacement event \((R, \alpha )\), the set of offspring of individual \(i\) is the preimage \(\alpha ^{-1}(i)\), i.e. the set of all \(j \in R\) with \(\alpha (j)=i\). The offspring number of \(i\) is the cardinality of this preimage, which we denote \(|\alpha ^{-1}(i)|\).

Definition 4

A replacement rule \(\mathfrak{R }\) assigns to each state \(\mathbf{s}\in \{0,1\}^N\) a probability distribution \(\{p_\mathbf{s}\, (R,\alpha )\}_{(R, \alpha )}\) on the set of possible replacement events.

Definition 5

An evolutionary process \(\mathcal{E }\) is defined by the following data:

  • the population size \(N \ge 2\),

  • the replacement rule \(\mathfrak{R }\),

  • the mutation rate \(u\), \(0 \le u \le 1\).

Definition 6

From a given replacement rule \(\mathfrak{R }\), we can define the following quantities as functions of the state \(\mathbf{s}\):

  • The expected offspring number of individual \(i\):

    $$\begin{aligned} b_i (\mathbf{s}) = E \big [ |\alpha ^{-1}(i)| \big ] = \sum _{(R, \alpha )} p_\mathbf{s}\, (R,\alpha ) \; |\alpha ^{-1}(i)|, \end{aligned}$$
  • The death probability of \(i\):

    $$\begin{aligned} d_i (\mathbf{s}) = \Pr [i \in R] = \sum _{\begin{matrix} (R, \alpha )\\ i \in R \end{matrix}} p_\mathbf{s}\, (R,\alpha ), \end{aligned}$$
  • The fitness of \(i\):

    $$\begin{aligned} w_i (\mathbf{s}) = 1 + b_i(\mathbf{s}) - d_i (\mathbf{s}). \end{aligned}$$
  • The frequencies of types 0 and 1, respectively:

    $$\begin{aligned} x_0(\mathbf{s}) = \frac{1}{N} \sum _{i=1}^N (1-s_i), \quad x_1(\mathbf{s}) = \frac{1}{N} \sum _{i=1}^N s_i. \end{aligned}$$
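All of these quantities can be computed mechanically from a replacement rule by summing over replacement events. The following is a minimal sketch, assuming the rule is supplied as a mapping from events to probabilities, with each event encoded (as in the sketches of Sect. 3.1) by a frozenset \(R\) and a tuple \(\alpha \) of (replaced position, parent) pairs:

    def demographic_quantities(rule, s):
        # rule: {(R, alpha): probability}; s: current state as a 0/1 list
        N = len(s)
        b = [0.0] * N  # expected offspring numbers
        d = [0.0] * N  # death probabilities
        for (R, alpha), p in rule.items():
            for j, parent in alpha:
                b[parent] += p  # each replaced position is one offspring of its parent
            for j in R:
                d[j] += p       # probability that j is among the replaced
        w = [1.0 + b[i] - d[i] for i in range(N)]  # fitness
        x1 = sum(s) / N                            # frequency of type 1
        return b, d, w, x1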

Example: Moran process For the frequency-dependent Moran process, we have

$$\begin{aligned} b_i(\mathbf{s}) = \frac{f_{s_i}\big (x_1(\mathbf{s}) \big )}{\sum _{j=1}^N f_{s_j} \big (x_1(\mathbf{s}) \big )}, \quad d_i(\mathbf{s}) = \frac{1}{N}, \end{aligned}$$
(3.1)

for each \(\mathbf{s}\in \{0,1\}^N\) and each \(i \in \{1, \ldots , N\}\).

Example: BD on graphs For the birth-death process on a graph with edge weights \(\{e_{ij}\}\), we have

$$\begin{aligned} b_i(\mathbf{s}) = \frac{1 + (r-1)s_i}{\sum _{j=1}^N (1+ (r-1)s_j)}, \quad d_i(\mathbf{s}) = \sum _{k=1}^N \frac{(1 + (r-1)s_k)}{\sum _{j=1}^N (1+ (r-1)s_j)} e_{ki}. \end{aligned}$$

3.3 Assumptions

We will assume the following restrictions on the replacement rule \(\mathfrak{R }\). First, as we will later see, it is mathematically convenient to assume a constant overall birth rate across states:

Assumption 1

The total expected offspring number \(\sum _{i=1}^N b_i(\mathbf{s})\) has a constant value \(B\) over all \(\mathbf{s}\in \{0,1\}^N\).

Second, for each position \(i\), it should be possible for \(i\)’s descendants to eventually dominate the population:

Assumption 2

For each individual \(i\), there is a positive integer \(n\) and a finite sequence \(\{ (R_k, \alpha _k) \}_{k=1}^n\) of replacement events such that

  • \(p_\mathbf{s}\, (R_k, \alpha _k)>0\) for all \(k\) and all \(\mathbf{s}\in \{0,1\}^N\), and

  • For all individuals \(j \in \{1, \ldots , N\}\),

    $$\begin{aligned} \alpha _{k_1} \circ \alpha _{k_2} \circ \cdots \circ \alpha _{k_m} (j) = i, \end{aligned}$$

    where \(1 \le k_1 < k_2 < \cdots < k_m \le n\) are the indices \(k\) for which \(j \in R_k\).

In other words, there is some sequence of possible replacement events leading to the result that the entire population is descended from a single individual in position \(i\).

The first assumption is for the sake of mathematical convenience. It could be relaxed to yield a more general framework, at the cost of increasing the complexity of a number of our results. The second assumption, however, is fundamental: it guarantees the unity of the population. Without this second assumption, the population could in theory consist of separate sub-populations with no gene flow between them, or with gene flow only in one direction.

It is straightforward to verify that our two examples, the frequency-dependent Moran process and the birth–death process on graphs, both satisfy these assumptions.

3.4 The evolutionary Markov chain

To every evolutionary process \(\mathcal{E }\), there is an associated evolutionary Markov chain \(\mathcal M _\mathcal{E }\). The state space of \(\mathcal M _\mathcal{E }\) is the set \(\{0,1\}^N\) of possible state vectors \(\mathbf{s}\). From a given state \(\mathbf{s}\), the next state \(\mathbf {s}^\prime \) is determined by a two-step random process (replacement then mutation). First, a replacement event \((R, \alpha )\) is sampled from the distribution \(\{p_\mathbf{s}(R,\alpha )\}_{(R, \alpha )}\). Then for each \(i \in \{1, \ldots , N\}\),

  • If \(i \notin R\) then \(s_i^{\prime }=s_i\). (If \(i\) was not replaced, its type stays the same.)

  • If \(i \in R\) then \(s_i^{\prime }\) is assigned by the rule:

    $$\begin{aligned} s_i^{\prime }= \left\{ \begin{array}{l@{\quad }l} s_{\alpha (i)}&\mathrm{with\,probability}\,1-u,\\ 0&\mathrm{with\,probability} \, u/2,\\ 1&\mathrm{with\,probability} \, u/2. \end{array}\right. \end{aligned}$$

In either of the latter two subcases of the \(i \in R\) case, we say that a mutation has occurred.
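A sketch of one such transition, combining the sampling of a replacement event with the resolution of mutations (events are encoded as in the earlier sketches):

    import random

    def markov_step(rule_fn, s, u):
        # rule_fn(s): {(R, alpha): probability}, alpha a tuple of (position, parent) pairs
        dist = rule_fn(s)
        events = list(dist.keys())
        probs = list(dist.values())
        R, alpha = random.choices(events, weights=probs)[0]
        s_new = list(s)
        for j, parent in alpha:
            if random.random() < u:
                s_new[j] = random.choice([0, 1])  # mutation: type drawn uniformly
            else:
                s_new[j] = s[parent]              # no mutation: inherit parent's type
        return s_new

For instance, markov_step(lambda s: moran_replacement_rule(s, f0, f1), s, u) advances the frequency-dependent Moran process by one step.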

We denote transition probabilities in \(\mathcal M _\mathcal{E }\) by \(p_{\mathbf{s}_1 \rightarrow \mathbf{s}_2}\). The probability of transition from \(\mathbf{s}_1\) to \(\mathbf{s}_2\) in exactly \(n\) steps is denoted \(p^{(n)}_{\mathbf{s}_1 \rightarrow \mathbf{s}_2}\).

Example: Moran process Since the \(N\) positions are interchangeable in the Moran process, the state space for the evolutionary Markov chain can be reduced from \(\{0,1\}^N\) to \(\{0, \ldots , N\}\), where the elements in the latter set correspond to the number \(m\) of type 1 individuals. The transition probabilities of the evolutionary Markov chain can then be written as

$$\begin{aligned} p_{m \rightarrow m+1}&= \frac{N-m}{N} \left( (1-u) \frac{m f_1\big (\tfrac{m}{N} \big )}{ m f_1\big (\tfrac{m}{N} \big ) + (N-m) f_0\big (\tfrac{m}{N} \big )} + \frac{u}{2} \right),\nonumber \\ p_{m \rightarrow m-1}&= \frac{m}{N} \left((1-u) \frac{ (N-m) f_0\big (\tfrac{m}{N} \big ) }{ m f_1\big (\tfrac{m}{N} \big ) + (N-m) f_0\big (\tfrac{m}{N} \big )} + \frac{u}{2} \right),\\ p_{m \rightarrow m}&= 1-p_{m \rightarrow m+1} - p_{m \rightarrow m-1}.\nonumber \end{aligned}$$
(3.2)
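For numerical work, these probabilities can be assembled into a transition matrix. A sketch (assuming \(f_0\) and \(f_1\) are strictly positive, so that no division by zero occurs):

    def moran_transition_matrix(N, f0, f1, u):
        # Eq. 3.2: chain on the reduced state space m = 0, ..., N
        P = [[0.0] * (N + 1) for _ in range(N + 1)]
        for m in range(N + 1):
            x = m / N
            tot = m * f1(x) + (N - m) * f0(x)
            up = ((N - m) / N) * ((1 - u) * m * f1(x) / tot + u / 2)
            down = (m / N) * ((1 - u) * (N - m) * f0(x) / tot + u / 2)
            if m < N:
                P[m][m + 1] = up
            if m > 0:
                P[m][m - 1] = down
            P[m][m] = 1.0 - (up if m < N else 0.0) - (down if m > 0 else 0.0)
        return P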

Example: BD on star graphs Since the leaves are interchangeable on the star graph, the state space for the evolutionary Markov chain can be reduced from \(\{0,1\}^N\) to \(\{0,1\} \times \{0, \ldots , N-1\}\). In this reduced state space, a state can be represented by the pair \((s_1, \ell )\), where \(s_1 \in \{0,1\}\) represents the type of the center, and \(\ell \) indicates the abundance of type 1 individuals among the leaves. Using this representation, the transition probabilities for the evolutionary Markov chain can be written

$$\begin{aligned} \begin{aligned} p_{(0,\ell ) \rightarrow (1, \ell )}&= \frac{\ell r}{N-\ell + \ell r},\\ p_{(0,\ell ) \rightarrow (0, \ell -1)}&= \frac{1}{N-\ell + \ell r} \frac{\ell }{N-1},\\ p_{(0,\ell ) \rightarrow (0, \ell )}&= 1 - p_{(0,\ell ) \rightarrow (1, \ell )} - p_{(0,\ell ) \rightarrow (0, \ell -1)},\\ p_{(1,\ell ) \rightarrow (0, \ell )}&= \frac{N - \ell - 1}{N-\ell -1 + (\ell +1) r},\\ p_{(1,\ell ) \rightarrow (1, \ell +1)}&= \frac{r}{N-\ell - 1 + (\ell +1) r} \frac{N-\ell -1}{N-1},\\ p_{(1,\ell ) \rightarrow (1, \ell )}&= 1- p_{(1,\ell ) \rightarrow (0, \ell )} - p_{(1,\ell ) \rightarrow (1, \ell +1)}. \end{aligned} \end{aligned}$$
(3.3)
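An analogous sketch assembles the transition matrix for the star graph, encoding state \((s_1, \ell )\) as index \(2\ell + s_1\) (numpy is used for convenience):

    import numpy as np

    def star_transition_matrix(N, r):
        # Eq. 3.3: states (s1, l) with l = 0, ..., N-1, encoded as 2*l + s1
        P = np.zeros((2 * N, 2 * N))
        for l in range(N):
            c0, c1 = 2 * l, 2 * l + 1
            tot0 = N - l + l * r              # total reproductive rate, center of type 0
            P[c0, c1] = l * r / tot0          # -> (1, l)
            if l > 0:
                P[c0, 2 * (l - 1)] = (1 / tot0) * (l / (N - 1))  # -> (0, l-1)
            P[c0, c0] = 1.0 - P[c0].sum()
            tot1 = N - l - 1 + (l + 1) * r    # total reproductive rate, center of type 1
            P[c1, c0] = (N - l - 1) / tot1    # -> (0, l)
            if l < N - 1:
                P[c1, 2 * (l + 1) + 1] = (r / tot1) * ((N - l - 1) / (N - 1))  # -> (1, l+1)
            P[c1, c1] = 1.0 - P[c1].sum()
        return P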

4 Asymptotic behavior of the evolutionary Markov chain

We now examine the asymptotic properties of the evolutionary Markov chain \(\mathcal M _\mathcal{E }\) as \(t \rightarrow \infty \). We separate the cases \(u>0\) (mutation is present; Sect. 4.1) and \(u=0\) (no mutation; Sect. 4.2). We then, in Sect. 4.3, explore how these cases are linked in the limit of low mutation rate \((u \rightarrow 0)\).

4.1 Processes with mutation: ergodicity and the mutation-selection stationary distribution

In the case \(u>0\), we show that \(\mathcal M _\mathcal{E }\) is ergodic. As a consequence, its long-term behavior can be described by a stationary probability distribution over states. We call this the mutation-selection stationary distribution. This distribution is a stochastic analogue of the mutation-selection equilibrium in deterministic models.

Theorem 1

For \(u>0\), \(\mathcal M _\mathcal{E }\) is ergodic (aperiodic and positive recurrent).

Proof

To prove ergodicity, it suffices to show that there is a positive integer \(m\) for which it is possible to transition between any two states in exactly \(m\) steps. We take \(m=N\) and show that \(p^{(N)}_{\mathbf{s}_1 \rightarrow \mathbf{s}_2}>0\) for any pair of states \(\mathbf{s}_1\) and \(\mathbf{s}_2\).

Assumption 2 trivially implies that, for any individual \(i\), there is at least one replacement event \((R, \alpha )\) such that \(i \in R\) and \(p_\mathbf{s}\, (R, \alpha )>0\) for all \(\mathbf{s}\in \{0,1\}^N\). So for each \(i\), fix a replacement event \((R_i, \alpha _i)\) with these properties. This gives a sequence \(\{ (R_i, \alpha _i) \}_{i=1}^N\) of replacement events such that (a) each replacement event has nonzero probability for all \(\mathbf{s}\in \{0,1\}^N\), and (b) each position is replaced at least once over the course of the sequence.

We now assign mutations according to the following rule: For each \((R_i, \alpha _i)\), and each replaced position \(j \in R_i\), a mutation occurs so that the type of \(j\) becomes equal to \((\mathbf{s}_2)_j\). Since \(u>0\) and \(p_\mathbf{s}\, (R_i, \alpha _i)>0\) for all \(\mathbf{s}\in \{0,1\}^N\), the resulting state transition has nonzero probability.

We have now constructed a sequence of \(N\) state transitions, each with nonzero probability, in which each position \(i\) is replaced at least once. Since each replacement of a position \(i\) results in \(s_i = (\mathbf{s}_2)_i\), the final state after all \(N\) transitions is \(\mathbf{s}_2\). This completes the proof that \(\mathcal M _\mathcal{E }\) is ergodic. \(\square \)

Ergodicity implies that each evolutionary Markov chain with \(u>0\) has a well-defined stationary distribution. We call this the mutation-selection stationary distribution, and denote the probability of state \(\mathbf{s}\) in this distribution by \(\pi _\mathbf{s}\). The mutation-selection stationary distribution can be computed from the recurrence relation

$$\begin{aligned} \pi _\mathbf{s}= \sum _{\mathbf{s}^{\prime } \in \{0,1\}^N} \pi _{\mathbf{s}^{\prime }} p_{\mathbf{s}^{\prime } \rightarrow \mathbf{s}}. \end{aligned}$$
(4.1)
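Numerically, Eq. 4.1 together with the normalization \(\sum _\mathbf{s}\pi _\mathbf{s}= 1\) can be solved as a linear system. A sketch (the function name is ours):

    import numpy as np

    def stationary_distribution(P):
        # solve pi = pi P with sum(pi) = 1, as the least-squares solution
        # of the stacked system [(P^T - I); 1^T] pi = [0; 1]
        n = P.shape[0]
        A = np.vstack([P.T - np.eye(n), np.ones(n)])
        rhs = np.zeros(n + 1)
        rhs[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        return pi

Applied to the Moran matrix from the sketch in Sect. 3.4 (converted to a numpy array), this reproduces distributions of the kind shown in Fig. 3 below.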

We use the notation \(\langle \; \rangle \) to represent the expectation of a quantity over the mutation-selection stationary distribution of \(\mathcal M _\mathcal{E }\). For example, the expected fitness of position \(i\) is denoted \(\langle w_i \rangle \) and is given by

$$\begin{aligned} \langle w_i \rangle = \sum _{\mathbf{s}\in \{0,1\}^N} \pi _{\mathbf{s}} \, w_i (\mathbf{s}). \end{aligned}$$

The utility of the mutation-selection stationary distribution is made clear by the following consequence of the Markov chain ergodic theorem (e.g., Woess 2009): Let \(g(\mathbf{s})\) be any function of the state \(\mathbf{s}\), and let \(\mathbf{S}(t)\) denote the random variable representing the state of \(\mathcal M _\mathcal{E }\) after \(t\) time-steps, given the initial state \(\mathbf{S}(0)\). Then

$$\begin{aligned} \lim _{T \rightarrow \infty } \frac{1}{T} \sum _{t=0}^{T-1} g(\mathbf{S}(t)) = \langle g \rangle , \end{aligned}$$

almost surely. In words, the long-run time average of \(g(\mathbf{S}(t))\) is almost surely equal to \(\langle g \rangle \) as the total time \(T\) goes to infinity. We also have, as a consequence of the Perron-Frobenius theorem, that

$$\begin{aligned} \lim _{t \rightarrow \infty } \mathrm E [g(\mathbf{S}(t))] = \langle g \rangle . \end{aligned}$$

It is therefore natural to characterize the long-term behavior of the evolutionary Markov chain using expectations over the mutation-selection stationary distribution. Mutation-selection stationary distributions have been used recently to analyze a number of particular models (Rousset and Ronce 2004; Taylor et al. 2007b; Antal et al. 2009a, b; Tarnita et al. 2009a) as well as classes of models (Tarnita et al. 2009b, 2011; Nathanson et al. 2009).

Example: Moran process Mutation-selection stationary distributions for a Prisoner’s Dilemma game under the Moran process are illustrated in Fig. 3.

Fig. 3
figure 3

Mutation-selection stationary distributions for the Moran process, with frequency dependence described by the Prisoner’s Dilemma game with payoffs \(a_{00}=6\), \(a_{01} = 4\), \(a_{10} = 7\), \(a_{11} = 5\). Type 0 plays the role of the cooperator. The population size is \(N=10\). States are grouped together according to the number of type 1 individuals. The two panels correspond to mutation rates \(u=0.1\) and \(u=0.2\), respectively. Calculations are based on Eqs. 3.2 and 4.1. Lower mutation rates lead to increased probability concentration on the states \(\mathbf 0 \) and \(\mathbf 1 \) in which only one type is present, while higher mutation increases the probabilities of intermediate frequencies

4.2 Processes without mutation: absorbing behavior

In the case \(u=0\), the evolutionary Markov chain is not ergodic, but rather converges to one of two absorbing states, corresponding to fixation of the two competing types. We state this formally in the following theorem:

Theorem 2

For \(u=0\), \(\mathcal M _\mathcal{E }\) has exactly two absorbing states, \(\mathbf 0 \) and \(\mathbf 1 \). For any states \(\mathbf{s}, \mathbf{s}^{\prime }\) with \(\mathbf{s}^{\prime } \notin \left\{ \mathbf{0 }, \mathbf{1 }\right\} \), \(\lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf{s}^{\prime }} = 0\).

Informally, the second claim of the theorem asserts that, from any initial state, the process will almost certainly become absorbed in \(\mathbf 0 \) or \(\mathbf 1 \) as the number of steps goes to infinity.

Proof

It is clear that \(\mathbf 0 \) and \(\mathbf 1 \) are indeed absorbing states. To prove the rest of the claim, consider an initial state \(\mathbf{s}_0\). It suffices to show that there is a nonzero probability of transitioning from \(\mathbf{s}_0\) to one of \(\mathbf 0 \) or \(\mathbf 1 \) in a finite number of steps. For then all states other than \(\mathbf 0 \) and \(\mathbf 1 \) are transient, and the probability of occupying a transient state at time \(t\) approaches 0 as \(t \rightarrow \infty \) (e.g., Koralov and Sinai 2007; Woess 2009).

Choose a position \(i\). By Assumption 2 there is a sequence \(\{ (R_k, \alpha _k) \}_{k=1}^n\) of replacement events such that

  • \(p_{\mathbf{s}} \, (R_k, \alpha _k)>0\) for all \(k\) and all \(\mathbf{s}\in \{0,1\}^N\), and

  • For all individuals \(j \in \{1, \ldots , N\}\),

    $$\begin{aligned} \alpha _{k_1} \circ \alpha _{k_2} \circ \cdots \circ \alpha _{k_m} (j) = i, \end{aligned}$$

    where \(1 \le k_1 < k_2 < \cdots < k_m \le n\) are the indices \(k\) for which \(j \in R_k\).

Since mutation does not occur for \(u=0\), the sequence \(\{ (R_k, \alpha _k) \}_{k=1}^n\) unambiguously determines a sequence of state transitions, all of which have nonzero probability. After this sequence of state transitions, the type of each individual is \((\mathbf{s}_0)_i\) (again, since mutations are impossible). Thus the resulting state is either \(\mathbf 0 \) or \(\mathbf 1 \). This completes the proof. \(\square \)

4.3 The low-mutation limit and the rare-mutation dimorphic distribution

Having described the asymptotic behavior of the evolutionary Markov chain with and without mutation, we now connect these cases by investigating the limit \(u \rightarrow 0\). This limit describes the situation in which mutation is rare, and sporadic episodes of competition separate long periods in which the population is fixed for one type or the other. In particular, we introduce the rare-mutation dimorphic distribution, which characterizes the likelihood of states arising during these competition episodes.

For some fixed population size \(N\) and replacement rule \(\mathfrak{R }\), we let \(\mathcal{E }_u\) denote the evolutionary process with mutation rate \(u \ge 0\). In this section we study the family of Markov chains \(\left\{ \mathcal{M }_{\mathcal{E }_u}\right\} _{u \ge 0}\) as a perturbation (Gyllenberg and Silvestrov 2008) of the mutation-free Markov chain \(\mathcal M _{\mathcal{E }_0}\). We denote by \(p_{\mathbf{s}\rightarrow \mathbf{s}^{\prime }}(u)\) the transition probability from \(\mathbf{s}\) to \(\mathbf{s}^{\prime }\) in \(\mathcal M _{\mathcal{E }_u}\). In the case \(u>0\) we write \(\pi _\mathbf{s}(u)\) for the stationary probability of \(\mathbf{s}\) in \(\mathcal M _{\mathcal{E }_u}\).

The following lemma will be of great use in understanding the behavior of \(\mathcal M _{\mathcal{E }_u}\) as \(u \rightarrow 0\).

Lemma 1

For each \(\mathbf{s}\in \{0,1\}^N\), \(\pi _\mathbf{s}(u)\) is a rational function of \(u\).

Proof

For any finite ergodic Markov chain, the probabilities associated to each state in the stationary distribution are rational functions of the transition probabilities (e.g. Mihoc 1935; Iosifescu 1980). Since the transition probabilities \(p_{\mathbf{s}\rightarrow \mathbf{s}^{\prime }}(u)\) of \(\mathcal M _{\mathcal{E }_u}\) are polynomial functions of \(u\), \(\pi _\mathbf{s}(u)\) is a rational function of \(u\) for each \(\mathbf{s}\). \(\square \)

This lemma allows us to consider the limit of the mutation-selection stationary distribution as \(u \rightarrow 0\). For each \(\mathbf{s}\in \{0,1\}^N\), we define \(\pi _\mathbf{s}(0) = \lim _{u \rightarrow 0} \pi _\mathbf{s}(u)\); this limit exists since \(\pi _\mathbf{s}(u)\) is a bounded rational function of \(u \in (0,1]\). By taking the limit \(u \rightarrow 0\) in the recurrence relation Eq. 4.1, we obtain that \(\{\pi _\mathbf{s}(0)\}_{\mathbf{s}\in \{0,1\}^N}\) solves Eq. 4.1 as well. Thus \(\{\pi _\mathbf{s}(0)\}_\mathbf{s}\) is a stationary distribution of \(\mathcal M _{\mathcal{E }_0}\). Since \(\mathcal M _{\mathcal{E }_0}\) is absorbing (Theorem 2), it follows that \(\{\pi _\mathbf{s}(0)\}_\mathbf{s}\) is concentrated entirely on the absorbing states \(\mathbf 0 \) and \(\mathbf 1 \); that is, \(\pi _\mathbf{s}(0)=0\) for \(\mathbf{s}\notin \left\{ \mathbf{0 },\mathbf{1 }\right\} \). This formalizes the above remark that, in the low-mutation limit, the evolutionary process is almost always fixed for one type or the other. The frequencies with which types 0 and 1 are fixed are given by \(\pi _\mathbf 0 (0)\) and \(\pi _\mathbf 1 (0)\), respectively.

We note that, in contrast to the \(u>0\) case, \(\{\pi _\mathbf{s}(0)\}_\mathbf{s}\) is not unique as a stationary distribution of the evolutionary Markov chain \(\mathcal M _{\mathcal{E }_0}\). Indeed, any distribution concentrated on the absorbing states \(\mathbf 0 \) and \(\mathbf 1 \) is a stationary distribution of this Markov chain.

4.3.1 The rare-mutation dimorphic distribution

Though, under rare mutation, the population is most often fixed for one type or the other, the states that are most interesting from an evolutionary perspective are those that arise during episodes of competition between the two types. Here we introduce the rare-mutation dimorphic distribution, which characterizes the relative likelihoods of these states in the low-mutation limit.

To define this distribution, we first consider the conditional mutation-selection stationary distribution of \(\mathcal M _{\mathcal{E }_u}\) for \(u>0\), given that both types are present in the population. The probability \(\pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u)\) of a state \(\mathbf{s}\in \{0,1\}^N \setminus \left\{ \mathbf{0 },\mathbf{1 }\right\} \) in this conditional distribution is given by

$$\begin{aligned} \pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u) = \frac{\pi _\mathbf{s}(u)}{1-\pi _\mathbf{0 }(u)-\pi _\mathbf{1 }(u)}. \end{aligned}$$
(4.2)

The rare-mutation dimorphic distribution \(\{ \pi _\mathbf{s}^* \}_{\mathbf{s}\in \{0,1\}^N \setminus \left\{ \mathbf{0 },\mathbf{1 }\right\} }\) is then obtained as the limit of this conditional distribution as \(u \rightarrow 0\):

$$\begin{aligned} \pi _\mathbf{s}^* = \lim _{u \rightarrow 0} \pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u) = \lim _{u \rightarrow 0} \frac{\pi _\mathbf{s}(u)}{1-\pi _\mathbf{0 }(u)-\pi _\mathbf{1 }(u)}. \end{aligned}$$
(4.3)

In short, the rare-mutation dimorphic distribution is the limit as \(u \rightarrow 0\) of the mutation-selection stationary distribution of \(\mathcal M _{\mathcal{E }_u}\), conditioned on both types being present. The following lemma ensures that the rare-mutation dimorphic distribution is well-defined.

Lemma 2

For all states \(\mathbf{s}\notin \left\{ \mathbf{0 }, \mathbf{1 }\right\} \), the limit

$$\begin{aligned} \lim _{u \rightarrow 0} \frac{\pi _\mathbf{s}(u)}{1-\pi _\mathbf{0 }(u)-\pi _\mathbf{1 }(u)} \end{aligned}$$
(4.4)

exists.

Proof

Since \(\pi _\mathbf{s}(u)\), \(\pi _\mathbf{0 }(u)\), and \(\pi _\mathbf{1 }(u)\) are all rational functions of \(u\), the conditional probability

$$\begin{aligned} \pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u) = \frac{\pi _\mathbf{s}(u)}{1-\pi _\mathbf{0 }(u)-\pi _\mathbf{1 }(u)} \end{aligned}$$

is a rational function of \(u\) as well. Since \(0 \le \pi _\mathbf{s}(u) \le 1-\pi _\mathbf{0 }(u)-\pi _\mathbf{1 }(u)\), we have \(0 \le \pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u) \le 1\) for all \(u>0\). A rational function which is bounded on an open subset \(U \subset \mathbb{R }\) must extend continuously to the closure \(\bar{U}\) of \(U\). Therefore the limit 4.4 exists. \(\square \)

Since it pertains to evolutionary competition under rare mutation, the rare-mutation dimorphic distribution arises naturally from the basic considerations of evolutionary theory. Indeed, Corollaries 1 and 2 below show that the rare-mutation dimorphic distribution is the correct distribution to average over when comparing quantities that pertain to specific states (e.g. frequency, fitness) to those that characterize the entire evolutionary process (e.g. fixation probability), under rare mutation. Despite this usefulness, we have found only a handful of examples in which such a distribution is considered (e.g. Zhou et al. 2010, and Zhou and Qian 2011, who considered a conditional stationary distribution equivalent to, but defined differently from, the rare-mutation dimorphic distribution for the Moran process).

The rare-mutation dimorphic distribution is similar in spirit to quasi-stationary distributions (Darroch and Seneta 1965; Barbour 1976; Gyllenberg and Silvestrov 2008; Cattiaux et al. 2009; Collet et al. 2011), in that both distributions describe asymptotic behavior in absorbing Markov chains, conditioned (in some sense) on non-absorption. However, there is a substantive difference in the way this conditioning is implemented: In quasi-stationary distributions, one conditions on the event that absorption has not yet occurred. In contrast, the rare-mutation dimorphic distribution begins with an ergodic Markov chain (\(\mathcal M _{\mathcal{E }_u}\) for \(u>0\)), conditions on not occupying a subset of states (the fixation states \(\left\{ \mathbf{0 }, \mathbf{1 }\right\} \)), and then takes a limit (\(u \rightarrow 0\)) under which this subset becomes absorbing. These two ways of conditioning on non-absorption are mathematically different; thus the rare-mutation dimorphic distribution is not quasi-stationary according to standard mathematical definitions.

We denote expectations in the rare-mutation dimorphic distribution by \(\langle \; \rangle ^*\). In Sect. 4.3.3 we derive a recurrence formula that can be used to compute this distribution for any model satisfying our assumptions.

Example: Moran process The rare-mutation dimorphic distribution for the frequency-dependent Moran process is given by

$$\begin{aligned} \pi _m \propto \frac{ m f_1\big (\tfrac{m}{N} \big ) + (N-m) f_0\big (\tfrac{m}{N} \big )}{m(N-m) f_0\big (\tfrac{m}{N} \big ) } \prod _{i=1}^{m-1} \frac{f_1\left(\tfrac{i}{N} \right)}{f_0\left(\tfrac{i}{N} \right)}. \end{aligned}$$
(4.5)

Above, \(m\) represents the number of type 1 individuals, and the symbol \(\propto \) means “proportional to”. This expression can be verified using the recurrence formula derived in Theorem 3 below, together with the transition probabilities in Eq. 3.2. Equivalent formulas were obtained by Zhou et al. (2010) and Zhou and Qian (2011). Figure 4 illustrates the rare-mutation dimorphic distributions for the Moran process in the cases of a Prisoner’s Dilemma and a Snowdrift game.
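Equation 4.5 is straightforward to evaluate numerically; a sketch (normalizing the weights at the end):

    def moran_rmd(N, f0, f1):
        # Eq. 4.5: weights over m = 1, ..., N-1 type-1 individuals
        pi = {}
        for m in range(1, N):
            x = m / N
            prod = 1.0
            for i in range(1, m):
                prod *= f1(i / N) / f0(i / N)
            num = m * f1(x) + (N - m) * f0(x)
            pi[m] = num / (m * (N - m) * f0(x)) * prod
        Z = sum(pi.values())
        return {m: p / Z for m, p in pi.items()}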

Fig. 4
figure 4

Rare-mutation dimorphic distributions for the Moran process, with frequency dependence described by a Prisoner’s Dilemma game (left) and a Snowdrift game (right). Type 0 plays the role of the cooperator in each case. Note that, since the Snowdrift game promotes coexistence between cooperators and defectors, the rare-mutation dimorphic distribution for the Snowdrift game places greater probability on intermediate states. Calculations are based on Eq. 4.5, together with the transition probabilities in Eq. 3.2

Example: BD on star graphs Fig. 5 illustrates the rare-mutation dimorphic distribution for the birth-death process on a star graph for two values of \(r\).

Fig. 5
figure 5

Rare-mutation dimorphic (RMD) distributions for the BD process on a star graph with 8 leaves (see Sect. 2.2 and Fig. 1). The bars show the probabilities of the various states \((s_1, \ell )\). The horizontal axes correspond to the abundance \(\ell \) of type 1 among the leaves, while the two colors of bars correspond to the type \(s_1 \in \{0,1\}\) of the center. In a, both types have equal reproductive rate (from which the symmetry in the figure follows), while in b, types 0 and 1 have reproductive rates 1 and 1.1, respectively. Calculations are based on the recurrence formula Eq. 4.8, together with the transition probabilities in Eq. 3.3

4.3.2 The mutant appearance distribution

It is also important to characterize the states that are likely to arise when a new mutation appears. This is because mutant offspring may be more likely to appear in some positions than in others, and the ultimate success of a mutation may depend on the position in which it arises.

To this end, we here define the mutant appearance distributions \(\{ \mu ^1_\mathbf{s}\}_{\mathbf{s}\in \{0,1\}^N}\) and \(\{ \mu ^0_\mathbf{s}\}_{\mathbf{s}\in \{0,1\}^N}\). For a state \(\mathbf{s}\), \(\mu ^1_\mathbf{s}\) gives the probability of being in state \(\mathbf{s}\) immediately after a type 1 mutant arises in a population otherwise of type 0, and vice versa for \(\mu ^0_\mathbf{s}\). These distributions are essential to our definition of fixation probability, and also play a role in the recurrence formula for the rare-mutation dimorphic distribution that we derive in Sect. 4.3.3.

Definition 7

The mutant appearance distribution of type 1, \(\{ \mu ^1_\mathbf{s}\}_\mathbf{s}\) is a probability distribution on \(\{0,1\}^N\) defined by

$$\begin{aligned} \mu ^1_\mathbf{s}= \left\{ \begin{array}{ll} \lim _{u \rightarrow 0} \frac{p_\mathbf{0 \rightarrow \mathbf s }(u)}{1-p_\mathbf{0 \rightarrow \mathbf 0 }(u)}&\mathbf{s}\ne \mathbf 0 ,\\ 0&\mathbf{s}= \mathbf 0 . \end{array}\right. \end{aligned}$$

In words, \(\mu ^1_\mathbf{s}\) is the low-mutation limit of the probability that state \(\mathbf{s}\) is reached in a transition that leaves state \(\mathbf 0 \). The mutant appearance distribution of type 0, \(\{ \mu ^0_\mathbf{s}\}_\mathbf{s}\), is defined similarly.

It is intuitively clear that \(\mu ^1_\mathbf{s}\) and \(\mu ^0_\mathbf{s}\) should be nonzero only for states \(\mathbf{s}\) that have exactly one individual—the mutant—whose type is different from the others. Further reflection suggests that mutants should appear in position \(i\) with probability proportional to the rate at which position \(i\) is replaced, which is \(d_i (\mathbf 0 )\) or \(d_i (\mathbf 1 )\) for the two respective distributions. We formalize these observations in the following lemma:

Lemma 3

The mutant appearance distribution \(\{ \mu ^1_\mathbf{s}\}_\mathbf{s}\) satisfies

$$\begin{aligned} \mu ^1_\mathbf{s}= \left\{ \begin{array}{ll} \frac{d_i (\mathbf 0 )}{B}&\quad \text{ if} \quad s_i=1 \quad \text{ and} \quad s_j=0 \quad \text{ for}\,j \ne i,\\ { 0}&\quad \text{ otherwise,} \end{array} \right. \end{aligned}$$

and the analogous result holds for \(\{ \mu ^0_\mathbf{s}\}_\mathbf{s}\).

We recall that \(B\) is the total expected offspring number for the whole population, which is constant over states by Assumption 1.

Proof

We give the proof for \(\{ \mu ^1_\mathbf{s}\}_\mathbf{s}\); the proof for \(\{ \mu ^0_\mathbf{s}\}_\mathbf{s}\) is similar. If \(\mathbf{s}\) has \(s_i=1\) and \(s_j=0\) for \(j \ne i\), then

$$\begin{aligned} p_\mathbf{0 \rightarrow \mathbf{s}} (u) = \sum _{\begin{matrix} (R, \alpha ) \\ i \in R \end{matrix}} p_\mathbf{0 } (R, \alpha ) \frac{u}{2} \left( 1 - \frac{u}{2} \right)^{|R|-1} = \frac{u}{2} d_i( \mathbf 0 ) + \mathcal O (u^2). \end{aligned}$$
(4.6)

For all other states \(\mathbf{s}\ne \mathbf 0 \), \(p_\mathbf{0 \rightarrow \mathbf{s}}(u)\) is of order \(u^2\) as \(u \rightarrow 0\). Summing Eq. 4.6 over states \(\mathbf{s}\ne \mathbf 0 \), we obtain

$$\begin{aligned} 1-p_\mathbf{0 \rightarrow \mathbf 0 }(u) = \frac{u}{2} \sum _{i=1}^N d_i (\mathbf 0 )+ \mathcal O (u^2) = \frac{uB}{2} + \mathcal O (u^2) \quad (u \rightarrow 0). \end{aligned}$$
(4.7)

Dividing Eq. 4.6 by Eq. 4.7 and taking the limit as \(u \rightarrow 0\) yields the desired result. \(\square \)

Example: Moran process In the Moran process, \(d_i(\mathbf{s})=1/N\) for each position \(i\) and state \(\mathbf{s}\); thus the mutant appearance distributions place equal probability on mutants arising in each position.

Example: BD on graphs For the birth–death process on graphs, we have \(d_i(\mathbf 0 ) = d_i (\mathbf 1 ) = 1/N \sum _{j=1}^N e_{ji}\). This quantity—called the temperature of vertex \(i\) by Lieberman et al. (2005)—gives the probability of mutants appearing at vertex \(i\) under the two mutant appearance distributions. In particular, for the star graph, a new mutant has probability \((N-1)/N\) of arising at the center, versus \(1/\big (N(N-1)\big )\) for each leaf. Our convention for the appearance of mutants departs from that of Lieberman et al. (2005) and other works in evolutionary graph theory, which assume that mutants are equally likely to arise at each position.
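Lemma 3 also gives a direct way to compute the mutant appearance distribution from a replacement rule; a sketch (using the event encoding of the earlier sketches, and computing \(B\) within the all-0 state):

    def mutant_appearance_1(rule_fn, N):
        # probability that the lone type-1 mutant occupies position i: d_i(0) / B
        s0 = [0] * N
        d = [0.0] * N
        B = 0.0
        for (R, alpha), p in rule_fn(s0).items():
            B += p * len(alpha)  # one offspring per replaced position
            for j in R:
                d[j] += p        # probability that position j is replaced
        return [d[i] / B for i in range(N)]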

4.3.3 A recurrence formula for the rare-mutation dimorphic distribution

As we will later show, the rare-mutation dimorphic distribution is very useful for linking different measures of evolutionary success. It is therefore desirable to have a recurrence formula for this distribution, so that it can be obtained numerically without the computation of limits. The following theorem provides such a formula.

Theorem 3

The rare-mutation dimorphic distribution \(\{ \pi _{\mathbf{s}}^* \}_\mathbf{s}\) satisfies

$$\begin{aligned} \pi _{\mathbf{s}}^* = \sum _{\mathbf{s}^{\prime } \notin \left\{ \mathbf{0 },\mathbf{1 }\right\} } \pi _{\mathbf{s}^{\prime }}^* \left( p_{\mathbf{s}^{\prime } \rightarrow \mathbf{s}} + p_{\mathbf{s}^{\prime } \rightarrow \mathbf 0 } \, \mu _\mathbf{s}^1 + p_{\mathbf{s}^{\prime } \rightarrow \mathbf 1 } \, \mu _\mathbf{s}^0 \right), \end{aligned}$$
(4.8)

where \(\{ p_{\mathbf{s}_1 \rightarrow \mathbf{s}_2} \}_{\mathbf{s}_1,\mathbf{s}_2}\) are the transition probabilities in \(\mathcal M _{\mathcal{E }_0}\). Together with the constraint \(\sum _{\mathbf{s}\notin \left\{ \mathbf{0 },\mathbf{1 }\right\} } \pi _{\mathbf{s}}^* = 1\), this system of equations uniquely determines \(\{ \pi _{\mathbf{s}}^* \}_\mathbf{s}\).

This recurrence formula has an intuitive interpretation. It implies that \(\{ \pi _{\mathbf{s}}^* \}_\mathbf{s}\) is the stationary distribution of a Markov chain \(\mathcal M ^*\) on \(\{0,1\}^N \!\setminus \! \{ \mathbf 0 , \mathbf 1 \}\) with transition probabilities

$$\begin{aligned} p^*_{\mathbf{s}\rightarrow \mathbf{s}^{\prime }} = p_{\mathbf{s}\rightarrow \mathbf{s}^{\prime }} + p_{\mathbf{s}\rightarrow \mathbf 0 } \, \mu _{\mathbf{s}^{\prime }}^1 + p_{\mathbf{s}\rightarrow \mathbf 1 } \, \mu _{\mathbf{s}^{\prime }}^0. \end{aligned}$$
(4.9)

The Markov chain \(\mathcal M ^*\) is the same as \(\mathcal M _{\mathcal{E }_0}\) except that, if one type would fix, instead a new state is sampled from the appropriate mutant appearance distribution. (In other words, the time between fixation and new mutant appearance is collapsed.) The proof below formalizes this intuition.

Proof

This proof is based on the principle of state reduction (Sonin 1999). From the Markov chain \(\mathcal M _{\mathcal{E }_u}\) with \(u>0\), we eliminate the states \(\mathbf 0 \) and \(\mathbf 1 \), so that transitions to either of these states instead go to the next state other than \(\mathbf 0 \) or \(\mathbf 1 \) that would be reached. This results (Sonin 1999, Proposition A) in a Markov chain, which we denote \(\mathcal M _{\mathcal{E }_u | \notin \{ \mathbf 0 , \mathbf 1 \}}\), whose set of states is \(\{0,1\}^N {\setminus } \{ \mathbf 0 , \mathbf 1 \}\) and whose transition probabilities are

$$\begin{aligned} p^*_{\mathbf{s}\rightarrow \mathbf{s}^{\prime }}(u) = p_{\mathbf{s}\rightarrow \mathbf{s}^{\prime }}(u) + p_{\mathbf{s}\rightarrow \mathbf 0 }(u) r_\mathbf{0 \rightarrow \mathbf{s}^{\prime }}(u) + p_{\mathbf{s}\rightarrow \mathbf 1 }(u) r_\mathbf{1 \rightarrow \mathbf{s}^{\prime }}(u). \end{aligned}$$
(4.10)

Above, \(r_\mathbf{0 \rightarrow \mathbf{s}}(u)\) denotes the probability that \(\mathbf{s}\) is the first state in \(\{0,1\}^N {\setminus } \{ \mathbf 0 , \mathbf 1 \}\) visited by the Markov chain \(\mathcal M _{\mathcal{E }_u}\) with initial state \(\mathbf 0 \). An analogous definition holds for \(r_\mathbf{1 \rightarrow \mathbf{s}^{\prime }}(u)\). These probabilities satisfy the recurrence relations

$$\begin{aligned} r_\mathbf{0 \rightarrow \mathbf{s}}(u)&= p_\mathbf{0 \rightarrow \mathbf{s}}(u) + p_\mathbf{0 \rightarrow \mathbf 0 }(u) r_\mathbf{0 \rightarrow \mathbf{s}}(u) + p_\mathbf{0 \rightarrow \mathbf 1 }(u)r_\mathbf{1 \rightarrow \mathbf{s}}(u)\\ r_\mathbf{1 \rightarrow \mathbf{s}}(u)&= p_\mathbf{1 \rightarrow \mathbf{s}}(u) + p_\mathbf{1 \rightarrow \mathbf 1 }(u) r_\mathbf{1 \rightarrow \mathbf{s}}(u) + p_\mathbf{1 \rightarrow \mathbf 0 }(u)r_\mathbf{0 \rightarrow \mathbf{s}}(u). \end{aligned}$$

or equivalently,

$$\begin{aligned} r_\mathbf{0 \rightarrow \mathbf{s}}(u)&= \frac{p_\mathbf{0 \rightarrow \mathbf{s}}(u) + p_\mathbf{0 \rightarrow \mathbf 1 }(u) r_\mathbf{1 \rightarrow \mathbf{s}}(u)}{1-p_\mathbf{0 \rightarrow \mathbf 0 }(u)}\\ r_\mathbf{1 \rightarrow \mathbf{s}}(u)&= \frac{p_\mathbf{1 \rightarrow \mathbf{s}}(u) + p_\mathbf{1 \rightarrow \mathbf 0 }(u) r_\mathbf{0 \rightarrow \mathbf{s}}(u)}{1-p_\mathbf{1 \rightarrow \mathbf 1 }(u)}. \end{aligned}$$
(4.11)

Lemma 1(b) and Proposition C of Sonin (1999) imply that, for all \(u>0\), the stationary distribution of \(\mathcal M _{\mathcal{E }_u | \notin \{ \mathbf 0 , \mathbf 1 \}}\) coincides with the conditional stationary distribution \(\{ \pi _{\mathbf{s}| \notin \left\{ \mathbf{0 }, \mathbf{1 }\right\} }(u) \}_\mathbf{s}\) defined in Eq. 4.2. Furthermore, using Eqs. 4.10 and 4.11, we obtain the following recurrence relation for all \(\mathbf{s}\in \{0,1\}^N \!\setminus \! \{ \mathbf 0 , \mathbf 1 \}\), \(u>0\):

$$\begin{aligned} \pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u)&= \sum _{\mathbf{s}^{\prime } \notin \left\{ \mathbf{0 },\mathbf{1 }\right\} } \pi _{\mathbf{s}^{\prime } {|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u) \bigg ( p_{\mathbf{s}^{\prime } \rightarrow \mathbf{s}}(u) \\&\quad + p_{\mathbf{s}^{\prime } \rightarrow \mathbf 0 }(u) \frac{p_\mathbf{0 \rightarrow \mathbf{s}}(u) + p_\mathbf{0 \rightarrow \mathbf 1 }(u) r_\mathbf{1 \rightarrow \mathbf{s}}(u)}{1-p_\mathbf{0 \rightarrow \mathbf 0 }(u)}\\&\quad + p_{\mathbf{s}^{\prime } \rightarrow \mathbf 1 }(u) \frac{p_\mathbf{1 \rightarrow \mathbf{s}}(u) + p_\mathbf{1 \rightarrow \mathbf 0 }(u) r_\mathbf{0 \rightarrow \mathbf{s}}(u)}{1-p_\mathbf{1 \rightarrow \mathbf 1 }(u)} \bigg ). \end{aligned}$$

Now taking the limit \(u \rightarrow 0\) and invoking the definitions of \(\pi _\mathbf{s}^*\), \(\mu _\mathbf{s}^0\), and \(\mu _\mathbf{s}^1\) yields

$$\begin{aligned} \pi _\mathbf{s}^*&= \sum _{\mathbf{s}^{\prime } \notin \left\{ \mathbf{0 },\mathbf{1 }\right\} } \pi _{\mathbf{s}^{\prime }}^* \bigg ( p_{\mathbf{s}^{\prime } \rightarrow \mathbf{s}} + p_{\mathbf{s}^{\prime } \rightarrow \mathbf 0 } \, \mu _\mathbf{s}^1 + p_{\mathbf{s}^{\prime } \rightarrow \mathbf 1 } \, \mu _\mathbf{s}^0 \nonumber \\&\quad + \lim _{u \rightarrow 0} \frac{p_\mathbf{0 \rightarrow \mathbf 1 }(u) r_\mathbf{1 \rightarrow \mathbf{s}}(u)}{1-p_\mathbf{0 \rightarrow \mathbf 0 }(u)} + \lim _{u \rightarrow 0} \frac{p_\mathbf{1 \rightarrow \mathbf 0 }(u) r_\mathbf{0 \rightarrow \mathbf{s}}(u)}{1-p_\mathbf{1 \rightarrow \mathbf 1 }(u)} \bigg ). \end{aligned}$$
(4.12)

The two limits on the right-hand side above are zero because, as \(u \rightarrow 0\), \(p_\mathbf{0 \rightarrow \mathbf 1 }(u)\) and \(p_\mathbf{1 \rightarrow \mathbf 0 }(u)\) are at most of order \(u^N\) (these transitions require \(N\) simultaneous mutations; recall \(N \ge 2\)), while \(1-p_\mathbf{0 \rightarrow \mathbf 0 }(u)\) and \(1-p_\mathbf{1 \rightarrow \mathbf 1 }(u)\) are of order \(u\) (see proof of Lemma 3). This proves that \(\{\pi _\mathbf{s}^*\}_\mathbf{s}\) satisfies the recurrence relation 4.8.

To show that Eq. 4.8, together with \(\sum _{\mathbf{s}\notin \left\{ \mathbf{0 },\mathbf{1 }\right\} } \pi _{\mathbf{s}}^* = 1\), uniquely determines \(\{\pi _\mathbf{s}^*\}_\mathbf{s}\), we will prove the equivalent statement that \(\{\pi _\mathbf{s}^*\}_\mathbf{s}\) is the unique stationary distribution of the Markov chain \(\mathcal M ^*\), defined following the statement of Theorem 3. We first claim that \(\mathcal M ^*\) has a single closed communicating class, accessible from any state. To prove this, let \(\mathbf{s}_0 \in \{0,1\}^N \!\setminus \! \left\{ \mathbf{0 },\mathbf{1 }\right\} \) be a state satisfying \(\mu ^1_{\mathbf{s}_0} > 0\). We show that \(\mathbf{s}_0\) is accessible from any state, by the following argument: for any state \(\mathbf{s}_1 \notin \left\{ \mathbf{0 },\mathbf{1 }\right\} \), there is at least one position \(i\) with \((\mathbf{s}_1)_i=0\). As a consequence of Assumption 2, there is a sequence of states \((\mathbf{s}_1, \mathbf{s}_2, \ldots , \mathbf{s}_k, \mathbf 0 )\), for some \(k \ge 1\), such that each transition between consecutive states of this sequence has positive probability in \(\mathcal M _{\mathcal{E }_0}\). Since \(\mathbf 0 \) and \(\mathbf 1 \) are absorbing states of \(\mathcal M _{\mathcal{E }_0}\), we have \(\mathbf{s}_2, \ldots , \mathbf{s}_k \notin \left\{ \mathbf{0 },\mathbf{1 }\right\} \). Consider now the amended sequence \((\mathbf{s}_1, \mathbf{s}_2, \ldots , \mathbf{s}_k, \mathbf{s}_0)\). By Eq. 4.9 and the positivity of \(\mu ^1_{\mathbf{s}_0}\), each transition in this latter sequence has positive probability in \(\mathcal M ^*\). Thus \(\mathbf{s}_0\) is accessible from any state, which proves that \(\mathcal M ^*\) has a single closed communicating class, accessible from any state. A standard variation of the Markov chain ergodic theorem (e.g. Woess 2009, Corollary 3.23) now implies that \(\mathcal M ^*\) has a unique stationary distribution, which implies that \(\{\pi _{\mathbf{s}}^*\}_\mathbf{s}\) is uniquely determined as claimed. \(\square \)
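Combining Eq. 4.9 with a stationary-distribution solver yields a direct numerical route to the rare-mutation dimorphic distribution. A sketch (numpy-based; the state indexing and function name are our own conventions):

    import numpy as np

    def rmd_distribution(P0, mu1, mu0, interior, i0, i1):
        # P0: transition matrix of the u = 0 chain; i0, i1: indices of states 0 and 1
        # mu1, mu0: mutant appearance distributions, as vectors over all states
        # interior: indices of the dimorphic (non-fixation) states
        n = len(interior)
        Pstar = np.zeros((n, n))
        for a, s in enumerate(interior):
            for b, t in enumerate(interior):
                # Eq. 4.9: redirect fixation transitions through mutant appearance
                Pstar[a, b] = P0[s, t] + P0[s, i0] * mu1[t] + P0[s, i1] * mu0[t]
        A = np.vstack([Pstar.T - np.eye(n), np.ones(n)])
        rhs = np.zeros(n + 1)
        rhs[-1] = 1.0
        pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
        return pi

For the star graph, for instance, the matrix from the star_transition_matrix sketch of Sect. 3.4, together with the mutant appearance probabilities of Sect. 4.3.2, reproduces distributions of the kind shown in Fig. 5.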

We remark that further asymptotic properties (as \(u \rightarrow 0\) and \(\text{time} \rightarrow \infty \)) of the Markov chains \(\mathcal M _{\mathcal{E }_u}\) and \(\mathcal M _{\mathcal{E }_u | \notin \{ \mathbf 0 , \mathbf 1 \}}\) can be obtained using the results of Gyllenberg and Silvestrov (2008). For example, Theorem 5.5.1 of Gyllenberg and Silvestrov (2008), applied to \(\mathcal M _{\mathcal{E }_u {|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}\) (after an appropriate transformation from discrete to continuous time), can yield a power series expansion of \(\pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u)\) in \(u\) around \(u=0\). This expansion characterizes how small but nonzero mutation rates affect the likelihood of states to arise during evolutionary competition.
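
To make these limits concrete, the following numerical sketch (our illustration, not part of the formal development) computes the conditional distribution \(\pi _{\mathbf{s}{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}}(u)\) for a small frequency-dependent Moran process at decreasing mutation rates, using the reduction of the state space to mutant counts that is valid for well-mixed populations. The mutation scheme (each offspring takes a uniformly random type with probability \(u\)) and the fitness functions are assumptions of the sketch, not prescriptions of the framework.

```python
import numpy as np

def moran_matrix(N, f1, f0, u):
    """Transition matrix of a frequency-dependent Moran process on
    mutant counts k = 0, ..., N. Each offspring mutates to a uniformly
    random type with probability u (an assumption of this sketch)."""
    P = np.zeros((N + 1, N + 1))
    for k in range(N + 1):
        x = k / N
        p1 = k * f1(x) / (k * f1(x) + (N - k) * f0(x))  # parent is type 1
        q1 = p1 * (1 - u / 2) + (1 - p1) * u / 2        # offspring is type 1
        up, down = q1 * (N - k) / N, (1 - q1) * k / N
        if k < N:
            P[k, k + 1] = up
        if k > 0:
            P[k, k - 1] = down
        P[k, k] = 1 - up - down
    return P

def stationary(P):
    """Solve pi P = pi with sum(pi) = 1 by least squares."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    return np.linalg.lstsq(A, np.r_[np.zeros(n), 1.0], rcond=None)[0]

N = 10
f1 = lambda x: 1.0 + 0.5 * x   # hypothetical fitness functions
f0 = lambda x: 1.0
for u in [1e-1, 1e-2, 1e-3, 1e-4]:
    pi = stationary(moran_matrix(N, f1, f0, u))
    cond = pi[1:N] / pi[1:N].sum()   # condition on k not in {0, N}
    print(u, np.round(cond, 4))
```

The conditioned vectors stabilize as \(u \rightarrow 0\); their limit is the rare-mutation dimorphic distribution on the mutant counts \(1, \ldots , N-1\).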

5 Measures of evolutionary success

We now turn to quantities that characterize evolutionary success. We focus first, in Sect. 5.1, on the expected frequencies \(\langle x_0 \rangle \) and \(\langle x_1 \rangle \) of types 0 and 1, respectively; on the expected change in \(x_1\) due to selection, \(\langle \Delta ^\mathrm{sel }x_1 \rangle \); and on quantities related to average fitness. We prove a number of relations between these quantities, including, in Sect. 5.1.4, stochastic versions of the Price equation and replicator equation.

We then turn to fixation probability in Sect. 5.2. Section 5.2.1 defines the fixation probabilities \(\rho _1\) and \(\rho _0\) of types 1 and 0, respectively. Section 5.2.2 then proves our central result: that in the limit of low mutation, the success conditions \(\langle x_1 \rangle > 1/2\), \(\langle \Delta ^\mathrm{sel }x_1 \rangle > 0\), and \(\rho _1 > \rho _0\) all coincide.

5.1 Frequency, fitness, and change due to selection

5.1.1 Expected frequency

It is natural to quantify the success of types 0 and 1 in terms of their respective frequencies \(x_0\) and \(x_1\), as defined in Sect. 3.2. When mutation is present (\(u>0\)), ergodicity (Theorem 1) implies that, over long periods of time, the time averages of \(x_0\) and \(x_1\) converge to their expectations, \(\langle x_0 \rangle \) and \(\langle x_1 \rangle \), in the mutation-selection stationary distribution. Therefore, a natural condition for the success of type 1 is that it has higher expected frequency than type 0, \(\langle x_1 \rangle > \langle x_0 \rangle \) (see, for example Antal et al. 2009a, b; Nowak et al. 2010b). Since the two frequencies sum to one, this is equivalent to \(\langle x_1 \rangle > 1/2\).
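
As an illustration of this time-average interpretation, the sketch below simulates the same hypothetical Moran-with-mutation count chain used in the previous code example (the fitnesses and mutation scheme remain assumptions of the sketch) and accumulates the running average of \(x_1\); by ergodicity this estimates \(\langle x_1 \rangle \).

```python
import numpy as np

rng = np.random.default_rng(1)
N, u, T = 10, 0.05, 500_000
f1 = lambda x: 1.0 + 0.5 * x   # hypothetical fitness functions
f0 = lambda x: 1.0

k, total = N // 2, 0.0
for _ in range(T):
    x = k / N
    p1 = k * f1(x) / (k * f1(x) + (N - k) * f0(x))   # parent is type 1
    q1 = p1 * (1 - u / 2) + (1 - p1) * u / 2         # offspring type after mutation
    # one birth and one death per step; the offspring replaces the deceased
    k += int(rng.random() < q1) - int(rng.random() < k / N)
    total += k / N
print("time-averaged x1 =", total / T)   # estimates <x1> by ergodicity
```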

5.1.2 Average fitness

Evolutionary success can also be quantified in terms of the average fitnesses \(\bar{w}^1\) and \( \bar{w}^0\) of types 1 and 0 respectively. We define these average fitnesses in a particular state \(\mathbf{s}\) as

$$\begin{aligned} \bar{w}^1(\mathbf{s}) = \left\{ \begin{array}{ll} \frac{\sum _{i=1}^N s_i w_i(\mathbf{s})}{\sum _{i=1}^N s_i}&\mathbf{s}\ne \mathbf 0 \\ 0&\mathbf{s}= \mathbf 0 , \end{array}\right. \end{aligned}$$

and

$$\begin{aligned} \bar{w}^0(\mathbf{s}) = \left\{ \begin{array}{ll} \frac{\sum _{i=1}^N (1-s_i) w_i(\mathbf{s})}{\sum _{i=1}^N (1-s_i)}&\mathbf{s}\ne \mathbf 1 \\ 0&\mathbf{s}= \mathbf 1 . \end{array}\right. \end{aligned}$$

5.1.3 Expected change in frequency due to selection

Yet another success measure is the expected change in frequency due to selection (Antal et al. 2009a, b; Tarnita et al. 2009a; Nowak et al. 2010b). For type 1, and in a particular state \(\mathbf{s}\), this is defined as

$$\begin{aligned} \Delta ^\mathrm{sel }x_1(\mathbf{s}) = \frac{1}{N} \sum _{i=1}^N s_i (b_i(\mathbf{s}) - d_i(\mathbf{s})). \end{aligned}$$

In words, \(\Delta ^\mathrm{sel }x_1(\mathbf{s})\) is the expected number of offspring born to type 1 individuals, minus the expected number of deaths to type 1 individuals, divided by the total population size. Equivalently, \(\Delta ^\mathrm{sel }x_1(\mathbf{s})\) equals the expected change in the frequency \(x_1\) from the current state \(\mathbf{s}\) to the next, conditioned on no mutations occurring.
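
These definitions, together with those of Sect. 5.1.2, translate directly into code. The sketch below is purely illustrative: it takes as given the per-state vectors \(b_i(\mathbf{s})\), \(d_i(\mathbf{s})\), and \(w_i(\mathbf{s})\), which in any concrete model would be supplied by the replacement rule.

```python
import numpy as np

def avg_fitness(s, w, of_type=1):
    """wbar^1(s) or wbar^0(s): average fitness among individuals of the
    given type, defined as 0 when that type is absent (Sect. 5.1.2)."""
    mask = s if of_type == 1 else 1 - s
    n_type = mask.sum()
    return (mask @ w) / n_type if n_type > 0 else 0.0

def delta_sel_x1(s, b, d):
    """Delta^sel x1(s) = (1/N) * sum_i s_i (b_i - d_i)."""
    return (s @ (b - d)) / len(s)
```

Here s is a 0-1 vector encoding the state, and b, d, w are the per-individual birth, death, and fitness vectors in that state.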

Moving to the entire evolutionary process, we say type 1 is “favored by selection” if \(\langle \Delta ^\mathrm{sel }x_1 \rangle >0\), or \(\langle \Delta ^\mathrm{sel }x_1 \rangle ^*>0\) in the case \(u=0\).

5.1.4 The stationary Price equation and stationary replicator equation

The following theorem equates the expected change due to selection \(\langle \Delta ^\mathrm{sel }x_1 \rangle \) to three other quantities, each of which is a stochastic version of a well-known formula in evolutionary theory.

Theorem 4

For any evolutionary process \(\mathcal{E }\), the following identities hold, with \(\langle \; \rangle ^*\) in place of \(\langle \; \rangle \) in the case \(u=0\):

  (a) The “stationary Price equation”:

    $$\begin{aligned} \langle \Delta ^\mathrm{sel }x_1 \rangle = \frac{1}{N} \left\langle \sum _{i=1}^N s_i w_i \right\rangle - \frac{1}{N^2} \left\langle \sum _{i=1}^N s_i \sum _{j=1}^N w_j \right\rangle , \end{aligned}$$
    (5.1)
  (b) The “stationary replicator equation”:

    $$\begin{aligned} \langle \Delta ^\mathrm{sel }x_1 \rangle = \langle x_1( \bar{w}^1 - \bar{w}) \rangle = \langle x_1 x_0 (\bar{w}^1 - \bar{w}^0) \rangle . \end{aligned}$$
    (5.2)

In Eq. (5.2) above, \(\bar{w}\) denotes the average fitness of all individuals. Since population size is constant (hence average birth rate equals average death rate), \(\bar{w}\) is identically equal to 1 for our class of models. The symbol \(\bar{w}\) is used in order to maintain consistency with usages of the replicator equation in models where average fitness is not constant.

This theorem is proven by verifying the corresponding identities in each state (including, in the case \(u>0\), the monomorphic states \(\mathbf 1 \) and \(\mathbf 0 \)). This can be achieved by straightforward algebraic manipulation. Since these identities hold in each state, they also hold in expectation over the stationary and dimorphic distributions.
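A minimal numerical sketch of this state-by-state verification follows. We draw a random state and random rates with equal birth and death totals (Assumption 1), and we take \(w_i = 1 + b_i - d_i\) (our reading of the fitness definition, under which \(\bar{w} \equiv 1\); this convention is an assumption of the sketch). All three expressions then agree with \(\Delta ^\mathrm{sel }x_1(\mathbf{s})\) in the sampled state.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
s = rng.integers(0, 2, N).astype(float)   # a random state
b = rng.random(N)
d = rng.random(N)
d *= b.sum() / d.sum()     # enforce Assumption 1: total births = total deaths
w = 1.0 + b - d            # assumed fitness convention; mean fitness is then 1

x1 = s.mean()
x0 = 1.0 - x1
delta_sel = s @ (b - d) / N
price = s @ w / N - s.sum() * w.sum() / N**2        # stationary Price, per state
k = s.sum()
wbar1 = s @ w / k if k > 0 else 0.0
wbar0 = (1 - s) @ w / (N - k) if k < N else 0.0
repl1 = x1 * (wbar1 - 1.0)                 # x1 (wbar^1 - wbar), with wbar = 1
repl2 = x1 * x0 * (wbar1 - wbar0)          # replicator form

assert np.allclose([price, repl1, repl2], delta_sel)
print(delta_sel, price, repl1, repl2)      # all four agree in every state
```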

We pause to discuss the names given to these identities. The stationary Price equation, Eq. (5.1), is a stochastic version of the well-known Price equation (Price 1970, 1972), which can be written (in the case of constant population size) as

$$\begin{aligned} \Delta ^\mathrm{sel }x_1(\mathbf{s}) = \frac{1}{N} \sum _{i=1}^N s_i w_i(\mathbf{s}) - \frac{1}{N^2} \sum _{i=1}^N s_i \sum _{j=1}^N w_j(\mathbf{s}). \end{aligned}$$
(5.3)

The right-hand side of the Price equation is customarily written as a covariance between type and fitness (Price 1970); however, we avoid this notation because it can lead to confusion over which quantities are to be regarded as random variables (see van Veelen 2005, 2012). We note that while the original Price equation, Eq. 5.3, applies to a particular state \(\mathbf{s}\), the stationary Price equation, Eq. (5.1), applies to the entire evolutionary Markov chain \(\mathcal M _\mathcal{E }\).

The stationary replicator equation, Eq. (5.2), is a stochastic variation of the replicator equation (Taylor and Jonker 1978; Hofbauer and Sigmund 1998, 2003)—a differential equation for the evolutionary dynamics of competing types in an infinite population. In the case of two types, the replicator equation can be written as

$$\begin{aligned} \dot{x}_1 = x_1 (w^1 - \bar{w})= x_0 x_1(w^1-w^0), \end{aligned}$$

where \(x_1\) and \(x_0\) represent the respective frequencies of types 1 and 0 at time \(t\), and \(w^1\) and \(w^0\) represent their respective fitnesses. (The first equality defines the dynamics, while the second is a mathematical identity.) Another version of the stationary replicator equation, for a different class of evolutionary models, was proven by Nathanson et al. (2009).

Finally, we remark that, although the expected average fitnesses \(\langle \bar{w}^1 \rangle \) and \(\langle \bar{w}^0 \rangle \) appear to be natural success measures, it is not true in general that \(\langle \Delta ^\mathrm{sel }x_1 \rangle >0 \Leftrightarrow \langle \bar{w}^1 \rangle > \langle \bar{w}^0 \rangle \). Rather, Theorem 4 implies that to obtain the expected direction of selection, the correct comparison is not \(\langle \bar{w}^1 \rangle \) versus \( \langle \bar{w}^0 \rangle \), but \(\langle x_0 x_1 \bar{w}^1 \rangle \) versus \(\langle x_0 x_1 \bar{w}^0 \rangle \).

5.1.5 The relation between expected frequency and expected change due to selection

The success conditions \(\langle x_1 \rangle >1/2\) and \(\langle \Delta ^\mathrm{sel }x_1 \rangle >0\) can be related using the following theorem, proven by Nowak et al. (2010b, Appendix A) under assumptions applicable to the class of models considered here:

Theorem 5

In an evolutionary process with \(u>0\),

$$\begin{aligned} \langle x_1 \rangle > \frac{1}{2} \Longleftrightarrow \langle \Delta ^\mathrm{sel }x_1 \rangle - \frac{u}{N} \left\langle \sum _{i=1}^N s_i (b_i-B/N) \right\rangle > 0. \end{aligned}$$
(5.4)

Above, the term

$$\begin{aligned} - \frac{u}{N} \left\langle \sum _{i=1}^N s_i (b_i-B/N) \right\rangle \end{aligned}$$

characterizes the net effect of mutation on the expected frequency of type 1. If type 1 individuals have a higher birth rate than the population average \(B/N\), then, on average, more type 1 individuals will be lost than gained through mutation. An important lesson of this theorem is that the direction of selection may differ from the direction of evolution. For example, if types 0 and 1 have equal fitness, but type 1 replaces itself more often, there will be more mutations from 1 to 0 than vice versa. This will cause the expected frequency of type 0 to exceed that of type 1.
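
Theorem 5 can be checked exactly on the illustrative Moran count chain used in the earlier sketches (the uniform mutation scheme and the fitnesses remain assumptions of the sketch). In that chain \(B = 1\), the total birth rate of type 1 individuals in state \(k\) equals the probability \(p_1(k)\) that the reproducing parent has type 1, and \(d_i \equiv 1/N\).

```python
import numpy as np

N, u = 10, 0.02
f1 = lambda x: 1.05            # hypothetical constant selective advantage
f0 = lambda x: 1.0

P = np.zeros((N + 1, N + 1))
p1 = np.zeros(N + 1)
for k in range(N + 1):
    x = k / N
    p1[k] = k * f1(x) / (k * f1(x) + (N - k) * f0(x))
    q1 = p1[k] * (1 - u / 2) + (1 - p1[k]) * u / 2
    up, down = q1 * (N - k) / N, (1 - q1) * k / N
    if k < N:
        P[k, k + 1] = up
    if k > 0:
        P[k, k - 1] = down
    P[k, k] = 1 - up - down

A = np.vstack([P.T - np.eye(N + 1), np.ones(N + 1)])
pi = np.linalg.lstsq(A, np.r_[np.zeros(N + 1), 1.0], rcond=None)[0]

ks = np.arange(N + 1)
lhs = pi @ (ks / N) - 0.5               # <x1> - 1/2
dsel = pi @ ((p1 - ks / N) / N)         # <Delta_sel x1>
bias = pi @ (p1 - ks / N)               # <sum_i s_i (b_i - B/N)>, with B = 1
rhs = dsel - (u / N) * bias
print(lhs, rhs)
assert (lhs > 0) == (rhs > 0)           # the equivalence of Theorem 5
```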

In the low-mutation limit, the mutational bias term vanishes, and the success conditions \(\langle x_1 \rangle > \frac{1}{2}\) and \(\langle \Delta ^\mathrm{sel }x_1 \rangle >0\) coincide. To state this result formally, we consider, as in Sect. 4.3, a family of evolutionary processes \(\{\mathcal{E }_u\}_{u \ge 0}\), in which the population size \(N\) and replacement rule \(\mathfrak{R }\) are fixed but the mutation rate \(u\) varies. The low-mutation result can then be stated as follows:

Corollary 1

\(\langle \Delta ^\mathrm{sel }x_1 \rangle ^* > 0\) if and only if \(\lim _{u \rightarrow 0} \langle x_1 \rangle > 1/2\) in the family of evolutionary processes \(\{\mathcal{E }_u\}_{u \ge 0}\).

Proof

First we note that, for the states \(\mathbf{s}=\mathbf 0 \) and \(\mathbf{s}=\mathbf 1 \), the quantities \(\Delta ^\mathrm{sel }x_1(\mathbf{s})\) and

$$\begin{aligned} \sum _{i=1}^N s_i (b_i(\mathbf{s}) - B/N) \end{aligned}$$

appearing in Theorem 5 both vanish. This is trivial for \(\mathbf{s}= \mathbf 0 \), since in this state \(s_i = 0\) for all \(i\). For \(\mathbf{s}= \mathbf 1 \), it holds because \(\sum _{i=1}^N b_i(\mathbf{s}) = \sum _{i=1}^N d_i(\mathbf{s}) = B\) by Assumption 1. Because of this, we can replace the expectations in Theorem 5 with expectations over the distribution \(\{ \pi _{\mathbf{s}| \notin \left\{ \mathbf{0 }, \mathbf{1 }\right\} } \}_\mathbf{s}\) conditioned on both types being present (defined in Sect. 4.3.1), yielding

$$\begin{aligned} \langle x_1 \rangle > \frac{1}{2} \Longleftrightarrow \langle \Delta ^\mathrm{sel }x_1 \rangle _{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}- \frac{u}{N} \left\langle \sum _{i=1}^N s_i (b_i-B/N) \right\rangle _{|\notin \{\,\mathbf 0 , \mathbf 1 \,\}}> 0, \end{aligned}$$

in the evolutionary process \(\mathcal{E }_u\) with \(u>0\). Now taking the limit \(u \rightarrow 0\) yields

$$\begin{aligned} \lim _{u \rightarrow 0} \langle x_1 \rangle > \frac{1}{2} \Longleftrightarrow \langle \Delta ^\mathrm{sel }x_1 \rangle ^* > 0, \end{aligned}$$

as desired. \(\square \)

5.2 Fixation probability

Evolutionary success is frequently defined in terms of fixation probability—the probability that a new mutant trait will become fixed in the population. Section 5.2.1 gives a rigorous definition of the fixation probabilities \(\rho _1\) and \(\rho _0\) for our class of models. Then Sect. 5.2.2 proves our central result: that the success conditions \(\langle x_1 \rangle > 1/2\) and \(\langle \Delta ^\mathrm{sel }x_1 \rangle > 0\) become equivalent to \(\rho _1>\rho _0\) in the limit of low mutation rate.

5.2.1 The definition of fixation probability

Intuitively, fixation probability is the probability that, starting from a single individual, a type will come to dominate the population (i.e. be present in all individuals). However, this apparently simple notion is complicated by the fact that the success of the new type may depend on the position in which it arises. We resolve this issue by sampling the initial state from the appropriate mutant appearance distribution.

Definition 8

For an evolutionary process \(\mathcal{E }\) with \(u=0\), we define the fixation probability \(\rho _1\) of type 1 as the probability that the evolutionary Markov process \(\mathcal M _\mathcal{E }\) becomes absorbed in state \(\mathbf 1 \), given that its initial state was sampled from \(\{ \mu ^1_\mathbf{s}\}_\mathbf{s}\):

$$\begin{aligned} \rho _1 = \sum _{\mathbf{s}\in \{0,1\}^N} \mu ^1_\mathbf{s}\lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf 1 }. \end{aligned}$$

We define the fixation probability \(\rho _0\) of type 0 in similar fashion. We observe that, as a direct consequence of Assumption 2, both \(\rho _0\) and \(\rho _1\) are positive for every evolutionary process with \(u=0\).

Example: Moran process. For the frequency-dependent Moran process with no mutation, the fixation probabilities are given by (Nowak et al. 2004; Taylor et al. 2004)

$$\begin{aligned} \rho _1 = \left( 1 + \sum _{m=1}^{N-1} \prod _{k=1}^m \frac{f_0 \big ( \tfrac{k}{N} \big )}{f_1 \big ( \tfrac{k}{N} \big )} \right)^{-1}, \quad \rho _0 = \left( 1 + \sum _{m=1}^{N-1} \prod _{k=1}^m \frac{f_1 \big ( \tfrac{N-k}{N} \big )}{f_0 \big ( \tfrac{N-k}{N} \big )} \right)^{-1}. \end{aligned}$$

In the case of neutral evolution, \(f_1(x) \equiv f_0(x) \equiv 1\), we have \(\rho _1=\rho _0=1/N\).
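
These expressions transcribe directly into code. A minimal sketch (illustrative only; the fitness functions are arbitrary choices) confirms the neutral value \(1/N\):

```python
import numpy as np

def rho_1(N, f1, f0):
    """Fixation probability of a single type-1 mutant (Moran, u = 0)."""
    terms = [np.prod([f0(k / N) / f1(k / N) for k in range(1, m + 1)])
             for m in range(1, N)]
    return 1.0 / (1.0 + sum(terms))

def rho_0(N, f1, f0):
    """Fixation probability of a single type-0 mutant (Moran, u = 0)."""
    terms = [np.prod([f1((N - k) / N) / f0((N - k) / N) for k in range(1, m + 1)])
             for m in range(1, N)]
    return 1.0 / (1.0 + sum(terms))

N = 20
neutral = lambda x: 1.0
print(rho_1(N, neutral, neutral), 1 / N)   # both equal 0.05
adv = lambda x: 1.1                        # hypothetical advantage for type 1
print(rho_1(N, adv, neutral), rho_0(N, adv, neutral))
```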

Example: BD on star graphs. Based on the work of Broom and Rychtár (2008), we calculate the fixation probability of type 1 (with reproductive rate \(r\)) on a star graph to be

$$\begin{aligned} \rho _1 = \frac{n (r-1)(r+1)^2}{\displaystyle r(nr+1)(n+r) - (nr+1)^2 \left(\frac{n+r}{r(nr+1)} \right)^n}. \end{aligned}$$

Above, \(n=N-1\) is the number of leaves. Our answer differs from the fixation probability obtained by Broom and Rychtár (2008) because they assume mutations are equally likely to arise in each position, whereas in our framework mutants arise according to the mutant appearance distribution. In particular, for neutral evolution (\(r=1\)), we have

$$\begin{aligned} \rho _1 = \frac{2n}{1+n+n^2+n^3}. \end{aligned}$$

Interestingly, this fixation probability is less than or equal to the neutral fixation probability for the Moran process, with equality only in the case \(n=1\) (equivalently, \(N=2\)). This suggests that neutral mutations accumulate more slowly for BD on the star than in a well-mixed population. This raises the intriguing question of how different population structures affect the accumulation of neutral mutations. This question is currently unexplored in evolutionary graph theory, because previous work has assumed that mutations are equally likely to arise in each position, leading to a neutral fixation probability of \(1/N\) for any process.
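
A quick numerical comparison of the two neutral fixation probabilities (a sketch of the claim above, with \(N = n+1\)):

```python
# Neutral fixation probability on the star (n leaves) versus the
# well-mixed Moran value 1/N; equality holds only at n = 1.
for n in [1, 2, 5, 10, 50]:
    star = 2 * n / (1 + n + n**2 + n**3)
    print(n, star, 1 / (n + 1), star <= 1 / (n + 1))
```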

5.2.2 Equivalence of fixation probability to other success measures

For an evolutionary process without mutation, the condition \(\rho _1 > \rho _0\) is a natural and well-established criterion for the evolutionary success of type 1, relative to type 0 (see, for example, Nowak 2006b). It is of considerable theoretical interest to link this condition to other success measures—particularly those involving quantities that can be calculated in each state. In this way, the state-by-state dynamics of an evolutionary process can be related to its ultimate outcome.

The following theorem and its corollary below achieve this goal, in a general way, for the class of models under consideration. Theorem 6 states that, in the limit of low mutation, the success conditions \(\rho _1 > \rho _0\) and \(\langle x_1 \rangle >1/2\) coincide. Corollary 2 shows that both of these coincide with the condition \(\langle \Delta ^\mathrm{sel }x_1 \rangle ^* > 0.\)

To state these results, we again consider a family of evolutionary processes \(\{\mathcal{E }_u\}_{u \ge 0}\), with fixed \(N\) and \(\mathfrak{R }\) but varying \(u\).

Theorem 6

\(\rho _1 > \rho _0\) in the evolutionary process \(\mathcal{E }_0\) if and only if

$$\begin{aligned} \lim _{u \rightarrow 0} \langle x_1 \rangle > \frac{1}{2} \end{aligned}$$

in the family of evolutionary processes \(\{\mathcal{E }_u\}_{u \ge 0}\).

Proof

We begin by expanding

$$\begin{aligned} \lim _{u \rightarrow 0} \langle x_1 \rangle = \sum _{\mathbf{s}\in \{0,1\}^N} \left( \pi _\mathbf{s}(0) \frac{1}{N} \sum _{i=1}^N s_i \right) = \pi _\mathbf{1 }(0), \end{aligned}$$

since, by the remarks following Lemma 1, the only states \(\mathbf{s}\) with positive \(\pi _\mathbf{s}(0)\) are \(\mathbf 1 \), for which \(x_1=1\), and \(\mathbf 0 \), for which \(x_1=0\). Since \(\pi _\mathbf{0 }(0) + \pi _\mathbf{1 }(0) = 1\), we have the equivalence

$$\begin{aligned} \lim _{u \rightarrow 0} \langle x_1 \rangle > \frac{1}{2} \Longleftrightarrow \pi _\mathbf{1 }(0) > \pi _\mathbf{0 }(0). \end{aligned}$$
(5.5)

We now relate \(\pi _\mathbf{0 }(0)\) and \(\pi _\mathbf{1 }(0)\) to the fixation probabilities \(\rho _0\) and \(\rho _1\). By ergodicity (Theorem 1), for \(u>0\), the mutation-selection stationary distribution satisfies \(\pi _\mathbf{s}(u) = \lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}^{\prime } \rightarrow \mathbf{s}}(u)\) for any states \(\mathbf{s}, \mathbf{s}^{\prime } \in \{0,1\}^N\), where \(p^{(n)}_{\mathbf{s}^{\prime } \rightarrow \mathbf{s}}(u)\) denotes the \(n\)-step transition probability from \(\mathbf{s}^{\prime }\) to \(\mathbf{s}\). Applying this rule to the states \(\mathbf 0 \) and \(\mathbf 1 \) and dividing yields

$$\begin{aligned} \frac{ \pi _\mathbf{1 }(u)}{\pi _\mathbf{0 }(u)}&= \frac{\lim _{n \rightarrow \infty } p^{(n)}_{\mathbf 0 \rightarrow \mathbf 1 }(u)}{\lim _{n \rightarrow \infty } p^{(n)}_{\mathbf 1 \rightarrow \mathbf 0 }(u)}\\&= \frac{\sum _\mathbf{s}p_{\mathbf 0 \rightarrow \mathbf{s}}(u) \lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf 1 }(u)}{\sum _\mathbf{s}p_{\mathbf 1 \rightarrow \mathbf{s}}(u) \lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf 0 }(u)}\\&= \frac{\sum _\mathbf{s}\frac{p_{\mathbf 0 \rightarrow \mathbf{s}}(u)}{uB/2} \lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf 1 }(u)}{\sum _\mathbf{s}\frac{p_{\mathbf 1 \rightarrow \mathbf{s}}(u)}{uB/2} \lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf 0 }(u)}. \end{aligned}$$

Recalling that \(1-p_{\mathbf 0 \rightarrow \mathbf 0 }(u)\) and \(1-p_{\mathbf 1 \rightarrow \mathbf 1 }(u)\) can both be expanded as \(uB/2 + \mathcal O (u^2)\) (see the proof of Lemma 3), we take the limit of both sides as \(u \rightarrow 0\), obtaining

$$\begin{aligned} \frac{ \pi _\mathbf{1 }(0)}{\pi _\mathbf{0 }(0)} = \frac{\sum _\mathbf{s}\mu ^1_\mathbf{s}\lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf 1 }}{\sum _\mathbf{s}\mu ^0_\mathbf{s}\lim _{n \rightarrow \infty } p^{(n)}_{\mathbf{s}\rightarrow \mathbf 0 }} = \frac{\rho _1}{\rho _0}. \end{aligned}$$

Combining this with the equivalence (5.5) completes the proof. \(\square \)
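
As a check on this conclusion, the sketch below (same illustrative Moran count chain and uniform mutation scheme as in the earlier code examples) compares \(\pi _\mathbf{1 }(u)/\pi _\mathbf{0 }(u)\) at small \(u\) with \(\rho _1/\rho _0\) computed from the mutation-free chain:

```python
import numpy as np

N, u = 10, 1e-4
f1 = lambda x: 1.0 + 0.5 * x   # hypothetical fitness functions
f0 = lambda x: 1.0

def p1(k):
    x = k / N
    return k * f1(x) / (k * f1(x) + (N - k) * f0(x))

# rho_1 / rho_0 from the mutation-free chain: gamma_k is the ratio of
# down-steps to up-steps when k mutants are present
gam = [(1 - p1(k)) * (k / N) / (p1(k) * (N - k) / N) for k in range(1, N)]
rho1 = 1.0 / (1.0 + sum(np.prod(gam[:m]) for m in range(1, N)))
rho0 = rho1 * np.prod(gam)     # since rho_1 / rho_0 = prod_k (1 / gamma_k)

# pi_1(u) / pi_0(u) from the stationary distribution at small u
P = np.zeros((N + 1, N + 1))
for k in range(N + 1):
    q1 = p1(k) * (1 - u / 2) + (1 - p1(k)) * u / 2
    up, down = q1 * (N - k) / N, (1 - q1) * k / N
    if k < N:
        P[k, k + 1] = up
    if k > 0:
        P[k, k - 1] = down
    P[k, k] = 1 - up - down
A = np.vstack([P.T - np.eye(N + 1), np.ones(N + 1)])
pi = np.linalg.lstsq(A, np.r_[np.zeros(N + 1), 1.0], rcond=None)[0]
print(pi[N] / pi[0], rho1 / rho0)   # the two ratios agree as u -> 0
```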

We observe that this proof relies on Assumption 1 (constant overall birth and death rates). The theorem does not hold if Assumption 1 is violated. If instead the birth rate is \(B_0\) in state \(\mathbf 0 \) and \(B_1\) in state \(\mathbf 1 \), we have the alternate equivalence:

$$\begin{aligned} \lim _{u \rightarrow 0} \langle x_1 \rangle > \frac{1}{2} \Longleftrightarrow \frac{\rho _1}{B_1} > \frac{\rho _0}{B_0}. \end{aligned}$$

To gain intuition for this rule, suppose \(B_1>B_0\). Then type 1 replaces itself faster than does type 0 (in states for which only one type is present). Consequently, type 1 produces more type 0 mutants than vice versa. As a result, type 1 will have lower expected frequency than would be expected from comparing only fixation probabilities.

Combining Corollary 1 and Theorem 6 yields the following equivalence of success conditions for mutation-free evolutionary processes:

Corollary 2

In the evolutionary process \(\mathcal{E }\) with \(u=0\),

$$\begin{aligned} \rho _1 > \rho _0 \Longleftrightarrow \langle \Delta ^\mathrm{sel }x_1 \rangle ^* > 0. \end{aligned}$$

The utility of Corollary 2 is that it equates a success measure characterizing ultimate outcomes of evolution, \(\rho _1 > \rho _0\), with another, \(\langle \Delta ^\mathrm{sel }x_1 \rangle ^* > 0\), characterizing selective forces across states of the evolutionary process. This generalizes a result of Taylor et al. (2007b), who prove a similar theorem for a particular model (the Moran process on graphs with bi-transitive symmetry).

Example: Moran process. For the frequency-dependent Moran process, evolutionary success can be determined using the following formula (Nowak et al. 2004; Taylor et al. 2004):

$$\begin{aligned} \frac{\rho _1}{\rho _0} = \prod _{k=1}^{N-1} \frac{f_1 \big ( \tfrac{k}{N} \big )}{f_0 \big ( \tfrac{k}{N} \big )}. \end{aligned}$$

Combining the formula for the rare-mutation dimorphic distribution, Eq. 4.5, with the definition of \(\Delta ^\mathrm{sel }x_1\) and the formulas for \(b_i(\mathbf{s})\) and \(d_i(\mathbf{s})\) in Eq. 3.1, we can calculate

$$\begin{aligned} \langle \Delta ^\mathrm{sel }x_1 \rangle ^* = \frac{1}{N} \left( \prod _{k=1}^{N-1} \frac{f_1 \big ( \tfrac{k}{N} \big )}{f_0 \big ( \tfrac{k}{N} \big )} - 1 \right) = \frac{1}{N} \left( \frac{\rho _1}{\rho _0} - 1 \right), \end{aligned}$$

which verifies Corollary 2 for this process.

6 Summary of results

Our main results can be summarized as follows.

6.1 Asymptotic behavior of the evolutionary process

  • If mutation is present, the evolutionary Markov chain is ergodic. Its time-averaged asymptotic behavior is described by the mutation-selection stationary distribution.

  • If there is no mutation, the evolutionary Markov chain eventually becomes absorbed in a state in which only one trait is present.

  • In the limit of low mutation, the time-averaged asymptotic behavior conditioned on both types being present is described by the rare-mutation dimorphic distribution.

6.2 Measures of evolutionary success

  • The following identities hold for all evolutionary processes, with \(\langle \; \rangle \) replaced by \(\langle \; \rangle ^*\) in the case \(u=0\):

    • The stationary Price equation:

      $$\begin{aligned} \langle \Delta ^\mathrm{sel }x_1 \rangle = \frac{1}{N} \left\langle \sum _{i=1}^N s_i w_i \right\rangle - \frac{1}{N^2} \left\langle \sum _{i=1}^N s_i \sum _{j=1}^N w_j \right\rangle , \end{aligned}$$
    • The stationary replicator equation:

      $$\begin{aligned} \langle \Delta ^\mathrm{sel }x_1 \rangle = \langle x_1( \bar{w}^1 - \bar{w}) \rangle = \langle x_1 x_0 (\bar{w}^1 - \bar{w}^0) \rangle . \end{aligned}$$
  • For an evolutionary process \(\mathcal{E }\) with \(u=0\), the following success conditions are equivalent:

    • \(\rho _1 > \rho _0\),

    • \(\langle \Delta ^\mathrm{sel }x_1 \rangle ^* > 0\),

    • \(\lim _{u \rightarrow 0} \langle x_1 \rangle > 1/2\),

    where, in the last condition, \(\langle x_1 \rangle \) is evaluated in the family of evolutionary processes \(\{ \mathcal{E }_u \}_{u \ge 0}\) having the same population size \(N\) and replacement rule \(\mathfrak{R }\) as \(\mathcal{E }\).

7 Discussion

7.1 Overview

This work provides a rigorous foundation for studying evolution in a fairly general class of models, including established models of well-mixed (Fisher 1930; Moran 1958; Cannings 1974; Nowak et al. 2004; Taylor et al. 2004; Imhof and Nowak 2006; Lessard and Ladret 2007; Traulsen et al. 2007) and graph-structured (Holley and Liggett 1975; Cox 1989; Cox et al. 2000; Lieberman et al. 2005; Santos and Pacheco 2005; Sood and Redner 2005; Ohtsuki et al. 2006; Taylor et al. 2007a; Szabó and Fáth 2007; Santos et al. 2008; Sood et al. 2008; Szolnoki et al. 2008; Roca et al. 2009; Allen et al. 2012; Shakarian et al. 2012) populations. Our definitions give formal mathematical meaning to important quantities—such as fixation probability and expected frequency—in greater generality than is usually considered. Our results comparing measures of evolutionary success may prove helpful in determining evolutionarily successful traits and behaviors, both in particular models and in classes of models.

7.2 The rare-mutation dimorphic distribution

One important theoretical contribution of this work is the introduction of the rare-mutation dimorphic distribution. This distribution helps address a central challenge in evolutionary theory: to link quantities or characteristics of specific states to the ultimate outcome of evolution. In approaching this challenge, it is useful to consider probability distributions that reflect the time-averaged behavior of the evolutionary process. Then, by taking expectations, one can move from quantities describing specific states to quantities characterizing the overall process. In the case of nonzero mutation, many works (Rousset and Ronce 2004; Taylor et al. 2007b; Antal et al. 2009a, b; Tarnita et al. 2009a, b; Nathanson et al. 2009; Tarnita et al. 2011) have made use of the mutation-selection stationary distribution for this purpose. However, the question of what distribution to use in models without mutation has not been addressed, save for a few specific examples (Zhou et al. 2010; Zhou and Qian 2011). Our results (e.g. Corollary 2) show that the rare-mutation dimorphic distribution is a natural choice for linking state-dependent quantities to evolutionary outcomes when mutation is rare.

7.3 The Price equation and general approaches to evolutionary theory

Our approach has distinct advantages over approaches to evolutionary theory that use the Price equation—Eq. 5.3 and variants thereof—as a starting point (Queller 1992, 2011; Gardner et al. 2011; Marshall 2011 and many others). These approaches are sometimes advertised as being assumption-free, and hence applicable to any evolutionary process. It is true that, as a mathematical identity, the Price equation requires no assumptions other than the axioms of the real numbers. But this lack of assumptions means that the Price equation cannot, on its own, be used to draw conclusions about the outcome of evolution. Indeed, it is logically impossible to derive mathematical conclusions about any process without first making assumptions. Therefore, though the Price equation may be useful for describing aspects of evolution, and as a mathematical step within a derivation, it is inadequate as a foundational starting point for evolutionary theory (see van Veelen 2005, 2012 for further discussion).

In contrast, our approach shows how general theorems about evolution can be proven within an assumption-based framework. Indeed, one of our results is itself a stochastic version of the Price equation. This stationary Price equation, Eq. (5.1), applies to the entire evolutionary process. In this way, it represents a longer-term view than the original Price equation, as well as the stochastic generalizations developed by Grafen (2000) and Rice (2008), which apply only to a single evolutionary time-step. We emphasize, however, that the stationary Price equation is not assumption-free; it depends on the assumptions that delineate our class of models. It is not immediately clear whether the stationary Price equation can be extended to more general classes of models, particularly those with changing population size.

7.4 Directions for future research

Many avenues of future research present themselves, both for generalizing this framework and for applying it to specific problems of interest.

7.4.1 Dynamic size and structure

Most immediate, perhaps, is the need to extend to populations of dynamic size and structure. The dynamics of population structure can significantly affect evolutionary outcomes (Gross and Blasius 2008; Perc and Szolnoki 2010), particularly in the case of cooperative dilemmas (Pacheco et al. 2006a, b; Tarnita et al. 2009a; Fu et al. 2008; Wu et al. 2010; Fehl et al. 2011; Rand et al. 2011). Though generalizing in these directions will present some difficulties (notational as well as mathematical), these difficulties do not appear insurmountable.

7.4.2 Evolutionary games and other interactions

Another important research direction involves representing social and ecological interactions more explicitly. In the framework presented here, all aspects of population structure and interaction are subsumed in the replacement rule \(\mathfrak{R }\). Unpacking \(\mathfrak{R }\), and the interactions and processes it incorporates, will yield further insight into how population structure and other variables affect the course of evolution.

In particular, our approach can be applied to evolutionary game theory (Maynard Smith and Price 1973; Cressman 1992; Weibull 1997; Hofbauer and Sigmund 2003; Nowak 2006a). The effect of spatial structure on evolutionary game behavior is a topic of intense interest (Nowak and May 1992; van Baalen and Rand 1998; Hauert and Doebeli 2004; Santos and Pacheco 2005; Ohtsuki et al. 2006; Taylor et al. 2007a; Szabó and Fáth 2007; Roca et al. 2009; Broom et al. 2010; Shakarian et al. 2012). By incorporating games into our framework, and formalizing the notion of an “update rule” (Ohtsuki et al. 2006), we can potentially unify and contextualize results from many different models.

7.4.3 Greater genetic complexity

Our framework can also be extended beyond the competition between two haploid types, toward evolution on complex genetic landscapes. One obvious extension is to incorporate diploid genetics. This would involve modifying replacement events to incorporate two offspring-to-parent maps, one for each sex. Another straightforward extension is to incorporate more than two types and different rates of mutation between types. For example, types can be represented as genetic sequences, with mutations possible at each position.

Alternatively, one could instead consider a continuous space of possible types with localized mutation. This could connect our approach to the fields of quantitative genetics (Falconer 1981; Lynch and Walsh 1998) and adaptive dynamics (Hofbauer and Sigmund 1990; Dieckmann and Law 1996; Metz et al. 1996; Geritz et al. 1997). One potential goal is to extend the elegant mathematical framework of Champagnat et al. (2006) to spatially structured populations.

There is also potential to connect this framework to standard mathematical tools of population genetics, such as diffusion approximations (Kimura 1964; Ewens 1979) and coalescent theory (Kingman 1982; Wakeley 2009).

7.4.4 Symmetry in population structure

As we observed in our two example processes, symmetries in population structure allow for reduction of the state space of the evolutionary Markov chain. Such symmetries are useful in calculating evolutionary dynamics (Taylor et al. 2007a; Broom and Rychtár 2008), and represent an interesting connection between evolutionary theory and group theory (see, for example, Taylor et al. 2011). Our framework and other general approaches may enable a more systematic study of symmetry and its consequences in evolution.

7.5 Outlook on general approaches to evolutionary theory

This work, and the future research avenues described here, are part of a larger project: to extend the field of evolutionary theory from the analysis of particular models to the level of a mathematical theory. Mathematical theories begin with fundamental axioms, and from them derive broad theorems illustrating general principles. We believe this can be done for evolution. Indeed, general, assumption-based approaches have already shed light on the dynamics of structured populations (Metz and de Roos 1992; Diekmann et al. 1998, 2001, 2007), evolutionary game theory (Tarnita et al. 2009b, 2011), and quantitative trait evolution (Champagnat et al. 2006; Durinx et al. 2008; Simon 2008). While individual models will continue to be important for formulating new hypotheses and understanding particular systems, axiomatic approaches can make rigorous the unifying principles of evolution. This project is in its infancy. There is much exciting work to be done.