1 Introduction

Ever since the Modern Evolutionary Synthesis, mathematics has played an indispensable role in the theory of evolution. Typically, the contribution of mathematics comes in the development and analysis of mathematical models. By representing evolutionary scenarios in a precise way, mathematical modeling can clarify conceptual issues, elucidate underlying mechanisms, and generate new hypotheses.

However, conclusions from mathematical models must always be interrogated with respect to their robustness. Often, this interrogation takes place in an ad hoc fashion: assumptions are relaxed one at a time (sometimes within the original work, sometimes in later works by the same or other authors), until a consensus emerges as to which conclusions are robust and which are merely artifacts.

An alternative approach is to take advantage of the generality made possible by mathematical abstraction. If one can identify a minimal set of assumptions that apply to a broad class of models, any theorem proven from these assumptions will apply to the entire class. Such theorems eliminate the duplicate work of deriving special cases one model at a time. More importantly, the greater the generality in which a theorem is proven, the more likely it is to represent a robust scientific principle. This mathematically general approach has been applied to a number of fields within evolutionary biology, including demographically-structured populations (Metz and de Roos 1992; Diekmann et al. 2001, 1998, 2007; Lessard and Soares 2018), group- and deme-structured populations (Simon et al. 2013; Lehmann et al. 2016), evolutionary game theory (Tarnita et al. 2009b, 2011; Wu et al. 2013; McAvoy and Hauert 2016), quantitative trait evolution (Champagnat et al. 2006; Durinx et al. 2008; Allen et al. 2013; Van Cleve 2015), population extinction and persistence (Schreiber et al. 2010; Roth and Schreiber 2013, 2014; Benaïm and Schreiber 2018), and many aspects of population genetics (Tavaré 1984; Bürger 2000; Ewens 2004).

Currently, there is great theoretical and empirical interest in understanding how the spatial and/or genetic structure of a population influences its evolution. Here, spatial structure refers to the physical layout of the habitat as well as patterns of interaction and dispersal; genetic structure refers to factors such as ploidy, sex ratio, and mating patterns. These factors can affect the rate of genetic change (Allen et al. 2015; McAvoy et al. 2018a), the balance of selection versus drift (Lieberman et al. 2005; Broom et al. 2010; Adlam et al. 2015; Pavlogiannis et al. 2018), and the evolution of cooperation and other social behavior (Nowak and May 1992; Taylor and Frank 1996; Rousset and Billiard 2000; Ohtsuki et al. 2006; Taylor et al. 2007b; Nowak et al. 2010a; Débarre et al. 2014; Peña et al. 2016; Allen et al. 2017; Fotouhi et al. 2018).

To study the effects of spatial structure in a mathematically general way, Allen and Tarnita (2014) introduced a class of models with fixed population size and spatial structure. Each model in this class represents competition between two alleles on a single locus in a haploid, asexually-reproducing population. Replacement depends stochastically on the current population state, subject to general assumptions that are compatible with many established models in the literature. For this class, Allen and Tarnita (2014) defined three criteria for success under natural selection and proved that they coincide when mutation is rare.

Here, we generalize the class of models studied by Allen and Tarnita (2014) and significantly extend the results. As in Allen and Tarnita (2014), selection occurs on a single biallelic locus, in a population of fixed size and structure. However, whereas Allen and Tarnita (2014) assumed haploid genetics, the class introduced here allows for arbitrary genetic structure, including diploid (monoecious or dioecious), haplodiploid, and polyploid genetics. Arbitrary mating patterns are allowed, including self-fertilization. This level of generality is achieved using the notion of genetic sites. Each genetic site houses a single allele copy, and each individual contains a number of genetic sites equal to its ploidy. Spatial structure is also arbitrary, in that the patterns of interaction and replacement among individuals are subject only to a minimal assumption ensuring the unity of the population. We also allow for arbitrary mutational bias.

In this class of models, which we present in Sect. 2, natural selection proceeds by replacement events. Replacement events distill all interaction, mating, reproduction, dispersal, and death events into what ultimately matters for selection—namely, which alleles are replaced by copies of which others. Replacement events occur with probability depending on the current population state, according to a given replacement rule. The replacement rule implicitly encodes all relevant aspects of the spatial and genetic structure.

The replacement rule, together with the mutation rate and mutational bias, defines an evolutionary Markov chain representing natural selection. Basic results on the asymptotic behavior of the evolutionary Markov chain are established in Sect. 3.

In Sect. 4, we turn to the question of identifying which of two competing alleles is favored by selection. We compare four criteria: one based on fixation probabilities, one based on time-averaged frequency, and two based on expected frequency change due to selection. We prove (Theorem 4) that these coincide in the limit of low mutation, thereby generalizing the main result of Allen and Tarnita (2014).

Sections 5 and 6 explore the closely-related concepts of reproductive value, fitness, and neutral drift. We define these notions in the context of our formalism and prove connections among them. Interestingly, defining reproductive value requires an additional assumption that does not necessarily hold for all models; thus, the concept of reproductive value may not be as general as is sometimes thought. We also provide a new proof for the recently-observed principle (Maciejewski 2014; Allen et al. 2015) that the reproductive value of a genetic site is proportional to the fixation probability, under neutral drift, of a mutation arising at that site.

We next turn to weak selection (Sect. 7), meaning that the alleles in question have only a small effect on reproductive success. Mathematically, weak selection can be considered a perturbation of neutral drift. Using this perturbative approach, one can obtain closed-form conditions for success under weak selection for models that would be otherwise intractable. This approach has fruitfully been applied in a great many contexts (Taylor and Frank 1996; Rousset and Billiard 2000; Leturque and Rousset 2002; Nowak et al. 2004; Ohtsuki et al. 2006; Lessard and Ladret 2007; Taylor et al. 2007b; Antal et al. 2009a; Tarnita et al. 2009b; Chen 2013; Débarre et al. 2014; Durrett 2014; Tarnita and Taylor 2014; Van Cleve 2015; Allen et al. 2017). Our second main result (Theorem 8) formalizes this weak-selection approach for our class of models. It asserts that, to determine whether an allele is favored under weak selection, one can take the expectation of a quantity describing selection over a probability distribution that pertains to neutral drift. The usefulness of this result stems from the fact that many evolutionary models become much simpler in the case of neutral drift.

The bulk of this work adopts a “gene’s-eye view,” in that the analysis is conducted at the level of genetic sites. In Sect. 8, we reframe our results using quantities that apply at the level of the individual. This reframing again requires additional assumptions, such as fair meiosis. Without these additional assumptions, natural selection cannot be characterized solely in terms of individual-level quantities.

We illustrate the power of our formalism with two examples (Sect. 9). The first is a model of evolutionary games on an arbitrary weighted graph. For this model, we recover recent results of Allen et al. (2017), using only results proven in this work. The second is a haplodiploid population model in which a mutation may have different selective effects in males and females. We obtain a simple condition to determine whether such a mutation is favored under weak selection.

Although our formalism is quite general in some respects, it still makes a number of simplifying assumptions. For example, we assume a population of fixed size in a constant environment, but real-world populations are subject to demographic fluctuations and ecological feedbacks, which may have significant consequences for their evolution (Dieckmann and Law 1996; Metz et al. 1996; Geritz et al. 1997; Pelletier et al. 2007; Wakano et al. 2009; Schoener 2011; Constable et al. 2016; Chotibut and Nelson 2017). Other limitations arise from our assumptions of fixed spatial structure, single-locus genetics, and trivial demography. Section 10 discusses these limitations and the prospects for extending beyond them.

2 Class of models for natural selection

We consider a class of models representing selection, on a single biallelic locus, in a population with arbitrary—but fixed—spatial and genetic structure. Each model within this class is represented by a set of genetic sites (partitioned into individuals), a replacement rule, a mutation probability, and a mutational bias. In this section, we introduce each of these ingredients in detail and discuss how they combine to form a Markov chain representing natural selection. A glossary of our notation is provided in Table 1.

2.1 Sites and individuals

We represent arbitrary spatial and genetic structure by using the concept of genetic sites (Figs. 1a, 2). Each genetic site corresponds to a particular locus, on a single chromosome, within an individual. Since we consider only single-locus traits, each individual has a number of sites equal to its ploidy (e.g. one for haploids, two for diploids).

Table 1 Glossary of notation
Fig. 1 (a) The parentage mapping, \(\alpha \), in the case of a diploid, sexually-reproducing population. In a diploid population, each individual contains two genetic sites. Here, site \(c_1\) in the child inherits the allele from site \(m_1\) in the mother (\(\alpha (c_1)=m_1\)), while site \(c_2\) in the child inherits the allele from site \(f_2\) in the father (\(\alpha (c_2)=f_2\)). Note that, although arrows are drawn from parent to child, the parentage map \(\alpha \) is from child to parent. (b) Mutations are resolved as follows: with probability \(1-u\), there is no mutation and the allele remains the parental type (A in this case). With probability u, the allele mutates (lightning bolt) and becomes either A (probability \(\nu \)) or a (probability \(1-\nu \)).

Fig. 2 One complete update step in the evolutionary Markov chain. An example population is pictured with two diploid and three haploid individuals. Genetic sites are indicated by numerals to the left of the site, and the individuals in which these sites reside are labeled by bold numerals. First, a replacement event, \(\left( R,\alpha \right) \), is chosen according to the distribution \(\left\{ p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\). In this case, the replaced set, R, is shown in yellow. Note that arrows are drawn from parents to children, but the parentage map, \(\alpha \), is from child to parent. For every genetic site that is replaced under this event (yellow), the replicated allele is then subjected to possible mutation, resulting in a new state.

The genetic sites in the population are represented by a finite set G. The individuals are represented by a finite set I. To each individual \(i \in I\), there corresponds a set of genetic sites \(G_i \subseteq G\) residing in i. The collection of these sets, \(\{G_i \}_{i \in I}\), forms a partition of G. We use the equivalence relation \(\sim \) to indicate that two sites reside in the same individual; thus, \(g \sim h\) if and only if \(g,h \in G_i\) for some \(i \in I\).

The total number of sites is denoted \(n :=\left| G\right| \), and the total number of individuals is denoted \(N :=\left| I\right| \). The ploidy of individual \(i \in I\) is denoted \(n_i :=\left| G_i\right| \); for example, \(n_i=2\) if i is diploid. The total number of sites is equal to the total ploidy across all individuals: \(\sum _{i \in I} n_i=n\).
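As an illustration only, the following Python sketch encodes a hypothetical population with the same composition as Fig. 2 (two diploid and three haploid individuals) as a mapping from individuals to their sets of sites. The labels and the dictionary representation are assumptions made for this example, not part of the formalism. The sketch checks that the sets \(G_i\) partition G and computes n, N, the ploidies \(n_i\), and the relation \(\sim \).

```python
# Hypothetical example population: two diploid and three haploid individuals.
# Sites are integers; individuals are string labels (illustrative choice).
partition = {
    "i1": {0, 1},  # diploid: two sites
    "i2": {2, 3},  # diploid
    "i3": {4},     # haploid: one site
    "i4": {5},
    "i5": {6},
}


def site_set(partition):
    """Return G after checking that the sets G_i are nonempty and disjoint."""
    G = set()
    for sites in partition.values():
        if not sites or G & sites:
            raise ValueError("the sets G_i must be nonempty and pairwise disjoint")
        G |= sites
    return G


G = site_set(partition)
n = len(G)                                          # n = |G|
N = len(partition)                                  # N = |I|
ploidy = {i: len(s) for i, s in partition.items()}  # n_i = |G_i|
assert sum(ploidy.values()) == n                    # total ploidy equals n

# owner[g] is the individual containing site g, so g ~ h iff owners agree.
owner = {g: i for i, sites in partition.items() for g in sites}


def same_individual(g, h):
    """The equivalence relation g ~ h."""
    return owner[g] == owner[h]
```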

For a particular model within the class defined here, each individual may be labeled with additional information. For example, each individual may be designated as male or female and/or could be understood as occupying a particular location. However, these details are not explicitly represented in our formalism. In particular, we do not specify any representation of spatial structure (lattice, graph, metapopulation, etc.), although our formalism is compatible with all of these. Instead, all relevant aspects of spatial and genetic structure are implicitly encoded in the replacement rule (see Sect. 2.3 below). The spatial and genetic structure are considered fixed, in the sense that the roles of individuals and genetic sites do not change over time.

2.2 Alleles and states

There are two competing alleles, a and A. Each genetic site holds a single allele copy. The allele currently occupying site \(g \in G\) is indicated by the variable \(x_g \in \left\{ 0,1\right\} \), with 0 corresponding to a and 1 corresponding to A. The overall population state is represented by the vector \({\mathbf {x}}:=\left( x_g\right) _{g \in G}\), which specifies the allele (a or A) occupying each genetic site. The set of all possible states is denoted \(\left\{ 0,1\right\} ^G\).

It will sometimes be convenient to label a state by the subset of sites that contain the A allele. Thus, for any subset \(S \subseteq G\), we let \({{\mathbf {1}}}_S \in \left\{ 0,1\right\} ^G\) denote the state in which sites in S have allele A, and sites not in S have allele a. That is, the state \({{\mathbf {1}}}_S\) is defined by

$$\begin{aligned} \left( {{\mathbf {1}}}_S \right) _g= {\left\{ \begin{array}{ll} 1 &{} g \in S , \\ 0 &{} g \notin S. \end{array}\right. } \end{aligned}$$
(1)

Of particular interest are the monoallelic states \({\mathbf {a}}:={{\mathbf {1}}}_\emptyset \), in which only allele a is present, and \({\mathbf {A}}:={{\mathbf {1}}}_G\), in which only allele A is present.
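A minimal sketch of the state space, assuming (for illustration) that sites are labeled \(0, \ldots , n-1\) so that a state \({\mathbf {x}}\) can be stored as a tuple of 0s and 1s. The function names are hypothetical.

```python
from itertools import product


def indicator_state(S, n):
    """Return the state 1_S for G = {0, ..., n-1}: entry g is 1 (allele A)
    if g is in S and 0 (allele a) otherwise, as in Eq. (1)."""
    return tuple(1 if g in S else 0 for g in range(n))


def all_states(n):
    """Enumerate the state space {0,1}^G, which has 2**n elements."""
    return list(product((0, 1), repeat=n))


n = 3
a_state = indicator_state(set(), n)          # monoallelic state a = 1_(empty set)
A_state = indicator_state(set(range(n)), n)  # monoallelic state A = 1_G
```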

2.3 Replacement

Natural selection proceeds by replacement events, wherein some individuals are replaced by the offspring of others (Figs. 1a, 2). We let \(R\subseteq G\) denote the set of genetic sites that are replaced in such an event. For example, if only a single individual \(i \in I\) dies, then \(R=G_i\). If the entire population is replaced, then \(R=G\).

The alleles in the sites in R are then replaced by alleles in new offspring. Each new offspring inherits (possibly mutated) copies of alleles from its parents. The parentage of new alleles is recorded in a set mapping \(\alpha :R \rightarrow G\). For each replaced site \(g \in R\), \(\alpha \left( g\right) \) indicates the parental site from which g inherits its new allele. In other words, the new allele in g is derived from a parent copy that (in the previous time-step) occupied site \(\alpha \left( g\right) \). In haploid asexual models, \(\alpha \left( g\right) \) simply indicates the parent of the new offspring in g. In models with sexual reproduction, \(\alpha \) identifies not only the parents of each new offspring but also which alleles were inherited from each parent (Fig. 1a).

Overall, a replacement event is represented by the pair \((R, \alpha )\), where \(R \subseteq G\) is the set of replaced sites and \(\alpha :R \rightarrow G\) is the parentage mapping. Any pair \(\left( R, \alpha \right) \) with \(R \subseteq G\) and \(\alpha :R \rightarrow G\) can be considered a potential replacement event. Whether or not a given replacement event is possible in a given state, and how likely it is to occur, depends on the model in question. The probability that a given replacement event \(\left( R,\alpha \right) \) occurs in state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\) is denoted \(p_{\left( R, \alpha \right) }\left( {\mathbf {x}}\right) \); these satisfy \(\sum _{\left( R,\alpha \right) } p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) =1\) for each fixed \({\mathbf {x}}\). The probabilities \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\), as functions of \({\mathbf {x}}\), are collectively called the replacement rule.

All biological events such as births, deaths, mating, dispersal and interaction, and all aspects of spatial and genetic structure, are represented implicitly in the replacement rule. For example, in a model of a diploid population with nonrandom mating, the replacement rule encodes mating probabilities as well as the laws of Mendelian inheritance. In a model of a spatially-structured population with social interactions (see Sect. 9.1), the replacement rule encodes interaction patterns, as well as the effects of interactions on births and deaths. From these biological details, the replacement rule distills what ultimately matters for selection: the transmission and inheritance of alleles.
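To make the notion of a replacement rule concrete, here is a sketch of one hypothetical rule: a haploid Moran birth-death process in a well-mixed population, with one site per individual. It is not a model analyzed in this work; the encoding of an event \((R,\alpha )\) as a tuple of pairs \((h, \alpha (h))\), the fecundity parameter r, and the function name are all assumptions for illustration. For a given state, the function returns the probability of every event that can occur; these sum to one.

```python
def moran_rule(x, r):
    """Hypothetical replacement rule: haploid Moran birth-death process in a
    well-mixed population.

    A parent site g is chosen with probability proportional to its fecundity
    (r if it holds A, 1 if it holds a), and its offspring replaces a uniformly
    chosen other site h. Returns {event: probability}, where an event (R, alpha)
    is encoded as a tuple of (h, alpha(h)) pairs; R is the set of first entries.
    """
    n = len(x)
    fecundity = [r if xg == 1 else 1.0 for xg in x]
    total = sum(fecundity)
    rule = {}
    for g in range(n):
        for h in range(n):
            if h != g:
                rule[((h, g),)] = fecundity[g] / total / (n - 1)
    return rule


probs = moran_rule((1, 0, 0), r=2.0)
assert abs(sum(probs.values()) - 1.0) < 1e-12  # probabilities sum to 1 in each state
```

Here the biology (fecundity-weighted reproduction, uniform dispersal to another site) is visible only through the event probabilities, which is the sense in which the replacement rule encodes the model.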

2.4 Mutation

Each replacement of an allele provides an opportunity for mutation (Fig. 1b). Mutation is described by two parameters: (i) the mutation probability, \(0 \leqslant u \leqslant 1\), which is the probability that a given allele copy in a new offspring is mutated from its parent; and (ii) the mutational bias, \(0< \nu < 1\), which is the probability that such a mutation results in A rather than a.

In each time-step, after the replacement event \(\left( R,\alpha \right) \) has been chosen, mutations are resolved and the new state, \({\mathbf {x}}'\), is determined as follows. For each replaced site \(g \in R\), one of three outcomes occurs:

  • With probability \(1-u\), there is no mutation, and site g inherits the allele of its parent: \(x_g' = x_{\alpha (g)}\),

  • With probability \(u\nu \), a mutation to A occurs, and \(x_g'=1\),

  • With probability \(u(1-\nu )\), a mutation to a occurs, and \(x_g'=0\).

Mutation events are assumed to be independent across replaced sites and across time. Each site that is not replaced retains its current allele: \(x_g'=x_g\) for all \(g \notin R\). In this way, the updated state, \({\mathbf {x}}'\), is determined.
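The mutation step can be sketched as follows. As an assumption for this sketch, a state is a tuple of 0s (a) and 1s (A), and a replacement event is encoded as a tuple of \((h, \alpha (h))\) pairs; the function name and the use of Python's `random` module are illustrative choices. Each replaced site reads its parent's allele from the current state rather than from the partially updated one, because \(\alpha (g)\) refers to the allele held in the previous time-step.

```python
import random


def resolve_mutations(x, event, u, nu, rng=random):
    """Apply a chosen replacement event, resolving mutation at each replaced site.

    x: current state (tuple; 0 = a, 1 = A). event: tuple of (h, alpha(h)) pairs.
    Sites are handled independently: with probability 1 - u, site h copies the
    allele its parent held in the *current* state x; with probability u * nu it
    becomes A; with probability u * (1 - nu) it becomes a. Sites not in R keep
    their alleles.
    """
    new = list(x)
    for h, parent in event:
        if rng.random() < u:                   # a mutation occurs
            new[h] = 1 if rng.random() < nu else 0
        else:                                  # faithful inheritance
            new[h] = x[parent]
    return tuple(new)
```

For example, with `u=0.0` the simultaneous swap `resolve_mutations((1, 0), ((0, 1), (1, 0)), 0.0, 0.5)` returns `(0, 1)`; reading from the updated list instead would wrongly give `(0, 0)`.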

2.5 The evolutionary Markov chain

Overall, from a given state \({\mathbf {x}}\), first a replacement event is chosen according to the probabilities \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\), and then mutations are resolved as described in Sect. 2.4. This update leads to a new state \({\mathbf {x}}'\), and the process then repeats (Fig. 2). This process defines a Markov chain \({\mathcal {M}}\) on \(\left\{ 0,1\right\} ^G\), which we call the evolutionary Markov chain. The evolutionary Markov chain is completely determined by the replacement rule \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\), the mutation rate u, and the mutational bias \(\nu \). We denote the transition probability from state \({\mathbf {x}}\) to state \({\mathbf {y}}\) in \({\mathcal {M}}\) by \(P_{{\mathbf {x}}\rightarrow {\mathbf {y}}}\).

2.6 Fixation axiom

In order for the population to function as a single evolving unit, it should be possible for an allele to sweep to fixation. To state this principle formally, we introduce some new notation. For a given replacement event, \(\left( R, \alpha \right) \), let \({\tilde{\alpha }}: G \rightarrow G\) be the mapping that coincides with \(\alpha \) on elements of R and coincides with the identity otherwise:

$$\begin{aligned} {\tilde{\alpha }}\left( g\right) = {\left\{ \begin{array}{ll} \alpha \left( g\right) &{} g \in R , \\ g &{} g \notin R. \end{array}\right. } \end{aligned}$$
(2)

In words, \({\tilde{\alpha }}\) maps to the parent of each replaced site, and to the site itself for those not replaced.
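A short sketch of Eq. (2), assuming sites \(0, \ldots , n-1\) and events encoded (hypothetically) as tuples of \((h, \alpha (h))\) pairs. In the absence of mutation (\(u=0\)), the updated state is obtained by composing the current state with \({\tilde{\alpha }}\), that is, \(x'_h = x_{{\tilde{\alpha }}(h)}\); the second function illustrates this.

```python
def alpha_tilde(event, n):
    """Return alpha~ as a tuple t over G = {0, ..., n-1}: t[g] = alpha(g) for
    replaced sites g in R, and t[g] = g otherwise (Eq. 2)."""
    t = list(range(n))
    for h, parent in event:
        t[h] = parent
    return tuple(t)


def inherit_without_mutation(x, event):
    """With u = 0, the new state is x composed with alpha~: x'_h = x_{alpha~(h)}."""
    t = alpha_tilde(event, len(x))
    return tuple(x[t[h]] for h in range(len(x)))
```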

We now formalize the notion of population unity as an axiom:

Fixation Axiom

There exists a genetic site \(g \in G\), a positive integer m, and a finite sequence \(\{ (R_k, \alpha _k) \}_{k=1}^m\) of replacement events, such that

  (a) \(p_{(R_k, \alpha _k)}({\mathbf {x}})>0\) for all \(k \in \{1, \ldots , m\}\) and all \({\mathbf {x}}\in \{0,1\}^G\),

  (b) \(g \in R_k\) for some \(k \in \{1, \ldots , m\}\),

  (c) for each \(h \in G\), \({\tilde{\alpha }}_1 \circ {\tilde{\alpha }}_2\circ \cdots \circ {\tilde{\alpha }}_m (h) = g\).

In words, there should be at least one genetic site \(g \in G\) that can eventually spread its contents throughout the population, such that all sites ultimately trace their ancestry back to g. Part (c) already requires every site other than g to be replaced at least once; part (b) additionally requires g itself to be replaced, which guarantees that no site is eternal (otherwise no evolution would occur). The Fixation Axiom ensures that the population evolves as a single unit, rather than (for example) consisting of isolated subpopulations with no gene flow among them. We regard this axiom as a defining property of our class of models.
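The conditions of the Fixation Axiom can be checked mechanically for small examples. The sketch below does so for a hypothetical well-mixed haploid Moran rule on three sites (an illustrative assumption, not a model from this work), with events encoded as tuples of \((h, \alpha (h))\) pairs. Reading \((R_1,\alpha _1), \ldots , (R_m,\alpha _m)\) as events occurring in that order, the ancestor of site h after all m events is \({\tilde{\alpha }}_1 \circ \cdots \circ {\tilde{\alpha }}_m(h)\), so \({\tilde{\alpha }}_m\) is applied first when checking condition (c).

```python
from itertools import product


def moran_rule(x, r=1.0):
    """Hypothetical rule: well-mixed haploid Moran birth-death process."""
    n = len(x)
    fecundity = [r if xg == 1 else 1.0 for xg in x]
    total = sum(fecundity)
    return {((h, g),): fecundity[g] / total / (n - 1)
            for g in range(n) for h in range(n) if h != g}


def alpha_tilde(event, n):
    """alpha~ as a list: parent for replaced sites, identity otherwise."""
    t = list(range(n))
    for h, parent in event:
        t[h] = parent
    return t


def satisfies_fixation_axiom(seq, g, n, rule):
    """Check conditions (a)-(c) for site g and events seq = [(R_1, alpha_1), ...],
    each encoded as a tuple of (h, alpha(h)) pairs; rule(x) returns {event: prob}."""
    # (a) every event in the sequence has positive probability in every state
    for x in product((0, 1), repeat=n):
        probs = rule(x)
        if any(probs.get(event, 0.0) <= 0.0 for event in seq):
            return False
    # (b) site g itself is replaced by at least one event
    if not any(h == g for event in seq for h, _ in event):
        return False
    # (c) alpha~_1 o ... o alpha~_m sends every site to g (alpha~_m acts first)
    maps = [alpha_tilde(event, n) for event in seq]
    for h in range(n):
        site = h
        for t in reversed(maps):
            site = t[site]
        if site != g:
            return False
    return True


# Site 1 copies g = 0, then g is replaced from site 1, then site 2 copies g.
seq = [((1, 0),), ((0, 1),), ((2, 0),)]
print(satisfies_fixation_axiom(seq, 0, 3, moran_rule))  # True
```

In the example sequence, site 0 is replaced by a copy of its own earlier allele (routed through site 1), so condition (b) holds without breaking the lineage of g.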

2.7 Relation to Allen and Tarnita (2014)

Our formalism extends the class of models introduced by Allen and Tarnita (2014), which considered only haploid populations with asexual reproduction, to populations with arbitrary genetic structure. Despite the differences in genetics, the two classes are very similar in their formal structure. Indeed, one can “forget” the partition of genetic sites into individuals and instead consider the population as consisting of haploid asexual replicators. With this perspective, the results of Allen and Tarnita (2014) can be applied at the level of genetic sites rather than individuals.

Beyond genetic structure, our current formalism generalizes that of Allen and Tarnita (2014) in three ways. First, whereas Allen and Tarnita (2014) assumed unbiased mutation, we consider here arbitrary mutational bias, \(0<\nu <1\). Second, Allen and Tarnita (2014) assumed that the total birth rate is constant over states; here this assumption is deferred until Sect. 5.2, by which point we have already established a number of fundamental results. Third, our Fixation Axiom generalizes its analogue in Allen and Tarnita (2014) (there labeled Assumption 2), which required that fixation be possible from every site. Here, we only require fixation to be possible from at least one site. The current formulation allows for “dead end” sites, such as those in sterile worker insects, which were not allowed in the formalism of Allen and Tarnita (2014).

Despite the increase in generality, some proofs from Allen and Tarnita (2014) carry over to the current formalism with little or no modification. We will not repeat proofs from Allen and Tarnita (2014) here unless they need to be modified significantly.

3 Stationarity and fixation

In this section, we establish fundamental results regarding the asymptotic behavior of the evolutionary Markov chain. We also define fixation probability and introduce probability distributions that characterize the frequency with which states arise under natural selection.

3.1 Demographic variables

We first introduce the following variables as functions of the state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\). The frequency of the allele A is denoted by x:

$$\begin{aligned} x :=\frac{1}{n} \sum _{g \in G} x_g. \end{aligned}$$
(3)

The (marginal) probability that the allele in site \(g \in G\) transmits a copy of itself to site \(h \in G\) over the next transition is denoted \(e_{gh}\left( {\mathbf {x}}\right) \):

$$\begin{aligned} e_{gh}\left( {\mathbf {x}}\right) :=\sum _{\begin{array}{c} \left( R, \alpha \right) \\ \alpha \left( h\right) =g \end{array}} p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) . \end{aligned}$$
(4)

The expected number of copies that the allele in g transmits, which we call the birth rate of site g in state \({\mathbf {x}}\), can be calculated as:

$$\begin{aligned} b_g \left( {\mathbf {x}}\right) :=\sum _{h\in G} e_{g h} \left( {\mathbf {x}}\right) = \sum _{\left( R, \alpha \right) } p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \, \left| \alpha ^{-1}\left( g\right) \right| . \end{aligned}$$
(5)

The probability that the allele in g is replaced, which we call the death probability of site g in state \({\mathbf {x}}\), can be calculated as:

$$\begin{aligned} d_g \left( {\mathbf {x}}\right) :=\sum _{h \in G} e_{hg} \left( {\mathbf {x}}\right) = \sum _{\begin{array}{c} \left( R, \alpha \right) \\ g \in R \end{array}} p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) . \end{aligned}$$
(6)

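The Fixation Axiom guarantees that \(d_g \left( {\mathbf {x}}\right) >0\) for all \(g \in G\) and \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\): by parts (b) and (c), every site belongs to the replaced set \(R_k\) of at least one event in the axiom's sequence, and by part (a) each such event has positive probability in every state.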

The total birth rate in state \({\mathbf {x}}\) is denoted \(b\left( {\mathbf {x}}\right) \). Since the population size is fixed, \(b\left( {\mathbf {x}}\right) \) also gives the expected number of deaths:

$$\begin{aligned} b\left( {\mathbf {x}}\right) :=\sum _{g \in G} b_{g} \left( {\mathbf {x}}\right) = \sum _{g \in G} d_{g} \left( {\mathbf {x}}\right) = \sum _{g,h \in G} e_{g h} \left( {\mathbf {x}}\right) . \end{aligned}$$
(7)
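The following sketch computes Eqs. (3)-(7) from a replacement rule evaluated at a single state. The hypothetical well-mixed Moran rule and the encoding of events as tuples of \((h, \alpha (h))\) pairs are assumptions for illustration. Because each event in this particular rule replaces exactly one site, the total birth rate \(b({\mathbf {x}})\) equals 1 in every state.

```python
def moran_rule(x, r):
    """Hypothetical rule: well-mixed haploid Moran birth-death process.
    Events are tuples of (h, alpha(h)) pairs."""
    n = len(x)
    fecundity = [r if xg == 1 else 1.0 for xg in x]
    total = sum(fecundity)
    return {((h, g),): fecundity[g] / total / (n - 1)
            for g in range(n) for h in range(n) if h != g}


def frequency(x):
    """Frequency of allele A, Eq. (3)."""
    return sum(x) / len(x)


def demographic_variables(rule, n):
    """From {event: probability} at one state, return e (e[g][h] = e_gh, Eq. 4),
    birth rates b_g (Eq. 5), death probabilities d_g (Eq. 6), and b (Eq. 7)."""
    e = [[0.0] * n for _ in range(n)]
    for event, p in rule.items():
        for h, g in event:           # g = alpha(h) transmits a copy to h
            e[g][h] += p
    birth = [sum(e[g]) for g in range(n)]                       # row sums
    death = [sum(e[g][h] for g in range(n)) for h in range(n)]  # column sums
    return e, birth, death, sum(birth)


x = (1, 0, 0)
e, birth, death, b = demographic_variables(moran_rule(x, r=2.0), len(x))
# birth == [0.5, 0.25, 0.25], death == [0.25, 0.375, 0.375], b == 1.0
```

The A-bearing site has the highest birth rate and the lowest death probability, as expected when A has the higher fecundity.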

3.2 The mutation-selection stationary distribution

When mutation is present (\(u > 0\)), the evolutionary Markov chain is ergodic (aperiodic and positive recurrent; Theorem 1 of Allen and Tarnita 2014). In this case, the evolutionary Markov chain has a unique stationary distribution called the mutation-selection stationary distribution, or MSS distribution for short. For any state function \(f\left( {\mathbf {x}}\right) \), its time-averaged value converges almost surely, as time goes to infinity, to its expectation under this distribution:

$$\begin{aligned} \lim _{T \rightarrow \infty } \frac{1}{T} \sum _{t=0}^{T-1} f\left( {\mathbf {X}}\left( t\right) \right) = {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ f\right] \quad \text {almost surely}. \end{aligned}$$
(8)

We denote the probability of state \({\mathbf {x}}\) in the MSS distribution by \(\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) :={{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}={\mathbf {x}}\right] \). The MSS distribution is uniquely determined by the system of equations

$$\begin{aligned}&\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) = \sum _{{\mathbf {y}}\in \{0,1\}^G} \pi _{\mathrm {MSS}}\left( {\mathbf {y}}\right) P_{{\mathbf {y}}\rightarrow {\mathbf {x}}} , \end{aligned}$$
(9a)
$$\begin{aligned}&\sum _{{\mathbf {x}}\in \left\{ 0,1\right\} ^G} \pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) = 1. \end{aligned}$$
(9b)
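For a very small model, Eq. (9) can be solved directly. The sketch below builds the full transition matrix of the evolutionary Markov chain for a hypothetical three-site, well-mixed haploid Moran rule (an assumption for illustration), with mutation resolved as in Sect. 2.4, and solves for the MSS distribution by Gauss-Jordan elimination. Because the balance equations (9a) are linearly dependent, one of them is replaced by the normalization (9b).

```python
from itertools import product


def moran_rule(x, r):
    """Hypothetical rule: well-mixed haploid Moran birth-death process."""
    n = len(x)
    fecundity = [r if xg == 1 else 1.0 for xg in x]
    total = sum(fecundity)
    return {((h, g),): fecundity[g] / total / (n - 1)
            for g in range(n) for h in range(n) if h != g}


def transition_matrix(n, r, u, nu):
    """Transition probabilities P[x][y] of the evolutionary Markov chain on {0,1}^n."""
    states = list(product((0, 1), repeat=n))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for x in states:
        for event, p in moran_rule(x, r).items():
            # Sum over the new alleles at the replaced sites (independent sites).
            for alleles in product((0, 1), repeat=len(event)):
                prob, y = p, list(x)
                for (h, parent), allele in zip(event, alleles):
                    inherit = (1 - u) if x[parent] == allele else 0.0
                    mutate = u * (nu if allele == 1 else 1 - nu)
                    prob *= inherit + mutate
                    y[h] = allele
                P[index[x]][index[tuple(y)]] += prob
    return states, P


def stationary_distribution(P):
    """Solve pi = pi P with sum(pi) = 1 by Gauss-Jordan elimination with pivoting."""
    m = len(P)
    # Augmented rows of (P^T - I) pi = 0; the last row becomes sum(pi) = 1.
    A = [[P[j][i] - (1.0 if i == j else 0.0) for j in range(m)] + [0.0]
         for i in range(m)]
    A[-1] = [1.0] * (m + 1)
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(m):
            if row != col:
                factor = A[row][col] / A[col][col]
                A[row] = [a - factor * c for a, c in zip(A[row], A[col])]
    return [A[i][m] / A[i][i] for i in range(m)]


states, P = transition_matrix(n=3, r=2.0, u=0.05, nu=0.5)
pi = stationary_distribution(P)
```

Exact linear solving is used here only because the state space has \(2^3=8\) states; it does not scale to realistic populations, where the time average in Eq. (8) would instead be estimated by simulation.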

3.3 Fixation probability

When there is no mutation (\(u=0\)), the monoallelic states \({\mathbf {a}}\) and \({\mathbf {A}}\) are absorbing, and all other states are transient (Theorem 2 of Allen and Tarnita 2014). Thus, from any initial state, the evolutionary Markov chain converges, almost surely as \(t \rightarrow \infty \), to one of the two monoallelic states. We say that the population has become fixed for allele a if the state converges to \({\mathbf {a}}\), and fixed for allele A if the state converges to \({\mathbf {A}}\).

The fixation probability of an allele is informally defined as the probability that it becomes fixed when starting from a single copy. A precise definition, however, must take into account that the fate of a mutant allele can depend on the site in which it arises (Allen and Tarnita 2014; Maciejewski 2014; Adlam et al. 2015; Allen et al. 2015; Chen et al. 2016). Since each replacement provides an independent opportunity for mutation, new mutations arise in proportion to the rate at which a site is replaced (Allen and Tarnita 2014). Thus, in state \({\mathbf {a}}\), A mutations arise in site g at a rate proportional to \(d_g\left( {\mathbf {a}}\right) \), while in state \({\mathbf {A}}\), a mutations arise in site g at a rate proportional to \(d_g\left( {\mathbf {A}}\right) \). The probability of multiple A mutations arising in state \({\mathbf {a}}\), or multiple a mutations arising in state \({\mathbf {A}}\), is of order \(u^2\) as \(u \rightarrow 0\). We formalize these observations as a lemma:

Lemma 1

$$\begin{aligned} P_{{\mathbf {a}}\rightarrow {\mathbf {x}}}&= {\left\{ \begin{array}{ll} 1 - u\nu \, b\left( {\mathbf {a}}\right) + {\mathcal {O}}\left( u^2\right) &{} \text {if} \; {\mathbf {x}}= {\mathbf {a}}, \\ u\nu \, d_g\left( {\mathbf {a}}\right) + {\mathcal {O}}\left( u^2\right) &{} \text {if} \; {\mathbf {x}}= {{\mathbf {1}}}_{\left\{ g\right\} } \; \text {for some} \; g \in G, \\ {\mathcal {O}}\left( u^2\right) &{} \text {otherwise}; \end{array}\right. } \end{aligned}$$
(10a)
$$\begin{aligned} P_{{\mathbf {A}}\rightarrow {\mathbf {x}}}&= {\left\{ \begin{array}{ll} 1 - u \left( 1-\nu \right) \, b\left( {\mathbf {A}}\right) + {\mathcal {O}}\left( u^2\right) &{} \text {if} \; {\mathbf {x}}= {\mathbf {A}}, \\ u \left( 1-\nu \right) \, d_g\left( {\mathbf {A}}\right) + {\mathcal {O}}\left( u^2\right) &{} \text {if} \; {\mathbf {x}}= {{\mathbf {1}}}_{G \setminus \left\{ g\right\} } \; \text {for some} \; g \in G, \\ {\mathcal {O}}\left( u^2\right) &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(10b)

The proof is a minor variation on the proof of Lemma 3 in Allen and Tarnita (2014) and is therefore omitted.

Lemma 1 motivates the following definitions (from Allen and Tarnita 2014), describing the relative likelihoods of initial states when a mutant first arises under rare mutation:

Definition 1

The mutant appearance distribution for allele A is a probability distribution on \(\left\{ 0,1\right\} ^G\) defined by

$$\begin{aligned} \mu _A \left( {\mathbf {x}}\right) :={\left\{ \begin{array}{ll} \frac{d_g\left( {\mathbf {a}}\right) }{b\left( {\mathbf {a}}\right) } &{} \text {if} \; {\mathbf {x}}= {{\mathbf {1}}}_{\left\{ g\right\} } \; \text {for some} \; g \in G , \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(11)

Similarly, the mutant appearance distribution for allele a is a probability distribution on \(\left\{ 0,1\right\} ^G\) defined by

$$\begin{aligned} \mu _a \left( {\mathbf {x}}\right) :={\left\{ \begin{array}{ll} \frac{d_g\left( {\mathbf {A}}\right) }{b\left( {\mathbf {A}}\right) } &{} \text {if} \; {\mathbf {x}}= {{\mathbf {1}}}_{G \setminus \left\{ g\right\} } \; \text {for some} \; g \in G, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(12)
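To illustrate why the site of origin matters, the sketch below evaluates Eqs. (11) and (12) for a hypothetical birth-death rule on a star (one hub site connected to every leaf site). The rule, the event encoding as \((h, \alpha (h))\) pairs, and the function names are assumptions chosen for illustration, not a model analyzed in this work. With four sites, the hub is replaced with probability 3/4 per step and each leaf with probability 1/12, so new mutants arise mostly at the hub.

```python
def star_rule(x, r):
    """Hypothetical rule: birth-death on a star (site 0 is the hub, sites
    1..n-1 are leaves). A parent is chosen with probability proportional to
    fecundity (r for A, 1 for a); its offspring replaces a uniformly chosen
    neighbor. Events are tuples of (h, alpha(h)) pairs."""
    n = len(x)
    neighbors = {0: list(range(1, n))}
    neighbors.update({leaf: [0] for leaf in range(1, n)})
    fecundity = [r if xg == 1 else 1.0 for xg in x]
    total = sum(fecundity)
    return {((h, g),): fecundity[g] / total / len(neighbors[g])
            for g in range(n) for h in neighbors[g]}


def mutant_appearance(rule_at_monoallelic, n):
    """Return [d_g / b for g in G] at a monoallelic state. At state a this is
    mu_A(1_{g}) (Eq. 11); at state A it is mu_a(1_{G minus g}) (Eq. 12)."""
    death = [0.0] * n
    for event, p in rule_at_monoallelic.items():
        for h, _ in event:
            death[h] += p          # d_h: probability that site h is replaced
    b = sum(death)                 # total birth rate, Eq. (7)
    return [d / b for d in death]


n = 4
mu_A = mutant_appearance(star_rule((0,) * n, r=2.0), n)
mu_a = mutant_appearance(star_rule((1,) * n, r=2.0), n)
```

Because all sites have equal fecundity within a monoallelic state of this rule, \(\mu _A\) and \(\mu _a\) coincide here; in general they need not.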

Taking these mutant appearance distributions into account, Allen and Tarnita (2014) defined the overall fixation probabilities of A and a as follows:

Definition 2

The fixation probability of A, denoted \(\rho _A\), is defined as

$$\begin{aligned} \rho _A :=\sum _{{\mathbf {x}}\in \left\{ 0,1\right\} ^G} \mu _A\left( {\mathbf {x}}\right) \left( \lim _{t \rightarrow \infty } P^{\left( t\right) }_{ {\mathbf {x}}\rightarrow {\mathbf {A}}} \right) . \end{aligned}$$
(13)

Similarly, the fixation probability of a, denoted \(\rho _a\), is defined as

$$\begin{aligned} \rho _a :=\sum _{{\mathbf {x}}\in \left\{ 0,1\right\} ^G} \mu _a\left( {\mathbf {x}}\right) \left( \lim _{t \rightarrow \infty } P^{\left( t\right) }_{ {\mathbf {x}}\rightarrow {\mathbf {a}}} \right) . \end{aligned}$$
(14)

Above, \(P^{\left( t\right) }_{ {\mathbf {x}}\rightarrow {\mathbf {y}}}\) denotes the probability of transition from state \({\mathbf {x}}\) to state \({\mathbf {y}}\) in t steps. The Fixation Axiom guarantees that there is at least one site g for which \(\lim _{t \rightarrow \infty } P^{\left( t\right) }_{ {{\mathbf {1}}}_{\left\{ g\right\} } \rightarrow {\mathbf {A}}}\), \(\lim _{t \rightarrow \infty } P^{\left( t\right) }_{ {{\mathbf {1}}}_{G \setminus \left\{ g\right\} } \rightarrow {\mathbf {a}}}\), \(d_g\left( {\mathbf {a}}\right) \), and \(d_g\left( {\mathbf {A}}\right) \) are all positive. It follows that \(\rho _A\) and \(\rho _a\) are both positive.
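For small models, Definition 2 can be evaluated numerically. The following sketch assumes a hypothetical well-mixed haploid Moran rule with A-fecundity r (an illustrative choice, not a model from this work) and \(u=0\). It approximates the absorption probabilities \(\lim _{t \rightarrow \infty } P^{(t)}_{{\mathbf {x}}\rightarrow {\mathbf {A}}}\) by fixed-point iteration and then weights the initial mutant sites by the mutant appearance distributions. Absorption in \({\mathbf {a}}\) is obtained as the complement, since with \(u=0\) the chain is absorbed in one of the two monoallelic states.

```python
from itertools import product


def moran_rule(x, r):
    """Hypothetical rule: well-mixed haploid Moran birth-death process."""
    n = len(x)
    fecundity = [r if xg == 1 else 1.0 for xg in x]
    total = sum(fecundity)
    return {((h, g),): fecundity[g] / total / (n - 1)
            for g in range(n) for h in range(n) if h != g}


def absorption_into_A(n, r, iterations=2000):
    """With u = 0, approximate lim_t P^(t)_{x -> A} for every state x by iterating
    phi(x) <- sum_y P_{x->y} phi(y), holding phi(a) = 0 and phi(A) = 1 fixed."""
    states = list(product((0, 1), repeat=n))
    a_state, A_state = (0,) * n, (1,) * n
    rules = {x: moran_rule(x, r) for x in states}
    phi = {x: 1.0 if x == A_state else 0.0 for x in states}
    for _ in range(iterations):
        new_phi = dict(phi)
        for x in states:
            if x in (a_state, A_state):
                continue
            total = 0.0
            for event, p in rules[x].items():
                y = list(x)
                for h, parent in event:
                    y[h] = x[parent]       # no mutation: copy the parent's allele
                total += p * phi[tuple(y)]
            new_phi[x] = total
        phi = new_phi
    return phi


def fixation_probabilities(n, r):
    """rho_A and rho_a (Eqs. 13-14), weighting initial sites by mu_A and mu_a."""
    phi = absorption_into_A(n, r)

    def death_probs(x):
        d = [0.0] * n
        for event, p in moran_rule(x, r).items():
            for h, _ in event:
                d[h] += p
        return d

    d_a, d_A = death_probs((0,) * n), death_probs((1,) * n)
    rho_A = sum(d_a[g] / sum(d_a) * phi[tuple(int(h == g) for h in range(n))]
                for g in range(n))
    rho_a = sum(d_A[g] / sum(d_A) * (1 - phi[tuple(int(h != g) for h in range(n))])
                for g in range(n))
    return rho_A, rho_a


rho_A, rho_a = fixation_probabilities(n=3, r=2.0)  # approximately 4/7 and 1/7
```

For this particular rule the ratio of backward to forward transition probabilities of the A count is 1/r in every state, so the results can be checked against the standard Moran formula \(\rho _A = (1-1/r)/(1-r^{-n})\), which gives \(\rho _A = 4/7\) and \(\rho _a = 1/7\) for \(n=3\) and \(r=2\).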

3.4 The limit of rare mutation

We now consider the limit of low mutation for a fixed replacement rule, \(\big \{p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \big \}_{\left( R,\alpha \right) }\), and mutational bias \(\nu \). There is an elegant relationship between the fixation probabilities and the limiting MSS distribution:

Theorem 1

Fix a replacement rule \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\) and a mutational bias \(\nu \). Then for each state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\), \(\lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \) exists and is given by

$$\begin{aligned} \lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) = {\left\{ \begin{array}{ll} \displaystyle \frac{\nu b \left( {\mathbf {a}}\right) \rho _{A}}{ \nu b\left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b \left( {\mathbf {A}}\right) \rho _{a}} &{} \text {for} \; {\mathbf {x}}= {\mathbf {A}}, \\ \displaystyle \frac{\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _{a}}{\nu b \left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b ({\mathbf {A}})\rho _{a}} &{} \text {for} \; {\mathbf {x}}= {\mathbf {a}}, \\ 0 &{} \text {for} \; {\mathbf {x}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} . \end{array}\right. } \end{aligned}$$
(15)

Above, \(\rho _A\) and \(\rho _a\) are the fixation probabilities for this replacement rule when \(u=0\).

Intuitively, Theorem 1 states that as \(u\rightarrow 0\), the MSS distribution becomes concentrated on the monoallelic states \({\mathbf {A}}\) and \({\mathbf {a}}\), with probabilities determined by the relative rates of transit, \(\nu b \left( {\mathbf {a}}\right) \rho _{A}\) and \(\left( 1-\nu \right) b \left( {\mathbf {A}}\right) \rho _{a}\). This result generalizes Theorem 6 of Allen and Tarnita (2014) and a result of Van Cleve (2015), both of which apply to the special case \(\nu =1/2\) and \(b\left( {\mathbf {a}}\right) =b\left( {\mathbf {A}}\right) \).
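Theorem 1 can be checked numerically on a toy model. The sketch below assumes a hypothetical well-mixed birth-death Moran process (fecundity r for A and 1 for a; each offspring mutates with probability u, becoming A with probability \(\nu \)), lumped by the number k of A alleles. Exactly one offspring is produced per time step, so \(b\left( {\mathbf {a}}\right) = b\left( {\mathbf {A}}\right) = 1\), and the fixation probabilities take the classical Moran form. The stationary distribution at a small positive u is compared with the right-hand side of Eq. (15). The model and all function names are assumptions made for illustration.

```python
import numpy as np

def moran_matrix(n, r, u, nu):
    """Hypothetical toy model: well-mixed birth-death Moran process, lumped by
    k = number of A alleles (0..n)."""
    P = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        w = r * k + (n - k)                        # total fecundity
        born_A = (1 - u) * r * k / w + u * nu      # P(new offspring carries A)
        up, down = born_A * (n - k) / n, (1 - born_A) * k / n
        if k < n:
            P[k, k + 1] = up
        if k > 0:
            P[k, k - 1] = down
        P[k, k] = 1.0 - up - down
    return P

def stationary(P):
    """Unique stationary distribution: solve pi (P - I) = 0 with sum(pi) = 1."""
    m = P.shape[0]
    M = P.T - np.eye(m)
    M[-1, :] = 1.0                 # replace one redundant balance equation by normalization
    rhs = np.zeros(m)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

def moran_rho(n, s):
    """Classical fixation probability of one mutant with relative fecundity s."""
    return 1.0 / n if s == 1 else (1 - 1 / s) / (1 - s ** -n)

n, r, nu = 5, 1.5, 0.3
rho_A, rho_a = moran_rho(n, r), moran_rho(n, 1 / r)
# One offspring is produced per step, so b(a) = b(A) = 1 in this model.
limit_A = nu * rho_A / (nu * rho_A + (1 - nu) * rho_a)   # Eq. (15) at x = A
pi = stationary(moran_matrix(n, r, u=1e-6, nu=nu))
print(pi[n], limit_A)       # pi_MSS(A) vs. its u -> 0 limit
print(pi[0], 1 - limit_A)   # pi_MSS(a) vs. its u -> 0 limit
print(pi[1:n].sum())        # mass on non-monoallelic states, which is O(u)
```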

We will prove Theorem 1 using the principle of state space reduction. Let \({\mathcal {A}}\) be a finite Markov chain and let S be a nonempty subset of the states of \({\mathcal {A}}\). (In proving Theorem 1 we will use \({\mathcal {A}}={\mathcal {M}}\) and \(S=\{{\mathbf {a}}, {\mathbf {A}}\}\).) For any states \(s,s' \in S\), let \(Q_{s \rightarrow s'}\) be the probability that, from initial state s, the next visit to S occurs in state \(s'\). We define a reduced Markov chain \({\mathcal {A}}_{|S}\) with set of states S and transition probabilities \(Q_{s \rightarrow s'}\). The following standard result (e.g. Theorem 6.1.1 of Kemeny and Snell 1960) shows that stationary distributions for the original and reduced Markov chains are compatible in a simple way:

Theorem 2

Let \({\mathcal {A}}\) be a finite Markov chain with a unique stationary distribution, \(\pi _{\mathcal {A}}\), and let S be a nonempty subset of states of \({\mathcal {A}}\). Then the reduced Markov chain, \({\mathcal {A}}_{|S}\), has a unique stationary distribution, \(\pi _{{\mathcal {A}}_{|S}}\), which is given by conditioning the stationary distribution \(\pi _{\mathcal {A}}\) on the event S:

$$\begin{aligned} \pi _{{\mathcal {A}}_{|S}}\left( s\right) :=\frac{\pi _{\mathcal {A}}\left( s\right) }{\sum _{s'\in S} \pi _{\mathcal {A}}\left( s'\right) } . \end{aligned}$$
(16)
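The reduced chain in Theorem 2 can be computed in closed form: writing T for the complement of S and summing over excursions through T gives \(Q = P_{SS} + P_{ST}\left( I-P_{TT}\right) ^{-1}P_{TS}\), a standard construction often called the stochastic complement. The sketch below applies it to an arbitrary randomly generated chain (purely illustrative, not a model from this paper) and checks Eq. (16). The function names are hypothetical.

```python
import numpy as np

def stationary(P):
    """Unique stationary distribution: solve pi (P - I) = 0 with sum(pi) = 1."""
    m = P.shape[0]
    M = P.T - np.eye(m)
    M[-1, :] = 1.0
    rhs = np.zeros(m)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

def reduce_chain(P, S):
    """Reduced chain A_{|S}: entry (s, s') is the probability that, started at s,
    the next visit to S occurs at s' (stochastic complement of P on S)."""
    T = [i for i in range(P.shape[0]) if i not in S]
    P_SS, P_ST = P[np.ix_(S, S)], P[np.ix_(S, T)]
    P_TS, P_TT = P[np.ix_(T, S)], P[np.ix_(T, T)]
    # Sum over excursions through T: P_ST (I + P_TT + P_TT^2 + ...) P_TS
    return P_SS + P_ST @ np.linalg.solve(np.eye(len(T)) - P_TT, P_TS)

rng = np.random.default_rng(0)
P = rng.random((6, 6))
P /= P.sum(axis=1, keepdims=True)   # arbitrary strictly positive stochastic matrix
S = [0, 3]
pi = stationary(P)
pi_reduced = stationary(reduce_chain(P, S))
print(pi_reduced)
print(pi[S] / pi[S].sum())          # Eq. (16): conditioning pi on S gives the same vector
```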

Proof of Theorem 1

The limits \(\lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \) exist for each \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\) since each \(\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \) is a bounded, rational function of u (see Lemma 1 of Allen and Tarnita 2014). Since \(\pi _{\mathrm {MSS}}\) satisfies Eq. (9) for each \(u>0\), it also satisfies Eq. (9) in the limit \(u\rightarrow 0\). Therefore, \(\lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\) is a stationary distribution for the mutation-free (\(u=0\)) evolutionary Markov chain, \({\mathcal {M}}\). Since all states other than \({\mathbf {a}}\) and \({\mathbf {A}}\) are transient when \(u=0\), they must have zero probability in any stationary distribution; therefore, \(\lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) =0\) for \({\mathbf {x}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \).

To determine the limiting values of \(\pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) \) and \(\pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) \), we temporarily fix some \(u>0\) and consider the reduction of \({\mathcal {M}}\) to the set of states \(\left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \). By Theorem 2, the reduced Markov chain \({\mathcal {M}}_{|\left\{ {\mathbf {a}}, {\mathbf {A}}\right\} }\) has a unique stationary distribution, \(\pi _{{\mathcal {M}}_{|\left\{ {\mathbf {a}}, {\mathbf {A}}\right\} }}\), satisfying

$$\begin{aligned} \frac{\pi _{{\mathcal {M}}_{| \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} }}\left( {\mathbf {A}}\right) }{\pi _{{\mathcal {M}}_{| \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} }}\left( {\mathbf {a}}\right) } =\frac{\pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) }{\pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) }. \end{aligned}$$
(17)

Let \(Q_{{\mathbf {a}}\rightarrow {\mathbf {A}}}\) and \(Q_{{\mathbf {A}}\rightarrow {\mathbf {a}}}\) denote the transition probabilities in \({\mathcal {M}}_{| \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} }\). Equation (17) and the stationarity of \(\pi _{{\mathcal {M}}_{|\left\{ {\mathbf {a}}, {\mathbf {A}}\right\} }}\) imply that

$$\begin{aligned} \frac{\pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) }{\pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) } = \frac{Q_{{\mathbf {a}}\rightarrow {\mathbf {A}}}}{Q_{{\mathbf {A}}\rightarrow {\mathbf {a}}}} . \end{aligned}$$
(18)

We note that \(Q_{{\mathbf {a}}\rightarrow {\mathbf {A}}}\) equals the probability, in \({\mathcal {M}}\) with initial state \({\mathbf {a}}\), of (i) leaving \({\mathbf {a}}\) in the initial step, and (ii) subsequently visiting \({\mathbf {A}}\) before revisiting \({\mathbf {a}}\). Step (i) occurs with probability \(u \nu b\left( {\mathbf {a}}\right) + {\mathcal {O}}\left( u^2\right) \) as \(u \rightarrow 0\), while step (ii) occurs with probability \(\rho _A + {\mathcal {O}}\left( u\right) \). Thus, overall, we have the expansion

$$\begin{aligned} Q_{{\mathbf {a}}\rightarrow {\mathbf {A}}} = u \nu b\left( {\mathbf {a}}\right) \rho _A + {\mathcal {O}}\left( u^2\right) \quad \left( u \rightarrow 0\right) . \end{aligned}$$
(19)

Similarly, we have

$$\begin{aligned} Q_{{\mathbf {A}}\rightarrow {\mathbf {a}}} = u \left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _a + {\mathcal {O}}\left( u^2\right) \quad \left( u \rightarrow 0\right) . \end{aligned}$$
(20)

Substituting these expansions in Eq. (18) and taking the limit as \(u \rightarrow 0\) yields

$$\begin{aligned} \lim _{u \rightarrow 0} \frac{\pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) }{\pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) } = \frac{\nu b\left( {\mathbf {a}}\right) \rho _A}{\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _a} . \end{aligned}$$
(21)

The desired result now follows from the fact that, since \(\lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) =0\) for \({\mathbf {x}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \), we must have \(\lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) + \lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) = 1\). \(\square \)
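The expansions (19) and (20) can also be checked numerically. The sketch below assumes a hypothetical well-mixed birth-death Moran toy model (fecundity r for A and 1 for a; mutation probability u per offspring, with bias \(\nu \)), lumped by the number of A alleles, for which \(b\left( {\mathbf {a}}\right) = b\left( {\mathbf {A}}\right) = 1\). It computes the reduced chain on \(\left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \) via the stochastic complement and compares \(Q_{{\mathbf {a}}\rightarrow {\mathbf {A}}}/u\) and \(Q_{{\mathbf {A}}\rightarrow {\mathbf {a}}}/u\) with \(\nu \rho _A\) and \(\left( 1-\nu \right) \rho _a\). Function names are illustrative.

```python
import numpy as np

def moran_matrix(n, r, u, nu):
    """Hypothetical toy model: well-mixed birth-death Moran process, lumped by
    k = number of A alleles (0..n)."""
    P = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        w = r * k + (n - k)
        born_A = (1 - u) * r * k / w + u * nu
        up, down = born_A * (n - k) / n, (1 - born_A) * k / n
        if k < n:
            P[k, k + 1] = up
        if k > 0:
            P[k, k - 1] = down
        P[k, k] = 1.0 - up - down
    return P

def reduce_chain(P, S):
    """Stochastic complement: probability that the next visit to S from s is s'."""
    T = [i for i in range(P.shape[0]) if i not in S]
    P_SS, P_ST = P[np.ix_(S, S)], P[np.ix_(S, T)]
    P_TS, P_TT = P[np.ix_(T, S)], P[np.ix_(T, T)]
    return P_SS + P_ST @ np.linalg.solve(np.eye(len(T)) - P_TT, P_TS)

def moran_rho(n, s):
    """Classical fixation probability of one mutant with relative fecundity s."""
    return 1.0 / n if s == 1 else (1 - 1 / s) / (1 - s ** -n)

n, r, nu, u = 5, 1.5, 0.3, 1e-6
Q = reduce_chain(moran_matrix(n, r, u, nu), [0, n])   # rows/columns ordered (a, A)
rho_A, rho_a = moran_rho(n, r), moran_rho(n, 1 / r)
print(Q[0, 1] / u, nu * rho_A)          # Eq. (19) with b(a) = 1
print(Q[1, 0] / u, (1 - nu) * rho_a)    # Eq. (20) with b(A) = 1
```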

Theorem 1 implies that the stationary probabilities \(\pi _{{\mathrm {MSS}}}\left( {\mathbf {x}}\right) \) extend to smooth functions of the mutation rate u on the interval \(0 \leqslant u \leqslant 1\), with the values at \(u=0\) defined according to Eq. (15). As mentioned in the proof, the limiting probabilities in Eq. (15) comprise a stationary distribution for the evolutionary Markov chain with \(u=0\). However, this stationary distribution is not unique—indeed, any probability distribution concentrated entirely on states \({\mathbf {A}}\) and \({\mathbf {a}}\) is stationary for \(u=0\). We can achieve uniqueness at \(u=0\) by augmenting Eq. (9) by an additional equation:

$$\begin{aligned} \pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) = {\left\{ \begin{array}{ll} \displaystyle \frac{Q_{{\mathbf {a}}\rightarrow {\mathbf {A}}}}{Q_{{\mathbf {A}}\rightarrow {\mathbf {a}}}} \pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) &{} 0 < u \leqslant 1 , \\ \displaystyle \frac{\nu b\left( {\mathbf {a}}\right) \rho _A}{\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _a} \pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) &{} u=0. \end{array}\right. } \end{aligned}$$
(22)

The system of linear equations (9) and (22) has a unique solution that varies smoothly with u for \(0 \leqslant u \leqslant 1\), coincides with \(\pi _{{\mathrm {MSS}}}\left( {\mathbf {x}}\right) \) for \(0<u\leqslant 1\), and coincides with the right-hand side of Eq. (15) for \(u=0\). We will make use of these observations in Sect. 7.

Alternatively, Theorem 1 can be proven using Theorem 2 of Fudenberg and Imhof (2006), which implies that as \(u\rightarrow 0\), the vector \(\left( \pi _{{\mathrm {MSS}}}\left( {\mathbf {a}}\right) ,\pi _{{\mathrm {MSS}}}\left( {\mathbf {A}}\right) \right) \) converges to the stationary distribution of the embedded Markov chain on the absorbing states \({\mathbf {a}}\) and \({\mathbf {A}}\). With states ordered as \(\left( {\mathbf {a}}, {\mathbf {A}}\right) \), the transition matrix of this embedded Markov chain is

$$\begin{aligned} \begin{pmatrix} 1-\gamma \nu b\left( {{\mathbf {a}}}\right) \rho _{A} &{}\quad \gamma \nu b\left( {{\mathbf {a}}}\right) \rho _{A} \\ \gamma \left( 1- \nu \right) b\left( {{\mathbf {A}}}\right) \rho _{a} &{}\quad 1-\gamma \left( 1- \nu \right) b\left( {{\mathbf {A}}}\right) \rho _{a} \end{pmatrix} , \end{aligned}$$
(23)

where \(\gamma \) is an arbitrary constant chosen small enough to ensure that this matrix has non-negative entries. The stationary distribution of this embedded Markov chain is independent of \(\gamma \) and consists of the limiting probabilities for \(\pi _{{\mathrm {MSS}}}\left( {\mathbf {A}}\right) \) and \(\pi _{{\mathrm {MSS}}}\left( {\mathbf {a}}\right) \) specified in Eq. (15).
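Reading the first row and column of matrix (23) as state \({\mathbf {a}}\) (the ordering under which it reproduces Eq. (15)), the claim that its stationary distribution does not depend on \(\gamma \) can be checked directly. The parameter values below are arbitrary positive numbers chosen for illustration, not quantities from any particular model, and the function names are hypothetical.

```python
import numpy as np

def embedded_matrix(gamma, nu, b_a, b_A, rho_A, rho_a):
    """Matrix (23), with states ordered (a, A)."""
    p = gamma * nu * b_a * rho_A          # a -> A: an A mutant arises and fixes
    q = gamma * (1 - nu) * b_A * rho_a    # A -> a: an a mutant arises and fixes
    return np.array([[1 - p, p], [q, 1 - q]])

def stationary(P):
    """Unique stationary distribution: solve pi (P - I) = 0 with sum(pi) = 1."""
    m = P.shape[0]
    M = P.T - np.eye(m)
    M[-1, :] = 1.0
    rhs = np.zeros(m)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

params = dict(nu=0.3, b_a=1.0, b_A=2.0, rho_A=0.25, rho_a=1 / 7)  # arbitrary test values
results = [stationary(embedded_matrix(g, **params)) for g in (0.1, 0.01, 0.001)]
for pi in results:
    print(pi)   # the same (pi(a), pi(A)) for every gamma
```

Analytically, the balance condition \(\pi \left( {\mathbf {a}}\right) p = \pi \left( {\mathbf {A}}\right) q\) for a two-state chain shows why: \(\gamma \) multiplies both p and q and cancels.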

3.5 The rare-mutation conditional distribution

According to Theorem 1, as \(u\rightarrow 0\), the mutation-selection stationary distribution becomes concentrated on the monoallelic states, \({\mathbf {a}}\) or \({\mathbf {A}}\). However, since no selection occurs in the monoallelic states, it is important to quantify the frequencies with which other states are visited in transit between them. For this purpose, Allen and Tarnita (2014) introduced the rare-mutation dimorphic (RMD) distribution for haploid models with two alleles. Here, we introduce a natural generalization of this distribution, which we call the rare-mutation conditional (RMC) distribution. We avoid the term “dimorphic” because it can be misleading with non-haploid genetics; for example, the genotypes AA, Aa, and aa could correspond to three different morphologies.

Definition 3

The rare-mutation conditional (RMC) distribution is the probability distribution on \(\left\{ 0,1\right\} ^G \setminus \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \) obtained by conditioning the MSS distribution on being in states other than \({\mathbf {a}}\) and \({\mathbf {A}}\), and then taking the limit \(u \rightarrow 0\):

$$\begin{aligned} \begin{aligned} \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right)&:=\lim _{u \rightarrow 0} {{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}={\mathbf {x}}\ |\ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] \\&= \lim _{u \rightarrow 0} \frac{\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) }{1-\pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) -\pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) }. \end{aligned} \end{aligned}$$
(24)

The existence of the above limit was shown by Allen and Tarnita (2014, Lemma 2).

Allen and Tarnita (2014, Theorem 3) derived a recurrence relation from which the RMC distribution can be computed, in the case of unbiased mutation (\(\nu = 1/2\)). Here, we show that this recurrence relation, and hence the RMC distribution itself, is in fact independent of the mutational bias \(\nu \). Informally speaking, as \(u \rightarrow 0\), the mutational bias only affects the amount of time spent in the monoallelic states \({\mathbf {a}}\) and \({\mathbf {A}}\), which are by definition excluded from the RMC distribution. The RMC distribution depends only on transition probabilities in the absence of mutation and is therefore independent of \(\nu \).

Theorem 3

For any given replacement rule \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\), the RMC distribution is independent of the mutational bias \(\nu \) and is uniquely determined by the recurrence relations

$$\begin{aligned} \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) = \sum _{{\mathbf {y}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} } \pi _{\mathrm {RMC}}\left( {\mathbf {y}}\right) \left( P_{{\mathbf {y}}\rightarrow {\mathbf {x}}} + P_{{\mathbf {y}}\rightarrow {\mathbf {a}}} \mu _A\left( {\mathbf {x}}\right) + P_{{\mathbf {y}}\rightarrow {\mathbf {A}}} \mu _a\left( {\mathbf {x}}\right) \right) , \end{aligned}$$
(25)

where the transition probabilities \(P_{{\mathbf {y}}\rightarrow {{\mathbf {z}}}}\) above are evaluated at \(u=0\).

Proof

Let us temporarily fix a positive mutation rate, \(u>0\), and a mutational bias, \(\nu \). We apply Theorem 2 to reduce \({\mathcal {M}}\) to the set of states \(S :=\left\{ 0,1\right\} ^G \setminus \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \), i.e. those states in which both alleles are present. The reduced Markov chain \({\mathcal {M}}_{|S}\) has a unique stationary distribution, \(\pi _{{\mathcal {M}}_{|S}}\), which satisfies the recurrence relations

$$\begin{aligned} \pi _{{\mathcal {M}}_{|S}} \left( {\mathbf {x}}\right) = \sum _{{\mathbf {y}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} }\pi _{{\mathcal {M}}_{|S}}\left( {\mathbf {y}}\right) \left( P_{{\mathbf {y}}\rightarrow {\mathbf {x}}} + P_{{\mathbf {y}}\rightarrow {\mathbf {a}}} Q_{{\mathbf {a}}\rightarrow {\mathbf {x}}} + P_{{\mathbf {y}}\rightarrow {\mathbf {A}}} Q_{{\mathbf {A}}\rightarrow {\mathbf {x}}} \right) . \end{aligned}$$
(26)

Here, \(Q_{{\mathbf {a}}\rightarrow {\mathbf {x}}}\) is the probability that, from state \({\mathbf {a}}\), the first visit to the set \(\left\{ 0,1\right\} ^G \setminus \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \) occurs in state \({\mathbf {x}}\); \(Q_{{\mathbf {A}}\rightarrow {\mathbf {x}}}\) is defined similarly. By Lemma 1, these probabilities have the low-mutation expansion

$$\begin{aligned} Q_{{\mathbf {a}}\rightarrow {\mathbf {x}}}&= \mu _A\left( {\mathbf {x}}\right) + {\mathcal {O}}\left( u\right) ; \end{aligned}$$
(27a)
$$\begin{aligned} Q_{{\mathbf {A}}\rightarrow {\mathbf {x}}}&= \mu _a\left( {\mathbf {x}}\right) + {\mathcal {O}}\left( u\right) . \end{aligned}$$
(27b)

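By Theorem 2, \(\pi _{{\mathcal {M}}_{|S}}\) is the MSS distribution conditioned on S, so by Definition 3 it converges to \(\pi _{\mathrm {RMC}}\) as \(u \rightarrow 0\). Therefore, taking the \(u \rightarrow 0\) limit of Eq. (26) yields Eq. (25).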

To show that Eq. (25) uniquely defines the RMC distribution, we note that any solution (in \(\left\{ \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \right\} \)) to Eq. (25) is a stationary distribution for a new Markov chain \({\mathcal {M}}_{\mathrm {RMC}}\), with states \(\left\{ 0,1\right\} ^G \setminus \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \) and transition probabilities

$$\begin{aligned} P_{{\mathbf {x}}\rightarrow {\mathbf {y}}}^{\mathrm {RMC}}:=P_{{\mathbf {x}}\rightarrow {\mathbf {y}}} + P_{{\mathbf {x}}\rightarrow {\mathbf {a}}} \mu _A\left( {\mathbf {y}}\right) + P_{{\mathbf {x}}\rightarrow {\mathbf {A}}} \mu _a\left( {\mathbf {y}}\right) . \end{aligned}$$
(28)

Let \({\mathbf {x}}, {\mathbf {y}}\in \left\{ 0,1\right\} ^G \setminus \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \) be any pair of states with \(\mu _a\left( {\mathbf {y}}\right) >0\). Using the Fixation Axiom, one can show that it is possible to reach state \({\mathbf {A}}\) from \({\mathbf {x}}\), in the original Markov chain \({\mathcal {M}}\), by a finite sequence of transitions with nonzero probability. By Eq. (28), the final transition of such a sequence, from some state \({\mathbf {z}}\) into \({\mathbf {A}}\), corresponds in \({\mathcal {M}}_{\mathrm {RMC}}\) to a transition from \({\mathbf {z}}\) to \({\mathbf {y}}\) with probability at least \(P_{{\mathbf {z}}\rightarrow {\mathbf {A}}} \mu _a\left( {\mathbf {y}}\right) >0\). Therefore, it is also possible to reach \({\mathbf {y}}\) from \({\mathbf {x}}\) in \({\mathcal {M}}_{\mathrm {RMC}}\) by a finite sequence of transitions with nonzero probability. Since every state can reach the states in the support of \(\mu _a\), \({\mathcal {M}}_{\mathrm {RMC}}\) has only a single closed communicating class and therefore possesses a unique stationary distribution, determined by Eq. (25).

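Finally, we note that none of the quantities in Eq. (25) depend on the mutational bias \(\nu \): the transition probabilities are evaluated at \(u=0\), where \(\nu \) plays no role, and \(\mu _A\) and \(\mu _a\) are defined without reference to \(\nu \). Thus, the RMC distribution is independent of \(\nu \). \(\square \)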
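Equation (28) translates directly into a computation. The sketch below assumes a hypothetical well-mixed birth-death Moran toy model (fecundity r for A and 1 for a; mutation probability u, bias \(\nu \)) lumped by the number k of A alleles. There, \(\mu _A\) and \(\mu _a\) are point masses at k = 1 and k = n - 1, so building \({\mathcal {M}}_{\mathrm {RMC}}\) amounts to redirecting absorption in \({\mathbf {a}}\) to k = 1 and absorption in \({\mathbf {A}}\) to k = n - 1. The result is compared with the MSS distribution, conditioned on \({\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \), at a small u and several values of \(\nu \), illustrating Theorem 3. Function names are illustrative.

```python
import numpy as np

def moran_matrix(n, r, u, nu):
    """Hypothetical toy model: well-mixed birth-death Moran process, lumped by
    k = number of A alleles (0..n)."""
    P = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        w = r * k + (n - k)
        born_A = (1 - u) * r * k / w + u * nu
        up, down = born_A * (n - k) / n, (1 - born_A) * k / n
        if k < n:
            P[k, k + 1] = up
        if k > 0:
            P[k, k - 1] = down
        P[k, k] = 1.0 - up - down
    return P

def stationary(P):
    m = P.shape[0]
    M = P.T - np.eye(m)
    M[-1, :] = 1.0                 # normalization replaces one redundant equation
    rhs = np.zeros(m)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

def rmc_distribution(n, r):
    """Stationary distribution of M_RMC, Eq. (28), on k = 1, ..., n - 1."""
    P = moran_matrix(n, r, u=0.0, nu=0.5)    # nu has no effect when u = 0
    inner = list(range(1, n))
    P_rmc = P[np.ix_(inner, inner)].copy()
    P_rmc[:, 0] += P[inner, 0]      # absorption in a, restarted from mu_A (k = 1)
    P_rmc[:, -1] += P[inner, n]     # absorption in A, restarted from mu_a (k = n - 1)
    return stationary(P_rmc)

def conditioned_mss(n, r, u, nu):
    """MSS distribution conditioned on k not in {0, n}."""
    pi = stationary(moran_matrix(n, r, u, nu))[1:n]
    return pi / pi.sum()

n, r = 5, 1.5
pi_rmc = rmc_distribution(n, r)
print(pi_rmc)
for nu in (0.1, 0.5, 0.9):
    gap = np.abs(conditioned_mss(n, r, 1e-6, nu) - pi_rmc).max()
    print(nu, gap)                  # small for every mutational bias
```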

The following lemma, which relates the RMC distribution to the u-derivative of the MSS distribution at \(u=0\), is very useful for both proofs and computations:

Lemma 2

For any given replacement rule \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\) and mutational bias \(\nu \), the limit

$$\begin{aligned} K :=\lim _{u \rightarrow 0} \frac{u}{{{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] } \end{aligned}$$
(29)

exists and is finite and positive. Furthermore, if \(\phi \left( {\mathbf {x}}\right) \) is any state function with \(\phi \left( {\mathbf {a}}\right) =\phi \left( {\mathbf {A}}\right) =0\), then

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{{\mathrm {RMC}}}\left[ \phi \right] = K \frac{d {{\mathrm{{\mathbb {E}}}}}_{{\mathrm {MSS}}}\left[ \phi \right] }{du} \Big |_{u=0} . \end{aligned}$$
(30)

Lemma 2 allows expectations under the RMC distribution to be computed, up to the proportionality constant K, from the MSS distribution (which is often easier to analyze). For many purposes, it is not necessary to know the value of K, only that it exists and is positive.

Proof

Summing Eq. (9a) over the states \({\mathbf {x}}\notin \left\{ {\mathbf {a}},{\mathbf {A}}\right\} \), we have

$$\begin{aligned} {{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right]&= \pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) \sum _{{\mathbf {x}}\notin \left\{ {\mathbf {a}},{\mathbf {A}}\right\} } P_{{\mathbf {a}}\rightarrow {\mathbf {x}}} + \pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) \sum _{{\mathbf {x}}\notin \left\{ {\mathbf {a}},{\mathbf {A}}\right\} } P_{{\mathbf {A}}\rightarrow {\mathbf {x}}} \nonumber \\&\quad + \sum _{{\mathbf {x}},{\mathbf {y}}\notin \left\{ {\mathbf {a}},{\mathbf {A}}\right\} }\pi _{\mathrm {MSS}}\left( {\mathbf {y}}\right) P_{{\mathbf {y}}\rightarrow {\mathbf {x}}} . \end{aligned}$$
(31)

Applying Theorem 1 and Lemma 1 to the first two terms on the right-hand side, we obtain the following expansion as \(u \rightarrow 0\):

$$\begin{aligned} {{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right]&= u \frac{ \nu \left( 1-\nu \right) b\left( {\mathbf {a}}\right) b\left( {\mathbf {A}}\right) \left( \rho _a + \rho _A\right) }{ \nu b\left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _{a}} \nonumber \\&\quad + \sum _{{\mathbf {x}},{\mathbf {y}}\notin \left\{ {\mathbf {a}},{\mathbf {A}}\right\} }\pi _{\mathrm {MSS}}\left( {\mathbf {y}}\right) P_{{\mathbf {y}}\rightarrow {\mathbf {x}}} + {\mathcal {O}}\left( u^2\right) . \end{aligned}$$
(32)

Dividing by u and taking \(u \rightarrow 0\), we have

$$\begin{aligned} \lim _{u \rightarrow 0} \frac{1}{u} {{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right]&= \frac{ \nu \left( 1-\nu \right) b\left( {\mathbf {a}}\right) b\left( {\mathbf {A}}\right) \left( \rho _a + \rho _A\right) }{ \nu b\left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _{a}} \nonumber \\&\quad + \lim _{u \rightarrow 0} \left( \frac{1}{u} \sum _{{\mathbf {x}},{\mathbf {y}}\notin \left\{ {\mathbf {a}},{\mathbf {A}}\right\} } \pi _{\mathrm {MSS}}\left( {\mathbf {y}}\right) P_{{\mathbf {y}}\rightarrow {\mathbf {x}}} \right) . \end{aligned}$$
(33)

Since \(\nu \), \(1-\nu \), \(b\left( {\mathbf {a}}\right) \), \(b\left( {\mathbf {A}}\right) \), \(\rho _a\), and \(\rho _A\) are all positive, the first term on the right-hand side of Eq. (33) is positive. The limit in the second term exists and is finite since \(\pi _{\mathrm {MSS}}\left( {\mathbf {y}}\right) \) and \(P_{{\mathbf {y}}\rightarrow {\mathbf {x}}}\) are both rational functions of u and \(\lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {y}}\right) =0\) for \({\mathbf {y}}\notin \left\{ {\mathbf {a}},{\mathbf {A}}\right\} \). This limit is also nonnegative, since \(\pi _{\mathrm {MSS}}\left( {\mathbf {y}}\right) \) and \(P_{{\mathbf {y}}\rightarrow {\mathbf {x}}}\) are both nonnegative. Therefore, the limit on the left-hand side of Eq. (33) exists and is finite and positive; consequently, the limit in Eq. (29), which is its reciprocal, exists and is finite and positive as well.

For the second claim, we have

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ \phi \right]&= \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \phi \left( {\mathbf {X}}\right) \ |\ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] \nonumber \\&= \lim _{u \rightarrow 0} \frac{{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \phi \right] }{{{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] } \quad \text {(since} \; \phi \left( {\mathbf {A}}\right) =\phi \left( {\mathbf {a}}\right) =0) \nonumber \\&= \left( \lim _{u \rightarrow 0} \frac{u}{{{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] } \right) \left( \lim _{u \rightarrow 0} \frac{{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \phi \right] }{u} \right) \nonumber \\&= K \frac{d {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \phi \right] }{du} \Big |_{u=0} , \end{aligned}$$
(34)

which completes the proof. \(\square \)
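Lemma 2 can be illustrated on a hypothetical toy model: a well-mixed birth-death Moran process lumped by the number k of A alleles (fecundity r for A and 1 for a, mutation probability u, bias \(\nu \)). The sketch estimates K from Eq. (29) at a small u, estimates the derivative in Eq. (30) by Richardson extrapolation of \(\mathbb {E}_{\mathrm {MSS}}[\phi ]/u\) (valid because \(\mathbb {E}_{\mathrm {MSS}}[\phi ]\) vanishes at u = 0), and compares the product with \(\mathbb {E}_{\mathrm {RMC}}[\phi ]\) computed from Eq. (28). The state function \(\phi (k) = k(n-k)\) is an arbitrary choice satisfying \(\phi ({\mathbf {a}}) = \phi ({\mathbf {A}}) = 0\); all names are illustrative.

```python
import numpy as np

def moran_matrix(n, r, u, nu):
    """Hypothetical toy model: well-mixed birth-death Moran process, lumped by
    k = number of A alleles (0..n)."""
    P = np.zeros((n + 1, n + 1))
    for k in range(n + 1):
        w = r * k + (n - k)
        born_A = (1 - u) * r * k / w + u * nu
        up, down = born_A * (n - k) / n, (1 - born_A) * k / n
        if k < n:
            P[k, k + 1] = up
        if k > 0:
            P[k, k - 1] = down
        P[k, k] = 1.0 - up - down
    return P

def stationary(P):
    m = P.shape[0]
    M = P.T - np.eye(m)
    M[-1, :] = 1.0
    rhs = np.zeros(m)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

def rmc_distribution(n, r):
    """Stationary distribution of M_RMC, Eq. (28), on k = 1, ..., n - 1."""
    P = moran_matrix(n, r, u=0.0, nu=0.5)
    inner = list(range(1, n))
    P_rmc = P[np.ix_(inner, inner)].copy()
    P_rmc[:, 0] += P[inner, 0]      # restart from mu_A (k = 1)
    P_rmc[:, -1] += P[inner, n]     # restart from mu_a (k = n - 1)
    return stationary(P_rmc)

n, r, nu = 5, 1.5, 0.3
k = np.arange(n + 1)
phi = (k * (n - k)).astype(float)   # hypothetical state function, zero at k = 0 and k = n

def mss(u):
    return stationary(moran_matrix(n, r, u, nu))

# Eq. (29): u / P_MSS[X not in {a, A}] settles to a positive constant K as u -> 0
for u in (1e-3, 1e-4, 1e-5, 1e-6):
    print(u, u / mss(u)[1:n].sum())
K = 1e-6 / mss(1e-6)[1:n].sum()

# dE_MSS[phi]/du at u = 0 by Richardson extrapolation of E(h)/h, using E(0) = 0
h = 1e-4
dE = 2 * (mss(h) @ phi) / h - (mss(2 * h) @ phi) / (2 * h)

E_rmc = rmc_distribution(n, r) @ phi[1:n]
print(E_rmc, K * dE)                # Eq. (30): the two agree up to small numerical error
```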

4 Selection

We turn now to the question of how selection acts on the two competing alleles, a and A. We can ask this question on two different time-scales. In the short term, we can look at how natural selection acts to change allele frequencies from a given state. In the longer term, we can look at the fixation probabilities of each allele, or at their stationary frequencies under mutation-selection balance. These notions lead to different criteria for evaluating the success of an allele under natural selection. In this section, we define these criteria and prove (Theorem 4) that they become equivalent in the limit of low mutation when averaged over the RMC distribution.

4.1 Change due to selection

To address questions of short-term selection, we consider an evolutionary process in a given state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\). We let \({\Delta }\left( {\mathbf {x}}\right) \) denote the expected change in the absolute frequency of A (i.e. the number of A alleles) from state \({\mathbf {x}}\), over a single transition:

$$\begin{aligned} {\Delta }\left( {\mathbf {x}}\right) :=\left( 1-u\right) \sum _{g \in G} x_g b_g\left( {\mathbf {x}}\right) - \sum _{g \in G} x_g d_g\left( {\mathbf {x}}\right) + u \nu b\left( {\mathbf {x}}\right) . \end{aligned}$$
(35)

We use absolute (rather than relative) frequency in the definition of \({\Delta }\left( {\mathbf {x}}\right) \) to avoid tedious factors of \(1/n\). The three terms on the right-hand side represent the respective contributions of faithful reproduction, death, and mutation. Collecting the terms involving u, we have

$$\begin{aligned} {\Delta }\left( {\mathbf {x}}\right) = \sum _{g \in G} x_g \left( b_g\left( {\mathbf {x}}\right) - d_g\left( {\mathbf {x}}\right) \right) + u \sum _{g \in G} \left( \nu - x_g\right) b_g\left( {\mathbf {x}}\right) . \end{aligned}$$
(36)

Equation (36) can be understood as a version of the Price (1970) equation (but see van Veelen 2005). The two terms on the right-hand side of Eq. (36) represent the effects of selection and mutation, respectively, which motivates the following definitions (Nowak et al. 2010b):

Definition 4

The expected change due to selection from state \({\mathbf {x}}\) is defined as

$$\begin{aligned} {\Delta }_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) :=\sum _{g \in G}x_g \left( b_g\left( {\mathbf {x}}\right) - d_g\left( {\mathbf {x}}\right) \right) . \end{aligned}$$
(37)

The expected change due to mutation from state \({\mathbf {x}}\) is defined as

$$\begin{aligned} {\Delta }_{{\mathrm {mut}}}\left( {\mathbf {x}}\right) :=u \sum _{g \in G} \left( \nu - x_g\right) b_g\left( {\mathbf {x}}\right) . \end{aligned}$$
(38)

With the above definitions, Eq. (36) can be restated as \({\Delta }\left( {\mathbf {x}}\right) = {\Delta }_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) + {\Delta }_{{\mathrm {mut}}}\left( {\mathbf {x}}\right) \).
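The bookkeeping in Eqs. (35) to (38) can be checked for a single state. In the sketch below, the per-site birth rates \(b_g\left( {\mathbf {x}}\right) \) and death probabilities \(d_g\left( {\mathbf {x}}\right) \) are arbitrary random numbers (a hypothetical input that need not arise from any replacement rule), which suffices because \({\Delta } = {\Delta }_{\mathrm {sel}} + {\Delta }_{\mathrm {mut}}\) is a purely algebraic identity. The function name is illustrative.

```python
import numpy as np

def delta_terms(x, b, d, u, nu):
    """Return (Delta, Delta_sel, Delta_mut) from Eqs. (35), (37), (38).

    x: 0/1 indicators of allele A at each site; b, d: per-site values of
    b_g(x) and d_g(x) in that state."""
    delta = (1 - u) * x @ b - x @ d + u * nu * b.sum()   # Eq. (35)
    delta_sel = x @ (b - d)                              # Eq. (37)
    delta_mut = u * (nu - x) @ b                         # Eq. (38)
    return delta, delta_sel, delta_mut

rng = np.random.default_rng(1)
x = rng.integers(0, 2, size=8).astype(float)   # a random state on 8 sites
b, d = rng.random(8), rng.random(8)             # hypothetical per-site rates
delta, delta_sel, delta_mut = delta_terms(x, b, d, u=0.01, nu=0.3)
print(delta, delta_sel + delta_mut)             # Eq. (36): the two agree
```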

4.2 Equivalence of success criteria

How does one judge which of the two competing alleles, a and A, is favored by selection? There are a number of reasonable criteria to use:

  • In a given state \({\mathbf {x}}\), we could say that A is favored if \({\Delta }_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) >0\).

  • For an evolutionary process with no mutation, we could say that A is favored if it has larger fixation probability; that is, if \(\rho _A > \rho _a\).

  • For an evolutionary process with mutation, we could say that A is favored if its stationary frequency, in the limit \(u \rightarrow 0\), is greater than would be expected from mutation alone; the latter quantity can be obtained by setting \(\rho _A=\rho _a\) in Eq. (15). This leads to the success criterion

    $$\begin{aligned} \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] > \frac{\nu b\left( {\mathbf {a}}\right) }{\nu b\left( {\mathbf {a}}\right) + \left( 1-\nu \right) b\left( {\mathbf {A}}\right) }. \end{aligned}$$
    (39)

    In the case that the overall birth rates coincide in the two monoallelic states, \(b\left( {\mathbf {a}}\right) =b\left( {\mathbf {A}}\right) \), this criterion reduces to \(\lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] > \nu \).

Our first main result shows that these success criteria become equivalent in the limit \(u \rightarrow 0\), when \({\Delta }_{{\mathrm {sel}}}\) is averaged over the RMC distribution. Alternatively, one may average \({\Delta }_{{\mathrm {sel}}}\) over the MSS distribution and take the u-derivative at \(u=0\).

Theorem 4

For any replacement rule, \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\), and any mutational bias, \(\nu \), the following success criteria are equivalent:

  (a) \(\rho _A > \rho _a\);

  (b) \(\displaystyle \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] > \frac{\nu b\left( {\mathbf {a}}\right) }{\nu b\left( {\mathbf {a}}\right) + \left( 1-\nu \right) b\left( {\mathbf {A}}\right) }\);

  (c) \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ {\Delta }_{{\mathrm {sel}}}\right] >0\);

  (d) \(\frac{d}{du}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ {\Delta }_{{\mathrm {sel}}}\right] \big |_{u=0} > 0\).

The equivalence of (b) and (d), in the case that \(b({\mathbf {a}})=b({\mathbf {A}})\), was previously shown by Tarnita and Taylor (2014). Under the further assumption that \(\nu =1/2\), Nowak et al. (2010b, Corollary 1 of Appendix A) showed the equivalence of (b) and (d); Allen and Tarnita (2014, Theorem 6 and Corollary 2) showed the equivalence of (a), (b), and (c); and Van Cleve (2015) showed the equivalence of (a), (b), and a variant of (c). Special cases of this result for particular models were also proven by Rousset and Billiard (2000) and Taylor et al. (2007a).

Proof

We begin by assuming a fixed \(u>0\) and rewriting Eq. (36) as

$$\begin{aligned} {\Delta }\left( {\mathbf {x}}\right) = {\Delta }_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) - u \sum _{g \in G} \left( x_g-\nu \right) b_g\left( {\mathbf {x}}\right) . \end{aligned}$$
(40)

We now take the expectation of both sides under the MSS distribution. The left-hand side vanishes since the expected change in any quantity is zero when averaged over a stationary distribution. We therefore have

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ {\Delta }_{{\mathrm {sel}}}\right] = u {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \sum _{g \in G} \left( x_g-\nu \right) b_g \right] . \end{aligned}$$
(41)

We also observe from Eq. (37) that \({\Delta }_{{\mathrm {sel}}}\left( {\mathbf {a}}\right) ={\Delta }_{{\mathrm {sel}}}\left( {\mathbf {A}}\right) =0\). Applying Lemma 2, we have

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ {\Delta }_{{\mathrm {sel}}}\right] = K \frac{d {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ {\Delta }_{{\mathrm {sel}}}\right] }{du} \Big |_{u=0} = K \lim _{u \rightarrow 0}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \sum _{g \in G} \left( x_g-\nu \right) b_g \right] , \end{aligned}$$
(42)

with \(K>0\). Theorem 1 now gives

$$\begin{aligned} \lim _{u \rightarrow 0}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \sum _{g \in G} \left( x_g-\nu \right) b_g \right]&= \left( 1-\nu \right) b\left( {\mathbf {A}}\right) \lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) \nonumber \\&\quad - \nu b\left( {\mathbf {a}}\right) \lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) \nonumber \\&= \frac{\nu \left( 1-\nu \right) b\left( {\mathbf {a}}\right) b\left( {\mathbf {A}}\right) }{ \nu b\left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b \left( {\mathbf {A}}\right) \rho _{a}} \left( \rho _A - \rho _a\right) . \end{aligned}$$
(43)

The coefficient of \(\rho _A - \rho _a\) above is positive; thus \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ {\Delta }_{{\mathrm {sel}}}\right] \), \(\frac{d}{du}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ {\Delta }_{{\mathrm {sel}}}\right] \big |_{u=0}\), and \(\rho _A - \rho _a\) have the same sign. This proves \(\text {(a)} \Leftrightarrow \text {(c)} \Leftrightarrow \text {(d)}\). For (b), we write

$$\begin{aligned}&\lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] - \frac{\nu b\left( {\mathbf {a}}\right) }{\nu b\left( {\mathbf {a}}\right) + \left( 1-\nu \right) b\left( {\mathbf {A}}\right) } \nonumber \\&\quad = \lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) -\frac{\nu b\left( {\mathbf {a}}\right) }{\nu b\left( {\mathbf {a}}\right) + \left( 1-\nu \right) b\left( {\mathbf {A}}\right) } \nonumber \\&\quad = \frac{\nu \left( 1-\nu \right) b\left( {\mathbf {a}}\right) b\left( {\mathbf {A}}\right) }{ \left( \nu b\left( {\mathbf {a}}\right) +\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \right) \left( \nu b\left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _{a}\right) } \left( \rho _{A} - \rho _{a} \right) , \end{aligned}$$
(44)

by Theorem 1. The last line above has the sign of \(\rho _A - \rho _a\), thus \(\text {(a)}\Leftrightarrow \text {(b)}\). \(\square \)
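The algebra in Eqs. (43) and (44) can be verified with exact rational arithmetic. The sketch below uses Python's fractions module; the inputs are arbitrary positive rationals standing in for \(\nu \), \(b\left( {\mathbf {a}}\right) \), \(b\left( {\mathbf {A}}\right) \), \(\rho _A\), and \(\rho _a\), not values from any model. It also confirms that both quantities have the sign of \(\rho _A - \rho _a\). The function name is illustrative.

```python
from fractions import Fraction as F

def theorem4_quantities(nu, b_a, b_A, rho_A, rho_a):
    """Exact u -> 0 quantities from Eqs. (15), (43), (44); inputs are positive rationals."""
    D = nu * b_a * rho_A + (1 - nu) * b_A * rho_a
    pi_A = nu * b_a * rho_A / D                  # Eq. (15), x = A
    pi_a = (1 - nu) * b_A * rho_a / D            # Eq. (15), x = a
    # Left and right sides of Eq. (43)
    lhs43 = (1 - nu) * b_A * pi_A - nu * b_a * pi_a
    rhs43 = nu * (1 - nu) * b_a * b_A / D * (rho_A - rho_a)
    # Left and right sides of Eq. (44): criterion (b) minus its threshold
    lhs44 = pi_A - nu * b_a / (nu * b_a + (1 - nu) * b_A)
    rhs44 = (nu * (1 - nu) * b_a * b_A
             / ((nu * b_a + (1 - nu) * b_A) * D) * (rho_A - rho_a))
    return lhs43, rhs43, lhs44, rhs44

cases = [(F(3, 10), F(1), F(2), F(1, 4), F(1, 7)),     # rho_A > rho_a
         (F(1, 2), F(3), F(1), F(1, 9), F(2, 5)),      # rho_A < rho_a
         (F(9, 10), F(5, 4), F(5, 4), F(1, 3), F(1, 3))]  # rho_A = rho_a
for case in cases:
    lhs43, rhs43, lhs44, rhs44 = theorem4_quantities(*case)
    rho_A, rho_a = case[3], case[4]
    print(lhs43 == rhs43, lhs44 == rhs44, (lhs44 > 0) == (rho_A > rho_a))
```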

If the conditions of Theorem 4 are satisfied, we say that allele A is favored by selection. An interesting consequence of Theorem 4 is that Condition (b) is independent of the mutational bias, \(\nu \); either it holds for all values of \(\nu \) or else for none of them. We can understand this result to say that, when mutation is vanishingly rare, mutational bias does not affect the direction of selection.

Theorem 4 is our most general equivalence result. It shows that four reasonable measures of selection coincide with each other when mutation is rare. However, for many models of interest, none of the four conditions is analytically or computationally tractable (Ibsen-Jensen et al. 2015). In what follows, we introduce additional assumptions that allow us to define important notions such as reproductive value and fitness, and to obtain conditions that are more tractable than those in Theorem 4.

5 Reproductive value and fitness

Reproductive value and fitness are ubiquitous concepts in evolutionary theory. Both quantify the expected reproductive success of an individual or a genetic site. Fitness, which is used to quantify selection, takes into account the alleles present in the population. In contrast, reproductive value quantifies reproductive success in the absence of selection and is therefore independent of the alleles in the population. Both fitness and reproductive value may depend on other factors such as age, sex, caste, and spatial location.

In this section, we define both notions for our class of models. First, however, we must introduce an additional assumption regarding the consistency of replacement in the monoallelic states \({\mathbf {a}}\) and \({\mathbf {A}}\).

5.1 Consistency of monoallelic states

Since selection does not occur in the monoallelic states \({\mathbf {a}}\) and \({\mathbf {A}}\), the capacity of a site to reproduce (i.e. its reproductive value) is ascribable to the site itself, not to the alleles in the population. It is therefore reasonable to define reproductive value with respect to states \({\mathbf {a}}\) and \({\mathbf {A}}\). To obtain a consistent definition, we require that the probabilities of replacement events coincide in these states:

Assumption 1

(Consistency of monoallelic states) Each replacement event \(\left( R,\alpha \right) \) has the same probability in state \({\mathbf {A}}\) as in state \({\mathbf {a}}\): \(p_{\left( R,\alpha \right) }\left( {\mathbf {A}}\right) = p_{\left( R,\alpha \right) }\left( {\mathbf {a}}\right) \).

Assumption 1 does not necessarily hold for every plausible model of natural selection. For example, if allele A has a positive fitness effect in some sites and a negative fitness effect in others (relative to allele a), then the patterns of replacement in state \({\mathbf {A}}\) are likely to differ from those in state \({\mathbf {a}}\). Importantly, if Assumption 1 is not satisfied, there may not be a consistent way to define reproductive values. That is, one may obtain two different sets of reproductive values, one for state \({\mathbf {a}}\) and one for state \({\mathbf {A}}\), with no obvious way of reconciling them.

With Assumption 1 in force, we denote the probability of a replacement event \((R,\alpha )\) in state \({\mathbf {a}}\) or \({\mathbf {A}}\) by \(p^\circ _{\left( R,\alpha \right) } :=p_{\left( R,\alpha \right) }\left( {\mathbf {A}}\right) = p_{\left( R,\alpha \right) }\left( {\mathbf {a}}\right) \). We will also use the superscript \({}^\circ \) to denote the following other quantities in states \({\mathbf {a}}\) or \({\mathbf {A}}\):

$$\begin{aligned} e_{gh}^\circ&:=e_{gh}\left( {\mathbf {a}}\right) = e_{gh}\left( {\mathbf {A}}\right) = \sum _{\begin{array}{c} \left( R, \alpha \right) \\ \alpha \left( h\right) =g \end{array}} p^\circ _{\left( R, \alpha \right) } ; \end{aligned}$$
(45a)
$$\begin{aligned} b_g^\circ&:=b_g\left( {\mathbf {a}}\right) = b_g\left( {\mathbf {A}}\right) = \sum _{h\in G} e_{gh}^\circ ; \end{aligned}$$
(45b)
$$\begin{aligned} d_g^\circ&:=d_g\left( {\mathbf {a}}\right) = d_g\left( {\mathbf {A}}\right) = \sum _{h \in G} e_{hg}^\circ . \end{aligned}$$
(45c)
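As a concrete illustration of Eq. (45), the sketch below tabulates \(e^\circ_{gh}\), \(b^\circ_g\) and \(d^\circ_g\) from an explicit list of replacement events. The event encoding (a triple of replaced set \(R\), parent map \(\alpha \), and probability) and the example rule, death-Birth updating on a three-node star with hub 0, are hypothetical choices made for illustration.

```python
def replacement_weights(events, n):
    """Tabulate e[g][h], b[g], d[g] (Eqs. 45a-45c) from replacement events.

    Each event is (R, alpha, p): R is the set of replaced sites, alpha maps
    each h in R to its parent site alpha[h], and p is the event's probability
    in the monoallelic states (Assumption 1).
    """
    e = [[0.0] * n for _ in range(n)]
    for R, alpha, p in events:
        for h in R:
            e[alpha[h]][h] += p      # h is replaced by an offspring of alpha[h]
    b = [sum(e[g][h] for h in range(n)) for g in range(n)]   # Eq. (45b)
    d = [sum(e[h][g] for h in range(n)) for g in range(n)]   # Eq. (45c)
    return e, b, d

# Hypothetical rule: death-Birth updating on a star with hub 0 and leaves 1, 2.
# A uniformly chosen site dies; a uniformly chosen neighbour fills the vacancy.
STAR_EVENTS = [
    ({0}, {0: 1}, 1 / 6), ({0}, {0: 2}, 1 / 6),   # the hub dies
    ({1}, {1: 0}, 1 / 3), ({2}, {2: 0}, 1 / 3),   # a leaf dies
]
e, b, d = replacement_weights(STAR_EVENTS, 3)
```

Here the hub has \(b^\circ_0 = 2/3\) and each leaf \(b^\circ_g = 1/6\), while every site is replaced with probability \(d^\circ_g = 1/3\); the totals agree because each event replaces exactly one site.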

5.2 Reproductive value

Reproductive value (Fisher 1930; Taylor 1990; Maciejewski 2014) quantifies the expected contribution of an individual or a genetic site to the future gene pool of the population in the absence of selection, depending on factors such as age, sex, location, and caste. Reproductive values indicate the relative importance of different individuals to the process of natural selection. For example, a sterile worker in an insect colony has zero reproductive value, since it has no opportunity to transmit its genetic material.

We will first define reproductive value on the level of genetic sites, and later (in Sect. 8) extend to individuals. The reproductive value \(v_g\) of a genetic site \(g \in G\) is defined as follows:

Definition 5

For a replacement rule satisfying Assumption 1, the reproductive values \(\left\{ v_g\right\} _{g \in G}\) are defined as the unique solution to the system of equations

$$\begin{aligned} d_g^\circ v_g = \sum _{h \in G} e_{gh}^\circ v_h&\quad \text {for all} \; g \in G ; \end{aligned}$$
(46a)
$$\begin{aligned} \sum _{g \in G} v_g = n&. \end{aligned}$$
(46b)

Equation (46a) can be understood as saying that, for an allele occupying site \(g \in G\), under one transition in either of the monoallelic states, the expected loss of reproductive value due to death, \(d_g^\circ v_g\), is balanced by the expected reproductive value of new copies produced, \(\sum _{h \in G} e_{gh}^\circ v_h\). The normalization in Eq. (46b) is arbitrary; it is chosen so that the reproductive values of the n sites average to one.

We prove in Proposition 1 below that reproductive values are uniquely defined by Eq. (46) and are nonnegative. We show in Sect. 6 below that, under neutral drift, the reproductive value of a site g is proportional to the probability that a mutation arising at site g becomes fixed.
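Equation (46) is a linear system, so for a small rule it can be solved directly. A minimal Python sketch, assuming NumPy and taking the matrix \(e^\circ_{gh}\) as input; the two example matrices (death-Birth updating on a three-node star with hub 0, and a rule in which site 2 is replaced but never reproduces) are hypothetical.

```python
import numpy as np

def reproductive_values(e):
    """Solve Eq. (46): d_g v_g = sum_h e_gh v_h for every g, and sum_g v_g = n.

    e[g, h] is e_gh in the monoallelic states. The n balance equations are
    stacked with the normalization row and solved by least squares; by
    Proposition 1 the stacked system has exactly one solution, so the
    residual vanishes up to rounding.
    """
    e = np.asarray(e, dtype=float)
    n = e.shape[0]
    d = e.sum(axis=0)                       # d_g = sum_h e_hg, Eq. (45c)
    balance = np.diag(d) - e                # row g: d_g v_g - sum_h e_gh v_h
    system = np.vstack([balance, np.ones(n)])
    target = np.append(np.zeros(n), float(n))
    v, *_ = np.linalg.lstsq(system, target, rcond=None)
    return v

# Hypothetical death-Birth star (hub 0): the hub's offspring fill leaf vacancies.
star = np.array([[0.0, 1/3, 1/3],
                 [1/6, 0.0, 0.0],
                 [1/6, 0.0, 0.0]])
# Hypothetical rule in which site 2 is replaced but never reproduces.
sterile = np.array([[0.0, 0.6, 0.2],
                    [0.2, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])
print(reproductive_values(star))      # approximately [1.5, 0.75, 0.75]
print(reproductive_values(sterile))   # approximately [2.25, 0.75, 0.0]
```

The star's hub has twice the reproductive value of a leaf, and the site that never reproduces gets \(v_2 = 0\), in line with the sterile-worker example above.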

Reproductive value (RV) provides a natural weighting for genetic sites when computing quantities related to natural selection. We indicate RV-weighted quantities with a hat; for example, the RV-weighted frequency \({\hat{x}}\) is defined as

$$\begin{aligned} {\hat{x}} :=\frac{1}{n} \sum _{g \in G} v_g x_g . \end{aligned}$$
(47)

We also define the RV-weighted birth and death rates of each site \(g \in G\):

$$\begin{aligned} {\hat{b}}_g \left( {\mathbf {x}}\right)&:=\sum _{h\in G} e_{gh} \left( {\mathbf {x}}\right) v_h ; \end{aligned}$$
(48a)
$$\begin{aligned} {\hat{d}}_g \left( {\mathbf {x}}\right)&:=v_g d_g \left( {\mathbf {x}}\right) . \end{aligned}$$
(48b)

It follows from Eq. (46a) that, in the monoallelic states, the RV-weighted birth and death rates are equal at each site:

$$\begin{aligned} {\hat{b}}^\circ _g = {\hat{d}}^\circ _g. \end{aligned}$$
(49)

We let \({\hat{b}}\left( {\mathbf {x}}\right) \) denote the total RV-weighted birth rate in state \({\mathbf {x}}\), which is equal to the total RV-weighted death rate:

$$\begin{aligned} {\hat{b}}\left( {\mathbf {x}}\right) = \sum _{g \in G} {\hat{b}}_g \left( {\mathbf {x}}\right) = \sum _{g,h \in G} e_{gh}\left( {\mathbf {x}}\right) v_h = \sum _{h \in G} {\hat{d}}_h \left( {\mathbf {x}}\right) . \end{aligned}$$
(50)

5.3 Fitness

Fitness quantifies reproductive success under natural selection. Although the concept of fitness is fundamental in evolutionary biology (Haldane 1924; Fisher 1930), it can be difficult to define for a general evolutionary process (Metz et al. 1992; Rand et al. 1994; Doebeli et al. 2017).

As with other quantities, we first define fitness on the level of genetic sites. Intuitively, the fitness of site g in state \({\mathbf {x}}\) should quantify the expected reproductive success of the allele occupying g in this state. This success can be quantified in terms of its own reproductive value (if it survives) plus the expected reproductive value of copies it produces and transmits, which leads to the following definition:

Definition 6

The fitness of site \(g \in G\) in state \({\mathbf {x}}\) is defined as

$$\begin{aligned} w_g \left( {\mathbf {x}}\right)&:=\left( 1 - \sum _{h \in G} e_{hg}\left( {\mathbf {x}}\right) \right) v_g + \sum _{h \in G} e_{gh}\left( {\mathbf {x}}\right) v_h \nonumber \\&= v_g - {\hat{d}}_g \left( {\mathbf {x}}\right) + {\hat{b}}_g \left( {\mathbf {x}}\right) . \end{aligned}$$
(51)

The definition of fitness used here differs from the definition in Allen and Tarnita (2014), which does not take reproductive value into account.

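We observe that, by Eqs. (46b) and (50), the total fitness is n in every state \({\mathbf {x}}\): summing Eq. (51) over g, the total RV-weighted death and birth rates cancel, leaving \(\sum _{g \in G} v_g = n\), i.e.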

$$\begin{aligned} \sum _{g \in G} w_g \left( {\mathbf {x}}\right) = n. \end{aligned}$$
(52)

For the monoallelic states, it follows from Eq. (49) that each site has fitness equal to its reproductive value:

$$\begin{aligned} w_g^\circ = v_g - {\hat{d}}_g^\circ + {\hat{b}}_g^\circ = v_g. \end{aligned}$$
(53)
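Definition 6 can be evaluated directly once \(e_{gh}({\mathbf x})\) and \(v_g\) are known. The Python sketch below, assuming NumPy, uses a hypothetical well-mixed Moran birth-death rule in which a parent g is chosen with probability proportional to \(1+\delta x_g\) and its offspring replaces a uniformly chosen other site. This rule satisfies Assumption 1, and by symmetry every site has \(v_g = 1\).

```python
import numpy as np

def moran_bd_e(x, delta):
    """Hypothetical e_gh(x): parent g chosen with probability proportional to
    1 + delta * x_g; its offspring replaces a uniformly chosen site h != g."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = 1.0 + delta * x
    e = np.outer(f / f.sum(), np.ones(n)) / (n - 1)
    np.fill_diagonal(e, 0.0)
    return e

def fitness(e, v):
    """Eq. (51): w_g = v_g - d_hat_g + b_hat_g."""
    d = e.sum(axis=0)          # d_g(x) = sum_h e_hg(x)
    b_hat = e @ v              # Eq. (48a)
    d_hat = v * d              # Eq. (48b)
    return v - d_hat + b_hat

w = fitness(moran_bd_e([1, 0, 0], delta=0.5), np.ones(3))
print(w, w.sum())   # w is approximately [8/7, 13/14, 13/14]; the total is 3
```

The A-bearing site gains fitness above its reproductive value while the others lose, and the total stays at n as in Eq. (52); in the state \({\mathbf A}\) every \(w_g\) returns to \(v_g = 1\), as in Eq. (53).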

5.4 Selection with reproductive value

To quantify selection using reproductive value, we use the expected change in the absolute RV-weighted frequency from a given state \({\mathbf {x}}\), denoted \(\hat{{\Delta }}\left( {\mathbf {x}}\right) \), which we define as follows:

$$\begin{aligned} \hat{{\Delta }}\left( {\mathbf {x}}\right) :=&- \sum _{g \in G} x_g v_g d_g \left( {\mathbf {x}}\right) + \left( 1-u\right) \sum _{g,h \in G} x_g e_{gh} \left( {\mathbf {x}}\right) v_h + u \nu \sum _{g \in G} v_g d_g\left( {\mathbf {x}}\right) \nonumber \\ =&- \sum _{g \in G} x_g {\hat{d}}_g \left( {\mathbf {x}}\right) + \left( 1-u\right) \sum _{g \in G} x_g {\hat{b}}_g \left( {\mathbf {x}}\right) + u \nu \sum _{g \in G} {\hat{d}}_g\left( {\mathbf {x}}\right) . \end{aligned}$$
(54)

(Note that \(\hat{{\Delta }}\left( {\mathbf {x}}\right) \), like \({\Delta }\left( {\mathbf {x}}\right) \), is defined using absolute rather than relative weights, in order to avoid tedious factors of 1 / n.)

We rewrite Eq. (54), in analogy to Eq. (40), as

$$\begin{aligned} \hat{{\Delta }}\left( {\mathbf {x}}\right) = \hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) - u \sum _{g \in G} \left( x_g - \nu \right) {\hat{b}}_g \left( {\mathbf {x}}\right) . \end{aligned}$$
(55)

Above (following Tarnita and Taylor 2014) we have introduced \(\hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) \), the expected change in RV-weighted frequency due to selection from state \({\mathbf {x}}\), which can be defined in a number of equivalent ways:

$$\begin{aligned} \hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right)&= \sum _{g \in G}x_g \left( {\hat{b}}_g\left( {\mathbf {x}}\right) -{\hat{d}}_g\left( {\mathbf {x}}\right) \right) \nonumber \\&= \sum _{g \in G} x_g \left( w_g\left( {\mathbf {x}}\right) - v_g \right) \nonumber \\&= \sum _{g,h \in G}x_g \left( e_{gh}\left( {\mathbf {x}}\right) v_h - e_{hg}\left( {\mathbf {x}}\right) v_g \right) \nonumber \\&= \frac{1}{2} \sum _{g,h \in G} \left( x_g - x_h\right) \left( e_{gh}\left( {\mathbf {x}}\right) v_h - e_{hg}\left( {\mathbf {x}}\right) v_g \right) . \end{aligned}$$
(56)
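Equations (54) through (56) are algebraic identities that hold for any nonnegative \(e_{gh}({\mathbf x})\) and any site weights \(v_g\), since both totals in Eq. (50) equal \(\sum _{g,h} e_{gh} v_h\). A Python sketch (assuming NumPy) that checks them on randomly generated, hypothetical inputs:

```python
import numpy as np

def rv_rates(e, v):
    """RV-weighted birth and death rates, Eqs. (48a) and (48b)."""
    return e @ v, v * e.sum(axis=0)

def delta_hat(e, v, x, u, nu):
    """Eq. (54): expected change in the absolute RV-weighted frequency."""
    b_hat, d_hat = rv_rates(e, v)
    return -x @ d_hat + (1 - u) * (x @ b_hat) + u * nu * d_hat.sum()

def delta_hat_sel_forms(e, v, x):
    """The four expressions for the change due to selection in Eq. (56)."""
    b_hat, d_hat = rv_rates(e, v)
    w = v - d_hat + b_hat                                   # Eq. (51)
    flux = e * v[np.newaxis, :] - e.T * v[:, np.newaxis]    # e_gh v_h - e_hg v_g
    return (x @ (b_hat - d_hat),
            x @ (w - v),
            x @ flux.sum(axis=1),
            0.5 * ((x[:, np.newaxis] - x[np.newaxis, :]) * flux).sum())

rng = np.random.default_rng(0)
e = rng.random((5, 5))
e /= e.sum()                                   # hypothetical replacement weights
v = 2 * rng.random(5)                          # arbitrary site weights
x = rng.integers(0, 2, size=5).astype(float)   # a random state
u, nu = 0.1, 0.3
forms = delta_hat_sel_forms(e, v, x)
assert np.allclose(forms, forms[0])            # the four forms of Eq. (56) agree
b_hat, _ = rv_rates(e, v)
assert np.isclose(delta_hat(e, v, x, u, nu), forms[0] - u * ((x - nu) @ b_hat))  # Eq. (55)
```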

We now extend our equivalence of success criteria (Theorem 4) to include measures weighted by reproductive value:

Theorem 5

For any replacement rule, \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\), satisfying Assumption 1, and any mutational bias \(\nu \), the following success criteria are equivalent:

  (a) \(\rho _A > \rho _a\);

  (b) \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ {\Delta }_{{\mathrm {sel}}}\right] > 0\);

  (c) \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right] > 0\);

  (d) \(\frac{d}{du} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ {\Delta }_{{\mathrm {sel}}}\right] \big |_{u=0} > 0\);

  (e) \(\frac{d}{du} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right] \big |_{u=0} > 0\);

  (f) \(\lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x \right] \; (= \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ {\hat{x}} \right] ) > \nu \).

The equivalence of (d) and (f) was previously shown by Tarnita and Taylor (2014).

Proof

Analogously to the proof of Theorem 4, we temporarily fix \(u>0\) and take the expectation of both sides of Eq. (55) under the MSS distribution. Upon rearranging, we obtain

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right] = u {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \sum _{g \in G} \left( x_g-\nu \right) {\hat{b}}_g \right] . \end{aligned}$$
(57)

Since \(\hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {a}}\right) =\hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {A}}\right) =0\), application of Lemma 2 yields

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right] = K \frac{d {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right] }{du} \Big |_{u=0} = K \lim _{u \rightarrow 0}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \sum _{g \in G} \left( x_g-\nu \right) {\hat{b}}_g \right] , \end{aligned}$$
(58)

for some \(K>0\). We observe that, by Assumption 1, \(b\left( {\mathbf {A}}\right) =b\left( {\mathbf {a}}\right) \) and \({\hat{b}}\left( {\mathbf {A}}\right) = {\hat{b}}\left( {\mathbf {a}}\right) \); we denote the latter quantity by \({\hat{b}}^\circ \). Theorem 1 now gives

$$\begin{aligned} \lim _{u \rightarrow 0}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \sum _{g \in G} \left( x_g-\nu \right) {\hat{b}}_g \right]&= {\hat{b}}^\circ \left( \left( 1-\nu \right) \lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {A}}\right) - \nu \lim _{u \rightarrow 0} \pi _{\mathrm {MSS}}\left( {\mathbf {a}}\right) \right) \nonumber \\&= \frac{{\hat{b}}^\circ \nu \left( 1-\nu \right) }{\nu \rho _{A}+\left( 1-\nu \right) \rho _{a}} (\rho _A - \rho _a). \end{aligned}$$
(59)

The coefficient of \(\rho _A - \rho _a\) above is positive, which proves the equivalence of (a), (c), and (e). The rest of the proof follows from Theorem 4. \(\square \)

We note that Theorem 5 requires Assumption 1 and is therefore less general than its non-RV-weighted analogue, Theorem 4 (see Tarnita and Taylor 2014, for related discussion).

6 Neutral drift

Neutral drift describes a situation where the alleles present in the population do not affect the drivers of selection (births and deaths). Neutral drift is interesting and important in its own right (e.g. Kimura et al. 1968; Allen et al. 2015; McAvoy et al. 2018a), and will also serve as a baseline for studying weak selection. In our framework, neutral drift is characterized by the property that the probabilities of replacement events are independent of the population state:

Definition 7

A replacement rule \(\left\{ p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\) represents neutral drift if \(p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \) is independent of the state \({\mathbf {x}}\) for every \((R,\alpha )\).

Clearly, a replacement rule representing neutral drift satisfies Assumption 1; indeed, we have \(p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) = p^\circ _{\left( R,\alpha \right) }\) for each replacement event, \(\left( R,\alpha \right) \), and each state, \({\mathbf {x}}\). For this reason, we will denote a replacement rule representing neutral drift by its (fixed) probability distribution, \(\left\{ p_{\left( R,\alpha \right) }^\circ \right\} _{\left( R,\alpha \right) }\), over replacement events.

6.1 Ancestral random walks and uniqueness of reproductive value

A convenient property of neutral drift is that it can be analyzed backwards in time, using the perspective of coalescent theory (Kingman 1982; Wakeley 2009). Let \(\left\{ p_{\left( R,\alpha \right) }^\circ \right\} _{\left( R,\alpha \right) }\) be a replacement rule representing neutral drift. For a given site, \(g \in G\), the probability \(a_{gh}\) that the parent copy of the allele in g occupied site h is

$$\begin{aligned} a_{gh} :={{\mathrm{{\mathbb {P}}}}}^\circ \left[ \alpha \left( g\right) =h \, | \, g\in R\right] = \frac{e_{hg}^\circ }{d_g^\circ } . \end{aligned}$$
(60)

(Note that \(d_g^\circ >0\) as a consequence of the Fixation Axiom.) We define the ancestral random walk as a Markov chain, \({\mathcal {A}}\), on G with transition probabilities \(a_{gh}\) from g to h. The trajectory of the ancestral random walk from a given site g represents the ancestry of g traced backwards in time. The ancestral random walk is a special case of a coalescing random walk (Holley and Liggett 1975; Cox 1989) for which there is only one walker.
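The ancestral random walk of Eq. (60) is straightforward to construct and sample. In this Python sketch (assuming NumPy and the standard `random` module), the neutral rule is a hypothetical death-Birth star with hub 0, for which the lineage of a leaf must pass through the hub at every other step.

```python
import random
import numpy as np

def ancestral_matrix(e):
    """Eq. (60): a[g, h] = e_hg / d_g, the chance that g's parent sat at h."""
    e = np.asarray(e, dtype=float)
    d = e.sum(axis=0)                    # d_g > 0 by the Fixation Axiom
    return e.T / d[:, np.newaxis]

def trace_lineage(a, start, steps, rng):
    """Sites occupied by successive ancestors of the allele at `start`."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choices(range(len(a)), weights=a[path[-1]])[0])
    return path

star = np.array([[0.0, 1/3, 1/3],
                 [1/6, 0.0, 0.0],
                 [1/6, 0.0, 0.0]])
a = ancestral_matrix(star)               # rows: [0, .5, .5], [1, 0, 0], [1, 0, 0]
lineage = trace_lineage(a, start=1, steps=6, rng=random.Random(1))
```

Each row of `a` is a probability distribution, so the backward lineage is an ordinary Markov chain on sites; here it alternates between the hub and a randomly chosen leaf.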

The ancestral random walk enables an intuitive proof for the uniqueness and non-negativity of reproductive values:

Proposition 1

For a given replacement rule satisfying Assumption 1, the reproductive values, \(\left\{ v_g\right\} _{g \in G}\), are uniquely defined by Eq. (46) and are nonnegative for every \(g \in G\).

Proof

First suppose that the given replacement rule represents neutral drift. Let \({\mathcal {A}}\) be the corresponding ancestral random walk, with transition probabilities \(a_{gh} = e_{hg}^\circ /d_g^\circ \). The Fixation Axiom implies that there exists a \(g \in G\) such that, for all \(h \in G\), there is a \(k \geqslant 0\) and a sequence \(\ell _1, \ldots , \ell _k \in G\) such that \(a_{h \ell _k} a_{\ell _k \ell _{k-1}} \cdots a_{\ell _2 \ell _1} a_{\ell _1 g} >0\). In other words, there exists \(g \in G\) such that for every \(h\in G\), there is a finite sequence of transitions in \({\mathcal {A}}\), each with positive probability, from h to g. It follows that \({\mathcal {A}}\) has a single closed communicating class, and therefore it has a unique stationary probability distribution, \(\{z_g\}_{g \in G}\), which is the unique solution to the system of equations

$$\begin{aligned}&z_g = \sum _{h \in G} \frac{e_{gh}^\circ }{d_h^\circ } z_h ; \end{aligned}$$
(61a)
$$\begin{aligned}&\sum _{g \in G} z_g = 1. \end{aligned}$$
(61b)

Setting

$$\begin{aligned} v_g :=\frac{ n z_g /d_g^\circ }{\sum _{\ell \in G}\left( z_\ell /d_\ell ^\circ \right) } , \end{aligned}$$
(62)

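it follows from Eq. (61a) that \(\left\{ v_g\right\} _{g \in G}\) solves Eq. (46). Conversely, if \(\left\{ v'_g\right\} _{g \in G}\) is any solution to Eq. (46a), then \(z'_g :=d_g^\circ v'_g\) solves Eq. (61a); since \({\mathcal {A}}\) has a single closed communicating class, every solution of Eq. (61a) is a constant multiple of \(\{z_g\}_{g \in G}\), so \(\left\{ v'_g\right\} \) is a constant multiple of \(\left\{ v_g\right\} \), and the normalization (46b) fixes this constant. Hence \(\left\{ v_g\right\} _{g \in G}\) is the unique solution to Eq. (46). The \(z_g\) are nonnegative since they comprise a probability distribution, and it follows that the \(v_g\) are nonnegative as well.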

If the given replacement rule does not represent neutral drift, we define a new replacement rule \(\left\{ {\tilde{p}}_{\left( R,\alpha \right) } \right\} _{\left( R,\alpha \right) }\) by \({\tilde{p}}_{\left( R,\alpha \right) } :=p_{\left( R,\alpha \right) }\left( {\mathbf {a}}\right) =p_{\left( R,\alpha \right) }\left( {\mathbf {A}}\right) \). This new replacement rule represents neutral drift by definition. The above argument again shows that the reproductive values are uniquely defined by Eq. (46) and are nonnegative. \(\square \)

As a corollary to the proof, we can see that the sites with positive reproductive value are precisely those that are recurrent under the ancestral random walk, which in turn are those that are able to spread their contents throughout the population in the sense of the Fixation Axiom. This fact hints at a connection between reproductive value and fixation probability, which we will make explicit in Sect. 6.3.
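The constructive step in the proof of Proposition 1 can be mirrored numerically: compute the stationary distribution \(z\) of the ancestral random walk, then rescale \(z_g/d^\circ_g\) so that the reproductive values sum to n, as Eq. (46b) requires. A Python sketch assuming NumPy; the example with a site that is never a parent is hypothetical and illustrates the corollary above, since that site is transient under the ancestral walk and receives zero reproductive value.

```python
import numpy as np

def rv_from_ancestral_walk(e):
    """Reproductive values via Eqs. (60)-(62), rescaled so that sum_g v_g = n."""
    e = np.asarray(e, dtype=float)
    n = e.shape[0]
    d = e.sum(axis=0)                        # d_g = sum_h e_hg
    a = e.T / d[:, np.newaxis]               # ancestral walk, Eq. (60)
    # Stationary distribution z: z a = z and sum(z) = 1, i.e. Eq. (61).
    system = np.vstack([a.T - np.eye(n), np.ones(n)])
    target = np.append(np.zeros(n), 1.0)
    z, *_ = np.linalg.lstsq(system, target, rcond=None)
    weights = z / d
    return n * weights / weights.sum()       # factor n required by Eq. (46b)

star = np.array([[0.0, 1/3, 1/3],
                 [1/6, 0.0, 0.0],
                 [1/6, 0.0, 0.0]])
sterile = np.array([[0.0, 0.6, 0.2],         # site 2 is replaced by 0's offspring
                    [0.2, 0.0, 0.0],         # but is never itself a parent
                    [0.0, 0.0, 0.0]])
print(rv_from_ancestral_walk(star))      # approximately [1.5, 0.75, 0.75]
print(rv_from_ancestral_walk(sterile))   # approximately [2.25, 0.75, 0.0]
```

Solving Eq. (46) directly by linear algebra gives the same vectors, which is the uniqueness statement of Proposition 1 in numerical form.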

6.2 Change due to selection vanishes under neutral drift

A key result for neutral drift is that the RV-weighted change due to selection, \(\hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) \), is zero in every state, \({\mathbf {x}}\). This property mathematically expresses the neutrality (absence of selection) in neutral drift. Notably, the analogous property does not hold in general for the unweighted change due to selection, \({\Delta }_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) \). Moreover, this property determines reproductive value uniquely up to a constant multiple, and it is one of the primary motivations for studying RV-weighted quantities (see also Tarnita and Taylor 2014). We formalize these observations in the following theorem:

Theorem 6

If the replacement rule \(\left\{ p^\circ _{\left( R,\alpha \right) }\right\} _{\left( R,\alpha \right) }\) represents neutral drift, then \(\hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) = 0\) for each state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\). Furthermore, if \(\left\{ {\tilde{v}}_g\right\} _{g \in G}\) is any weighting of genetic sites \(g \in G\) such that \({\tilde{{\Delta }}}_{\mathrm {sel}}\left( {\mathbf {x}}\right) \) (the change in \({\tilde{v}}\)-weighted frequency) is zero for each state \({\mathbf {x}}\), then \(\left\{ {\tilde{v}}_g\right\} _{g \in G}\) are a constant multiple of \(\left\{ v_g\right\} _{g \in G}\).

Proof

For the first claim, combining Eqs. (49) and (56) gives

$$\begin{aligned} \hat{{\Delta }}_{{\mathrm {sel}}}({\mathbf {x}}) = \sum _{g \in G} x_g \left( {\hat{b}}_g^\circ - {\hat{d}}_g^\circ \right) = 0. \end{aligned}$$
(63)

For the second claim, consider an arbitrary weighting of genetic sites \(\left\{ {\tilde{v}}_g\right\} _{g \in G}\). In analogy with Eq. (56), the expected change in \({\tilde{x}} = \sum _{g \in G} {\tilde{v}}_g x_g\) can be written

$$\begin{aligned} {\tilde{{\Delta }}}_{\mathrm {sel}}\left( {\mathbf {x}}\right) = \sum _{g,h \in G}x_g \left( e_{gh}^\circ {\tilde{v}}_h - e_{hg}^\circ {\tilde{v}}_g \right) . \end{aligned}$$
(64)

If \({\tilde{{\Delta }}}_{\mathrm {sel}}\left( {\mathbf {x}}\right) =0\) for each state \({\mathbf {x}}\), then, by substituting \({\mathbf {x}}={{\mathbf {1}}}_{\left\{ g\right\} }\) into Eq. (64) and rearranging, we obtain

$$\begin{aligned} \sum _{h \in G} e_{gh}^\circ {\tilde{v}}_h = \sum _{h \in G} e_{hg}^\circ {\tilde{v}}_g = d_g^\circ {\tilde{v}}_g \quad \text {for each} \; g \in G. \end{aligned}$$
(65)

The above equation is equivalent to Eq. (46a), with \({\tilde{v}}_g\) in place of \(v_g\). Proposition 1 guarantees the solution to Eq. (46a) is unique up to a constant multiple. Therefore, the \(\left\{ {\tilde{v}}_g\right\} _{g \in G}\) are a constant multiple of \(\left\{ v_g\right\} _{g \in G}\). \(\square \)
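Theorem 6 can be checked state by state on a small example. This Python sketch (assuming NumPy) uses a hypothetical neutral rule, death-Birth updating on a three-node star with hub 0, whose reproductive values \(v = (1.5, 0.75, 0.75)\) solve Eq. (46); it evaluates the weighted change due to selection of Eq. (64) with RV weights and with uniform weights in all eight states.

```python
import itertools
import numpy as np

# Hypothetical neutral rule: death-Birth updating on a 3-node star, hub = site 0.
E = np.array([[0.0, 1/3, 1/3],
              [1/6, 0.0, 0.0],
              [1/6, 0.0, 0.0]])
V = np.array([1.5, 0.75, 0.75])        # solves Eq. (46) for this E

def selection_change(e, weights, x):
    """sum_{g,h} x_g (e_gh w_h - e_hg w_g): the change due to selection under
    neutral drift when sites are weighted by `weights` (cf. Eq. (64))."""
    flux = e * weights[np.newaxis, :] - e.T * weights[:, np.newaxis]
    return x @ flux.sum(axis=1)

states = [np.array(s, dtype=float) for s in itertools.product((0, 1), repeat=3)]
rv_weighted = [selection_change(E, V, x) for x in states]           # Theorem 6: all zero
unweighted = [selection_change(E, np.ones(3), x) for x in states]   # not all zero
```

With uniform weights, an allele in the hub alone shows a change of \(1/3\) (the hub reproduces with probability \(2/3\) but is replaced with probability \(1/3\)), even though no selection is acting; RV weighting removes this artifact in every state.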

6.3 Reproductive value and fixation probability

As an application of Theorem 6, we deduce a remarkable relationship between reproductive value and fixation probability: under neutral drift with no mutation, the reproductive value of a site is proportional to the fixation probability of a novel type initiated at that site. This relationship was previously noted by Maciejewski (2014) and Allen et al. (2015); here, we provide a short proof using martingales:

Theorem 7

Let \(\left\{ p^\circ _{\left( R,\alpha \right) }\right\} _{\left( R,\alpha \right) }\) be a replacement rule representing neutral drift and let \({\mathcal {M}}\) be the associated evolutionary Markov chain with \(u=0\). Then, for any state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\),

$$\begin{aligned} \lim _{t \rightarrow \infty } P^{\left( t\right) }_{{\mathbf {x}}\rightarrow {\mathbf {A}}} = {\hat{x}} . \end{aligned}$$
(66)

In particular, the fixation probability of a single mutation arising in site g is

$$\begin{aligned} \lim _{t \rightarrow \infty } P^{\left( t\right) }_{{{\mathbf {1}}}_{\left\{ g\right\} } \rightarrow {\mathbf {A}}} = \frac{v_g}{n} , \end{aligned}$$
(67)

and the overall fixation probabilities of A and a are

$$\begin{aligned} \rho _{A} = \rho _{a} = \frac{{\hat{b}}^{\circ }}{nb^{\circ }} . \end{aligned}$$
(68)

Proof

Consider \({\mathcal {M}}\) started from an arbitrary initial state \({\mathbf {X}}\left( 0\right) ={\mathbf {x}}_0\). Let \({\mathbf {X}}\left( t\right) \) denote the state at time t, and let \({\hat{X}}\left( t\right) :=\frac{1}{n}\sum _{g \in G} v_g X_g\left( t\right) \) denote the RV-weighted frequency at time t. We claim that \({\hat{X}}\left( t\right) \) is a martingale, which can be seen by writing Eq. (55) as

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}\left[ {\hat{X}}\left( t+1\right) -{\hat{X}}\left( t\right) \ |\ {\mathbf {X}}\left( t\right) ={\mathbf {x}}\right]&= \frac{1}{n} \hat{{\Delta }}\left( {\mathbf {x}}\right) \nonumber \\&= \frac{1}{n} \hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) - \frac{u}{n} \sum _{g \in G} \left( x_g - \nu \right) {\hat{b}}_g^\circ . \end{aligned}$$
(69)

The first term in the final expression is zero by Theorem 6, and the second is zero since \(u=0\); thus \({\hat{X}}(t)\) is a martingale.

Since \(u=0\), \({\mathcal {M}}\) eventually becomes absorbed either in state \({\mathbf {A}}\), for which \({\hat{x}}=1\), or \({\mathbf {a}}\), for which \({\hat{x}}=0\). The martingale property then implies

$$\begin{aligned} {\hat{x}}_0 = \lim _{t \rightarrow \infty } {{\mathrm{{\mathbb {E}}}}}\left[ {\hat{X}}\left( t\right) \ |\ {\mathbf {X}}\left( 0\right) ={\mathbf {x}}_0\right] = \lim _{t \rightarrow \infty } P^{\left( t\right) }_{{\mathbf {x}}_0 \rightarrow {\mathbf {A}}} , \end{aligned}$$
(70)

which proves Eq. (66). Equation (67) follows from setting \({\mathbf {x}}_0 = {{\mathbf {1}}}_{\left\{ g\right\} }\), and Eq. (68) follows from Definitions 1 and 2. \(\square \)
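Equation (66) can be verified exactly on a small example by solving for absorption probabilities over all \(2^n\) states, independently of the martingale argument. In this Python sketch (assuming NumPy), the neutral rule is a hypothetical death-Birth star with hub 0, encoded as replacement events \((R, \alpha, p)\), with reproductive values \(v = (1.5, 0.75, 0.75)\).

```python
import itertools
import numpy as np

# Hypothetical neutral rule: death-Birth updating on a 3-node star (hub = 0).
# Each event is (R, alpha, p): replaced sites, parent map, probability.
EVENTS = [({0}, {0: 1}, 1/6), ({0}, {0: 2}, 1/6),
          ({1}, {1: 0}, 1/3), ({2}, {2: 0}, 1/3)]
V = np.array([1.5, 0.75, 0.75])   # reproductive values for this rule, Eq. (46)
N = 3

def step(x, R, alpha):
    """State after a replacement event with no mutation (u = 0)."""
    return tuple(x[alpha[h]] if h in R else x[h] for h in range(len(x)))

def absorption_in_A(events, n):
    """Probability of eventual absorption in the all-A state, for every state."""
    states = list(itertools.product((0, 1), repeat=n))
    idx = {s: i for i, s in enumerate(states)}
    M = np.eye(len(states))
    rhs = np.zeros(len(states))
    for s in states:
        if len(set(s)) == 1:              # monoallelic: absorbing boundary
            rhs[idx[s]] = float(s[0])     # 1 for all-A, 0 for all-a
            continue
        for R, alpha, p in events:        # h(s) - sum_y P(s, y) h(y) = 0
            M[idx[s], idx[step(s, R, alpha)]] -= p
    h = np.linalg.solve(M, rhs)
    return {s: h[idx[s]] for s in states}

fix = absorption_in_A(EVENTS, N)
for s, prob in fix.items():
    print(s, round(prob, 6), round(V @ np.array(s) / N, 6))   # Eq. (66): equal
```

For example, a single mutant in the hub fixes with probability \(1/2\) and a single mutant in a leaf with probability \(1/4\), matching Eq. (67), \(v_g/n\).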

Note that, according to Eq. (68), the fixation probability of a neutral mutation is not necessarily 1 / N, and depends on the spatial structure. It follows that spatial structure can affect a population’s rate of neutral substitution (or “molecular clock”; Kimura et al. 1968). This effect is discussed in detail by Allen et al. (2015).

6.4 Symmetry of neutral distributions

Since, for neutral drift, the alleles a and A are interchangeable, we expect the neutral RMC and MSS distributions to be insensitive to interchanging the roles of a and A. To formalize this property, we define the complement of a state \({\mathbf {x}}\), denoted \({\bar{{\mathbf {x}}}}\), by \({\bar{x}}_{g}=1-x_g\) for each \(g \in G\). In other words, \({\bar{{\mathbf {x}}}}\) is formed from \({\mathbf {x}}\) by replacing all a’s with A’s and vice versa. In particular, \({\bar{{\mathbf {a}}}} = {\mathbf {A}}\) and \({\bar{{\mathbf {A}}}} = {\mathbf {a}}\). The symmetry property for the RMC distribution can then be stated as follows:

Proposition 2

If the replacement rule \(\left\{ p^\circ _{\left( R,\alpha \right) }\right\} _{\left( R,\alpha \right) }\) represents neutral drift, then for each state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\), \(\pi _{\mathrm {RMC}}\left( {\bar{{\mathbf {x}}}} \right) = \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \).

Proof

For a replacement rule representing neutral drift and \(u=0\), it follows from how transitions are defined that \(P_{{\mathbf {x}}\rightarrow {\mathbf {y}}}= P_{{\bar{{\mathbf {x}}}} \rightarrow {\bar{{\mathbf {y}}}}}\) for all states \({\mathbf {x}},{\mathbf {y}}\in \left\{ 0,1\right\} ^G\). We further observe that \(\mu _a\left( {\mathbf {x}}\right) =\mu _A \left( {\bar{{\mathbf {x}}}} \right) \) and \(\mu _A\left( {\mathbf {x}}\right) =\mu _a \left( {\bar{{\mathbf {x}}}} \right) \). Substituting into the recurrence relation for the RMC distribution, Eq. (25), we obtain

$$\begin{aligned} \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) = \sum _{{\mathbf {y}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} } \pi _{\mathrm {RMC}}\left( {\mathbf {y}}\right) \left( P_{{\bar{{\mathbf {y}}}} \rightarrow {\bar{{\mathbf {x}}}}} + P_{{\bar{{\mathbf {y}}}} \rightarrow {\mathbf {A}}} \mu _a \left( {\bar{{\mathbf {x}}}} \right) + P_{{\bar{{\mathbf {y}}}} \rightarrow {\mathbf {a}}} \mu _A \left( {\bar{{\mathbf {x}}}} \right) \right) , \end{aligned}$$
(71)

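for all states \({\mathbf {x}}\), with the transition probabilities evaluated at \(u=0\). Replacing \({\mathbf {x}}\) by \({\bar{{\mathbf {x}}}}\) in Eq. (71) and re-indexing the sum via \({\mathbf {y}}\mapsto {\bar{{\mathbf {y}}}}\) (a bijection on the non-monoallelic states) shows that \(\left\{ \pi _{\mathrm {RMC}}\left( {\bar{{\mathbf {x}}}} \right) \right\} \) satisfies the same recurrence, Eq. (25), as \(\left\{ \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \right\} \). Since this recurrence, together with normalization, uniquely determines the RMC distribution, we have \(\pi _{\mathrm {RMC}}\left( {\bar{{\mathbf {x}}}} \right) = \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \) for all \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\). \(\square \)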

In particular, under neutral drift, we have \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ x_g\right] = 1/2\) for each site g; that is, g is equally likely to be occupied by either allele in the RMC distribution for neutral drift.

For the MSS distribution, interchanging the roles of a and A leads to a different symmetry property, one that incorporates the mutational bias, \(\nu \):

Proposition 3

Let \(\left\{ p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\) be a replacement rule representing neutral drift and let the mutation probability \(u>0\) be fixed. Then, for each state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\), the value of \(\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \) with mutational bias \(\nu \) equals the value of \(\pi _{\mathrm {MSS}}\left( {\bar{{\mathbf {x}}}}\right) \) with mutational bias \(1-\nu \).

We omit the proof, which is similar to that of Proposition 2 but uses the recurrence relations (9a) in place of (25). Proposition 3 implies in particular that \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x_g\right] = \nu \) for each site g. The apparent discrepancy between the results \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ x_g\right] = 1/2\) and \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x_g\right] = \nu \) can be resolved by recalling that, as \(u \rightarrow 0\), the MSS distribution becomes concentrated on state \({\mathbf {A}}\) (with probability approaching \(\nu \)) and state \({\mathbf {a}}\) (with probability approaching \(1-\nu \)); in contrast, the RMC distribution excludes these monoallelic states and is independent of \(\nu \).

7 Weak selection

We say that selection is weak if the process of natural selection between A and a approximates neutral drift. Weak selection is mathematically convenient because it allows the use of perturbative techniques; it is also biologically relevant since, for many systems of interest, mutations have a relatively small effect on reproductive success.

7.1 Formalism

To formalize the notion of weak selection, we introduce a selection strength parameter, \(\delta \), which takes values in some half-open neighborhood \(\left[ 0,\varepsilon \right) \) of zero. We consider a \(\delta \)-indexed family of replacement rules, subject to the following assumption:

Assumption 2

(Assumptions for weak selection) For each replacement event \(\left( R,\alpha \right) \), the probabilities \(p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \) satisfy the following:

  (a) \(p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \) varies smoothly with respect to \(\delta \in \left[ 0,\varepsilon \right) \) for each state \({\mathbf {x}}\);

  (b) \(p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \) is independent of \({\mathbf {x}}\) for \(\delta =0\).

Part (b) guarantees that the replacement rule represents neutral drift when \(\delta =0\). For now, we do not require that Assumption 1 be satisfied for all values of \(\delta \). Assumption 1 will appear later as a condition of Corollary 1 below.
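For a concrete \(\delta \)-indexed family, the Python sketch below (assuming NumPy) uses a hypothetical well-mixed Moran birth-death rule: a parent g is chosen with probability proportional to \(1+\delta x_g\) and its offspring replaces a uniformly chosen other site. Every replacement event replaces a single site with a single parent, so the event probabilities are exactly the \(e_{gh}({\mathbf x})\), which are rational in \(\delta \) with positive denominators and hence smooth. The checks confirm Assumption 2(b) at \(\delta = 0\) and, for this particular family, Assumption 1 at every \(\delta \).

```python
import itertools
import numpy as np

def moran_bd_e(x, delta):
    """Hypothetical delta-indexed family: parent g chosen with probability
    proportional to 1 + delta * x_g; its offspring replaces a uniformly
    chosen other site. Returns the matrix e_gh(x)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    f = 1.0 + delta * x
    e = np.outer(f / f.sum(), np.ones(n)) / (n - 1)
    np.fill_diagonal(e, 0.0)
    return e

n = 4
states = list(itertools.product((0, 1), repeat=n))
# Assumption 2(b): at delta = 0 the rule does not depend on the state.
neutral = [moran_bd_e(s, 0.0) for s in states]
assert all(np.allclose(m, neutral[0]) for m in neutral)
# Assumption 1 also holds for this family at every delta: states a and A agree.
for delta in (0.01, 0.1, 1.0):
    assert np.allclose(moran_bd_e((0,) * n, delta), moran_bd_e((1,) * n, delta))
```

For \(\delta > 0\) the rule does depend on the state away from the monoallelic states, which is what makes it a model of selection rather than drift.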

The following proposition shows that, under Assumption 2, other fundamental quantities of interest also vary smoothly with respect to \(\delta \):

Proposition 4

For a \(\delta \)-indexed family of replacement rules, \(\left\{ p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\), satisfying Assumption 2, and for any mutational bias, \(0<\nu <1\),

  (a) \(\rho _A\), \(\rho _a\), and \(\pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \) for each \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\) are smooth functions of \(\delta \in \left[ 0,\varepsilon \right) \);

  (b) For each \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\), \(\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \) extends uniquely to a smooth function of \(\left( u,\delta \right) \in \left[ 0,1\right] \times \left[ 0,\varepsilon \right) \);

  (c) For each \({\mathbf {x}}\in \left\{ 0,1\right\} ^G \setminus \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \), \({{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}= {\mathbf {x}}\ |\ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] \) extends uniquely to a smooth function of \(\left( u,\delta \right) \in \left[ 0,1\right] \times \left[ 0,\varepsilon \right) \).

Proof

We first observe that, from the definition of the evolutionary Markov chain and the formalism for weak selection, the transition probabilities \(P_{{\mathbf {x}}\rightarrow {\mathbf {y}}}\) are smooth functions of \(\left( u,\delta \right) \in \left[ 0,1\right] \times \left[ 0, \varepsilon \right) \).

The fixation probabilities, \(\rho _A\) and \(\rho _a\), being absorption probabilities for a finite Markov chain, are bounded, rational functions of the transition probabilities (see, for example, Theorem 3.3.7 of Kemeny and Snell 1960), and are therefore smooth functions of \(\delta \in \left[ 0, \varepsilon \right) \).

We turn now to the stationary probabilities \(\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \). The system of Eqs. (9) and (22) defines a unique continuous extension of \(\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \) to \(0 \leqslant u \leqslant 1\). Since the equations of this system vary smoothly with u and \(\delta \) and the solution is unique, this extension of \(\pi _{\mathrm {MSS}}\left( {\mathbf {x}}\right) \) is smooth in \(\left( u,\delta \right) \in \left[ 0,1\right] \times \left[ 0, \varepsilon \right) \).

The argument for \({{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}= {\mathbf {x}}\ |\ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] \) is similar, except that the relevant system of equations is Eq. (26)—which is replaced by Eq. (25) for \(u=0\)—together with the additional equation \(\sum _{{\mathbf {x}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} } {{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}= {\mathbf {x}}\ |\ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] = 1\). This system of equations has a unique solution for each \(\left( u,\delta \right) \in \left[ 0,1\right] \times \left[ 0, \varepsilon \right) \), which coincides with \(\{{{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}= {\mathbf {x}}\ |\ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] \}\) for \(u>0\) and with \(\{ \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \}\) for \(u=0\). Thus, for each \({\mathbf {x}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \), \({{\mathrm{{\mathbb {P}}}}}_{\mathrm {MSS}}\left[ {\mathbf {X}}= {\mathbf {x}}\ |\ {\mathbf {X}}\notin \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \right] \) extends uniquely to a smooth function of \(\left( u,\delta \right) \in \left[ 0,1\right] \times \left[ 0, \varepsilon \right) \), which coincides with \(\pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \) at \(u=0\). \(\square \)
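The step in this proof that absorption probabilities are rational, and hence smooth, functions of the transition probabilities can be illustrated in the simplest setting. The sketch below is not the general replacement-rule framework. It assumes a hypothetical well-mixed birth-death chain in which the number of A alleles changes by at most one per step, with ratio \(\gamma _j = T^-_j/T^+_j\) of down- to up-transition probabilities in state j. For such a chain, the classical fixation probability from a single A allele is \(\rho = 1/\left( 1+\sum _{k=1}^{N-1}\prod _{j=1}^{k}\gamma _j\right) \). Exact arithmetic with `fractions.Fraction` makes the rational dependence on \(\delta \) explicit. The function names are illustrative.

```python
from fractions import Fraction


def fixation_probability(gammas):
    """Probability of absorption in the all-A state, starting from one A allele,
    for a birth-death chain with ratios gamma_j = T^-_j / T^+_j (j = 1, ..., N-1):
        rho = 1 / (1 + sum_{k=1}^{N-1} prod_{j=1}^{k} gamma_j).
    Only sums, products, and quotients are used, so rho is a rational
    function of the transition probabilities."""
    total = Fraction(1)
    prod = Fraction(1)
    for gamma in gammas:
        prod *= gamma
        total += prod
    return 1 / total


def moran_fixation(delta, N):
    """Hypothetical well-mixed Moran model: A has fecundity 1 + delta and a has
    fecundity 1, which gives gamma_j = 1 / (1 + delta) in every state j."""
    r = 1 + Fraction(delta)
    return fixation_probability([1 / r] * (N - 1))


# Neutral drift (delta = 0) gives rho = 1/N; small delta > 0 perturbs it smoothly.
for delta in (Fraction(0), Fraction(1, 100), Fraction(1, 10)):
    print(delta, moran_fixation(delta, 5))
```

In this toy chain, \(\rho \) equals \(\left( 1-\gamma \right) /\left( 1-\gamma ^N\right) \) with \(\gamma = 1/\left( 1+\delta \right) \). This expression is a ratio of polynomials in \(\delta \) whose denominator does not vanish near \(\delta = 0\).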

We will study weak selection as a perturbation of neutral drift (\(\delta =0\)). In formulating weak-selection expansions of various quantities, we will use a circle (\({}^\circ \)) to indicate the value at \(\delta =0\) and a prime (\('\)) to denote the first-order coefficient in \(\delta \) as \(\delta \rightarrow 0^+\). (In light of Assumption 2b, this use of \({}^\circ \) is consistent with the previous use in Sect. 6.) For example, we have the following weak selection expansions:

$$\begin{aligned} p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right)&= p^\circ _{\left( R,\alpha \right) } + \delta p'_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) + {\mathcal {O}}\left( \delta ^2\right) ; \end{aligned}$$
(72a)
$$\begin{aligned} \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right)&= \pi ^\circ _{\mathrm {RMC}}\left( {\mathbf {x}}\right) + \delta \pi '_{\mathrm {RMC}}\left( {\mathbf {x}}\right) + {\mathcal {O}}\left( \delta ^2\right) . \end{aligned}$$
(72b)

We say that a statement holds under weak selection if it holds to first order in \(\delta \) as \(\delta \rightarrow 0^+\). We deal only with first-order expansions here; for the mathematical theory of higher-order perturbations of a Markov chain, see Silvestrov and Silvestrov (2017).
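In numerical work, the coefficients \(p^\circ \) and \(p'\) of an expansion such as Eq. (72a) can be estimated directly. The following minimal sketch assumes a hypothetical one-parameter family \(p\left( \delta \right) = \left( 1+\delta f\right) /\left( n+\delta F\right) \) that mimics a fecundity-weighted choice among n sites. The constants f and F are illustrative. Because \(\delta \) ranges only over \(\left[ 0,\varepsilon \right) \), the sketch uses a one-sided (forward) difference.

```python
def p_toy(delta, f=2.0, total_f=5.0, n=4):
    """Hypothetical smooth family of replacement probabilities on delta >= 0,
    shaped like a fecundity-weighted choice: (1 + delta f) / (n + delta F)."""
    return (1.0 + delta * f) / (n + delta * total_f)


def weak_selection_coefficients(p, h=1e-5):
    """Return (p_circ, p_prime): the value at delta = 0 and an estimate of the
    first-order coefficient, using only delta >= 0 (one-sided limit delta -> 0+)."""
    p_circ = p(0.0)
    # Second-order accurate forward difference: (-3 p(0) + 4 p(h) - p(2h)) / (2h).
    p_prime = (-3.0 * p(0.0) + 4.0 * p(h) - p(2.0 * h)) / (2.0 * h)
    return p_circ, p_prime


p_circ, p_prime = weak_selection_coefficients(p_toy)
exact_prime = (2.0 * 4 - 5.0) / 4 ** 2   # (f n - F) / n^2 for the default arguments
print(p_circ, p_prime, exact_prime)
```

The three-point forward formula keeps the truncation error at order \(h^2\) and evaluates p only at nonnegative \(\delta \).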

7.2 Success criteria for weak selection

Turning now to quantities describing selection, we have the following weak-selection expansion of fitness:

$$\begin{aligned} w_g\left( {\mathbf {x}}\right) = v_g + \delta w'_g\left( {\mathbf {x}}\right) + {\mathcal {O}}\left( \delta ^2\right) , \end{aligned}$$
(73)

with

$$\begin{aligned} w'_g\left( {\mathbf {x}}\right)&= {\hat{b}}_g'\left( {\mathbf {x}}\right) - {\hat{d}}_g'\left( {\mathbf {x}}\right) = \sum _{h \in G} \left( e_{gh}'\left( {\mathbf {x}}\right) v_h - e_{hg}'\left( {\mathbf {x}}\right) v_g \right) . \end{aligned}$$
(74)

Above and throughout this section, the reproductive values \(v_g\) are understood to be computed at \(\delta = 0\), and not to vary with \(\delta \). We note that, in light of Eq. (52), \(\sum _{g \in G} w_g' \left( {\mathbf {x}}\right) = 0\) for each state, \({\mathbf {x}}\).

For the RV-weighted change due to selection, \(\hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) \), we note that Theorem 6 implies that \(\hat{{\Delta }}_{{\mathrm {sel}}}^\circ \left( {\mathbf {x}}\right) =0\) for all states \({\mathbf {x}}\). We therefore have the weak-selection expansion

$$\begin{aligned} \hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) = \delta \hat{{\Delta }}_{{\mathrm {sel}}}' \left( {\mathbf {x}}\right) + {\mathcal {O}}\left( \delta ^2\right) , \end{aligned}$$
(75)

with

$$\begin{aligned} \hat{{\Delta }}_{{\mathrm {sel}}}' \left( {\mathbf {x}}\right)&= \sum _{g\in G}x_{g}w_{g}'\left( {\mathbf {x}}\right) \nonumber \\&= \sum _{g,h \in G}x_g \left( e'_{gh}\left( {\mathbf {x}}\right) v_h - e'_{hg}\left( {\mathbf {x}}\right) v_g \right) \nonumber \\&= \frac{1}{2} \sum _{g,h \in G}\left( x_g - x_h\right) \left( e'_{gh}\left( {\mathbf {x}}\right) v_h - e'_{hg}\left( {\mathbf {x}}\right) v_g \right) . \end{aligned}$$
(76)
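The last step in Eq. (76) uses the antisymmetry of the bracket \(e'_{gh} v_h - e'_{hg} v_g\) under exchange of g and h. Because the step is purely algebraic, it can be sanity-checked on random inputs. The following minimal plain-Python sketch uses hypothetical placeholders `x`, `v`, and `e_prime`, not quantities derived from any particular replacement rule. The same antisymmetry also forces \(\sum _{g} w'_g = 0\), which is consistent with the remark following Eq. (74).

```python
import random


def delta_sel_prime_forms(x, v, e_prime):
    """Evaluate the three expressions for the first-order coefficient in Eq. (76).

    x[g] in {0, 1} is the allele at site g, v[g] its reproductive value, and
    e_prime[g][h] the first-order coefficient e'_{gh}(x). Returns the three
    forms together with the fitness coefficients w'_g from Eq. (74)."""
    n = len(x)

    # Antisymmetric bracket: bracket(g, h) == -bracket(h, g).
    def bracket(g, h):
        return e_prime[g][h] * v[h] - e_prime[h][g] * v[g]

    w_prime = [sum(bracket(g, h) for h in range(n)) for g in range(n)]
    form1 = sum(x[g] * w_prime[g] for g in range(n))
    form2 = sum(x[g] * bracket(g, h) for g in range(n) for h in range(n))
    form3 = 0.5 * sum((x[g] - x[h]) * bracket(g, h)
                      for g in range(n) for h in range(n))
    return form1, form2, form3, w_prime


# Hypothetical random inputs: the identities are algebraic, so any values work.
random.seed(1)
n = 6
x = [random.randint(0, 1) for _ in range(n)]
v = [random.uniform(0.5, 1.5) for _ in range(n)]
e_prime = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
form1, form2, form3, w_prime = delta_sel_prime_forms(x, v, e_prime)
print(form1, form2, form3, sum(w_prime))
```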

Our second main result is a weak-selection analogue of Theorems 4 and 5. It proves the equivalence of four success criteria: one based on fixation probability, one based on expected frequency, and two based on change due to selection.

Theorem 8

For any replacement rule \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\) satisfying Assumption 2, and any mutational bias \(\nu \), the following success criteria are equivalent:

  1. (a)

    \(\rho _A > \rho _a\) under weak selection;

  2. (b)

    \(\displaystyle \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] > \frac{\nu b\left( {\mathbf {a}}\right) }{\nu b\left( {\mathbf {a}}\right) + \left( 1-\nu \right) b\left( {\mathbf {A}}\right) } \) under weak selection;

  3. (c)

    \(\displaystyle {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] > K\nu \left( 1-\nu \right) \left( {\hat{b}}'\left( {{\mathbf {A}}}\right) -{\hat{b}}'\left( {{\mathbf {a}}}\right) - \frac{{\hat{b}}^{\circ }}{b^\circ }\left( b'\left( {{\mathbf {A}}}\right) -b'\left( {{\mathbf {a}}}\right) \right) \right) \);

  4. (d)

    \(\displaystyle \frac{d}{du} \, {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \Big |_{u=0}> \nu \left( 1-\nu \right) \left( {\hat{b}}'\left( {{\mathbf {A}}}\right) -{\hat{b}}'\left( {{\mathbf {a}}}\right) - \frac{{\hat{b}}^{\circ }}{b^\circ }\left( b'\left( {{\mathbf {A}}}\right) -b'\left( {{\mathbf {a}}}\right) \right) \right) \),

with K defined as in Eq. (29).

Conditions (a) and (b) above are the weak-selection versions of the corresponding criteria in Theorem 4. Conditions (c) and (d) involve expectations of \(\hat{{\Delta }}_{{\mathrm {sel}}}'\left( {\mathbf {x}}\right) \) over the neutral RMC and MSS distributions, respectively. However, these latter conditions also involve terms on the right-hand side that were not seen in our previous results. To gain intuition for these additional terms, it is helpful to note that Condition (c) can be rewritten as

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] + Kb^\circ \nu \left( 1-\nu \right) \frac{d}{d\delta } \left( \frac{{\hat{b}}\left( {\mathbf {a}}\right) }{b\left( {\mathbf {a}}\right) } - \frac{{\hat{b}}\left( {\mathbf {A}}\right) }{b\left( {\mathbf {A}}\right) } \right) \bigg |_{\delta =0} > 0. \end{aligned}$$
(77)
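The passage from Condition (c) to Eq. (77) is a quotient-rule computation. At \(\delta = 0\), the two monoallelic states are indistinguishable under neutral drift, so \(b\left( {\mathbf {a}}\right) = b\left( {\mathbf {A}}\right) = b^\circ \) and \({\hat{b}}\left( {\mathbf {a}}\right) = {\hat{b}}\left( {\mathbf {A}}\right) = {\hat{b}}^\circ \). Hence, for each \({\mathbf {y}}\in \left\{ {\mathbf {a}}, {\mathbf {A}}\right\} \),

$$\begin{aligned} b^\circ \frac{d}{d\delta } \left( \frac{{\hat{b}}\left( {\mathbf {y}}\right) }{b\left( {\mathbf {y}}\right) } \right) \bigg |_{\delta =0} = b^\circ \, \frac{{\hat{b}}'\left( {\mathbf {y}}\right) b^\circ - {\hat{b}}^\circ b'\left( {\mathbf {y}}\right) }{\left( b^\circ \right) ^2} = {\hat{b}}'\left( {\mathbf {y}}\right) - \frac{{\hat{b}}^{\circ }}{b^\circ } b'\left( {\mathbf {y}}\right) . \end{aligned}$$

Subtracting the case \({\mathbf {y}}= {\mathbf {a}}\) from the case \({\mathbf {y}}= {\mathbf {A}}\) recovers the bracket on the right-hand side of Condition (c). Moving that term to the left-hand side reverses its sign and gives Eq. (77).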

Equation (77) reveals that \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \) captures only part of the effects of weak selection on allele A. The other part, represented by the second term in Eq. (77), has to do with the average reproductive value of offspring created in the monoallelic states. For example, the average reproductive value of new offspring in state \({\mathbf {a}}\) is \({\hat{b}}\left( {\mathbf {a}}\right) /b\left( {\mathbf {a}}\right) \). If this quantity increases with \(\delta \), new A-mutants arising in state \({\mathbf {a}}\) will have additional reproductive value for \(\delta >0\), relative to the neutral drift (\(\delta =0\)) case. Such effects are accounted for in the second term of Eq. (77), or equivalently, in the right-hand sides of Conditions (c) and (d). In short, fitness-based quantities such as \(\hat{{\Delta }}_{{\mathrm {sel}}}\) account for only part of the direction of selection. This phenomenon is discussed in detail by Tarnita and Taylor (2014), who also proved the equivalence of (b) and (d) in the special case that \(b({\mathbf {a}}) = b({\mathbf {A}})\) for all \(\delta \geqslant 0\).

We also observe that Condition (b) above involves two limits, \(u \rightarrow 0\) and \(\delta \rightarrow 0\). These limits can be freely interchanged according to Proposition 4, so there is no concern regarding limit orderings.

Proof of Theorem 8

We first show the equivalence of (a) and (b). Theorem 1 gives

$$\begin{aligned} \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] = \frac{\nu b\left( {\mathbf {a}}\right) \rho _{A}}{ \nu b\left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _{a}} . \end{aligned}$$
(78)

Note that for \(\delta =0\), we have \(\rho _A = \rho _a = {\hat{b}}^{\circ }/(n b^\circ )\) by Theorem 7, and both sides of Condition (b) become equal to \(\nu \). It therefore suffices to show that the first \(\delta \)-derivatives of \(\rho _A - \rho _a\) and \(\lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] - \nu b\left( {\mathbf {a}}\right) /\left( \nu b\left( {\mathbf {a}}\right) + \left( 1-\nu \right) b\left( {\mathbf {A}}\right) \right) \) have the same sign at \(\delta = 0\). Applying Eq. (78), we compute:

$$\begin{aligned}&\frac{d}{d\delta } \left( \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] - \frac{\nu b\left( {\mathbf {a}}\right) }{\nu b\left( {\mathbf {a}}\right) + \left( 1-\nu \right) b\left( {\mathbf {A}}\right) } \right) \Big |_{\delta =0} \nonumber \\&\quad = \nu \left( 1-\nu \right) \left( \frac{nb^\circ }{{\hat{b}}^\circ } \left( \rho _A' - \rho _a'\right) + \frac{b'\left( {\mathbf {a}}\right) - b'\left( {\mathbf {A}}\right) }{b^\circ } \right) - \nu \left( 1-\nu \right) \frac{b'\left( {\mathbf {a}}\right) - b'\left( {\mathbf {A}}\right) }{b^\circ } \nonumber \\&\quad = \nu \left( 1-\nu \right) \frac{nb^\circ }{{\hat{b}}^\circ } \left( \rho _A' - \rho _a'\right) . \end{aligned}$$
(79)

This shows (a) \(\Leftrightarrow \) (b). We now show (a) \(\Leftrightarrow \) (c). From Eq. (58), we have

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right]&= K \lim _{u \rightarrow 0}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ \sum _{g \in G} \left( x_g-\nu \right) {\hat{b}}_g \right] \nonumber \\&= K \left( \left( 1-\nu \right) {\hat{b}}\left( {\mathbf {A}}\right) \lim _{u\rightarrow 0}\pi _{{\mathrm {MSS}}}\left( {\mathbf {A}}\right) -\nu {\hat{b}}\left( {\mathbf {a}}\right) \lim _{u\rightarrow 0}\pi _{{\mathrm {MSS}}}\left( {\mathbf {a}}\right) \right) \nonumber \\&= K \nu \left( 1-\nu \right) \frac{{\hat{b}}\left( {\mathbf {A}}\right) b\left( {\mathbf {a}}\right) \rho _A - {\hat{b}}\left( {\mathbf {a}}\right) b\left( {\mathbf {A}}\right) \rho _a}{\nu b\left( {\mathbf {a}}\right) \rho _{A}+\left( 1-\nu \right) b\left( {\mathbf {A}}\right) \rho _{a}} , \end{aligned}$$
(80)

where the last line comes from Theorem 1. Differentiating both sides with respect to \(\delta \) at \(\delta =0\) gives

$$\begin{aligned}&\frac{d}{d\delta } \, {\mathbb {E}}_{{\mathrm {RMC}}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right] \Big \vert _{\delta =0} \nonumber \\&\quad = K\nu \left( 1-\nu \right) \left( nb^{\circ }\left( \rho _{A}'-\rho _{a}'\right) +{\hat{b}}'\left( {\mathbf {A}}\right) -{\hat{b}}'\left( {\mathbf {a}}\right) - \frac{{\hat{b}}^{\circ }}{b^{\circ }}\left( b'\left( {\mathbf {A}}\right) -b'\left( {\mathbf {a}}\right) \right) \right) . \end{aligned}$$
(81)

We rewrite the left-hand side of this equation as

$$\begin{aligned} \frac{d}{d\delta } \, {\mathbb {E}}_{{\mathrm {RMC}}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}\right] \Big \vert _{\delta =0}&= \sum _{{\mathbf {x}}\in \left\{ 0,1\right\} ^G\setminus \left\{ {\mathbf {a}},{\mathbf {A}}\right\} } \frac{d}{d\delta } \left( \pi _{\mathrm {RMC}}\left( {\mathbf {x}}\right) \, \hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) \right) \Big \vert _{\delta =0} \nonumber \\&= \sum _{{\mathbf {x}}\in \left\{ 0,1\right\} ^G\setminus \left\{ {\mathbf {a}},{\mathbf {A}}\right\} } \pi _{\mathrm {RMC}}^\circ \left( {\mathbf {x}}\right) \hat{{\Delta }}_{{\mathrm {sel}}}'\left( {\mathbf {x}}\right) \nonumber \\&= {\mathbb {E}}_{{\mathrm {RMC}}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] . \end{aligned}$$
(82)

In the second line above, all terms of the form \(\pi _{\mathrm {RMC}}'\left( {\mathbf {x}}\right) \hat{{\Delta }}_{{\mathrm {sel}}}^\circ \left( {\mathbf {x}}\right) \) vanish since \(\hat{{\Delta }}_{{\mathrm {sel}}}^\circ \left( {\mathbf {x}}\right) =0\) for all states \({\mathbf {x}}\) by Theorem 6. Combining Eqs. (81) and (82) gives the equivalence of (a) and (c). Finally, (a) and (d) are equivalent by Lemma 2, completing the proof. \(\square \)
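The two derivative computations in this proof, Eqs. (79) and (81), can be spot-checked numerically. The sketch below assumes hypothetical linear dependence on \(\delta \) for \(\rho _A\), \(\rho _a\), \(b\left( {\mathbf {A}}\right) \), \(b\left( {\mathbf {a}}\right) \), \({\hat{b}}\left( {\mathbf {A}}\right) \), and \({\hat{b}}\left( {\mathbf {a}}\right) \), and it sets \(\rho ^\circ = {\hat{b}}^\circ /\left( n b^\circ \right) \) as in Theorem 7. It then compares central finite differences of the expressions in Eqs. (78) and (80) with the closed forms. All numbers are illustrative, and `K` is an arbitrary placeholder for the constant of Eq. (29). Because the fraction in Eq. (80) vanishes at \(\delta = 0\), any \(\delta \)-dependence of K would not change the derivative.

```python
# Hypothetical first-order data (illustrative numbers, not from any model).
n, nu, K = 10, 0.3, 2.0
b0, bhat0 = 0.5, 0.5                 # neutral b and b-hat, equal in both monoallelic states
rho0 = bhat0 / (n * b0)              # neutral fixation probability (Theorem 7)
slope = {"rhoA": 0.05, "rhoa": -0.02, "bA": -0.1, "ba": 0.3, "bhA": 0.2, "bha": -0.4}


def quantities(delta):
    """Linear-in-delta values of rho_A, rho_a, b(A), b(a), bhat(A), bhat(a)."""
    return (rho0 + delta * slope["rhoA"], rho0 + delta * slope["rhoa"],
            b0 + delta * slope["bA"], b0 + delta * slope["ba"],
            bhat0 + delta * slope["bhA"], bhat0 + delta * slope["bha"])


def eq78_minus_threshold(delta):
    """lim_{u->0} E_MSS[x] from Eq. (78), minus the threshold in Condition (b)."""
    rhoA, rhoa, bA, ba, _, _ = quantities(delta)
    lim_Ex = nu * ba * rhoA / (nu * ba * rhoA + (1 - nu) * bA * rhoa)
    return lim_Ex - nu * ba / (nu * ba + (1 - nu) * bA)


def eq80(delta):
    """E_RMC[Delta_sel] written through fixation probabilities (last line of Eq. (80))."""
    rhoA, rhoa, bA, ba, bhA, bha = quantities(delta)
    num = bhA * ba * rhoA - bha * bA * rhoa
    return K * nu * (1 - nu) * num / (nu * ba * rhoA + (1 - nu) * bA * rhoa)


def derivative_at_zero(func, h=1e-5):
    # Central difference; the linear toy inputs extend smoothly to small delta < 0.
    return (func(h) - func(-h)) / (2 * h)


d_rho = slope["rhoA"] - slope["rhoa"]
closed_79 = nu * (1 - nu) * (n * b0 / bhat0) * d_rho
closed_81 = K * nu * (1 - nu) * (n * b0 * d_rho + slope["bhA"] - slope["bha"]
                                 - bhat0 / b0 * (slope["bA"] - slope["ba"]))
print(derivative_at_zero(eq78_minus_threshold), closed_79)
print(derivative_at_zero(eq80), closed_81)
```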

If the conditions of Theorem 8 are satisfied, we say that allele A is favored under weak selection. The power of Theorem 8 lies in the fact that, for many models of interest, Conditions (c) and (d) are easier to evaluate than (a) and (b) because (c) and (d) involve expectations taken at neutrality (\(\delta =0\)), meaning that the probabilities of replacement events are independent of the state. At neutrality, the recurrence relations governing the RMC and MSS distributions simplify greatly, in many cases allowing Conditions (c) or (d) to be reduced to closed form (see Sect. 9 for examples).

Still, Conditions (c) and (d) are not as simple as one might hope. Evaluating them requires computing the full value (not just the sign) of either \(\frac{d}{du} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \big |_{u=0}\) or both \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \) and K. However, if we assume that the replacement probabilities in the monoallelic states satisfy Assumption 1 and are independent of \(\delta \), only the sign of \(\frac{d}{du} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \big |_{u=0}\) or \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \) is needed, as shown below:

Corollary 1

Let \(\left\{ p_{\left( R, \alpha \right) } \left( {\mathbf {x}}\right) \right\} _{\left( R,\alpha \right) }\) be a replacement rule satisfying Assumption 2. Suppose that \(p_{\left( R,\alpha \right) }\left( {\mathbf {a}}\right) \) and \(p_{\left( R,\alpha \right) }\left( {\mathbf {A}}\right) \) are equal to each other and independent of \(\delta \), for all replacement events \(\left( R,\alpha \right) \) and all sufficiently small \(\delta \geqslant 0\). Then, for any mutational bias \(\nu \), the following success criteria are equivalent:

  1. (a)

    \(\rho _A > \rho _a\) under weak selection;

  2. (b)

    \(\lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x \right] \; (= \lim _{u \rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ {\hat{x}} \right] ) > \nu \) under weak selection;

  3. (c)

    \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}' \right] >0\);

  4. (d)

    \(\frac{d}{du} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \big |_{u=0} > 0\).

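The proof follows immediately from Theorem 8. Because the replacement probabilities in states \({\mathbf {a}}\) and \({\mathbf {A}}\) coincide, we have \(b\left( {\mathbf {a}}\right) = b\left( {\mathbf {A}}\right) \) and \({\hat{b}}\left( {\mathbf {a}}\right) = {\hat{b}}\left( {\mathbf {A}}\right) \) for all sufficiently small \(\delta \geqslant 0\). Consequently, the right-hand sides of Conditions (c) and (d) of Theorem 8 vanish, and the threshold in Condition (b) reduces to \(\nu \). Aspects of this result, in the special case that all sites have the same reproductive value, were proven in Theorem 2 of Nowak et al. (2010b) and in Eq. (16) of Van Cleve (2015). Additionally, a number of instances of this result have been obtained for particular models (Rousset and Billiard 2000; Leturque and Rousset 2002; Lessard and Ladret 2007; Taylor et al. 2007a; Antal et al. 2009a; Wakano et al. 2013; Débarre et al. 2014; Allen et al. 2017).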

8 Results for individuals

All of the above results apply equally to haploid, diploid, haplodiploid, or polyploid populations, because the analysis is based on genetic sites rather than individuals. However, any formalism for natural selection must reserve a special role for the individual, as the entity that carries and is shaped by its genetic material. Here, we connect the individual-level and gene-level perspectives with definitions and results that apply at the level of the individual.

We recall from Sect. 2.1 that the set of genetic sites, G, is partitioned into a collection, \(\left\{ G_i \right\} _{i \in I}\), where \(G_i\) is the set of sites residing in individual \(i \in I\). The ploidy of individual i is \(n_i :=\left| G_i\right| \). The notation \(g \sim h\) indicates that sites g and h reside in the same individual.

8.1 Assumptions

Moving to an individual-level perspective requires the introduction of additional assumptions. First, we assume that an individual’s alleles survive or die all together (along with the individual itself). Thus, if one of an individual’s genetic sites is replaced, then all of them are. This excludes the possibility that, for example, a virus causes a germline mutation in one of a diploid individual’s alleles.

Assumption 3

(Coherence of individuals) If genetic sites \(g,h \in G\) reside in the same individual, \(g \sim h\), then for each state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\) and each replacement event \(\left( R,\alpha \right) \) with \(p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) >0\), either \(g,h \in R\) or \(g,h \notin R\).

The second assumption is that meiosis is fair: each of the alleles in an individual is equally likely to be passed on to offspring when this individual reproduces. This assumption excludes the possibility of meiotic drive (Sandler and Novitski 1957; Lindholm et al. 2016).

Assumption 4

(Fair meiosis) Let \(\left( R,\alpha _1\right) \) and \(\left( R, \alpha _2\right) \) be replacement events with the same set \(R \subseteq G\) of replaced sites. If \(\alpha _1\left( g\right) \sim \alpha _2\left( g\right) \) for all \(g \in R\), then \(p_{\left( R, \alpha _1\right) }\left( {\mathbf {x}}\right) = p_{\left( R,\alpha _2\right) }\left( {\mathbf {x}}\right) \) for each state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\).

In words, the probability of a replacement event depends only on the (individual) parent of each replaced site, not on which allele is inherited from that parent.

8.2 Fitness and selection at the individual level

As immediate consequences of Assumptions 3 and 4, we see that sites residing in the same individual have the same birth rate, death rate, reproductive value, and fitness:

Lemma 3

Suppose the replacement rule \(\left\{ p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) \right\} _{\left( R, \alpha \right) }\) satisfies Assumptions 3 and 4. If sites \(g, h \in G\) reside in the same individual, \(g \sim h\), then \(d_g\left( {\mathbf {x}}\right) =d_h\left( {\mathbf {x}}\right) \) and \(e_{g \ell }\left( {\mathbf {x}}\right) =e_{h \ell }\left( {\mathbf {x}}\right) \) for each state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\) and each site \(\ell \in G\). If, furthermore, Assumption 1 holds, then \(v_{g}=v_{h}\) and \(w_{g}\left( {\mathbf {x}}\right) =w_{h}\left( {\mathbf {x}}\right) \) for each \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\).

Proof

Fix sites \(g,h \in G\) with \(g \sim h\). Assumption 3 and Eq. (6) imply that \(d_{g}\left( {\mathbf {x}}\right) =d_{h}\left( {\mathbf {x}}\right) \) for all states \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\). Assumption 4 implies that

$$\begin{aligned} \sum _{\begin{array}{c} \left( R,\alpha \right) \\ \alpha \left( \ell \right) =g \end{array}} p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) = \sum _{\begin{array}{c} \left( R,\alpha \right) \\ \alpha \left( \ell \right) =h \end{array}} p_{\left( R,\alpha \right) }\left( {\mathbf {x}}\right) , \end{aligned}$$
(83)

which is equivalent to \(e_{g\ell }\left( {\mathbf {x}}\right) = e_{h\ell }\left( {\mathbf {x}}\right) \).

If Assumption 1 holds, then the above arguments imply that \(d_g^\circ = d_h^\circ \) and \(e_{g\ell }^{\circ }=e_{h\ell }^{\circ }\) for all \(\ell \in G\). It follows that \(v_{g}=\frac{1}{d_{g}^{\circ }}\sum _{\ell \in G}e_{g\ell }^{\circ }v_{\ell }=\frac{1}{d_{h}^{\circ }}\sum _{\ell \in G}e_{h\ell }^{\circ }v_{\ell }=v_{h}\), as desired.

The definitions in Eqs. (48a) and (48b) now imply that \({\hat{b}}_g\left( {\mathbf {x}}\right) ={\hat{b}}_h\left( {\mathbf {x}}\right) \) and \({\hat{d}}_g\left( {\mathbf {x}}\right) ={\hat{d}}_h\left( {\mathbf {x}}\right) \) for all states \({\mathbf {x}}\), from which it follows that \(w_g\left( {\mathbf {x}}\right) =w_h\left( {\mathbf {x}}\right) \) for all \({\mathbf {x}}\). \(\square \)

Moving to individual-level quantities, we can identify the type of an individual \(i \in I\), in state \({\mathbf {x}}\in \left\{ 0,1\right\} ^G\), by its fraction of A alleles:

$$\begin{aligned} X_i :=\frac{1}{n_i} \sum _{g \in G_i} x_g. \end{aligned}$$
(84)

For example, a diploid heterozygote (genotype Aa) has \(X_i = 1/2\). The reproductive value and fitness of individual \(i \in I\) are defined by summing the corresponding quantities over all sites in \(G_i\):

$$\begin{aligned} V_i&:=\sum _{g \in G_i} v_g ; \end{aligned}$$
(85a)
$$\begin{aligned} W_i\left( {\mathbf {x}}\right)&:=\sum _{g \in G_i} w_g\left( {\mathbf {x}}\right) . \end{aligned}$$
(85b)

If Assumptions 1, 3, and 4 hold, then all sites in the same individual have the same reproductive value and fitness according to Lemma 3, and it follows that \(V_i = n_i v_g\) and \(W_i\left( {\mathbf {x}}\right) = n_i w_g\left( {\mathbf {x}}\right) \) for any \(g \in G_i\).

We can use the above definitions to express the RV-weighted change due to selection, \(\hat{{\Delta }}_{{\mathrm {sel}}}\), using only quantities that apply at the level of the individual:

Proposition 5

For any replacement rule satisfying Assumptions 1, 3, and 4, and any state \({\mathbf {x}}\),

$$\begin{aligned} \hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) = \sum _{i \in I} X_i \left( W_i\left( {\mathbf {x}}\right) -V_i\right) . \end{aligned}$$
(86)

Proof

For each individual \(i \in I\) we calculate

$$\begin{aligned} X_i \left( W_i\left( {\mathbf {x}}\right) -V_i\right)&= \left( \frac{1}{n_i} \sum _{g \in G_i} x_g \right) \left( \sum _{h \in G_i} w_h({\mathbf {x}}) - \sum _{h \in G_i} v_h\right) \nonumber \\&= \left( \sum _{g \in G_i} x_g \right) \left( \frac{1}{n_i} \sum _{h \in G_i} \left( w_h\left( {\mathbf {x}}\right) -v_h\right) \right) . \end{aligned}$$
(87)

Lemma 3 implies that \(\frac{1}{n_i} \sum _{h \in G_i} \left( w_h\left( {\mathbf {x}}\right) -v_h\right) =w_g\left( {\mathbf {x}}\right) -v_g\) for any \(g \in G_i\); thus, the right-hand side above is equal to \(\sum _{g \in G_i} x_g \left( w_g\left( {\mathbf {x}}\right) -v_g\right) \). Now summing over all individuals \(i \in I\) we have

$$\begin{aligned} \sum _{i \in I} X_i \left( W_i\left( {\mathbf {x}}\right) -V_i\right)&= \sum _{i \in I} \sum _{g \in G_i} x_g \left( w_g\left( {\mathbf {x}}\right) -v_g\right) \nonumber \\&= \sum _{g \in G} x_g \left( w_g\left( {\mathbf {x}}\right) -v_g\right) \nonumber \\&= \hat{{\Delta }}_{{\mathrm {sel}}}\left( {\mathbf {x}}\right) , \end{aligned}$$
(88)

as desired. \(\square \)
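Once Lemma 3 holds, Proposition 5 is an algebraic identity, so it can be checked on arbitrary numbers. The sketch below builds a hypothetical population of individuals with mixed ploidy. It assigns one reproductive value and one fitness per individual, as Lemma 3 guarantees under Assumptions 1, 3, and 4, and compares the site-level sum from Eq. (88) with the individual-level sum in Eq. (86). The values are random placeholders and are not derived from any replacement rule.

```python
import random

random.seed(0)

# Hypothetical population: site indices grouped by individual (ploidy 1, 2, 2, 3).
individuals = [[0], [1, 2], [3, 4], [5, 6, 7]]

# Lemma 3: sites within an individual share reproductive value and fitness,
# so draw one placeholder value of each per individual.
v, w = {}, {}
for sites in individuals:
    v_i, w_i = random.uniform(0.5, 1.5), random.uniform(0.0, 2.0)
    for g in sites:
        v[g], w[g] = v_i, w_i
x = {g: random.randint(0, 1) for sites in individuals for g in sites}

# Site-level form (last lines of Eq. (88)): sum_g x_g (w_g - v_g).
site_level = sum(x[g] * (w[g] - v[g]) for g in x)

# Individual-level form, Eq. (86): sum_i X_i (W_i - V_i), using Eqs. (84)-(85b).
individual_level = 0.0
for sites in individuals:
    X_i = sum(x[g] for g in sites) / len(sites)
    W_i = sum(w[g] for g in sites)
    V_i = sum(v[g] for g in sites)
    individual_level += X_i * (W_i - V_i)

print(site_level, individual_level)
```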

We can use Proposition 5 to restate the criteria for success in Theorems 5 and 8 using individual-level quantities. For example, if Assumptions 1, 3, and 4 hold, then A is favored by selection if and only if \(\sum _{i \in I} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}\left[ X_i \left( W_i -V_i\right) \right] > 0\). Likewise, if Assumptions 2, 3, and 4 and the assumptions of Corollary 1 hold, then A is favored under weak selection if and only if \(\sum _{i \in I} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ X_i W_i' \right] > 0\).

If Assumptions 3 and 4 do not hold, then sites in the same individual may have different reproductive value and/or fitness. These differences would not be reflected in the individual-level quantities \(V_i\) and \(W_i\); thus, the criteria for success may not be expressible in terms of these quantities.

9 Examples

We illustrate the application of our formalism to two examples: evolutionary games on an arbitrary weighted graph (Allen et al. 2017), and a haplodiploid population in which alleles may affect males and females differently. In each case, we show how Conditions (c) and (d) of Corollary 1 can be evaluated to obtain tractable conditions for success under natural selection.

9.1 Games on graphs

Evolutionary games on graphs (Nowak and May 1992; Blume 1993; Santos and Pacheco 2005; Ohtsuki et al. 2006; Szabó and Fáth 2007; Chen 2013; Allen and Nowak 2014; Débarre et al. 2014; Peña et al. 2016) are a well-studied mathematical model for the evolution of social behavior. Individuals play a game with neighbors, and payoffs from this game determine reproductive success. Analytical results were first obtained for regular graphs (Ohtsuki et al. 2006; Taylor et al. 2007b; Cox et al. 2013; Chen 2013; Allen and Nowak 2014; Débarre et al. 2014; Durrett 2014; Peña et al. 2016) and have recently been extended to arbitrary weighted graphs (Allen et al. 2017; Fotouhi et al. 2018).

9.1.1 Model

Population structure is represented as a weighted (undirected) graph G, which we assume to be connected. Each vertex is always occupied by a single haploid individual (see Fig. 3). The edge weight between vertices \(g,h \in G\), denoted \(\omega _{gh} \geqslant 0\), indicates the strength of spatial relationship between these vertices. It is helpful to define the weighted degree of vertex g as \(\omega _g = \sum _{h \in G} \omega _{gh}\). The random-walk step probability from g to h is denoted \(p_{gh}=\omega _{gh}/\omega _g\). The probability that an m-step random walk from g terminates at h is denoted \(p_{gh}^{\left( m\right) }\).
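The random-walk quantities \(p_{gh}\) and \(p_{gh}^{\left( m\right) }\) form row-stochastic matrices and can be computed by repeated matrix multiplication. The following minimal plain-Python sketch uses a hypothetical weighted 4-cycle as the weight matrix `omega`. The helper names are illustrative.

```python
def step_matrix(weights):
    """Random-walk step probabilities p_{gh} = omega_{gh} / omega_g."""
    degrees = [sum(row) for row in weights]          # weighted degrees omega_g
    return [[w / d for w in row] for row, d in zip(weights, degrees)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def m_step(P, m):
    """p^{(m)}_{gh}: probability that an m-step random walk from g ends at h."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        result = matmul(result, P)
    return result


# Hypothetical connected, undirected weighted graph (a weighted 4-cycle).
omega = [[0, 2, 0, 1],
         [2, 0, 1, 0],
         [0, 1, 0, 3],
         [1, 0, 3, 0]]
P = step_matrix(omega)
P2 = m_step(P, 2)
print(P2)
```

Because `omega` is symmetric, the walk is reversible: \(\omega _g p_{gh}^{\left( m\right) } = \omega _h p_{hg}^{\left( m\right) }\) for every m. This is a standard property of random walks on undirected weighted graphs.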

Fig. 3

Three examples of graph-structured populations. In each population, blue indicates that a location is occupied by the allele A, while red indicates that the allele at that location is a. (a) A complete graph, wherein each player is a neighbor of (i.e., shares a link with) every other player in the population. (b) A (non-complete) regular graph, for which all individuals have the same number of neighbors (4, in this instance). (c) A weighted heterogeneous graph, for which both the number of neighbors and the weights of the connections (given by the shading of the links) may vary from player to player. Historically, the analysis of evolutionary games on graphs has proceeded in order of increasing asymmetry, from (a) to (b) to (c) (color figure online)

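In each state of the process, each individual interacts with each of its neighbors according to a \(2 \times 2\) matrix game, in which \(f_{XY}\) denotes the payoff to an individual of type X interacting with a neighbor of type Y:

$$\begin{aligned} \begin{array}{c|cc} & A & a \\ \hline A & f_{AA} & f_{Aa} \\ a & f_{aA} & f_{aa} \end{array} \end{aligned}$$
(89)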

In each state \({\mathbf {x}}\), each individual g retains the edge-weighted average payoff it receives from neighbors, given by

$$\begin{aligned} f_g ({\mathbf {x}})= & {} \sum _{\ell \in G} p_{g\ell } \left( f_{AA} x_g x_\ell + f_{Aa} x_g \left( 1-x_\ell \right) + f_{aA} \left( 1-x_g\right) x_\ell \right. \nonumber \\&\left. + f_{aa} \left( 1-x_g\right) \left( 1-x_\ell \right) \right) . \end{aligned}$$
(90)

Game payoff is translated into fecundity by \(F_g\left( {\mathbf {x}}\right) = 1 + \delta f_g\left( {\mathbf {x}}\right) \), where \(\delta >0\) is a parameter representing the strength of selection.

Reproduction and replacement proceed according to a specified update rule (Ohtsuki et al. 2006) (Fig. 4). For Birth–Death (BD) updating, an individual g is chosen at random, with probability proportional to reproductive rate \(F_g\left( {\mathbf {x}}\right) \), to (asexually) produce an offspring. This offspring replaces a random neighbor of g, chosen proportionally to edge weight \(\omega _{gh}\). For Death–Birth (DB) updating, an individual h is chosen, uniformly at random, to be replaced. Then, a neighbor g is chosen, proportionally to \(\omega _{gh} F_g\left( {\mathbf {x}}\right) \), to produce an offspring to fill the vacancy. Mutations are resolved in accordance with our framework (Sect. 2.4), leading to a new state.

9.1.2 Basic quantities

Since only one individual is replaced per time-step, any replacement event with positive probability has the form \((\{h\}, h \mapsto g)\), meaning that the occupant of site h is replaced by the offspring of site g. The nonzero probabilities in the replacement rule are given by

$$\begin{aligned} p_{\left( \left\{ h\right\} ,h \mapsto g\right) }\left( {\mathbf {x}}\right) = {\left\{ \begin{array}{ll} \displaystyle \left( \frac{1 + \delta f_g\left( {\mathbf {x}}\right) }{\sum _{\ell \in G} \left( 1 + \delta f_\ell \left( {\mathbf {x}}\right) \right) } \right) p_{gh} &{} \text {for BD updating} , \\ \displaystyle \frac{1}{n} \left( \frac{\omega _{gh} \left( 1+\delta f_g\left( {\mathbf {x}}\right) \right) }{\sum _{\ell \in G} \omega _{\ell h} \left( 1+\delta f_\ell \left( {\mathbf {x}}\right) \right) }\right) &{} \text {for DB updating} . \end{array}\right. } \end{aligned}$$
(91)

Above, \(f_g({\mathbf {x}})\) is the payoff to g in state \({\mathbf {x}}\), given by Eq. (90). We note also that \(e_{gh}\left( {\mathbf {x}}\right) = p_{\left( \left\{ h\right\} ,h \mapsto g\right) }\left( {\mathbf {x}}\right) \) for each \(g,h \in G\).
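Eq. (91) translates directly into code. The following minimal sketch assumes a hypothetical 4-vertex weighted graph (`omega`), an arbitrary state `x`, and illustrative payoff values `f`. The function names are illustrative. Exactly one replacement event occurs per time-step, so under either rule the probabilities must sum to 1.

```python
def payoffs(x, P, f):
    """Edge-weighted average payoff f_g(x) of Eq. (90); f maps 'AA', 'Aa', 'aA', 'aa' to payoffs."""
    n = len(x)
    return [sum(P[g][l] * (f["AA"] * x[g] * x[l] + f["Aa"] * x[g] * (1 - x[l])
                           + f["aA"] * (1 - x[g]) * x[l]
                           + f["aa"] * (1 - x[g]) * (1 - x[l]))
                for l in range(n))
            for g in range(n)]


def replacement_probs(x, omega, f, delta, rule):
    """e[g][h]: probability that h is replaced by the offspring of g, following Eq. (91)."""
    n = len(x)
    deg = [sum(row) for row in omega]
    P = [[omega[g][h] / deg[g] for h in range(n)] for g in range(n)]
    F = [1 + delta * fg for fg in payoffs(x, P, f)]   # fecundities 1 + delta f_g(x)
    if rule == "BD":
        # Parent chosen proportionally to fecundity; offspring moves along a random-walk step.
        total = sum(F)
        return [[F[g] / total * P[g][h] for h in range(n)] for g in range(n)]
    if rule == "DB":
        # h dies uniformly at random; neighbors compete with weight omega_{gh} F_g.
        comp = [sum(omega[l][h] * F[l] for l in range(n)) for h in range(n)]
        return [[omega[g][h] * F[g] / (n * comp[h]) for h in range(n)] for g in range(n)]
    raise ValueError("rule must be 'BD' or 'DB'")


# Hypothetical example: weighted 4-cycle, alternating alleles, illustrative payoffs.
omega = [[0, 2, 0, 1],
         [2, 0, 1, 0],
         [0, 1, 0, 3],
         [1, 0, 3, 0]]
x = [1, 0, 1, 0]
f = {"AA": 2.0, "Aa": -1.0, "aA": 3.0, "aa": 0.0}
e_BD = replacement_probs(x, omega, f, 0.05, "BD")
e_DB = replacement_probs(x, omega, f, 0.05, "DB")
print(sum(map(sum, e_BD)), sum(map(sum, e_DB)))
```

Setting `delta = 0` makes every fecundity equal to 1, and the output then reduces to the neutral probabilities \(e^\circ _{gh}\) of Eq. (94).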

For DB updating, new mutants are equally likely to appear at each vertex since each vertex is equally likely to be replaced in each state. Thus, the mutant appearance distribution for DB updating is

$$\begin{aligned} \mu _A \left( {\mathbf {x}}\right) = {\left\{ \begin{array}{ll} \frac{1}{n} &{} \text {if} \, {\mathbf {x}}= {{\mathbf {1}}}_{\left\{ g\right\} } \, \text {for some} \, g \in G, \\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(92)

The formula for \(\mu _a\left( {\mathbf {x}}\right) \) is analogous. For BD updating, the mutant appearance distribution is nonuniform and given by

$$\begin{aligned} \mu _A \left( {\mathbf {x}}\right) = {\left\{ \begin{array}{ll} \frac{1}{n} {\sum }_{h \in G} p_{hg} &{} \text {if} \, {\mathbf {x}}= {{\mathbf {1}}}_{\left\{ g\right\} } \, \text {for some} \, g \in G, \\ 0 &{} \text {otherwise} , \end{array}\right. } \end{aligned}$$
(93)

and analogously for \(\mu _a\left( {\mathbf {x}}\right) \).
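Eqs. (92) and (93) can be evaluated for a given graph in a few lines. The following sketch uses a hypothetical star graph, where the contrast between the two update rules is stark. The function name is illustrative.

```python
def mutant_appearance(omega, rule):
    """mu_A(1_{g}) for each vertex g, following Eqs. (92) and (93).
    omega is a symmetric weight matrix (list of lists) of a connected graph."""
    n = len(omega)
    deg = [sum(row) for row in omega]
    if rule == "DB":
        # Each vertex is equally likely to die, so mutants arise uniformly.
        return [1.0 / n] * n
    if rule == "BD":
        # (1/n) * sum_h p_{hg}: probability that g receives an offspring under neutral BD.
        return [sum(omega[h][g] / deg[h] for h in range(n)) / n for g in range(n)]
    raise ValueError("rule must be 'BD' or 'DB'")


# Hypothetical star graph: vertex 0 is the hub, vertices 1-3 are leaves.
star = [[0, 1, 1, 1],
        [1, 0, 0, 0],
        [1, 0, 0, 0],
        [1, 0, 0, 0]]
print(mutant_appearance(star, "BD"))
print(mutant_appearance(star, "DB"))
```

On this star, BD updating makes the hub nine times more likely than any particular leaf to host a new mutant (3/4 versus 1/12), whereas DB updating treats all four vertices equally.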

Fig. 4

Birth–Death (BD) and Death–Birth (DB) updating on a graph. Each node is occupied by a haploid individual with allele A (blue) or a (red). Under BD updating, an individual is first chosen for reproduction with probability proportional to fecundity (large blue node). The offspring then replaces a neighbor, with probability determined by the weights of the outgoing links. Under DB updating, an individual is chosen for death uniformly at random from the population (empty node). The neighbors of this individual then compete to reproduce and fill the vacancy, with probability determined by both fecundity and the weights of the incoming links to the empty node (color figure online)

Both BD and DB updating satisfy Assumption 2, and therefore have a well-defined neutral process. The probability that vertex h is replaced by the offspring of vertex g, given by Eq. (91), reduces under the neutral process to

$$\begin{aligned} e^\circ _{gh} = {\left\{ \begin{array}{ll} p_{gh}/n &{} \text {for BD updating} , \\ p_{hg}/n &{} \text {for DB updating} . \end{array}\right. } \end{aligned}$$
(94)
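A minimal numerical sketch of Eqs. (93)–(94), assuming the graph is supplied as a symmetric NumPy weight matrix `W` with zero diagonal (the helper names `neutral_replacement` and `mutant_appearance` are illustrative, not from the original). It follows the reasoning in the text that a new mutant can only appear at a vertex that is being replaced, so the appearance probability at g is the neutral probability \(\sum _h e^\circ _{hg}\) that g is replaced.

```python
import numpy as np

def neutral_replacement(W, rule):
    """Neutral probabilities e0[g, h] that vertex h is replaced by the offspring of g, Eq. (94).

    W: symmetric, nonnegative weight matrix with zero diagonal (graph assumed connected).
    """
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)  # one-step random-walk probabilities p_gh
    if rule == "BD":
        return P / n                      # e0_gh = p_gh / n
    if rule == "DB":
        return P.T / n                    # e0_gh = p_hg / n
    raise ValueError("rule must be 'BD' or 'DB'")

def mutant_appearance(W, rule):
    """Probability that a new A mutant appears at each vertex g, Eqs. (92)-(93)."""
    # Column sums: sum_h e0_hg is the neutral probability that g is replaced.
    return neutral_replacement(W, rule).sum(axis=0)

# Path graph 0 - 1 - 2: under BD the hub is replaced (and mutates) more often.
W = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(mutant_appearance(W, "BD"))  # hub 2/3, leaves 1/6 each
print(mutant_appearance(W, "DB"))  # uniform 1/3
```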

The reproductive value of a vertex g is proportional to its weighted degree for DB updating, and inversely proportional to its weighted degree for BD updating:

$$\begin{aligned} v_g = {\left\{ \begin{array}{ll} \displaystyle n \frac{\omega _g^{-1}}{{\tilde{{\varOmega }}}} &{} \text {for BD updating} , \\ \displaystyle n \frac{\omega _g}{{\varOmega }} &{} \text {for DB updating,} \end{array}\right. } \end{aligned}$$
(95)

where \({\varOmega }= \sum _{h \in G} \omega _h\) and \({\tilde{{\varOmega }}}= \sum _{h \in G} \omega ^{-1}_h\) are the total weighted degree and total inverse weighted degree, respectively (see Fig. 5). These reproductive values were first discovered by Maciejewski (2014) for unweighted graphs and generalized to weighted graphs by Allen et al. (2017).
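The sketch below evaluates Eq. (95) and checks it numerically. It assumes that the neutral reproductive-value recurrence (Eq. (46a), outside this section) balances the expected gain \(\sum _h e^\circ _{gh} v_h\) at each vertex against the expected loss \(v_g \sum _h e^\circ _{hg}\); that reading is ours, and the function names are hypothetical.

```python
import numpy as np

def reproductive_values(W, rule):
    """Reproductive values v_g from Eq. (95), normalized so that sum_g v_g = n."""
    n = W.shape[0]
    deg = W.sum(axis=1)                       # weighted degrees omega_g
    if rule == "BD":
        raw = 1.0 / deg                       # inversely proportional to degree
    elif rule == "DB":
        raw = deg                             # proportional to degree
    else:
        raise ValueError("rule must be 'BD' or 'DB'")
    return n * raw / raw.sum()

def rv_residual(W, rule):
    """Gain minus loss of reproductive value at each vertex under neutral drift."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)
    e0 = P / n if rule == "BD" else P.T / n   # Eq. (94)
    v = reproductive_values(W, rule)
    gain = e0 @ v                             # sum_h e0_gh v_h
    loss = v * e0.sum(axis=0)                 # v_g sum_h e0_hg
    return gain - loss
```

On any connected weighted graph the residual should vanish up to rounding for both update rules.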

Fig. 5

Reproductive values on a weighted graph for BD and DB updating. The weighted degree \(\omega _g\) of vertex g is defined as \(\omega _{g}:=\sum _{h\in G}\omega _{gh}\), where \(\omega _{gh}\) is the weight of the edge between vertices g and h. a For BD updating, the reproductive value of a vertex is inversely proportional to its weighted degree: \(v_g = n \omega _g^{-1}/\sum _{h \in G} \omega _h^{-1}\). b For DB updating, the reproductive value of a vertex is directly proportional to its weighted degree: \(v_g = n \omega _g/\sum _{h \in G} \omega _h\). In both panels, vertices are sized in proportion to reproductive value

A weak-selection expansion of Eq. (91) yields

$$\begin{aligned} e_{gh}' \left( {\mathbf {x}}\right) = {\left\{ \begin{array}{ll} \displaystyle \frac{p_{gh}}{n} \left( f_g\left( {\mathbf {x}}\right) - \frac{1}{n} \sum _{\ell \in G} f_\ell \left( {\mathbf {x}}\right) \right) &{} \text {for BD updating} , \\ \displaystyle \frac{p_{hg}}{n} \left( f_g\left( {\mathbf {x}}\right) - \sum _{\ell \in G} p_{h \ell } f_\ell \left( {\mathbf {x}}\right) \right) &{} \text {for DB updating} . \end{array}\right. } \end{aligned}$$
(96)
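Eq. (96) can be checked against a central finite difference of Eq. (91) in \(\delta \). The sketch assumes that the payoff of Eq. (90) is the edge-weighted average \(f_g({\mathbf {x}}) = \sum _\ell p_{g\ell } f_{x_g x_\ell }\) of pairwise game payoffs, which is the form implied by the expansion in Eq. (101); the function names and the example graph, state, and payoff matrix are hypothetical.

```python
import numpy as np

def payoffs(W, x, F):
    """Assumed form of Eq. (90): f_g = sum_l p_gl * payoff(type of g vs type of l).

    F = [[f_AA, f_Aa], [f_aA, f_aa]]; x is an int array with x[g] = 1 for allele A.
    """
    P = W / W.sum(axis=1, keepdims=True)
    row = 1 - x                                   # index 0 = A, index 1 = a
    M = np.asarray(F, dtype=float)[row][:, row]   # M[g, l] = payoff to g's type against l's type
    return (P * M).sum(axis=1)

def replacement_probs(W, x, F, delta, rule):
    """e_gh(x) from Eq. (91): probability that the offspring of g replaces h."""
    n = W.shape[0]
    b = 1.0 + delta * payoffs(W, x, F)            # fecundities 1 + delta f_g
    if rule == "BD":
        P = W / W.sum(axis=1, keepdims=True)
        return (b / b.sum())[:, None] * P         # choose parent g, then neighbor h
    C = W * b[:, None]                            # DB: C[g, h] = omega_gh (1 + delta f_g)
    return C / C.sum(axis=0, keepdims=True) / n   # h dies w.p. 1/n, neighbors compete

def first_order(W, x, F, rule):
    """Closed-form derivative e'_gh(x) from Eq. (96)."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)
    f = payoffs(W, x, F)
    if rule == "BD":
        return (P / n) * (f - f.mean())[:, None]
    return (P.T / n) * (f[:, None] - (P @ f)[None, :])
```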

The fitness of vertex g in state \({\mathbf {x}}\) has the weak-selection expansion

$$\begin{aligned} w_g\left( {\mathbf {x}}\right) = v_g + \delta w_g'\left( {\mathbf {x}}\right) + {\mathcal {O}}\left( \delta ^2\right) , \end{aligned}$$
(97)

where

$$\begin{aligned} w_g'\left( {\mathbf {x}}\right) = {\left\{ \begin{array}{ll} \displaystyle \frac{1}{{\tilde{{\varOmega }}}} \sum _{h \in G} \frac{\omega _{gh}}{\omega _g \omega _h} \left( f_g\left( {\mathbf {x}}\right) - f_h\left( {\mathbf {x}}\right) \right) &{} \text {for BD updating} , \\ \displaystyle \frac{1}{{\varOmega }} \sum _{h \in G} \omega _g p_{gh}^{\left( 2\right) } \left( f_g\left( {\mathbf {x}}\right) - f_h\left( {\mathbf {x}}\right) \right) &{} \text {for DB updating}. \end{array}\right. } \end{aligned}$$
(98)

Multiplying Eq. (98) by \(x_g\), summing over g, and symmetrizing in g and h (both kernels are symmetric), the first-order term of the RV-weighted change due to selection can be written

$$\begin{aligned} \hat{{\Delta }}_{{\mathrm {sel}}}'\left( {\mathbf {x}}\right) = {\left\{ \begin{array}{ll} \displaystyle \frac{1}{2 {\tilde{{\varOmega }}}} \sum _{g,h \in G} \frac{\omega _{gh}}{\omega _g \omega _h} \left( x_g - x_h\right) \left( f_g\left( {\mathbf {x}}\right) - f_h\left( {\mathbf {x}}\right) \right) &{} \text {for BD updating} , \\ \displaystyle \frac{1}{2 {\varOmega }} \sum _{g,h \in G} \omega _g p_{gh}^{\left( 2\right) } \left( x_g - x_h\right) \left( f_g\left( {\mathbf {x}}\right) - f_h\left( {\mathbf {x}}\right) \right) &{} \text {for DB updating} . \end{array}\right. } \end{aligned}$$
(99)
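Passing from Eq. (98) to Eq. (99) uses only the symmetry of the two kernels, \(\omega _{gh}/(\omega _g \omega _h)\) for BD and \(\omega _g p_{gh}^{(2)}\) for DB (the latter is symmetric because \(\omega _g p_{gh}^{(2)} = \sum _k \omega _{gk}\omega _{kh}/\omega _k\)). The sketch below checks the identity \(\sum _g x_g \sum _h K_{gh}(f_g - f_h) = \frac{1}{2}\sum _{g,h} K_{gh}(x_g-x_h)(f_g-f_h)\) on an arbitrary weighted graph, state, and payoff vector chosen for illustration; the constant in front of each sum plays no role.

```python
import numpy as np

def kernels(W):
    """Symmetric kernels appearing in Eq. (98): BD and DB versions."""
    deg = W.sum(axis=1)
    P = W / deg[:, None]
    bd = W / np.outer(deg, deg)          # omega_gh / (omega_g omega_h)
    db = deg[:, None] * (P @ P)          # omega_g p^(2)_gh
    return bd, db

def both_forms(K, x, f):
    """Return sum_g x_g sum_h K_gh (f_g - f_h) and its symmetrized form."""
    diff = f[:, None] - f[None, :]
    direct = (x[:, None] * K * diff).sum()
    symmetrized = 0.5 * ((x[:, None] - x[None, :]) * K * diff).sum()
    return direct, symmetrized
```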

9.1.3 Condition for success: Birth–Death

Applying Corollary 1, we obtain that for BD, type A is favored under weak selection if and only if

$$\begin{aligned} \sum _{g,h \in G} \frac{\omega _{gh}}{\omega _g \omega _h} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( x_g - x_h\right) \left( f_g\left( {\mathbf {x}}\right) - f_h\left( {\mathbf {x}}\right) \right) \right] > 0 . \end{aligned}$$
(100)

To express Condition (100) in terms of the entries of the payoff matrix (89), we use Eq. (90) to calculate

$$\begin{aligned}&{{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( x_g - x_h\right) \left( f_g\left( {\mathbf {x}}\right) -f_h\left( {\mathbf {x}}\right) \right) \right] \nonumber \\&\quad = \sum _{\ell \in G} \Big ( f_{AA} \left( p_{g\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g \left( 1-x_h\right) x_\ell \right] + p_{h\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( 1-x_g\right) x_h x_\ell \right] \right) \nonumber \\&\qquad + f_{Aa} \left( p_{g\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g\left( 1-x_h\right) \left( 1-x_\ell \right) \right] + p_{h\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( 1-x_g\right) x_h \left( 1-x_\ell \right) \right] \right) \nonumber \\&\qquad - f_{aA} \left( p_{g\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( 1-x_g\right) x_h x_\ell \right] + p_{h\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g \left( 1-x_h\right) x_\ell \right] \right) \nonumber \\&\qquad - f_{aa} \left( p_{g\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( 1-x_g\right) x_h \left( 1-x_\ell \right) \right] + p_{h\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g \left( 1-x_h\right) \left( 1-x_\ell \right) \right] \right) \Big ) . \end{aligned}$$
(101)

We can reduce to pairwise quantities by noting that Proposition 2 implies

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g x_h x_\ell \right] = {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( 1-x_g\right) \left( 1- x_h\right) \left( 1- x_\ell \right) \right] , \end{aligned}$$
(102)

which leads to the identity

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g x_h x_\ell \right] = \frac{1}{2}\left( {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_gx_h\right] + {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g x_\ell \right] + {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_h x_\ell \right] \right) - \frac{1}{4} . \end{aligned}$$
(103)
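Identity (103) needs only \({{\mathrm{{\mathbb {E}}}}}[x_g] = \frac{1}{2}\) and the flip symmetry (102). As a sanity check of our own (not part of the original argument), the sketch below draws an arbitrary distribution on three sites that is invariant under exchanging A and a, which implies both properties, and compares the two sides of Eq. (103).

```python
import itertools
import random

def check_triple_identity(seed=0):
    """Return (E[x_g x_h x_l], right-hand side of Eq. (103)) for a random flip-symmetric law."""
    rng = random.Random(seed)
    states = list(itertools.product((0, 1), repeat=3))
    weights = {}
    for s in states:
        flipped = tuple(1 - b for b in s)
        if flipped not in weights:
            w = rng.random()
            weights[s] = weights[flipped] = w  # equal mass on s and its complement
    total = sum(weights.values())
    prob = {s: w / total for s, w in weights.items()}

    def E(fn):
        return sum(p * fn(s) for s, p in prob.items())

    lhs = E(lambda s: s[0] * s[1] * s[2])
    pairs = E(lambda s: s[0] * s[1]) + E(lambda s: s[0] * s[2]) + E(lambda s: s[1] * s[2])
    return lhs, 0.5 * pairs - 0.25
```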

Applying this identity, Eq. (101) reduces to

$$\begin{aligned}&{{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( x_g - x_h\right) \left( f_g\left( {\mathbf {x}}\right) -f_h\left( {\mathbf {x}}\right) \right) \right] \nonumber \\&\quad = \frac{1}{2} \Big (\left( f_{AA} + f_{Aa} - f_{aA} - f_{aa}\right) \left( 1 - 2{{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_gx_h\right] \right) \nonumber \\&\qquad + \left( f_{AA} - f_{Aa} + f_{aA} - f_{aa}\right) \sum _{\ell \in G} \left( p_{g\ell } - p_{h \ell }\right) \nonumber \\&\qquad \times \left( {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g x_\ell \right] - {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_h x_\ell \right] \right) \Big ). \end{aligned}$$
(104)

Substituting and rearranging, we can rewrite Condition (100) as

$$\begin{aligned}&\sum _{g,h \in G} \frac{\omega _{gh}}{\omega _g \omega _h} \Big (\left( f_{AA} + f_{Aa} - f_{aA} - f_{aa}\right) \left( \frac{1}{2} - {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_gx_h\right] \right) \nonumber \\&\quad + \left( f_{AA} - f_{Aa} + f_{aA} - f_{aa}\right) \sum _{\ell \in G} p_{g\ell } \left( {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g x_\ell \right] - {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_h x_\ell \right] \right) \Big ) > 0.\nonumber \\ \end{aligned}$$
(105)

It remains to describe how to compute the pairwise expectations \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_gx_h\right] \). Let us define the centered variables \({\underline{x}}_g = x_g - \nu \). Working through the possibilities of a single replacement event under neutral drift, one arrives at the following recurrence relation: for all pairs of sites \(g \ne h\),

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right]= & {} \frac{1-u}{\sum _{\ell \in G} \left( p_{\ell g} + p_{\ell h}\right) } \nonumber \\&\sum _{\ell \in G} \left( p_{\ell g} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_\ell {\underline{x}}_h\right] + p_{\ell h}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_g {\underline{x}}_\ell \right] \right) . \end{aligned}$$
(106)

Define the state function

$$\begin{aligned} \phi _{gh}\left( {\mathbf {x}}\right) = {\underline{x}}_g{\underline{x}}_h - \frac{1}{\sum _{\ell \in G} \left( p_{\ell g} + p_{\ell h}\right) } \sum _{\ell \in G} \left( p_{\ell g} {\underline{x}}_\ell {\underline{x}}_h + p_{\ell h} {\underline{x}}_g {\underline{x}}_\ell \right) . \end{aligned}$$
(107)

From the recurrence relation (106), we have

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ \phi _{gh}\right] = -\,\frac{u}{1-u} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right] \end{aligned}$$
(108)

Note also that \(\phi _{gh}\left( {\mathbf {a}}\right) = \phi _{gh}\left( {\mathbf {A}}\right) = 0\). Therefore, by Lemma 2, there exists \(K>0\) such that, for all \(g,h \in G\) with \(g \ne h\),

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \phi _{gh}\right]&= K \frac{d}{du} \bigg |_{u=0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ \phi _{gh}\right] \nonumber \\&= K \frac{d}{du} \bigg |_{u=0} \left( -\frac{u}{1-u} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right] \right) \nonumber \\&= -\, K {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right] \bigg |_{u=0} \nonumber \\&= -\, K \nu \left( 1-\nu \right) . \end{aligned}$$
(109)

Substituting in from Eq. (107), we see that for all \(g,h \in G\) with \(g \ne h\),

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right]&= \frac{1}{\sum _{\ell \in G} \left( p_{\ell g} + p_{\ell h}\right) } \sum _{\ell \in G} \left( p_{\ell g} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_\ell {\underline{x}}_h\right] + p_{\ell h}{{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_g {\underline{x}}_\ell \right] \right) \nonumber \\&\quad \quad - K \nu \left( 1-\nu \right) . \end{aligned}$$
(110)

We define the quantity \(\tau _{gh}\), for all pairs \(g,h \in G\), by

$$\begin{aligned} \tau _{gh} = \frac{\frac{1}{2} - {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_gx_h\right] }{K \nu \left( 1-\nu \right) } = \frac{\frac{1}{2} - {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right] - \nu \left( 1-\nu \right) }{K \nu \left( 1-\nu \right) }. \end{aligned}$$
(111)

Note that \(\tau _{gg}=0\) for all \(g \in G\) since \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g^2\right] ={{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g\right] =\frac{1}{2}\) by Proposition 2. Equation (110) then leads to the recurrence relations:

$$\begin{aligned} \tau _{gh} = {\left\{ \begin{array}{ll} 1 + \frac{\sum _{\ell \in G} \left( p_{\ell g} \tau _{\ell h} + p_{\ell h} \tau _{g \ell } \right) }{\sum _{\ell \in G} \left( p_{\ell g} + p_{\ell h}\right) } &{} g \ne h , \\ 0 &{} g=h . \end{array}\right. } \end{aligned}$$
(112)

The system of linear equations (112) can be solved for the \(\tau _{gh}\) on any given graph; the solution exists and is unique provided that G is connected. The condition for success can then be rewritten in terms of the \(\tau _{gh}\): By Corollary 1, A is favored under weak selection if and only if

$$\begin{aligned}&\sum _{g,h \in G} \frac{\omega _{gh}}{\omega _g \omega _h} \Big (\left( f_{AA} + f_{Aa} - f_{aA} - f_{aa}\right) \tau _{gh} \nonumber \\&\quad +\, (f_{AA} - f_{Aa} + f_{aA} - f_{aa}) \sum _{\ell \in G} p_{g\ell }\left( \tau _{h \ell } - \tau _{g \ell }\right) \Big ) > 0 . \end{aligned}$$
(113)
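The linear system (112) can be solved numerically. The sketch below (NumPy assumed; the name `bd_tau` is illustrative) treats \(\tau _{gh}\) for every ordered pair as an unknown and imposes \(\tau _{gg}=0\) directly. This dense formulation, with \(n^2\) unknowns, is chosen for clarity and is only practical for small graphs. On the complete graph with n vertices, solving (112) by hand gives \(\tau _{gh} = n-1\) for \(g \ne h\), which serves as a check.

```python
import numpy as np

def bd_tau(W):
    """Solve the BD recurrence (112) for tau_gh on a connected weighted graph W."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)       # p_gh
    A = np.eye(n * n)
    b = np.ones(n * n)
    for g in range(n):
        for h in range(n):
            r = g * n + h                      # row/column index of tau_gh
            if g == h:
                b[r] = 0.0                     # tau_gg = 0
                continue
            denom = P[:, g].sum() + P[:, h].sum()   # sum_l (p_lg + p_lh)
            for l in range(n):
                A[r, l * n + h] -= P[l, g] / denom  # p_lg tau_lh
                A[r, g * n + l] -= P[l, h] / denom  # p_lh tau_gl
    return np.linalg.solve(A, b).reshape(n, n)
```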

The \(\tau _{gh}\) can be interpreted as coalescence times for a discrete-time coalescing random walk (CRW) process; however, this interpretation is not needed here. Eq. (112) differs from the recurrence for the CRW under BD updating presented in Allen et al. (2017). The difference arises from the way initial mutants are introduced: Allen et al. (2017) assume that the location of the initial mutant is chosen uniformly among vertices, whereas here it is chosen according to the mutant appearance distribution given in Eq. (93).

We observe that Condition (113) can be written in the form

$$\begin{aligned} \sigma f_{AA} + f_{Aa} > f_{aA} + \sigma f_{aa}, \end{aligned}$$
(114)

with

$$\begin{aligned} \sigma = \frac{ \sum _{g,h \in G} \frac{\omega _{gh}}{\omega _g \omega _h} \left( \tau _{gh} - \sum _{\ell \in G} \left( p_{g\ell } - p_{h \ell }\right) \tau _{g \ell } \right) }{\sum _{g,h \in G} \frac{\omega _{gh}}{\omega _g \omega _h} \left( \tau _{gh} + \sum _{\ell \in G} \left( p_{g\ell } - p_{h \ell }\right) \tau _{g \ell } \right) } . \end{aligned}$$
(115)
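Given the \(\tau _{gh}\), the structure coefficient (115) reduces to two weighted sums. The sketch below repeats a compact solver for (112) so that it runs on its own (names are illustrative). On the complete graph with n vertices, our hand calculation from (112) and (115) gives \(\sigma = (n-2)/n\), which the test checks.

```python
import numpy as np

def bd_tau(W):
    """Solve recurrence (112) with tau_gg = 0 (dense formulation, small graphs only)."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)
    A, b = np.eye(n * n), np.ones(n * n)
    for g in range(n):
        for h in range(n):
            r = g * n + h
            if g == h:
                b[r] = 0.0
                continue
            denom = P[:, g].sum() + P[:, h].sum()
            for l in range(n):
                A[r, l * n + h] -= P[l, g] / denom
                A[r, g * n + l] -= P[l, h] / denom
    return np.linalg.solve(A, b).reshape(n, n)

def bd_sigma(W):
    """Structure coefficient for BD updating, Eq. (115)."""
    deg = W.sum(axis=1)
    P = W / deg[:, None]
    tau = bd_tau(W)
    K = W / np.outer(deg, deg)                       # omega_gh / (omega_g omega_h)
    # S[g, h] = sum_l (p_gl - p_hl) tau_gl
    S = (P * tau).sum(axis=1)[:, None] - tau @ P.T
    T, X = (K * tau).sum(), (K * S).sum()
    return (T - X) / (T + X)
```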

Equation (114) is an instance of the Structure Coefficient Theorem (Tarnita et al. 2009b; Nowak et al. 2010a; Allen et al. 2013), which states that, for a broad class of evolutionary game models, the condition for success under weak selection takes the form (114) for some “structure coefficient” \(\sigma \).

9.1.4 Condition for success: Death–Birth

For DB updating, applying Corollary 1 to Eq. (99), we obtain that \(\rho _A > \rho _a\) under weak selection if and only if

$$\begin{aligned} \sum _{g,h \in G} \omega _g p_{gh}^{\left( 2\right) } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ \left( x_g - x_h\right) \left( f_g\left( {\mathbf {x}}\right) - f_h\left( {\mathbf {x}}\right) \right) \right] > 0. \end{aligned}$$
(116)

Applying Eq. (104), this condition becomes

$$\begin{aligned}&\sum _{g,h \in G} \omega _g p_{gh}^{(2)} \Bigg (\left( f_{AA} + f_{Aa} - f_{aA} - f_{aa}\right) \left( 1 - 2{{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_gx_h\right] \right) \nonumber \\&\quad + \left( f_{AA} - f_{Aa} + f_{aA} - f_{aa}\right) \sum _{\ell \in G} (p_{g\ell } - p_{h \ell }) \nonumber \\&\quad \times \left( {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_g x_\ell \right] - {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_h x_\ell \right] \right) \Bigg )> 0. \end{aligned}$$
(117)

To obtain the quantities \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ x_gx_h\right] \), we note that DB updating has a recurrence equation analogous to Eq. (106):

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right] = \frac{1-u}{2} \sum _{\ell \in G} \left( p_{g\ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_\ell {\underline{x}}_h\right] + p_{h\ell }{{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}^\circ \left[ {\underline{x}}_g {\underline{x}}_\ell \right] \right) , \end{aligned}$$
(118)

where \({\underline{x}}_g=x_g-\nu \) as before. Upon defining

$$\begin{aligned} \phi _{gh}({\mathbf {x}}) = {\underline{x}}_g{\underline{x}}_h - \frac{1}{2} \sum _{\ell \in G} \left( p_{g\ell } {\underline{x}}_\ell {\underline{x}}_h + p_{h\ell } {\underline{x}}_g {\underline{x}}_\ell \right) , \end{aligned}$$
(119)

we find that Eq. (108) holds for DB updating as well. Following the argument of Sect. 9.1.3, we obtain the following analogue of Eq. (110):

$$\begin{aligned} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right]= & {} \frac{1}{2} \sum _{\ell \in G} \left( p_{g \ell } {{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_\ell {\underline{x}}_h\right] + p_{h\ell }{{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_g {\underline{x}}_\ell \right] \right) \nonumber \\&- K \nu \left( 1-\nu \right) . \end{aligned}$$
(120)

Defining the quantities \(\tau _{gh}\) for \(g,h \in G\) according to Eq. (111), we arrive at the DB analogue of the recurrence relations (112):

$$\begin{aligned} \tau _{gh} = {\left\{ \begin{array}{ll} 1 + \frac{1}{2} {\sum }_{\ell \in G} \left( p_{g \ell } \tau _{\ell h} + p_{h \ell } \tau _{g \ell } \right) &{} g \ne h , \\ 0 &{} g=h . \end{array}\right. } \end{aligned}$$
(121)
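The DB recurrence (121) can be solved in the same dense way. In the sketch below (illustrative names), the cycle provides a check: for vertices at distance k on a cycle of n vertices, (121) reduces to \(\tau (k) = 1 + \frac{1}{2}\left[ \tau (k-1)+\tau (k+1)\right] \), whose solution with \(\tau (0)=\tau (n)=0\) is \(\tau (k) = k(n-k)\).

```python
import numpy as np

def db_tau(W):
    """Solve the DB recurrence (121) for tau_gh on a connected weighted graph W."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)
    A, b = np.eye(n * n), np.ones(n * n)
    for g in range(n):
        for h in range(n):
            r = g * n + h
            if g == h:
                b[r] = 0.0                          # tau_gg = 0
                continue
            for l in range(n):
                A[r, l * n + h] -= 0.5 * P[g, l]    # (1/2) p_gl tau_lh
                A[r, g * n + l] -= 0.5 * P[h, l]    # (1/2) p_hl tau_gl
    return np.linalg.solve(A, b).reshape(n, n)

def cycle(n):
    """Unweighted cycle graph on n vertices."""
    W = np.zeros((n, n))
    for g in range(n):
        W[g, (g + 1) % n] = W[(g + 1) % n, g] = 1.0
    return W
```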

These \(\tau _{gh}\) are precisely the pairwise coalescence times studied by Allen et al. (2017), and they can be obtained for any given (weighted, connected) graph by solving Eq. (121) as a linear system of equations. If we now define, for \(m \geqslant 0\),

$$\begin{aligned} \tau ^{\left( m\right) } :=\sum _{g,h \in G} \frac{\omega _g}{{\varOmega }} p_{gh}^{\left( m\right) } \tau _{gh} , \end{aligned}$$
(122)

then the condition for success (117) can be rewritten as

$$\begin{aligned}&\left( f_{AA} + f_{Aa} - f_{aA} - f_{aa}\right) \tau ^{\left( 2\right) } + \left( f_{AA} - f_{Aa} + f_{aA} - f_{aa}\right) \left( \tau ^{\left( 3\right) } -\tau ^{\left( 1\right) }\right) \nonumber \\&\quad > 0 . \end{aligned}$$
(123)

Again, the condition for success takes the form (114), with the structure coefficient for DB updating given by

$$\begin{aligned} \sigma = \frac{\tau ^{(2)}+\tau ^{(3)}-\tau ^{(1)}}{\tau ^{(2)}-\tau ^{(3)}+\tau ^{(1)}} , \end{aligned}$$
(124)

which is exactly the result obtained by Allen et al. (2017). The appearances of one-step, two-step, and three-step random walks in Eq. (124) have an elegant interpretation in terms of interactions at various distances; see Allen et al. (2017).
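The sketch below assembles Eqs. (121), (122), and (124), repeating the DB solver so that it runs on its own (names are illustrative). As checks we use two cases that we worked out by hand from these equations: on the complete graph with n vertices, \(\sigma = (n-2)/n\); on the cycle with n vertices, \(\tau ^{(1)} = n-1\), \(\tau ^{(2)} = n-2\), and \(\tau ^{(3)} = 3(n-2)/2\), giving \(\sigma = (3n-8)/n\).

```python
import numpy as np

def db_tau(W):
    """Solve recurrence (121) with tau_gg = 0 (dense formulation, small graphs only)."""
    n = W.shape[0]
    P = W / W.sum(axis=1, keepdims=True)
    A, b = np.eye(n * n), np.ones(n * n)
    for g in range(n):
        for h in range(n):
            r = g * n + h
            if g == h:
                b[r] = 0.0
                continue
            for l in range(n):
                A[r, l * n + h] -= 0.5 * P[g, l]
                A[r, g * n + l] -= 0.5 * P[h, l]
    return np.linalg.solve(A, b).reshape(n, n)

def tau_moment(W, tau, m):
    """tau^(m) from Eq. (122): tau averaged over an m-step walk from a degree-weighted start."""
    deg = W.sum(axis=1)
    P = W / deg[:, None]
    Pm = np.linalg.matrix_power(P, m)               # p^(m)_gh
    return float(((deg / deg.sum())[:, None] * Pm * tau).sum())

def db_sigma(W):
    """Structure coefficient for DB updating, Eq. (124)."""
    tau = db_tau(W)
    t1, t2, t3 = (tau_moment(W, tau, m) for m in (1, 2, 3))
    return (t2 + t3 - t1) / (t2 - t3 + t1)
```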

9.2 Haplodiploid population

To illustrate the applicability of our framework beyond haploid populations, we analyze a simple model of evolution in a haplodiploid population with one male and one female parent per generation (Fig. 6). This could represent, for example, a completely inbred, singly-mated, eusocial insect colony.

9.2.1 Model

The population consists of \(N_{{\mathrm {F}}}\) diploid females and \(N_{{\mathrm {M}}}\) haploid males. Thus, there are \(n=2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}\) genetic sites and \(N=N_{{\mathrm {F}}}+N_{{\mathrm {M}}}\) individuals. Each genotype has an associated fecundity, denoted \(F_{xy}\) for females and \(F_{z}\) for males, with \(x,y,z \in \left\{ a,A\right\} \). (Here and throughout this example, xy is understood as an unordered pair.) The fecundities for females are

$$\begin{aligned} F_{aa} = 1 ; \quad F_{Aa} = 1+\delta hs ; \quad F_{AA} = 1+\delta s . \end{aligned}$$
(125)

Above, the parameter s quantifies selection on the A allele in females and the parameter h represents the degree of dominance. Fecundities for males are

$$\begin{aligned} F_{a} = 1 ; \quad F_{A} = 1+ \delta m , \end{aligned}$$
(126)

where the parameter m quantifies selection on the A allele in males.

Each time-step, one female and one male are chosen at random, with probability proportional to fecundity, to be parents for the next generation. These parents produce a new generation of offspring, replacing all previous individuals. Females are produced sexually while males are produced asexually (parthenogenetically); thus, each female offspring inherits the allele of the male parent and one of the two alleles of the female parent, while each male offspring inherits one of the two alleles of the female parent.

Fig. 6

Model of selection in a haplodiploid population. Females (green) are diploid and males (purple) are haploid. At each time step, one female (mother) and one male (father) are selected to populate the subsequent generation. Each parent is chosen with probability proportional to its fecundity, which depends on its genotype according to Eqs. (125)–(126). Each female in the next generation inherits one allele from the mother and one allele from the father, while each male inherits a single allele from the mother. The process then repeats. The genetic sites are numbered in their upper-left corner. In generation \(t+1\), the color of the site label indicates the parent from which the allele was inherited (which, in females, determines whether the site is in \(G_{{\mathrm {F}}}^{{\mathrm {F}}}\) or \(G_{{\mathrm {F}}}^{{\mathrm {M}}}\)).

9.2.2 Sites and replacement rule

To translate this model into our formalism, we partition the set of sites G as \(G=G_{\mathrm {F}}\sqcup G_{\mathrm {M}}\), where \(G_{\mathrm {F}}\) and \(G_{\mathrm {M}}\) are the sets of sites in females and males, respectively. Similarly, we partition the set I of individuals into females and males: \(I = I_{\mathrm {F}}\sqcup I_{\mathrm {M}}\). It is notationally convenient (although not strictly necessary) to distinguish the sites in females according to which parent (male or female) they inherit alleles from. We therefore partition the sites in females as \(G_{\mathrm {F}}= G_{\mathrm {F}}^{\mathrm {F}}\sqcup G_{\mathrm {F}}^{\mathrm {M}}\), where sites in \(G_{\mathrm {F}}^{\mathrm {F}}\) house alleles from the female parent, and sites in \(G_{\mathrm {F}}^{\mathrm {M}}\) house alleles from the male parent.

For a given state \({\mathbf {x}}\in \{0,1\}^G\), we denote the number of females of type xy by \(n_{xy}\), and the number of males of type z by \(n_{z}\), where \(x,y,z \in \left\{ A,a\right\} \). Clearly, \(n_{aa}+n_{Aa}+n_{AA}=N_{{\mathrm {F}}}\) and \(n_{a}+n_{A}=N_{{\mathrm {M}}}\) in every state.

We recall that, at each time-step, one female and one male are chosen as parents, and their offspring replace the entire population. Therefore, all replacement events \((R,\alpha )\) with nonzero probability have \(R=G\); furthermore, there exists a pair of individuals \(i \in I_{\mathrm {F}}\) and \(j \in I_{\mathrm {M}}\), with genetic sites \(G_i = \{g_i, g_i'\}\) and \(G_j =\{g_j\}\), such that

$$\begin{aligned} {\left\{ \begin{array}{ll} \alpha \left( g\right) \in \{g_i, g_i'\} &{} \text {for } g \in G_{{\mathrm {M}}} \cup G_{{\mathrm {F}}}^{{\mathrm {F}}}, \\ \alpha \left( g\right) = g_j &{} \text {for } g \in G_{{\mathrm {F}}}^{{\mathrm {M}}}. \end{array}\right. } \end{aligned}$$
(127)

The probability that a particular replacement event \((G,\alpha )\) of the above form occurs in state \({\mathbf {x}}\) can be written as

$$\begin{aligned} p_{\left( G,\alpha \right) }({\mathbf {x}}) = \frac{1}{2^{N}} \left( \frac{F_{x_{g_i}x_{g_i'}}}{n_{aa}F_{aa}+n_{Aa}F_{Aa} +n_{AA}F_{AA}} \right) \left( \frac{F_{x_{g_j}}}{n_{a}F_a+n_{A}F_A} \right) . \end{aligned}$$
(128)

Above, we have extended our notation for fecundity to numerical genotypes, so that \(F_{11}=F_{AA}\), etc. The prefactor \(1/2^N\) reflects the \(2^{N}\) possible mappings from \(G_{\mathrm {M}}\cup G_{\mathrm {F}}^{\mathrm {F}}\) (a set of \(N_{\mathrm {M}}+N_{\mathrm {F}}=N\) sites) to \(\{g_i, g_i'\}\), which represent the possible ways in which each of the N offspring can inherit one of the mother's two alleles.

9.2.3 Evolutionary Markov chain

We now establish some basic results for the evolutionary Markov chain. First, we note that the transition probabilities depend only on the 5-tuple \(\left( n_{aa},n_{Aa},n_{AA},n_{a},n_{A}\right) \). We also observe that transitions consist of two steps: (i) selection of parents and (ii) production of offspring. The probabilities resulting from the second step depend only on the genotypes of the two chosen parents, for which there are six possibilities:

$$\begin{aligned} S_{\mathrm {par}} :=\Big \{ \left( aa,a\right) , \left( Aa,a\right) , \left( AA,a\right) , \left( aa,A\right) , \left( Aa,A\right) , \left( AA,A\right) \Big \}. \end{aligned}$$
(129)

Transitions for the evolutionary Markov chain can therefore be written in the form

$$\begin{aligned} P_{{\mathbf {x}}\rightarrow {\mathbf {x}}'}&= \sum _{\left( xy,z\right) \in S_{\mathrm {par}}} P_{{\mathbf {x}}\rightarrow \left( xy,z\right) } \, P_{\left( xy,z\right) \rightarrow {\mathbf {x}}'} . \end{aligned}$$
(130)

In other words, the transition matrix \({{\mathbf {M}}}\) for the evolutionary Markov chain factors as \({{\mathbf {M}}}={\mathbf {SR}}\), where \({{\mathbf {S}}}=\left( P_{{\mathbf {x}}\rightarrow (xy,z)}\right) \) represents the selection of parents and \({{\mathbf {R}}}=\left( P_{(xy,z) \rightarrow {\mathbf {x}}}\right) \) represents (re)production of offspring.

The probabilities for selection (i.e. the entries of \({{\mathbf {S}}}\)) can be written as

$$\begin{aligned} P_{{{\mathbf {x}}}\rightarrow \left( xy,z\right) } = \frac{n_{xy}F_{xy}}{n_{aa}F_{aa}+n_{Aa}F_{Aa} +n_{AA}F_{AA}} \, \frac{n_{z}F_z}{n_{a}F_a+n_{A}F_A}. \end{aligned}$$
(131)

The probabilities for reproduction (i.e. the entries of \({{\mathbf {R}}}\)) can be written as

$$\begin{aligned} P_{\left( xy,z\right) \rightarrow {{\mathbf {x}}}}&= \left( \left( 1-q_{xy}\right) \left( 1-q_{z}\right) \right) ^{n_{aa}}\left( q_{xy}+q_{z}-2q_{xy}q_{z}\right) ^{n_{Aa}} \left( q_{xy}q_{z}\right) ^{n_{AA}} \nonumber \\&\quad \times \left( 1-q_{xy}\right) ^{n_{a}}q_{xy}^{n_{A}} . \end{aligned}$$
(132)

Above, \(q_{xy}\) (resp. \(q_{z}\)) is the probability that an allele inherited from a mother of type xy (resp. a father of type z) is A, with mutation taken into account. Specifically,

$$\begin{aligned} q_{aa}&= u\nu ; \end{aligned}$$
(133a)
$$\begin{aligned} q_{Aa}&= \frac{1}{2}\left( 1-u+2u\nu \right) ; \end{aligned}$$
(133b)
$$\begin{aligned} q_{AA}&= 1-u\left( 1-\nu \right) ; \end{aligned}$$
(133c)
$$\begin{aligned} q_{a}&= u\nu ; \end{aligned}$$
(133d)
$$\begin{aligned} q_{A}&= 1-u\left( 1-\nu \right) . \end{aligned}$$
(133e)

Note that the entries of \({{\mathbf {R}}}\) do not depend on the parameters quantifying selection (h, m, and s).
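A small sketch of Eq. (133) and of the per-offspring genotype probabilities that appear as factors in Eq. (132). It assumes the mutation model implied by Eq. (133): each transmitted allele mutates with probability u, and a mutated allele is A with probability \(\nu \). The function names are illustrative.

```python
def inheritance_prob(parent, u, nu):
    """q for a parent genotype such as 'aa', 'Aa', 'AA' (mother) or 'a', 'A' (father), Eq. (133)."""
    frac_A = parent.count("A") / len(parent)   # chance the copied allele is A before mutation
    return (1 - u) * frac_A + u * nu           # a mutation (prob u) yields A with prob nu

def offspring_probs(mother, father, u, nu):
    """Genotype probabilities of one daughter and one son, matching the factors in Eq. (132)."""
    qm = inheritance_prob(mother, u, nu)
    qf = inheritance_prob(father, u, nu)
    daughter = {"aa": (1 - qm) * (1 - qf), "Aa": qm + qf - 2 * qm * qf, "AA": qm * qf}
    son = {"a": 1 - qm, "A": qm}               # sons inherit only from the mother
    return daughter, son
```

As Eq. (132) notes, none of these quantities involves the selection parameters h, m, or s.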

9.2.4 Reproductive value and selection

The marginal probability that the allele in site g is transmitted to site \(\ell \) is

$$\begin{aligned} e_{g\ell }\left( {{\mathbf {x}}}\right) = {\left\{ \begin{array}{ll} \displaystyle \frac{1}{2} \frac{F_{x_{g}x_{g'}}}{n_{aa}F_{aa}+ n_{Aa}F_{Aa}+ n_{AA}F_{AA}} &{} g\in G_{{\mathrm {F}}},\ \ell \in G_{{\mathrm {M}}} \cup G_{\mathrm {F}}^{\mathrm {F}}, \\ \displaystyle \frac{F_{x_{g}}}{n_{a}F_{a}+ n_{A}F_{A}} &{} g\in G_{{\mathrm {M}}},\ \ell \in G_{{\mathrm {F}}}^{\mathrm {M}}, \\ \displaystyle 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(134)

Above, \(g'\) denotes the other site in the same individual as site \(g \in G_{\mathrm {F}}\). For neutral drift (\(\delta =0\)), Eq. (134) reduces to

$$\begin{aligned} e_{g\ell }^\circ = {\left\{ \begin{array}{ll} \frac{1}{2N_{{\mathrm {F}}}} &{} g\in G_{{\mathrm {F}}},\ \ell \in G_{{\mathrm {M}}} \cup G_{\mathrm {F}}^{\mathrm {F}}, \\ \frac{1}{N_{{\mathrm {M}}}} &{} g \in G_{{\mathrm {M}}},\ \ell \in G_{{\mathrm {F}}}^{\mathrm {M}}, \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$
(135)

A straightforward calculation gives the mutant-appearance distribution,

$$\begin{aligned} \mu _{A}\left( {\mathbf {x}}\right)&= {\left\{ \begin{array}{ll} \frac{1}{n} &{} \text {if} \, {\mathbf {x}}= {{\mathbf {1}}}_{\left\{ g\right\} } \, \text {for some} \, g \in G , \\ 0 &{} \text {otherwise} , \end{array}\right. } \end{aligned}$$
(136)

and analogously for \(\mu _{a}\left( {\mathbf {x}}\right) \).

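Substituting from Eq. (135) into the recurrence equation (46a) for reproductive values, and noting that by symmetry all female sites share a common reproductive value \(v_{{\mathrm {F}}}\) and all male sites share a common value \(v_{{\mathrm {M}}}\), we obtain for any female site (whose allele can be transmitted to the \(N_{{\mathrm {F}}}\) sites of \(G_{\mathrm {F}}^{\mathrm {F}}\) and the \(N_{{\mathrm {M}}}\) sites of \(G_{{\mathrm {M}}}\))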

$$\begin{aligned} v_{{\mathrm {F}}}&= \frac{1}{2} v_{{\mathrm {F}}} + \frac{N_{{\mathrm {M}}}}{2N_{{\mathrm {F}}}} v_{{\mathrm {M}}} , \end{aligned}$$
(137)

so \(v_{{\mathrm {F}}}=\frac{N_{{\mathrm {M}}}}{N_{{\mathrm {F}}}}v_{{\mathrm {M}}}\). Since \(\sum _{g\in G}v_{g}=n=2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}\), we obtain

$$\begin{aligned} v_{{\mathrm {F}}}&= \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3N_{{\mathrm {F}}}} ; \end{aligned}$$
(138a)
$$\begin{aligned} v_{{\mathrm {M}}}&= \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3N_{{\mathrm {M}}}} . \end{aligned}$$
(138b)

Note that for an even sex ratio (\(N_{{\mathrm {F}}}=N_{{\mathrm {M}}}\)), all sites have equal reproductive value.
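The two-equation system behind Eqs. (137)–(138), namely Eq. (137) together with the normalization \(2N_{{\mathrm {F}}}v_{{\mathrm {F}}}+N_{{\mathrm {M}}}v_{{\mathrm {M}}}=n\), can be solved exactly with Python's `fractions` module. The use of Cramer's rule and the name `haplodiploid_rv` are our choices for illustration.

```python
from fractions import Fraction

def haplodiploid_rv(NF, NM):
    """Exact (v_F, v_M) solving Eq. (137) and the normalization sum_g v_g = n."""
    n = 2 * NF + NM
    # Row 1: Eq. (137) rearranged, (1/2) v_F - NM/(2 NF) v_M = 0
    a11, a12, b1 = Fraction(1, 2), -Fraction(NM, 2 * NF), Fraction(0)
    # Row 2: normalization, 2 NF v_F + NM v_M = n
    a21, a22, b2 = Fraction(2 * NF), Fraction(NM), Fraction(n)
    det = a11 * a22 - a12 * a21
    vF = (b1 * a22 - a12 * b2) / det      # Cramer's rule
    vM = (a11 * b2 - b1 * a21) / det
    return vF, vM

print(haplodiploid_rv(4, 2))  # matches Eq. (138): v_F = 10/12, v_M = 10/6
```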

Applying Eq. (51), the fitness of site g in state \({\mathbf {x}}\) is given by

$$\begin{aligned} w_{g}\left( {{\mathbf {x}}}\right) = {\left\{ \begin{array}{ll} \displaystyle \left( \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3}\right) \frac{F_{x_gx_{g'}}}{n_{aa}F_{aa}+ n_{Aa}F_{Aa}+ n_{AA}F_{AA}} &{} g \in G_{{\mathrm {F}}} , \\ \displaystyle \left( \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3}\right) \frac{F_{x_g}}{n_{a}F_a+ n_{A}F_A} &{} g \in G_{{\mathrm {M}}} . \end{array}\right. } \end{aligned}$$
(139)

On the individual level, the fitness of a female with genotype xy is

$$\begin{aligned} W_{xy}({\mathbf {x}}) = 2 \left( \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3}\right) \frac{F_{xy}}{n_{aa}F_{aa}+ n_{Aa}F_{Aa}+ n_{AA}F_{AA}}, \end{aligned}$$
(140)

and the fitness of a male with genotype z is

$$\begin{aligned} W_{z}({\mathbf {x}}) = \left( \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3}\right) \frac{F_{z}}{n_{a}F_a+ n_{A}F_A}. \end{aligned}$$
(141)

The RV-weighted change due to selection is

$$\begin{aligned} \hat{{\Delta }}_{{\mathrm {sel}}}({\mathbf {x}})= & {} \left( \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3}\right) \nonumber \\&\times \left( \frac{ n_{Aa} F_{Aa} + 2 n_{AA}F_{AA}}{n_{aa}F_{aa}+ n_{Aa}F_{Aa}+ n_{AA}F_{AA}} + \frac{ n_{A}F_A}{n_{a}F_a+n_{A}F_A} \right. \nonumber \\&\left. - \,\frac{n_{Aa}+2n_{AA}}{N_{\mathrm {F}}} - \frac{n_A}{N_{\mathrm {M}}} \right) . \end{aligned}$$
(142)

Substituting from Eqs. (125) and (126) and applying a weak-selection expansion, we obtain

$$\begin{aligned}&\hat{{\Delta }}_{{\mathrm {sel}}}'\left( {{\mathbf {x}}}\right) = \left( \frac{2N_{{\mathrm {F}}}+N_{{\mathrm {M}}}}{3}\right) \nonumber \\&\quad \times \left( \frac{\left( hn_{Aa}+2n_{AA}\right) s}{N_{{\mathrm {F}}}} - \frac{\left( hn_{Aa}+n_{AA}\right) \left( n_{Aa}+2n_{AA}\right) s}{N_{{\mathrm {F}}}^{2}} + \frac{mn_{a}n_{A}}{N_{{\mathrm {M}}}^{2}} \right) .\nonumber \\ \end{aligned}$$
(143)
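Eq. (143) can be checked against a central finite difference of Eq. (142) in \(\delta \). The sketch below assumes the fecundities (125)–(126); the genotype counts and parameter values are arbitrary illustrative choices satisfying \(n_{aa}+n_{Aa}+n_{AA}=N_{{\mathrm {F}}}\) and \(n_a+n_A=N_{{\mathrm {M}}}\).

```python
def delta_sel(counts, NF, NM, s, h, m, delta):
    """RV-weighted change due to selection, Eq. (142), with fecundities (125)-(126)."""
    naa, nAa, nAA, na, nA = counts
    FAa, FAA, FA = 1 + delta * h * s, 1 + delta * s, 1 + delta * m   # F_aa = F_a = 1
    fem = (nAa * FAa + 2 * nAA * FAA) / (naa + nAa * FAa + nAA * FAA)
    mal = nA * FA / (na + nA * FA)
    return (2 * NF + NM) / 3 * (fem + mal - (nAa + 2 * nAA) / NF - nA / NM)

def delta_sel_prime(counts, NF, NM, s, h, m):
    """First-order coefficient in delta, Eq. (143)."""
    naa, nAa, nAA, na, nA = counts
    return (2 * NF + NM) / 3 * (
        (h * nAa + 2 * nAA) * s / NF
        - (h * nAa + nAA) * (nAa + 2 * nAA) * s / NF**2
        + m * na * nA / NM**2
    )
```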

9.2.5 Neutral stationary distributions

The two-step nature of transitions allows us to define the parental Markov chain, which describes the transition from one generation’s parents to the next. The parental Markov chain has state space \(S_{\mathrm {par}}\) and transition matrix \({{\mathbf {P}}}={\mathbf {RS}}\). More explicitly, we can write

$$\begin{aligned} P_{(xy,z) \rightarrow (x'y',z')}&= \sum _{{\mathbf {x}}\in \{0,1\}^G} P_{\left( xy,z\right) \rightarrow {\mathbf {x}}} \, P_{{\mathbf {x}}\rightarrow \left( x'y',z'\right) } . \end{aligned}$$
(144)

For neutral drift, probabilities for selection (entries of \({{\mathbf {S}}}\)) are given by

$$\begin{aligned} P_{{\mathbf {x}}\rightarrow \left( xy,z\right) }^{\circ } = \frac{n_{xy}}{N_{{\mathrm {F}}}}\frac{n_{z}}{N_{{\mathrm {M}}}}. \end{aligned}$$
(145)

Substituting from Eqs. (132), (133), and (145) into Eq. (144) and applying multinomial expectations, we obtain the following neutral transition probabilities on the parental chain:

$$\begin{aligned} P_{\left( xy,z\right) \rightarrow \left( aa,a\right) }^{\circ }&= \left( 1-q_{xy}\right) ^{2}\left( 1-q_{z}\right) ; \end{aligned}$$
(146a)
$$\begin{aligned} P_{\left( xy,z\right) \rightarrow \left( Aa,a\right) }^{\circ }&= \Big (q_{xy}+q_{z}-2q_{xy}q_{z}\Big )\left( 1-q_{xy}\right) ; \end{aligned}$$
(146b)
$$\begin{aligned} P_{\left( xy,z\right) \rightarrow \left( AA,a\right) }^{\circ }&= q_{xy}q_{z}\left( 1-q_{xy}\right) ; \end{aligned}$$
(146c)
$$\begin{aligned} P_{\left( xy,z\right) \rightarrow \left( aa,A\right) }^{\circ }&= \left( 1-q_{xy}\right) \left( 1-q_{z}\right) q_{xy} ; \end{aligned}$$
(146d)
$$\begin{aligned} P_{\left( xy,z\right) \rightarrow \left( Aa,A\right) }^{\circ }&= \Big (q_{xy}+q_{z}-2q_{xy}q_{z}\Big ) q_{xy} ; \end{aligned}$$
(146e)
$$\begin{aligned} P_{\left( xy,z\right) \rightarrow \left( AA,A\right) }^{\circ }&= q_{xy}^{2}q_{z} . \end{aligned}$$
(146f)

Since the parental chain has only six states, one can directly solve for its stationary distribution, \({\varPi }_{{\mathrm {MSS}}}^{\circ }\):

$$\begin{aligned} {\varPi }_{{\mathrm {MSS}}}^{\circ }\left( aa,a\right)&= \frac{\left( 1 - \nu \right) \left( \begin{array}{c} 1- 2u^{7}\nu ^{2} + 3u^{7}\nu - u^{7} + 14u^{6}\nu ^{2} - 21u^{6}\nu + 7u^{6} - 48u^{5}\nu ^{2} \\ + 70u^{5}\nu - 23u^{5} + 96u^{4}\nu ^{2} - 134u^{4}\nu + 43u^{4} - 110u^{3}\nu ^{2} \\ + 143u^{3}\nu - 43u^{3} + 58u^{2}\nu ^{2} - 61u^{2}\nu + 13u^{2} - 16u\nu + 11u \end{array}\right) }{1+11u+13u^{2}-43u^{3}+43u^{4}-23u^{5}+7u^{6}-u^{7}} ; \end{aligned}$$
(147a)
$$\begin{aligned} {\varPi }_{{\mathrm {MSS}}}^{\circ }\left( Aa,a\right)&= \frac{4u\nu \left( 1 - \nu \right) \left( \begin{array}{c} 3 + 20u - 29u\nu + 55u^{2}\nu - 48u^{3}\nu + 24u^{4}\nu \\ - 7u^{5}\nu + u^{6}\nu - 45u^{2} + 43u^{3} - 23u^{4} + 7u^{5} - u^{6} \end{array}\right) }{1+11u+13u^{2}-43u^{3}+43u^{4}-23u^{5}+7u^{6}-u^{7}} ; \end{aligned}$$
(147b)
$$\begin{aligned} {\varPi }_{{\mathrm {MSS}}}^{\circ }\left( AA,a\right)&= \frac{u\nu \left( 1 - \nu \right) \left( \begin{array}{c} 4+ 58u\nu - 19u - 110u^{2}\nu + 96u^{3}\nu - 48u^{4}\nu \\ + 14u^{5}\nu - 2u^{6}\nu + 37u^{2} - 38u^{3} + 22u^{4} - 7u^{5} + u^{6} \end{array}\right) }{1+11u+13u^{2}-43u^{3}+43u^{4}-23u^{5}+7u^{6}-u^{7}} ; \end{aligned}$$
(147c)
$$\begin{aligned} {\varPi }_{{\mathrm {MSS}}}^{\circ }\left( aa,A\right)&= \frac{u\nu \left( 1-\nu \right) \left( \begin{array}{c} 4 + 39u - 58u\nu + 110u^{2}\nu - 96u^{3}\nu + 48u^{4}\nu \\ - 14u^{5}\nu + 2u^{6}\nu - 73u^{2} + 58u^{3} - 26u^{4} + 7u^{5} - u^{6} \end{array}\right) }{1+11u+13u^{2}-43u^{3}+43u^{4}-23u^{5}+7u^{6}-u^{7}} ; \end{aligned}$$
(147d)
$$\begin{aligned} {\varPi }_{{\mathrm {MSS}}}^{\circ }\left( Aa,A\right)&= \frac{4u\nu \left( 1-\nu \right) \left( \begin{array}{c} 3 + 29u\nu - 9u - 55u^{2}\nu + 48u^{3}\nu \\ - 24u^{4}\nu + 7u^{5}\nu - u^{6}\nu + 10u^{2} - 5u^{3} + u^{4} \end{array}\right) }{1+11u+13u^{2}-43u^{3}+43u^{4}-23u^{5}+7u^{6}-u^{7}} ; \end{aligned}$$
(147e)
$$\begin{aligned} {\varPi }_{{\mathrm {MSS}}}^{\circ }\left( AA,A\right)&= \frac{\nu \left( \begin{array}{c} 1 - 2u^{7}\nu ^{2} + u^{7}\nu + 14u^{6}\nu ^{2} - 7u^{6}\nu - 48u^{5}\nu ^{2} + 26u^{5}\nu \\ - u^{5} + 96u^{4}\nu ^{2} - 58u^{4}\nu + 5u^{4} - 110u^{3}\nu ^{2} + 77u^{3}\nu \\ - 10u^{3} + 58u^{2}\nu ^{2} - 55u^{2}\nu + 10u^{2} + 16u\nu - 5u \end{array}\right) }{1+11u+13u^{2}-43u^{3}+43u^{4}-23u^{5}+7u^{6}-u^{7}} . \end{aligned}$$
(147f)

Notably, this distribution is independent of both \(N_{{\mathrm {F}}}\) and \(N_{{\mathrm {M}}}\).
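As a check on Eq. (147), the six-state parental chain can be solved exactly in rational arithmetic. The Python sketch below builds the transition probabilities of Eq. (146) and solves for the stationary distribution by Gauss-Jordan elimination. It assumes a mutation model that is not restated in this section: each transmitted allele mutates with probability \(u\), and a mutant allele is \(A\) with probability \(\nu \), so that \(q_{xy} = (1-u) f_{xy} + u\nu \), where \(f_{xy}\) (a symbol introduced only for this sketch) is the fraction of \(A\) alleles in genotype \(xy\), and likewise for \(q_z\). The helper names are hypothetical.

```python
from fractions import Fraction

# Parental states (xy, z): female genotype xy and male allele z, in the order of Eq. (147).
STATES = [("aa", "a"), ("Aa", "a"), ("AA", "a"),
          ("aa", "A"), ("Aa", "A"), ("AA", "A")]


def a_fraction(genotype):
    """Fraction of A alleles in a genotype such as 'Aa' (female) or 'A' (male)."""
    return Fraction(genotype.count("A"), len(genotype))


def transition_row(q_f, q_m):
    """Eq. (146): probabilities of the next parental state, where q_f (q_m) is the
    probability that an allele transmitted by the female (male) parent is A."""
    het = q_f + q_m - 2 * q_f * q_m  # probability that a daughter is Aa
    return [
        (1 - q_f) ** 2 * (1 - q_m),   # (aa, a)
        het * (1 - q_f),              # (Aa, a)
        q_f * q_m * (1 - q_f),        # (AA, a)
        (1 - q_f) * (1 - q_m) * q_f,  # (aa, A)
        het * q_f,                    # (Aa, A)
        q_f ** 2 * q_m,               # (AA, A)
    ]


def parental_matrix(u, nu):
    """Neutral parental transition matrix under the ASSUMED mutation model
    q = (1 - u) * (fraction of A in the parent) + u * nu."""
    rows = []
    for female, male in STATES:
        q_f = (1 - u) * a_fraction(female) + u * nu
        q_m = (1 - u) * a_fraction(male) + u * nu
        rows.append(transition_row(q_f, q_m))
    return rows


def stationary(P):
    """Exact solution of Pi P = Pi with sum(Pi) = 1, by Gauss-Jordan elimination."""
    n = len(P)
    # One balance equation per target state j (the last one is redundant and is
    # replaced by the normalization row); the final column is the right-hand side.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [Fraction(0)]
         for j in range(n - 1)]
    A.append([Fraction(1)] * n + [Fraction(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        lead = A[col][col]
        A[col] = [x / lead for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]


Pi = stationary(parental_matrix(Fraction(1, 2), Fraction(1, 2)))
print(Pi)
```

At \(u = \nu = 1/2\) the exact solution is \((8/55, 27/110, 6/55, 6/55, 27/110, 8/55)\), which agrees with Eq. (147) evaluated at that point. At \(u=1\) every row of the matrix is identical, and the solution reduces to a product form, e.g. \((1-\nu )^3\) for \((aa,a)\).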

The stationary distribution for the full evolutionary Markov chain can be obtained from \({\varPi }_{{\mathrm {MSS}}}^\circ \) using the following fact: if \({\varPi }\) is stationary for \({{\mathbf {P}}}={\mathbf {RS}}\), then \(\pi :={{\mathbf {R}}}^\intercal {\varPi }\) is stationary for \({\mathbf {SR}}\), since \(\pi ^{\intercal }{\mathbf {SR}}= {\varPi }^{\intercal }{\mathbf {RSR}}={\varPi }^{\intercal }{{\mathbf {R}}}=\pi ^{\intercal }\). Applying this idea, we obtain the neutral RMC distribution for the full chain:

$$\begin{aligned} 2\left( 16+n-3\cdot 2^{-N+2}\right) \pi _{{\mathrm {RMC}}}^{\circ }\left( {{\mathbf {x}}}\right)&= 4\delta _{n_{aa},0}\,\delta _{n_{AA},0}\left( \delta _{n_{a},0} + \delta _{n_{A},0}\right) \nonumber \\&\quad +3\cdot 2^{-N+2}\left( \delta _{n_{aa},0} + \delta _{n_{AA},0}\right) \nonumber \\&\quad + 2^{n_{Aa}} \delta _{n_{AA},0}\left( \delta _{n_{Aa},0}\, \delta _{n_{A},1}+\delta _{n_{Aa},1}\,\delta _{n_{A},0}\right) \nonumber \\&\quad + 2^{n_{Aa}} \delta _{n_{aa},0}\left( \delta _{n_{Aa},0}\, \delta _{n_{a},1}+\delta _{n_{Aa},1}\,\delta _{n_{a},0}\right) . \end{aligned}$$
(148)

Above, \(\delta _{m,m'}\) is the Kronecker delta, equal to 1 if \(m=m'\) and 0 otherwise.
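The lifting argument used to pass from the parental chain to the full chain is pure linear algebra, so it can be checked on any pair of stochastic matrices. In the minimal sketch below, \({\mathbf {R}}\) (parental state to offspring state) and \({\mathbf {S}}\) (offspring state to parental state) are small hypothetical matrices chosen only for illustration. The \(2\times 2\) parental chain \({\mathbf {RS}}\) has the closed-form stationary distribution \((b, a)/(a+b)\), and \(\pi ^\intercal = {\varPi }^\intercal {\mathbf {R}}\) is then stationary for \({\mathbf {SR}}\).

```python
from fractions import Fraction as F


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def vecmat(v, M):
    """Row vector v times matrix M."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


# Hypothetical toy example: 2 parental states, 3 states of the full chain.
R = [[F(1, 2), F(1, 4), F(1, 4)],   # replacement: parental state -> offspring state
     [F(0), F(1, 3), F(2, 3)]]
S = [[F(1), F(0)],                  # selection: offspring state -> parental state
     [F(1, 2), F(1, 2)],
     [F(1, 5), F(4, 5)]]

P = matmul(R, S)                    # parental chain, P = RS
a, b = P[0][1], P[1][0]
Pi = [b / (a + b), a / (a + b)]     # stationary distribution of a 2-state chain
pi = vecmat(Pi, R)                  # pi^T = Pi^T R

full = matmul(S, R)                 # full chain, SR
print(vecmat(pi, full) == pi)       # True: pi is stationary for SR
```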

In contrast, the parental chain has a much simpler RMC distribution:

$$\begin{aligned} {\varPi }_{{\mathrm {RMC}}}^{\circ }\left( Aa,a\right)&= {\varPi }_{{\mathrm {RMC}}}^{\circ }\left( Aa,A\right) = \frac{3}{8} ; \end{aligned}$$
(149a)
$$\begin{aligned} {\varPi }_{{\mathrm {RMC}}}^{\circ }\left( AA,a\right)&= {\varPi }_{{\mathrm {RMC}}}^{\circ }\left( aa,A\right) = \frac{1}{8} . \end{aligned}$$
(149b)
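Eq. (149) can be recovered from the leading-order behaviour of the parental chain as \(u\rightarrow 0\). The sketch below assumes that the parental RMC distribution is the \(u\rightarrow 0\) limit of the stationary distribution conditioned on the parents not being monoallelic, and that each transmitted allele mutates with probability \(u\) and becomes \(A\) with probability \(\nu \). Under that assumption, Eq. (146) with \(q_{xy}=q_z=u\nu \) (parents \((aa,a)\)) or \(q_{xy}=q_z=1-u(1-\nu )\) (parents \((AA,A)\)) feeds \((Aa,a)\), \((AA,a)\), \((aa,A)\), \((Aa,A)\) at rates \(2,1,1,2\) in units of \(u\nu (1-\nu )\), to first order in \(u\). At \(u=0\) the mixed states then evolve by Eq. (146) with \(q_{xy}\in \{0,1/2,1\}\) and \(q_z\in \{0,1\}\). The leading-order masses \(m\) solve \(m = b + mQ\).

```python
from fractions import Fraction as F

# Non-monoallelic parental states (xy, z), in the order (Aa,a), (AA,a), (aa,A), (Aa,A).
MIXED = [("Aa", "a"), ("AA", "a"), ("aa", "A"), ("Aa", "A")]


def a_fraction(genotype):
    """Fraction of A alleles in a genotype such as 'Aa' or 'A'."""
    return F(genotype.count("A"), len(genotype))


def to_mixed(q_f, q_m):
    """Eq. (146) restricted to the mixed target states, in MIXED order."""
    het = q_f + q_m - 2 * q_f * q_m
    return [het * (1 - q_f), q_f * q_m * (1 - q_f),
            (1 - q_f) * (1 - q_m) * q_f, het * q_f]


# u = 0 transitions among mixed states; the missing row mass leaks to (aa,a) or (AA,A).
Q = [to_mixed(a_fraction(female), a_fraction(male)) for female, male in MIXED]
# Leading-order inflow from (aa,a) and (AA,A), in units of u*nu*(1 - nu)
# (ASSUMED mutation model: q = (1 - u) * (fraction of A) + u * nu).
b = [F(2), F(1), F(1), F(2)]

# Solve m = b + m Q, i.e. sum_i m_i (delta_ij - Q_ij) = b_j, by Gauss-Jordan elimination.
n = len(MIXED)
A = [[(1 if i == j else 0) - Q[i][j] for i in range(n)] + [b[j]] for j in range(n)]
for col in range(n):
    pivot = next(r for r in range(col, n) if A[r][col] != 0)
    A[col], A[pivot] = A[pivot], A[col]
    lead = A[col][col]
    A[col] = [x / lead for x in A[col]]
    for r in range(n):
        if r != col and A[r][col] != 0:
            factor = A[r][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
m = [A[r][n] for r in range(n)]
rmc = [x / sum(m) for x in m]
print(rmc)  # 3/8, 1/8, 1/8, 3/8, as in Eq. (149)
```

The unnormalized masses are \(12, 4, 4, 12\) (times \(u\nu (1-\nu )\)), which also match the leading terms of Eqs. (147b)–(147e) for small \(u\).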

9.2.6 Condition for success

Combining Eqs. (143) and (148), it follows from Corollary 1 that

$$\begin{aligned} \rho _{A}> \rho _{a}&\iff {\mathbb {E}}_{{\mathrm {RMC}}}^{\circ }\left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right]> 0 \nonumber \\&\iff 8\frac{N_{{\mathrm {F}}}}{N_{{\mathrm {F}}}-1} m + 5\frac{N_{{\mathrm {M}}}}{N_{{\mathrm {M}}}-1} s > 0 . \end{aligned}$$
(150)

Interestingly, this condition is independent of the degree h of genetic dominance in females. Moreover, for large \(N_{{\mathrm {F}}}\) and \(N_{{\mathrm {M}}}\), we have the approximate condition

$$\begin{aligned} \rho _{A}> \rho _{a}&\iff 8m + 5s > 0 . \end{aligned}$$
(151)

Condition (151) becomes exact under the limit ordering (i) \(\delta \rightarrow 0\), (ii) \(N_M \rightarrow \infty , N_F \rightarrow \infty \) (i.e. the wN limit, sensu Jeong et al. 2014; Sample and Allen 2017).
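Conditions (150) and (151) can be evaluated directly. The sketch below (function names hypothetical) compares them for a small population, showing that the finite-size factors \(N_{\mathrm F}/(N_{\mathrm F}-1)\) and \(N_{\mathrm M}/(N_{\mathrm M}-1)\) can change the verdict near the boundary \(8m+5s=0\). Exact fractions avoid rounding error at that boundary.

```python
from fractions import Fraction as F


def favored_finite(m, s, n_females, n_males):
    """Condition (150): True if A is favored under weak selection."""
    return (8 * F(n_females, n_females - 1) * m
            + 5 * F(n_males, n_males - 1) * s) > 0


def favored_large(m, s):
    """Condition (151), the large-population approximation."""
    return 8 * m + 5 * s > 0


m, s = F(5), F(-81, 10)              # hypothetical selection coefficients
print(favored_large(m, s))           # False: 8m + 5s = -1/2
print(favored_finite(m, s, 2, 4))    # True: 80 - 54 = 26 > 0
```

Both conditions are linear in \((m, s)\), so only the ratio \(m/s\) matters. In the large-population limit, \(A\) is favored if and only if \(m > -5s/8\).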

It is tempting to think that one could more easily obtain Condition (151) by first defining an analogue of \(\hat{{\Delta }}_{{\mathrm {sel}}}\) on the parental chain, and then averaging this analogue over the RMC distribution on the parental chain, as given by Eq. (149). This scheme turns out not to work because it does not account for mutations correctly. Since different numbers of male and female offspring are produced each generation, and each offspring provides an independent opportunity for mutation, the mutant appearance distribution in parents differs from the expression given in Eqs. (11)–(12); this effect is not captured by the above scheme.

10 Discussion

10.1 Summary

10.1.1 Generality and abstraction

The aim of this work is to provide a mathematical formalism for describing natural selection that is general enough to encompass a wide variety of biological scenarios and modeling approaches. Dealing in such generality requires “abstracting away” the details of particular models while preserving what is ultimately relevant for natural selection. To this end, we use the replacement rule (introduced by Allen and Tarnita 2014) to represent birth, death, and inheritance, and we use the formalism of genetic sites to allow for different genetic systems (haploid, diploid, haplodiploid, or polyploid). This level of abstraction, although atypical for evolutionary theory, entails a number of advantages: (i) it provides a common language for natural selection with formal definitions of key concepts; (ii) it enables proofs of general theorems that eliminate the duplicate work of deriving analogous results one model at a time; (iii) it may help distinguish robust theoretical principles from artifacts of particular modeling assumptions.

10.1.2 Main results

Our main results, Theorems 4 and 8, show that various criteria for success under natural selection become equivalent in the limit of low mutation. Theorem 4, which shows the equivalence of success criteria based on fixation probability, stationary frequency, and change due to selection, is quite general: it applies under arbitrary strength of selection and requires no assumptions beyond the basic setup of our formalism. However, these criteria may all be intractable for a model of reasonable complexity (Ibsen-Jensen et al. 2015).

Our second main result, Theorem 8, applies to weak selection. The advantage of Theorem 8 is that two of these criteria involve expectations under neutral drift, for which the recurrence relations for the MSS and RMC distributions simplify considerably. Moreover, if the additional assumptions of Corollary 1 hold, the direction of selection is completely determined by the sign of \({{\mathrm{{\mathbb {E}}}}}^\circ _{\mathrm {RMC}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right] \) or \(\frac{d}{du}{{\mathrm{{\mathbb {E}}}}}^\circ _{\mathrm {MSS}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}' \right] \big |_{u=0}\). Evaluating these criteria may not require knowing the full RMC or MSS distributions, but only certain statistics of them. For the example of games on graphs, one only needs to obtain the pairwise expectations, \({{\mathrm{{\mathbb {E}}}}}_{\mathrm {RMC}}^\circ \left[ {\underline{x}}_g{\underline{x}}_h\right] \), which obey their own recurrence relations (110) and (120). In contrast, the other success criteria, \(\rho _A > \rho _a\) and \(\lim _{u\rightarrow 0} {{\mathrm{{\mathbb {E}}}}}_{\mathrm {MSS}}\left[ x\right] > \nu \), are of clear biological relevance but are difficult to analyze directly.

To be clear, neither of our main results comes as a surprise. Theorem 4 generalizes the main result of Allen and Tarnita (2014), and special cases of this result are also discussed by Rousset and Billiard (2000), Taylor et al. (2007a), Nowak et al. (2010b) and Tarnita and Taylor (2014). Theorem 8 generalizes and extends the main result of Tarnita and Taylor (2014). Aspects and instances of Corollary 1—which is a special case of Theorem 8—have been obtained in so many contexts that this result might be called a “folk theorem” (Taylor and Frank 1996; Rousset and Billiard 2000; Leturque and Rousset 2002; Nowak et al. 2004; Ohtsuki et al. 2006; Lessard and Ladret 2007; Taylor et al. 2007b; Antal et al. 2009a; Chen 2013; Cox et al. 2013; Wakano et al. 2013; Débarre et al. 2014; Durrett 2014; Van Cleve 2015; Allen et al. 2017). The contribution of Theorems 4 and 8 is to formalize and prove these results in a general setting, and to elucidate the assumptions on which they depend. For example, the equivalence \({{\mathrm{{\mathbb {E}}}}}^\circ _{\mathrm {RMC}}\left[ \hat{{\Delta }}_{{\mathrm {sel}}}'\right]>0 \Leftrightarrow \rho _A > \rho _a\), under weak selection, relies on the additional assumptions of Corollary 1, and is not valid in the more general setting of Theorem 8 (see also Tarnita and Taylor 2014).

10.1.3 Examples

Our two examples illustrate how our results (in particular, Corollary 1) can be applied to obtain conditions for success in models with different spatial and genetic structures. The games on graphs example (Sect. 9.1) recovers the main results of Allen et al. (2017), i.e. the conditions for success under weak selection in two-player games on arbitrary weighted graphs. Notably, while Allen et al. (2017) relied on sophisticated results regarding perturbations of voter models (Chen 2013) along with the Structure Coefficient Theorem (Tarnita et al. 2009b), the present analysis uses only the results of this work. This underscores that Corollary 1 provides not only an equivalence result, but also a problem-solving methodology general enough to handle reasonably complicated models.

For the haplodiploid model, we have found that a mutation with selection coefficient m in males and s in females is favored under weak selection in large populations if \(8m+5s>0\), regardless of the degree of dominance, h. This result suggests, intriguingly, that a mutation’s effect in males is 8/5 as important as its effect in females. The applicability of this particular result is limited by the rather artificial assumption that each generation is produced by a single mating pair. However, there is no a priori reason the analysis could not be generalized to populations with larger numbers of reproducers.

10.2 Conceptual issues

10.2.1 Gene’s-eye view

Identifying the salient units or levels at which selection operates is a longstanding conversation in evolutionary theory (Williams 1966; Lewontin 1970; Hull 1980; Gould and Lloyd 1999; Okasha 2006; Akçay and Van Cleve 2016). This work employs a “gene’s-eye view” (Williams 1966), in that the analysis is conducted almost exclusively at the level of genetic sites. In particular, the criteria for success in Theorems 4 and 8 are expressed in terms of gene-level quantities. Under Assumptions 3 and 4, these criteria may be rewritten in terms of individual-level quantities via Proposition 5. However, with meiotic drive and/or frequent horizontal gene transfer, these assumptions are violated, and an analysis based solely on individual-level quantities would not accurately describe selection.

Our gene-centric perspective does not preclude the possibility that selection may also operate at other levels, including the individual and the group or colony. However, our analysis makes clear that all effects of individual-level or group-level selection can be analyzed in terms of gene-level quantities (e.g. \(\hat{{\Delta }}_{{\mathrm {sel}}}\)). Analysis of individual-level or group-level quantities is not mathematically necessary, although it may be conceptually illuminating.

10.2.2 Reproductive value

Reproductive value has been a central concept in evolutionary theory since Fisher (1930). Recent work (Maciejewski 2014) has extended this notion to populations with heterogeneous spatial structure. The key property of reproductive value—that RV-weighted frequency has zero expected change under neutral drift—is encapsulated in our Theorem 6. A closely related result is that the reproductive value of a site is proportional to the fixation probability of a neutral mutation arising at this site (Theorem 7; see also Maciejewski 2014; Allen et al. 2015).
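The key property named above can be illustrated on a hypothetical toy model that is not one of the paper’s examples: three sites on a star (site 0 the hub, sites 1 and 2 the leaves) under a death-Birth-style update, in which one uniformly chosen site dies each step and is replaced by a copy of a uniformly chosen neighbor. Under neutral drift without mutation, the expected next state is \(M{\mathbf {x}}\), where \(M_{gh}\) is the probability that site g’s next allele is copied from site h. A left eigenvector \(v\) with \(v^\intercal M = v^\intercal \) then makes the v-weighted frequency \(\sum _g v_g x_g\) constant in expectation. This is a sketch of the standard eigenvector characterization; the paper’s own definition (Sect. 5.2) is stated in terms of replacement probabilities and is not reproduced here.

```python
# Hypothetical toy model: M[g][h] = probability that site g's next allele is copied from h.
# The hub (site 0) dies w.p. 1/3 and copies leaf 1 or 2; a dying leaf copies the hub.
M = [[2 / 3, 1 / 6, 1 / 6],
     [1 / 3, 2 / 3, 0.0],
     [1 / 3, 0.0, 2 / 3]]


def reproductive_values(M, iterations=200):
    """Left eigenvector v M = v with sum(v) = 1, by power iteration.
    Converges here because M is row-stochastic, irreducible and has a positive diagonal."""
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(v[g] * M[g][h] for g in range(n)) for h in range(n)]
    return v


v = reproductive_values(M)                   # approximately (1/2, 1/4, 1/4)
x = [1, 0, 1]                                # hypothetical state: A at sites 0 and 2
expected_next = [sum(M[g][h] * x[h] for h in range(3)) for g in range(3)]  # E[x'] = M x
before = sum(vg * xg for vg, xg in zip(v, x))
after = sum(vg * eg for vg, eg in zip(v, expected_next))
print(round(before, 12), round(after, 12))   # both 0.75: no expected change
```

In this toy model the hub carries twice the reproductive value of each leaf, since every leaf death is filled by a copy of the hub.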

Our work underscores the importance of reproductive value, but also reveals an important limitation of this concept. Reproductive value can only be defined with reference to some particular process of neutral drift. In Sect. 5.2, we defined reproductive value with respect to replacement probabilities in the monoallelic states \({\mathbf {a}}\) and \({\mathbf {A}}\). However, without further assumptions, there is no guarantee that these two states will yield the same reproductive value for each genetic site. It may therefore be impossible to assign consistent reproductive values in a given model. This issue may be exacerbated if one considers models with more than two alleles (see Sect. 10.3.3), in which each new selective sweep may alter the reproductive values of sites, thereby affecting selection pressure on subsequent mutations. Here, we obtained consistent reproductive values either by invoking Assumption 1, or (in the context of weak selection) by using \(\delta =0\) as a reference process of neutral drift. For an arbitrary mathematical model or biological population, there is no guarantee that well-defined reproductive values exist.

10.2.3 Fitness

Like reproductive value, the concept of fitness is both fundamental to evolutionary theory and difficult to formalize in a fully consistent way (Metz et al. 1992; Akçay and Van Cleve 2016; Lehmann et al. 2016; Metz and Geritz 2016; Doebeli et al. 2017). Here, we have defined fitness as the RV-weighted contribution of a genetic site to the next time-step, which can be decomposed as the sum of the RV-weighted survival probability, \(v_g - {\hat{d}}_g\left( {\mathbf {x}}\right) = v_g\left( 1-d_g\left( {\mathbf {x}}\right) \right) \), and the RV-weighted birth rate, \({\hat{b}}_g\left( {\mathbf {x}}\right) \). This definition is consistent with other definitions of fitness in the literature (e.g. Tarnita and Taylor 2014). However, we do not intend this definition to be universal or canonical, for two reasons. First, it depends on the existence of well-defined reproductive values, which is not guaranteed if Assumption 1 is violated. Second, it quantifies only the one-step contribution of a site in a given state, which may not accurately capture the long-term success of the progeny of this site. In general, there may not be any fully satisfactory definition of fitness of a single genetic site (let alone an individual). Indeed, Akçay and Van Cleve (2016) argue that fitness should instead be ascribed to a genetic lineage—that is, the progeny of a given allele copy. An interesting direction for future work would be to describe the lineage-eye view of fitness in the context of our formalism.

10.3 Limitations and possible extensions

Although we have aimed for a relatively high level of generality, our formalism still entails a number of limiting assumptions. Here, we discuss these assumptions and the prospects for extending beyond them.

10.3.1 Fixed population size

Fluctuations in population size can affect the process of natural selection (Lambert 2006; Parsons and Quince 2007a, b; Wakano et al. 2009; Parsons et al. 2010; Schoener 2011; Uecker and Hermisson 2011; Waxman 2011). These effects cannot be studied in the current framework, which assumes constant population size. In some cases, fluctuations in population size can be safely ignored—for example, if selection depends only on the relative frequencies of competing types (as in the classical population genetics setting of Crow and Kimura 1970; Bürger 2000; Ewens 2004), or if the population is assumed to remain close to its carrying capacity (Bürger 2005). In other cases, however, population dynamics have important consequences for long-term evolution (Dieckmann and Law 1996; Metz et al. 1996; Pelletier et al. 2007; Dercole and Rinaldi 2008; Wakano et al. 2009; Schoener 2011; Korolev 2013; Constable et al. 2016; Chotibut and Nelson 2017), including the possibility of evolutionary branching (Geritz et al. 1997; Dieckmann and Doebeli 1999) or evolutionary suicide (Gyllenberg and Parvinen 2001).

Technical complications arise when mathematically modeling populations of fluctuating size. For example, it may be possible for the entire population to become extinct, and as a result, there may be no non-trivial stationary distribution. An alternative is to consider the quasi-stationary distribution for the process (Haccou et al. 2005; Gyllenberg and Silvestrov 2008; Cattiaux et al. 2009; Faure and Schreiber 2014), which is conditioned on non-extinction of the population. Quantifying the evolutionary success of an allele also becomes more nuanced when population size can vary (Lambert 2006; Parsons et al. 2010; Constable et al. 2016; McAvoy et al. 2018b).

To extend the current framework to populations of changing size would require the set of sites, G, to itself be an aspect of the population state. The offspring-to-parent map, \(\alpha \), would then map the set of sites at time \(t+1\) to the set of sites at time t. Such complications are not necessarily insurmountable but would add considerable notational and technical burden to our formalism.

10.3.2 Fixed spatial structure and environment, trivial demography

In our framework, the probability of a replacement event depends only on the current population state, \({\mathbf {x}}\). This precludes the possibility that the population structure could change over time (Pacheco et al. 2006a, b; Antal et al. 2009a; Tarnita et al. 2009a; Wu et al. 2010; Cavaliere et al. 2012; Wardil and Hauert 2014), or that replacement events could depend on variable aspects of the environment (Cohen 1966; Philippi and Seger 1989; Haccou and Iwasa 1996; Kussell and Leibler 2005; Starrfelt and Kokko 2012; Cvijović et al. 2015) or on the demographic stages of individuals (e.g. juvenile vs. adult; Diekmann et al. 1998, 2001; Parvinen and Seppänen 2016; Lessard and Soares 2018). Eco-evolutionary feedbacks between a population and its demography and environment can be important drivers of evolutionary change (Dieckmann and Law 1996; Metz et al. 1996; Geritz et al. 1997; Dercole and Rinaldi 2008; Durinx et al. 2008; Perc and Szolnoki 2010), but such phenomena lie outside the scope of the formalism presented here.

These limitations can be overcome by allowing the probabilities of replacement events to depend on additional variables beyond the population state \({\mathbf {x}}\). These variables can represent the current population structure, environmental conditions, and/or the demographic stages of individuals. Additional rules must then be provided for updating these variables. In such an extension, natural selection is described by a Markov chain with state space \(\left\{ 0,1\right\} ^G \times E\), where E is the set of possible joint values for these extra variables. This extension would bring phenomena such as bet-hedging (Cohen 1966; Philippi and Seger 1989; Kussell and Leibler 2005; Starrfelt and Kokko 2012) and structure-strategy coevolution (Pacheco et al. 2006b; Tarnita et al. 2009a; Perc and Szolnoki 2010) into the purview of our formalism. However, the incorporation of this additional information may introduce significant difficulties in establishing basic results such as the equivalence of success criteria.

10.3.3 Two alleles

We have focused here on the case of two competing alleles; however, generalizing to any number of alleles appears straightforward. In the limit of rare mutation, the outcome of natural selection depends only on the pairwise competitions between alleles. We therefore anticipate that the RMC distribution (generalized for more than two alleles) will be nonzero only for states containing exactly two alleles. Moreover, the results of Fudenberg and Imhof (2006) imply that the equilibrium mutation-selection balance can be obtained from the pairwise fixation probabilities of each allele in a population fixed for each other allele. On the other hand, away from the limit of rare mutation, determining which of multiple alleles is favored by selection is a more nuanced question (Antal et al. 2009b; Traulsen et al. 2009; Tarnita et al. 2011; Wu et al. 2012).
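The rare-mutation picture can be sketched in the style of Fudenberg and Imhof (2006). As \(u\rightarrow 0\) the population is almost always monoallelic, and the long-run fraction of time fixed for allele i is the stationary distribution of an embedded chain on the k monoallelic states. In this sketch, the \(i\rightarrow j\) transition probability is taken proportional to (probability that a new mutant in an i-population is of type j) \(\times \) (fixation probability of that mutant), with a proportionality factor assumed equal for every resident type. All numbers are hypothetical placeholders, not results from this work.

```python
from fractions import Fraction as F


def embedded_chain(mut, rho, scale):
    """Transition matrix on monoallelic states (Fudenberg-Imhof style sketch).

    mut[i][j]: probability that a new mutant in a population fixed for i is of type j.
    rho[i][j]: fixation probability of a single such j-mutant.
    scale:     common factor (ASSUMED equal for all residents) keeping rows stochastic.
    """
    k = len(mut)
    T = [[scale * mut[i][j] * rho[i][j] if i != j else F(0) for j in range(k)]
         for i in range(k)]
    for i in range(k):
        T[i][i] = 1 - sum(T[i])  # no successful invasion this step
    return T


def stationary(P):
    """Exact solution of Pi P = Pi with sum(Pi) = 1, by Gauss-Jordan elimination."""
    n = len(P)
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        lead = A[col][col]
        A[col] = [x / lead for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]


# Hypothetical three-allele example: unbiased mutation, made-up fixation probabilities.
mut = [[0, F(1, 2), F(1, 2)], [F(1, 2), 0, F(1, 2)], [F(1, 2), F(1, 2), 0]]
rho = [[0, F(1, 10), F(1, 20)], [F(1, 30), 0, F(1, 10)], [F(1, 10), F(1, 40), 0]]
print(stationary(embedded_chain(mut, rho, F(1))))
```

With two alleles \(a\) and \(A\) and mutational bias \(\nu \), this sketch gives a fraction \(\nu \rho _A/(\nu \rho _A + (1-\nu )\rho _a)\) of time fixed for \(A\), which exceeds \(\nu \) exactly when \(\rho _A > \rho _a\).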

10.3.4 One locus

Although our formalism allows for arbitrary genetics in the sense of ploidy and mating structure, we have focused on selection at a single genetic locus. Extending to multilocus genetics would require considering additional genetic sites—one for each locus on each chromosome. It would then be natural to impose assumptions on the replacement rule so that replacement can only occur between sites at the same locus. The replacement rule could then encode an arbitrary pattern of linkage and recombination. One complication is that quantifying success under natural selection becomes subtler in the context of multiple linked loci (Hammerstein 1996; Eshel et al. 1998; Lehmann and Rousset 2009).

10.3.5 Independence and rarity of mutations

Although we have generalized the framework of Allen and Tarnita (2014) to include arbitrary mutational bias, \(\nu \), we still assume that mutations occur independently with a constant probability per offspring. This assumption may be violated in natural systems; for example, an adult may acquire a germline mutation that is passed to all offspring. Our formalism can be generalized to accommodate such cases by allowing for adult mutation, or by relaxing the assumption that offspring mutations are independent. One could also allow for the mutation parameters u and \(\nu \) to depend on the parental site and/or the population state. The effect of such amendments to our framework would be to alter the mutant appearance distribution (which in turn affects fixation probabilities) as well as the RMC distribution.

Additionally, most of our results assume that mutation is either absent or rare. With nonvanishing mutation, even defining which type is favored under natural selection is a nontrivial question, since distinct, intuitive criteria for success may disagree with each other (Allen and Tarnita 2014). Furthermore, mutation can alter the direction of selection (Traulsen et al. 2009); for example, high mutation rates can impede the evolution of cooperation in spatially-structured populations by diluting the clustering of cooperators (Allen et al. 2012; Débarre 2017). General mathematical theorems on natural selection with nonvanishing mutation could shed new light on such results and perhaps uncover new ones.

10.4 Connections to other approaches

Beyond the above-mentioned possibilities for generalization, there is significant opportunity to connect our formalism to population genetics and other general approaches in evolutionary theory. For example, one might conceive of a coalescent process (Kingman 1982; Cox 1989; Wakeley 2009) for our formalism, using the neutral replacement probabilities \(p_{\left( R,\alpha \right) }^{\circ }\). Likewise, it appears possible to define a notion of identity-by-descent probability (Malécot 1948; Rousset and Billiard 2000; Allen and Nowak 2014) to characterize the extent to which a given pair (or set) of sites is genetically related. More speculatively, one could ask whether diffusion methods (Kimura 1964; Chen 2018) could be applied to our formalism in the large-population limit. Developing these connections may provide opportunities to generalize results from population genetics to a wider variety of spatial and genetic structures.

11 Conclusion

Mathematical modeling of evolution is a robust and growing field, with a rapid pace of new discoveries and rich interplay between theoretical and empirical study. At such moments of expansion, unifying mathematical frameworks are particularly helpful—by clarifying concepts, providing common definitions, and establishing fundamental results. We hope that the formalism presented here will provide a strong foundation for the theory of natural selection in structured populations to build upon.