Extinction scenarios in evolutionary processes: a multinomial Wright–Fisher approach

Roitershtein, Alexander; Rastegar, Reza; Chapkin, Robert S.; Ivanov, Ivan

doi:10.1007/s00285-023-01993-7

Extinction scenarios in evolutionary processes: a multinomial Wright–Fisher approach

Published: 26 September 2023

Volume 87, article number 63, (2023)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Mathematical Biology Aims and scope Submit manuscript

Extinction scenarios in evolutionary processes: a multinomial Wright–Fisher approach

Download PDF

Alexander Roitershtein¹,
Reza Rastegar²,
Robert S. Chapkin³ &
…
Ivan Ivanov ORCID: orcid.org/0000-0002-5323-8325⁴

232 Accesses
Explore all metrics

Abstract

We study a discrete-time multi-type Wright–Fisher population process. The mean-field dynamics of the stochastic process is induced by a general replicator difference equation. We prove several results regarding the asymptotic behavior of the model, focusing on the impact of the mean-field dynamics on it. One of the results is a limit theorem that describes sufficient conditions for an almost certain path to extinction, first eliminating the type which is the least fit at the mean-field equilibrium. The effect is explained by the metastability of the stochastic system, which under the conditions of the theorem spends almost all time before the extinction event in a neighborhood of the equilibrium. In addition to the limit theorems, we propose a maximization principle for a general deterministic replicator dynamics and study its implications for the stochastic model.

Towards a replicator dynamics model of age structured populations

Article Open access 02 April 2021

Moran-type bounds for the fixation probability in a frequency-dependent Wright–Fisher model

Article 16 May 2017

Equilibrium in Wright–Fisher Models of Population Genetics

Article 15 March 2019

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

In this paper, we study a multi-type Wright–Fisher process, modeling the evolution of a population of cells or microorganisms in discrete time. Each cell belongs to one of the several given types, and its type evolves stochastically while the total number of cells does not change with time. The process is a time-homogeneous Markov chain that exhibits a rich and complex behavior.

Due to the apparent universality of its dynamics, availability of effective approximation schemes, and its relative amenability to numerical simulations, the Wright–Fisher process constitutes a popular modelling tool in applied biological research. In particular, there is a growing body of literature using the Wright–Fisher model of evolutionary game theory (Imhof and Nowak 2006) and its variations along with other tools of game theory and evolutionary biology to study compartmental models of complex microbial communities and general metabolic networks (Cavaliere et al. 2017; Hummert et al. 2014; Tarnita 2017; West et al. 2007; Zeng et al. 2017; Zomorrodi and Segrè 2017). For instance, recent studies (Zomorrodi and Segrè 2017) incorporate evolutionary game theory and concepts from behavioral economics into their hybrid models of metabolic networks including microbial communities with so-called Black Queen functions. See also Cai et al. (2019), Hummert et al. (2014), Ruppin et al. (2010) for related research. The Wright–Fisher model has been employed to study the evolution of host-associated microbial communities (Zeng et al. 2017). In a number of studies, the Wright–Fisher model was applied to examine adaptive dynamics in living organisms, for instance viral and immune populations perpetually adapting (Nourmohammad et al. 2016) and response to T-cell-mediated immune pressure (Barton et al. 2016; Chen and Kardar 2019) in HIV-infected patients. Another interesting application of the Wright–Fisher model to the study of evolution of viruses has been previously reported (Zinger et al. 2019).

Other biological applications of the Wright–Fisher model supported by experimental data include, for instance, substitutions in protein-coding genes (dos Reis 2015), analysis of the single-nucleotide polymorphism differentiation between populations (Dewar et al. 2011), correlations between genetic diversity in the soil and above-ground population (Koopmann et al. 2017), and persistence of decease-associated gene alleles in wild populations (Vallier et al. 2017). In cancer-related research, the Wright–Fisher model has been used to model tumor progression (Beerenwinkel et al. 2007; Datta et al. 2013), clonal interference (Garcia et al. 2018; Park and Krug 2007), cancer risk across species (Peto’s paradox) (Caulin et al. 2015), and intra-tumoral heterogeneity (Iwasa and Michor 2011). Not surprisingly, the Wright–Fisher model is also extensively used to model phenomena in theoretical evolutionary biology (Cvijović et al. 2018; Goudenège and Zitt 2015; Hobolth and Sireén 2016; Hofrichter et al. 2017; Houchmandzadeh 2015; John and Seetharaman 2016; Proulx 2011; Simonsen Speed et al. 2019; Waxman and Loewe 2010).

We now describe the Wright–Fisher model and state our results in a semi-formal setting. For a population with N cells, we define M-dimensional vectors

$$\begin{aligned} \textbf{Z}^{(N)}_k:=\bigl (Z^{(N)}_k(1),Z^{(N)}_k(2),\ldots ,Z^{(N)}_k(M)\bigr ), \qquad k=0,1,2,\ldots , \end{aligned}$$

where $Z^{(N)}_k(i)$ represents the number of type i cells in the generation k. We denote by

$$\begin{aligned} \textbf{X}^{(N)}_k:=\bigl (X^{(N)}_k(1),X^{(N)}_k(2),\ldots ,X^{(N)}_k(M)\bigr ) \end{aligned}$$

the frequency vectors of the population. That is,

$$\begin{aligned} X^{(N)}_k(i)=\frac{Z^{(N)}_k(i)}{N}=\frac{1}{N}\cdot \#\{\text{ cells } \text{ of } \text{ type } i \text{ in } \text{ the } k\text{-th } \text{ generation }\}, \end{aligned}$$

(1)

for all $1\le i\le M.$ We refer to the frequency vector $\textbf{X}^{(N)}_k$ as the profile of the population at time k.

With biological applications in mind, the terms “particles” and “cells” are used interchangeably throughout this paper. We assume that the sequence $\textbf{Z}^{(N)}_k$ forms a time-homogeneous Markov chain with the Wright–Fisher frequency-dependent transition mechanism. That is, conditioned on $\textbf{Z}^{(N)}_k,$ the next generation vector $\textbf{Z}^{(N)}_{k+1}$ has a multinomial distribution with the profile-dependent parameter ${\varvec{\Gamma }}\big (\textbf{X}^{(N)}_k\big ),$ where ${\varvec{\Gamma }}$ is a vector field that shapes the fitness landscape of the population (see Sect. 2.2 for details). In Example 1 we list several examples of the fitness taken from either biological or biologically inspired mathematical literature and briefly indicate a rationale behind their introduction.

It is commonly assumed in evolutionary population biology that particles reproduce or adopt their behavior according to their fitness, which depends on the population composition through a parameter representing utility of a random interaction within the population. In Sect. 3, in order to relate the stochastic dynamics of the Wright–Fisher model to first biological principles, we interpret the mean-field model in terms of the reproductive fitness of the stochastic process and study some basic properties of the fitness. Our main contribution here is the following maximization principle for a general deterministic replicator dynamics. An informal statement of this contribution is as follows. We refer to a vector $\textbf{x}$ in ${{\mathbb {R}}}^m$ as a population profile vector if all its components are non-negative and their sum is one (i. e., $\textbf{x}$ is a probability vector).

Theorem 1

Assume that ${\varvec{\Gamma }}$ is a continuous vector-field. Then there exists a “nice” (non-decreasing, bounded, complete Lyapunov, see Definition 2) function h mapping population profiles into the set of reals ${{\mathbb {R}}},$ such that:

(i)
$h(\textbf{x})$ is the average $\sum _i x(i)\varphi _i(\textbf{x})$ of an ordinal Darwinian fitness given by
$$\begin{aligned} \varphi _i(\textbf{x})=\frac{\Gamma _i(\textbf{x})}{x(i)}h(\textbf{x}),\qquad \qquad i=1,\ldots ,M. \end{aligned}$$
(2)
(ii)
$h\big ({\varvec{\Gamma }}(\textbf{x})\big )\ge h(\textbf{x})$ for any population profile $\textbf{x}.$
(iii)
$h\big ({\varvec{\Gamma }}(\textbf{x})\big )>h(\textbf{x})$ for any population profile $\textbf{x}$ which is, loosely speaking, “not recurrent” (see Definition 1 for the definition of the “recurrence”).

We refer to Theorem 3.2 for a formal statement of the result. The theorem is a simple consequence of the “fundamental theorem of dynamical systems” due to Conley (1978), see Sect. 3.2 for details.

Let $\textbf{x}$ be an arbitrary population profile, set ${{\varvec{\psi }}}_0=\textbf{x},$ and define recursively,

$$\begin{aligned} {{\varvec{\psi }}}_{k+1}={\varvec{\Gamma }}({{\varvec{\psi }}}_k),\qquad \qquad k\ge 0. \end{aligned}$$

(3)

We will occasionally use the notation ${{\varvec{\psi }}}_k(\textbf{x})$ to indicate the initial state ${{\varvec{\psi }}}_0=\textbf{x}.$ It turns out that dynamical system ${{\varvec{\psi }}}_k$ serves as mean-field approximation for the stochastic model (see Sect. 2.2 and Theorem 4.1 for details). It follows from Theorem 1 that the average fitness h is non-decreasing along the orbits of ${{\varvec{\psi }}}_k:$

$$\begin{aligned} h({{\varvec{\psi }}}_{k+1})\ge h({{\varvec{\psi }}}_k), \end{aligned}$$

and, moreover, (again, informally) “typically” we have a strong inequality $h({{\varvec{\psi }}}_{k+1})>h({{\varvec{\psi }}}_k).$ Furthermore, since h is a bounded function,

$$\begin{aligned} \lim _{k\rightarrow \infty } h({{\varvec{\psi }}}_k)=\sup _k h({{\varvec{\psi }}}_k) \end{aligned}$$

exists for all initial conditions ${{\varvec{\psi }}}_0.$

In fact, the average/population Darwinian fitness (as defined in (14) below) generically remains constant during the evolution of any Wright–Fisher model driven by a replicator mean-field dynamics (see Sect. 3.1 for details). Theorem 1 offers an alternative notion of fitness, a distinguished ordinal utility, that is free of this drawback. In contrast to the most of the known results of this type (Ao 2005; Birch 2016; Bratus et al. 2018; Doebeli et al. 2017; Edwards 2016; Ewens 2011; Ewens and Lessard 2015; Gavrilets 2010; Obolski et al. 2018; Orr 2009; Queller 2017), our maximization principle 1) is genuinely universal, such that it can be applied to any discrete-time and continuous in state space replicator dynamics; and 2) refers to maximization of the entire reproduction fitness, rather than of its part.

In Sect. 4.3 we study fixation probabilities of a certain class of Wright–Fisher models, induced by a classical linear-fractional replicator mean-field dynamics with a symmetric payoff matrix which corresponds to a partnership normal-form game with a unique evolutionary stable equilibrium. Understanding the route to fixation of multivariate population models is of a great importance to many branches of biology (Der et al. 2011; Gavrilets 2010; Hofrichter et al. 2017; Nowak and Sigmund 2004; Obolski et al. 2018; Waxman and Loewe 2010). For instance, extinction risk of biologically important components, regulated by metabolism in living cells, has been the focus of intensive research and is a key factor in numerous biotechnologically and biomedically relevant applications (Assaf and Meerson 2008; Samoilov and Arkin 2006). Our contribution can be summarized as follows. Firstly, Theorem 4.6 offers an interpretation of the notion “the least fit” which is in the direct accordance to the law of competitive exclusion and Fisher’s fundamental theorem of natural selection. Secondly, on a more practical note, the theorem suggests that for the underlying class of population models, the path to extinction is almost deterministic when the population size is large. We remark that for a different class of models, a similar fixation mechanism has been described in Assaf and Meerson (2010), Assaf and Mobilia (2011), Park and Traulsen (2017). However, our proof is based on a different idea, and is mathematically rigorous in contrast to the previous work (where, in order to establish the result, analytical methods are combined with some heuristic arguments).

Informally, the result in Theorem 4.6 can be stated as follows:

Theorem 2

Suppose that ${\varvec{\Gamma }}$ satisfies the following set of conditions:

(i)
There is ${{\varvec{\chi _{eq}}}} \in {{\mathbb {R}}}^M$ such that ${\chi _{eq} }(i)>0$ for all $i=1,\ldots ,M$ and $\lim _{k\rightarrow \infty } {{\varvec{\psi }}}_k={{\varvec{\chi _{eq}}}}$ whenever $\psi _0(i)>0$ for all $i=1,\ldots ,M.$
(ii)
There are no mutations, that is $\Gamma _i(x)=0$ if $x(i)=0.$
(iii)
The Markov chain $\textbf{X}^{(N)}_k$ is metastable, that is it spends a long time confined in a small neighborhood of the deterministic equilibrium ${{\varvec{\chi _{eq}}}}$ before a large (catastrophic) fluctuation occurs and one of the types gets instantly extinct.

Then the probability that the least fit at the equilibrium ${{\varvec{\chi _{eq}}}}$ type will extinct first converges to one when N goes to infinity.

Metastability or stable coexistence is described in Broekman et al. (2019) as “the long-term persistence of multiple competing species, without any species being competitively excluded by the others, and with each species able to recover from perturbation to low density.” Various mathematical conditions equivalent to a variant of the above concept can be found in Auletta et al. (2018), Bovier and den Hollander (2015), Landim (2019), see also Assaf and Meerson (2010). The interplay between metastability, quasi-stationary distribution, strong persistence, and extinction probabilities in multitype population models has been discussed by many authors, see, for instance, Ao (2005), Ashcroft et al. (2015), Benaïm and Schreiber (2019), Block and Allen (2000), Gyllenberg and Silvestrov (2008), Hofbauer and Sigmund (1998), Hutson and Schmitt (1992), Kang and Chesson (2010), Méléard and Villemonais (2012), Steinsaltz and Evans (2004), Xu et al. (2014), Zhou and Qian (2011) and references therein.

Heuristically, the result in our theorem suggests that if a stochastic population model has a metastable state, then under some technical conditions that grantee a strong enough attraction to the quasi-equilibrium, the extinction by a series of intermediate steps is “costly” to the system (that is where the existence of a Lyapunov function is relevant to the result), and the overwhelmingly most likely extinction scenario is an instantaneous “collapse” of the “weakest link”, namely an instantaneous extinction of the type which is the least abundant at the quasi-equilibrium state. We also emphasize the importance of the strong attraction to the quasi-equilibrium factor, without which the mechanism of extinction can be completely different as it has been illustrated in the literature.

The benefit of the kind of results stated in Theorem 2 for applied biological research is that they elucidate conditions under which the ultimate outcome of a complex stochastic evolution is practically deterministic, and thus can be robustly predicted in a general accordance with a simple variation of the competitive exclusion principle.

In Sect. 4.3, in addition to our main result stated in Theorem 4.6, we prove an auxiliary Proposition 4.7 which complements (Losert and Akin 1983) and explains the mechanism of convergence to the equilibrium under the conditions of Theorem 4.6. The proposition establishes a strong contraction near the equilibrium property for the systems considered in Theorem 4.6. The claim is a “reverse engineering” in the sense that it would immediately imply Theorem 2-(i), whereas the latter has been obtained in Losert and Akin (1983) in a different way and is used here to prove the claim in Proposition 4.7.

The Wright–Fisher Markov chains are known to exhibit complex multi-scale dynamics, partially because of the non-smooth boundary of their natural phase space. The analytic intractability of Wright–Fisher models has motivated an extensive literature on the approximation of these stochastic processes (Chalub and Souza 2014; Ewens 2012; Hartle and Clark 2007; Hofrichter et al. 2017; Nagylaki 1990). These approximations “illuminate the evolutionary process by focusing attention on the appropriate scales for the parameters, random variables, and time and by exhibiting the parameter combinations on which the limiting process depends” (Nagylaki 1990). Our results in Sect. 4.3 concerning the model’s path to extinction rely partially on the Gaussian approximation constructed in Theorem 4.1 and error estimates proved in Sect. 4.2. The construction is fairly standard and has been used to study other stochastic population models. Our results in Sects. 4.1 and 4.2 can be informally summarized as follows:

Theorem 3

Assuming that the vector field ${\varvec{\Gamma }}$ is smooth enough, the following is true:

(i)
$\textbf{X}^{(N)}_k={{\varvec{\psi }}}_k+\frac{\textbf{u}^{(N)}_k}{\sqrt{N}},$ where ${{\varvec{\psi }}}_k$ is the deterministic mean-field sequence defined in (3) (in particular, independent of N) and $\textbf{u}^{(N)}_k$ is a random sequence that, as N goes to infinity, converges in distribution to a solution of certain random linear recursion.
(ii)
For a fixed N and any given $\varepsilon >0,$ with an overwhelmingly large probability, the norm of the noise perturbation vector $\frac{\textbf{u}^{(N)}_k}{\sqrt{N}}$ will not exceed $\varepsilon $ for an order of $e^{\alpha \varepsilon ^2 N}$ first steps of the Markov chain, where $\alpha >0$ is a constant independent of N and $\varepsilon .$

The theorem implies that when the population size N is large, the stochastic Wright–Fisher model can be interpreted as a perturbation of the mean-field dynamical system ${\varvec{\Gamma }}$ with a small random noise.

Due to the particular geometry of the phase space boundary, the stochastic dynamics of the Wright–Fisher model often exhibits structural patterns (specifically, invariant and quasi-invariant sets) inherited from the mean-field model. “Appendix A” of this paper is devoted to the exploration of the peculiar connection between invariant sets of the mean-field dynamical system and the form of the equilibria of the stochastic model when mutation is allowed. The results in the appendix aim to contribute to the general theory of Wright–Fisher processes with mutations and are not used directly anywhere in the paper.

The rest of the paper is organized as follows. The mathematical model is formally introduced in Sect. 2. Section 3 is concerned with the geometry of the fitness landscape and fitness maximization principle for the deterministic mean-field model. Our main results for the stochastic dynamics are stated and discussed in Sect. 4, with proofs deferred to Sect. 5. Finally, conclusions are outlined in Sect. 6. “Appendix A” ties the geometric properties of the fitness to the form of stochastic equilibria of the Wright–Fisher process. In order to facilitate the analysis of stochastic persistence in the Wright–Fisher model considered in Sect. 4, we survey in “Appendix B” a mathematical theory of coexistence for a class of deterministic interacting species models, known as permanent (or uniform persistent) dynamical systems.

2 The model

The aim of this section is to formally present general form of the Wright–Fisher model and introduce some of the key notions pertinent to the general framework. The section is divided into two subsections. Section 2.1 introduces a broader geometric context and thus sets the stage for a formal definition of the model given in Sect. 2.2.

2.1 Geometric setup, notation

Fix $M\in {{\mathbb {N}}}$ and define $S_M=\{1,2,\ldots ,M\}.$ Let ${{\mathbb {Z}}}_+$ and ${{\mathbb {R}}}_+$ denote, respectively, the set of non-negative integers and the set of non-negative reals. Given a vector $\textbf{x}\in {{\mathbb {R}}}^M,$ we usually use the notation x(i) to denote its i-th coordinate. The only exception is in the context of vector fields, whose i-th component will typically be indicated with the subscript i.

Let $\textbf{e}_i=(0,\ldots ,0,1,0,\ldots ,0)$ denote the unit vector in the direction of the i-th axis in ${{\mathbb {R}}}^M.$ We define the $(M-1)$-dimensional simplex

$$\begin{aligned} \Delta _M=\Bigg \{\textbf{x}=\big (x(1),\ldots ,x(M)\big )\in {{\mathbb {R}}}_+^M:\,\sum _{i\in S_M} x(i)=1\Bigg \} \end{aligned}$$

and denote by $V_M$ the set of its vertices:

$$\begin{aligned} V_M=\{\textbf{e}_j: j\in S_M\}\subset {{\mathbb {R}}}^M. \end{aligned}$$

(4)

$\Delta _M$ is a natural state space for the entire collection of stochastic processes $(\textbf{X}^{(N)})_{N\in {{\mathbb {N}}}}$ of (1). We equip $\Delta _M$ with the usual Euclidean topology inherited from ${{\mathbb {R}}}^M.$ For an arbitrary closed subset $\Delta $ of $\Delta _M,$ we denote by $\text{ bd }(\Delta )$ its topological boundary and by $\text{ Int }(\Delta )$ its topological interior $\Delta \backslash \text{ bd }(\Delta ).$ For any non-empty subset J of $S_M$ we denote by $\Delta _{[J]}$ the simplex built on the vertices corresponding to J. That is,

$$\begin{aligned} \Delta _{[J]}=\Bigg \{\textbf{x}\in {{\mathbb {R}}}_+^M:\,\sum _{i\in J} x(i)=1\Bigg \} \subset \text{ bd }(\Delta _M). \end{aligned}$$

We denote by $\partial (\Delta _{[J]})$ the manifold boundary of $\Delta _{[J]}$ and by $\Delta _{[J]}^\circ $ the corresponding interior $\Delta _{[J]}\backslash \partial (\Delta _{[J]}).$ That is,

$$\begin{aligned} \partial (\Delta _{[J]})=\big \{\textbf{x}\in \Delta _{[J]}:x(j)=0~ \text{ for } \text{ some }~j\in J\big \} \end{aligned}$$

and

$$\begin{aligned} \Delta _{[J]}^\circ =\big \{\textbf{x}\in \Delta _{[J]}:x(j)>0~ \text{ for } \text{ all }~j\in J\big \}. \end{aligned}$$

We write $\partial (\Delta _M)$ and $\Delta _M^\circ $ for, respectively, $\partial (\Delta _{[S_M]})$ and $\Delta _{[S_M]}^\circ .$ Note that $ \text{ bd }(\Delta _{[J]})=\Delta _{[J]},$ and hence $\partial (\Delta _{[J]})\ne \text{ bd }(\Delta _{[J]})$ for any proper subset J of $S_M.$

Furthermore, let

$$\begin{aligned} \Delta _{M,N}=\Biggl \{0,\frac{1}{N},\frac{2}{N},\ldots ,\frac{N-1}{N},1\Biggr \}^M. \end{aligned}$$

Note that $\textbf{X}^{(N)}_k\in \Delta _{M,N}$ for all $k\in {{\mathbb {Z}}}_+$ and $N\in {{\mathbb {N}}}.$ For any set of indices $J\subset S_M$ we write

$$\begin{aligned}{} & {} \Delta _{[J],N}=\Delta _{[J]}\bigcap \Delta _{M,N},\quad \partial (\Delta _{[J],N})=\partial (\Delta _{[J]})\bigcap \Delta _{M,N},\nonumber \\{} & {} \quad \Delta _{[J],N}^\circ =\Delta _{[J]}^\circ \bigcap \Delta _{M,N}. \end{aligned}$$

(5)

We simplify the notations $\partial (\Delta _{[S_M],N})$ and $\Delta _{[S_M],N}^\circ $ to $\partial (\Delta _{M,N})$ and $\Delta _{M,N}^\circ ,$ respectively.

We use the maximum ($L^\infty $) norms for vectors in $\Delta _M$ and functions in $C(\Delta _M),$ the space of real-valued continuous functions on $\Delta _M:$

$$\begin{aligned} \Vert \textbf{x}\Vert :=\max _{i\in S_M}|x(i)|~\,\text{ for }~\textbf{x}\in \Delta _M \qquad \text{ and }\qquad \Vert f\Vert :=\max _{\textbf{x}\in \Delta _M}|f(\textbf{x})|~\,\text{ for }~f\in C(\Delta _M). \end{aligned}$$

The Hadamard (element-wise) product of two vectors $\textbf{u},\textbf{v}\in {{\mathbb {R}}}^M$ is denoted by $\textbf{u}\circ \textbf{v}.$ That is, $\textbf{u}\circ \textbf{v} \in {{\mathbb {R}}}^M$ and

$$\begin{aligned} (u\circ v)(i)=u(i)v(i)\qquad \qquad \forall \, i\in S_M. \end{aligned}$$

(6)

The usual dot-notation is used to denote the regular scalar product of two vectors in ${{\mathbb {R}}}^M$.

2.2 Formulation of the model

We now proceed with the formal definition of the Markov chain $\textbf{X}^{(N)}_k.$ Let P denote the underlying probability measure defined in some large probability space that includes all random variables considered in this paper. We use E and COV to denote the corresponding expectation and covariance operators. For a random vector $\textbf{X}=(X(1),X(2),\ldots ,X(M))\in {{\mathbb {R}}}^M,$ we set $E(\textbf{X}):=\big (E\big (X(1)\big ),E\big (X(2)\big ),\ldots ,E\big (X(M)\big )\big ).$

Let ${\varvec{\Gamma }}:\Delta _M\rightarrow \Delta _M$ be a mapping of the probability simplex $\Delta _M$ into itself. Suppose that the population profile at time $k\in {{\mathbb {Z}}}_+,$ is given by a vector $\textbf{x}=\bigl (x(1),\ldots ,x(M)\bigr )\in \Delta _{M,N},$ i.e. $\textbf{X}^{(N)}_k=\textbf{x}.$ Then the next generation’s profile, $\textbf{X}^{(N)}_{k+1}$, is determined by a multinomial trial where the probability of producing an i-th type particle is equal to ${\varvec{\Gamma }}_i (\textbf{x}).$ In other words, $\textbf{X}^{(N)}$ is a Markov chain whose transition probability matrix is defined as

$$\begin{aligned} P\left( \textbf{X}^{(N)}_{k+1}=\left( \frac{j_1}{N},\ldots ,\frac{j_M}{N}\right) \,\mid \,\textbf{X}^{(N)}_k=\textbf{x}\right) =\frac{N!}{\prod _{i=1}^M j_i !} ~~\prod \limits _{i=1}^M\Gamma _i(\textbf{x})^{j_i} \end{aligned}$$

(7)

for all $k\in {{\mathbb {Z}}}_+$ and $\Bigl (\frac{j_1}{N},\ldots ,\frac{j_M}{N}\Bigr )\in \Delta _{M,N}.$ Note that for any state $\textbf{x}\in \Delta _{M,N}$ and time $k\in {{\mathbb {Z}}}_+$ we have

$$\begin{aligned} E\big (\textbf{X}^{(N)}_{k+1}\,\mid \,\textbf{X}^{(N)}_k=\textbf{x}\big )={\varvec{\Gamma }}(\textbf{x}) \end{aligned}$$

(8)

and

$$\begin{aligned} COV\big (X^{(N)}_{k+1}(i)X^{(N)}_{k+1}(j)\,\mid \,X^{(N)}_k=x\big )=\left\{ \begin{array}{ll} -\frac{1}{N}\Gamma _i(\textbf{x})\Gamma _j(\textbf{x})&{}\text{ if }~i\ne j, \\ \frac{1}{N}\Gamma _i(\textbf{x})\bigl (1-\Gamma _i(\textbf{x})\bigr )&{}\text{ if }~i=j. \end{array} \right. \end{aligned}$$

(9)

We refer to any Markov chain $\textbf{X}^{(N)}$ on $\Delta _{M,N}$ that satisfies (7) as a Wright–Fisher model associated with the update rule ${\varvec{\Gamma }}.$

We denote transition kernel of $\textbf{X}^{(N)}$ on $\Delta _{M,N}$ by $P_N.$ That is, $P_N(\textbf{x},\textbf{y})$ is the conditional probability in (7) with $y(i)=j_i/N.$ For $k\in {{\mathbb {N}}},$ we denote by $P_N^k$ the k-th iteration of the kernel. Thus,

$$\begin{aligned} P_N^k(\textbf{x},\textbf{y})=P\big (\textbf{X}^{(N)}_{m+k}=\textbf{y}\,\mid \,\textbf{X}^{(N)}_m=\textbf{x}\big ) \end{aligned}$$

for any $m\in {{\mathbb {Z}}}_+.$

The Wright–Fisher model is completely characterized by the choice of the parameters M, N and the update rule ${\varvec{\Gamma }}(\textbf{x}).$ Theorem 4.1 below states that when N approaches infinity while M and ${\varvec{\Gamma }}$ remain fixed, the update rule emerges as the model’s mean-field map, in that the path of the Markov chain $\textbf{X}^{(N)}$ with $\textbf{X}^{(N)}_0=\textbf{x}$ converges to the orbit of the (deterministic) discrete-time dynamical system ${{\varvec{\psi }}}_k$ introduced in (3). Thus, for the large population size N, the Wright–Fisher process can be perceived as a small-noise perturbation of the deterministic dynamical system ${{\varvec{\psi }}}_k.$

Throughout the paper we assume that the mean-field dynamics is induced by a replicator equation

$$\begin{aligned} \Gamma _i(\textbf{x})=\frac{x(i)\varphi _i(\textbf{x})}{\sum _{j\in S_M}x(j)\varphi _j (\textbf{x})}\qquad \qquad \forall \, i\in S_M, \end{aligned}$$

(10)

where the vector-field ${{\varvec{\varphi }}}:\Delta _M\rightarrow {{\mathbb {R}}}_+^M$ serves as a fitness landscape of the model (see Sect. 3 for details). For biological motivations of this definition, see (Cressman and Tao 2014; Hofbauer and Sigmund 2003; Sigmund 1986). The replicator dynamics can be viewed as a natural discretization scheme for its continuous-time counterpart described in the form of a differential equation (Garay and Hofbauer 2003; Hofbauer and Sigmund 1998). The normalization factor $\sum _{j\in S_m} x(j) \varphi _j(\textbf{x})$ in (10) is the total population fitness and plays a role similar to that of the partition function in statistical mechanics (Barton and Coe 2009).

Example 1

(i)
One standard choice of the fitness ${{\varvec{\varphi }}}$ in the replicator equation (10) is Hofbauer (2011), Hofbauer and Sigmund (1998), Hofrichter et al. (2017), Imhof and Nowak (2006):
$$\begin{aligned} {{\varvec{\varphi }}} (\textbf{x})=(1-\omega )\textbf{b}+\omega \textbf{A} \textbf{x}, \end{aligned}$$
(11)
where $\textbf{A}=(A_{ij})_{i,j\in S_M}$ is an $M\times M$ payoff matrix, $\textbf{b}$ is a constant M-vector, and $\omega \in (0,1)$ is a selection parameter. The evolutionary game theory interprets $(Ax)(i)=\sum _{j\in S_M}A_{ij}x(j)$ as the expected payoff of a type i particle in a single interaction (game) with a particle randomly chosen from the population profile $\textbf{x}$ when the utility of the interaction with a type j particle is $A_{ij}.$ The constant vector $\textbf{b}$ represents a baseline fitness and the selection parameter w is a proxy for modeling the strength of the effect of interactions on the evolution of the population compared to inheritance (Imhof and Nowak 2006).
(ii)
In replicator models associated with multiplayer games (Broom and Rychtář 2013; Gokhale and Traulsen 2014), the pairwise interaction of case (i) (two-person game) is replaced by a more complex interaction within a randomly selected group (multiplayer game). In this case, the fitness ${{\varvec{\varphi }}}$ is typically a (multivariate) polynomial of $\textbf{x}$ of a degree higher than one.
(iii)
An evolutionary game dynamics with non-uniform interaction rates has been considered in Taylor and Nowak (2006). Mathematically, their assumptions amount to replacing (11) with
$$\begin{aligned} \varphi _i(\textbf{x})=(1-\omega )b(i)+\omega \frac{Bx(i)}{Rx(i)}, \end{aligned}$$
where $\textbf{R}$ is the matrix of rates reflecting the frequency of pair-wise interaction between different types, and $\textbf{B}=\textbf{A}\circ \textbf{R},$ the Hadamard product of the payoff matrix $\textbf{A}$ and $\textbf{R}.$
(iv)
Negative frequency dependence of the fitness can be captured by, for instance, allowing matrix $\textbf{A}$ in (11) to have negative entries. In fact, it suffices that the matrix off-diagonal elements are larger than the respective diagonal elements (Broekman et al. 2019).
(v)
A common choice of a non-linear fitness is an exponential function (Hofbauer and Sigmund 1998; Sandholm 2010):
$$\begin{aligned} \varphi _i(\textbf{x})=e^{\textbf{e}ta Ax(i)}, \end{aligned}$$
(12)
where, as before, $\textbf{A}$ is a payoff matrix, and $\textbf{e}ta>0$ is a “cooling” parameter [analogue of the inverse temperature in statistical mechanics (Barton and Coe 2009)].

3 Mean-field model: fitness landscape

In Sect. 3.1 we relate the update rule $\Gamma $ to a family of fitness landscapes of the mean-field model in the absence of mutations. All the fitness functions in the family yield the same relative fitness and differ from each other only by a scalar normalization function. Section 3.2 contains a version of Fisher’s maximization principle for the deterministic mean-field model. In Sect. 3.3, we incorporate mutations into the framework presented in Sects. 3.1 and 3.2.

The main result of this section is Theorem 3.2 asserting the existence of a non-constant reproductive fitness which is maximized during the evolutionary process. The result is an immediate consequence of a version of Conley’s “fundamental theorem of dynamical systems” (1978, Theorem B in Section II.6.4), a celebrated mathematical result that appears to be underutilized in biological applications.

3.1 Reproductive fitness and replicator dynamics

The main goal of this section is to introduce a notion of fitness suitable for supporting a formalization of Fisher’s fitness maximization principle stated below in Sect. 3.2 as Theorem 3.2. Proposition 3.1 provides a useful characterization of the fitness, while Theorem 1 [which is a citation from Garay and Hofbauer (2003)] addresses some of its more technical properties, all related to the characterization of the support of the fitness function.

Let ${{\mathcal {C}}}:\Delta _M\rightarrow 2^{S_M}$ be the composition set-function defined as

$$\begin{aligned} {{\mathcal {C}}}(\textbf{x})=\{j\in S_M: x(j)>0\}=\text {``support of vector }{} \textbf{x}\text {''}. \end{aligned}$$

(13)

According to (8), for a population profile $\textbf{x}\in \Delta _{M,N},$ the M-dimensional vector $\textbf{f}(\textbf{x})$ with i-th component equal to

$$\begin{aligned} f_i(\textbf{x})=\frac{\Gamma _i(\textbf{x})}{x(i)}\qquad \text{ if } \quad i\in {{\mathcal {C}}}(\textbf{x}) \end{aligned}$$

(14)

represents the Darwinian fitness of the profile $\textbf{x}$ (Hartle and Clark 2007; Orr 2009), see also Doebeli et al. (2017), Svensson and Connallon (2019). We will adopt the terminology of Chalub and Souza (2017), Traulsen et al. (2005) and call a vector field ${{\varvec{\varphi }}}: \Delta _M\rightarrow {{\mathbb {R}}}_+^M$ a reproductive fitness of the population if for all $\textbf{x}\in \Delta _M$ we have:

$$\begin{aligned} \sum _{j\in {{\mathcal {C}}}(\textbf{x})}x(j)\varphi _j (\textbf{x})>0 \qquad \text{ and } \qquad \Gamma _i(\textbf{x})=\frac{x(i)\varphi _i(\textbf{x})}{\sum _{j\in S_M}x(j)\varphi _j (\textbf{x})}\quad \forall \, i\in {{\mathcal {C}}}(\textbf{x}). \end{aligned}$$

(15)

Notice that even though we assume that both ${{\varvec{\varphi }}}(\textbf{x})$ and $\textbf{f}(\textbf{x})$ are vectors in ${{\mathbb {R}}}_+^M,$ the definitions (14) and (15) only consider $i\in {{\mathcal {C}}}(\textbf{x}),$ imposing no restriction on the rest of the vectors components. Notice that, since ${\varvec{\Gamma }}$ is a probability vector, the Darwinian fitness defined in (14) can serve as an example of the reproductive fitness. Furthermore, a comparison of (15) and (14) shows that $\varphi (\textbf{x})$ is a reproductive fitness if and only if ${{\varvec{\varphi }}}(\textbf{x})=h(\textbf{x})\textbf{f}(\textbf{x})$ for all $\textbf{x}\in \Delta _M,$ where $h: \Delta _M\rightarrow {{\mathbb {R}}}_+$ is an arbitrary strictly positive scalar-valued function.

Both concepts of fitness are pertinent to the amount of reproductive success of particles of a certain type per capita of the population. We remark that in McAvoy et al. (2018), ${{\varvec{\varphi }}}$ is referred to as the reproductive capacity rather than the reproductive fitness. A similar definition of reproductive fitness, but with the particular normalization $\textbf{x}\cdot {{\varvec{\varphi }}}=1$ and the restriction of the domain of ${\varvec{\Gamma }}$ to the interior $\Delta _M^\circ ,$ is given in Sandholm (2010, p. 160).

In the following straightforward extension of Lemma 1 in Chalub and Souza (2017), we show that two notions of fitness essentially agree under the following no-mutations condition:

$$\begin{aligned} \sum _{j\in {{\mathcal {C}}}(\textbf{x})} \Gamma _j(\textbf{x})=1 \qquad \text{ or, } \text{ equivalently, }\qquad {\varvec{\Gamma }}(\textbf{x})\in \Delta _{[{{\mathcal {C}}}(\textbf{x})]}\quad \forall ~\textbf{x}\in \Delta _M. \end{aligned}$$

(16)

Proposition 3.1

The following is true for any given ${\varvec{\Gamma }}:\Delta _M\rightarrow \Delta _M:$

(i)
A reproductive fitness exists if and only if condition (16) is satisfied.
(ii)
Suppose that condition (16) is satisfied and let $h: \Delta _M\rightarrow (0,\infty )$ be an arbitrary strictly positive function defined on the simplex. Then any ${{\varvec{\varphi }}}:\Delta _M\rightarrow {{\mathbb {R}}}_+^M$ such that
$$\begin{aligned} \varphi _i(\textbf{x})=\frac{\Gamma _i(\textbf{x})}{x(i)}h(\textbf{x})\qquad \forall \, i\in {{\mathcal {C}}}(\textbf{x}), \end{aligned}$$
(17)
is a reproductive fitness.
(iii)
Conversely, if ${{\varvec{\varphi }}}: \Delta _M\rightarrow {{\mathbb {R}}}_+^M$ is a reproductive fitness, then (17) holds for $h(\textbf{x})=\textbf{x}\cdot {{\varvec{\varphi }}}(\textbf{x}).$
(iv)
If ${{\varvec{\varphi }}}:\Delta _M\rightarrow {{\mathbb {R}}}_+^M$ is a reproductive fitness, $\textbf{x}\in \Delta _M$ and $i\in {{\mathcal {C}}}(\textbf{x}),$ then $\varphi _i(\textbf{x})>0$ if and only if $\Gamma _i(\textbf{x})>0.$

Note that (16) is equivalent to the condition that ${\varvec{\Gamma }}\big (\Delta _{[I]}^\circ \big )\subset \Delta _{[I]}$ for all $I\subset S_M.$ Therefore, if ${\varvec{\Gamma }}$ is a continuous map, then (16) is equivalent to the condition

$$\begin{aligned} {\varvec{\Gamma }}\big (\Delta _{[I]}\big )\subset \Delta _{[I]} \qquad \qquad \forall \, I\subset S_M. \end{aligned}$$

For continuous fitness landscapes, the following refined version of Proposition 3.1 has been obtained in Garay and Hofbauer (2003). Recall that ${\varvec{\Gamma }}$ is called a homeomorphism of $\Delta _M$ if it is a continuous bijection of $\Delta _M$ and its inverse function is also continuous. Intuitively, a bijection ${\varvec{\Gamma }}:\Delta _M\rightarrow \Delta _M$ is a homeomorphism when the distance $\Vert {\varvec{\Gamma }}(x)-{\varvec{\Gamma }}(y)\Vert $ is small if and only if $\Vert \textbf{x}-\textbf{y}\Vert $ is small. ${\varvec{\Gamma }}$ is a diffeomorphism if it is a homeomorphism and both ${\varvec{\Gamma }}$ and ${\varvec{\Gamma }}^{-1}$ are continuously differentiable.

Theorem 1

(Garay and Hofbauer 2003) Suppose that ${\varvec{\Gamma }}=\textbf{x}\circ {{\varvec{\varphi }}}$ for a continuous ${{\varvec{\varphi }}}:\Delta _M\rightarrow {{\mathbb {R}}}_+^M$ and ${\varvec{\Gamma }}$ is a homeomorphism of $\Delta _M.$ Then:

(i)
We have:
$$\begin{aligned} \Gamma \big (\Delta _{[I]}^\circ \big )=\Delta _{[I]}^\circ \quad \text{ and } \quad \Gamma \big (\partial (\Delta _{[I]})\big )=\partial \big (\Delta _{[I]}\big ),\qquad \forall \,I\subset S_M. \end{aligned}$$
(ii)
If, in addition, ${\varvec{\Gamma }}$ is a diffeomorphism of $\Delta _M$ and each $\varphi _i,$ $i\in S_M,$ can be extended to an open neighborhood U of $\Delta _M$ in ${{\mathbb {R}}}^M$ in such a way that the extension is continuously differentiable on U, then $\varphi _i(\textbf{x})>0$ for all $\textbf{x}\in \Delta _M$ and $i\in S_M.$

The first part of the above theorem is the Surjectivity Theorem on p. 1022 of Garay and Hofbauer (2003) and the second part is Corollary 6.1 there. The assumption that ${\varvec{\Gamma }}$ is a bijection, i. e. one-to-one and onto on $\Delta _M,$ seems to be quite restrictive in biological applications (Bratus et al. 2018; de Visser and Krug 2014; Gavrilets 2010; Obolski et al. 2018). Nevertheless, the theorem is of a considerable interest for us because it turns out that its conditions are satisfied for ${\varvec{\Gamma }}$ in (11) and (12).

3.2 Maximization principle for the reproductive fitness

The aim of this section is to introduce a maximization principle suitable for the deterministic replicator dynamics and reproductive fitness. The result is stated in Theorem 3.2. An example of application to the stochastic Wright–Fisher model is given in Proposition 3.3.

For the Darwinian fitness we have $\overline{\textbf{f}}(\textbf{x}):=\sum _{i\in S_M} x_if_i(\textbf{x})= \sum _{i\in S_M} \Gamma _i(\textbf{x})=1.$ Thus the average Darwinian fitness is independent of the population profile, and in particular does not change with time. In contrary, the average reproductive fitness $\overline{{\varvec{\varphi }}}(\textbf{x}):= \sum _{i\in S_M} x_i\varphi _i(\textbf{x})$ is equal to the (arbitrary) normalization factor $h(\textbf{x})$ in (17), and hence in principle can carry more structure. If the mean-field dynamics is within the domain of applicability of Fisher’s fundamental theorem of natural selection, we should have (Birch 2016; Edwards 2016; Ewens and Lessard 2015; Queller 2017; Schneider 2010):

$$\begin{aligned} h\big ({\varvec{\Gamma }}(\textbf{x})\big )\ge h(\textbf{x})\qquad \forall \,\textbf{x}\in \Delta _M. \end{aligned}$$

(18)

In order to elaborate on this point, we need to introduce the notion of chain recurrence.

Definition 1

(Conley 1978) An $\varepsilon $-chain from $\textbf{x}\in \Delta _M$ to $\textbf{y}\in \Delta _M$ for ${\varvec{\Gamma }}$ is a sequence of points in $\Delta _M,$ $\textbf{x} = \textbf{x}_0, \textbf{x}_1, \ldots , \textbf{x}_n = \textbf{y},$ with $n\in {{\mathbb {N}}},$ such that $\big \Vert {\varvec{\Gamma }}(\textbf{x}_i)-\textbf{x}_{i+1}\big \Vert < \varepsilon $ for $0 \le i \le n-1.$ A point $\textbf{x} \in \Delta _M$ is called chain recurrent if for every $\varepsilon > 0$ there is an $\varepsilon $-chain from x to itself. The set ${{\mathcal {R}}}({\varvec{\Gamma }})$ of chain recurrent points is called the chain recurrent set of ${\varvec{\Gamma }}.$

Write $\textbf{x} \looparrowright \textbf{y}$ if for every $\varepsilon >0$ there is an $\varepsilon $-chain from $\textbf{x}$ to $\textbf{y}$ and $\textbf{x}\sim \textbf{y}$ if both $\textbf{x} \looparrowright \textbf{y}$ and $\textbf{y} \looparrowright \textbf{x}$ hold true. It is easy to see that $\sim $ is an equivalence relation on ${{\mathcal {R}}}({\varvec{\Gamma }}).$ For basic properties of the chain recurrent set we refer the reader to Conley (1978) and Block and Franke (1985), Fathi and Pageault (2015), Franke and Selgrade (1976). In particular, Conley showed [see Proposition 2.1 in Franke and Selgrade (1976)] that equivalence classes of ${{\mathcal {R}}}({\varvec{\Gamma }})$ are precisely its connected components. Note that if ${\varvec{\Gamma }}(\textbf{x})=\textbf{x},$ then $\textbf{x}_0=\textbf{x}$ and $\textbf{x}_1=\textbf{x}$ form an $\varepsilon $-chain for any $\varepsilon >0.$ Thus, any fixed point of ${\varvec{\Gamma }}$ is chain recurrent

Somewhat surprisingly, the following general result is true:

Theorem 3.2

Assume that ${\varvec{\Gamma }}:\Delta _M\rightarrow \Delta _M$ is continuous. Then there exists a reproductive fitness function ${{\varvec{\varphi }}}$ such that

(i)
${{\varvec{\varphi }}}$ is continuous in $\Delta _{[I]}^\circ $ for all $I\subset \Delta _M.$
(ii)
The average $h(\textbf{x})=\textbf{x}\cdot {{\varvec{\varphi }}}(\textbf{x})$ satisfies (18) and, moreover, is a complete Lyapunov function for ${\varvec{\Gamma }}$ (see Definition 2).

In particular, the difference $h\big ({\varvec{\Gamma }}(\textbf{x})\big )-h(\textbf{x})$ is strictly positive on $\Delta _M\backslash {{\mathcal {R}}}({\varvec{\Gamma }}).$

Theorem 3.2 is an immediate consequence of a well-known mathematical result which is sometimes called a “fundamental theorem of dynamical systems” (Franks; Norton 1995; Papadimitriou and Piliouras 2016). The following definition and theorem are adapted from Hurley (1998). To simplify notation, in contrast to the conventional definition, we define Lyapunov function as a function non-decreasing (rather than non-increasing) along the orbits of ${{\varvec{\psi }}}_k.$

Definition 2

A complete Lyapunov function for ${\varvec{\Gamma }}:\Delta _M\rightarrow \Delta _M$ is a continuous function $h: \Delta _M \rightarrow [0,1]$ with the following properties:

1.
If $\textbf{x}\in \Delta _M\backslash {{\mathcal {R}}}({\varvec{\Gamma }}),$ then $h\big ({\varvec{\Gamma }}(\textbf{x})\big ) > h(\textbf{x}).$
2.
If $\textbf{x}, \textbf{y} \in {{\mathcal {R}}}({\varvec{\Gamma }})$ and $ \textbf{x} \looparrowright \textbf{y},$ then $h(\textbf{x})\le h(\textbf{y}).$ Moreover, $h(\textbf{x}) = h(\textbf{y})$ if and only if $\textbf{x} \sim \textbf{y}.$
3.
The image $h\big ({{\mathcal {R}}}({\varvec{\Gamma }})\big )$ is a compact nowhere dense subset of [0, 1].

We remark that while h must be bounded as a continuous functions defined on a compact domain $\Delta _M,$ the normalization $h(\textbf{x})\in [0,1]$ is rather arbitrary and chosen for the convenience only. Indeed, any multiplier ch of a complete Lyapunov function h with a positive scalar $c>0,$ is a complete Lyapunov function itself.

We have:

Theorem 2

(Fundamental theorem of dynamical systems) If ${\varvec{\Gamma }}:\Delta _M\rightarrow \Delta _M$ is continuous, then there is a complete Lyapunov function for ${\varvec{\Gamma }}.$

To derive Theorem 3.2 from this fundamental result, set $\varphi _i(\textbf{x})=\frac{\Gamma _i(\textbf{x})}{x_i}h(\textbf{x})$ for $i\in {{\mathcal {C}}}(\textbf{x})$ and, for instance, $\varphi _i(\textbf{x})=0$ for $i\not \in {{\mathcal {C}}}(\textbf{x}).$

Theorem 2 was established by Conley for continuous-time dynamical systems on a compact space (Conley 1978). The above discrete-time version of the theorem is taken from Hurley (1998), where it is proven for an arbitrary separable metric state space. For recent progress and refinements of the theorem see Bernardi and Florio (2019), Fathi and Pageault (2019), Pageault (2009) and references therein.

The Lyapunov functions constructed in Conley (1978) and Hurley (1998) map ${{\mathcal {R}}}({\varvec{\Gamma }})$ into a subset of the Cantor middle-third set. Thus, typically, $\Delta _M\backslash {{\mathcal {R}}}({\varvec{\Gamma }})$ is a large set [cf. Section 6.2 in Conley (1978), Theorem A in Block and Franke (1985), and an extension of Theorem 2 in Fathi and Pageault (2015)]. For example, for the class of Wright–Fisher models discussed in Sect. 4.3, ${{\mathcal {R}}}({\varvec{\Gamma }})$ consists of the union of a single interior point (asymptotically stable global attractor in $\Delta _M^\circ $) and the boundary of $\Delta _M$ (repeller). For a related more general case, see “Appendix B” and Example 7 below.

Recall that a point $\textbf{x}\in \Delta _M$ is called recurrent if the orbit $({{\varvec{\psi }}}_k)_{k\in {{\mathbb {Z}}}_+}$ with ${{\varvec{\psi }}}_0=\textbf{x}$ visits any open neighborhood of $\textbf{x}$ infinitely often. Any recurrent point is chain recurrent, but the converse is not necessarily true. It is easy to verify that the equality $h(\Gamma (\textbf{x})) =h(\textbf{x})$ holds for any recurrent point $\textbf{x}.$ The following example builds upon this observation. It shows that for a general chaotic map the complete Lyapunov function needs to be constant.

Example 2

The core of this example is the content of Theorem 4 in Basener (2013). Assume that (18) holds for a continuous function $h:\Delta _M\rightarrow {{\mathbb {R}}}_+$ in (17). Suppose in addition that the mean-field dynamics is chaotic. More precisely, assume that ${\varvec{\Gamma }}$ is topologically transitive, that is ${\varvec{\Gamma }}$ is continuous and for any two non-empty open subsets U and V of $\Delta _M,$ there exists $k\in {{\mathbb {N}}}$ such that ${{\varvec{\psi }}}_k(U)\bigcap V\ne \emptyset $ [this is not a standard definition of a chaotic dynamical system, we omitted from a standard Devaney’s set of conditions a part which is not essential for our purpose, cf. Banks et al. (1992), Basener (2013), Silverman (1992)]. Equivalently (Silverman 1992), there is $\textbf{x}_0\in \Delta _M$ such that the orbit $\big ({{\varvec{\psi }}}_k(\textbf{x}_0)\big )_{k\in {{\mathbb {Z}}}_+}$ is dense in $\Delta _M.$

Let $\textbf{x}={{\varvec{\psi }}}_k(\textbf{x}_0)$ be an arbitrary point on this orbit. We will now show that the assumption $\varepsilon :=\big [h\big ({\varvec{\Gamma }}(\textbf{x})\big )-h(\textbf{x})\big ]>0$ leads to a contradiction. To this end, let $\delta >0$ be so small that $|h(\textbf{x})-h(\textbf{y})|<\varepsilon /2$ whenever $\Vert \textbf{x}-\textbf{y}\Vert <\delta .$ By our assumption, there exists $m>k$ such that $\Vert {{\varvec{\psi }}}_m(\textbf{x}_0)-\textbf{x}\Vert <\delta ,$ and hence $h\big ({{\varvec{\psi }}}_m(\textbf{x}_0)\big )<h\big ({\varvec{\Gamma }}(\textbf{x})\big )=h\big ({{\varvec{\psi }}}_{k+1}(\textbf{x}_0)\big ),$ which is impossible in view of the monotonicity condition (18). Thus h is constant along the forward orbit of $\textbf{x}_0.$ Since the normalization function $h(\textbf{x})$ is constant on a dense subset of $\Delta _M$ and $\Delta _M$ is compact, $h(\textbf{x})$ is a constant function. In other words, for a chaotic mean-field dynamics a constant function is the only choice of a continuous normalization $h(\textbf{x})$ that satisfies (18).

A common example of a chaotic dynamical system is the iterations of a one-dimensional logistic map. In our framework this can be translated into the two-dimensional case $M=2$ and ${\varvec{\Gamma }}(x,y)=\big (4x(1-x), 1-4x(1-x)\big )=(4xy,1-4xy).$ In principle, the example can be extended to higher-dimensions using, for instance, techniques of Akhmet and Fen (2016).

A generic instance of the mean-field update function ${\varvec{\Gamma }}$ with a non-constant complete Lyapunov function is discussed in Example 7 deferred to “Appendix B”.

The existence of a non-decreasing reproductive fitness does not translate straightforwardly into a counterpart for the stochastic dynamics. Indeed, in the absence of mutations $\textbf{X}^{(N)}_k$ converges almost surely to an absorbing state, i. e. to a random vertex of the simplex $\Delta _M.$ Therefore, by the bounded convergence theorem,

$$\begin{aligned} \lim _{k\rightarrow \infty } E\big (h(\textbf{X}^{(N)}_k)\,\mid \,\textbf{X}^{(N)}_0=\textbf{x}\big )=\sum _{j\in S_M}p_j(\textbf{x})h(e_j), \end{aligned}$$

where $p_j(\textbf{x})=\lim _{k\rightarrow \infty } P\big (\textbf{X}^{(N)}_k=\textbf{e}_j\,\mid \,\textbf{X}^{(N)}_0=\textbf{x}\big ).$ It then follows from part 2 of Definition 2 that if the boundary $\partial \big (\Delta _{M,N}\big )$ is a repeller for ${\varvec{\Gamma }}$ (cf. “Appendix B”), then generically the limit is not equal to the $\sup _{k\in {{\mathbb {N}}}} E\big (h(\textbf{X}^{(N)}_k)\,\mid \,\textbf{X}^{(N)}_0=\textbf{x}\big ).$

The following example shows that under some additional convexity assumptions, the mean of a certain continuous reproductive fitness is a strictly increasing function of time. More precisely, for the class of Wright–Fisher models in the example, $h\big (\textbf{X}^{(N)}_k)$ is a submartingale for an explicit average reproductive fitness h.

Example 3

Let $h:\Delta _M\rightarrow {{\mathbb {R}}}_+$ be an arbitrary convex (and hence, continuous) function that satisfies (18), not necessarily the one given by Theorem 3.2. Then, by the multivariate Jensen’s inequality (Ferguson 1967, p. 76), with probability one, for all $k\in {{\mathbb {Z}}}_+,$

$$\begin{aligned} E\big (h\big (\textbf{X}^{(N)}_{k+1}\big )\,\mid \,\textbf{X}^{(N)}_k\big )\ge h\big (E\big (\textbf{X}^{(N)}_{k+1}\,\mid \,\textbf{X}^{(N)}_k\big )\big )=h\big ({\varvec{\Gamma }}\big (\textbf{X}^{(N)}_k\big )\big )\ge h\big (\textbf{X}^{(N)}_k). \end{aligned}$$

(19)

Thus, $M_k=h\big (\textbf{X}^{(N)}_k\big )$ is a submartingale in the filtration ${{\mathcal {F}}}_k=\sigma \big (\textbf{X}^{(N)}_0,\ldots ,\textbf{X}^{(N)}_k\big ).$ In particular, the sequence ${{\mathfrak {f}}}_k=E(M_k)$ is non-decreasing and converges to its supremum as $k\rightarrow \infty .$ In fact, since the first inequality in (19) is strict whenever $\textbf{X}^{(N)}_k\not \in V_M,$ the mean of the reproductive fitness is strictly increasing under the no-mutations assumption (16):

$$\begin{aligned} 0\le E\big (h\big (\textbf{X}^{(N)}_k\big )\big )<E\big (h\big (\textbf{X}^{(N)}_{k+1}\big )\big )\le \max _{\textbf{x} \in \Delta _M} h(\textbf{x})<\infty \end{aligned}$$

as long as $P\big (\textbf{X}^{(N)}_0\in \Delta _{M,N}^\circ \big )=1.$ Note that the maximum exists and is finite because h is continuous and $\Delta _M$ is a compact set.

The effect of the fitness convexity assumption on the evolution of biological populations has been discussed by several authors, see, for instance, Christie and Beekman (2017), Hintze et al. (2015), Spichtig and Kawecki (2004) and references therein. Generally speaking, the inequality $h\big (\alpha \textbf{x}+(1-\alpha )\textbf{y}\big )\le \alpha h(\textbf{x})+(1-\alpha )h(\textbf{y}),$ $\alpha \in [0,1],$ $\textbf{x},\textbf{y}\in \Delta _M,$ that characterizes convexity, manifests an evolutionary disadvantage of the intermediate population $\alpha \textbf{x}+(1-\alpha )\textbf{y}$ in comparison to the extremes $\textbf{x},\textbf{y}.$ A typical biological example of such a situation is competitive foraging (Hintze et al. 2015).

It can be shown [see Section 6 in Hofbauer (2011) or Losert and Akin (1983)] that if $\textbf{A}$ is a non-negative symmetric $M\times M$ matrix, then (18) holds with $h(\textbf{x})=\textbf{x}^T\textbf{A}{} \textbf{x}$ for (11) and (12). Here and henceforth, $\textbf{A}^T$ for a matrix $\textbf{A}$ (of arbitrary dimensions) denotes the transpose of $\textbf{A}.$ Moreover, the inequality is strict unless $\textbf{x}$ is a game equilibrium. The evolutionary games with $\textbf{A}=\textbf{A}^T$ are referred to in Hofbauer (2011) as partnership games, such games are a particular case of so-called potential games [see Hofbauer and Sigmund (2003), Sandholm (2010) and references therein]. In the case of the replicator mean-field dynamics (10) and potential games, the potential function $h(\textbf{x})$ coincides with the average reproductive fitness $\textbf{x}\cdot {{\varvec{\varphi }}}(\textbf{x}).$ Let

$$\begin{aligned} W_M=\Bigg \{\textbf{x}\in {{\mathbb {R}}}^M:\sum _{i\in S_M} x(i)=0\Bigg \}. \end{aligned}$$

(20)

It is easy to check that if $\textbf{A}$ is positive definite on $W_M,$ that is

$$\begin{aligned} \textbf{w}^T\textbf{A}{} \textbf{w}>0\qquad \qquad \forall \,\textbf{w}\in W_M,\textbf{w}\ne \textbf{0}, \end{aligned}$$

(21)

then $\textbf{f}(\textbf{x})=\textbf{x}^T\textbf{A}{} \textbf{x}$ is a convex function on $\Delta _M,$ that is

$$\begin{aligned} \frac{1}{2}{} \textbf{x}^T\textbf{A}{} \textbf{x}+\frac{1}{2}{} \textbf{y}^T\textbf{A}\textbf{y}>\frac{(\textbf{x}+\textbf{y})^T}{2}{} \textbf{A}\frac{\textbf{x}+\textbf{y}}{2}\qquad \qquad \forall \,\textbf{x},\textbf{y}\in \Delta _M,\textbf{x}\ne \textbf{y}. \end{aligned}$$

We thus have:

Proposition 3.3

(Hofbauer 2011; Losert and Akin 1983) Suppose that $\textbf{A}$ is a symmetric invertible $M\times M$ matrix with positive entries such that (21) holds true (i. e. $\textbf{A}$ is positive-definite on $W_M$). Then (19) holds with $h(\textbf{x})=\textbf{x}^T\textbf{A}{} \textbf{x}.$

The above proposition shows that a normalized (in order to scale its range into the interval [0, 1]) version of h, namely ${{\widetilde{h}}}(\textbf{x})=\frac{1}{\Vert A\Vert }{} \textbf{x}^T\textbf{A}{} \textbf{x}$ can serve as a complete Lyapunov function, strictly increasing within the interior of $\Delta _M.$ It is shown in Losert and Akin (1983) that under the conditions of the proposition, ${{\varvec{\psi }}}_k$ converges, as k tends to infinity, to an equilibrium on the boundary of the simplex $\Delta _M$ (cf. Theorem 4 in “Appendix B”). The proposition thus indicates that when $\textbf{A}$ is positive-definite on $W_M,$ the stochastic and deterministic mean-field model might have very similar asymptotic behavior when N is large. This observation partially motivated our Theorem 4.6 stated below in Sect. 4.3.

3.3 Incorporating mutations

In order to incorporate mutations into the interpretation of the update rule ${\varvec{\Gamma }}$ in terms of the population’s fitness landscape, we adapt the Wright–Fisher model with neutral selection and mutation of Hobolth and Sireén (2016). Namely, for the purpose of this discussion we will make the following assumption:

Assumption 3.4

The update rule ${\varvec{\Gamma }}$ can be represented in the form ${\varvec{\Gamma }}(\textbf{x})={{\varvec{\Upsilon }}}(\textbf{x}^T{{\varvec{\Theta }}}),$ where $\textbf{x}^T$ denotes the transpose of $\textbf{x},$ and

1.
${{\varvec{\Upsilon }}}: \Delta _M\rightarrow \Delta _M$ is a vector field that satisfies the “no-mutations” condition (cf. (16))
$$\begin{aligned} x(j)=0~\Longrightarrow ~\Upsilon _j(\textbf{x})=0 \end{aligned}$$
(22)
for all $\textbf{x}\in \Delta _M$ and $j\in S_M.$
2.
${{\varvec{\Theta }}}=(\Theta _{ij})_{i,j\in S_M}$ is $M\times M$ stochastic matrix, that is $\Theta _{ij}\ge 0$ for all $i,j\in S_m$ and $\sum _{j\in S_M}\Theta _{ij}=1$ for all $i\in S_M.$

The interpretation is that while the entry $\Theta _{ij}$ of the mutation matrix ${{\varvec{\Theta }}}$ specifies the probability of mutation (mutation rate) of a type i particle into a particle of type j for $j\ne i,$ the update rule ${{\varvec{\Upsilon }}}$ encompasses the effect of evolutional forces other than mutation, for instance genetic drift and selection. The total probability of mutation of a type i particle is $\sum _{j\ne i} \Theta _{ij}=1-\Theta _{ii},$ and, accordingly, ${\varvec{\Gamma }}$ satisfies the no-mutation condition if ${{\varvec{\Theta }}}$ is the unit matrix. Note that, since ${{\varvec{\Theta }}}$ is assumed to be stochastic, the linear transformation $\textbf{x}\mapsto \textbf{x}^T{{\varvec{\Theta }}}$ leaves the simplex $\Delta _M$ invariant, that is $\textbf{x}^T{{\varvec{\Theta }}}\in \Delta _M$ for all $\textbf{x}\in \Delta _M.$ Both the major implicit assumptions forming a foundation for writing ${\varvec{\Gamma }}$ as the composition of a non-mutative map ${{\varvec{\Upsilon }}}$ and the action of a mutation matrix ${{\varvec{\Theta }}},$ namely that mutation can be explicitly separated from other evolutionary forces and that mutations happen at the beginning of each cycle before other evolutionary forces take effect, are standard (Antal et al. 2009; Hobolth and Sireén 2016; Hofrichter et al. 2017) even though at least the latter is a clear idealization (Stephens 2014). Certain instances of Fisher’s fundamental theorem of natural selection have been extended to include mutations in Basener and Sanford (2018), Hofbauer (1985).

We now can extend the definition (15) as follows:

Definition 3

Let Assumption 3.4 hold. A vector field ${{\varvec{\varphi }}}: \Delta _M\rightarrow {{\mathbb {R}}}_+^M$ is called the reproductive fitness landscape of the model if for all $\textbf{x}\in \Delta _M$ we have:

$$\begin{aligned} \textbf{x}^T{{\varvec{\Theta }}}{{\varvec{\varphi }}} (\textbf{x})>0 \qquad \text{ and } \qquad \Gamma _i(\textbf{x})=\frac{(x^T\Theta )(i)\cdot \varphi _i(\textbf{x})}{\textbf{x}^T{{\varvec{\Theta }}}{{\varvec{\varphi }}} (\textbf{x})}\quad \forall \, i\in S_M, \end{aligned}$$

(23)

where $\textbf{x}^T$ is the transpose of $\textbf{x}.$

An analogue of Proposition 3.1 for a model with mutation reads:

Proposition 3.5

Let Assumption 3.4 hold. Then ${{\varvec{\varphi }}}:\Delta _M\rightarrow \Delta _M$ is a reproductive fitness landscape if and only if

$$\begin{aligned} \varphi _i(\textbf{x})= \left\{ \begin{array}{ll} \frac{\Gamma _i(\textbf{x})}{(x^T\Theta )(i)}h(\textbf{x})~&{}~\text{ if }~i\in {{\mathcal {C}}}\big (\textbf{x}^T{{\varvec{\Theta }}}\big ),\\ 0~&{}~\text{ if }~i\not \in {{\mathcal {C}}}\big (\textbf{x}^T{{\varvec{\Theta }}}\big ) \end{array} \right. \end{aligned}$$

(24)

for some function $h:\Delta _M\rightarrow (0,\infty ).$

Proof

For the “if part” of the proposition, observe that if ${{\varvec{\varphi }}}$ is given by (24), then

$$\begin{aligned} \textbf{x}^T{{\varvec{\Theta }}}{{\varvec{\varphi }}} (\textbf{x})=h(\textbf{x})\cdot \sum _{i\in {{\mathcal {C}}}(\textbf{x}^T{{\varvec{\Theta }}})}\Gamma _i(\textbf{x})=h(\textbf{x})\cdot \sum _{i\in {{\mathcal {C}}}({\varvec{\Gamma }}(\textbf{x}))}\Gamma _i(\textbf{x})=h(\textbf{x}), \end{aligned}$$

where in the second step we used the fact that ${{\mathcal {C}}}\big ({\varvec{\Gamma }}(\textbf{x})\big )\subset {{\mathcal {C}}}\big (\textbf{x}^T{{\varvec{\Theta }}}\big )$ by virtue of (22). Thus ${{\varvec{\varphi }}}$ is a solution to (23).

For the “only if part” of the proposition, note that if ${{\varvec{\varphi }}}$ is a reproductive fitness landscape, then (23) implies that (24) holds with $h(\textbf{x})=\textbf{x}^T{{\varvec{\Theta }}}{{\varvec{\varphi }}} (\textbf{x}).$ $\square $

If the weighted average $\textbf{x}^T{{\varvec{\Theta }}}{{\varvec{\varphi }}} (\textbf{x})$ is adopted as the definition of the mean fitness, Theorem 3.2 can be carried over verbatim to the setup with mutations.

4 Stochastic dynamics: main results

In this section we present our main results for the stochastic model. The proofs are deferred to Sect. 5.

First, we prove two different approximation results. The Markov chain $\textbf{X}^{(N)}$, even for the classical two-allele haploid model of genetics, is fairly complex. In fact, most of the known results about the asymptotic behavior of Wright–Fisher Markov chains were obtained through a comparison to various limit processes, including the mean-field deterministic system, branching process, diffusions and Gaussian processes (Chalub and Souza 2014; Ewens 2012; Hartle and Clark 2007; Hofrichter et al. 2017; Nagylaki 1990).

In Sect. 4.1 we are concerned with the so-called Gaussian approximation, which can be considered as an intermediate between the deterministic and the diffusion approximations. This approximation process can be thought of as the mean-field iteration scheme perturbed by an additive Gaussian noise. Theorem 4.1 constructs such an approximation for the Wright–Fisher model. Results of this type are well-known in literature, and our proof of Theorem 4.1 is a routine adaptation of classical proof methods.

In Sect. 4.2, we study the longitudinal error of the deterministic approximation by the mean-field model. Specifically, Theorem 4.2 provides an exponential lower bound on the decoupling time, namely the first time when an error of approximation exceeds a given threshold. A variation of the result is used later in the proof of Theorem 4.6, the main result of this section.

Conceptually, Theorem 4.1 is a limit theorem suggesting that for large values of N, the stochastic component is the major factor in determination of the asymptotic behavior of the stochastic model. Theorem 4.2 then complements this limit theorem by quantifying this intuition for the stochastic model of a given (though large) population size N (versus the infinite-population-limit result of Theorem 4.1).

The impact of the mean-field dynamics on the underlying stochastic process is further studied in Sect. 4.3. The main result is Theorem 4.6 which specifies the route to extinction of the stochastic system in a situation when the mean-field model is “strongly permanent” (heuristically, strongly repelling from the boundary, cf. “Appendix B” and the discussion on p. 22) and has a unique interior equilibrium. In general, the problem of determining the route to extinction for a multi-type stochastic population model is of an obvious importance to biological sciences, but is notoriously difficult in practice.

The proof of Theorem 4.6 highlights a mechanism leading to an almost deterministic route to extinction for the model. Intuitively, under the conditions of Theorem 4.6 the system is trapped in a quasi-equilibrium stochastic state, fluctuating near the deterministic equilibrium for a very long time, eventually “untying the Gordian knot”, and instantly jumping to the boundary. A similar mechanism of extinction was previously described for different stochastic population models in Assaf and Meerson (2010), Assaf and Mobilia (2011), Park and Traulsen (2017). We note that analysis in these papers involves a mixture of rigorous mathematical and heuristic arguments, and is different in nature from our approach.

4.1 Gaussian approximation of the Wright–Fisher model

Let $\overset{P}{\rightarrow }$ denote convergence in probability as the population size N goes to infinity. The following theorem is an adaptation of Theorems 1 and 3 in Buckley and Pollett (2010) for our multivariate setup. We also refer to Klebaner and Nerman (1994), Nagylaki (1986), Nagylaki (1990), Parra-Rojas et al. (2014) for earlier related results and to (Rao 2001, p. 527) for a rigorous construction of a degenerate Gaussian distribution.

Recall ${{\varvec{\psi }}}_k$ from (3). We have:

Theorem 4.1

Suppose that

$$\begin{aligned} \textbf{X}^{(N)}_0\overset{P}{\rightarrow }{{\varvec{\psi }}}_0 \end{aligned}$$

(25)

for some ${{\varvec{\psi }}}_0\in \Delta _M.$ Then the following holds true:

(a)
$\textbf{X}^{(N)}_k\overset{P}{\rightarrow }\ {{\varvec{\psi }}}_k$ for all $k\in {{\mathbb {Z}}}_+.$
(b)
Let
$$\begin{aligned} \textbf{u}^{(N)}_k=\sqrt{N}\bigl (\textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k\bigr ),\qquad N\in {{\mathbb {N}}},\,k\in {{\mathbb {Z}}}_+, \end{aligned}$$
and ${{\varvec{\Sigma }}}(\textbf{x})$ be the matrix $M\times M$ with entries (cf. (9) in Sect. 2.2)
$$\begin{aligned} \Sigma _{i,j}(\textbf{x}):= \left\{ \begin{array}{ll} -\Gamma _i(\textbf{x})\Gamma _j(\textbf{x})&{}\text{ if }~i\ne j, \\ \Gamma _i(\textbf{x})\bigl (1-\Gamma _i(\textbf{x})\bigr )&{}\text{ if }~i=j \end{array} \right. , \qquad \textbf{x}\in \Delta _M. \end{aligned}$$
Suppose that in addition to (25), ${\varvec{\Gamma }}$ is twice continuously differentiable and $\textbf{u}^{(N)}_0$ converges weakly, as N goes to infinity, to some (possibly random) $\textbf{u}_0.$

Then the sequence $\textbf{u}^{(N)}:=\bigl (\textbf{u}^{(N)}_k\bigr )_{k\in {{\mathbb {Z}}}_+}$ converges in distribution, as N goes to infinity, to a time-inhomogeneous Gaussian AR(1) sequence $(\textbf{U}_k)_{k\in {{\mathbb {Z}}}_+}$ defined by

$$\begin{aligned} \textbf{U}_{k+1}=\textbf{D}_x({{\varvec{\psi }}}_k)\textbf{U}_k+\textbf{g}_k, \end{aligned}$$

(26)

where $\textbf{D}_x({{\varvec{\psi }}}_k)$ denotes the Jacobian matrix of ${\varvec{\Gamma }}$ evaluated at ${{\varvec{\psi }}}_k,$ and $\textbf{g}_k,$ $k\in {{\mathbb {Z}}}_+,$ are independent degenerate Gaussian vectors, each $\textbf{g}_k$ distributed as $N\bigl (0,{{\varvec{\Sigma }}}({{\varvec{\psi }}}_k)\bigr ).$

The proof of the theorem is included in Sect. 5.1. It is not hard to prove [cf. Remark (v) on p. 61 of Buckley and Pollett (2010), see also Klebaner and Nerman (1994)] that if ${{\varvec{\chi _{eq}}}}$ is a unique global stable point of ${\varvec{\Gamma }},$ ${{\varvec{\psi }}}_0\overset{P}{\rightarrow }\ {{\varvec{\chi _{eq}}}}$ in the statement of Theorem 4.1 and, in addition, $\textbf{u}^{(N)}_0$ converges weakly, as N goes to infinity, to some $\textbf{u}_0,$ then the linear recursion (26) can be replaced with

$$\begin{aligned} \textbf{U}_{k+1}=\textbf{D}_x({{\varvec{\chi _{eq}}}})\textbf{U}_k+\widetilde{\textbf{g}}_k, \end{aligned}$$

(27)

where $\widetilde{\textbf{g}}_k,$ $k\in {{\mathbb {Z}}}_+,$ are i. i. d. degenerate Gaussian vectors in ${{\mathbb {R}}}^M,$ each $\widetilde{\textbf{g}}_k$ distributed as $N\bigl (0,{{\varvec{\Sigma }}}({{\varvec{\chi _{eq}}}})\bigr ).$ One then can show that if the spectral radius of $\textbf{D}_x({{\varvec{\chi _{eq}}}})$ is strictly less than one, Markov chain $\textbf{U}_k$ has a stationary distribution, see Buckley and Pollett (2010, p. 61) for more details. In the case when ${\varvec{\Gamma }}$ has a unique global stable point, (27) was obtained in Parra-Rojas et al. (2014) by different from ours, analytic methods.

4.2 Difference equation approximation

Theorem 4.1 indicates that when N is large, the trajectory of the deterministic dynamical system ${{\varvec{\psi }}}_k$ can serve as a good approximation to the path of the Markov chain $\textbf{X}^{(N)}_k.$ The following theorem offers some insight into the duration of the time for which the deterministic and the stochastic paths stay fairly close to each other, before separating significantly for the first time. The theorem is a suitable modification of some results of Aydogmus (2016) in a one-dimensional setup.

For $\varepsilon >0,$ let

$$\begin{aligned} \tau _N(\varepsilon )=\inf \biggl \{k\in {{\mathbb {Z}}}_+:\bigl \Vert \textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k\bigr \Vert >\varepsilon \biggr \}, \end{aligned}$$

(28)

where ${{\varvec{\psi }}}_0=\textbf{X}^{(N)}_0$ and ${{\varvec{\psi }}}_k$ is the sequence introduced in (3). Thus $\tau _N(\varepsilon )$ is the first time when the path of the Markov chain $\textbf{X}^{(N)}_k$ deviates from the trajectory of the deterministic dynamical system (3) by more than a given threshold $\varepsilon >0.$ We have:

Theorem 4.2

Suppose that the map ${\varvec{\Gamma }}:\Delta _M \rightarrow \Delta _M$ is Lipschitz continuous, that is there exists a constant $\rho >0$ such that

$$\begin{aligned} \Vert {\varvec{\Gamma }}(\textbf{x})-{\varvec{\Gamma }}(\textbf{y})\Vert \le \rho \Vert \textbf{x}-\textbf{y}\Vert \end{aligned}$$

(29)

for all $\textbf{x},\textbf{y}\in \Delta _M.$ Then the following holds for any $\varepsilon >0,$ $K\in {{\mathbb {N}}},$ and $N\in {{\mathbb {N}}}:$

$$\begin{aligned} P\bigl ( \tau _N(\varepsilon )\le K \bigr )\, \le 2KM\exp \Bigl (-\frac{\varepsilon ^2c_K^2}{2}N\Bigr ), \end{aligned}$$

(30)

where

$$\begin{aligned} c_K= \left\{ \begin{array}{ll} \frac{1-\rho }{1-\rho ^K}&{}\text{ if }~\rho \ne 1, \\ \frac{1}{K}&{}\text{ if }~\rho =1. \end{array} \right. \end{aligned}$$

(31)

In particular, if $\rho <1$ we have:

$$\begin{aligned} P\bigl ( \tau _N(\varepsilon )\le K \bigr )\,\le 2KM\exp \Bigl (-\frac{\varepsilon ^2(1-\rho )^2}{2}N\Bigr ). \end{aligned}$$

The proof of the theorem is given in Sect. 5.2. Note that the upper bound in (30) is meaningful for large values of N even when $\rho \ge 1.$ When ${\varvec{\Gamma }}$ is a contraction, i.e. $\rho <1,$ the Banach fixed point theorem implies that there is a unique point ${{\varvec{\chi _{eq}}}}\in \Delta _M$ such that ${\varvec{\Gamma }}({{\varvec{\chi _{eq}}}})={{\varvec{\chi _{eq}}}}.$ Furthermore, for any $\textbf{x}\in \Delta _M$ and $k\in {{\mathbb {N}}}$ we have

$$\begin{aligned} \bigl \Vert {{\varvec{\psi }}}_k-{{\varvec{\chi _{eq}}}}\bigr \Vert \le \rho ^{k-1}\Vert \textbf{X}^{(N)}_0-{{\varvec{\chi _{eq}}}}\Vert , \end{aligned}$$

that is, as k goes to infinity, ${{\varvec{\psi }}}_k$ converges exponentially fast to ${{\varvec{\chi _{eq}}}}.$ The following theorem is a suitable modification of Theorem 4.2 for contraction maps.

Theorem 4.3

Assume that there exist a closed subset ${{\mathcal {E}}}$ of $\Delta _M$ and a constant $\rho \in (0,1)$ such that (29) holds for any $\textbf{x},\textbf{y}\in {{\mathcal {E}}}.$ Then the following holds for any $\varepsilon >0,$ $K\in {{\mathbb {N}}},$ $N\in {{\mathbb {N}}}$ such that $1-2M\exp \bigl (-\frac{(1-\rho )^2\varepsilon ^2 N}{2}\bigr )>0,$ and $\textbf{X}^{(N)}_0\in {{\mathcal {E}}}:$

$$\begin{aligned} P\bigl ( \tau _N(\varepsilon )>K\bigr ) \ge \Bigl [1-2M\exp \Bigl (-\frac{(1-\rho )^2\varepsilon ^2 N}{2}\Bigr )\Bigr ]^K. \end{aligned}$$

In particular,

$$\begin{aligned} E\bigl ( \tau _N(\varepsilon )\bigr )=\sum _{K=0}^\infty P\bigl ( \tau _N(\varepsilon )>K\bigr )\ge \frac{1}{2M}\exp \Bigl (\frac{(1-\rho )^2\varepsilon ^2 N}{2}\Bigr ). \end{aligned}$$

The proof of the theorem is given in Sect. 5.3. We remark that relaxing the assumption that ${\varvec{\Gamma }}$ is Lipschitz continuous on the whole simplex $\Delta _M$ allows, for instance, to cover the one-dimensional setup of Aydogmus (2016) where $M=2,$ the underlying Markov chain is binomial, and ${\varvec{\Gamma }}$ is a contraction only on a subset of $\Delta _2.$

4.3 Metastability and elimination of types

In this section, we consider a large population Wright–Fisher model with the update rule ${\varvec{\Gamma }}$ satisfying the following conditions:

Assumption 4.4

Suppose that ${\varvec{\Gamma }}$ is defined by (10) and (11). Furthermore, assume that $\textbf{b}=(1,1,\ldots ,1)$ and $\textbf{A}$ is a symmetric invertible $M\times M$ matrix with positive entries such that the following two conditions hold:

$$\begin{aligned} \textbf{w}^T\textbf{A}{} \textbf{w}<0\qquad \qquad \forall \, \textbf{w}\in W_M,\,\textbf{w}\ne \textbf{0}, \end{aligned}$$

(32)

and there are ${{\varvec{\chi _{eq}}}}\in \Delta _M^\circ $ and $c>0$ such that

$$\begin{aligned} \textbf{A}{{\varvec{\chi _{eq}}}}=c\textbf{e}, \end{aligned}$$

(33)

where $\textbf{e}:=(1,\ldots ,1)\in {{\mathbb {R}}}^M.$

Note that $\Gamma _j(\textbf{x})=0$ if and only if $x_j=0,$ and hence the Markov chain $\textbf{X}^{(N)}$ is absorbing and will eventually become fixed on one of the simplex vertices (cf. “Appendix A”). The assumption is borrowed from Losert and Akin (1983), where the defined in (3) deterministic dynamical system ${{\varvec{\psi }}}_k$ is studied. In particular, it is shown in Losert and Akin (1983) that under these conditions the internal equilibrium point of ${{\varvec{\psi }}}_k$ is stable (see Theorem 4 in our “Appendix B” for technical details).

The main result of the section is Theorem 4.6, which intuitively says that if the push away from the boundary and toward the equilibrium is strong enough and Assumption 4.4 holds, with a high degree of certainty, the type with the lowest fitness at the equilibrium is the one that goes extinct first. The result provides an interesting link between the interior equilibrium of the mean-field dynamics and the asymptotic behavior of the stochastic system which is eventually absorbed at the boundary. A similar phenomenon for a different type of stochastic dynamics has been described in physics papers (Assaf and Meerson 2010; Assaf and Mobilia 2011; Park and Traulsen 2017), see also Auletta et al. (2018). Key generic features that are used to prove the result are:

1.
The mean-field dynamics is permanent, that is (see “Appendix B” below for a brief background discuion): there exists $\delta >0$ such that
$$\begin{aligned} \liminf _{k\rightarrow \infty } \psi _k(i)>\delta \qquad \forall \,i\in S_M,{{\varvec{\psi }}}_0\in \Delta _M^\circ . \end{aligned}$$
The condition ensures a strong repelling of the mean-field dynamical system from the boundary. Heuristically, for the corresponding Wright–Fisher model and a large population N, this condition propagates into the metastable (asymptotically quasi-stationary with a long life-time before extinction of any type) behavior of the stochastic system.
2.
There is a unique interior equilibrium ${{\varvec{\chi _{eq}}}}$. In addition, $\lim _{k\rightarrow \infty } {{\varvec{\psi }}}_k={{\varvec{\chi _{eq}}}}$ for any ${{\varvec{\psi }}}_0\in \Delta _M^\circ .$
3.
The push toward equilibrium is strong enough to force the stochastic system to follow the deterministic dynamics closely and get trapped for a prolonged time in a “metastable” state near the equilibrium.
4.
Finally, the same strong push toward the equilibrium yields a large deviation principle which ensures that the stochastic system is much likely to jump to the boundary directly from the neighborhood of the equilibrium by one giant fluctuation than by a sequence of small consecutive steps.

With these features, one would expect the following. If the starting point $\textbf{X}^{(N)}_0$ lies within the interior of $\Delta _M,$ the mean-field dynamical system ${{\varvec{\psi }}}_k$ converges to the unique equilibrium, ${{\varvec{\chi _{eq}}}}\in \Delta _M^\circ .$ It can be shown that the rate of the convergence is exponential. In view of Theorem 4 in “Appendix B” and Theorem 4.1, one can expect that the trajectory of the stochastic system will stay close to the deterministic mean-field path for a long enough time, so that the stochastic model will follow the mean-field path to a small neighborhood of ${{\varvec{\chi _{eq}}}}.$ Then the probability of an immediate elimination of a type i will be of order $\big (1-\Gamma _i({{\varvec{\chi _{eq}}}})\big )^N.$ Assuming that

$$\begin{aligned} \text {``it is easier to return to a small neighborhood of }{{\varvec{\chi _{eq}}}}\text { than to leave it''} \end{aligned}$$

(34)

we expect that the stochastic system will enter a metastable state governed by its quasi-stationary distribution and will be trapped in a neighborhood of the equilibrium for a very long time. In fact, since by virtue of Theorem 4.1, $\textbf{X}^{(N)}$ can be considered as a random perturbation of the dynamical system ${{\varvec{\psi }}}_k$ with a small noise, the quasi-stationary distribution should converge as N goes to infinity to the degenerate probability measure supported by the single point ${{\varvec{\chi _{eq}}}}.$ Finally, general results in the theory of large deviations (Faure and Schreiber 2014; Iglehart 1972) suggest that under a suitable condition (34), with an overwhelming probability, the system will eventually jump to the boundary by one giant fluctuation. It follows from the above heuristic argument, that the probability of such an instant jump to the facet where, say, $x(j)=0$ and the rest of coordinates are positive is roughly $\Gamma _j({{\varvec{\chi _{eq}}}})^N.$ For large values of N this practically guarantees that the stochastic system will hit the boundary through the interior of $\Delta _{[J^*]},$ where $J^*=S_M\backslash \{j^*\},$ and $j^*\in \arg \min _{i\in S_M}\{\Gamma _i({{\varvec{\chi _{eq}}}}):i\in S_M\}.$ In other words, under Assumption 4.4, when N is large, the elimination of the least fit and survival of the remaining types is almost certain (see results of a numerical simulation supporting the hypothesis below in Tables 1 and 2).

We now turn to the statement of our results. First, we state the following technical result. Recall (3) and Definition 1.

Proposition 4.5

Let Assumption 4.4 hold. Then for any neighborhood ${{\mathcal {N}}}$ of the interior equilibrium ${{\varvec{\chi _{eq}}}}$ and any compact set $K\subset \Delta _M$ there exist a real constant $\varepsilon =\varepsilon ({{\mathcal {N}}},K)>0$ and an integer $T=T({{\mathcal {N}}},K)\in {{\mathbb {N}}}$ such that if $\delta \in (0,\varepsilon ),$ $\textbf{x}_0\in K,$ and $\textbf{x}_0,\textbf{x}_1,\ldots ,\textbf{x}_T$ is an $\delta $-chain of length T for the dynamical system ${{\varvec{\psi }}}_k,$ then $\textbf{x}_T\in {{\mathcal {N}}}.$

The proof of the proposition is deferred to Sect. 5.4. The proposition is a general property of dynamical system converging to an asymptotically stable equilibrium (see Remark 2 at the end of the proof). Since we were unable to find a reference to this claim in the literature, we provided a short self-contained proof in Sect. 5.4.

We are now ready to state the main result of this section. Suppose $\textbf{x}\in \Delta _M$ such that at least one component $\Gamma _i(\textbf{x})$ is not equal to 1/M. Then we define

$$\begin{aligned} \alpha _\textbf{x}=\min _{j\in S_M} \Gamma _j(\textbf{x}), \qquad J^*_\textbf{x}=\{j\in S_M:\Gamma _j(\textbf{x})=\alpha _\textbf{x}\}, \qquad \text{ and }\qquad \textbf{e}ta_\textbf{x}=\min _{j\not \in J^*_\textbf{x}}\Gamma _j(\textbf{x}). \end{aligned}$$

We set $\alpha :=\alpha _{{\varvec{\chi _{eq}}}},$ $\textbf{e}ta:=\textbf{e}ta_{{\varvec{\chi _{eq}}}},$ and $J^*:=J^*_{{\varvec{\chi _{eq}}}}.$

Additionally, for any $\theta \in \big (0,\frac{\textbf{e}ta-\alpha }{2}\big )$ and $\eta >0$ such that

$$\begin{aligned} 1-\alpha -\theta >(1-\textbf{e}ta+\theta )^{1-\eta }, \end{aligned}$$

(35)

we define (see Fig. 1):

$$\begin{aligned} {{\mathcal {N}}}_\theta =\big \{\textbf{x}\in \Delta _M^\circ : \alpha _x<\alpha +\theta ,\,\textbf{e}ta_x>\textbf{e}ta-\theta \big \} \end{aligned}$$

(36)

and

$$\begin{aligned} K_{\theta ,\eta }:=\bigg \{\textbf{x}\in \Delta _M^\circ \backslash {{\mathcal {N}}}_\theta :\max _{j\not \in J^*}x(j)\ge \eta \bigg \}. \end{aligned}$$

(37)

In the infinite population limit we have:

Theorem 4.6

Assume that ${\varvec{\Gamma }}$ satisfies Assumption 4.4 and $\textbf{X}^{(N)}_0\overset{P}{\rightarrow }\textbf{x}_0$ for some $\textbf{x}_0\in \Delta _M^\circ .$ Suppose in addition that there exist $\theta \in \big (0,\frac{\textbf{e}ta-\alpha }{2}\big )$ and $\eta \in \Big (0,1-\frac{\log (1-\alpha -\theta )}{\log (1-\textbf{e}ta+\theta )}\Big )$ (and, therefore, the inequality in (35) is satisfied) such that

$$\begin{aligned} 1-\alpha -\theta >e^{-\frac{\varepsilon _\theta ^2 c_T^2}{2}}, \end{aligned}$$

(38)

where the constant $c_T$ is introduced in (31), $\varepsilon _\theta =\varepsilon ({{\mathcal {N}}}_\theta ,K_{\theta ,\eta }),$ $T=T({{\mathcal {N}}}_\theta ,K_{\theta ,\eta }),$ and $\varepsilon ({{\mathcal {N}}},K)$ and $T({{\mathcal {N}}}_\theta ,K_{\theta ,\eta })$ are introduced in Proposition 4.5.

Let

$$\begin{aligned} \nu _N:=\inf \big \{k>0:\textbf{X}^{(N)}_k\in \partial (\Delta _M)\big \} \end{aligned}$$

(39)

and ${{\mathcal {E}}}_N$ be the event that

(1)
$\textbf{X}^{(N)}_{\nu _N}$ has exactly $M-1$ non-zero components (that is, $|C(\textbf{X}^{(N)}_{\nu _N}) |=M-1$ in notation of (13)); and
(2)
$X^{(N)}_{\nu _N}(i)=0$ for some $i\in J^*.$

Then the following holds true:

(i)
$\lim _{N\rightarrow \infty } P ({{\mathcal {E}}}_N)=1.$
(ii)
$\lim _{N\rightarrow \infty } E(\nu _N)=+\infty .$

The proof of Theorem 4.6 is included in Sect. 5.5. In words, part (i) of the theorem is the claim that with overwhelming probability, at the moment that the Markov chain hits the boundary, exactly one of its components is zero, and, moreover, the index i of the zero component $\textbf{X}^{(N)}_{\nu _N}(i)$ belongs to the set $J^*.$ A similar metastability phenomenon for continuous-time population models has been considered in Assaf and Meerson (2010) and Assaf and Mobilia (2011), Auletta et al. (2018), Park and Traulsen (2017), however their proofs rely on a mixture of rigorous computational and heuristic “physicist arguments”.

Condition (38) is a rigorous version of the heuristic condition (34). We note that several more general but less explicit and verifiable conditions of this type can be found in mathematical literature, see, for instance, Faure and Schreiber (2014), Panageas et al. (2016) and Assaf and Meerson (2010), Park and Traulsen (2017). It seems plausible that the negative-definite condition (32) is an artifact of the proof and can be relaxed.

It is shown in Losert and Akin (1983) that in this more general situation, the mean-field dynamical system ${{\varvec{\psi }}}_k$ still converges to an equilibrium which might be on the boundary and depends on the initial condition ${{\varvec{\psi }}}_0.$

Remark 1

Recall (13). A classical result of Karlin and McGregor (1964) implies that under the conditions of Theorem 4.6, the types vanish from the population with an exponential rate until only one left. That is, $\lim _{k\rightarrow \infty } \frac{1}{k}\log E(|C(\textbf{X}^{(N)}_k)|-1)$ exists and is finite and strictly negative. However, heuristically, the limit is extremely small under the conditions of the theorem [the logarithm of the expected extinction time is anticipated to be of order $M\cdot N$ (Panageas et al. 2016)], and hence the convergence to boundary states is extremely slow for any practical purpose. The situation when the system escapes from a quasi-equilibrium fast but with a rate of escape quickly converging to zero when the population size increases to infinity, is typical for metastable population systems (Auletta et al. 2018; Broekman et al. 2019; Doorn and Pollett 2013; Méléard and Villemonais 2012).

We will next illustrate the (hypothesized) important role of the threshold condition (38) with a numerical example. To that end, we consider two examples (see matrices $\textbf{A}_1$ and $\textbf{A}_2$, bellow) with $N=500$ and $M=3$. The choice of the low dimension $M=3$ is due to the computational intensity involved with tracking the evolution of a metastable stochastic system in a “quasi-equilibrium” state. For each example, we use various sets of initial conditions and run 10000 rounds of simulations. Each simulation is stopped at the random time T defined as follows:

$$\begin{aligned} T=\min \bigg \{k\in {{\mathbb {N}}}:\min _j X^{(N)}_k(j)\le 0.05\bigg \}. \end{aligned}$$

(40)

Using this termination rule, we were able to complete all the simulation runs in a reasonable finite time. In each simulation run, we sample the value of the process at some random time ranging between 1000 and 5000. At this intermediate sampling time, we measure $d_{eq},$ the distance between the stochastic system and the equilibrium ${{\varvec{\chi _{eq}}}}$. In addition, we record the composition of the last state of the Markov chain at the end of each simulation run.

Example 4

Our first example does not satisfy the threshold conditions. More precisely, we choose $\omega $ in (11) in such a way that $\frac{\omega }{1-\omega }=10^{-3}$ and we consider the following symmetric $3\times 3$ matrix $\textbf{A}_1:$

$$\begin{aligned} \textbf{A}_1= \left( \begin{array}{ccc} 1&{}\quad 20&{}\quad 45 \\ 20&{}\quad 21&{}\quad 30 \\ 45&{}\quad 30&{}\quad 1 \end{array} \right) \end{aligned}$$

with the equilibrium ${{\varvec{\chi ^{(1)}_{eq}}}}=(0.24766355, 0.41121495, 0.3411215)$. Table 1 reports the number of times that each type $j\in S_3$ was the type with abundance less than 0.05 at the time of exiting the simulation runs.

Table 1 The least abundant component of the terminal state $\textbf{X}^{(N)}_T$ by the end of the simulations for $\Gamma _i(\textbf{x})=\frac{1000+A_1x(i)}{1000+\textbf{x}^T\textbf{A}_1\textbf{x}}x(i)$ and the quitting time T introduced in (40)

Full size table

In addition, Fig. 2 below depicts the distribution of $d_{eq}$ different initial values. Evidently, the threshold condition plays an important role as the theory predicts in Theorem 4.6. The results in Table 1 cannot be explained by the equilibrium value ${{\varvec{\chi ^{(1)}_{eq}}}}$ only, rather they constitute an intricate result of the combined effect of this value and the initial position of the Markov chain.

Example 5

For the second example, we let $\omega =1/2$ and use the following matrix $\textbf{A}_2$:

$$\begin{aligned} \textbf{A}_2= \left( \begin{array}{ccc} 1&{}\quad 20&{}\quad 35 \\ 20&{}\quad 21&{}\quad 30 \\ 35&{}\quad 30&{}\quad 1 \end{array} \right) \end{aligned}$$

with the equilibrium ${{\varvec{\chi ^{(2)}_{eq}}}}=(0.0246913, 0.7345679, 0.2407407).$ This matrix is almost the same as $\textbf{A}_1,$ with the only difference that 45 is changed to 35. The result of simulations for three different initial values are reported in Table 2 and Fig. 3, which are in complete agreement with the prediction of Theorem 4.6.

The difference between the two examples is two-fold: (1) by increasing the value of $\omega $ we increase the influence of the “selection matrix” $\textbf{A}$ comparing to the neutral selection, and (2) by changing $\textbf{A}_1$ to $\textbf{A}_2$ we replace the equilibrium ${{\varvec{\chi ^{(1)}_{eq}}}}$ with fairly uniform distribution of “types” by the considerably less balanced ${{\varvec{\chi ^{(2)}_{eq}}}},$ increasing the threshold in (38) by a considerable margin. We note that in a general, and not specifically designed, example, it may be hard to verify conditions of the theorem, particularly (38).

Next we provide an auxiliary “contraction near the equilibrium” type result, further illustrating the mechanism of convergence to equilibrium in Theorem 4.6. Formally, the result is a slight improvement of a classical stability theorem of Losert and Akin (1983) (see Theorem 4 in “Appendix B” of this paper) showing that the unique interior equilibrium of the dynamical system ${{\varvec{\psi }}}_k$ introduced in (3) is in fact exponentially stable.

Proposition 4.7

Let Assumption 4.4 hold. Then the spectral radius of the Jacobian (Frechét derivative in $\Delta _M$) of ${\varvec{\Gamma }}$ at ${{\varvec{\chi _{eq}}}}$ is strictly less than one.

The proof of the proposition is included in Sect. 5.6. This result is instrumental in understanding various applications including but not limited to mixing times (Panageas et al. 2016) and the random sequence $\textbf{u}_k$ defined in the statement of Theorem 4.1. In the later case, as we mentioned before, it implies that the limiting sequence $\textbf{U}_k$ in (27) converges to a stationary distribution.

The following corollary is an application directly related to the metastability of the system. Let $\mu _N$ be the quasi-stationary distribution of the Markov chain $\textbf{X}^{(N)}_k$ on $\Delta _M^\circ .$ Then there exists $\lambda _N\in (0,1)$ such that $\mu _NP_{\circ ,N}=\lambda _N \mu _N,$ where $P_{\circ ,N}$ is the transition matrix of $\textbf{X}^{(N)}_k$ restricted to $\Delta _{M,N}^\circ $ (Doorn and Pollett 2013; Méléard and Villemonais 2012). Iterating, we obtain that for all $k\in {{\mathbb {Z}}}_+$ and $\textbf{y}\in \Delta _{M,N}^\circ ,$

$$\begin{aligned} \mu _N P_{\circ ,N}^k(\textbf{y})=\sum _{\textbf{x}\in \Delta _{M,N}^\circ } \mu _N(\textbf{x})P\big (\textbf{X}^{(N)}_k=\textbf{y}\,\mid \,\textbf{X}^{(N)}_0=\textbf{x}\big )=\lambda _N^k \mu _N(\textbf{y}). \end{aligned}$$

(41)

Summing up over all $\textbf{y}\in \Delta _{M,N}^\circ ,$ we obtain

$$\begin{aligned} \lambda _N^k= P_{\mu _N}(\nu _N>k), \end{aligned}$$

(42)

where $\nu _N$ is the hitting time introduced in (39). Heuristically, (42) is equivalent to the condition $\mu _N(B)=P_{\mu _N}\big (\textbf{X}^{(N)}_k\in B\,\mid \,\nu _N>k\big )=\frac{P_{\mu _N}(\textbf{X}^{(N)}_k\in B)}{P_{\mu _N}(\nu _N>k)}$ for all $B \subset \Delta _{M,N}^\circ ,$ which often serves as an alternative definition of the quasi-stationary distribution.

Table 2 The least abundant component of the terminal state $\textbf{X}^{(N)}_T$ by the end of the simulations for $\Gamma _i(\textbf{x})=\frac{1+A_2x(i)}{1+\textbf{x}^T\textbf{A}_2\textbf{x}}x(i)$ and the quitting time T introduced in (40)

Full size table

In particular, (42) implies that

$$\begin{aligned} 1-\lambda _N=\sum _{\textbf{x}\in \Delta _{M,N}^\circ ,y\in \partial (\Delta _{M,N})} \mu _N(\textbf{x})P_N(\textbf{x},\textbf{y}). \end{aligned}$$

(43)

Corollary 4.8

Let Assumption 4.4 hold. Then there exist constants $C_1>0$ and $C_2>0$ such that $\lambda _N\ge 1-C_1e^{-C_2 N}$ for all $N\in {{\mathbb {N}}}.$

A short proof of the corollary is included in Sect. 5.7 below. The result in the corollary and the comparison of (43) with (58) in the proof of Theorem 4.6 given below, motivate the following hypothesis:

Conjecture 4.9

Let the conditions of Theorem 4.6 hold. Then:

(i)
The quasi-stationary distribution $\mu _N$ converges, as $N\rightarrow \infty ,$ to the degenerate distribution supported on the single point ${{\varvec{\chi _{eq}}}}.$ That is, $\lim _{N\rightarrow \infty } \mu _N({{\mathcal {N}}})=1$ for any open neighborhood of the equilibrium ${{\varvec{\chi _{eq}}}}.$
(ii)
$\lim _{N\rightarrow \infty } \frac{\alpha ^N}{1-\lambda _N}=1.$

The first part of Conjecture 4.9 is a natural property of stochastic models obtained as a perturbation of asymptotically stable deterministic dynamical system (cf. Faure and Schreiber 2014). The second part of the conjecture is related to the first one as follows. Recall that $\alpha ^N=P_{{\varvec{\chi _{eq}}}}({{\mathcal {E}}}_N)$ and $1-\lambda _N=P_{\mu _N}(\nu _N=1).$ If $\mu _N$ indeed converges weakly to the degenerate probability measure supported on ${{\varvec{\chi _{eq}}}},$ then one can reasonably expect that $\alpha =P_{{\varvec{\chi _{eq}}}}({{\mathcal {E}}}_N)\sim P_{\mu _N}({{\mathcal {E}}}_N)$ as $N\rightarrow \infty .$ The assertion that $P_{{\varvec{\chi _{eq}}}}({{\mathcal {E}}}_N)\sim P_{\mu _N}({{\mathcal {E}}}_N)\sim P_{\mu _N}(\nu _N=1),$ if true, would be an expression of the underlying principle that under the conditions of Theorem 4.6 and when N is large, the overwhelming likely scenario for the system to move to the boundary of the simplex $\Delta _M$ is to return to a small neighborhood of ${{\varvec{\chi _{eq}}}}$ and move from there directly to one of the boundary sites which realize the event ${{\mathcal {E}}}_N.$

A weak form of the hypothesis that $\lim _{N\rightarrow \infty } \frac{\alpha ^N}{1-\lambda _N}=1$ is

Conjecture 4.10

Let the conditions of Theorem 4.6 hold. Then, $\lim _{N\rightarrow \infty } \frac{1}{N}\log (1-\lambda _N)=\alpha .$

The abstract quantitative difference between the two conjectures can be illustrated, for instance, by considering the numerical sequences $a_N=Ne^N$ and $b_N=e^N.$ While it is true that $\lim _{N\rightarrow \infty } \frac{\log a_N}{\log b_N}=\lim _{N\rightarrow \infty }\frac{1}{N}\log (Ne^N)=1,$ the corresponding strong version of this identity $\lim _{N\rightarrow \infty } \frac{a_N}{b_N}=\lim _{N\rightarrow \infty } \frac{Ne^N}{e^N}=1$ fails to hold.

5 Proofs

This section is devoted to the proof of our main results stated in Sect. 4. As before, transition matrix of the Markov chain $\textbf{X}^{(N)}$ is denoted by $P_N.$ We write $P_{\mu _N}(A_N)$ for the probability of an event $A_N$ measurable with respect to the $\sigma $-algebra generated by $(\textbf{X}^{(N)}_k)_{k\ge 0}$ if $\mu _N$ is the distribution of $\textbf{X}^{(N)}_0.$ If $\mu _N$ is concentrated on a single point $\textbf{x}\in \Delta _{M,N},$ we simplify this notation to $P_x(A_N).$ We use the standard notations $f_n=o(g_n)$ and $f_n=O(g_n)$ to indicate that, respectively, the sequence $f_n$ is “ultimately smaller” than $g_n,$ that is $\lim _{n\rightarrow \infty } \frac{f_n}{g_n}=0,$ and $f_n$ and $g_n$ are of the same order, that is $\lim _{n\rightarrow \infty } \frac{f_n}{g_n}=$ exists and it is not zero.

5.1 Proof of Theorem 4.1

Let $\zeta :=\bigl \{\zeta ^{(\textbf{x})}_{k,i}:\textbf{x}\in \Delta _M,i\in {{\mathbb {N}}},k\in {{\mathbb {Z}}}_+\bigr \}$ be a collection of independent random variables distributed as follows:

$$\begin{aligned} P\bigl (\zeta ^{(\textbf{x})}_{k,i}=j\bigr )=\Gamma _j(\textbf{x}),\qquad j\in S_M. \end{aligned}$$

(44)

For $\textbf{x}\in \Delta _M,i\in {{\mathbb {N}}},k\in {{\mathbb {Z}}}_+,$ let $ {{\varvec{\xi }}}_{k,i}^{(\textbf{x})}$ be a random M-dimensional vector such that

$$\begin{aligned} \xi ^{(\textbf{x})}_{k,i}(j)= \left\{ \begin{array}{ll} 1&{}\text{ if }~\zeta ^{(\textbf{x})}_{k,i}=j, \\ 0&{}\text{ otherwise }. \end{array} \right. \end{aligned}$$

For given parameters k, i, the family of random vectors $\bigl \{ {{\varvec{\xi }}}_{k,i}^{(\textbf{x})}:\textbf{x}\in \Delta _M\}$ is a “white noise” random field. Without loss of generality we can assume that

$$\begin{aligned} \textbf{Z}^{(N)}_{k+1}=\sum _{i=1}^N {{\varvec{\xi }}}_{k,i}^{(\textbf{X}^{(N)}_k)}. \end{aligned}$$

(45)

Thus, by the law of large numbers, the conditional distribution of $\textbf{X}^{(N)}_{k+1}=\frac{\textbf{Z}^{(N)}_{k+1}}{N},$ given $\textbf{X}^{(N)}_k,$ converges weakly to a degenerate distribution concentrated on the single point ${\varvec{\Gamma }}\bigl (\textbf{X}^{(N)}_k\bigr ).$ This implies part (a) of the theorem by a result of Karr (1975) adopted to a non-homogeneous Markov chain setting [cf. Remark (i) in Buckley and Pollett (2010, p. 60)].

To prove part (b) of the theorem, write

$$\begin{aligned} \textbf{u}^{(N)}_{k+1}=\frac{1}{\sqrt{N}}\sum _{i=1}^N \Bigl ( {{\varvec{\xi }}}_{k,i}^{(\textbf{x})}-{\varvec{\Gamma }}\bigl (\textbf{X}^{(N)}_k\bigr )\Bigr )+\sqrt{N}\Bigl ({\varvec{\Gamma }}\bigl (\textbf{X}^{(N)}_k\bigr )-{{\varvec{\psi }}}_{k+1}\Bigr ). \end{aligned}$$

The first term converges in distribution, as N goes to infinity, to the degenerate Gaussian distribution $g_k$ defined in the statement of the theorem [see, for instance, Theorem 14.6 in Wasserman (2004)]. Heuristically, the Gaussian limit of a sequence of multinomial random vectors $\textbf{X}_k$ is degenerate because of the constraint $\Vert X_k\Vert =1$ which propagates into a minus one degree of freedom in the limit.

Furthermore, Taylor expansion for ${\varvec{\Gamma }}$ gives

$$\begin{aligned} \sqrt{N}\Big ({\varvec{\Gamma }}\bigl (\textbf{X}^{(N)}_k\bigr )-{{\varvec{\psi }}}_{k+1}\Big )= & {} \sqrt{N}\Big (\textbf{D}_x({{\varvec{\psi }}}_k) \big (\textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k\bigr )+O(1/N)\Big ) \\= & {} \textbf{D}_x({{\varvec{\psi }}}_k)\textbf{u}^{(N)}_k+o(1), \end{aligned}$$

which implies the result. $\square $

5.2 Proof of Theorem 4.2

We have

$$\begin{aligned} \bigl \Vert \textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k\bigr \Vert= & {} \bigl \Vert \textbf{X}^{(N)}_k-{\varvec{\Gamma }}(\textbf{X}^{(N)}_{k-1})+{\varvec{\Gamma }}(\textbf{X}^{(N)}_{k-1})-{\varvec{\Gamma }}({{\varvec{\psi }}}_{k-1})\bigr \Vert \nonumber \\\le & {} \bigl \Vert \textbf{X}^{(N)}_{k}-\Gamma (\textbf{X}^{(N)}_{k-1})\bigr \Vert +\bigl \Vert \Gamma (\textbf{X}^{(N)}_{k-1})-{\varvec{\Gamma }}({{\varvec{\psi }}}_{k-1})\bigr \Vert \nonumber \\\le & {} \bigl \Vert \textbf{X}^{(N)}_{k}-\Gamma (\textbf{X}^{(N)}_{k-1})\bigr \Vert +\rho \bigl \Vert \textbf{X}^{(N)}_{k-1}-{{\varvec{\psi }}}_{k-1}\Vert . \end{aligned}$$

(46)

For $m\in {{\mathbb {N}}},$ we set $\textbf{u}^{(N)}_m:=\textbf{X}^{(N)}_{m+1}-{\varvec{\Gamma }}(\textbf{X}^{(N)}_{m}).$ Iterating the above inequality, we obtain that

$$\begin{aligned} \bigl \Vert \textbf{X}^{(N)}_{k}-{{\varvec{\psi }}}_{k}\bigr \Vert \le \sum _{m=0}^{k-1} \rho ^m \bigl \Vert \textbf{u}^{(N)}_{k-m-1}\bigr \Vert \le c_k^{-1}\max _{0\le m\le k-1} \bigl \Vert \textbf{u}^{(N)}_m\bigr \Vert , \end{aligned}$$

(47)

where $c_k$ is defined in (31). Therefore,

$$\begin{aligned} P\Bigl ( \max _{1\le k\le K}\bigl \Vert \textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k \bigr \Vert>\varepsilon \Bigr )\le & {} P\Bigl (\max _{0\le m\le K-1}\Vert \textbf{u}^{(N)}_m\Vert>\varepsilon c_K\Bigr ) \nonumber \\\le & {} \sum _{m=0}^{K-1} P\Bigl (\Vert \textbf{u}^{(N)}_m\Vert>\varepsilon c_K \Bigr ) \nonumber \\= & {} \sum _{m=0}^{K-1} P\Bigl (\Vert \textbf{X}^{(N)}_{m+1}-{\varvec{\Gamma }}(\textbf{X}^{(N)}_m)\Vert >\varepsilon c_K \Bigr ). \end{aligned}$$

(48)

Given $\textbf{X}^{(N)}_m,$ the conditional distribution of the random vector $\textbf{Z}^{(N)}_{m+1}=N\textbf{X}^{(N)}_{m+1}$ is the multinomial ${{\mathcal {M}}}\bigl (N,{\varvec{\Gamma }}(\textbf{X}^{(N)}_m)\bigr ).$ Therefore, applying Hoeffding’s inequality (Hoeffding 1963) to conditionally binomial random variables $X_{m+1}(i),$ we obtain that for any $\eta >0,$

$$\begin{aligned} P\Bigl (\Vert \textbf{X}^{(N)}_{m+1}-{\varvec{\Gamma }}(\textbf{X}^{(N)}_m)\Vert>\eta \Bigr )\le & {} \sum _{i=1}^M P\Bigl ( |X^{(N)}_{m+1}(i)- \Gamma _i(\textbf{X}^{(N)}_m)|>\eta \Bigr ) \nonumber \\\le & {} 2M\exp \Bigl (-\frac{N\eta ^2}{2}\Bigr ). \end{aligned}$$

(49)

Therefore,

$$\begin{aligned} P\Bigr ( \max _{1\le k\le K}\bigl \Vert \textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k\bigr \Vert >\varepsilon \Bigl )\le & {} 2KM\exp \Bigl (-\frac{\varepsilon ^2 c_K^2 N}{2}\Bigr ). \end{aligned}$$

(50)

To complete the proof observe that $c_K=\frac{1-\rho }{1-\rho ^K}>1-\rho $ if $\rho <1.$ $\square $

5.3 Proof of Theorem 4.3

Recall $\tau _N(\varepsilon )$ from (28). It follows from (46) that for all $k\in {{\mathbb {Z}}}_+$ and $\varepsilon >0$,

$$\begin{aligned} \Bigg \{\bigl \Vert \textbf{X}^{(N)}_{k+1}-{{\varvec{\psi }}}_{k+1}\bigr \Vert> \varepsilon \Bigg \} \subseteq \Bigg \{\bigl \Vert \textbf{X}^{(N)}_{k+1}-\Gamma (\textbf{X}^{(N)}_k)\bigr \Vert> (1-\rho )\varepsilon \Bigg \}\bigcup \Bigg \{\bigl \Vert \textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k\Vert > \varepsilon \Bigg \}. \end{aligned}$$

and hence

$$\begin{aligned} \Big \{\tau _N(\varepsilon )=k+1\Big \}\subseteq & {} \Big \{\bigl \Vert \textbf{X}^{(N)}_{k+1}-{{\varvec{\psi }}}_{k+1}\bigr \Vert> \varepsilon \Big \} \bigcap \Big \{\bigl \Vert \textbf{X}^{(N)}_k-{{\varvec{\psi }}}_k\bigr \Vert \le \varepsilon \Big \} \nonumber \\\subseteq & {} \Big \{\bigl \Vert \textbf{X}^{(N)}_{k+1}-\Gamma (\textbf{X}^{(N)}_k)\bigr \Vert > (1-\rho )\varepsilon \Big \}. \end{aligned}$$

(51)

Furthermore, using again (47) and Hoeffding’s inequality, one can verify that the analogues of (48) and subsequently of (49) hold almost surely for the conditional probability $P(\,\cdot \,\mid \textbf{X}^{(N)}_k).$ It follows then from the latter that

$$\begin{aligned} P\Bigl (\Vert \textbf{X}^{(N)}_{k+1}-\Gamma (\textbf{X}^{(N)}_k)\Vert >(1-\rho )\varepsilon \,\mid \textbf{X}^{(N)}_k\Bigr )\le 2M\exp \Bigl (-\frac{N(1-\rho )^2\varepsilon ^2}{2}\Bigr ),\qquad P-a.\,s. \end{aligned}$$

Combining the last inequality with (51) and using the strong Markov property of the Markov chain $\textbf{X}^{(N)},$ we obtain that with probability one,

$$\begin{aligned} P\Bigl (\tau _N(\varepsilon )>k+1\,\mid \,\textbf{X}^{(N)}_k,\tau _N(\varepsilon )>k \Bigl ) \ge 1- 2M\exp \Bigl (-\frac{N(1-\rho )^2\varepsilon ^2}{2}\Bigr ). \end{aligned}$$

Applying the tower property of conditional expectations $E(X)=E\big (E(X\mid Y)\big )$ to the indicator of the event $\{\tau _N(\varepsilon )>k+1\}$ in the role of X, $\textbf{X}^{(N)}_k$ in the role of Y, and the conditional expectation $E(\,\cdot \,\mid \tau _N(\varepsilon )>k)$ as E, we deduce from the last inequality that

$$\begin{aligned} P(\tau _N(\varepsilon )>k+1)&=P\bigl (\tau _N(\varepsilon )>k+1,\tau _N(\varepsilon )>k\bigl ) \\&=P\big (\tau _N(\varepsilon )>k+1\,\mid \,\tau _N(\varepsilon )>k\big )P(\tau _N(\varepsilon )>k) \\&=E\Bigl (\,P\big (\tau _N(\varepsilon )>k+1\,\mid \,\textbf{X}^{(N)}_k,\tau _N(\varepsilon )>k \big )\,\mid \,\tau _N(\varepsilon )>k\,\Big )P(\tau _N(\varepsilon )>k) \\&\ge \Big (1- 2M\exp \Bigl (-\frac{N(1-\rho )^2\varepsilon ^2}{2}\Bigr )\Big )P(\tau _N(\varepsilon )>k), \end{aligned}$$

from which the claim of the theorem follows by induction. $\square $

5.4 Proof of Proposition 4.5

Recall ${{\varvec{\psi }}}_k$ from (3). For ${{\mathcal {N}}}\subset \Delta _M,$ denote by $\tau (\textbf{x},{{\mathcal {N}}})$ the first hitting time of the set ${{\mathcal {N}}}$ by ${{\varvec{\psi }}}_k$ with ${{\varvec{\psi }}}_0=\textbf{x}:$

$$\begin{aligned} \tau (\textbf{x},{{\mathcal {N}}})=\inf \{k\in {{\mathbb {Z}}}_+:{{\varvec{\psi }}}_k(\textbf{x})\in {{\mathcal {N}}} \}. \end{aligned}$$

Further, let $t(\textbf{x},{{\mathcal {N}}})$ be the time when ${{\varvec{\psi }}}_k$ becomes trapped within ${{\mathcal {N}}}:$

$$\begin{aligned} t(\textbf{x},{{\mathcal {N}}})=\inf \{n\in {{\mathbb {Z}}}_+:{{\varvec{\psi }}}_k(\textbf{x})\in {{\mathcal {N}}}~\forall \, k\ge n \}. \end{aligned}$$

In both the definitions we use the regular convention that $\inf \emptyset =+\infty .$ Note that, since ${{\varvec{\psi }}}_k$ converges to ${{\varvec{\chi _{eq}}}},$ both $\tau (\textbf{x},{{\mathcal {N}}})$ and $t(\textbf{x},{{\mathcal {N}}})$ are finite for all $\textbf{x}\in \Delta _M^\circ .$

Lemma 5.1

For any open neighborhood ${{\mathcal {N}}}$ of ${{\varvec{\chi _{eq}}}},$ there is an open neighborhood $U_\textbf{x}$ of $\textbf{x}$ and a constant $T_\textbf{x}\in {{\mathbb {N}}}$ such that $t(\textbf{y},{{\mathcal {N}}})\le T_\textbf{x}$ for all $\textbf{y}\in U_\textbf{x}.$

Proof of Lemma 5.1

Under the conditions stated in Assumption 4.4, ${\varvec{\Gamma }}$ is a diffeomorphism (Losert and Akin 1983). In particular, ${\varvec{\Gamma }}$ is a locally Lipschitz map. Hence, we may apply Theorem 1.2 in Bof et al. (2018) to conclude that the interior equilibrium ${{\varvec{\chi _{eq}}}}$ is asymptotically stable for the dynamical system ${{\varvec{\psi }}}_k.$ That is, for any $\varepsilon >0$ there exists $\delta >0$ such that

$$\begin{aligned} \Vert {{\varvec{\psi }}}_0-{{\varvec{\chi _{eq}}}}\Vert \le \delta \quad \Longrightarrow \quad \Vert {{\varvec{\psi }}}_k-{{\varvec{\chi _{eq}}}}\Vert \le \varepsilon ~\,\forall \,k \ge 0. \end{aligned}$$

(52)

Equivalently, for any open neighborhood ${{\mathcal {N}}}$ of ${{\varvec{\chi _{eq}}}}$ there exists an open neighborhood of ${{\varvec{\chi _{eq}}}},$ say ${{\mathcal {U}}},$ included in ${{\mathcal {N}}}$ such that ${\varvec{\Gamma }}({{\mathcal {U}}})\subset {{\mathcal {U}}}$ (that is ${{\mathcal {U}}}$ is forward-invariant for ${\varvec{\Gamma }}$).

Let ${{\mathcal {U}}}$ be a forward-invariant open neighborhood of ${{\varvec{\chi _{eq}}}}$ included in ${{\mathcal {N}}}.$ Since ${{\mathcal {U}}}$ is an open set and ${{\varvec{\psi }}}_{\tau (\textbf{x},{{\mathcal {U}}})}(\textbf{y})$ is a continuous function of the initial state $\textbf{y},$ one can choose an open neighborhood $V_\textbf{x}$ of ${{\varvec{\psi }}}_{\tau (\textbf{x},{{\mathcal {U}}})}(\textbf{x})$ and a constant $\varepsilon _\textbf{x}>0$ such that $V_\textbf{x}\subset {{\mathcal {U}}}$ and, furthermore, $\Vert \textbf{y}-\textbf{x}\Vert <\varepsilon _\textbf{x}$ implies ${{\varvec{\psi }}}_{\tau (\textbf{x},{{\mathcal {U}}})}(\textbf{y})\in V_\textbf{x}.$ Set $U_\textbf{x}=\big \{\textbf{y}\in \Delta _M^\circ : \Vert \textbf{y}-\textbf{x}\Vert <\varepsilon _\textbf{x}\big \}.$ $\square $

Corollary 5.2

Let ${{\mathcal {N}}}$ be any open neighborhood of ${\chi _{eq} }$ and K be a compact subset of $\Delta _M^\circ .$ Then there exists $T=T(K,{{\mathcal {N}}})\in {{\mathbb {N}}}$ such that $t(\textbf{x},{{\mathcal {N}}})\le T$ for all $\textbf{x}\in K.$

Proof of Corollary 5.2

Let $U_\textbf{x}$ and $T_\textbf{x}$ be as in Lemma 5.1. Consider a cover of K by the union of open sets $\bigcup _{x\in K} U_\textbf{x}.$ Since K is compact, we can choose a finite subcover, say $\bigcup _{j=1}^m U_{\textbf{x}_j}.$ Set $T(K,{{\mathcal {N}}})=\max _{1\le j\le m} t(\textbf{x}_j,{{\mathcal {N}}}).$ $\square $

Remark 2

The claim is a “uniform shadowing property” of ${\varvec{\Gamma }}.$ Recall that a sequence of points $(\textbf{x}_k)_{k\le T}$ where $T\in {{\mathbb {N}}}\bigcup \{+\infty \}$ is a $\delta $-pseudo orbit of the dynamical system ${{\varvec{\psi }}}_k$ if $\Vert {\varvec{\Gamma }}(\textbf{x}_k)-\textbf{x}_{k+1}\Vert \le \delta $ for $k=0,\ldots ,T-1$ (if T is finite then the definitions of $\delta $-pseudo orbit and $\delta $-chain are equivalent) and that a $\delta $-pseudo orbit is said to be $\varepsilon $-shadowed by a true orbit $\big ({{\varvec{\psi }}}_k(\textbf{x})\big )_{k\in {{\mathbb {Z}}}_+}$ if $\Vert {{\varvec{\psi }}}_k(\textbf{x})-\textbf{x}_k\Vert \le \varepsilon $ for all $k \ge 0.$ The proposition is basically saying that since ${{\varvec{\psi }}}_k$ is asymptotically stable, then 1) any infinite $\delta $-pseudo orbit starting at $\textbf{x}_0$ is $\varepsilon $-shadowed by the true orbit $\big ({{\varvec{\psi }}}_k(\textbf{x}_0)\big )_{k\ge 0}$ provided that $\delta >0$ is small enough; and 2) the property (more precisely, a possible value of the parameter $\delta $ given $\varepsilon >0$) is uniform over compact sets with respect to the initial point $\textbf{x}_0.$

5.5 Proof of Theorem 4.6

Proof of (i)

Let $\text{1 }_A$ denote the indicator of the event A, that is $\text{1 }_A=1$ if A occurs and $\text{1 }_A=0$ otherwise. For $\theta $ introduced in the statement of the theorem and ${{\mathcal {N}}}_\theta $ defined in (36), let $L_N$ be the number of times the Markov chain $\textbf{X}^{(N)}$ visits ${{\mathcal {N}}}_\theta $ after time zero and before it hits the boundary $\partial (\Delta _M)$ for the first time. That is,

$$\begin{aligned} L_N=\sum _{k=1}^{\nu _N} \text{1 }_{\{\textbf{X}^{(N)}_k\in {{\mathcal {N}}}_\theta \}}, \end{aligned}$$

(53)

where $\nu _N$ is the first hitting time of the boundary introduced in (39). Notice that time zero is not counted in the above formula regardless of whether or not $\textbf{X}^{(N)}_0\in {{\mathcal {N}}}_\theta .$ Since for a fixed $N\in {{\mathbb {N}}},$ the state space of the Markov chain $\textbf{X}^{(N)}$ is finite and

$$\begin{aligned} P_N\big (\textbf{x},\partial \big (\Delta _{M,N}\big )\big )=\sum _{\textbf{y}\in \partial (\Delta _{M,N})}P_N(\textbf{x},\textbf{y})>0 \end{aligned}$$

for all $\textbf{x}\in \Delta _{M,N},$ we have:

$$\begin{aligned} P_\textbf{x}(0\le L_N<\infty )=P_\textbf{x}(1\le \nu _N<\infty )=1\qquad \forall \,\textbf{x}\in \Delta _{M,N}^\circ . \end{aligned}$$

Note that in (53) we are counting the number of visits after time zero to ${{\mathcal {N}}}_\theta ,$ those excluding the initial state $\textbf{X}^{(N)}_0$ from the account even if $\textbf{X}^{(N)}_0\in {{\mathcal {N}}}_\theta .$

Denote

$$\begin{aligned} G_N=\{L_N>0\},\qquad \qquad G_N=\{L_N=0\}, \end{aligned}$$

and

$$\begin{aligned} \xi _N=\inf \{k\ge 0: \textbf{X}^{(N)}_k\in {{\mathcal {N}}}_\theta \}. \end{aligned}$$

(54)

That is $G_N$ is the event that $\textbf{X}^{(N)}$ visits ${{\mathcal {N}}}_\theta $ after time zero at least once, ${{\overline{G}}}_N$ is its complement, and $\xi _N$ is the time of the first visit (rather than the first return, which means $\textbf{X}^{(N)}_0$ counts in (54) if it belongs to ${{\mathcal {N}}}_\theta $). Note that $P_\textbf{x}(\xi _N=+\infty )>0$ for all $\textbf{x}\in \Delta _{M,N}\backslash {{\mathcal {N}}}_\theta $ because the chain can escape to the boundary of the simplex before it visits ${{\mathcal {N}}}_\theta ,$ and the boundary is an absorbing set for $\textbf{X}^{(N)}.$

Let $\theta >0$ and $\eta >0$ be as in the statement of the theorem, and recall $K_{\theta ,\eta }$ from (37). Proposition 4.5 implies that there exist (deterministic, non-random) real constant $\varepsilon _\theta >0$ and integer $T>0$ such that

$$\begin{aligned} \textbf{X}^{(N)}_0\in K_{\theta ,\eta }~\text{ and }~\big \Vert \textbf{X}^{(N)}_{k+1}-{\varvec{\Gamma }}\big (\textbf{X}^{(N)}_k\big )\big \Vert <\varepsilon _\theta ~ \text{ for }~k \le T-1\quad \Rightarrow \quad \textbf{X}^{(N)}_T\in {{\mathcal {N}}}_\theta . \end{aligned}$$

The vector field ${\varvec{\Gamma }}$ satisfies the conditions of Theorem 4.2, and it follows from (30) that

$$\begin{aligned} P_\textbf{x}({{\overline{G}}}_N)\le 2T Me^{-\frac{N\varepsilon _\theta ^2c_T^2}{2}}\qquad \qquad \forall \,\textbf{x}\in K_{\theta ,\eta }, \end{aligned}$$

(55)

where $c_T$ is defined in (31). For $\textbf{x}\in K_{\theta ,\eta },$ write

$$\begin{aligned} P_\textbf{x}({{\mathcal {E}}}_N)=P_\textbf{x}({{\mathcal {E}}}_N; G_N)+P_\textbf{x}({{\mathcal {E}}}_N; {{\overline{G}}}_N), \end{aligned}$$

and notice that $\lim _{N\rightarrow \infty }P_\textbf{x}({{\overline{G}}}_N)=0$ by virtue of (55). Moreover,

$$\begin{aligned} P_\textbf{x}({{\mathcal {E}}}_N; G_N)=P_\textbf{x}({{\mathcal {E}}}_N\,\mid \,G_N)P_\textbf{x}(G_N)=P_{\phi _{N,\textbf{x}}}({{\mathcal {E}}}_N)P_\textbf{x}(G_N), \end{aligned}$$

(56)

where $\phi _{N,\textbf{x}}$ is the distribution of $\textbf{X}^{(N)}_{\xi _N}$ under the conditional measure $P_\textbf{x}(\cdot \,\mid \,G_N),$ with $\xi _N$ introduced in (54). Hence, in order to prove the first part of the theorem, it suffices to show that for any sequence of (discrete) probability measures $\phi _N,$ each supported on ${{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}$ with the corresponding N, we have

$$\begin{aligned} \lim _{N\rightarrow \infty } P_{\phi _N}({{\mathcal {E}}}_N)=1. \end{aligned}$$

Since

$$\begin{aligned} P_{\phi _N}({{\mathcal {E}}}_N)=\sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} P_\textbf{y}({{\mathcal {E}}}_N)\,\phi _N(\textbf{y}), \end{aligned}$$

it suffices to show that

$$\begin{aligned} \lim _{N\rightarrow \infty } P_\textbf{y}({{\mathcal {E}}}_N)=1, \end{aligned}$$

and the convergence is uniform over $\textbf{y}\in {{\mathcal {N}}}_\theta .$ More precisely,

Lemma 5.3

For any $\varepsilon >0$ there exists an integer $N_0=N_0(\varepsilon )>0$ such that

$$\begin{aligned} P_\textbf{y}({{\mathcal {E}}}_N)\ge 1-\varepsilon \end{aligned}$$

for all $N>N_0$ and $\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}.$

Proof of Lemma 5.3

Recall $L_N$ from (53). If $L_N\ge 1,$ let $S_{1,N},\ldots , S_{L_N,N}$ be successive return times of the Markov chain $\textbf{X}^{(N)}$ to ${{\mathcal {N}}}_\theta .$ Namely, we set $S_{0,N}=0$ and define recursively,

$$\begin{aligned} S_{k,N}=\inf \{j>S_{k-1,N}: \textbf{X}^{(N)}_j\in {{\mathcal {N}}}_\theta \}. \end{aligned}$$

As usual, we assume that $\inf \emptyset =+\infty .$ Let $S_{L_N+1,N}$ be the first hitting time of the boundary:

$$\begin{aligned} S_{L_N+1,N}=\nu _N, \end{aligned}$$

where $\nu _N$ is defined in (39). Finally, for $k\in {{\mathbb {Z}}}_+$ define

$$\begin{aligned} \textbf{Y}^{(N)}_k:=\left\{ \begin{array}{ll} \textbf{X}^{(N)}_{S_{k,N}}~&{}\text{ if }~k\le L_N+1 \\ \textbf{X}^{(N)}_{S_{L_N+1,N}}~&{}\text{ if }~k> L_N+1. \end{array} \right. \end{aligned}$$

Then $\big (\textbf{Y}^{(N)}_j\big )_{j\in {{\mathbb {N}}}}$ is a Markov chain on $\big ({{\mathcal {N}}}_\theta \bigcup \partial (\Delta _M)\big )\bigcap \Delta _{M,N}$ with all states on the boundary $\partial \big (\Delta _{M,N}\big )$ being absorbing. Denote by $Q_N$ transition kernel of this chain.

Recall (4) and (13), and define

$$\begin{aligned} \partial _{M,{\varvec{\Gamma }}}^*=\big \{\textbf{x} \in \partial _M:\,|C(\textbf{x})|=M-1~\text{ and }~x(i)=0~\text{ for } \text{ some }~i\in J^*\big \}. \end{aligned}$$

(57)

In words, $\partial _{M,{\varvec{\Gamma }}}^*$ is precisely the subset of the boundary such that for all $N\in {{\mathbb {N}}},$ the event ${{\mathcal {E}}}_N$ occurs if and only if $\textbf{X}^{(N)}_{\nu _N}\in \partial _{M,{\varvec{\Gamma }}}^*,$ where $\nu _N$ is the first hitting time of the boundary defined in (39). We then have for any $\textbf{x}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}:$

$$\begin{aligned} P_\textbf{x}({{\mathcal {E}}}_N)\ge & {} \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{\textbf{z}\in \partial _{M,{\varvec{\Gamma }}}^*} Q_N^m(\textbf{x},\textbf{y})P_N (\textbf{y},\textbf{z}) \nonumber \\\ge & {} (1-\alpha -\theta )^N \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} Q_N^m(\textbf{x},\textbf{y}) \nonumber \\= & {} (1-\alpha -\theta )^N \sum _{k=0}^\infty P_\textbf{x}(L_N \ge k)\nonumber \\= & {} (1-\alpha -\theta )^N\big (1+E_\textbf{x}(L_N)\big ), \end{aligned}$$

(58)

where $L_N$ is the number of visits to ${{\mathcal {N}}}_\theta $ after time zero introduced in (53). In the above formula, $P_\textbf{x}({{\mathcal {E}}}_N)$ is first bounded from below by the probability of the event “$\textbf{X}^{(N)}$ hits the boundary at one of the vertices of the set $\partial _{M,{\varvec{\Gamma }}}^*,$ moreover, the last site visited by $\textbf{X}^{(N)}$ before passing to the boundary lies within ${{\mathcal {N}}}_\theta $”. The index m in (58) counts the number of visits to ${{\mathcal {N}}}_\theta $ after time zero, $\textbf{y}$ is the last visited state at $\Delta _M^\circ ,$ and the estimate

$$\begin{aligned} P_N(\textbf{y},\textbf{z})\ge (1-\alpha -\theta )^N,\qquad \qquad \textbf{y}\in {{\mathcal {N}}}_\theta ,\, \textbf{z}\in \partial _{M,{\varvec{\Gamma }}}^*, \end{aligned}$$

is a direct implication of (7) and the definition of ${{\mathcal {N}}}_\theta .$

To complete the proof of Lemma 5.3, we will now decompose an (almost surely) certain event “$\textbf{X}^{(N)}$ hits the boundary eventually” in a way similar to (58) and compare two estimates. Let (cf. (37))

$$\begin{aligned} \partial _{\theta ,\eta }:=\big \{x\in \Delta _M^\circ \backslash {{\mathcal {N}}}_\theta :\max _{j\not \in J^*}x(j)< \eta \bigg \}, \end{aligned}$$

so that the interior of the simplex partitions as $\Delta _M^\circ ={{\mathcal {N}}}_\theta \bigcup K_{\theta ,\eta }\bigcup \partial _{\theta ,\eta }.$ Similarly to (58), for the probability to eventually hit the boundary starting from a point $\textbf{x}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N},$ we have:

$$\begin{aligned} 1\le & {} \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{\textbf{z}\in \partial _{M,{\varvec{\Gamma }}}^*} Q_N^m(\textbf{x},\textbf{y})P_N (\textbf{y},\textbf{z}) \nonumber \\{} & {} + \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{\textbf{z}\in \partial _{\theta ,\eta }\backslash \partial _{M,{\varvec{\Gamma }}}^*} Q_N^m(\textbf{x},\textbf{y})P_N (\textbf{y},\textbf{z}) \nonumber \\{} & {} + \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{z\in \Delta _{M,N}\backslash (N_\theta \bigcup \partial _{\theta ,\eta })} Q_N^m(\textbf{x},\textbf{y})P_N (\textbf{y},\textbf{z})P_\textbf{z}\big ({{\overline{G}}}_N\big ) \nonumber \\\le & {} \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{\textbf{z}\in \partial _{M,{\varvec{\Gamma }}}^*} Q_N^m(\textbf{x},\textbf{y})P_N(\textbf{y},\textbf{z}) \nonumber \\{} & {} + (1-\textbf{e}ta+\theta )^{\eta N}\big (1+E_\textbf{x}(L_N)\big ) + \big (1+E_\textbf{x}(L_N)\big )\cdot 2M Te^{-\frac{N\varepsilon _\theta ^2c_T^2}{2}}, \end{aligned}$$

(59)

where we used (55) to estimate $P_\textbf{z}\big ({{\overline{G}}}_N\big ).$

It follows from (58) and the assumptions of the theorem that

$$\begin{aligned}{} & {} \frac{(1-\textbf{e}ta+\theta )^{\eta N}\big (1+E_\textbf{x}(L_N)\big ) + \big (1+E_\textbf{x}(L_N)\big )\cdot 2MT e^{-\frac{N\varepsilon _\theta ^2c_T^2}{2}}}{\sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{\textbf{z}\in \partial _{M,{\varvec{\Gamma }}}^*} Q_N^m(\textbf{x},\textbf{y})P_N (\textbf{y},\textbf{z})} \\{} & {} \quad \le \frac{(1-\textbf{e}ta+\theta )^{\eta N}\big (1+E_\textbf{x}(L_N)\big ) + \big (1+E_\textbf{x}(L_N)\big )\cdot 2MT e^{-\frac{N\varepsilon _\theta ^2c_T^2}{2}}}{(1-\alpha -\theta )^N\big (1+E_\textbf{x}(L_N)\big )} \longrightarrow _{N\rightarrow \infty } 0, \end{aligned}$$

where (58) is used to estimate the denominator and therefore justify the first inequality. Combining the last result with (58) and (59), we obtain that

$$\begin{aligned} 1\le & {} \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{\textbf{z}\in \partial _{M,{\varvec{\Gamma }}}^*} Q_N^m(\textbf{x},\textbf{y})P_N(\textbf{y},\textbf{z}) \\{} & {} + (1-\textbf{e}ta+\theta )^{\eta N}\big (1+E_\textbf{x}(L_N)\big ) + \big (1+E_\textbf{x}(L_N)\big )\cdot 2M Te^{-\frac{N\varepsilon _\theta ^2c_T^2}{2}} \\\le & {} P_\textbf{x}({{\mathcal {E}}}_N)+o\big (P_\textbf{x}({{\mathcal {E}}}_N)\big )~\text{ as }~N\rightarrow \infty . \end{aligned}$$

Hence,

$$\begin{aligned} \lim _{N\rightarrow \infty } P_\textbf{x}({{\mathcal {E}}}_N)=\lim _{N\rightarrow \infty }\bigg ( \sum _{m=0}^\infty \sum _{\textbf{y}\in {{\mathcal {N}}}_\theta \bigcap \Delta _{M,N}} \sum _{\textbf{z}\in \partial _{M,{\varvec{\Gamma }}}^*} Q_N^m(\textbf{x},\textbf{y})P_N (\textbf{y},\textbf{z})\bigg ) =1. \end{aligned}$$

The proof of Lemma 5.3 is complete. $\square $

This completes the proof of part (i) of the theorem. $\square $

Proof of (ii)

To show that $\lim _{N\rightarrow \infty }E(\nu _N)=+\infty ,$ write, similarly to (56),

$$\begin{aligned} E(\nu _N)\ge & {} E(\nu _N;G_N)=E_{\varsigma _N}(\nu _N)P(G_N), \end{aligned}$$

where $\varsigma _N$ is the distribution of $\textbf{X}^{(N)}_{\xi _N}$ under the conditional measure $P(\cdot \,\mid \,G_N),$ with $\xi _N$ introduced in (54). Using Proposition 4.5 with a compact set K that includes $\textbf{x}_0$ (the limit of the initial state of $\textbf{X}^{(N)}$ introduced in the conditions of the theorem) and leveraging the same argument that we employed in order to obtain (55), one can show that $\lim _{N\rightarrow \infty } P(G_N)=1.$ Furthermore, the estimate in (55) shows that $L_N,$ the number of visits to ${{\mathcal {N}}}_\theta $ after time zero is stochastically dominated from below under $P_{\varsigma _N}$ by a geometric variable with probability of success $e^{-\gamma N}$ for any $\gamma >0$ sufficiently small. Thus,

$$\begin{aligned} E_{\varsigma _N}(\nu _N) \ge E_{\varsigma _N}(L_N)\ge e^{\gamma N}. \end{aligned}$$

Therefore, $\lim _{N\rightarrow \infty } E_{\varsigma _N}(\nu _N)=+\infty ,$ and the proof of the second part of the theorem is complete. $\square $

5.6 Proof of Proposition 4.7

Write

$$\begin{aligned} \Gamma _i(\textbf{x})=\frac{1+B x(i)}{1+ \textbf{x}^T\textbf{B}{} \textbf{x}}x(i), \end{aligned}$$

where $\textbf{B}=\frac{\omega }{1-\omega }{} \textbf{A}.$ By the definition of the interior equilibrium, there is a constant $r>0$ such that

$$\begin{aligned} B{\chi _{eq} }(i)=r\qquad \forall \,i\in S_M. \end{aligned}$$

(60)

Therefore, for $\textbf{z}={{\varvec{\chi _{eq}}}}+ \textbf{u}\in \Delta _M^\circ ,$ as $\textbf{u}$ approaches zero in ${{\mathbb {R}}}^M,$ we have:

$$\begin{aligned} \Gamma _i(\textbf{z})=\frac{1+r+B u(i)}{1+r+\textbf{u}^T\textbf{B} {{\varvec{\chi _{eq}}}}+{{\varvec{\chi _{eq}}}}^T \textbf{B}{} \textbf{u}}\big ({\chi _{eq} }(i)+u(i)\big ) +o(\Vert \textbf{u}\Vert ). \end{aligned}$$

Since $\textbf{B}^T=\textbf{B}$ and $\sum _{i\in S_M}u(i)=0,$ we have:

$$\begin{aligned} {{\varvec{\chi _{eq}}}}^T\textbf{B}{} \textbf{u}=\textbf{u}^T \textbf{B} {{\varvec{\chi _{eq}}}}=r\sum _{i\in S_M}u(i)=0. \end{aligned}$$

Thus, as $\textbf{u}$ approaches $\textbf{0}\in {{\mathbb {R}}}^M,$

$$\begin{aligned} \Gamma _i(\textbf{z})= & {} \frac{1+r+B u(i)}{1+r}\big ({\chi _{eq} }(i)+u(i)\big ) +o(\Vert u\Vert ) \\= & {} \big ({\chi _{eq} }(i)+u(i)\big )+\frac{{\chi _{eq} }(i)\cdot Bu(i)}{1+r}+o(\Vert u\Vert ). \end{aligned}$$

Since ${\varvec{\Gamma }}$ is a diffeomorphism (Losert and Akin 1983), we can compare this formula with the first-order Taylor expansion of ${\varvec{\Gamma }}$ around ${{\varvec{\chi _{eq}}}}.$ It follows that the Jacobian matrix of ${\varvec{\Gamma }}$ evaluated at ${{\varvec{\chi _{eq}}}}$ is given by

$$\begin{aligned} D_{ij}:=\frac{\partial \Gamma _i}{\partial x_j}({{\varvec{\chi _{eq}}}})=\delta _{ij} +\frac{{\chi _{eq} }(i)B_{i,j}}{1+r},\qquad i.j\in S_M, \end{aligned}$$

(61)

where $\delta _{ij}$ is the Kronecker symbol.

Remark 3

The partial derivative computed in (61) is a derivative of ${\varvec{\Gamma }}$ in the probability simplex $\Delta _M,$ which can be thought as an “${{\mathbb {R}}}^M$ with additional constraints.” In order to take the constraints into consideration, we vary the variables around an inner point of $\Delta _M$ and use directly the definition of the derivative rather than the standard “algebra of derivatives”. In comparison to the Fréchet derivative of ${\varvec{\Gamma }}$ with the domain in the Banach space $\Delta _M$ calculated in (61), a similar expression for the derivative $\frac{\partial \Gamma _i}{\partial x_j}({{\varvec{\chi _{eq}}}})$ in ${{\mathbb {R}}}^M$ contains an extra term $-2r\chi _{eq}(i)/(1 + r).$ To illustrate the point, consider as an example $M=2$ and ${\varvec{\Gamma }}=\textbf{I},$ the identity matrix. Then $\chi (1)=\chi (2)=r=1/2$ and (61) suggests that $\textbf{D}=\frac{4}{3}{} \textbf{I}.$ This can be verified directly: for any $t\in (-1/2,1/2)$ and $\textbf{u}=(t,-t)^T,$ we have

$$\begin{aligned} {\varvec{\Gamma }}({{\varvec{\chi _{eq}}}}+\textbf{u})-{{\varvec{\chi _{eq}}}}-\textbf{D}\textbf{u}= & {} \frac{1}{3/2+2t^2}\Big (\frac{3}{4}+2t+t^2,\frac{3}{4}-2t+t^2\Big )^T\\{} & {} - \Big (\frac{1}{2}+\frac{4}{3}t,\frac{1}{2}-\frac{4}{3}t\Big )^T \\= & {} \frac{1}{3/2+2t^2}\Big (-\frac{8}{3}t^3,\frac{8}{3}t^3\Big )^T, \end{aligned}$$

and hence, when ${\varvec{\Gamma }}$ is considered as a function with the domain of definition $\Delta _2$ rather than ${{\mathbb {R}}}^2,$ $\lim _{\Vert u\Vert \rightarrow 0}\frac{\Vert {\varvec{\Gamma }}({{\varvec{\chi _{eq}}}}+\textbf{u})-{{\varvec{\chi _{eq}}}}-\textbf{D}{} \textbf{u}\Vert }{\Vert u\Vert }=0$ as desired.

It is easy to verify that

$$\begin{aligned} \textbf{H}:=\frac{1+r}{1+2r}{} \textbf{D} \end{aligned}$$

is a transpose of a strictly positive stochastic matrix, namely $H_{ij}>0$ for all $i,j\in S_M$ and $\sum _{i\in S_M} H_{ij}=1$ for all $j\in S_M.$

Note that ${{\varvec{\chi _{eq}}}}$ is a left eigenvector of $\textbf{H}^T$ corresponding to the principle eigenvalue 1. Indeed, assuming $i\ne j,$ we deduce from (61) and the fact that $\textbf{B}^T=\textbf{B}:$

$$\begin{aligned} {\chi _{eq} }(i) H^T_{ij}=\frac{1+r}{1+2r}\frac{{\chi _{eq} }(i)B_{i,j}{\chi _{eq} }(j)}{1+r}={\chi _{eq} }(j) H^T_{ji}. \end{aligned}$$

Thus, stochastic matrix $\textbf{H}^T$ is transition kernel of a reversible Markov chain with stationary distribution ${{\varvec{\chi _{eq}}}}.$ From the reversibility of the Markov transition matrix $\textbf{H}^T,$ it follows that all the eigenvalues of $\textbf{H},$ and hence also of $\textbf{D},$ are real numbers [see, for instance, Brémaud (1998)].

Recall $W_M$ from (20). The asymptotic stability of the equilibrium (see the paragraph above (52)) implies that the largest absolute value of an eigenvalue of the linear operator $\textbf{D}$ restricted to the linear space $W_M$ is at most one (Perko 1991). It follows from (61) that if $\lambda $ is this eigenvalue and $|\lambda |=1,$ then there is an eigenvalue $\eta $ of the matrix with entries $\frac{{\chi _{eq} }(i)B_{i,j}}{1+r}$ such that $\eta \in \{-2,0\}.$ To complete the proof of the proposition it remains to show that this is indeed impossible. Toward this end, denote by $\textbf{H}_{{\varvec{\chi }}}$ the matrix with entries $\frac{{\chi _{eq} }(i)B_{i,j}}{1+r}$ and observe that

$$\begin{aligned} \det (\textbf{H}_{{\varvec{\chi }}})=(1+r)^{-M}\prod _{i=1}^M {\chi _{eq} }(i)\cdot \det \textbf{B}. \end{aligned}$$

Since $\textbf{B}$ is invertible by our assumptions, this shows that 0 cannot be an eigenvalue of $\textbf{H}_\chi .$ Furthermore, if $\eta =-2$ is an eigenvalue of $\textbf{H}_{{\varvec{\chi }}},$ then there exists ${{\varvec{\nu }}}\in W_M$ such that

$$\begin{aligned} {{\varvec{\chi _{eq}}}}\circ \textbf{B}{{\varvec{\nu }}}=-2(1+r){{\varvec{\nu }}}, \end{aligned}$$

where $\circ $ denotes the element-wise product introduced in (6). Recall now that $\textbf{B}=h\textbf{A},$ where $h=\frac{\omega }{1-\omega }.$ Plugging this expression into the previous formula, we obtain

$$\begin{aligned} {{\varvec{\chi _{eq}}}}\circ \textbf{A}{{\varvec{\nu }}}=-2(1+r)h^{-1}{{\varvec{\nu }}} \end{aligned}$$

(62)

Now, choose a constant $\eta >1$ so close to one that $\widetilde{\textbf{A}}=\eta \textbf{A}$ satisfies Assumption 4.4, namely $\widetilde{\textbf{A}}$ is invertible and has exactly one positive eigenvalue.

Denote $\widetilde{\textbf{B}}:=h\widetilde{\textbf{A}}=\eta \textbf{B}.$ The analogue of (60) for $\widetilde{\textbf{B}}$ reads

$$\begin{aligned} {{\widetilde{B}}}{\chi _{eq} }(i)=\eta B{\chi _{eq} }(i)=r\eta \qquad \forall \,i\in S_M. \end{aligned}$$

In this sense, $r\eta $ plays the same role for the pair $\widetilde{\textbf{A}}$ and $\widetilde{\textbf{B}}$ as r does for $\textbf{A}$ and $\textbf{B}.$ Moreover, (62) implies that

$$\begin{aligned} \frac{1}{1+r\eta }{{\varvec{\chi _{eq}}}} \circ \widetilde{\textbf{B}}{{\varvec{\nu }}}=-2\frac{\eta (1+r)}{1+r\eta } {{\varvec{\nu }}}. \end{aligned}$$

Since $\left|\frac{\eta (1+r)}{1+r\eta }\right|>1,$ it follows that the spectral radius of the Jacobian of the vector field $\widetilde{\varvec{\Gamma }}(\textbf{x}):=\textbf{x}\circ \frac{1+h\widetilde{\textbf{A}}{} \textbf{x}}{1+h\textbf{x}\widetilde{\textbf{A}}\textbf{x}}$ evaluated at ${{\varvec{\chi _{eq}}}}$ is greater than one. This is a clear contradiction, since matrix $\widetilde{\textbf{A}}$ is assumed to satisfy the conditions of Theorem 4, and hence the spectral radius must be at most one. $\square $

5.7 Proof of Corollary 4.8

For $\varepsilon >0,$ denote ${{\mathcal {U}}}^{(N)}_\varepsilon :=\big \{\textbf{x}\in \Delta _{M,N}:\Vert \textbf{x}-{{\varvec{\chi _{eq}}}}\Vert <\varepsilon \big \}.$ Since ${\varvec{\Gamma }}$ is a diffeomorphism under the conditions stated in Assumption 4.4 (Losert and Akin 1983), Proposition 4.7 implies that there exist $\varepsilon >0$ and $\rho \in (0,1)$ such that the spectral radius of the Jacobian is strictly less than $\rho $ within ${{\mathcal {U}}}^{(N)}_\varepsilon .$ Thus, ${\varvec{\Gamma }}({{\mathcal {U}}}^{(N)}_\varepsilon )\subset {{\mathcal {U}}}^{(N)}_{\rho \varepsilon }.$ Without loss of generality we may assume that N is so large that both ${{\mathcal {U}}}^{(N)}_{\rho \varepsilon }$ and ${{\mathcal {U}}}^{(N)}_\varepsilon \backslash {{\mathcal {U}}}^{(N)}_{\rho \varepsilon }$ are nonempty. Indeed, for any $N_0\in {{\mathbb {N}}}$ and $b>0$ the claim is trivially true for all $N\le N_0$ if we choose $C>0$ in such a way that $1-Ce^{-bN_0}<0.$

Since ${\varvec{\Gamma }}({{\mathcal {U}}}^{(N)}_\varepsilon )\subset {{\mathcal {U}}}^{(N)}_{\rho \varepsilon }$ and $P_N(\textbf{x},\textbf{y})>0$ for all $\textbf{x},\textbf{y}\in \Delta _{M,N}^\circ $ (and hence, by virtue of (41), the quasi-stationary distribution $\mu _N$ puts strictly positive weights on all $\textbf{x}\in \Delta _{M,N}^\circ $), for suitable $C>0$ and $b>0$ we have:

$$\begin{aligned} \lambda _N \mu _N\big ({{\mathcal {U}}}^{(N)}_{\rho \varepsilon }\big )= & {} \mu _N P_{\circ ,N}\big ({{\mathcal {U}}}^{(N)}_{\rho \varepsilon }\big )=\sum _{\textbf{x}\in \Delta _{M,N}^\circ } \sum _{\textbf{y}\in {{\mathcal {U}}}^{(N)}_{\rho \varepsilon }} \mu _N(x)P_N(\textbf{x},\textbf{y}) \nonumber \\\ge & {} \sum _{\textbf{x}\in {{\mathcal {U}}}^{(N)}_\varepsilon } \sum _{\textbf{y}\in {{\mathcal {U}}}^{(N)}_{\rho \varepsilon }} \mu _N(\textbf{x})P_N(\textbf{x},\textbf{y}) \ge \mu _N\big ({{\mathcal {U}}}^{(N)}_\varepsilon )\big (1-Ce^{-b N}\big ) \nonumber \\> & {} \mu _N\big ({{\mathcal {U}}}^{(N)}_{\rho \varepsilon }\big )\big (1-Ce^{-b N}\big ), \end{aligned}$$

(63)

which implies the claim. To obtain the crucial second inequality in (63)

$$\begin{aligned} \sum _{\textbf{y}\in {{\mathcal {U}}}^{(N)}_{\rho \varepsilon }} P_N(\textbf{x},\textbf{y})=P_N\big (\textbf{x},{{\mathcal {U}}}^{(N)}_{\rho \varepsilon }\big )\ge 1-Ce^{-bN} \end{aligned}$$

one can rely on the result of Proposition 4.5 and use an argument similar to the one which led us to (55). $\square $

6 Conclusion

In this paper we examined the long-term behavior of a general Wright–Fisher model. Our results hold for a broad class of Wright–Fisher models, including several instances that have been employed in experimental biology applications. Our primary motivation came from research conducted on gut gene-microbial ecological networks (Chapkin et al.).

The main results of the paper are:

A maximization principle for deterministic replicator dynamics stated in Theorem 3.2. In this theorem, we show that the average of a suitably chosen reproductive fitness of the mean-field model is non-decreasing with time, and is strictly increasing outside of a certain chain-recurrent set. The proof relies on a “fundamental theorem of dynamical systems” of Conley (1978).
Gaussian approximation scheme constructed for the Wright–Fisher model in Theorem 4.1.
Theorem 4.2, which gives an exponential lower bound on the first time that the error of mean-field approximation exceeds a given threshold.
Theorem 4.6, which specifies the route to extinction in the case that the following three conditions are satisfied: (1) the mean-field dynamical system has a unique interior equilibrium; (2) boundary of the probability simplex is repelling for the mean-field map; and (3) the stochastic mixing effect (in certain rigorous sense) dominates extinction forces.
Proposition 4.7 that elaborates on the mechanism of convergence to the equilibrium in the mean-field model of Theorem 4.6 by establishing a stringent contraction near the equilibrium property for the systems considered in the theorem.

We next outline several directions for extensions and future research.

1.
The competitive exclusion principle predicts that only the fittest species will survive in a competitive environment. In Hutchinson (1961), Hutchinson addressed the paradox of the plankton which is a discrepancy between this principle and the observed diversity of the plankton. One of the suggested resolutions of this paradox is that the environmental variability might promote the diversity. Several mathematical models of species persistence in time-homogeneous environments have been proposed (Benaïm and Schreiber 2019; Carja et al. 2014; Dean and Shnerb 2019; Li and Cheng 2019; Li and Chesson 2016; Mahdipour-Shirayeh et al. 2017). In order to maintain the diversity, environmental factors must fluctuate within a broad range supporting a variety of contradicting trends and thus promoting the survival of various species. The theory of permanence and its stochastic counterpart, the theory of stochastic persistence (Benaïm and Schreiber 2019), is an alternative (but not necessarily mutually excluding) mathematical framework able to explain the coexistence of interacting species in terms of topological properties of the underlying dynamical system. Extension of our results to systems in fluctuating environments and fluctuating size populations (e.g., Park and Traulsen 2017) seems to be a natural direction for future research with important implications regarding the dynamical interactions between microbial communities, host gene expression, and nutrients in the human gut.
2.
We believe that Theorem 4.6 can be extended to a broad class of stochastic population models with mean-field vector-field ${\varvec{\Gamma }}$ promoting cooperation between the particles (cells/microorganisms in applications). Among specific biological mechanisms promoting forms of cooperation are: public goods (Archetti and Scheuring 2012; Chuang et al. 2009; Constable et al. 2016; Kurokawa and Ihara 2009; McAvoy et al. 2018; Zhang et al. 2013), relative nonlinearity of competition (Arnoldt et al. 2012; Chesson 2018; Harmand et al. 2019), population-level randomness (Constable et al. 2016; Coron et al. 2019; Kroumi and Lessard 2015), and diversification of cells into distinct ecological niches (Good et al. 2018; Rohlfs and Hoffmeist 2003). Public goods models are of a special interest for us because of a possibility of an application to the study of a gene-microbial-metabolite ecological network in the human gut (Chapkin et al.). A public good is defined in Archetti and Scheuring (2012) as any benefit that is “simultaneously non-excludable (nobody can be excluded from its consumption) and non-rivalrous (use of the benefit by one particle does not diminish its availability to another particle)”. For instance, an alarm call against a predator is a pure public good while natural resources (e. g. the air) and enzyme production in bacteria can be approximated to public goods (Archetti and Scheuring 2012). Public goods games in microbial communities have been considered by several authors, see Archetti and Scheuring (2012), Cavaliere et al. (2017), Chuang et al. (2009), Zhang et al. (2013) for a review. An important feature of the public goods game is the intrinsic non-linearity of the fitness (Archetti and Scheuring 2012; Zhang et al. 2013).
3.
It seems plausible that Theorem 4.6 can be extended to a multitype Moran model. The mean-field dynamics as well the quasi-stationary distributions for such models are relatively well studied.
4.
In contrast to non-rivalrous interactions captured by public goods games, competitive interaction between cells can lead to negative frequency dependence, see Broekman et al. (2019), Broom and Rychtář (2013) and references therein. A classical example of negative frequency dependence is animals foraging on several food patches (Broom and Rychtář 2013). We believe it is of interest to explore the path to extinction of this class of models in the spirit of analysis undertaken in Sect. 4.3.
5.
An interesting instance of the Wright–Fisher model with an replicator update rule was introduced in Kroumi and Lessard (2015). This is a mathematical model for cooperation in a phenotypic space, and is endowed with a rich additional structure representing phenotypic traits of the cells in addition to the usual division into two main categories/types, namely cooperators and defectors. In the exact model considered in Kroumi and Lessard (2015), the phenotypic space is modelled as a Euclidean space ${{\mathbb {Z}}}^d.$ However, a variation with finite population (for instance, restricting the phenotypic space to a cube in ${{\mathbb {Z}}}^d$) can be studied in a similar manner. Extending the result of Theorem 4.6 to a variation of this Wright–Fisher model is challenging, but could potentially contribute to a better understanding of mechanisms forming extinction trajectory in stochastic population models.
6.
In Sect. 4.3 we investigated the phenomenon of extinction of a long-lived closed stochastic population. The manner in which the population is driven to extinction depends on whether the extinction states are repellers or attractors of the mean-field dynamics (cf. Assaf and Meerson 2010; Assaf and Mobilia 2011; Park and Traulsen 2017). In the latter case the extinction time is short because with a high probability the stochastic trajectory follows the mean-field path closely for a long enough time to get to a close proximity (in a suitable phase space) of the extinction set, and then it becomes fixed there via small noise fluctuations. In contrast, in the former scenario, attractors of the deterministic dynamics act as traps or pseudo-equilibria for the stochastic model and thus create metastable stochastic states. While Theorem 4.6 is concerned with a model of this kind, it seems plausible that the result can be extended to a situation where all attractors of the mean-field model belong to the boundary of the simplex $\Delta _M.$
7.
The proof of Theorem 4.1 relies on a representation of $\textbf{X}^{(N)}_k$ as partial sums of i. i. d. random indicators (see equations (44) and (45)), and can be extended beyond the i. i. d. setting. In particular, it can be generalized to cover the Cannings exchangeable model and models in fluctuating environments. We believe that with a proper adjustment, the result also holds for a variation of the model with N depending on time (Gompert 2016; Grosjean and Huillet 2017; Iizuka 2010) (this is, for instance, well known for branching processes with immigration). This theorem can also be extended to the paintbox models (Boenkost et al. 2019) and to a class of Cannings models in phenotypic spaces with weak and fast decaying dependency between phenotypes (Kroumi and Lessard 2015).
8.
Finally, proving Conjecture 4.9 would shed further light on the relations among metastability, quasi-stationary distributions, and the extinction trajectory for the underlying Wright–Fisher model.

References

Akhmet M, Fen MO (2016) Input-output mechanism of the discrete chaos extension. In: Afraimovich V, Machado JAT, Zhang J (eds) Complex motions and chaos in nonlinear systems, Nonlinear Syst. Complex., vol 15. Springer, Berlin, pp 203–233
Antal T, Traulsen A, Ohtsuki H, Tarnita CE, Nowak MA (2009) Mutation-selection equilibrium in games with multiple strategies. J Theor Biol 258:614–622
MathSciNet MATH Google Scholar
Ao P (2005) Laws in Darwinian evolutionary theory. Phys Life Rev 2:117–156
Google Scholar
Archetti M, Scheuring I (2012) Review: game theory of public goods in one-shot social dilemmas without assortment. J Theor Biol 299:9–20
MathSciNet MATH Google Scholar
Arnoldt H, Timme M, Grosskinsky S (2012) Frequency-dependent fitness induces multistability in coevolutionary dynamics. J R Soc Interface 9:3387–3396
Google Scholar
Ashcroft P, Michor F, Galla T (2015) Stochastic tunneling and metastable states during the somatic evolution of cancer. Genetics 199:1213–1228
Google Scholar
Assaf M, Meerson B (2008) Noise enhanced persistence in a biochemical regulatory network with feedback control. Phys Rev Lett 100:058105
Google Scholar
Assaf M, Meerson B (2010) Extinction of metastable stochastic populations. Phys Rev E 81:021116
Google Scholar
Assaf M, Mobilia M (2011) Fixation of a deleterious allele under mutation pressure and finite selection intensity. Phys Rev E 275:93–103
MathSciNet MATH Google Scholar
Auletta V, Ferraioli D, Pasquale F, Persiano G (2018) Metastability of logit dynamics for coordination games. Algorithmica 80:3078–3131
MathSciNet MATH Google Scholar
Aydogmus O (2016) On extinction time of a generalized endemic chain-binomial model. Math Biosci 279:38–42
MathSciNet MATH Google Scholar
Banks J, Brooks J, Cairns G, Davis G, Stacey P (1992) On Devaney’s definition of chaos. Am Math Mon 99:332–334
MathSciNet MATH Google Scholar
Barton NH, Coe JB (2009) On the application of statistical physics to evolutionary biology. J Theor Biol 259:317–324
MathSciNet MATH Google Scholar
Barton JP, Goonetilleke N, Butler TC, Walker BD, McMichael AJ, Chakraborty AK (2016) Relative rate and location of intra-host HIV evolution to evade cellular immunity are predictable. Nat Commun 7:11660
Google Scholar
Basener WF (2013) Limits of chaos and progress in evolutionary dynamics, In: Marks II RJ, Behe MJ, Dembski WA, Gordon BL, Sanford JC (eds) Biological information. New Perspectives, Proceedings of the Symposium (Cornell University, 2011). World Scientific, Singapore, pp 87–104
Basener WF, Sanford JC (2018) The fundamental theorem of natural selection with mutations. J Math Biol 76:1589–1622
MathSciNet MATH Google Scholar
Beerenwinkel N, Antal T, Dingli D, Traulsen A, Kinzler KW, Velculescu VE, Vogelstein B, Nowak MA (2007) Genetic progression and the waiting time to cancer. PLoS Comput Biol 3:2239–2246
MathSciNet Google Scholar
Benaïm M, Schreiber SJ (2019) Persistence and extinction for stochastic ecological models with internal and external variables. J Math Biol 79:393–431
MathSciNet MATH Google Scholar
Bernardi O, Florio A (2019) Existence of Lipschitz continuous Lyapunov functions strict outside the strong chain recurrent set. Dyn Syst 34:71–92
MathSciNet MATH Google Scholar
Birch J (2016) Natural selection and the maximization of fitness. Biol Rev 91:712–727
Google Scholar
Block GL, Allen LJS (2000) Population extinction and quasi-stationary behavior in stochastic density-dependent structured models. Bull Math Biol 62:199–228
MATH Google Scholar
Block L, Franke JE (1985) The chain recurrent set, attractors, and explosions. Ergodic Theory Dyn Syst 5:321–327
MathSciNet MATH Google Scholar
Boenkost F, González Casanova A, Pokalyuk C, Wakolbinger A (2019) Haldane’s formula in Cannings models: the case of moderately weak selection, preprint is available at arXiv:1907.10049
Bof N, Carli R, Schenato L (2018) Lyapunov theory for discrete time systems, technical report is available at arXiv:1809.05289
Bovier A, den Hollander F (2015) Metastability: a potential-theoretic approach. Springer, Berlin
MATH Google Scholar
Bratus AS, Semenov YS, Novozhilov AS (2018) Adaptive fitness landscape for replicator systems: to maximize or not to maximize. Math Model Nat Phenom 13:25
MathSciNet MATH Google Scholar
Brémaud P (1998) Markov Chains: Gibbs fields, Monte Carlo simulation, and queues. Springer, Berlin
MATH Google Scholar
Broekman MJE, Muller-Landau HC, Visser MD, Jongejans E, Wright SJ, de Kroon H (2019) Signs of stabilisation and stable coexistence. Ecol Lett 22:1957–1975
Google Scholar
Broom M, Rychtář J (2013) Game-Theoretical Models in Biology. CRC Press, Cambridge
MATH Google Scholar
Buckley FM, Pollett PK (2010) Limit theorems for discrete-time metapopulation models. Probab Surv 7:53–83
MathSciNet MATH Google Scholar
Cai J, Tan T, Chan SHJ (2019) Bridging traditional evolutionary game theory and metabolic models for predicting Nash equilibrium of microbial metabolic interactions, preprint is available at https://www.biorxiv.org/content/early/2019/05/14/623173.full.pdf
Carja O, Liberman U, Feldman MW (2014) Evolution in changing environments: modifiers of mutation, recombination, and migration. Proc Natl Acad Sci USA 111:17935–17940
Google Scholar
Caulin AF, Graham TA, Wang LS, Maley CC (2015) Solutions to Peto’s paradox revealed by mathematical modelling and cross-species cancer gene analysis. Philos Trans R Soc Lond Ser B 370:20140222
Google Scholar
Cavaliere M, Feng S, Soyer OS, Jiménez JI (2017) Cooperation in microbial communities and their biotechnological applications. Environ Microbiol 19:2949–2963
Google Scholar
Chalub FACC, Souza MO (2014) The frequency-dependent Wright–Fisher model: diffusive and non-diffusive approximations. J Math Biol 68:1089–1133
MathSciNet MATH Google Scholar
Chalub FACC, Souza MO (2017) On the stochastic evolution of finite populations. J Math Biol 75:1735–1774
MathSciNet MATH Google Scholar
Chapkin RS et al. Propagation of a signal within a gut gene-microbial ecological network with an interface layer, work in progress
Chen H, Kardar M (2019) Mean-field computational approach to HIV dynamics on a fitness landscape, preprint is available at https://www.biorxiv.org/content/10.1101/518704v2
Chesson P (2018) Updates on mechanisms of maintenance of species diversity. J Ecol 106:1773–1794
Google Scholar
Christie JR, Beekman M (2017) Selective sweeps of mitochondrial DNA can drive the evolution of uniparental inheritance. Evolution 71:2090–2099
Google Scholar
Chuang JS, Rivoire O, Leibler S (2009) Simpson’s paradox in a synthetic microbial system. Science (N S) 323:272–275
Google Scholar
Conley C (1978) Isolated invariant sets and the Morse index. In: C. B. M. S. Regional Conference Series in Math, vol 38. American Mathematical Society
Constable GWA, Rogers T, McKane AJ, Tarnita CE (2016) Demographic noise can reverse the direction of deterministic selection. Proc Natl Acad Sci USA 113:E4745–E4754
Google Scholar
Coron C, Méléard S, Villemonais D (2019) Impact of demography on extinction/fixation events. J Math Biol 78:549–577
MathSciNet MATH Google Scholar
Cressman R, Tao Y (2014) The replicator equation and other game dynamics. Proc Natl Acad Sci USA 111:10810–10817
MathSciNet MATH Google Scholar
Cvijović I, Good BH, Desai MM (2018) The effect of strong purifying selection on genetic diversity. Genetics 209:1235–1278
Google Scholar
Doorn EV, Pollett P (2013) Quasi-stationary distributions for discrete state models. Eur J Oper Res 230:1–14
MathSciNet MATH Google Scholar
Datta RS, Gutteridge A, Swanton C, Maley CC, Graham TA (2013) Modelling the evolution of genetic instability during tumour progression. Evol Appl 6:20–33
Google Scholar
Dean A, Shnerb NM (2019) Stochasticity-induced stabilization in ecology and evolution, preprint is available at https://www.biorxiv.org/content/10.1101/725341v1
Der R, Epstein CL, Plotkin JB (2011) Generalized population models and the nature of genetic drift. Theor Popul Biol 80:80–99
MATH Google Scholar
de Visser JAGM, Krug J (2014) Empirical fitness landscapes and the predictability of evolution. Nat Rev Genet 15:480–490
Google Scholar
Dewar RC, Sherwin WB, Thomas E, Holleley CE, Nichols RA (2011) Predictions of single-nucleotide polymorphism differentiation between two populations in terms of mutual information. Mol Ecol 20:3156–66
Google Scholar
Doebeli M, Ispolatov Y, Simon B (2017) Towards a mechanistic foundation of evolutionary theory. Elife 6:e23804
Google Scholar
dos Reis M (2015) How to calculate the non-synonymous to synonymous rate ratio of protein-coding genes under the Fisher-Wright mutation-selection framework. Biol Lett 11:20141031
Google Scholar
Edwards AWF (2016) Analysing nature’s experiment: Fisher’s inductive theorem of natural selection. Theor Popul Biol 109:1–5
Google Scholar
Ewens WJ (2011) What is the gene trying to do? Br J Philos Sci 62:155–176
MathSciNet Google Scholar
Ewens WJ (2012) Mathematical population genetics (I. Theoretical introduction), 2nd edn. Interdisciplinary Applied Mathematics Series, vol 27. Springer, Berlin
Ewens WJ, Lessard S (2015) On the interpretation and relevance of the Fundamental Theorem of Natural Selection. Theor Popul Biol 104:59–67
MATH Google Scholar
Fathi A, Pageault P (2015) Aubry–Mather theory for homeomorphisms. Ergodic Theory Dyn Syst 35:1187–1207
MathSciNet MATH Google Scholar
Fathi A, Pageault P (2019) Smoothing Lyapunov functions. Trans Am Math Soc 371:1677–1700
MathSciNet MATH Google Scholar
Faure M, Schreiber S (2014) Quasi-stationary distributions for randomly perturbed dynamical systems. Ann Appl Probab 24:553–598
MathSciNet MATH Google Scholar
Ferguson TS (1967) Mathematical statistics: a decision theoretic approach. Academic Press, New York
MATH Google Scholar
Franke JE, Selgrade JF (1976) Abstract $\omega $-limit sets, chain recurrent sets, and basic sets for flows. Proc Am Math Soc 60:309–316
MathSciNet MATH Google Scholar
Franks J Notes on chain recurrence and Lyapunonv functions, (unpublished) lecture notes, available at https://arxiv.org/pdf/1704.07264.pdf
Garay BM, Hofbauer J (2003) Robust permanence for ecological differential equations, minimax, and discretizations. SIAM J Math Anal 34:1007–1039
MathSciNet MATH Google Scholar
Garcia V, Glassberg EC, Harpak A, Feldman MW (2018) Clonal interference can cause wavelet-like oscillations of multilocus linkage disequilibrium. J R Soc Interface 15:20170921
Google Scholar
Gavrilets S (2010) High-dimensional fitness landscapes and speciation. In: Pigliucci M, Müller GB (eds) Evolution–the extended synthesis. MIT Press, Cambridge, pp 45–79
Google Scholar
Gokhale CS, Traulsen A (2014) Evolutionary multiplayer games. Dyn Games Appl 4:468–488
MathSciNet MATH Google Scholar
Gompert Z (2016) Bayesian inference of selection in a heterogeneous environment from genetic time-series data. Mol Ecol 25:121–34
Google Scholar
Good BH, Martis S, Hallatschek O (2018) Directional selection limits ecological diversification and promotes ecological tinkering during the competition for substitutable resources, preprint is available at https://www.biorxiv.org/content/10.1101/292821v1
Goudenège L, Zitt P-A (2015) A Wright–Fisher model with indirect selection. J Math Biol 71:1411–1450
MathSciNet MATH Google Scholar
Grosjean N, Huillet T (2017) Wright–Fisher-like models with constant population size on average. Int J Biomath 10:1750078
MathSciNet MATH Google Scholar
Gyllenberg M, Silvestrov DS (2008) Quasi-stationary phenomena in nonlinearly perturbed stochastic systems, De Gruyter Expositions in Mathematics, vol 44. Walter de Gruyter
Harmand N, Federico V, Hindre T, Lenormand T (2019) Nonlinear frequency-dependent selection promotes long-term coexistence between bacteria species. Ecol Lett 22:1192–1202
Google Scholar
Harper M, Fryer D (2016) Stationary stability for evolutionary dynamics in finite populations. Entropy 18:316
Google Scholar
Hartle DL, Clark AG (2007) Principles of population genetics, 4th edn. Sinauer, Sunderland
Google Scholar
Hintze A, Olson RS, Adami C, Hertwig R (2015) Risk sensitivity as an evolutionary adaptation. Sci Rep 5:8242
Google Scholar
Hobolth A, Sireén J (2016) The multivariate Wright-Fisher process with mutation: moment-based analysis and inference using a hierarchical Beta model. Theor Popul Biol 108:36–50
MATH Google Scholar
Hoeffding W (1963) Probability inequalities for sums of bounded random variables. J Am Stat Assoc 58:13–30
MathSciNet MATH Google Scholar
Hofbauer J (1985) The selection mutation equation. J Math Biol 23:41–53
MathSciNet MATH Google Scholar
Hofbauer J (2011) Deterministic evolutionary game dynamics. In: Sigmund K (ed) Evolutionary game dynamics. Proc. Sympos. Appl. Math., vol. 69, AMS Short Course Lecture Notes. Amer. Math. Soc, pp 61–79
Hofbauer J, Sigmund K (1998) Evolutionary games and population dynamics. Cambridge University Press, Cambridge
MATH Google Scholar
Hofbauer J, Sigmund K (2003) Evolutionary game dynamics. Bull Am Math Soc (N S) 40:479–519
MathSciNet MATH Google Scholar
Hofrichter J, Jost J, Tran T (2017) Information geometry and population genetics. The mathematical structure of the Wright–Fisher model. Springer, Berlin
MATH Google Scholar
Houchmandzadeh B (2015) Fluctuation driven fixation of cooperative behavior. Biosystems 127:60–66
Google Scholar
Hummert S, Bohl K, Basanta D, Deutsch A, Werner S, Theißen G, Schroeter A, Schuster S (2014) Evolutionary game theory: cells as players. Mol BioSyst 10:3044
Google Scholar
Hummert S, Glock C, Lang SN, Hummert C, Skerka C, Zipfel PF, Germerodt S, Schuster S (2018) Playing “hide-and-seek’’ with factor H: game-theoretical analysis of a single nucleotide polymorphism. J R Soc Interface 15:20170963
Google Scholar
Hurley M (1998) Lyapunov functions and attractors in arbitrary metric spaces. Proc Am Math Soc 126:245–256
MathSciNet MATH Google Scholar
Hutchinson GE (1961) The paradox of the plankton. Am Nat 95:137–145
Google Scholar
Huttegger S, Skyrms B, Tarrès P, Wagner E (2014) Some dynamics of signaling games. Proc Natl Acad Sci USA 111:10873–10880
MathSciNet MATH Google Scholar
Hutson V, Schmitt K (1992) Permanence and the dynamics of biological systems. Math Biosci 111:1–71
MathSciNet MATH Google Scholar
Iglehart DL (1972) Extreme values in the $GI/ G/ 1$ queue. Ann Math Stat 43:627–635
Iizuka M (2010) Effective population size of a population with stochastically varying size. J Math Biol 61:359–375
MathSciNet MATH Google Scholar
Imhof LA, Nowak MA (2006) Evolutionary game dynamics in a Wright–Fisher process. J Math Biol 52:667–681
MathSciNet MATH Google Scholar
Iwasa Y, Michor F (2011) Evolutionary dynamics of intratumor heterogeneity. PLoS ONE 6:e17866
Google Scholar
John S, Seetharaman S (2016) Exploiting the adaptation dynamics to predict the distribution of beneficial fitness effects. PLoS ONE 11:e0151795
Google Scholar
Kang Y, Chesson P (2010) Relative nonlinearity and permanence. Theor Popul Biol 78:26–35
MATH Google Scholar
Karlin S, McGregor J (1964) Direct product branching processes and related Markov chains. Proc Natl Acad Sci USA 51:598–602
MathSciNet MATH Google Scholar
Karr AF (1975) Weak convergence of a sequence of Markov chains. Z Wahrsch Verw Gebiete 33:41–48
MathSciNet MATH Google Scholar
Kingman JFC (1961) A mathematical problem in population genetics. Proc Camb Phil Soc 57:574–582
MathSciNet MATH Google Scholar
Klebaner FC, Nerman O (1994) Autoregressive approximation in branching processes with a threshold. Stochastic Process Appl 51:1–7
MathSciNet MATH Google Scholar
Koopmann B, Müller J, Tellier A, Živković D (2017) Fisher–Wright model with deterministic seed bank and selection. Theor Popul Biol 114:29–39
MATH Google Scholar
Kroumi D, Lessard S (2015) Evolution of cooperation in a multidimensional phenotype space. Theor Popul Biol 102:60–75
MATH Google Scholar
Kurokawa S, Ihara Y (2009) Emergence of cooperation in public goods games. Proc R Soc B 276:1379–1384
Google Scholar
Landim C (2019) Metastable Markov chains. Probab Surv 16:143–227
MathSciNet MATH Google Scholar
Lawler GF (2006) Introduction to stochastic processes, 2nd edn. Chapman & Hall/CRC, Cambridge
MATH Google Scholar
Li D, Cheng F (2019) The extinction and persistence of tumor evolution influenced by external fluctuations and periodic treatment. Qual Theory Dyn Syst (to appear)
Li L, Chesson P (2016) The effects of dynamical rates on species coexistence in a variable environment: the paradox of the plankton revisited. Am Nat 188:46–58
Google Scholar
Losert V, Akin E (1983) Dynamics of games and genes: discrete versus continuous time. J Math Biol 17:241–251
MathSciNet MATH Google Scholar
Mahdipour-Shirayeh A, Darooneh AH, Long AD, Komarova NL, Kohandel M (2017) Genotype by random environmental interactions gives an advantage to non-favored minor alleles. Sci Rep 7:5193
Google Scholar
Mandel SPH (1959) The stability of a multiple allelic system. Heredity 13:289–302
Google Scholar
McAvoy A, Fraiman N, Hauert C, Wakeley J, Nowak MA (2018) Public goods games in populations with fluctuating size. Theor Popul Biol 121:72–84
MATH Google Scholar
Méléard S, Villemonais D (2012) Quasi-stationary distributions and population processes. Probab Surv 9:340–410
MathSciNet MATH Google Scholar
Nagylaki T (1986) The Gaussian approximation for random genetic drift. In: Karlin S, Nevo E (eds) Evolutionary processes and theory. Academic Press, New York, pp 629–642
Google Scholar
Nagylaki T (1990) Models and approximations for random genetic drift. Theor Popul Biol 37:192–212
MathSciNet MATH Google Scholar
Norton DE (1995) The fundamental theorem of dynamical systems. Comment Math Univ Carolin 36:585–597
MathSciNet MATH Google Scholar
Nourmohammad A, Otwinowski J, Plotkin JB (2016) Host-pathogen coevolution and the emergence of broadly neutralizing antibodies in chronic infections. PLoS Genet 12:e1006171
Google Scholar
Nowak MA, Sigmund K (2004) Evolutionary dynamics of biological games. Science 303:793–799
Google Scholar
Obolski U, Ram Y, Hadany L (2018) Key issues review: evolution on rugged adaptive landscapes. Rep Prog Phys 81:012602
Google Scholar
Orr HA (2009) Fitness and its role in evolutionary genetics. Nat Rev Genet 10:531–539
Google Scholar
Pageault P (2009) Conley barriers and their applications: chain-recurrence and Lyapunov functions. Topol Appl 156:2426–2442
MathSciNet MATH Google Scholar
Panageas I, Srivastava P, Vishnoi NK (2016) Evolutionary dynamics in finite populations mix rapidly, In: Krauthgamer R (ed) Proceedings of the twenty-seventh annual ACM-SIAM symposium on discrete algorithms. SIAM, pp 480–497
Papadimitriou C, Piliouras G (2016) From Nash equilibria to chain recurrent sets: solution concepts and topology. In: Sudan M (ed) ITCS’16-Proceedings of the 2016 ACM conference on innovations in theoretical computer science. ACM, New York, pp 227–235
Park S-C, Krug J (2007) Clonal interference in large populations. Proc Natl Acad Sci USA 104:18135
Google Scholar
Park HJ, Traulsen A (2017) Extinction dynamics from metastable coexistences in an evolutionary game. Phys Rev E 96:042412
Google Scholar
Parra-Rojas C, Challenger JD, Fanelli D, McKane AJ (2014) Intrinsic noise and two-dimensional maps: Quasicycles, quasiperiodicity, and chaos. Phys Rev E 90:032135
Google Scholar
Perko L (1991) Differential equations and dynamical systems, 3rd edn. Springer, Berlin
MATH Google Scholar
Proulx SR (2011) The rate of multi-step evolution in Moran and Wright–Fisher populations. Theor Popul Biol 80:197–207
MATH Google Scholar
Queller DC (2017) Fundamental theorems of evolution. Am Nat 189:345–353
Google Scholar
Rao CR (2001) Linear statistical inference and its applications, 2nd edn. Wiley, New York
Google Scholar
Rohlfs M, Hoffmeist TS (2003) An evolutionary explanation of the aggregation model of species coexistence. Proc Biol Sci 270:33–35
Google Scholar
Ruppin E, Papin JA, de Figueiredo LF, Schuster S (2010) Metabolic reconstruction, constraint-based analysis and game theory to probe genomescale metabolic networks. Curr Opin Biotechnol 21:502–510
Google Scholar
Samoilov MS, Arkin AP (2006) Deviant effects in molecular reaction pathways. Nat Biotechnol 24:1235–40
Google Scholar
Sandholm WH (2010) Population games and evolutionary dynamics, series on economic learning and social evolution, vol 8. MIT Press, Cambridge
Google Scholar
Schneider KA (2010) Maximization principles for frequency-dependent selection II: the one-locus multiallele case. J Math Biol 61:95–132
MathSciNet MATH Google Scholar
Schreiber SJ, Levine JM, Godoy O, Kraft NJB, Hart SP (2018) Does deterministic coexistence theory matter in a finite world? Insights from serpentine annual plants, preprint is available at https://www.biorxiv.org/content/10.1101/290882v1
Sigmund K (1986) A survey of replicator equations. In: Casti JL, Karlqvist A (eds) Complexity, language, and life: mathematical approaches, Biomathematics, vol 16. Springer, Berlin, pp 88–104
Silverman S (1992) On maps with dense orbits and the definition of chaos. Rocky Mountain J Math 22:353–375
MathSciNet MATH Google Scholar
Simonsen Speed N, Balding DJ, Hobolth A (2019) A general framework for moment-based analysis of genetic data. J Math Biol 78:1727–1769
MathSciNet MATH Google Scholar
Spichtig M, Kawecki TJ (2004) The maintenance (or not) of polygenic variation by soft selection in heterogeneous environments. Am Nat 164:70–84
Google Scholar
Steinsaltz D, Evans SN (2004) Markov mortality models: implications of quasistationarity and varying initial distributions. Theor Popul Biol 65:319–337
MATH Google Scholar
Stephens C (2014) Selection, drift, and the “forces’’ of evolution. Philos Sci 71:550–570
Google Scholar
Svensson EI, Connallon T (2019) How frequency-dependent selection affects population fitness, maladaptation and evolutionary rescue. Evol Appl 12:1243–1258
Google Scholar
Tarnita CE (2017) The ecology and evolution of social behavior in microbes. J Exp Biol 220:18–24
Google Scholar
Taylor C, Nowak M (2006) Evolutionary game dynamics with non-uniform interaction rates. Theor Popul Biol 69:243–252
MATH Google Scholar
Traulsen A, Claussen JC, Hauert C (2005) Coevolutionary dynamics: from finite to infinite populations. Phys Rev Lett 95:238701
Google Scholar
Traulsen A, Reed FA (2012) From genes to games: cooperation and cyclic dominance in meiotic drive. J Theor Biol 299:120–125
MathSciNet MATH Google Scholar
Vallier M, Abou Chakra M, Hindersin L, Linnenbrink M, Traulsen A, Baines JF (2017) Evaluating the maintenance of disease-associated variation at the blood group-related gene B4galnt2 in house mice. BMC Evol Biol 17:187
Google Scholar
Wasserman L (2004) All of statistics: a concise course in statistical inference. Springer texts in statistics. Springer, Berlin
MATH Google Scholar
Waxman D, Loewe L (2010) A stochastic model for a single click of Muller’s ratchet. J Theor Biol 264:1120–1132
MathSciNet MATH Google Scholar
West SA, Diggle SP, Buckling A, Gardner A, Griffin AS (2007) The social lives of microbes. Annu Rev Ecol Evol Syst 38:53–77
Google Scholar
Xu S, Jiao S, Jiang P, Ao P (2014) Two-time-scale population evolution on a singular landscape. Phys Rev E 89:012724
Google Scholar
Zeng Q, Wu S, Sukumaran J, Rodrigo A (2017) Models of microbiome evolution incorporating host and microbial selection. Microbiome 5:127
Google Scholar
Zhang Y, Fu F, Wu T, Xie G, Wang L (2013) A tale of two contribution mechanisms for nonlinear public goods. Sci Rep 3:2021
Google Scholar
Zhou D, Qian H (2011) Fixation, transient landscape, and diffusion dilemma in stochastic evolutionary game dynamics. Phys Rev E 84:31907
Google Scholar
Zinger T, Gelbart M, Miller D, Pennings PS, Stern A (2019) Inferring population genetics parameters of evolving viruses using time-series data. Virus Evol 5:vez011
Google Scholar
Zomorrodi AR, Segrè D (2017) Genome-driven evolutionary game theory helps understand the rise of metabolic interdependencies in microbial communities. Nat Commun 8:1563
Google Scholar

Download references

Acknowledgements

We would like to thank Ozgur Aydogmus for a number of illuminating discussions. This work was supported by the Allen Endowed Chair in Nutrition and Chronic Disease Prevention, and the National Institute of Health (R35-CA197707 and P30E5029067). A. R. is supported by the Nutrition, Statistics & Bioinformatics Training Grant T32CA090301.

Author information

Authors and Affiliations

Department of Statistics, Texas A &M University, College Station, TX, 77843, USA
Alexander Roitershtein
Occidental Petroleum Corporation, Houston, TX, 77046, USA
Reza Rastegar
Department of Nutrition - Program in Integrative Nutrition & Complex Diseases, Texas A &M University, College Station, TX, 77843, USA
Robert S. Chapkin
Department of Veterinary Physiology and Pharmacology, Texas A &M University, College Station, TX, 77843, USA
Ivan Ivanov

Authors

Alexander Roitershtein
View author publications
You can also search for this author in PubMed Google Scholar
Reza Rastegar
View author publications
You can also search for this author in PubMed Google Scholar
Robert S. Chapkin
View author publications
You can also search for this author in PubMed Google Scholar
Ivan Ivanov
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Ivan Ivanov.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

1.1 Appendix A. Absorption states and stochastic equilibria

In this appendix we explore the connection between basic geometric properties of the vector field ${\varvec{\Gamma }}$ and the structure of closed communication classes of the Markov chain $\textbf{X}^{(N)}.$ In other words, we study the configuration of absorbing states and identify possible supports of stationary distributions according to the properties of ${\varvec{\Gamma }}$. We remark that the results of this section are included for the sake of completeness and are not used anywhere else in the remainder of the paper. We have made the section self-contained and included the necessary background on the Markov chain theory.

The main result of the appendix is stated in Theorem 7.3 where a geometric characterization of recurrent classes is described. The general structure of recurrent classes of $\textbf{X}^{(N)}$ in this theorem holds universally for the Wright–Fisher model (7) with any update function ${\varvec{\Gamma }}.$ Thus, in Appendix A we deviate from assumption (10) that we have used throughout the paper and to which we will return in the subsequent Appendix B, and adopt instead the following general one:

Assumption 7.1

$\textbf{X}^{(N)}$ is a Markov chain on $\Delta _M$ with transition kernel defined by (7) and an arbitrary update rule ${\varvec{\Gamma }}:\Delta _M\rightarrow \Delta _M.$

Recall (5) and (13). The discussion in this section is largely based on the fact that

$$\begin{aligned} P_N(\textbf{x},\textbf{y})>0 \quad \Longleftrightarrow {{\mathcal {C}}}(\textbf{y})\subset {{\mathcal {C}}}\big ({\varvec{\Gamma }}(\textbf{x})\big )\quad \Longleftrightarrow \quad P_N(\textbf{x},\textbf{z})>0~\text{ for } \text{ all }~\textbf{z}\in \Delta _{[{{\mathcal {C}}}(\textbf{y})],M}. \nonumber \\ \end{aligned}$$

(64)

This simple observation allows one to relate the geometry of zero patterns in the vector field to communication properties, and hence asymptotic behavior, of $\textbf{X}^{(N)}.$

From the point of view of the model’s adequacy in potential applications, (64) is a direct consequence of the fundamental assumption of the Wright–Fisher model that particles in the population update their phenotypes independently of each other and follow the same stochastic protocol. Mathematically, this assumption is expressed in identity (45), whose right-hand side is a sum of independent and identically distributed random indicators. To appreciate this feature of the Wright–Fisher model, we remark that while (64) holds true for an arbitrary model (7), that feature does not hold true, in general, for two most natural generalization of the Wright–Fisher model, namely the Cannings exchangeable model (Ewens 2012) and the pure-drift Generalized Wright–Fisher process of Der et al. (2011).

Recall that two states $\textbf{x},\textbf{y}\in \Delta _{M,N}$ of the finite-state Markov chain $\textbf{X}^{(N)}$ are said to communicate if there exist $i,j\in {{\mathbb {N}}}$ such that $P_N^i(\textbf{x},\textbf{y})>0$ and $P_N^j(\textbf{y},\textbf{x})>0.$ If $\textbf{x}$ and $\textbf{y}$ communicate, we write $\textbf{x}\leftrightarrow \textbf{y}.$ The binary relation $\leftrightarrow $ partitions the state space $\Delta _{M,N}$ into a finite number of disjoint equivalence classes, namely $\textbf{x},\textbf{y}$ belong to the same class if and only if $\textbf{x}\leftrightarrow \textbf{y}.$ A state $\textbf{x}\in \Delta _{M,N}$ is called absorbing if $P_N(\textbf{x},\textbf{x})=1,$ recurrent if $\textbf{X}^{(N)}$ starting at $\textbf{x}$ returns to $\textbf{x}$ infinitely often with probability one, and transient if, with probability one, $\textbf{X}^{(N)}$ starting at $\textbf{x}$ revisits $\textbf{x}$ only a finite number of times. It turns out that each communication class consists of either (1) exactly one absorbing state; or (2) recurrent non-absorbing states only; or (3) transient states only. No other behavior is possible due to zero–one laws enforced by the Markov property (Lawler 2006). A communication class is called closed or recurrent if it belongs to the first or second category, and transient if it belongs to the third one. The chain is called irreducible if there is only one (hence, recurrent) communication class, and aperiodic if $P_N^j(\textbf{x},\textbf{y})>0$ for some $j\in {{\mathbb {N}}}$ and all $\textbf{x},\textbf{y}\in \Delta _{M,N}.$ If $C_1,\ldots ,C_r$ are disjoint recurrent classes of $\textbf{X}^{(N)},$ then the general form of its stationary distribution is $\pi (\textbf{x})=\sum _{i=1}^r \alpha _i \pi _i(\textbf{x})$ where $\pi _i$ is the unique stationary distribution supported on $C_i$ (that is $\pi _i(\textbf{x})>0$ if and only if $\textbf{x}\in C_i$) and $\alpha _i$ are arbitrary non-negative numbers adding up to one. If the chain is irreducible, then the stationary distribution is unique and is strictly positive on the whole state space $\Delta _{M,N}.$ An irreducible finite-state Markov chain is aperiodic if and only if $\pi (\textbf{x})=\lim _{k\rightarrow \infty } P_N^k(\textbf{y},\textbf{x})$ for any $\textbf{x},\textbf{y}\in \Delta _{M,N},$ where $\pi $ is the unique stationary distribution of the chain (Lawler 2006).

Let ${{\mathcal {A}}}_N \subset \Delta _M$ denote the set of absorbing states of the Markov chain $\textbf{X}^{(N)}.$ It readily follows from (7) that ${{\mathcal {A}}}_N\subset V_M,$ and a vertex $\textbf{e}_j\in V_M$ is an absorbing (fixation) state if and only if ${\varvec{\Gamma }}(\textbf{e}_j)=\textbf{e}_j.$ In particular, ${{\mathcal {A}}}_N$ depends only on the update rule ${\varvec{\Gamma }}$ and is independent of the population size N. We summarize this observation as follows:

$$\begin{aligned} {{\mathcal {A}}}_N=\big \{\textbf{e}_j\in V_M:{\varvec{\Gamma }}(\textbf{e}_j)=\textbf{e}_j\big \}\qquad \qquad \forall \, N\in {{\mathbb {N}}}. \end{aligned}$$

(65)

The following lemma asserts that if $\textbf{e}_j$ is an absorbing state of $\textbf{X}^{(N)},$ then the type j cannot be mixed into any stochastic equilibrium of the model, namely any other than $\textbf{e}_j$ state $\textbf{x}\in \Delta _{M,N}$ with $x(j)>0$ is transient.

Lemma 7.2

Let Assumption 7.1 hold. If $\textbf{x}\in \Delta _{M,N}$ is a recurrent state of the Markov chain $\textbf{X}^{(N)}$ and $x(j)>0,$ then $\textbf{e}_j$ is a recurrent state belonging to the same closed communication class as $\textbf{x}.$

Proof

By (65), a state $\textbf{x}\in \Delta _{M,N}$ cannot be absorbing if $\textbf{x}\ne \textbf{e}_j$ and $x(j)>0.$ If $\textbf{x}$ is recurrent and non-absorbing, then there is a state $\textbf{y}\in \Delta _{M,N}$ that belongs to the same closed communication class as $\textbf{x}$ and such that $P_N(\textbf{y},\textbf{x})>0.$ In view of (7) this implies that $\Gamma _j(\textbf{y})>0$ and hence $P_N(\textbf{y},\textbf{e}_j)>0.$ Since the state $\textbf{y}$ is recurrent, $\textbf{e}_j$ must belong to the same communication class as $\textbf{y}$ and $\textbf{x},$ and in particular is recurrent. The proof is complete. $\square $

Recall (5). The same argument as in the above lemma shows that if an interior point $\textbf{x}$ of a simplex $\Delta _{[J],N}$ is recurrent for $\textbf{X}^{(N)},$ then the whole simplex $\Delta _{[J],N}$ belongs to the (recurrent) closed communication class of $\textbf{x}.$ One can rephrase this observation as follows:

Theorem 7.3

Let Assumption 7.1 hold. Then, any recurrent class of the Markov chain $\textbf{X}^{(N)}$ has the form of a simplicial complex $\bigcup _\ell \Delta _{[J_\ell ],N},$ where $J_\ell $ are (possibly overlapping) subsets (possibly singletons) of $S_M.$

With applications in mind, we collect some straightforward implications of this proposition in the next corollary.

Corollary 7.4

Let Assumption 7.1 hold. Then:

(i)
If there is a stationary distribution $\pi $ of $\textbf{X}^{(N)}$ and $J\subset S_M$ such that $\pi (\textbf{x})>0$ for some interior point $\textbf{x}\in \Delta _{[J],N}^\circ ,$ then $\pi (\textbf{y})>0$ for all $\textbf{y}\in \Delta _{[J],N}.$ Furthermore, the following holds in this case unless $\Gamma (\textbf{y})\cdot \textbf{y}=0$ (that is, $\textbf{y}$ and ${\varvec{\Gamma }}(\textbf{y})$ are orthogonal) for all $\textbf{y}\in \Delta _{[J],N}:$
1. (a)
  Let C be the (closed) communication class to which the above $\textbf{x}$ (and hence the entire $\Delta _{[J],N}$) belongs. Then the Markov chain $\textbf{X}^{(N)}$ restricted to C is aperiodic.
2. (b)
  Let T be the first hitting time of C, namely $T=\inf \{k\in {{\mathbb {Z}}}_+:\textbf{X}^{(N)}_k\in C\}.$ As usual, we use here the convention that $\inf \emptyset =+\infty .$ Then
  $$\begin{aligned} \pi (\textbf{y})=\lim _{k\rightarrow \infty } P(\textbf{X}^{(N)}_k=\textbf{y}\mid \textbf{X}^{(N)}_0=\textbf{z},T<\infty ) \end{aligned}$$
  for all $\textbf{y}\in C$ and $\textbf{z}\in \Delta _{M,N}.$ In particular, $\pi (\textbf{y})=\lim _{k\rightarrow \infty }P_N^k(\textbf{z},\textbf{y})$ whenever $\textbf{y},\textbf{z}\in C.$
(ii)
A stationary distribution $\pi $ of $\textbf{X}^{(N)}$ such that $\pi (\textbf{x})>0$ for some interior point $\textbf{x}\in \Delta _{M,N}^\circ $ exists if and only if $\textbf{X}^{(N)}$ is irreducible and aperiodic, in which case the stationary distribution is unique and $\pi (\textbf{y})=\lim _{k\rightarrow \infty } P^k_N(\textbf{x},\textbf{y})>0$ for any $\textbf{x},\textbf{y}\in \Delta _{M,N}.$
(iii)
Let $B=\big \{\exists ~k\in {{\mathbb {N}}}: \textbf{X}^{(N)}_k\in \partial \big (\Delta _{M,N}\big )~\text{ for } \text{ all }~m\ge k\big \}$ be the event that the trajectory of $\textbf{X}^{(N)}$ reach eventually the boundary of the simplex and stays there forever. Then either the Markov chain $\textbf{X}^{(N)}$ is irreducible and aperiodic or $P(B)=1$ for any state $\textbf{X}^{(N)}_0.$
(iv)
If ${\varvec{\Gamma }}(\textbf{e}_j)=\textbf{e}_j$ for some $j\in S_M,$ then $\lim _{k\rightarrow \infty } X^{(N)}_k(j)\in \{0,1\},$ a. s., for any initial distribution of $\textbf{X}^{(N)}.$

We will only prove the claim in (a) of part (i), since the rest of the corollary is an immediate result of a direct combination of Theorem 7.3 and general properties of finite-state Markov chains reviewed in the beginning of the section.

Proof of Corollary 7.4-(i)-a

Suppose that ${\varvec{\Gamma }}(\textbf{y})\cdot \textbf{y}>0$ for some $\textbf{y}\in \Delta _{[J],N}.$ Let $\textbf{z}$ be the projection of $\textbf{y}$ into $\Delta _{[I],M},$ where $I:=\{j\in J:y(j)\Gamma _j(\textbf{y})>0\}.$ That is,

$$\begin{aligned} z(i)= \left\{ \begin{array}{ll} y(i)&{}\text{ if }~y(i)\Gamma _i(\textbf{y})>0, \\ 0&{}\text{ otherwise }. \end{array} \right. \end{aligned}$$

Then $\textbf{z}\in \Delta _{[I],M}^\circ $ and $P_N(\textbf{z},\textbf{z})>0.$ Since $\textbf{z}\in \Delta _{[J],N},$ by Theorem 7.3$\textbf{z}$ belongs to the same closed communication class as $\textbf{x}.$ The asserted aperiodicity of $\textbf{X}^{(N)}$ restricted to C is a direct consequence of the existence of the 1-step communication loop at $\textbf{z}$ (Lawler 2006). $\square $

Example 6

(Harper and Fryer 2016) If $\Gamma _j(\textbf{x})=x(j)^\alpha f(\textbf{x})$ for some $j\in S_M,$ function $f:\Delta _M\rightarrow {{\mathbb {R}}}_+$ such that $f(\textbf{e}_j)=1,$ and a constant $\alpha >0,$ then ${\varvec{\Gamma }}(\textbf{e}_j)=\textbf{e}_j.$ By part (iv) of Corollary 7.4, this fact alone suffices to conclude that $\lim _{k\rightarrow \infty } X^{(N)}_k(j)\in \{0,1\},$ a. s., for this specific type j.

A simple sufficient condition for the Markov chain $\textbf{X}^{(N)}$ to be irreducible and aperiodic is that ${\varvec{\Gamma }}(\Delta _M)\subset \Delta _M^\circ ,$ in which case $P_N(\textbf{x},\textbf{y})>0$ for any $\textbf{x},\textbf{y}\in \Delta _{M,N}.$ If ${\varvec{\Gamma }}\big (\Delta _{M,N}^\circ \big )\subset \Delta _{M,N}^\circ $ but ${\varvec{\Gamma }}(\textbf{e}_j)=\textbf{e}_j$ for all $j\in S_M,$ the model can be viewed as a multivariate analog of what is termed in Chalub and Souza (2017) as a Kimura class of Markov chains. The latter describes evolution dynamics of a population that initially consists of two phenotypes, each one eventually either vanishes or takes over the entire population.

1.2 Appendix B. Permanence of the mean-field dynamics

In this appendix, we focus on Wright–Fisher models with a permanent mean-field dynamics ${\varvec{\Gamma }}.$ The permanence means that the boundary of the state space $\partial (\Delta _M)$ is a repeller for the mean-field dynamics, implying that all types under the mean-field dynamics will ultimately survive (Garay and Hofbauer 2003; Hofbauer and Sigmund 1998; Hutson and Schmitt 1992; Kang and Chesson 2010). This condition is particularly relevant to the stochastic Wright–Fisher model because, as stated in Appendix A, all absorbing states (possibly an empty set) of the Markov chain $\textbf{X}^{(N)}$ lie within the boundary set $\partial (\Delta _M).$ Of course, in the absence of mutations, any vertex of $\Delta _M$ is an absorbing state for the Markov chain $\textbf{X}^{(N)},$ and the latter is going to eventually become fixed on one of the absorbing states with probability one (see Appendix A for formal details). However, we showed in Sect. 4 that a permanent mean-field dynamics induces metastability and a prolonged coexistence of types.

Definition 4

(Hofbauer and Sigmund 1998) ${\varvec{\Gamma }}$ is permanent if there exists $\delta >0$ such that

$$\begin{aligned} \liminf _{k\rightarrow \infty } \psi _k(i)>\delta \qquad \forall \,i\in S_M,{{\varvec{\psi }}}_0\in \Delta _M^\circ . \end{aligned}$$

See Garay and Hofbauer (2003), Schreiber et al. (2018) and references therein for sufficient conditions for the permanence. It turns out that under reasonable additional assumptions, ${\varvec{\Gamma }}$ is permanent for the fitness defined in either (11) or (12). More precisely, we have (for a more general version of the theorem, see Garay and Hofbauer (2003, Theorem 11.4) and Hofbauer and Sigmund (2003, Theorem 5):

Theorem 3

(Garay and Hofbauer 2003) Let $\textbf{A}$ be an $M\times M$ matrix which satisfies the following condition: There exists $\textbf{y}\in \Delta _M^\circ $ such that

$$\begin{aligned} \textbf{y}^T\textbf{A} \textbf{z}> \textbf{z}^T\textbf{A}{} \textbf{z} \end{aligned}$$

for all $\textbf{z}\in \partial (\Delta _M)$ solving the fixed-point equation $\textbf{z}=\frac{1}{\textbf{z}^T\textbf{A}{} \textbf{z}}\big (\textbf{z}\circ \textbf{A}{} \textbf{z}\big ).$

Then:

(i)
There exists $\omega _0>0$ such that ${\varvec{\Gamma }}$ given by (10) and (11) is permanent for all $\omega \in (0,\omega _0).$
(ii)
${\varvec{\Gamma }}$ given by (10) and (12) is permanent for all $\eta >0.$

Recall Definitions 1 and 2. The setting of the theorem provides a natural within our context example of a non-constant complete Lyapunov function.

Example 7

Suppose that ${\varvec{\Gamma }}$ is a replicator function defined in (10). If $\textbf{A}$ is an $M\times M$ matrix that satisfies the conditions of Theorem 3 and ${{\varvec{\varphi }}}$ in (10) is defined by either (11) with small enough $\omega >0$ or by (12), then ${{\mathcal {R}}}({\varvec{\Gamma }}) \subset K\bigcup \partial (\Delta _M),$ where K is a compact subset of $\Delta _M^\circ .$ In particular, the complete Lyapunov function h constructed in Theorem 2 (p. 14) is strictly increasing on the non-empty open set $\Delta _M^\circ \backslash K.$ Notice that while the average $\textbf{x}\cdot {{\varvec{\varphi }}}(\textbf{x})$ may not be a complete Lyapunov function for ${\varvec{\Gamma }},$ the average $h(\textbf{x})$ of the reproductive fitness $\widetilde{{\varvec{\varphi }}}(\textbf{x}):=h(\textbf{x}){{\varvec{\varphi }}}(\textbf{x})$ is.

In Sect. 4.3 we considered a stochastic Wright–Fisher model with a permanent mean-field map ${\varvec{\Gamma }}.$ This model has a global attractor, consisting of a unique equilibrium point. A sufficient condition for such a dynamics is given by the following theorem (Losert and Akin 1983). Recall (20).

Theorem 4

(Losert and Akin 1983) Let Assumption 4.4 hold. Then the following holds true:

(a)
Recall ${{\varvec{\psi }}}_k$ from (3). If ${{\varvec{\psi }}}_0\in \Delta _M^\circ ,$ then $\lim _{k\rightarrow \infty } {{\varvec{\psi }}}_k={{\varvec{\chi _{eq}}}}.$
(b)
${\varvec{\Gamma }}$ is a diffeomorphism.

Note that the weak selection condition (cf. Antal et al. 2009; Boenkost et al. 2019; Imhof and Nowak 2006) imposed in part (i) of Theorem 3 is not required in the conditions of Theorem 4.

The evolutionary games with $\textbf{A}=\textbf{A}^T$ are called partnership games (Hofbauer 2011). We refer to Hummert et al. (2018), Huttegger et al. (2014), Traulsen and Reed (2012) for applications of partnership games in evolutionary biology. Condition (32) implies the existence of a unique evolutionary stable equilibrium for the evolutionary game defined by the payoff matrix $\textbf{A}.$ For a symmetric reversible matrix $\textbf{A}$ this condition holds if and only if $\textbf{A}$ has exactly one positive eigenvalue (Kingman 1961; Mandel 1959). Condition (33) ensures that the equilibrium is an interior point. Note that since the equilibrium is unique, the conditions of Theorem 3 are automatically satisfied, that is ${{\varvec{\psi }}}_k$ is permanent. Finally, under the conditions of Theorem 4, the average payoff $\textbf{x}^T\textbf{A}{} \textbf{x}$ is a Lyapunov function (Kingman 1961; Mandel 1959):

$$\begin{aligned} {{\varvec{\psi }}}_k^T\textbf{A}{{\varvec{\psi }}}_k<{{\varvec{\psi }}}_{k+1}^T\textbf{A}{{\varvec{\psi }}}_{k+1}<{{\varvec{\chi _{eq}}}}^T \textbf{A}{{\varvec{\chi _{eq}}}}\qquad \qquad \forall \,k\in {{\mathbb {Z}}}_+, \end{aligned}$$

as long as ${{\varvec{\psi }}}_0\in \Delta _M^\circ $ and ${{\varvec{\psi }}}_0\ne {{\varvec{\chi _{eq}}}}.$

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Roitershtein, A., Rastegar, R., Chapkin, R.S. et al. Extinction scenarios in evolutionary processes: a multinomial Wright–Fisher approach. J. Math. Biol. 87, 63 (2023). https://doi.org/10.1007/s00285-023-01993-7

Download citation

Received: 06 December 2019
Revised: 16 August 2023
Accepted: 31 August 2023
Published: 26 September 2023
DOI: https://doi.org/10.1007/s00285-023-01993-7

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Extinction scenarios in evolutionary processes: a multinomial Wright–Fisher approach

Abstract

Similar content being viewed by others

Towards a replicator dynamics model of age structured populations

Moran-type bounds for the fixation probability in a frequency-dependent Wright–Fisher model

Equilibrium in Wright–Fisher Models of Population Genetics

1 Introduction

Theorem 1

Theorem 2

Theorem 3

2 The model

2.1 Geometric setup, notation

2.2 Formulation of the model

Example 1

3 Mean-field model: fitness landscape

3.1 Reproductive fitness and replicator dynamics

Proposition 3.1

Theorem 1

3.2 Maximization principle for the reproductive fitness

Definition 1

Theorem 3.2

Definition 2

Theorem 2

Example 2

Example 3

Proposition 3.3

3.3 Incorporating mutations

Assumption 3.4

Definition 3

Proposition 3.5

Proof

4 Stochastic dynamics: main results

4.1 Gaussian approximation of the Wright–Fisher model

Theorem 4.1

4.2 Difference equation approximation

Theorem 4.2

Theorem 4.3

4.3 Metastability and elimination of types

Assumption 4.4

Proposition 4.5

Theorem 4.6

Remark 1

Example 4

Example 5

Proposition 4.7

Corollary 4.8

Conjecture 4.9

Conjecture 4.10

5 Proofs

5.1 Proof of Theorem 4.1

5.2 Proof of Theorem 4.2

5.3 Proof of Theorem 4.3

5.4 Proof of Proposition 4.5

Lemma 5.1

Proof of Lemma 5.1

Corollary 5.2

Proof of Corollary 5.2

Remark 2

5.5 Proof of Theorem 4.6

Proof of (i)

Lemma 5.3

Proof of Lemma 5.3

Proof of (ii)

5.6 Proof of Proposition 4.7

Remark 3

5.7 Proof of Corollary 4.8

6 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendices

Appendices

1.1 Appendix A. Absorption states and stochastic equilibria

Assumption 7.1

Lemma 7.2

Proof