1 Introduction

The science of biodiversity currently faces the challenge of understanding how ecological processes shape evolutionary change, and reciprocally how evolution affects the structure and function of ecological systems (Schoener 2011). Such eco-evolutionary feedbacks determine the dynamics of so-called adaptive traits, i.e. quantitative characters that are heritable yet mutable from parent to offspring (Dieckmann and Law 1996; Metz et al. 1996). Under the combined assumptions of large population and rare mutation scalings, the time evolution of an adaptive trait can be described as a sequence of mutant invasions, each being driven by positive selection in the ecological context set by the ‘resident’ value of the adaptive trait (Metz et al. 1992). The resulting evolutionary model is a jump process called the trait substitution sequence (TSS): every new mutant trait either goes extinct, or replaces the resident, causing the TSS to jump from the former resident population equilibrium to a new equilibrium (Metz et al. 1996; Champagnat 2006; Champagnat et al. 2008). In population genetics, these jumps are known as selective sweeps (Barton 1998; Stephan et al. 1992). Previous works support the view that the TSS as a model of long-term phenotypic evolution is relatively insensitive to the details of the genetic determination of the trait (Lloyd 1977; Christiansen and Loeschcke 1980; Hammerstein 1996; Weissing 1996; Matessi and Schneider 2009; Eshel et al. 1998).

Whereas eco-evolutionary feedbacks can result in variation of adaptive traits among populations (and even within populations when evolutionary branching occurs, Geritz et al. 1998), much of the molecular diversity measured by population geneticists involves DNA sequences of no known adaptive value, i.e. selectively neutral. A neutral sequence that is physically linked in the genome to the sequence that codes for the adaptive trait is called a marker of that trait. A longstanding question in evolutionary theory is understanding how variation in such molecular markers evolves, and how patterns of neutral molecular evolution can be used to infer the history of the trait mutations that have driven past adaptation.

When adaptive mutations are rare, adaptation proceeds as a series of selective sweeps: a trait mutation occurs while the population is monomorphic for the trait, and increases rapidly in frequency toward fixation. Following on from Kojima and Schaffer (1967), Maynard Smith and Haigh (1974) pointed out that selective sweeps purge genetic variation at linked sites: a particular marker allele goes to fixation as a consequence of linkage with the selected allele, a phenomenon they dubbed the ‘hitchhiking effect’. Maynard Smith and Haigh’s deterministic model was revisited in a stochastic approach by Ohta and Kimura (1975). These seminal studies of hitchhiking focused on the short-term dynamics of an interaction between two alleles at the locus under selection and two alleles at the neutral locus. Long-term dynamics were considered first by Kaplan et al. (1989) who developed a stochastic model for finite populations to describe the effect of recurrent hitchhiking. In order to describe stationary levels of nucleotide diversity at the marker locus, they used the infinite site model and a coalescent approach under the assumption of constant population size and constant selection coefficients. This has generated an abundant theoretical literature on modeling the impact of selection on neutral polymorphism (Barton 2000; Etheridge et al. 2006; Durrett and Schweinsberg 2004 and references therein). Recent deterministic models have relaxed the assumption of constant selection either because of the presence of genetic backgrounds (e.g. assuming a quantitative trait, Chevin and Hospital 2008) or in the case of a parasite, because of the complexity of the demographic events involved in the life cycle (Schneider and Kim 2010). All previous models assume constant population size and constant selection, or that the population size is independent of the selective value of the individuals.

In this article, our goal is to relax these key assumptions. Under general ecological scenarios, eco-evolutionary feedbacks operate: as the adaptive trait evolves, population size and selection co-vary. The eco-evolutionary process of adaptive trait and neutral marker dynamics requires a rigorous mathematical framework, the foundation of which we establish here. We start with a ‘microscopic’, individual-based model where individuals have two heritable characteristics: (1) an adaptive trait that influences their intrinsic demographic rates and ecological interactions, and (2) a genetic marker that has no demographic or ecological effects, hence, is selectively neutral. This work focuses entirely on asexual populations and short genomic regions that remain perfectly linked to the loci under selection, neglecting recombination. The population is described by a measure according to which each individual is represented by a Dirac mass that weights its characters. This leads us to study the population eco-evolutionary dynamics as a measure-valued stochastic process.

The dynamics are driven by competition between individuals, asexual reproduction without or with mutation, and death. Variation in population size and selection as the trait evolves are mediated by the demographic effects of change in the trait. These effects are expected to influence the generation and maintenance of neutral variation.

The effect of mutation on the marker can be continuous or discrete. Our framework thus encompasses a variety of conventional mutation models such as the two-alleles model, the stepwise mutation model, and the continuous state mutation model. Our distinctive assumption here is that the marker mutation process is much faster than the trait mutation process but much slower than the ecological time-scale of birth and death events. This is supported by the fact that most mutations are neutral or nearly neutral (such as mutations involved in microsatellite variation). Therefore, there are three time scales in the model: the fast ecological time scale of birth and death events, the slow time scale of trait mutation, and an intermediate time scale of marker mutation. We study the joint process of trait and marker dynamics on the trait mutation time scale.

We are interested in limit theorems when the population carrying capacity goes to infinity. Then, the population size stabilizes in a neighborhood of the ecological equilibrium and jumps to another equilibrium when a successful trait mutant goes to fixation in the population. This is the TSS dynamics of the adaptive trait. It does not depend on the marker and has been mathematically proved by Champagnat (2006). The novelty in the model and in the proofs comes from the difference in time scales between the marker and trait mutations. The study of the marker distribution during the invasion period requires careful consideration of the individual process and of the different scales involved. In a first period, starting with the single invading mutant, we prove that the marker distribution remains close to a Dirac mass at the value of the initial mutant. Until the next jump of the TSS, the marker evolves as a stochastic distribution-valued process. In the case where the marker mutation effects are continuous and small, this is a Fleming–Viot process whose drift and covariance depend on the resident adaptive trait. In all cases, for any marker mutation model, the collated dynamics define a measure-valued diffusive process with jumps that we call Substitution Fleming–Viot Process (SFVP). The convergence of the microscopic process to the SFVP is shown both in the sense of finite dimensional distributions and in the sense of convergence of trait-marker-time measures, thus improving previous results of Champagnat (2006).

From a biological standpoint, we recover the conventional hitchhiking phenomenon: when a new mutant trait appears and sweeps through the population to fixation, the marker carried by the mutant individual is hitchhiked, and the marker distribution undergoes a genetical bottleneck. The mathematical construction of the SFVP process has new implications of biological relevance. Neutral diversity is restored after each adaptive jump, but as the adaptive trait evolves, population size, the mutation rate, genetic drift and demographic fluctuations change, which causes the rate of neutral polymorphism build-up and the moments of the marker distribution to change too. This suggests that the nature and structure of the whole eco-evolutionary feedback loop (i.e. how adaptive traits influence demographic rates and ecological interactions, and how ecological processes shape selection pressures on adaptive traits) may be important to explain the extreme disparities in genetic neutral diversity observed among species, even closely related ones and in the absence of differences in recombination profiles (Cutter and Payseur 2013). In fact, it is well-known that demographic differences due to external causes (demographic bottleneck or population expansion due to environmental changes) can affect neutral diversity of a population and that closely related species can show very different neutral diversity patterns. Here, we show that internal causes of demographic variation involved in adaptation can also affect species differently.

The article is organized as follows. In Sect. 2, we start with the model description. The stochastic individual-based process and its key assumptions are carefully described and examples are provided. A key parameter is \(K\), an integer that gives the order of the population size and is used to rescale the mutation rates and kernels. By letting \(K\) go to infinity we study the large population limit of the stochastic process. The main theorem is stated and discussed in Sect. 3, where biological implications are also highlighted. Time scale separations implied by the dependence on \(K\) of the trait and marker mutations lead to homogenization phenomena and then to the SFVP. Our mathematical analysis provides a precise description of the genetical bottleneck that occurs at each trait substitution. We show that the marker of the initial mutant individual dominates in the marker distribution of the mutant population until this population reaches a neighborhood of the new ecological equilibrium. Then, we present two numerical examples based on an ecological model adapted from Dieckmann and Doebeli (1999). In the first example, marker mutation is described by a continuous state model that leads to a piecewise Fleming–Viot process (Sect. 3.3) for the marker. In the second example, marker mutation follows a discrete two-allele model and the classical Wright–Fisher diffusion (3.5) is recovered. Further generalizations are discussed. The proof of the main theorem in the adaptive dynamics scaling is in Sect. 4. After having introduced a semi-martingale decomposition of our stochastic measure-valued process, we start by recalling and refining the result of Champagnat (2006) for the convergence of trait-marginals. For this purpose, we introduce the M1-topology on the Skorokhod space where the TSS lives, using some ideas of Collet et al. (2013). The second part of the proof focuses on the marker distribution in an invading mutant population. This gives the result on the genetical bottleneck. Then, between two trait substitutions, the dynamics of the marker converges to a diffusive measure-valued process. As a conclusion of the proof, we show the convergence to the SFVP in the space of trait-marker-time measures.

2 The stochastic model

We consider an asexual population driven by births and deaths where each individual is characterized by hereditary types: a phenotypic trait under selection and a neutral marker. The trait and marker spaces \(\mathcal{X}\) and \(\mathcal{U}\) are assumed to be compact subsets of \(\mathbb {R}\). The type of individual \(i\) is thus a pair \((x_i,u_i), x_{i}\in \mathcal{X}\) being the trait value and \(u_i\in \mathcal{U}\) its neutral marker. The individual-based microscopic model from which we start is a stochastic birth and death process with density-dependence whose demographic parameters are functions of the trait under selection and are independent of the marker. We assume that the population size scales with an integer parameter \(K\) tending to infinity while individuals are weighted with \(\frac{1}{K}\). At any time \(t\ge 0\), we have a finite number \(N^K_t\) of individuals, each of them holding trait and marker values in \(\mathcal{X}\times \mathcal{U}\). Let us denote by \(((x_{1}, u_{1}), \ldots , (x_{N^K_t},u_{N^K_t}))\) the trait and marker values of these individuals. The state of the population at time \(t\ge 0\), rescaled by \(K\), is described by the point measure

$$\begin{aligned} \nu ^{K}_t={1\over K}\sum _{{i=1}}^{N^K_{t}} \delta _{(x_{i},u_{i})}, \end{aligned}$$
(2.1)

where \(\delta _{(x,u)}\) is the Dirac measure at \((x,u)\). This measure belongs to the set of finite point measures on \(\mathcal {X}\times \mathcal{U}\) in which each individual carries mass \(1/K\). This set is a subset of the set \(\mathcal {M}_{F}(\mathcal{X}\times \mathcal{U})\) of finite measures on \(\mathcal{X}\times \mathcal{U}\), which is endowed with the topology of weak convergence. We denote by \(\, \langle \nu ,f\rangle \) the integral of the measurable function \(f\) with respect to the measure \(\nu \) and by \(\mathrm {Supp}(\nu )\) the support of \(\nu \). Then \(\, \langle \nu ^{K}_t,\mathbf{1}\rangle =\frac{N^K_t}{K}\).

For any \(t\ge 0\), we also introduce the trait marginal of the measure \(\nu ^K_t\) on \(\mathcal {X}\), denoted by \( X^{K}_t\) and defined by

$$\begin{aligned} X^{K}_t = {1\over K}\sum _{{i=1}}^{N^K_{t}} \delta _{x_{i}}. \end{aligned}$$

Therefore, the population measure \(\nu ^K_{t}\) can be written as

$$\begin{aligned} \nu _t^{K}(dx,du)= X^K_{t}(dx)\,\pi ^K_{t}(x,du) \end{aligned}$$
(2.2)

where \( \pi ^K_{t}(x,du)\) is the marker distribution for a given trait value \(x\) defined by

$$\begin{aligned} \pi ^K_{t}(x,du) = \frac{\sum _{i=1}^{N^K_t} 1\!\!1_{x_{i}=x}\delta _{u_i} }{\sum _{i=1}^{N^K_t} 1\!\!1_{x_{i}=x}}. \end{aligned}$$
(2.3)
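The objects in (2.1)–(2.3) can be made concrete with a small sketch. It stores a population as a list of pairs \((x_i,u_i)\) and computes \(\langle \nu ^K,f\rangle \), the trait marginal \(X^K\) and the conditional marker distribution \(\pi ^K(x,du)\). The function names and the toy population are assumptions made for illustration only; they are not part of the model.

```python
from collections import Counter

def pairing(population, K, f):
    """<nu^K, f> = (1/K) * sum_i f(x_i, u_i), as in (2.1)."""
    return sum(f(x, u) for x, u in population) / K

def trait_marginal(population, K):
    """X^K as a dict {trait value: mass}; each individual weighs 1/K."""
    counts = Counter(x for x, _ in population)
    return {x: n / K for x, n in counts.items()}

def marker_distribution(population, x):
    """pi^K(x, du) as a dict {marker: probability} among carriers of trait x, as in (2.3)."""
    markers = [u for xi, u in population if xi == x]
    if not markers:
        raise ValueError("no individual carries trait x")
    counts = Counter(markers)
    return {u: n / len(markers) for u, n in counts.items()}

# Hypothetical toy population: N = 6 individuals, scaling parameter K = 4.
K = 4
population = [(0.5, 0.1), (0.5, 0.1), (0.5, 0.3), (0.7, 0.2), (0.7, 0.2), (0.5, 0.3)]
total_mass = pairing(population, K, lambda x, u: 1.0)  # equals N / K
```

Summing `mass * prob * f(x, u)` over the two dictionaries recovers `pairing(population, K, f)`, which is the disintegration (2.2) written out for a finite population.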

Our purpose is to study the asymptotic behavior of the measure-valued process \(\nu ^K\) at large times, when the trait and marker are inherited but mutations occur. The main interest of our model is that these mutations happen at different time scales for trait and marker, but both longer than the individuals’ lifetime scale. The trait mutates much more slowly than the marker and drives the evolution time scale. Thus, the limiting behavior results from the interplay of three time scales: births and deaths, trait mutations and marker mutations.

We describe the individuals’ life history. The trait has an influence on the ability of individuals to survive (including competition with other ones) and to reproduce but the marker is neutral. The demographic parameters are thus functions of the trait only and are defined on \(\mathcal{X}\).

Assumption 2.1

  • An individual with trait \(x\) and marker \(u\) reproduces with birth rate given by \( 0\le b(x)\le \bar{b}\), the function \( b\) being continuous.

  • Reproduction produces a single offspring which usually inherits the trait and marker of its ancestor except when a mutation occurs. Mutations on trait and marker occur independently with probabilities \(p_{K}\) and \(q_{K}\) respectively. Mutations are rare and the marker mutates much more often than the trait. We assume that

    $$\begin{aligned} q_{K} = p_{K} \, r_K, \quad \hbox { with } p_{K}=\frac{1}{K^2}, \quad q_{K}(\log K)^2 \rightarrow _{K\rightarrow \infty } 0, \quad r_K \rightarrow _{K\rightarrow \infty } + \infty . \end{aligned}$$
    (2.4)
  • When a trait mutation occurs, the new trait of the descendant is \(x+k\in \mathcal{X}\) with \(k\) chosen according to the probability measure \(m(x,k)dk\).

  • When a marker mutation occurs, the new marker of the descendant is \(u+h\in \mathcal{U}\) with \(h\) chosen according to the probability measure \(G_{K}(u,dh)\). For any \(u\in \mathcal {U}, G_{K}(u,.)\) is approximated as follows when \(K\) tends to infinity:

    $$\begin{aligned} \lim _{K\rightarrow +\infty }\sup _{u\in \mathcal {U}} \bigg | {r_K\over K} \int _{ \mathcal {U}} (\phi (u+h)-\phi (u)) G_{K}(u,dh) - A\phi (u)\bigg | =0, \end{aligned}$$
    (2.5)

    where \((A, \mathcal{D}(A))\) is the generator of a Feller semigroup and \(\phi \in \mathcal{D}(A)\subseteq \mathcal {C}_b(\mathcal {U},\mathbb {R})\), the set of continuous bounded real functions on \(\mathcal {U}\).

  • An individual with trait \(x\) and marker \(u\) dies with intrinsic death rate \(0\le d(x)\le \bar{d}\), the function \( d\) being continuous. Moreover the individual experiences competition the effect of which is an additional death rate \(\,\eta (x)\ C*\nu ^K_{t}(x)= \frac{\eta (x)}{K}\sum _{i=1}^{N^K_t}C(x-x_{i})\). The quantity \(C(x-x_{i})\) describes the competition pressure exerted by an individual with trait \(\,x_{i}\) on an individual with trait \(x\). We assume that the functions \(C\) and \(\eta \) are continuous and that there exists \(\underline{\eta }>0\) such that

    $$\begin{aligned} \forall x, y \in \mathcal{X},\quad \eta (x)\ C(x-y)\ge \underline{\eta }>0. \end{aligned}$$
    (2.6)

A classical choice of competition function is \(C\equiv 1\) which is called “mean field case” or “logistic case”. In that case the competition death rate is \(\eta (x) N^K_t/K\).
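The individual-level rules of Assumption 2.1 can be sketched as a simulation in the logistic case \(C\equiv 1\). The sketch below uses an acceptance-rejection (thinning) scheme: events are proposed at a rate that bounds all individual rates, and a proposal is then accepted as a birth, accepted as a death, or rejected. The demographic functions, the trait space \([0,1]\), the marker space \([-1,1]\), the uniform trait mutation steps and the choice \(r_K=K^{3/2}\) are all assumptions chosen for illustration.

```python
import math
import random

def mutate(value, draw_step, lo, hi):
    """Draw value + step, conditioned on staying inside [lo, hi]."""
    while True:
        new = value + draw_step()
        if lo <= new <= hi:
            return new

def simulate_ibm(K, t_max, x0=0.0, u0=0.0, seed=1):
    """Thinning simulation of the birth-death-mutation model with C = 1."""
    rng = random.Random(seed)
    # Hypothetical demographic functions on X = [0, 1].
    b = lambda x: 2.0
    d = lambda x: 1.0
    eta = lambda x: 1.0 + x
    b_bar, d_bar, eta_bar = 2.0, 1.0, 2.0   # upper bounds of b, d, eta on X
    p_K = 1.0 / K**2                        # trait mutation probability, see (2.4)
    q_K = 1.0 / math.sqrt(K)                # marker mutation probability (r_K = K^{3/2})
    sigma_K = K ** -0.25                    # marker step std, sigma_K^2 = K^{-1/2}

    pop = [(x0, u0)] * K                    # N_0 = K, i.e. mass 1 = n_hat at x0 = 0
    t = 0.0
    while pop:
        n = len(pop)
        bound = b_bar + d_bar + eta_bar * n / K   # dominates every individual rate
        t += rng.expovariate(bound * n)
        if t > t_max:
            break
        i = rng.randrange(n)
        x, u = pop[i]
        theta = rng.uniform(0.0, bound)
        if theta < b(x):                                    # birth
            child_x, child_u = x, u
            if rng.random() < p_K:                          # trait mutation
                child_x = mutate(x, lambda: rng.uniform(-0.1, 0.1), 0.0, 1.0)
            if rng.random() < q_K:                          # marker mutation
                child_u = mutate(u, lambda: rng.gauss(0.0, sigma_K), -1.0, 1.0)
            pop.append((child_x, child_u))
        elif theta < b(x) + d(x) + eta(x) * n / K:          # intrinsic or competition death
            pop[i] = pop[-1]                                # swap-remove individual i
            pop.pop()
        # otherwise the proposal is rejected and the state is unchanged
    return pop

final_population = simulate_ibm(K=300, t_max=10.0)
print(len(final_population) / 300)   # rescaled population size N/K at time t_max
```

With these hypothetical rates the rescaled size should fluctuate around \((b-d)/\eta =1\) for the initial trait, while the markers spread away from \(u_0\) through the frequent small marker mutations.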

Remark 2.2

Let us insist on the generality of Assumption (2.5), which allows a large set of possible dynamics.

  • Equation (2.5) is for example true for \(\mathcal{U}=[u_{1},u_{2} ], G_{K}\) a centered Gaussian law (conditioned to \(\mathcal{U}\)) with variance \(\sigma _{K}^2\rightarrow 0\) such that \(\lim _{K} \sigma _{K}^2 {r_K\over K} = \sigma ^2\) and \(A\phi ={\sigma ^2\over 2}\phi ''\) for \(\phi \in \mathcal {C}^2\) with \(\phi '(u_{1})=\phi '(u_{2}) =0\). Choosing for example \(r_K=K^{3/2}, q_K=1/\sqrt{K}\) and \(\sigma ^2_K=1/\sqrt{K}\) works. This choice can be seen as a continuous state generalization of the stepwise mutation model (Ohta and Kimura 1973).

  • If in addition the distribution \(G_{K}\) has a non zero mean \(\, \mu _{K}\) such that \(\,{r_K\mu _{K}\over K} \rightarrow \mu >0\) corresponding to a mutational directional drift, then the operator \(A\) will be defined by \(A\phi = {\sigma ^2\over 2}\phi '' + \mu \phi '\).

  • If we relax the compactness of \(\mathcal{U}\) and assume that \(\mathcal{U} = \mathbb {R}\), a third example consists in taking for \(G_{K}\) the law of a Pareto variable with index \(\alpha \in (1,2)\) divided by \(K^{\eta /\alpha }\), for \(\eta \in (0,1]\). Then it has been proved by Jourdain et al. (2012) that

    $$\begin{aligned} \lim _{K} \sup _{u} \left| K^\eta \int _{ \mathbb {R}} (\phi (u+h)-\phi (u)) G_{K}(u,dh) - {\alpha \over 2} D^\alpha \phi (u)\right| =0, \end{aligned}$$

    where

    $$\begin{aligned} D^\alpha \phi (u) = \int _{ \mathbb {R}}(\phi (u+h)-\phi (u)- h\phi '(u)1\!\!1_{|h|\le 1}) {dh\over |h|^{1+\alpha }} \end{aligned}$$

    is the fractional Laplacian with index \(\alpha \). Thus if we take \(r_K\) such that \({r_K\over K^{1+\eta }}\) converges as \(K\) tends to infinity, and choose \(A=D^\alpha \) in (2.5), Assumptions (2.4)–(2.5) will be satisfied as soon as \(\eta <1\).

  • Another very interesting case is the discrete case when \(\mathcal {U}=\{a,A\}\) is a set of two alleles. The mutation kernel is given by

    $$\begin{aligned} G_K(u,dv)=1\!\!1_{u=a}\,q_a\, \delta _A(dv)+1\!\!1_{u=A}\,q_A \,\delta _a(dv). \end{aligned}$$
    (2.7)

    In this case, (2.5) implies that \(r_K/K\) has a limit when \(K\rightarrow +\infty \). Let \(\bar{r}\) be this limit, then

    $$\begin{aligned} A\phi (u)=\bar{r}\Big (1\!\!1_{u=a}\, q_a \big (\phi (A)-\phi (a)\big )+1\!\!1_{u=A}\, q_A \big (\phi (a)-\phi (A)\big )\Big ). \end{aligned}$$
    (2.8)

We see that the ratio between the two mutation probabilities \(r_K=q_K/p_K\) that allows convergence is highly dependent on the mutation distribution.
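As a numerical illustration of the first example of Remark 2.2, the sketch below takes \(r_K=K^{3/2}\) and \(\sigma _K^2=K^{-1/2}\) and evaluates the rescaled quantity in (2.5) by quadrature, comparing it with \(\frac{\sigma ^2}{2}\phi ''(u)\) for \(\sigma ^2=1\). It is only a sanity check under simplifying assumptions: the Gaussian law is not conditioned to \(\mathcal{U}\) (the point \(u\) is treated as an interior point), and the test function \(\phi (u)=\cos (\pi u)\) and the function names are hypothetical choices.

```python
import math

def scaling(K):
    """Return (p_K, q_K, r_K, sigma_K^2) for the choice r_K = K^{3/2}."""
    p_K = 1.0 / K**2
    r_K = K ** 1.5
    q_K = p_K * r_K            # equals K^{-1/2}, which vanishes as K grows
    sigma2_K = K ** -0.5
    return p_K, q_K, r_K, sigma2_K

def rescaled_mutation_operator(phi, u, K, n_grid=4001, width=8.0):
    """(r_K / K) * E[phi(u + h) - phi(u)], h ~ N(0, sigma_K^2), by the trapezoidal rule."""
    _, _, r_K, sigma2_K = scaling(K)
    s = math.sqrt(sigma2_K)
    step = 2.0 * width * s / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        h = -width * s + j * step
        weight = 0.5 if j in (0, n_grid - 1) else 1.0
        density = math.exp(-0.5 * (h / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        total += weight * (phi(u + h) - phi(u)) * density * step
    return (r_K / K) * total

phi = lambda u: math.cos(math.pi * u)
limit = -0.5 * math.pi**2 * math.cos(0.3 * math.pi)   # (1/2) * phi''(0.3)
for K in (10**4, 10**6, 10**8):
    print(K, rescaled_mutation_operator(phi, 0.3, K) - limit)
```

The printed differences should shrink as \(K\) grows, in line with the uniform convergence required by (2.5).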

Note that since the demographic rates do not depend on the marker, the dynamics of the population distribution of the trait is independent of the marker distribution. But the dynamics of the marker distribution cannot be separated from the trait distribution as we shall see.

The process \((\nu ^{K}_t,t\ge 0)\) is a càdlàg \(\mathcal{M}_F(\mathcal {X}\times \mathcal {U})\)-valued Markov process. Existence and uniqueness in law of the process can be adapted from Fournier and Méléard (2004) and Champagnat et al. (2008) under the assumption that \(\mathbb {E}(\langle \nu ^{K}_0,\mathbf {1}\rangle )< + \infty \). Moreover, Assumption (2.6) allows us to prove, as in Champagnat (2006, Lemma 1), that if for \(p\ge 1, \sup _{K\in \mathbb {N}^*}\mathbb {E}(\langle \nu ^K_0,\mathbf {1}\rangle ^p)<+\infty \), then

$$\begin{aligned} \sup _{t\in \mathbb {R}_+, K\in \mathbb {N}^*} \mathbb {E}\big (\langle \nu ^K_t,\mathbf {1}\rangle ^p\big )<+\infty \end{aligned}$$
(2.9)

which will be useful to study the tightness and convergence of the sequence.

3 Convergence to the Substitution Fleming–Viot Process

The adaptive trait mutation time scale is the slowest, equal to \({1\over K p_{K}}=K\) by Assumption (2.4). It scales the evolutionary time. So we shall consider the limiting behavior of \((\nu ^K_{Kt}, t\ge 0)\). We will see in Sect. 4.2.1 that \(p_K\) of order \(1/K^2\) is the only choice which leads to a non-trivial or non-degenerate marker dynamics.

Before stating our main result, we introduce several important ingredients which are used to describe the limit of \((\nu ^K_{Kt}, t\ge 0)\) when \(K\rightarrow +\infty \). We conclude the section with extensions and simulations.

3.1 Invasion fitness function

The large population behavior of the process \((\nu ^K_t, t\ge 0)\) as \(K\) tends to infinity can be studied by classical arguments and is given in the appendix. At the ecological time scale (of order \(1\)), no mutation occurs in the limit \(K\rightarrow +\infty \). If the initial population has a single adaptive trait \(x\), then, in the limit \(K\rightarrow +\infty \), the trait distribution remains \(\delta _x\) since \(p_K\) and \(q_K\) vanish in the limit. The rescaled population size process \(( N^K_{t}/K, t\ge 0)\) converges to the solution \((n_{t}, t\ge 0)\) of the ordinary differential equation

$$\begin{aligned} \frac{dn_t}{dt} =\big (b(x)-d(x)-\eta (x) C(0)n_t\big ) n_t \end{aligned}$$
(3.1)

which converges when \(t\) tends to infinity to the equilibrium

$$\begin{aligned} \widehat{n}_x =\frac{ b(x)-d(x)}{\eta (x)C(0)}. \end{aligned}$$
(3.2)
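The convergence of the solution of (3.1) to the equilibrium (3.2) can be checked numerically. The sketch below integrates the logistic equation with an explicit Euler scheme; the parameter values and the step size are hypothetical choices made for illustration, and any standard ODE solver could be used instead.

```python
def equilibrium(b, d, eta, c0):
    """Equilibrium density (3.2) for a monomorphic population; c0 stands for C(0)."""
    return (b - d) / (eta * c0)

def integrate_logistic(n0, b, d, eta, c0, t_max, dt=1e-3):
    """Explicit Euler integration of dn/dt = (b - d - eta * C(0) * n) * n, see (3.1)."""
    n = n0
    for _ in range(int(round(t_max / dt))):
        n += dt * (b - d - eta * c0 * n) * n
    return n

# Hypothetical rates: b = 2, d = 1, eta = 1, C(0) = 1, so the equilibrium is 1.
n_hat = equilibrium(2.0, 1.0, 1.0, 1.0)
from_below = integrate_logistic(0.1, 2.0, 1.0, 1.0, 1.0, t_max=30.0)
from_above = integrate_logistic(3.0, 2.0, 1.0, 1.0, 1.0, t_max=30.0)
print(n_hat, from_below, from_above)
```

Both trajectories approach the same value whatever the positive initial density, which is the property used at the ecological time scale.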

In contrast, at the adaptive trait-mutation time scale \(Kt\), new mutant traits can invade. If they replace the previous traits, then the corresponding event is called “fixation”.

The probability of fixation of a mutant trait \(y\) in a trait resident population \(x\) at equilibrium depends on the invasion fitness function \(f(y;x)\):

$$\begin{aligned} f(y;x)= b(y)-d(y)-\eta (y)\,C(y-x)\,\widehat{n}_x. \end{aligned}$$
(3.3)

This fitness function describes the initial growth of the mutant population. It does not depend on the neutral marker.
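A short sketch may help to read (3.3). It evaluates the invasion fitness for a hypothetical ecological model (quadratic birth rate, constant death rate and a Gaussian competition kernel, all of them assumptions made for illustration) and checks that a mutant identical to the resident has zero fitness, since \(\widehat{n}_x\) is precisely the density at which the resident growth rate vanishes.

```python
import math

# Hypothetical ecological model on the trait space X = [-0.5, 0.5].
b = lambda x: 2.0 - x * x                 # birth rate
d = lambda x: 1.0                         # intrinsic death rate
eta = lambda x: 1.0                       # competition intensity
C = lambda z: math.exp(-0.5 * z * z)      # competition kernel, C(0) = 1

def n_hat(x):
    """Resident equilibrium density, as in (3.2)."""
    return (b(x) - d(x)) / (eta(x) * C(0.0))

def invasion_fitness(y, x):
    """f(y; x): initial growth rate of a mutant y in a resident x at equilibrium, as in (3.3)."""
    return b(y) - d(y) - eta(y) * C(y - x) * n_hat(x)

print(invasion_fitness(0.1, 0.3))   # positive: the mutant 0.1 can invade the resident 0.3
print(invasion_fitness(0.3, 0.1))   # negative: the mutant 0.3 cannot invade the resident 0.1
```

In this toy model a mutant closer to the maximum of the birth rate has positive fitness, whereas the reverse substitution has negative fitness.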

For simplicity we work under the assumption of ‘invasion implies fixation’, but this assumption will be relaxed in Sect. 3.5. When a mutant trait appears, either its line of descent replaces the resident population or it disappears. As a consequence, two traits cannot coexist in the long term.

Assumption 3.1

(“Invasion implies fixation”) For all \(x\in \mathcal {X}\) and for almost every \(y\in \mathcal {X}\),

$$\begin{aligned}&\text{ either }\, \frac{b(y) - d(y)}{\eta (y) C(y-x)} <\frac{b(x) - d(x)}{\eta (x)C(0)},\\&\text{ or }\, \frac{b(y) - d(y)}{\eta (y)C(y-x)}> \frac{b(x) - d(x)}{\eta (x)C(0)}\ \text{ and } \ \frac{b(x) - d(x)}{\eta (x)C(x-y)}<\frac{b(y) - d(y)}{\eta (y)C(0)}. \end{aligned}$$

Remark 3.2

In the case of logistic populations with \(C\equiv 1\), this assumption is satisfied as soon as \(x \mapsto \widehat{n}_x\) is strictly monotone.
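The two alternatives of Assumption 3.1 can be tested pair by pair. The sketch below does so on a grid for a hypothetical logistic model with strictly decreasing \(\widehat{n}_x\), and then for a hypothetical model with a narrow Gaussian competition kernel in which two symmetric traits can invade each other, so that the assumption fails for that pair. A grid check is only illustrative; it does not establish the 'almost every \(y\)' statement.

```python
import math

def satisfies_iif(x, y, b, d, eta, C):
    """Check the two alternatives of Assumption 3.1 for resident x and mutant y."""
    k = lambda z: (b(z) - d(z)) / eta(z)
    mutant_fails = k(y) / C(y - x) < k(x) / C(0.0)
    mutant_invades = k(y) / C(y - x) > k(x) / C(0.0)
    resident_cannot_return = k(x) / C(x - y) < k(y) / C(0.0)
    return mutant_fails or (mutant_invades and resident_cannot_return)

# Hypothetical logistic model: C = 1 and n_hat(x) = 1 / (1 + x), strictly decreasing on [0, 1].
logistic = dict(b=lambda x: 2.0, d=lambda x: 1.0, eta=lambda x: 1.0 + x, C=lambda z: 1.0)
grid = [i / 20 for i in range(21)]
logistic_ok = all(satisfies_iif(x, y, **logistic) for x in grid for y in grid if x != y)

# Hypothetical counterexample: competition kernel of width 0.5, symmetric traits around 0.
narrow = dict(b=lambda x: 2.0 - x * x, d=lambda x: 1.0, eta=lambda x: 1.0,
              C=lambda z: math.exp(-0.5 * (z / 0.5) ** 2))
coexistence_pair_ok = satisfies_iif(0.1, -0.1, **narrow)

print(logistic_ok, coexistence_pair_ok)
```

The first check agrees with Remark 3.2; the second shows the kind of mutual invasibility that the assumption rules out.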

3.2 Main theorem

Let us first give the definition of the Fleming–Viot process which will appear in our setting (see e.g. Dawson and Hochberg 1982; Dawson 1993; Donnelly and Kurtz 1996; Etheridge 2000). We recall that the operator \(A\) has been introduced in (2.5).

In the sequel, we denote by \(\mathcal {P}(\mathcal {U})\) and \(\mathcal {P}(\mathcal {X}\times \mathcal {U})\) the probability measure spaces respectively on \(\mathcal {U}\) and on \(\mathcal {X}\times \mathcal {U}\).

Definition 3.3

Let us fix \(x\in \mathcal{X}\) and \(u \in \mathcal{U}\). The Fleming–Viot process \((F_{t}^{u}(x,.), t\ge 0)\) indexed by \(x\), started at time \(0\) with initial condition \(\delta _{u}\) and associated with the mutation operator \(A\) is the \(\mathcal{P}(\mathcal{U})\)-valued process whose law is characterized as the unique solution of the following martingale problem. For any \(\phi \in \mathcal{D}(A)\),

$$\begin{aligned} M^x_{t}(\phi ) = \langle F^{u}_{t}(x,.),\phi \rangle - \phi (u) - b(x) \int _{0}^t \langle F^{u}_{s}(x,.), A\phi \rangle ds \end{aligned}$$
(3.4)

is a continuous square integrable martingale with quadratic variation process

$$\begin{aligned} \langle M^x(\phi )\rangle _{t}=&\,\frac{b(x)+d(x)+\eta (x) C(0) \widehat{n}_x}{\widehat{n}_{x}} \int _{0}^t \left( \langle F^{u}_{s}(x,.), \phi ^2\rangle - \langle F^{u}_{s}(x,.), \phi \rangle ^2 \right) ds\nonumber \\ =&\,\frac{2b(x)}{\widehat{n}_x} \int _{0}^t \left( \langle F^{u}_{s}(x,.), \phi ^2\rangle - \langle F^{u}_{s}(x,.), \phi \rangle ^2 \right) ds. \end{aligned}$$
(3.5)
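In the two-allele case \(\mathcal{U}=\{a,A\}\) with \(A\) given by (2.8), taking \(\phi =1\!\!1_{\{A\}}\) in (3.4)–(3.5) suggests that the frequency \(p_t\) of allele \(A\) has drift \(b(x)\bar{r}\,(q_a(1-p)-q_A p)\) and squared diffusion coefficient \(\frac{2b(x)}{\widehat{n}_x}p(1-p)\), i.e. a Wright–Fisher diffusion with mutation. The sketch below simulates it with a crude Euler–Maruyama scheme clipped to \([0,1]\). The scheme, the parameter values and the function names are assumptions made for illustration, not the construction used in the proofs.

```python
import math
import random

def wf_drift(p, b, r_bar, q_a, q_A):
    """Drift of the frequency of allele A: b * <F, A phi> with phi = indicator of A."""
    return b * r_bar * (q_a * (1.0 - p) - q_A * p)

def wf_diffusion(p, b, n_hat):
    """Diffusion coefficient: square root of (2 b / n_hat) * p * (1 - p), see (3.5)."""
    return math.sqrt(2.0 * b / n_hat * p * (1.0 - p))

def simulate_allele_frequency(p0, b, n_hat, r_bar, q_a, q_A, t_max, dt=1e-3, seed=0):
    """Euler-Maruyama path of the allele frequency, clipped to [0, 1] at each step."""
    rng = random.Random(seed)
    p, path = p0, [p0]
    for _ in range(int(round(t_max / dt))):
        dw = rng.gauss(0.0, math.sqrt(dt))
        p += wf_drift(p, b, r_bar, q_a, q_A) * dt + wf_diffusion(p, b, n_hat) * dw
        p = min(1.0, max(0.0, p))
        path.append(p)
    return path

# Start from a population fixed for allele A, as after a hitchhiking event.
path = simulate_allele_frequency(p0=1.0, b=2.0, n_hat=1.0, r_bar=1.0,
                                 q_a=0.3, q_A=0.2, t_max=1.0)
print(path[-1])
```

Changing `b` or `n_hat`, which both depend on the resident trait, changes both the speed at which diversity is restored and the strength of the fluctuations.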

Let us now state our main theorem that describes the slow–fast dynamics of adaptive traits and neutral markers at the (trait) evolutionary time scale.

Theorem 3.4

We work under Assumptions 2.1 and 3.1. The initial conditions are \(\nu ^K_{0}(dy,dv)= n^K_{0}\,\delta _{(x_{0},u_{0})}(dy,dv)\) with \(\ \lim _{K\rightarrow \infty } n^K_{0} = \widehat{n}_{x_{0}}\) and for any \(\epsilon >0, \sup _{K\in \mathbb {N}^*} \mathbb {E}((n^K_0)^{2+\epsilon })<+\infty \).

Then, the population process \(\ (\nu ^K_{Kt}, t\ge 0)\) converges in law to the \(\mathcal {M}_{F}(\mathcal{X}\times \mathcal{U})\)-valued process \(\,(V_t(dy,dv), t\ge 0)\) defined by

$$\begin{aligned} V_{t}(dy, dv) = \widehat{n}_{Y_{t}}\,\delta _{Y_{t}}(dy)\, F_{t}^{U_{t}}(Y_{t}, dv), \end{aligned}$$
(3.6)

where the process \(((Y_{t}, U_{t}), t\ge 0)\) on \(\mathcal{X}\times \mathcal{U}\), started at \((x_0,u_0)\), jumps at time \(t\) from \((x,u)\) to \((x+k,v)\) with the jump measure

$$\begin{aligned} b(x) \widehat{n}_{x}\frac{[f(x+k;x)]_+}{b(x+k)}\, F^{u}_t(x,dv)\,m(x,k)dk. \end{aligned}$$
(3.7)

The convergence holds in the sense of finite dimensional distributions on \(\mathcal {M}_{F}(\mathcal{X}\times \mathcal{U})\).

In addition, the convergence also holds in the space of trait-marker-time measures, i.e. the measure \(\nu ^K_{Kt}(dy,dv) dt\) on \(\mathcal {X}\times \mathcal {U}\times [0,T]\) converges weakly to the measure \(\widehat{n}_{Y_{t}}\,\delta _{Y_{t}}(dy)\, F_{t}^{U_{t}}(Y_{t}, dv) dt\) for any \(T>0\).\(\Box \)

Definition 3.5

The limiting measure-valued process \(( V_t(dy,dv), t\ge 0)\) is called Substitution Fleming–Viot Process. It generalizes the Trait Substitution Sequence (TSS) introduced by Metz et al. (1996).

We observe that the Substitution Fleming–Viot Process includes the three qualitative behaviors due to the three different time scales: deterministic equilibrium for the size of the population (driven by the ecological birth and death events), transitory diffusive behavior for the marker distribution (driven by marker mutation), jump process for the trait distribution (driven by adaptive trait mutation).

Remark 3.6

Equations (3.4)–(3.5) have important biological implications regarding neutral genetic diversity. Once the fixation of a favorable mutation has occurred and the population is monomorphic for the selected trait, the evolution of the neutral marker distribution is described by a Fleming–Viot process whose law is given by the martingale in (3.4). The bracket of the martingale in (3.5) shows that the stochastic fluctuations with time of the marker distribution are due to randomness in births and deaths and mutations. The multiplicative factor \(2b(x)/\widehat{n}_x\) in (3.5) depends on the trait value \(x\), and on the assumed ecological model which determines the relationships between \(x\), the death and birth rates and the competition kernel. Notice that \(2b(x)/\widehat{n}_x\) corresponds to the quotient of variance (here \(2b(x)\)) divided by effective size \(N_e\) (here \(\widehat{n}_x\)) that appears in the usual Wright–Fisher equation. The quantity \(\widehat{n}_x\) corresponds to the mass of the population when there is an infinite number of small individuals; if the size of the population is of order \(K\), it means that there are approximately \(\widehat{n}_x K\) individuals of weight \(1/K\). The last term in (3.4) (i.e. the drift term in a mathematical sense) involves the generator \(A\) and is associated with the mutation model as seen in Assumption (2.5). The generator \(A\) describes the speed at which the neutral diversity is restored. For instance in a continuous state model, if \(A\phi =\frac{\sigma ^2}{2}\phi ''\), we recover the heat equation whose solutions have a variance proportional to \(t\). In a discrete state model similar to (2.8), this equation gives the growth of the support.

In short, (3.4)–(3.5) show that the distribution of the neutral marker depends on ecological processes and their parameters: every change in \(x\) will result in changes in the distribution of the neutral marker, through changes in birth, death and mutation rates, in competition and equilibrium population size. This result is biologically relevant and important since it differs from the assumptions of classical genetic hitchhiking models, in which selection and population size remain constant, leading to the fact that the neutral diversity restoration will not depend on the trait substitution and its history. In the examples below, we will give more detailed results regarding how the distribution of the neutral marker changes.

The proof of Theorem 3.4 is the subject of Sect. 4.

The trait dynamics in the limit of Theorem 3.4 is the Trait Substitution Sequence obtained by Theorem 1 in Champagnat (2006) whose assumptions are satisfied. Our main contribution in Theorem 3.4 is to prove that at the adaptive trait mutation time scale, a homogenization phenomenon takes place. There is a deterministic limit for the fastest process (the births and deaths leading to \(\widehat{n}_x\)), and stochastic limits for the two slower processes. The limiting process \((V_t, t\ge 0)\) is a measure-valued process with jumps (corresponding to trait mutations) and diffusion (corresponding to marker dynamics). If the population is trait-monomorphic with trait \(x\), the total jump rate is

$$\begin{aligned} b(x) \widehat{n}_{x}\int _{\mathcal {X}-\{x\}} \frac{[f(x+k;x)]_+}{b(x+k)}\,m(x,k)dk, \end{aligned}$$

where \(\mathcal {X}-\{x\}=\{y-x,\ y\in \mathcal {X}\}\). When a jump occurs at \(t\), the process jumps from \((x,u)\) to \((x+k,v)\) where \(k\) is chosen in \(m(x,k)dk\) and \(v\) is chosen at time \(t\) in the marker distribution \(F^u_t(x,dv)\).
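The jump mechanism of the trait component \((Y_t, t\ge 0)\) can be sketched by thinning: mutants are proposed at rate \(b(x)\widehat{n}_x\), a step \(k\) is drawn from \(m(x,k)dk\), and the jump is accepted with probability \([f(x+k;x)]_+/b(x+k)\). The ecological functions (logistic competition with \(\widehat{n}_x=1/(1+x)\)) and the uniform mutation kernel on admissible steps are hypothetical choices. The marker component is not simulated here; at each accepted jump it would be reset to a value \(v\) drawn from the current Fleming–Viot state \(F^u_t(x,dv)\).

```python
import random

def simulate_tss(x0, t_max, seed=0):
    """Trait substitution sequence on X = [0, 1] for a hypothetical logistic model (C = 1)."""
    rng = random.Random(seed)
    b = lambda x: 2.0
    d = lambda x: 1.0
    eta = lambda x: 1.0 + x
    n_hat = lambda x: (b(x) - d(x)) / eta(x)
    fitness = lambda y, x: b(y) - d(y) - eta(y) * n_hat(x)

    t, x = 0.0, x0
    times, traits = [0.0], [x0]
    while True:
        t += rng.expovariate(b(x) * n_hat(x))      # next mutant in the resident population
        if t > t_max:
            break
        # m(x, .): uniform law on the steps of size at most 0.1 that keep x + k in [0, 1]
        k = rng.uniform(max(-0.1, -x), min(0.1, 1.0 - x))
        y = x + k
        fixation_probability = max(fitness(y, x), 0.0) / b(y)
        if rng.random() < fixation_probability:    # successful invasion, then fixation
            x = y
            times.append(t)
            traits.append(x)
    return times, traits

times, traits = simulate_tss(x0=0.8, t_max=200.0)
print(list(zip(times, traits)))
```

In this toy model \(f(y;x)=(x-y)/(1+x)\), so only mutants with a smaller trait can fix and the simulated trait can only decrease over time.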

The marker distribution is the second fastest-evolving component, but marker mutations are assumed small (2.5), which allows us to recover a non-degenerate Fleming–Viot superprocess parameterized by the trait of the population but with jumps. Between the jumps, this superprocess is the pathwise limit of the marker dynamics where traits are fixed. The jumps are hitchhiking events due to the trait mutations (see in another context Etheridge et al. 2006). There is a bottleneck at each successful invasion-fixation of mutant traits. Indeed, the individuals present at the fixation time are all descendants of the successful initial mutant. The trait and marker of the latter alone determine the state of the new mutant population, hence creating the bottleneck for the whole population genealogy. This result is biologically intuitive since we assume that the neutral marker and the trait are completely linked, but the mathematical proof of these phenomena is the hardest part of the proof of Theorem 3.4, and we will show that our results still have biological interest. Extending this model to the case of recombination is a challenging problem for future work (see Smadi 2014 in this direction).

It is also worth noticing that, contrary to other extensions of the TSS (e.g. the TSS with age-structure of Méléard and Tran 2009 or the Polymorphic Evolution Sequence for a multi-resource chemostat in Champagnat et al. 2014), which usually jump from one equilibrium to another, the marker distribution is here described by a stochastic process and not by an equilibrium measure. This is due to the fact that the time scales of the trait and marker mutations are assumed to be different: on the time scale of marker mutations, the trait mutations are too rare to be seen.

An illustration of the invasion and fixation phenomena is summed up in Fig. 1.

Fig. 1

Invasion and fixation of a successful trait mutant. In the population of resident trait \(x\) and marker distribution \(F_t^u(x,dv)\), a mutant trait \(x+k\) appears at time \(\tau _1\). Let \(v\) be the marker of the mutant individual. As in Champagnat et al. (2008), the fluctuations of the resident population can be neglected to a first approximation and the mutant population evolves as a birth and death process with rates \(b(x+k)\) and \(d(x+k)+\eta (x+k)C(k)\widehat{n}_x\), independent of the marker distribution. When the mutant population reaches a sufficient size \(\varepsilon \) at time \(t_2\), which happens with probability \([f(x+k;x)]_+/b(x+k)\), the ‘invasion implies fixation’ assumption leads to the replacement of the former population in a time \(t_K\) such that \(t_K/\log (K)\rightarrow \infty \). With large probability, this time interval is too short to allow other marker mutants to appear in non-negligible proportion. Thus, when the mutant population has fixed, at time \(\sigma _1\), it is close to \(\widehat{n}_{x+k} \delta _{(x+k,v)}\). Before the next adaptive trait mutation occurs, the marker mutates many times, since marker mutations happen on a faster scale. The dynamics of the marker distribution is then that of a Fleming–Viot superprocess started at \(\delta _v\) and with statistics depending on \(x+k\)

3.3 An example from Dieckmann and Doebeli (1999)

Let us first illustrate our model by simulations based on an example inspired by Roughgarden (1979) and Dieckmann and Doebeli (1999). Here \(\mathcal {X}=[-1,1], \mathcal {U}=[-2,2]\) and \(K=1{,}000\) in all the simulations. The individual dynamics is characterized by

  • the birth rate \(b(x)=\exp (-x^2/2\sigma _b^2)\) with \(\sigma _b = 0.9\). The probability of mutation of the trait and marker are respectively \(p_{K}=1/K^2\) and \(q_{K}=1/\sqrt{K}\). The adaptive trait mutation kernel \(m(x,k) dk\) is a Gaussian law with mean 0 and variance 0.1, conditioned to \([-1,1]\). The marker mutation kernel \(G_K(u,dh)\) is a Gaussian law with mean 0 and variance \(\sigma ^2_K=1/\sqrt{K}\), conditioned to \([-2,2]\).

  • symmetric competition for resources, with \( \eta (x)=1\) and \(C(x-y)=\exp (-(x-y)^2/2\sigma ^2_C), \sigma _C = 0.8\).

Here, the ‘optimal trait’ is \(x=0\) where the birth rate has its maximum and the population is governed by local competition. We start with the initial condition: \(\ x_{0}= -1, u_{0}= 0\).
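The limiting trait dynamics of this example can be sketched directly from the jump measure \(b(x)\widehat{n}_x\,[f(x+k;x)]_+/b(x+k)\,m(x,k)dk\). The following Python sketch is separate from the individual-based simulations reported in Fig. 2 and rests on several assumptions: \(d\equiv 0\), so that \(\widehat{n}_x=b(x)\); the fitness is taken to be \(f(y;x)=b(y)-C(y-x)\widehat{n}_x\), read off from the mutant birth and death rates given in the caption of Fig. 1; and every successful invasion is treated as a fixation, as in Assumption 3.1, so possible coexistence is ignored. The function names are illustrative.

```python
import math
import random

SIGMA_B, SIGMA_C = 0.9, 0.8      # values from the example
MUT_SD = math.sqrt(0.1)          # trait mutation kernel has variance 0.1

def b(x):
    """Birth rate b(x) = exp(-x^2 / (2 sigma_b^2))."""
    return math.exp(-x * x / (2 * SIGMA_B ** 2))

def comp(z):
    """Competition kernel C(z) = exp(-z^2 / (2 sigma_C^2))."""
    return math.exp(-z * z / (2 * SIGMA_C ** 2))

def n_hat(x):
    """Equilibrium density; with d = 0, eta = 1 and C(0) = 1 it equals b(x)."""
    return b(x)

def fitness(y, x):
    """Assumed invasion fitness f(y; x) = b(y) - C(y - x) n_hat(x), with d = 0."""
    return b(y) - comp(y - x) * n_hat(x)

def draw_mutant(x, rng):
    """Draw y = x + k, k Gaussian, conditioned on y in [-1, 1] (rejection)."""
    while True:
        y = x + rng.gauss(0.0, MUT_SD)
        if -1.0 <= y <= 1.0:
            return y

def simulate_tss(x0, t_max, rng):
    """Simulate the jump process by thinning.

    Mutants are proposed at rate b(x) n_hat(x) and accepted with
    probability [f(y; x)]_+ / b(y), which is at most 1 here.
    Returns the list of (jump time, trait) pairs, starting with (0, x0).
    """
    t, x, path = 0.0, x0, [(0.0, x0)]
    while True:
        t += rng.expovariate(b(x) * n_hat(x))
        if t > t_max:
            return path
        y = draw_mutant(x, rng)
        if rng.random() < max(fitness(y, x), 0.0) / b(y):
            x = y
            path.append((t, x))

if __name__ == "__main__":
    for t, x in simulate_tss(-1.0, 50.0, random.Random(0)):
        print(f"t = {t:8.3f}   x = {x:+.4f}")
```

Time is measured here on the limiting (trait mutation) scale, so one unit corresponds to \(K\) units of the individual-based clock after the rescaling of Theorem 3.4.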

The simulations (see Fig. 2) illustrate Theorem 3.4. They show the replacement of a resident population by a mutant population. In Fig. 2a, the dynamics of the support of the marker distribution is represented. The mutant and resident populations are pictured together and separately to better observe the extinction of the resident population (black) and the expansion of the mutant population from one individual (light). The invasion started around time 3,175 and after time 3,250, the mutant population has totally replaced the resident one.

Fig. 2

We consider a resident trait-monomorphic population (black) in which a mutant trait (light) appears and goes to fixation. Here the intrinsic death rate is \(d(x)=0\). a Evolution of the support of the marker distribution with respect to time; the support of the resident trait-monomorphic population is in black while the support of the mutant population is lighter. The mutant and resident populations are shown separately and together. b Distributions of the traits (left) and markers (right) in the population at three times during the invasion. The marker and trait values are on the horizontal axis and the frequencies on the vertical axis. The marker value of the initial mutant is indicated by the red line. When the mutant trait appears, the resident population is quickly invaded by the mutant population during a transition period. In b, we can see that although the support of the marker distribution for the resident population remains wide (see also a), the size of the resident population decreases quickly. In the second column of b, we see that the marker distribution in the mutant population remains spiked at the marker value of the first mutant individual during the whole transition period. After invasion (see a), the spread of the marker distribution follows the Fleming–Viot process (3.4)–(3.5). In a, we see that for the Fleming–Viot process, the support of the marker distribution spreads slowly

In Fig. 2b, the histograms of traits and markers at three times during the invasion are represented simultaneously, to underline the hitchhiking effect of the marker during the ‘invasion implies fixation’ phase. We can see that the distribution of the marker during the fixation remains close to a Dirac mass at the marker value of the initial mutant (red line). This illustrates the bottleneck phenomenon, the existence of which we prove rigorously [Eq. (4.8) of Proposition 4.5].

Let us now focus on the Dieckmann–Doebeli example and highlight the biological implications regarding the eco-evolutionary feedback on the distribution of the neutral marker. Here, \(\widehat{n}_x=b(x)-d(x)\) and therefore the Fleming–Viot process \(F^u_t(x,.)\) is the solution of the martingale problem (3.4) with \(A\phi =\frac{\sigma ^2}{2}\phi ''\) and with bracket (3.5) given for all \(\phi \in \mathcal {C}(\mathcal {U},\mathbb {R})\) by \(\ 2\frac{b(x)}{b(x)-d(x)}\int _0^t \big (\langle F^u_s(x,.),\phi ^2\rangle -\langle F^u_s(x,.),\phi \rangle ^2\big )ds.\)

If the death rate is a positive constant \(d(x)=d\), then the multiplicative factor in the bracket (3.5), \(b(x)/(b(x)-d)\), decreases when \(b(x)\) increases. Heuristically we expect that the stochastic fluctuations in time of the distribution of the neutral marker decrease when the trait \(x\) approaches the evolutionarily stable strategy (ESS, see Maynard Smith 1982) and \(b(x)\) increases, since the equilibrium size is greater and the diffusion coefficient is lower. The drift term is \(b(x)\frac{\sigma ^2}{2}\int _0^t \langle F^u_s(x,.),\phi ''\rangle ds\) and thus its multiplicative factor \(b(x)\) increases when approaching the ESS, contrary to the multiplicative factor of the bracket (3.5). In the case \(d\equiv 0\), the Fleming–Viot process has a constant diffusion coefficient and the bracket (3.5) does not depend on \(x\). The Fleming–Viot process then depends on the trait \(x\) only through the drift term. Notice that this is true for any mutation model satisfying (2.5). This simple result illustrates how the ecological processes can shape the neutral diversity.
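The opposite monotonicity of the two multiplicative factors can be checked numerically. The sketch below evaluates the bracket factor \(2b(x)/(b(x)-d)\) and the drift factor \(b(x)\) for the birth rate of Sect. 3.3; the value \(d=0.2\) is a hypothetical choice made only for illustration, as are the function names.

```python
import math

SIGMA_B = 0.9  # value from the example of Sect. 3.3

def b(x):
    """Birth rate b(x) = exp(-x^2 / (2 sigma_b^2))."""
    return math.exp(-x * x / (2 * SIGMA_B ** 2))

def bracket_factor(x, d):
    """Factor 2 b(x) / (b(x) - d) of the bracket (3.5); needs b(x) > d."""
    bx = b(x)
    if bx <= d:
        raise ValueError("population not viable: b(x) <= d")
    return 2 * bx / (bx - d)

def drift_factor(x):
    """Factor b(x) multiplying the drift term."""
    return b(x)

if __name__ == "__main__":
    d = 0.2  # hypothetical constant death rate
    for x in (-1.0, -0.5, 0.0):
        print(x, bracket_factor(x, d), bracket_factor(x, 0.0), drift_factor(x))
```

With \(d>0\) the bracket factor shrinks as \(x\) moves towards 0 while the drift factor grows; with \(d=0\) the bracket factor is identically 2.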

3.4 Corollary: the Wright–Fisher evolutionary process

There exists a version of the SFVP in the case when the marker space \(\mathcal {U}\) is discrete. Assume for instance that there exist only two alleles of the marker trait, denoted by \(a\) and \(A\), so that \(\mathcal {U}=\{a,A\}\). In this case, we apply Theorem 3.4 with the mutation kernel \(G_K\) defined in (2.7) and \(r_K/K\rightarrow \bar{r}>0\) when \(K\rightarrow +\infty \).

Proposition 3.7

We work under Assumptions 2.1 and 3.1 with probabilities \(q_A\) and \(q_a\) to mutate from marker \(A\) to marker \(a\) and from marker \(a\) to marker \(A\). Moreover, we consider similar initial conditions \(\nu ^K_{0}\) as in Theorem 3.4. Then, the population process \(\ (\nu ^K_{Kt}, t\ge 0)\) converges in law to the \(\mathcal {M}_{F}(\mathcal{X}\times \{a,A\})\)-valued process

$$\begin{aligned} ( \widehat{n}_{Y_{t}}\,\big (W_t^a\ \delta _{(Y_{t},a)}(dy,du)+(1-W_t^a)\ \delta _{(Y_t,A)}(dy,du)\big ), t\ge 0), \end{aligned}$$

where \((Y_{t}, t\ge 0)\) is the TSS process that jumps from \(x\) to \(x+k\) in \(\mathcal {X}\) with the jump measure \(\,b(x)\, \widehat{n}_{x}\,\frac{[f(x+k;x)]_+}{b(x+k)}\,m(x,k)dk\ \) and where \((W_t^a, t\ge 0)\) is the following Wright–Fisher jump process that represents the proportion of alleles \(a\) in the population of trait \(Y_t\) at time \(t\). Between jumps, it satisfies the usual Wright–Fisher equation with mutations

$$\begin{aligned} dW^a_t= \bar{r}\, b(Y_t) \big ( q_A(1-W^a_t) - q_a W^a_t \big ) dt + \sqrt{\frac{2 b(Y_t)}{\widehat{n}_{Y_t}}\ W^a_t\ \big (1-W^a_t\big )} dB_t \end{aligned}$$
(3.8)

\((B_t, t\ge 0)\) being a standard Brownian motion. The process jumps with the TSS: at a jump time \(t\), \((W^a_t,1-W^a_t)\) goes to \((1,0)\) with probability \(W^a_{t-}\) and to \((0,1)\) with probability \(1-W^a_{t-}\). \(\Box \)

An illustration of this proposition is given in Fig. 3.

Fig. 3

Evolution of the sizes of the subpopulations with markers \(a\) and \(A\). The simulation uses individual-based algorithms. The proportions of marker alleles \(a\) and \(A\) follow Wright–Fisher diffusions while the size of the population stabilizes around the equilibrium given by the trait value. A trait mutant appears around time 18,290, invades and fixes in the population. Before the appearance of this mutant trait, fluctuations in the marker distribution are due to (fast) marker mutation and to stochastic birth and death events. At the time when the mutant trait appears, the \(A\)-allele frequency is 85 %, giving a high probability for an \(A\)-allele hitchhike. This is what happens in the simulation. After the fixation time (around time 18,490), the \(a\)-allele population is extinct. It is regenerated by mutations of the marker but goes extinct three times before taking hold around time 19,600

This result can be generalized to discrete marker spaces \(\mathcal {U}=\{a_1,\dots ,a_m\}\), by introducing the transition probabilities \(q_{ij}\) to mutate from \(a_i\) to \(a_j\), \(i,j\in \{1,\dots ,m\}\). An application is when the marker corresponds to a genetic sequence of \(n\) nucleotides (\(A, T, G\) or \(C\) for each position). In this case, \(m=\text{ Card }\ \mathcal {U}=4^n\).

Traditionally, in a population genetics framework, the evolution of the diversity at a neutral marker in finite populations is described as a diffusion process with two fixed parameters: the population size and the mutation “rate” (e.g. Crow and Kimura 1970). The population size is related to what is called the “genetic drift”, which generally refers to the random sampling of gametes performed for reproduction at the beginning of each generation: the larger the population size, the weaker the genetic drift. Under this framework, genetic drift induces stochastic fluctuations in the frequencies of the alleles \(A\) and \(a\) and can decrease the neutral genetic diversity when an allele is randomly lost. On the other hand, mutation continuously introduces alleles \(A\) and \(a\) into the population and thus allows the restoration and maintenance of the neutral genetic diversity. It is important to note that under the population genetics framework, mutation rates and population size are fixed and depend neither on the ecological processes and their parameters, nor on the trait value when the population is monomorphic for the trait under selection. As a consequence, those parameters do not change as successive selective sweeps occur during the adaptation process. Here we can use Eq. (3.8) to compare the classical population genetics results about the distribution of the neutral genetic diversity with those of our model.

In an eco-evolutionary framework, (3.8) first shows that the mutation rates and the population size, and hence the genetic drift, are not fixed and depend on the ecological processes and on the trait value \(x\). The mutation rates are \(\bar{r} \,b(Y_t) q_A \) and \(\bar{r}\, b(Y_t) q_a\) in our framework, whereas they are simply \(q_A\) and \(q_a\) in a population genetics framework (e.g. Crow and Kimura 1970). The strength of the genetic drift is governed by the inverse equilibrium population size \(1/\widehat{n}_{Y_t}\), whereas it is a constant \(1/n\,\) in a population genetics framework. Second, (3.8) shows that additional ecological processes affect the distribution of the neutral marker, since the diffusion coefficient on the right-hand side contains the factor \(2 b(Y_t)\). This term can be interpreted as the effect of demographic stochasticity, which is not taken into account in population genetics.
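To make the roles of \(b(Y_t)\) and \(\widehat{n}_{Y_t}\) in (3.8) concrete, here is a minimal Euler–Maruyama sketch of the allele frequency between two trait jumps, together with the hitchhiking reset applied at a jump. It is an illustrative discretization, not the individual-based algorithm used for Fig. 3; the clipping to \([0,1]\) is an assumption of the scheme, and all function and parameter names are hypothetical.

```python
import math
import random

def wright_fisher_step(w, dt, b, n_hat, r_bar, q_A, q_a, rng):
    """One Euler-Maruyama step of (3.8) for the frequency w of allele a.

    b and n_hat stand for b(Y_t) and the equilibrium size at trait Y_t,
    both frozen between two trait jumps.
    """
    drift = r_bar * b * (q_A * (1.0 - w) - q_a * w)
    diffusion = math.sqrt(max(2.0 * b / n_hat * w * (1.0 - w), 0.0))
    w_new = w + drift * dt + diffusion * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return min(max(w_new, 0.0), 1.0)  # clip: the scheme may leave [0, 1]

def hitchhike(w, rng):
    """At a trait jump, the frequency of a resets to 1 w.p. w, else to 0."""
    return 1.0 if rng.random() < w else 0.0

def simulate(w0, b, n_hat, r_bar, q_A, q_a, n_steps, dt, rng):
    """Frequency path of allele a over n_steps steps with a fixed trait."""
    w, path = w0, [w0]
    for _ in range(n_steps):
        w = wright_fisher_step(w, dt, b, n_hat, r_bar, q_A, q_a, rng)
        path.append(w)
    return path

if __name__ == "__main__":
    rng = random.Random(0)
    path = simulate(0.5, b=1.0, n_hat=1.0, r_bar=1.0, q_A=0.1, q_a=0.1,
                    n_steps=500, dt=0.01, rng=rng)
    print("frequency before the jump:", path[-1])
    print("frequency after the jump: ", hitchhike(path[-1], rng))
```

Changing `b` and `n_hat` together, as happens when the trait jumps, alters both the effective mutation rate and the noise level, which is the dependence discussed above.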

3.5 Extensions to co-existing traits

The work of Champagnat and Méléard (2011) generalizes the TSS to the case of coexisting trait values, when Assumption 3.1 is relaxed. They define a polymorphic TSS called the polymorphic evolutionary sequence (PES) and denoted by \((X_t)_{t\ge 0}\in \mathbb {D}(\mathbb {R}_+,\mathcal {M}_F(\mathcal {X}))\). When a mutant trait \(y\) appears in a resident population of trait \(x_0\) at time \(t_1\), either its line of descent goes extinct, with probability \(1-[f(y;x_0)/b(y)]_+\), or it survives. In the latter case, \(y\) and \(x_0\) can coexist when the Lotka–Volterra system defined in (5.4) has a positive, globally stable, non-trivial equilibrium \((n^*_{x_0,y}, n^*_{y,x_0})\). The population then jumps from \(X_{t_1-}=\widehat{n}_{x_0}\delta _{x_0}\) to

$$\begin{aligned} X_{t_1}=n^*_{x_0,y} \delta _{x_0}(dx)+ n^*_{y,x_0} \delta _{y}(dx). \end{aligned}$$

For a probability \(\pi \), a trait measure \(X\) and \(x\in \mathcal {X}\), let us denote by \(F_t(\pi ,x,X, du)\) the Fleming–Viot process started at \(\pi \), evolving in the trait distribution \(X\) and parameterized by \(x\).

Let \(\pi _0\) be the initial marker distribution of the monomorphic population of trait \(x_0\). Before the time \(t_1\) of appearance of the first mutant, the marker distribution evolves as \((F_t(\pi _0,x_0,\widehat{n}_{x_0}\delta _{x_0},du))_{t\ge 0}\). Let \(\pi _{t_1}=F_{t_1}(\pi _0,x_0,\widehat{n}_{x_0}\delta _{x_0},du)\) be the marker distribution at \(t_1\) and let \(V_1\) be a random variable drawn from the distribution \(\pi _{t_1}\). After \(t_1\) and before the occurrence of the second trait mutation at \(t_2\), the population evolves as

$$\begin{aligned} n^*_{x_0,y} \delta _{x_0}(dx) F_{t-t_1}(\pi _{t_1},x_0,X_{t_1},du)+n^*_{y,x_0} \delta _y(dx)F_{t-t_1}(\delta _{V_1},y,X_{t_1},du). \end{aligned}$$

The processes \(F_{t}(\pi _{t_1},x_0,X_{t_1},du)\) and \(F_{t}(\delta _{V_1},y,X_{t_1},du)\) are independent generalizations of the Fleming–Viot process defined in Definition 3.3, conditionally on \(\pi _{t_1}\), \(X_{t_1}\) and \(V_{1}\). Indeed their semimartingale decompositions are respectively:

$$\begin{aligned} \langle F_{t}(\pi _{t_1},x_0,X_{t_1},.),\phi \rangle&= \langle \pi _{t_1},\phi \rangle + b(x_{0}) \int _{0}^t \langle F_{s}(\pi _{t_1},x_0,X_{t_1},.), A \phi \rangle \ ds + M^1_{t}(\phi ); \nonumber \\ \langle F_{t}(\delta _{V_1},y,X_{t_1},.),\phi \rangle&= \phi (V_1) + b(y) \int _{0}^t \langle F_{s}(\delta _{V_1},y,X_{t_1},.), A \phi \rangle \ ds + M^2_{t}(\phi ),\nonumber \\ \end{aligned}$$
(3.9)

where \(M^1(\phi )\) and \(M^2(\phi )\) are independent square integrable martingales such that

$$\begin{aligned} \langle M^1(\phi )\rangle _{t}&= \frac{b(x_{0})+d(x_{0})+\eta (x_{0}) C(0) n^*_{x_0,y}+ \eta (x_0) C(x_{0}-y)n^*_{y,x_0}}{n^*_{x_0,y}+n^*_{y,x_0}}\nonumber \\&\times \int _{0}^t \left( \langle F_{s}(\pi _{t_1},x_0,X_{t_1},.), \phi ^2\rangle - \langle F_{s}(\pi _{t_1},x_0,X_{t_1},.), \phi \rangle ^2 \right) ds\nonumber \\&= \,\frac{2b(x_0)}{n^*_{x_0,y}+n^*_{y,x_0}}\!\!\int _{0}^t \!\!\left( \langle F_{s}(\pi _{t_1},x_0,X_{t_1},.), \phi ^2\rangle \!-\!\langle F_{s}(\pi _{t_1},x_0,X_{t_1},.), \phi \rangle ^2 \right) ds,\nonumber \\ \langle M^2(\phi )\rangle _{t}&= \frac{b(y)+d(y)+\eta (y) C(y-x_{0}) n^*_{x_0,y}+ \eta (y)C(0)n^*_{y,x_0}}{n^*_{x_0,y}+n^*_{y,x_0}}\nonumber \\&\times \int _{0}^t \left( \langle F_{s}(\delta _{V_1},y,X_{t_1},.), \phi ^2\rangle - \langle F_{s}(\delta _{V_1},y,X_{t_1},.), \phi \rangle ^2 \right) ds \nonumber \\&= \frac{2b(y)}{n^*_{x_0,y}\!+\!n^*_{y,x_0}}\int _{0}^t \!\left( \langle F_{s}(\delta _{V_1},y,X_{t_1},.), \phi ^2\rangle \!-\! \langle F_{s}(\delta _{V_1},y,X_{t_1},.), \phi \rangle ^2 \right) ds.\nonumber \\ \end{aligned}$$
(3.10)
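The second equality in each bracket of (3.10) uses the fact that, at the coexistence equilibrium, the total death rate of each trait equals its birth rate, so that each numerator reduces to \(2b\). The sketch below checks this numerically for the model of Sect. 3.3. It assumes that the equilibrium of system (5.4), which is not reproduced in this section, solves the linear equations \(b(x)-d-\eta \,(C(0)n_x+C(x-x')n_{x'})=0\) for both traits; the parameter values \(\sigma _C=0.7\), \(d=0\), \(\eta =1\) and the trait pair are illustrative, and global stability is not checked.

```python
import math

SIGMA_B, SIGMA_C = 0.9, 0.7   # sigma_C = 0.7 as in the coexistence simulations
D, ETA = 0.0, 1.0             # constant death rate and competition intensity

def b(x):
    """Birth rate b(x) = exp(-x^2 / (2 sigma_b^2))."""
    return math.exp(-x * x / (2 * SIGMA_B ** 2))

def comp(z):
    """Competition kernel C(z) = exp(-z^2 / (2 sigma_C^2))."""
    return math.exp(-z * z / (2 * SIGMA_C ** 2))

def coexistence_equilibrium(x0, y):
    """Solve the 2x2 linear system for (n*_{x0,y}, n*_{y,x0}) by Cramer's rule."""
    c0, c = comp(0.0), comp(x0 - y)
    r1, r2 = (b(x0) - D) / ETA, (b(y) - D) / ETA
    det = c0 * c0 - c * c          # nonzero as soon as x0 != y
    return (c0 * r1 - c * r2) / det, (c0 * r2 - c * r1) / det

def bracket_numerator(x, other, n_x, n_other):
    """Numerator of the bracket in (3.10) for the subpopulation of trait x."""
    return b(x) + D + ETA * (comp(0.0) * n_x + comp(x - other) * n_other)

if __name__ == "__main__":
    x0, y = -0.1, 0.1              # hypothetical pair of coexisting traits
    n1, n2 = coexistence_equilibrium(x0, y)
    print("equilibrium densities:", n1, n2)
    print(bracket_numerator(x0, y, n1, n2), "vs", 2 * b(x0))
    print(bracket_numerator(y, x0, n2, n1), "vs", 2 * b(y))
```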

At time \(t_2\), when a third trait appears in the population, the system can evolve to three, two or just one coexisting traits, depending on the new equilibrium of the Lotka–Volterra equations that is reached. For each of the traits, the marker distribution evolves as a generalization of the Fleming–Viot processes above.

Remark 3.8

The above equations show that, when there is coexistence of two traits in the population, the markers in the subpopulations defined by the two traits evolve independently but with parameters depending on the two coexisting traits. Thus, when there is a diversification event in the population, the distribution of the neutral diversity in one of the two subpopulations does not evolve as if the other subpopulation were absent, as is usually assumed. The parameters of the underlying Fleming–Viot process depend on the complete trait distribution.

We present in Fig. 4 simulations in the case of coexistence, with the same model and parameters as in Sect. 3.3, except \(\sigma _C = 0.7\) and the initial condition: \(\ x_{0}= -0.1.\) The simulations (see Fig. 4) show the appearance of a new mutant trait (yellow) in a population of two coexisting traits (black and blue).

Fig. 4

Neutral marker dynamics in a trait-dimorphic population. Evolution of markers with respect to time. a The mutant population (yellow) and resident populations (black and blue) are shown separately and together. b Distributions of the traits and markers in the population at three times during the invasion. The marker value of the initial mutant is indicated by the red line

4 Proof of Theorem 3.4

Let us sketch the proof. In this section, we will suppose that Assumptions 2.1, 3.1 are satisfied and the initial conditions are \(\nu ^K_{0}(dy,dv)= n^K_{0}\,\delta _{(x_{0},u_{0})}(dy,dv)\) with \(\ \lim _{K\rightarrow \infty } n^K_{0} = \widehat{n}_{x_{0}}\) and \(\sup _{K\in \mathbb {N}^*} \mathbb {E}((n^K_0)^3)<+\infty \).

First, we recall results due to Champagnat et al. (2008) that provide the convergence of the finite-dimensional marginals of the trait process \(({X}^K_{Kt} ; t\ge 0)\). We extend these results to obtain the weak convergence of the measures \((X^K_{Kt}(dx)dt ; K\ge 0)\) in \(\mathcal {M}_F(\mathcal {X}\times [0,T])\) endowed with the weak convergence topology. This corresponds to the convergence of \(({X}^K_{Kt} ; t\ge 0)\) as a trait-time measure, as developed by Kurtz (1992). Secondly, we include the fast component (the marker) and prove the tightness of the sequence \((\nu ^K_{Kt}(dx,du)dt ; K\ge 0)\) in \(\mathcal {M}_F(\mathcal {X}\times \mathcal {U}\times [0,T])\). We then consider a subsequence, again denoted by \((\nu ^K_{Kt}(dx,du)dt, K\ge 0)\) with an abuse of notation, that converges to a limit \(\Gamma (dt,dx,du)\in \mathcal {M}_F([0,T]\times \mathcal {X}\times \mathcal {U})\) that we have to identify. This identification is done in several steps. When a successful mutant appears in the monomorphic population with trait \(x\), the transition period to fixation has to be considered carefully. It has been proved by Champagnat (2006) that the duration of these transitions is of order \(\log (K)\). We prove that during this time interval, the marker distribution in the mutant subpopulation remains a Dirac mass at the marker value of the initial mutant. This results from the combined effects of small or rare marker mutations, large population and slow take-off of the new mutant population. Then, we show that in a trait-monomorphic population with value \(x\), the marker distribution converges to a Fleming–Viot superprocess parameterized by \(x\).

4.1 Semimartingale decomposition of \(\nu ^K\)

Let us introduce some notation to keep the forthcoming formulas simple. For \(\nu \in \mathcal {M}_F(\mathcal {X}\times \mathcal {U})\) and \(\phi (x,u)\in \mathcal {C}(\mathcal {X}\times \mathcal {U},\mathbb {R})\), we define the (nonlinear) generators \(B^K\) and \(D^K(\nu )\) such that

$$\begin{aligned} B^K\phi (x,u)&= (1-p_K)(1-q_K) b(x)\phi (x,u)\nonumber \\&+ p_K(1-q_K)b(x) \int _{\mathcal {X}} \phi (x+k,u) m(x,k)dk \nonumber \\&+ q_K(1-p_K)b(x) \int _{\mathcal {U}} \phi (x,u+h) G_K(u,dh) \nonumber \\&+ p_K\ q_K \ b(x) \int _{\mathcal {X}\times \mathcal {U}} \phi (x+k,u+h) m(x,k) dk\ G_K(u,dh)\qquad \qquad \end{aligned}$$
(4.1)
$$\begin{aligned} D^K(\nu )\phi (x,u)&= \big (d(x)+\eta (x) C* \nu (x)\big )\phi (x,u). \end{aligned}$$
(4.2)

The process \(\langle \nu ^K_.,\phi \rangle \) is a square integrable semi-martingale and we give its characteristics.

Proposition 4.1

For a continuous bounded function \(\phi (x,u)\) on \(\mathcal {X}\times \mathcal {U}\), the process

$$\begin{aligned} M^{K,\phi }_t =\,&\langle \nu ^K_t,\phi \rangle - \langle \nu ^K_0,\phi \rangle - \int _0^t ds \int _{\mathcal {X}\times \mathcal {U}} \nu ^K_s(dx,du) \big (B^K-D^K(X^K_s)\big )\phi (x,u) \end{aligned}$$
(4.3)

is a square integrable martingale with previsible quadratic variation

$$\begin{aligned} \langle M^{K,\phi }\rangle _t =\,&\frac{1}{K}\int _0^t ds \int _{\mathcal {X}\times \mathcal {U}} \nu ^K_s(dx,du) \big (B^K+D^K(X^K_s)\big )\phi ^2(x,u). \end{aligned}$$
(4.4)

Proof

The dynamics being given in Sect. 2, the proof can be adapted from Fournier and Méléard (2004, Lemma 5.2). One main step consists in showing that there exists a Poisson point measure driving the measure-valued processes \(\nu ^K\) for all \(K\in \mathbb {N}^*\). \(\square \)
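The generators (4.1) and (4.2) can be read as the event rates of an individual-based simulation: each individual of type \((x,u)\) gives birth at rate \(b(x)\), the offspring mutating its trait with probability \(p_K\) and, independently, its marker with probability \(q_K\) (which yields the four terms of (4.1)), and dies at rate \(d(x)+\eta (x)\,C*\nu (x)\). The following sketch of a single event is one possible Gillespie-type implementation under these assumptions; it is not the Poisson point measure construction used in the proof, and the function and argument names are hypothetical.

```python
import math
import random

def ibm_event(pop, K, p_K, q_K, b, d, eta, comp,
              mutate_trait, mutate_marker, rng):
    """Perform one birth or death event; return the waiting time.

    pop is a list of (trait, marker) pairs, each carrying weight 1/K,
    and is modified in place. b, d, eta, comp are the rate functions;
    mutate_trait(x, rng) and mutate_marker(u, rng) draw mutated values.
    """
    n = len(pop)
    if n == 0:
        return math.inf  # extinct population: nothing can happen
    birth = [b(x) for x, _ in pop]
    # (C * nu)(x) = (1/K) * sum over individuals j of C(x - x_j)
    death = [d(x) + eta(x) * sum(comp(x - xj) for xj, _ in pop) / K
             for x, _ in pop]
    total = sum(birth) + sum(death)
    dt = rng.expovariate(total)
    idx = rng.choices(range(2 * n), weights=birth + death)[0]
    if idx < n:  # birth from individual idx
        x, u = pop[idx]
        if rng.random() < p_K:   # trait mutation, probability p_K
            x = mutate_trait(x, rng)
        if rng.random() < q_K:   # marker mutation, probability q_K
            u = mutate_marker(u, rng)
        pop.append((x, u))
    else:        # death of individual idx - n
        pop.pop(idx - n)
    return dt
```

Each event changes \(\langle \nu ^K,\phi \rangle \) by \(\pm \phi /K\), which is the source of the factor \(1/K\) in the quadratic variation (4.4). The cost per event is quadratic in the population size, which is acceptable for a sketch but not for large \(K\).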

4.2 Convergence of the trait-marginal in the trait mutation time scale

As previously emphasized, the trait dynamics is described by the measure-valued process \( X^K\) which does not depend on the markers. This process has been fully studied in Champagnat (2006) and Champagnat et al. (2008). In this section, we recall the finite marginal convergence result obtained in these papers. We give some additional properties concerning the topology involved. This result shows a time scale separation with successive fixations of successful mutants, under Assumptions 2.1 and 3.1. Notice that the time scale assumption is

$$\begin{aligned} \forall V>0,\quad \log K\ll \frac{1}{K p_K}\ll \exp (VK), \quad \hbox { as } K\rightarrow \infty , \end{aligned}$$
(4.5)

which is realized in our case for \(p_K= 1/K^2\).
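For \(p_K=1/K^2\) one has \(1/(Kp_K)=K\), so (4.5) reduces to \(\log K\ll K\ll \exp (VK)\). The short numerical check below, an illustration with the hypothetical value \(V=0.01\), prints the two ratios that must vanish as \(K\rightarrow \infty \); the second is computed on the log scale to avoid overflow.

```python
import math

def scale_ratios(K, V, p_K):
    """Return (log K / (1/(K p_K)), (1/(K p_K)) / exp(V K)).

    Both ratios must tend to 0 for the time scale assumption (4.5).
    """
    inv = 1.0 / (K * p_K)
    first = math.log(K) / inv
    second = math.exp(math.log(inv) - V * K)  # log scale avoids overflow
    return first, second

if __name__ == "__main__":
    for K in (10**2, 10**3, 10**4, 10**6):
        print(K, scale_ratios(K, V=0.01, p_K=1.0 / K**2))
```

For small \(V\) the second ratio only becomes small once \(K\) is large compared with \(1/V\), which is a reminder that (4.5) is an asymptotic statement.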

Theorem 4.2

Under Assumptions 2.1 and 3.1, let us also assume that the initial population is trait-monomorphic: \( X^K_0=n^K_0\delta _{x}\) for \(x\in \mathcal{X}\) and \(n^K_0\rightarrow \widehat{n}_{x}\) in probability and \(\sup _{K\in \mathbb {N}^*} \mathbb {E}((n^K_0)^3)<+\infty \).

Then, the sequence \(( X^K_{Kt}; t\ge 0)\) converges to the pure jumps singleton measure-valued Markov process \((\widehat{n}_{Y_{t}}\,\delta _{Y_{t}}; t\ge 0)\) defined as follows: \(Y_0=x\), and the process \(Y\) jumps from \(\,{x}\) to \(\ {x+k}\) with jump measure \( \ b(x)\, \widehat{n}_{x}\frac{[f(x+k;x)]_+}{b(x+k)}\,m(x,k)dk. \)

The convergence holds in the sense of finite dimensional distributions on \(\mathcal{M}_F(\mathcal {X})\) equipped with the topology of total variation.

This theorem has been proved by Champagnat (2006) for the logistic case and generalized by Champagnat et al. (2008).

The trait-marginal process \((X^K_{Kt} ; t\in [0,T])\) does not converge in \(\mathbb {D}([0,T],\mathcal {M}_F(\mathcal {X}))\) endowed with the Skorohod topology. Indeed, the size of the jumps is bounded above by \(\frac{1}{K}\), yet the limiting total mass process has jumps, which prevents trajectorial tightness (at least in the \(J_1\)-topology). Following the idea of Kurtz (1992), as developed in Méléard and Tran (2012) and Gupta et al. (2014), a weaker notion of convergence consists in forgetting the process point of view and considering the measure \(X^K_{Kt}(dx)dt\) in \(\mathcal {M}_F([0,T]\times \mathcal {X})\) endowed with the topology of weak convergence. This convergence strengthens the result of Theorem 4.2, but in a topology weaker than the Skorohod topology.

To achieve this, as in Collet et al. (2013), we first introduce the \(M_1\)-topology on \(\mathbb {D}([0,T],\mathbb {R}_+)\). It is weaker than the usual \(J_{1}\)-topology and allows monotone processes with jumps tending to \(0\) to converge to processes with jumps (see Skorohod 1956). For a càdlàg function \(h\) on \([0,T]\), the continuity modulus for the \(M_{1}\)-topology is given by

$$\begin{aligned} w_{\delta }(h) = \sup _{\substack{0\le t_{1}\le t\le t_{2}\le T \\ 0\le t_{2}-t_{1}\le \delta }} d(h(t), [h(t_{1}), h(t_{2})]). \end{aligned}$$
(4.6)

Note that if the function \(h\) is monotone, then \(w_{\delta }(h) = 0\).
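A discrete version of the modulus (4.6) makes this remark easy to test on sampled paths. The sketch below assumes a path sampled on a regular grid and measures the window \(\delta \) in grid steps; the function names are illustrative. Here \(d(v,[a,b])\) is the distance from \(v\) to the segment with endpoints \(a\) and \(b\).

```python
def dist_to_segment(v, a, b):
    """Distance from the real number v to the segment with endpoints a, b."""
    lo, hi = min(a, b), max(a, b)
    return max(lo - v, 0.0, v - hi)

def m1_modulus(h, delta_steps):
    """Discrete analogue of (4.6) for values h[i] = h(t_i) on a regular grid.

    delta_steps is the window length delta expressed in grid steps.
    """
    n, w = len(h), 0.0
    for i1 in range(n):
        for i2 in range(i1, min(i1 + delta_steps, n - 1) + 1):
            for i in range(i1, i2 + 1):
                w = max(w, dist_to_segment(h[i], h[i1], h[i2]))
    return w

if __name__ == "__main__":
    steep_ramp = [0.0, 0.0, 0.5, 1.0, 1.0]   # monotone, nearly a jump
    spike = [0.0, 1.0, 0.0]                  # non-monotone excursion
    print(m1_modulus(steep_ramp, 3), m1_modulus(spike, 2))
```

A steep monotone ramp approximating a jump has zero modulus, whereas a short excursion that returns to its starting level does not. This is why monotone paths with vanishing jumps can converge to a path with jumps in this topology.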

Proposition 4.3

Let us consider a continuous, monotone and non-negative function \(g\). Then, under Assumptions 2.1 and 3.1, the process \((R^K_{t}, t\in [0,T])\) defined by

$$\begin{aligned} R^K_{t} = \int g(x) X^K_{Kt}(dx) \end{aligned}$$

converges in law in the sense of the Skorohod \(M_{1}\)-topology to the process \((R_{t}, t \in [0,T])\) where \(R_{t} = \widehat{n}_{Y_{t}}\,g(Y_{t})\).

Proof

Assume that \(g\) is non-decreasing. From Theorem 4.2, finite dimensional distributions of \((R^{K}_{t}, t\in [0,T])\) converge to those of \((\widehat{n}_{Y_{t}}\,g(Y_{t}), t \in [0,T])\). By Skorohod (1956, Theorem 3.2.1), it remains to prove that for all \(\eta >0\),

$$\begin{aligned} \lim _{\delta \rightarrow 0} \limsup _{K\rightarrow \infty } \mathbb {P}(w_{\delta }(R^{K}_{.}) >\eta ) = 0, \end{aligned}$$

where \(w_{\delta }\) has been defined in (4.6).

The mutation rate in \((R^{K}_{t}, t\in [0,T])\) being bounded, the probability that two mutations occur within a time less than \(\delta \) is \(o(\delta )\). It is therefore enough to study the case where there is at most one mutation in the time interval \([0,\delta ]\). Following Champagnat (2006), the path of \(R^K\) can be decomposed into several subpaths, each of them being close to a large population deterministic measure-valued function \(\xi \) (see Proposition 5.1 in the Appendix) with a probability tending to 1. Away from invading mutations and for a trait-monomorphic population with trait \(x\), \(\langle \xi _{Kt},g\rangle = g(x) n_{Kt}(x)\) where \(n_{.}(x)\) is the solution of the logistic Eq. (3.1). We can easily check that \(t\rightarrow n_{t}(x)\) converges monotonically to its stable equilibrium \(\widehat{n}_{x}\), so that \(\langle \xi _{Kt},g\rangle \) is monotone and the modulus of continuity tends to \(0\). Around an invading mutant \(y\), \(\langle \xi _{Kt},g\rangle \) is close to \(n_{Kt}(x) g(x) + n_{Kt}(y) g(y)\) where \((n_{t}(x), n_{t}(y)) \) is the solution of the Lotka–Volterra system (5.4) with an initial condition close to \((\widehat{n}_{x},0)\). The mutant \(y\) invades if the fitness function \(f(y;x)\) is positive (and \(f(x;y)\) is negative). From Assumption 3.1, an easy study of the Lotka–Volterra system (see for example the Appendix in Champagnat 2004, Fig. b p.187) shows that either \(n_{t}(x)\) and \(n_{t}(y)\) are both increasing, or \(\dot{n}_{t}(x) <0\) and \(\dot{n}_{t}(y) >0\). In both cases, \(n_{t}(x) g(x) + n_{t}(y) g(y)\) is the sum of two monotone functions and the modulus of continuity also tends to 0. \(\square \)

Corollary 4.4

The sequence of random measures \( X^K_{Kt} (dx) dt\) converges in law to the random measure \( \widehat{n}_{Y_{t}}\, \delta _{Y_{t}}(dx) dt\) in \(\mathcal {M}_F([0,T]\times \mathcal {X})\) endowed with the weak convergence topology.

Proof

It is enough to prove the convergence in law of \(\int h(t) e^{-q x} X^K_{Kt} (dx) dt\,\) to \(\,\int h(t) e^{-q x} \hat{n}_{Y_{t}}\, \delta _{Y_{t}}(dx) dt\) for a measurable bounded function \(h\) and \(q\in \mathbb {Q}\). In Skorohod (1956), it is proved that if \(x_{K}\) converges to \(x\) in \(\mathbb {D}([0,T],\mathbb {R})\) endowed with the \(M_{1}\)-topology, then for \(t\) outside a countable set, \(x_{K}(t)\) converges to \(x(t)\). It then follows from Lebesgue’s dominated convergence theorem that \(\int _{0}^T H(t,x_{K}(t))dt\) converges to \(\int _{0}^T H(t,x(t))dt\), as soon as \(H\) is bounded and continuous. We apply this result to the process \((\int _{\mathcal {X}} e^{-q x} X^K_{Kt} (dx), t\ge 0)\) and the function

$$\begin{aligned} H_{M}(t,y) = h(t) (y \wedge M), \end{aligned}$$

for any \(M>0\). Estimate (2.9) (with \(p=1\)) allows us to conclude. \(\square \)

4.2.1 Marker distribution in a new adaptive trait mutant population

In this section, we study the transition of the marker distribution when a new mutant adaptive trait appears in a monomorphic population with trait \(x_0\). We consider this phenomenon at the ecological time scale and we prove that the fixation of the mutant trait creates a genetic bottleneck.

Let \(K\) be fixed. Initially we have a trait-monomorphic population with trait \(x_0\) and a marker distribution \(\pi ^K_0(x_0,du)\). Then an individual \((x_{0},v)\) from this population gives birth to an individual with mutant trait \(y\) and marker \(v\) (\(v\) has been chosen according to \(\pi ^K_0(x_0,du)\)). We consider the process \((\nu ^K_t ; t\ge 0)\) started at

$$\begin{aligned} \nu ^K_0(dx,du)=\,&X^K_0(dx)\pi ^K_0(x,du)\\ =\,&\frac{1}{K}\delta _{(y,v)}(dx,du)+\frac{N^K_0-1}{K}\delta _{x_0}(dx)\pi ^K_0(x_0,du). \end{aligned}$$

Proposition 4.5

Under Assumptions 2.1 and 3.1, let us consider a mutant \((y,v)\) appearing in a monomorphic population with trait \(x_0\) and marker distribution \(\pi ^K_0(x_0,du)\). Let us assume that \(f(y ; x_0)>0\), where the fitness function has been defined in (3.3). There exists \(\varepsilon >0\) such that for any sequence \((t_K ; K\in \mathbb {N}^*)\) with \(\lim _{K\rightarrow +\infty } t_K/\log K=+\infty \) and \(\lim _{K\rightarrow +\infty } t_K/K=0\) (for example \(t_K=(\log K)^2\)), we have

$$\begin{aligned} \lim _{K\rightarrow +\infty }\mathbb {P}\big (\langle \nu ^K_{t_K},1\!\!1_{y}\rangle >\varepsilon \big )=\frac{f(y ; x_0)}{b(y)} \text{ and } \lim _{K\rightarrow +\infty }\mathbb {P}\big (\langle \nu ^K_{t_K},1\!\!1_{y}\rangle =0\big )=1-\frac{f(y ; x_0)}{b(y)}. \end{aligned}$$
(4.7)

Further, for the marker distribution, we can prove that

$$\begin{aligned} \lim _{K\rightarrow +\infty }\mathbb {P}\big (\pi ^K_{t_K}(y,du)=\delta _v(du),\ \langle \nu ^K_{t_K},1\!\!1_{y}\rangle >\varepsilon \big )=\frac{f(y ; x_0)}{b(y)}. \end{aligned}$$
(4.8)

Equation (4.8) tells us that when the mutant trait survives in the resident population of trait \(x_0\), then by the time \(t_K\) it needs to reach a non-negligible size, its marker distribution is still a Dirac mass at \(v\), the marker of the initial mutant. The assumption \(q_K(\log K)^2\rightarrow 0\) in Assumption 2.1 ensures this. This assumption is not very restrictive as \((\log K)^2\) grows very slowly. Additional comments are given after the proof.
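The limits in (4.7) are the survival and extinction probabilities of a supercritical linear birth and death process. While the mutant is rare, its per-capita birth rate is \(b(y)\) and its per-capita death rate is approximately \(b(y)-f(y;x_0)\), so the probability of reaching a large size before extinction is close to \(f(y;x_0)/b(y)\). The Monte Carlo sketch below illustrates this with the embedded jump chain; the values \(b=1\), \(f=0.5\), the size threshold and the function names are hypothetical, and the resident population is assumed to stay at equilibrium.

```python
import random

def mutant_survives(b, death, threshold, rng):
    """Follow the embedded jump chain of a linear birth-death process
    started from one individual; True if it reaches threshold before 0."""
    n = 1
    p_birth = b / (b + death)  # probability that the next event is a birth
    while 0 < n < threshold:
        n += 1 if rng.random() < p_birth else -1
    return n >= threshold

def estimate_survival(b, f, threshold=200, runs=2000, seed=0):
    """Monte Carlo estimate of the invasion probability, to compare with f / b."""
    rng = random.Random(seed)
    death = b - f  # per-capita death rate of a rare mutant (assumption)
    hits = sum(mutant_survives(b, death, threshold, rng) for _ in range(runs))
    return hits / runs

if __name__ == "__main__":
    b, f = 1.0, 0.5  # hypothetical values of b(y) and f(y; x0)
    print("estimate:", estimate_survival(b, f), " theory:", f / b)
```

Because the marker plays no role in these rates, the same computation applies whatever the marker \(v\) of the initial mutant, which is consistent with (4.7) depending only on the trait distribution.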

Proof

Properties (4.7) have been proved in Champagnat (2006) and Champagnat et al. (2008) and depend only on the trait distribution. We consider test functions \(\phi (x,u)\) of the form \(1\!\!1_{y}(x)g(u)\) with \(g\in \mathcal {C}^2(\mathcal {U},\mathbb {R})\) such that \(\Vert g\Vert _\infty +\Vert g''\Vert _\infty \le 1\). Starting from Proposition 4.1 and using Itô’s formula with jumps, we obtain as soon as the population with trait \(y\) survives,

$$\begin{aligned} \int _{\mathcal {U}} g(u) \pi ^K_{t_K}(y,du) =&\frac{\langle \nu ^K_{t_K},1\!\!1_{y}g\rangle }{\langle \nu ^K_{t_K},1\!\!1_y\rangle }\nonumber \\ =\,&g(v) + M^{K,g}_{t_K} \!+\! q_K (1-p_K)\, b(y) \int _0^{t_K} \left( 1\!-\!\frac{1}{K\langle \nu ^K_{s},1\!\!1_{y}\rangle +1}\right) \nonumber \\&\times \int _\mathcal {U}\pi ^K_s(y,du) \, \int _\mathcal {U}\big (g(u+h)-g(u)\big )G_K(u,dh) \ ds \end{aligned}$$
(4.9)

where \(M^{K,g}\) is a square integrable martingale with previsible quadratic variation:

$$\begin{aligned}&\langle M^{K,g}\rangle _{t_{K}} = \frac{1}{K}\int _0^{t_{K}} ds \bigg \{b(y)(1-q_K)(1-p_K)\nonumber \\&\times \,\frac{\langle \nu ^K_s,1\!\!1_y\rangle }{\big (\langle \nu ^K_{s},1\!\!1_{y}\rangle +\frac{1}{K}\big )^2} \int _\mathcal {U}\big (g(u)-\langle \pi ^K_s,g\rangle \big )^2\pi ^K_s(y,du) \nonumber \\&+\, \big (d(y)+\eta (y) C*\nu ^K_s(y) \big ) \frac{\langle \nu ^K_s,1\!\!1_y\rangle }{\big (\langle \nu ^K_{s},1\!\!1_{y}\rangle -\frac{1}{K}\big )^2}\int _\mathcal {U}\big (g(u)-\langle \pi ^K_s,g\rangle \big )^2\pi ^K_s(y,du)\nonumber \\&+\, b(y) q_{K}(1-p_K) \frac{\langle \nu ^K_s,1\!\!1_y\rangle }{\big (\langle \nu ^K_{s},1\!\!1_{y}\rangle +\frac{1}{K}\big )^2} \int _\mathcal {U}\pi ^K_s(y,du)\nonumber \\&\times \int _\mathcal {U}G_K(u,dh) \big ( g(u+h)-\langle \pi ^K_s,g\rangle \big )^2 \bigg \}. \end{aligned}$$
(4.10)

The third term in the right hand side of (4.9) is of order \(t_K/K\). Indeed thanks to (2.4) and (2.5), it is upper bounded by

$$\begin{aligned} {t_{K}\over K}\,\bar{b}\, \Vert Ag\Vert _{\infty } . \end{aligned}$$

Equation (4.10) needs more attention. As soon as the mass \(\langle \nu _s^K,1\!\!1_{y}\rangle \) of the mutant population is of order 1, the variance of \(M^{K,g}_{t_K}\) is in \(t_K/K\) which tends to zero when \(K\rightarrow +\infty \). However, since we start from 1 individual, we have to separate the time interval \([0,t_K ]\) into 2 parts. Let us introduce a sequence \((s_{K})\) such that \(s_{K} \le t_{K}\) for any \(K\) and

$$\begin{aligned} \log K \ll s_{K} \ll \frac{1}{\sqrt{q_K}}. \end{aligned}$$

This is possible thanks to the assumption that \(q_K (\log K)^2\rightarrow 0\). Notice that \(s_{K}\) can be equal to \(t_{K}\). Using Assumption 3.1, we can prove as in Champagnat (2006, Lemma 3) that there exists \(\varepsilon _0>0\) such that

$$\begin{aligned} \lim _{K\rightarrow \infty } \mathbb {P}\Big (\forall s \in [s_{K}, t_{K}],\ \langle \nu ^K_s,1\!\!1_y \rangle \ge \varepsilon _0\Big ) = \frac{f(y;x_0)}{b(y)}. \end{aligned}$$

It follows immediately that

$$\begin{aligned}&\mathbb {E}\left( \frac{1\!\!1_{\{\forall s \in [s_K,t_K],\ \langle \nu ^K_{s},1\!\!1_y\rangle >0\}}}{K}\int _{s_{K}}^{t_{K}} \frac{b(y)+d(y)+\eta (y)C *\nu ^K_s(y)}{\langle \nu ^K_{s},1\!\!1_{y}\rangle }\right. \nonumber \\&\quad \quad \left. \int _\mathcal {U}\big (g(u)-\langle \pi ^K_s,g\rangle \big )^2\pi ^K_s(y,du) ds\right) \le C\, {t_{K}\over K}. \end{aligned}$$
(4.11)

Before time \(s_{K}\), the population size with trait \(y\) is not large enough and \( \frac{1}{K\,\langle \nu ^K_s,1\!\!1_y\rangle } \) can only be upper bounded by \(1\). Therefore we have to control the expectation of the variance of \(g\) under \(\pi ^K_{s}\). The expected number of marker mutations up to time \(s\) along a lineage is \(s q_{K}\) and the variance contributed by each such mutation is bounded by \(\Vert g\Vert _\infty ^2=\sup \{g(h)^2, h\in \mathcal{U}\}\). Then

$$\begin{aligned} \mathbb {E}\left( \int _\mathcal {U}\big (g(u)-\langle \pi ^K_s,g\rangle \big )^2\pi ^K_s(y,du)\right) \le s\, q_{K} \,\Vert g\Vert _\infty ^2, \end{aligned}$$
(4.12)

and

$$\begin{aligned}&\mathbb {E}\left( \frac{1}{K}\int _{0}^{s_{K}} \frac{b(y)+d(y)+\eta (y)C *\nu ^K_s(y)}{\langle \nu ^K_s,1\!\!1_y\rangle } \int _\mathcal {U}\big (g(u)-\langle \pi ^K_s,g\rangle \big )^2\pi ^K_s(y,du) ds\right) \nonumber \\&\quad \quad \le C\, {(s_{K})^2 r_K \over K^2}. \end{aligned}$$

The upper bound converges to 0 by the assumption on \(q_K\). The third term of (4.10) is treated similarly. This concludes the proof. \(\square \)
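
The last step of the proof can be sketched as follows, using only (4.12) and the bound of \(1/(K\langle \nu ^K_s,1\!\!1_y\rangle )\) by \(1\) before time \(s_K\). The constant \(C'\) is our notation and is assumed to absorb the bounded rates and \(\Vert g\Vert _\infty ^2\):

$$\begin{aligned} C'\int _0^{s_K} s\, q_K\, ds = \frac{C'}{2}\, q_K\, (s_K)^2 \longrightarrow 0 \quad \text{as } K\rightarrow +\infty , \end{aligned}$$

since \(s_K\ll 1/\sqrt{q_K}\). This agrees with the bound stated in the proof if \(q_K\) is of order \(r_K/K^2\), which is how we read Assumption 2.1.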

Remark 4.6

For \(q_K=1/\sqrt{K}\), let us notice that the rate of appearance of mutant markers in a population of size \(K\) is of order \(Kq_K=\sqrt{K}\) which does not tend to zero. This means that many mutant markers appear in the population of trait \(y\) during the \(t_K\) time interval following the first mutant \((y,v)\). However, heuristically, since in a tree the mass is concentrated around the leaves, the mutants do not appear with the same probability along the time interval and mutations are mostly observed after the time \(s_K\) when the mutant population \((y,v)\) is already large. Moreover, using that the marker mutation step and/or marker mutation frequency is small we obtain that the mutant markers remain in negligible proportion between \(s_K\) and \(t_K\).
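
As a quick check that the choice \(q_K=1/\sqrt{K}\) of this remark is compatible with the proof of Proposition 4.5, one can verify the condition on \(q_K\) and the window for \(s_K\) directly:

$$\begin{aligned} q_K(\log K)^2=\frac{(\log K)^2}{\sqrt{K}}\rightarrow 0, \qquad \frac{1}{\sqrt{q_K}}=K^{1/4}, \qquad \log K \ll (\log K)^2 \ll K^{1/4}. \end{aligned}$$

In this example one may therefore take \(s_K=t_K=(\log K)^2\).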

4.3 Convergence of the marker distribution process in a trait-monomorphic population

For \(K\in \mathbb {N}^*\), we introduce, as in Champagnat (2006), the following sequence of stopping times \(\tau ^K_k\) and \(\theta ^K_k\):

$$\begin{aligned}&\tau ^K_0= 0, \qquad \theta ^K_0=0\\&\tau ^K_{k+1}=\inf \{t>\tau ^K_k,\ \text{ Card }\big (\text{ supp }(\bar{X}^K_t)\big )=\text{ Card }\big (\text{ supp }(\bar{X}^K_{t_-})\big )+1\}\\&\theta ^K_{k}=\inf \{t>\tau ^K_k,\ \text{ Card }\big (\text{ supp }(\bar{X}^K_t)\big )=1\}. \end{aligned}$$

The times \(\tau ^K_k\) are the times of appearance of the successive mutant traits in the population and the times \(\theta ^K_k\) are the times at which the population returns to a monomorphic state. These times are possibly infinite, if the corresponding sets are empty. It has been proved in Champagnat (2006) that for \(t_K\) such that \(\lim _{K\rightarrow +\infty } t_K/\log (K)=+\infty \) and \(\lim _{K\rightarrow +\infty } t_K/K =0\),

$$\begin{aligned} \lim _{K\rightarrow +\infty }\mathbb {P}\Big (\forall k\ge 0, \tau ^K_k\wedge KT \le \theta ^K_{k} \wedge KT \le \big (\tau ^K_k+t_K\big )\wedge KT\le \tau ^K_{k+1} \wedge KT\Big )=1. \end{aligned}$$
(4.13)

Proposition 4.7

Take the process \((\nu ^K_{Kt} ; t\in [0,T])\) started with the monomorphic initial condition \(\nu _0^K(dx,du)=n_0^K \delta _{(x_0,u_0)}(dx,du)\), where \(\lim _{K\rightarrow +\infty }n_0^K= \widehat{n}_{x_0}>0\) and \(\sup _{K\in \mathbb {N}^*} \mathbb {E}((n^K_0)^3)<+\infty \).

(i) In the trait-mutation time scale, the time of first mutation converges in distribution as follows:

$$\begin{aligned} \lim _{K\rightarrow +\infty } \tau _1^K/K =\tau _1, \end{aligned}$$
(4.14)

where \(\tau _1\) is an exponential time with parameter \(b(x_0)\widehat{n}_{x_0}\).

(ii) Let us consider the processes \((\pi ^K_{K(t\wedge \tau ^K_1)} ; t\in [0,T])\) stopped at the time of first mutation. When \(K\rightarrow +\infty \), this sequence converges in distribution in \(\mathbb {D}([0,T],\mathcal {P}(\mathcal {U}))\) to the Fleming–Viot process \(F^{u_0}(x_0,du)\) (see Definition 3.3) stopped at the independent exponential time \(\tau _1\).

Proof

First of all, the trait and marker mutations are independent. Thus, the stopping time \(\tau _1^K\) is independent of the marker distribution \(\pi ^K(x_0,du)\). The results of Champagnat (2006) and Champagnat et al. (2008) are unchanged and give (4.14). Moreover, by Champagnat et al. (2008, Lemma 5.4),

$$\begin{aligned} \lim _{K\rightarrow +\infty }\mathbb {P}\Big (\inf _{s\in [\log K, \tau ^K_1]}\langle X^K_s, \mathbf{1}\rangle \ge \frac{\widehat{n}_{x_0}}{2}\Big )=1. \end{aligned}$$
(4.15)

Let \(\phi \in \mathcal {C}(\mathcal {U},\mathbb {R})\). Since the population is trait-monomorphic with trait \(x_0\), we have \(\langle \pi ^K_{Kt}(x_0,du),\phi (u)\rangle = \frac{\langle \nu ^K_{Kt},\phi \rangle }{\langle \nu ^K_{Kt}, \mathbf{1}\rangle }\). Thus, from Proposition 4.1 and Itô’s formula, we get that in the time scale \(K t\)

$$\begin{aligned}&\langle \pi ^K_{K(t\wedge \tau ^K_1)}(x_0,.),\phi \rangle = \langle \pi ^K_0(x_0,.),\phi \rangle + H^{K,\phi }_{K(t\wedge \tau ^K_1)} + b(x_0) q_K(1-p_K)\nonumber \\&\quad \times \int _0^{t\wedge \tau ^K_1} \Big (1-\frac{1}{K\langle \nu ^K_{Ks},1\rangle +1}\Big ) \frac{r_K}{K}\nonumber \\&\quad \times \int _{\mathcal {U}\times \mathcal {U}} \big (\phi (u+h)-\phi (u)\big )G_K(u,dh) K\pi ^K_{Ks}(x_0,du) ds \end{aligned}$$
(4.16)

where \( H^{K,\phi }\) is a square integrable martingale whose predictable quadratic variation is

$$\begin{aligned}&\langle H^{K,\phi }\rangle _{K(t\wedge \tau ^K_1)}= \int _0^{t\wedge \tau ^K_1} ds \bigg \{ b(x_0)(1-q_K)(1-p_K)\frac{\langle \nu ^K_{Ks},1\!\!1_{x_0}\rangle }{\big (\langle \nu ^K_{Ks},\mathbf{1}\rangle +\frac{1}{K}\big )^2} \nonumber \\&\times \, \int _{\mathcal {U}} \pi ^K_{Ks}(x_0,du) \big (\phi (u)-\langle \pi ^K_{Ks}(x_0,.),\phi \rangle \big )^2\nonumber \\&+\,\big (d(x_0)+\eta (x_0) C*X^K_{Ks}(x_0)\big )\frac{\langle \nu ^K_{Ks},1\!\!1_{x_0}\rangle }{\big (\langle \nu ^K_{Ks},\mathbf{1}\rangle -\frac{1}{K}\big )^2}\nonumber \\&\times \, \int _{\mathcal {U}} \pi ^K_{Ks}(x_0,du) \big (\phi (u)-\langle \pi ^K_{Ks}(x_0,.),\phi \rangle \big )^2\nonumber \\&+\, b(x_0)q_K(1-p_K) \frac{\langle \nu ^K_{Ks},1\!\!1_{x_0}\rangle }{\big (\langle \nu ^K_{Ks},\mathbf{1}\rangle +\frac{1}{K}\big )^2}\nonumber \\&\times \int _\mathcal {U}\pi ^K_{Ks}(x_0,du) \int _\mathcal {U}G_K(u,dh) \big (\phi (u+h)-\langle \pi ^K_{Ks}(x_0,.),\phi \rangle \big )^2 \bigg \}. \end{aligned}$$
(4.17)

The computation shows that the order of the quadratic variation of \(\pi ^K_{t}\) is \({1\over K}\). Thus at time scale \(K t\), this order will be \(1\). That justifies Assumption (2.4) for \(p_{K}\) which is the only choice to get a non degenerate diffusive limit.

Let us introduce a process \((\widetilde{\pi }^{K,x_0}_t(du), t\ge 0)\) coupled with \((\pi ^{K}_t(x_0,du),t\ge 0)\), on the same probability space and driven by the same Poisson point measures, that satisfies the following properties. The dynamics of \(\widetilde{\pi }^{K,x_0}_.(du)\) is given by (4.16)–(4.17) but without the stopping times \(\tau _1^K\) and we have that \(\forall t\ge 0,\ \pi ^{K}_{K(t\wedge \tau _1^K)}(x_0,du)=\widetilde{\pi }^{K,x_0}_{K(t\wedge \tau _1^K)}(du)\). In a nutshell, \((\widetilde{\pi }^{K,x_0}_t,t\ge 0)\) corresponds to the process \((\pi ^{K}_t(x_0,.),t\ge 0)\) that is obtained by setting the trait mutation kernel to the Dirac mass at 0.

Thanks to (2.5), (4.15) and using that \(\widetilde{\pi }^{K,x_0}\) is a probability-valued process, (4.16) and (4.17) imply that for any \(\phi \in \mathcal {C}(\mathcal {U},\mathbb {R})\), the distribution sequence of \((\langle \widetilde{\pi }^{K,x_0}_{K.},\phi \rangle ; K\in \mathbb {N}^*)\) is uniformly tight in \(\mathbb {D}([0,T],\mathbb {R})\). By Roelly’s criterion (1986, Theorem 2.1), this implies the tightness of the sequence of the laws of \((\widetilde{\pi }^{K,x_0}_{K.} ; K\in \mathbb {N}^*)\) in \(\mathbb {D}([0,T],\mathcal {P}(\mathcal {U}))\).

Let us consider a limiting value \((\bar{\pi }_t(du) ; t\in [0,T])\) of the tight sequence and a subsequence, again denoted by \(\widetilde{\pi }^{K,x_0}_{K.}(du)\), that converges to \(\bar{\pi }_.(du)\). By (4.15) and since individuals have weight \(1/K\), the limiting laws only charge \(\mathcal {C}([0,T],\mathcal {P}(\mathcal {U}))\).

It remains to identify \(\bar{\pi }_.(du)\). Let \(0<s<t<T\), let \(k\in \mathbb {N}\) and \(0<s_1\le \cdots \le s_k<s<t\), let \(\phi _1,\cdots ,\phi _k\) be bounded continuous functions on \(\mathcal {P}(\mathcal {U})\) and \(\phi \in \mathcal {C}(\mathcal {U},\mathbb {R})\). We define the following bounded functional on \(\mathbb {D}([0,T],\mathcal {P}(\mathcal {U}))\):

$$\begin{aligned} \Psi _{s,t}(Y)=&\phi _1(Y_{s_1})\cdots \phi _k(Y_{s_k})\left\{ \langle Y_t,\phi \rangle - \langle Y_s,\phi \rangle -\int _s^t dr \int _{\mathcal {U}} Y_r(du)b(x_0)A\phi (u)\right\} . \end{aligned}$$

On the one hand, using (4.16), we obtain that

$$\begin{aligned} \Psi _{s,t}(\widetilde{\pi }^{K,x_0}_{K.})=\phi _1(\widetilde{\pi }^{K,x_0}_{Ks_1})\cdots \phi _k(\widetilde{\pi }^{K,x_0}_{Ks_k})\left\{ H^{K,\phi }_{Kt}-H^{K,\phi }_{Ks} + \varepsilon _{Kt}^K - \varepsilon _{Ks}^K \right\} \end{aligned}$$

where

$$\begin{aligned} \varepsilon ^K_{Kt}=&\int _0^{t} ds \int _{\mathcal {U}} \widetilde{\pi }^{K,x_0}_{Ks}(du) \left[ b(x_0) \frac{r_K}{K}\int _{\mathcal {U}} \big (\phi (u+h)-\phi (u)\big )G_K(u,dh)\right] \nonumber \\&-\int _0^{t} ds \int _{\mathcal {U}} \widetilde{\pi }^{K,x_0}_{Ks}(du) b(x_0) A \phi (u) \end{aligned}$$
(4.18)

tends to 0 in \(\mathbb {L}^1\) when \(K\rightarrow +\infty \). Thus,

$$\begin{aligned} \lim _{K\rightarrow +\infty }\mathbb {E}\Big (\Psi _{s,t}(\widetilde{\pi }^{K,x_0}_{K.})\Big )=0. \end{aligned}$$
(4.19)

On the other hand, using (2.9) and the convergence of \((\widetilde{\pi }^{K,x_0}_{K(.\wedge \tau ^K_1)}(du) ; K\in \mathbb {N}^*)\) to \(\bar{\pi }\in \mathcal {C}([0,T],\mathcal {M}_F(\mathcal {U}))\), we get

$$\begin{aligned} \mathbb {E}\big (\Psi _{s,t}(\bar{\pi })\big )=\lim _{K\rightarrow +\infty }\mathbb {E}\big (\Psi _{s,t}(\widetilde{\pi }^{K,x_0}_{K.}(du))\big ). \end{aligned}$$
(4.20)

This shows that \(\mathbb {E}\big (\Psi _{s,t}(\bar{\pi })\big )=0\) and hence the process \(M^{x_0}(\phi )\) defined in (3.4) is a martingale obtained as the uniform limit in time of \(H^{K,\phi }_{Kt}\), when \(K\rightarrow +\infty \). Moreover, the bracket (4.17) converges to

$$\begin{aligned}&\int _0^t \frac{b(x_0)+d(x_0)+\eta (x_0) \widehat{n}_{x_0}}{\widehat{n}_{x_0}}\int _{\mathcal {U}} \bar{\pi }_s(du)\Big [ \big (\phi (u)-\langle \bar{\pi }_{s},\phi \rangle \big )^2 \Big ] ds \nonumber \\&\quad = \int _0^t \frac{2b(x_0)}{\widehat{n}_{x_0}} \Big [ \langle \bar{\pi }_s,\phi ^2\rangle -\langle \bar{\pi }_s,\phi \rangle ^2 \Big ] ds. \end{aligned}$$
(4.21)

Indeed, the integral in (4.17) can be separated into two integrals, one between \(0\) and \(\frac{\log K}{K} \wedge t \wedge \tau _1^K\) and the other between \(\frac{\log K}{K} \wedge t \wedge \tau _1^K\) and \(t \wedge \tau _1^K\). The second integral converges to (4.21), but some caution is needed for the first integral since the ratios \(\langle \nu ^K_{Ks},1\!\!1_{x_0}\rangle /\big (\langle \nu ^K_{Ks},\mathbf{1}\rangle \pm \frac{1}{K}\big )^2\) are of order \(K\). Using the same arguments as for (4.12), we can upper bound the integral between \(0\) and \(\frac{\log K}{K} \wedge t \wedge \tau _1^K\) by

$$\begin{aligned} C K \int _0^{\log K/K} s q_K ds=C\frac{(\log K)^2}{K} q_K\rightarrow _{K\rightarrow +\infty } 0. \end{aligned}$$
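
For completeness, the equality between the two lines of (4.21) rests on two elementary facts, sketched here. The first is the usual expansion of a variance under the probability measure \(\bar{\pi }_s\). The second is the balance of birth and death rates at the resident equilibrium, which we assume takes the form \(d(x_0)+\eta (x_0)\widehat{n}_{x_0}=b(x_0)\), as the way (4.21) is written suggests:

$$\begin{aligned}&\int _{\mathcal {U}} \big (\phi (u)-\langle \bar{\pi }_s,\phi \rangle \big )^2\, \bar{\pi }_s(du) = \langle \bar{\pi }_s,\phi ^2\rangle -2\langle \bar{\pi }_s,\phi \rangle ^2+\langle \bar{\pi }_s,\phi \rangle ^2 = \langle \bar{\pi }_s,\phi ^2\rangle -\langle \bar{\pi }_s,\phi \rangle ^2,\\&b(x_0)+d(x_0)+\eta (x_0)\widehat{n}_{x_0} = 2\,b(x_0). \end{aligned}$$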

Theorem 3.12, p. 432 of Jacod and Shiryaev (1987) then provides the convergence of \(H^{K,\phi }_{K.}\) to the solution of the martingale problem (3.4)–(3.5) with \(x=x_{0}\).

By the independence of \(\widetilde{\pi }^{K,x_0}_{K.}(du)\) and \(\tau _1^K\), the limit \(\tau _1\) is independent of \(\bar{\pi }_.(du)\), and \(\bar{\pi }_{.\wedge \tau _1}=F^{u_0}_{.\wedge \tau _1}(x_0,du)\). This concludes the proof. \(\square \)
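
To give some intuition for the limit in Proposition 4.7, the following minimal sketch simulates a neutral Moran particle system, a classical finite-population approximation of a Fleming–Viot process. It is not the process \(\nu ^K\) of this paper: the constant population size, the Gaussian marker mutation step and all parameter values are simplifying assumptions made for illustration, and the function names are hypothetical.

```python
import random


def moran_markers(n_particles, n_events, mutation_prob, step_sd, u0=0.0, seed=0):
    """Neutral Moran dynamics on marker values.

    At each event one uniformly chosen particle is replaced by a copy of a
    uniformly chosen parent (resampling). With probability `mutation_prob`
    the copy's marker is shifted by a centred Gaussian step (marker mutation).
    All particles start with marker `u0`, i.e. from a Dirac mass."""
    rng = random.Random(seed)
    markers = [u0] * n_particles
    for _ in range(n_events):
        parent = rng.randrange(n_particles)
        replaced = rng.randrange(n_particles)
        child = markers[parent]
        if rng.random() < mutation_prob:
            child += rng.gauss(0.0, step_sd)
        markers[replaced] = child
    return markers


def empirical_variance(markers):
    """Variance of the identity function under the empirical marker distribution."""
    mean = sum(markers) / len(markers)
    return sum((u - mean) ** 2 for u in markers) / len(markers)


# Without marker mutation the initial Dirac mass is preserved.
frozen = moran_markers(50, 1000, mutation_prob=0.0, step_sd=0.1)
# With marker mutation the empirical distribution spreads out, while
# resampling keeps removing variation.
mixed = moran_markers(50, 5000, mutation_prob=0.05, step_sd=0.1)
print(empirical_variance(frozen), empirical_variance(mixed))
```

The resampling step is the particle-level counterpart of the quadratic variation (4.21), and the mutation step is the counterpart of the generator \(A\) in the drift.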

4.4 Conclusion

Using Theorem 4.2 and Propositions 4.5 and 4.7, we prove the first part of Theorem 3.4, namely the convergence in the sense of finite-dimensional distributions. Let us now consider the convergence in the space of trait-marker-time measures.

Corollary 4.8

The family \((\nu ^K_{Kt}(dx,du)\ dt, K\in \mathbb {N}^*)\) is uniformly tight in \(\mathcal {M}_F(\mathcal {X}\times \mathcal {U}\times [0,T])\) endowed with the topology of weak convergence, and converges in distribution to the measure \(V_t(dx,du)dt\), where \(V\) is defined in Theorem 3.4.

Proof

Since the space \(\mathcal {X}\times \mathcal {U}\times [0,T]\) is compact, it is sufficient to prove that

$$\begin{aligned} \sup _{K\in \mathbb {N}^*} \mathbb {E}\left( \int _0^T \langle \nu _{Kt}^K,1\rangle dt\right) <+\infty \end{aligned}$$

which is a consequence of Fubini’s theorem since

$$\begin{aligned} \mathbb {E}\left( \int _0^T \langle \nu _{Kt}^K,1\rangle dt\right) \le T \sup _{K\in \mathbb {N}^*, t\in \mathbb {R}_+}\mathbb {E}\big (\langle \nu ^K_t,1\rangle \big ). \end{aligned}$$

Estimate (2.9) concludes the proof of tightness.

Let us now consider continuous functions \(\phi \in \mathcal {C}(\mathcal {X}\times [0,T],\mathbb {R})\) and \(g\in \mathcal {C}(\mathcal {U},\mathbb {R})\), and recall the stopping times \(\tau ^K_k\) and \(\theta ^K_k\) introduced in Sect. 4.3. Then

$$\begin{aligned}&\int _0^T \int _{\mathcal {X}\times \mathcal {U}} \phi (x,t)g(u)\nu ^K_{Kt}(dx,du)\ dt = \int _0^T \int _{\mathcal {X}} \langle \pi ^K_{Kt}(x,.),g\rangle \phi (x,t)X^K_{Kt}(dx)dt\nonumber \\&= \sum _{k\ge 0} \bigg ( \int _{(\theta ^K_k\wedge KT)/K}^{(\tau ^K_{k+1}\wedge KT)/K} \int _{\mathcal {X}} \langle \pi ^K_{Kt}(x,.),g\rangle \phi (x,t)X^K_{Kt}(dx)dt\nonumber \\&\quad +\int _{(\tau ^K_{k+1}\wedge KT)/K}^{(\theta ^K_{k+1}\wedge KT)/K} \int _{\mathcal {X}} \langle \pi ^K_{Kt}(x,.),g\rangle \phi (x,t)X^K_{Kt}(dx)dt \bigg ). \end{aligned}$$
(4.22)

The limit (4.13) implies that the second term of the right hand side of (4.22) converges to 0. Given \(X^K\), the processes \((\pi ^K_{Kt}(x,.) ; t\in [\theta ^K_k/K, \tau ^K_{k+1}/K))\), for \(k\ge 0\), in the first term of the r.h.s. of (4.22) are independent and, by Proposition 4.7, they converge in distribution in \(\mathbb {D}([0,T],\mathbb {R})\) to the Fleming–Viot processes (3.4)–(3.5) with the initial conditions described by the jumps of the extended TSS \((Y,U)\). Corollary 4.4 and the dominated convergence theorem allow us to conclude the proof. \(\square \)