10.1 Introduction

The amount of genetic diversity and polymorphism at the molecular level (in DNA) between individuals is a cornerstone of modern evolutionary theory because it determines the phenotypic diversity on which selection acts, and it is thus key for predicting the adaptive potential of any species [7, 13, 40]. Our ability to predict evolutionary changes has implications for conservation biology, disease management in medicine and agriculture, plant breeding and future ecosystems management in response to global change [1, 35, 40]. Theoretical and empirical evolutionary genetics describe three main factors controlling the amount of neutral and selective genetic diversity: the species effective population size, mutation and linked selection (Fig. 10.1).

Fig. 10.1
figure 1

Overview of the determinants of genetic diversity in plant species (adapted from [13]). In black the key evolutionary factors, in green the key ecological factors, and in blue the effects of seed banking. The sign on the arrows indicates the direction of correlation to genetic diversity

The effective population size is determined directly by species characteristics such as spatial structure, past demographic events (population increase or decrease, [35, 40]) and life span (longevity), as well as the census size, i.e. the number of observed individuals (Fig. 10.1). Several life history traits affect directly the census size, such as the size of propagules in animals [45], or in plants crucial ecological factors such as the geographic range, ecological habitat, density and abundance [47] and seed production [17].

The mutation rate is a characteristic of the species which may vary with latitude and longitude, e.g., due to exposure to UV light. It is also important to quantify if mutations occurs in the resting or dormant stages of some species: seeds for plants, eggs for insects and crustaceans (e.g. Daphnia), spores in bacteria and fungi [12, 26, 37, 38, 42].

Linked selection is a function of the strength of selection and the recombination rate (Fig. 10.1, [59, 60]). The strength of selection depends on the trait affected by selection and the advantage (or disadvantage) in a given environment compared to other genotypes. By linkage disequilibrium (LD, e.g. [8]) between genomic sites, natural selection affects (1) neutral genomic diversity around the targets of selection, and (2) the efficiency of selection at other neighbouring sites (or loci), the latter is termed the Hill-Robertson effect [25]. The extent of LD depends on the recombination rate with higher rates decreasing the effect of linked selection on diversity (Fig. 10.1, [11, 13]). In flowering plants recombination rate has been primarily studied as determined by the mode of reproduction (selfing, outcrossing) and the reproductive characteristics (Hermaphrodite, Dioecy or Monoecy) [9, 19, 20, 46].

Higher mutation rates and larger effective population size increase genetic diversity, while increased linked selection has the opposite effect (Fig. 10.1). The joint action of these three factors explains the occurrence of the so-called Lewontin paradox [39], stating that the census size of a species, N, defined as the number of individuals observed, does counterintuitively not correlate with genetic diversity. Census sizes can vary over several orders of magnitude between species, e.g. between elephants, humans and crustaceans, while genetic diversity varies only by two to three orders of magnitude in animals [11, 13, 36, 39, 45]. Recent studies show that such a discrepancy between expected and observed genetic diversity arises because selection counteracts the effect of a large population size on diversity: in large populations, positive and purifying selection are more efficient thus increasing the effect of linked selection and the Hill-Robertson effect [11, 13]. Resolving this paradox is instrumental to understand the adaptability of species in the current age of global climatic and anthropogenic changes [1, 36].

In plants, in particular in angiosperms (flowering plants), despite the recent progress in sequencing and the large amount of available genome data, we still do not know the relative importance of the several ecological and life history traits highlighted in Fig. 10.1 in shaping genetic diversity [11, 13, 36]. A key characteristic of many plant species is generally ignored: the ability of forming persistent seed banks with seeds remaining in the soil for many years. It is classically observed as plant adaptation to unpredictable semi-arid to desertic habitats, but is found also in many species in temperate climate in Eurasia and North-America [17, 52]. As an ecological adaptation, seed persistence has major consequences on the genomic evolution by increasing genetic diversity and also by buffering against fast ecological changes in population sizes and preventing population extinction. Seed persistence is therefore a key life-history trait linking the ecology of a species with its abundance, the production and survival of seeds over years, and consequently defining the census size above ground and the available genetic diversity (effective population size). To give a first general idea of its potential importance (see details below), let us define the germination rate, b, as the probability (0 < b ≤ 1) that a seed will germinate on average in the next year. Theory predicts that persistent seed banks increase genetic diversity by a factor 1∕b 2 and the recombination rate by a factor 1∕b. If the germination rate is realistically small (b < 0.5), the effect of seed banking becomes prominent on genetic diversity and recombination rates [10, 14, 34, 62].

In this chapter we describe the effect of seed banking on shaping genetic diversity in plant species. We first build here backward and forward population genetics models which allow us to derive statistics as the expected time of coalescence, the time of fixation and the site frequency spectrum (SFS) of a sample (Sect. 10.2). We then derive a set of predictions that should be observable in polymorphism data from diverse plant species for neutral diversity (Sect. 10.3) and for natural selection (including footprints of selection in the genome, Sect. 10.4).

10.2 Model Description

Seed banking is an evolutionary strategy termed as bet hedging because it consists in reducing short-term reproductive success in favour of longer-term risk reduction to maximize individual fitness over time in variable and unpredictable environments [10, 14, 15, 54, 55]. It is also commonly found in bacteria, fungi, protozoans (including human parasites) [27, 37] and invertebrates (e.g. Daphnia, [42]), thus our predictions can be extended to these species. This ecological trait is classically observed as plant adaptation to unpredictable semi-arid to desertic habitats [55], but is found also in many species in temperate climate in Eurasia and North-America [2, 17]. A non-persistent seed bank is defined as seeds remaining in the soil for up to 5 years (hereafter defining non-seed banking, non-SB species), while persistent seed bank is longer (defining seed banking, SB species, [2, 17]). Note that seed bank persistence is independent of dormancy [17]. While dormancy is defined as the best timing for germination within a year, seed banking occurs over many years and seeds may alternate between periods of dormancy and non-dormant stages when in the soil.

In population genetics, it is common practice to either model an entire population forwards in time within a diffusion framework [16] or to trace a sample of a population backwards in time by applying coalescent theory [30]. In both cases a population of large and finite size N is initially assumed that evolves on a discrete generation-wise time scale and whose reproduction mechanism will basically follow a Wright–Fisher model throughout this chapter. In the backwards formulation of the usual non-SB version of this model, the entire population is replaced after reproduction and descendants pick their ancestors by random sampling from the foregoing generation, while in its SB version, ancestors can be chosen from the previous and up to m generations according to the probabilities b 1, …, b m. In the dual prospective perception of time, these probabilities determine when ancestors give rise to their descendants within the subsequent m generations. We focus throughout on the average germination rate b, which is the inverse of the mean time a seed will spend in the bank, i.e. \(b=1/\sum _{i=1}^{m}i\,b_i\). In practice, the seed bank age distribution is often assumed to follow a geometric distribution, so that seeds are more likely to germinate at earlier than later ages.

Since it is mathematically favourable to treat the ancestral (or the equivalent reproductive) process on a continuous time-scale, time is scaled in units of N generations (in a haploid model as assumed throughout) while N →. In the retrospective setting, for instance, the waiting times until two sampled individuals coalesce into a common ancestor can be geometrically distributed in the discrete, and exponentially distributed in the continuous model. Mutations are the main source of variation among the individuals of a population. As rare events these are assumed throughout to follow a Poisson distribution with a scaled mutation rate, θ =limN2b 2, and that each of them arises on a previously unmutated (or monomorphic) site according to the infinitely-many sites model of Kimura [33]. In other words, the effective population size, N e, consists of all plants and seeds, while the census size is only consisting of living plants, so that we obtain the relation N e = Nb 2 [28, 43]. In other words, mutations are assumed to arise in seeds and living plants [12, 58]. Alternatively, mutations may not arise in seeds, so that the scaled mutation rate reduces by a factor b [52, 62]. Except in the case of linked selection, all polymorphic sites will be assumed to be unlinked (due to recombination in between them) and therefore independent of each other.

10.2.1 The Retrospective Coalescent View for Neutrally Evolving Sites

Kaj et al. [28] introduced an urn model to describe neutral seed bank dynamics for a population of a constant size, where the population of a new generation is formed via multinomial sampling from the m previous generations in this initially discrete setting. A sample represented as balls and consisting of n balls at present is repeatedly relocated across the previous generations by sliding a window comprising the m consecutive generations as cells in a stepwise manner. Sliding the window one generation backwards, all balls from the first cell of the previous window are relocated into one of the m cells of the actual window according to the probabilities b 1, …, b m. More precisely, each ball is relocated into one of the N slots of a given cell, each representing an individual of the population in the respective generation. Thereby, two types of coalescent events may occur in the sample’s history: either two balls are placed into the same slot of the same cell or a ball is placed into a previously occupied slot. The probability of one coalescent event at a time is O(1∕N), while more than one coalescent event at a time occur with the negligible probability of O(1∕N 2). Therefore, coalescences occur on a ‘slow’ timescale of O(N) steps, while the relocation process runs on a ‘fast’ time scale making the separation of time scales possible.

The ancestral process of the discrete seed bank model is denoted by \((A_n^N(k))_{k\geq {}0}\), where \(A_n^N(k)\) is the number of ancestors at step k with population size N and initial sample size n. It has been shown [28] that the continuous seed bank model is the n-coalescent [30] run on a slower time-scale by proofing that the time-rescaled ancestral process \((A_n^N([Nt]))_{t\geq {}0}\) converges as N → to the continuous-time Markov chain (A n(t))t≥0 with infinitesimal generator matrix Q = (q ij)i,j ∈{1,…,n} defined by

$$\displaystyle \begin{aligned} \begin{array}{rcl} \begin{array}{llrl} q_{ii}&=&-b^2{i\choose{}2},&\quad {}2\leq{}i\leq{}n,\\ q_{ii-1}&=&b^2{i\choose{}2},&\quad {}2\leq{}i\leq{}n,\\ q_{ij}&=&0,&\quad {}\mathrm{otherwise}. \end{array} {} \end{array} \end{aligned} $$
(10.1)

From Eq. (10.1) one can derive the probability that the process A n(t) is in the state of j = n, …, 2 ancestors at time t via a matrix decomposition (e.g. [61]) to obtain

$$\displaystyle \begin{aligned} P(A_n(t)=j)&=\sum_{k=j}^{n}c_{nk}r_{kj}\exp\left(-b^2{k\choose{}2}t\right), {} \end{aligned} $$
(10.2)

where \(c_{nk}={n\choose {}k}k_{(k)}/n_{(k)}\) and \(r_{kj}=(-1)^{k-j}{k\choose {}j}j_{(k-1)}/k_{(k-1)}\) are the elements of the matrices of column and row eigenvectors of Q, respectively, and a (b) = a(a + 1)⋯(a + b − 1), a (0) = 1. The mean waiting times between coalescent events are given by

$$\displaystyle \begin{aligned} E(T_j)&=\int_{0}^{\infty}P(A_n(t)=j)dt. {} \end{aligned} $$
(10.3)

as the inverse of the coalescent rate.

10.2.2 The Prospective Diffusion Framework for Neutral and Selected Sites

Following allele frequencies in the limit N → forward in time leads to diffusion approximations with time and allele frequencies being again measured on a continuous scale (i.e. in units of N generations). We assume two allelic types A and a with frequencies x and 1 − x. The advantage of the diffusion framework over the coalescent setting is that selection can be more straightforwardly taken into account. The effect of weak selection on the fertility of plants and on the fraction of surviving seeds can be summarized into a coefficient s, so that the scaled coefficient in the haploid model is given by σ = N s. By making use of a perturbation approach, it has been shown [34] that in the diffusion limit, as N →, the probability f(y, t)dy that the type-A genotype has a frequency in (y, y + dy) is determined by the following forward equation (see [31] for the non-SB model):

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \frac {\partial}{\partial t} f(y,t) &\displaystyle =&\displaystyle - \frac {\partial}{\partial y}\left\{ \mu(y)\,f(y,t)\right\} +\frac 1 2 \frac {\partial^2}{\partial y^2} \left\{ \sigma^2(y)\, f(y,t)\right\}, \end{array} \end{aligned} $$
(10.4)

where the drift and the diffusion terms are, respectively, given by μ(y) = σ b y(1 − y) and σ 2(y) = b 2 y(1 − y). For neutrality, μ(y) = 0 so that the exclusive consequence of genetic drift is characterized by the diffusion term. For the derivation of the time to fixation and the SFS we require the following measures. The scale density of the diffusion process is given by

$$\displaystyle \begin{aligned} \xi(y)=\exp\left(-\int_{0}^{y}\frac{2\mu(z)}{\sigma^2(z)}dz\right). \end{aligned}$$

The speed density is obtained (up to a constant) as π(y) = [σ 2(y)ξ(y)]−1 and the probability of absorption at y = 0 is given by

$$\displaystyle \begin{aligned} u_0(x)=\frac{\int_x^1\xi(z)dz}{\int_0^1\xi(z)dz} \end{aligned}$$

and u 1(x) = 1 − u 0(x) gives the probability of absorption at y = 1.

Assuming that both y = 0 and y = 1 are absorbing states the mean time \(\bar {t}\) until one of these states is reached is given by [16]

$$\displaystyle \begin{aligned} \bar{t}(x)=\int\limits_0^1{}t(x,y)dy, \end{aligned} $$
(10.5)

where

$$\displaystyle \begin{aligned} \begin{array}{rcl} t(x,y)&\displaystyle =&\displaystyle 2\,u_0(x)[\sigma^2(y)\xi(y)]^{-1}\int\limits_0^y{\xi(z)dz}, \quad {}0\leq{}y\leq{}x,\\ t(x,y)&\displaystyle =&\displaystyle 2\,u_1(x)[\sigma^2(y)\xi(y)]^{-1}\int\limits_y^1{\xi(z)dz}, \quad {}x\leq{}y\leq{}1.\\ \end{array} \end{aligned} $$

The time until a mutant allele is fixed conditional on fixation can be evaluated as

$$\displaystyle \begin{aligned} \bar{t^*}(x)=\int_0^1{}t^*(x,y)dy, \end{aligned} $$
(10.6)

where t (x, y) = t(x, y)u 1(y)∕u 1(x).

10.2.3 Statistical Measures for the Analysis of Genomic Data

The SFS is one of the most commonly used statistics for the analysis of genomewide distributed SNPs. It is defined as the distribution of the number of times i a mutation is observed in a population or a sample of n sequences conditional on segregation. In the coalescent setting the SFS is (either theoretically or empirically) evaluated for a sample at the present time. In the diffusion framework, the SFS of a population or a sample can often even be derived in dependence of a time variable t and an equilibrium solution can either be implied by letting t → or by making use of the measures from the foregoing section. Although the forward version of the SFS has interesting applications on time-series data (i.e. DNA sequences sampled over time), we will, for convenience, focus throughout on coalescent formulations and results and only mention equilibrium solutions with reference to diffusions.

Let the site frequencies be denoted as f n,i (1 ≤ i ≤ n − 1), where the index n shall indicate that the SFS depends on the sample size (except in the case of neutrality and a constant population size). It has been shown [22, 63] that the following equation holds for general binary coalescent trees, i.e. ancestral trees with pairwise coalescences at a time with arbitrary continuous waiting times and following the mutation model given above:

$$\displaystyle \begin{aligned} f_{n,i}=\frac{\theta}{2}\sum_{k=2}^{n-i+1} k \frac{ {n-i-1 \choose k-2}}{{n-1 \choose k-1}} E(T_{k}). {} \end{aligned} $$
(10.7)

Two related measures of the SFS are the expected number of segregating sites S n equalling the total number of mutations in the infinitely-many sites model and the average number of pairwise differences Π n. The relationships are

$$\displaystyle \begin{aligned} \begin{array}{rcl} S_n&\displaystyle =&\displaystyle \sum_{j=1}^{n-1}f_{n,j},\\ \varPi_n{}&\displaystyle =&\displaystyle \frac{1}{{n\choose{}2}}\sum_{j=1}^{n-1}j(n-j)f_{n,j}, \end{array} \end{aligned} $$

and the expectations of both quantities can be easily obtained via Eq. (10.7). One can also derive the normalized version of the SFS as r n,i = f n,iS n.

In the diffusion framework the equilibrium version of the SFS as the proportion of sites where the mutant frequency is in (y, y + dy) is given by [21]

$$\displaystyle \begin{aligned} \hat{f}(y)=\theta\,\pi(y)\,u_0(y), \end{aligned} $$
(10.8)

and the finite version can be immediately obtained via binomial sampling as

$$\displaystyle \begin{aligned} \hat{f}_{n,i}=\int\limits_0^1{n\choose{}i}y^i(1-y)^{n-1}\hat{f}(y)dy. \end{aligned} $$
(10.9)

10.3 Effect of Seed Banking on Neutral Evolutionary Processes

The major effect of seed persistence is to increase genetic diversity by a storage effect of seeds in the soil. In coalescent theory terms, seed banks increase the time for two lineages to coalesce by a factor 1∕b 2 under any neutral model.

10.3.1 Constant Population Size

The expected time to coalescence is simply given by

$$\displaystyle \begin{aligned} E(T_j)=\left(b^2{j\choose{}2}\right)^{-1}, \end{aligned}$$

which can, e.g., be derived by applying Eq. (10.2) to (10.3). The consideration of the average germination rate b diminishes the rate of genetic drift (Fig. 10.3, [28]) and is bounded as 1∕m ≤ b ≤ 1. The lower and upper bounds result from the scenarios, where all seeds, respectively, rest m and one generation in the bank. So the expected coalescent tree can be up to m 2 generations longer in the SB model compared to the usual non-SB Wright–Fisher model. Persistent seed banks, acting as a genetic storage, increase diversity by a factor 1∕b 2. For a neutral model of a constant population size, Eq. (10.7) simply yields f n,i = θ∕(b 2 i).

Applying diffusion theory, the population and the sample SFS can be, respectively, obtained as \(\hat {f}(y)=\theta /(b^2\;y)\) and \(\hat {f}_{n,i}=\theta /(b^2\;i)\) by applying Eqs. (10.8) and (10.9), which are equivalent to the coalescent results. The mean time to absorption and the time to fixation can be, respectively, derived via Eqs. (10.5) and (10.6) as \(\bar {t}(x)=-2/b^2\,(x\,\log (x)+(1-x)\,\log (1-x))\) and \(\bar {t^*}(x)=-2/b^2(1 - x)/x \log (1 - x)\). Therefore, the mean time to absorption/loss and the time to fixation are both slowed down in the SB model by a factor of 1∕b 2 compared to the non-SB model.

10.3.2 Estimation of Past Demographic Events

The change in coalescent rate due to SB decreases the strength of genetic drift, lengthens the time of fixation of neutral alleles and thus diminishes the genetic differentiation between populations [56]. This consequently affects the inference of past demography of a population or a species [62]. In the following, we present a simplified demographic setting that allows us to make the results of Sects. 10.2.1 and 10.2.3 applicable via a simple time rescaling argument. We assume that plants and seeds of all age classes are equivalently affected by changes in the population size such that the relative proportions of all type of seeds, and therefore the seed bank age distribution b 1, …, b m and its average germination rate b remain constant over time. Furthermore, the population size changes are assumed to occur on a coalescent time scale so that the relocation process can reach an equilibrium between coalescent events as in the case of a constant population size. This particularly implies that the population size is approximately constant over any given time window, i.e. for a given m-window k 0 the relative population size function (as scaled by the population size N at present time) at the i-th cell ρ N(i + k 0 − 1) = N(i + k 0 − 1)∕N ≈ ρ N(k 0). This simplification holds for a geometrically growing population, if the growth rate and m are chosen realistically small. In the case of an instantaneous population decline, this relationship is violated, but only for m − 1 generations, so that for small m instantaneous changes within a window can be neglected due to the small corresponding coalescence probability for large population sizes.

In continuous time, let ρ(t), which arises from ρ N([Nt]) as N → and time being measured in units of N generations, be piecewise continuous, bounded and follow the conventions in discrete time. The time-rescaling argument (in terms of the harmonic mean of the relative population sizes) for the coalescent approximation of the non-SB Wright–Fisher model [32], \(t\rightarrow \int _{0}^t\rho (s)^{-1}ds\), can then be applied to the ancestral process (A n(t))t≥0 to obtain the process with time-varying population size \((A_n^{\rho }(t))_{t\geq {}0}\). Therefore, the corresponding results to Eqs. (10.2) and (10.3) are, respectively, given by Živković and Tellier [62]

$$\displaystyle \begin{aligned} P(A_n^{\rho}(t)=j)&=\sum_{k=j}^{n}c_{nk}r_{kj}\exp\left(-b^2{k\choose{}2}\int_{0}^t\rho(s)^{-1}ds\right) {} \end{aligned} $$
(10.10)

and

$$\displaystyle \begin{aligned} E(T^{\rho}_j)&=\int_{0}^{\infty}P(A_n^{\rho}(t)=j)dt. {} \end{aligned} $$
(10.11)

The SFS can be simply obtained by applying Eq. (10.11) to (10.7) and applied for the estimation of demographic parameters, which is only feasible to some extent with prior knowledge on the average germination rate b [62]. The SFS can also be obtained within the diffusion framework (e.g. via a moment based approach) to solve a time inhomogeneous-version of Eq. (10.4) with the diffusion term given by σ 2(y, t) = b 2 y(1 − y)∕ρ(t) and the drift term μ(y) = 0 (see [61] for non-SB models). As an application of these results, it has been shown in a seminal study [52] that b can be estimated by applying an Approximate Bayesian Computation (ABC) method to polymorphism data with prior knowledge on metapopulation structure and N using ecological data.

10.3.3 Seed Banks Decrease the Rate of Divergence Between Populations/Species

As the coalescent tree is longer by a factor 1∕b 2, we also predict that the inference of recent splits between populations (or species) is affected (Fig. 10.2). Seed banking decreases the genetic differentiation, measured as F st, in spatially structured populations [49]. Note that the population migration rate should be multiplied by b if pollen only disperse, and not if pollen and seeds disperse. Ignoring seed banking thus influences the estimation of migration rates if the appropriate dispersal scaling is not taken into account. On a longer time scale, species with persistent seed banks may exhibit higher rates of shared alleles between divergent species, the so-called Incomplete Lineage Sorting (ILS), which is obscuring the phylogenetic signal of speciation (red mutations in Fig. 10.2). For example, several species in the tomato clade show evidence for seed banking [43], and high rates of ILS are found [44].

Fig. 10.2
figure 2

In a model of speciation (without gene flow) from an ancestor into two incipient species, persistent seed banks increase the amount of incomplete lineage sorting due to the lower rate of genetic drift in species 2 (SB) compared to species 1 (non-SB). The number of shared alleles (red mutations) between species with and without seed bank is increased by seed banking compared to private alleles (black and grey crosses)

10.3.4 Seed Banks Increase the Rate of Recombination

Seed banking decreases the recombination rate per year because only seeds germinating with probability b can undergo recombination upon gamete production. However, as the overall coalescent tree is longer by a factor 1∕b 2, the net effect of longer seed persistence is to increase the amount of recombination events per locus by a factor 1∕b (Fig. 10.3). We can thus predict that at equal census size SB species compared to non-SB species should exhibit (1) a higher genetic diversity (number of SNPs), and (2) a higher recombination rate per locus (Fig. 10.3). Using prior information on N, nucleotide diversity and recombination rate per gene, it should thus be possible to estimate b for any given species with persistent seed bank.

Fig. 10.3
figure 3

Effect of a persistent seed bank on the length of the coalescent tree, diversity and recombination rate per gene. The comparison is made for two species of equal census size: one without (b = 1, left panel) and one with seed banking (b = 0.5, right panel). The number of segregating sites S 3 is increased by the seed bank (grey crosses). The per gene scaled recombination (ρ) and mutation (θ) rates are exemplarily given. The branches in grey are generated by recombination events

10.4 Effect of Seed Banking on Selective Processes

Seed banks slow down natural selection, and decrease the Hill-Robertson effect. Persistent seed banking slows down the action of positive (and purifying) selection, so that it takes longer for a selected allele to get fixed in a population [23, 34, 49]. However, we have recently demonstrated that even though selection is slower under more persistent seed banks, the strength of selection is enhanced when observed at equilibrium (Fig. 10.4 for positive selection, [34]). This is explained by selection acting on every lineage germinating from the seed bank at rate b, whereas genetic drift acts on the time scale of b 2.

Fig. 10.4
figure 4

Effect of a persistent seed bank on signatures of selection. (left panel) An increase of the strength of positive selection (σ = 2) among independent SNPs is illustrated in terms of the normalized SFS, \(\hat {r}_{n,i}=f_{n,i}/S_n\); for species without (black) and with (white, b = 0.5) seed bank. (right panel) Persistent seed banks increase the strength of selection and recombination rates, so that narrower but deeper selective sweep signatures are expected around the target of selection in SB versus non-SB species

10.4.1 Selection on Unlinked Sites

As already mentioned above, it is more straightforward to model selection within the diffusion framework than via coalescent theory. For non-SB models, a partly numerical and partly explicit solution of Eq. (10.4) can be obtained by finding a spectral representation of f(y, t) for a constant population size [48] and even for piecewise population size changes [64]. Since the average germination rate b is included as a linear factor in Eq. (10.4), the underlying method can even be extended to SB models. For convenience, we will only present the equilibrium solution of the SFS for a constant population size, which anyway demonstrates the main affect of selection on seed banks.

The population and the sample SFS are, respectively, given via Eqs. (10.8) and (10.9) as

$$\displaystyle \begin{aligned} \hat{f}(y)=\frac{\theta}{b^2 y(1-y)}\,\frac{1-\exp(-2\sigma (1-y)/b)}{1-\exp(-2\sigma/b)} \end{aligned}$$

and

$$\displaystyle \begin{aligned} \hat{f}_{n,i}= \frac{\theta n}{b^2 i(n-i)}\frac{1-{}_1F_1(i;n;2\sigma/b)e^{-2\sigma/b}}{1-e^{-2\sigma/b}}. \end{aligned}$$

For the mean time to absorption and the time to fixation one can only obtain onerous equations [34], which as well as the results for the SFS show that seed banks enhance the effect of selection (Fig. 10.4, left panel) while the considered allele takes longer to reach that equilibrium state.

10.4.2 Selection on Linked Sites

Here we propose four novel theoretical predictions which have not yet been tested with experimental data.

  1. 1.

    As selection is strong and recombination rates are higher in SB species, classic selective sweeps[8, 29] will exhibit a strong but reduced extent hitchhiking signature [41] around the target site (Fig. 10.4, right panel).

  2. 2.

    As seed banking generates higher nucleotide diversity and slows down positive selection, it is also expected that positive selection acts on standing genetic variation and incomplete sweeps (in which the selected allele has not reached fixation) should be observed (the so-called soft sweeps, [24, 57]).

  3. 3.

    Seed banks also favour the maintenance of polymorphism and enhance the strength of balancing selection [49, 51, 53], which should thus be more observable in SB species.

  4. 4.

    The Hill-Robertson effect [25] should be reduced in outcrossing SB compared to outcrossing non-SB species. Purifying selection is also predicted to be efficient and narrow around the target sites (decreasing the extent of background selection, [5, 6]) in SB species (Fig. 10.5). This yields from the enhanced recombination rate compared to the coalescent rate and the increased strength of selection under seed banking.

    Fig. 10.5
    figure 5

    Effect of a persistent seed bank on increasing pervasive purifying selection and decreasing the effect of background selection in the genome when comparing SB (black line) and non-SB (grey line) species. Genes with similar recombination rates are binned (see [4]). Nucleotide diversity at synonymous sites increases faster for increasing recombination rates in SB than non-SB (left panel), diversity at non-synonymous sites is driven by more efficient selection in SB versus non-SB (middle panel) predicting the ratio Π AΠ S to differ across recombination rates between SB and non-SB species (right panel)

Positive and purifying selection are thus predicted to be more efficient and show a reduced LD signature in SB than in non-SB species. In SB species, positive selection should be pervasive across the genome, signatures of classic selective sweeps would be narrow around the target site (Fig. 10.4), and the extent of purifying linked (background) selection should be limited (Fig. 10.5). More selective sweeps are also expected to arise from standing genetic variation and/or be incomplete in SB compared to non-SB species.

10.5 Conclusions

We have built here a model of a so-called weak seed bank for plants species, but also applicable to fungi or invertebrate species producing dormant resting stages. An assumption that we have used in both of our models is namely that the maximum time seeds can spend in the bank, m, is bounded and small compared to the time of coalescence of lineages. This condition ensures that our coalescent model converges to the n-Kingman coalescent [28], and that the separation of time scale and the Markovian property can be used to analyse the diffusion model. Seed banking is thus a unique life-history trait which links ecology and genetic diversity defining the Lewontin paradox in angiosperms. Following the hypotheses and predictions above (summarized in Table 10.1), we predict that SB species exhibit smaller census sizes, i.e. species abundance, but much higher genetic diversity than non-SB species. This could be termed as an inverse Lewontin paradox for plant species when comparing SB and non-SB species, which generates the opposite pattern of diversity versus census size than found in animals. As a corollary, we predict that selfing non-SB species exhibit large census sizes but show the lowest genetic diversity compared to SB species.

Table 10.1 Summary of hypotheses and predictions

An alternative model has been proposed, the so-called strong seed bank, where the time m can be infinite or at least very large compared to coalescent times [3]. These models produce very different predictions regarding the SFS and the effect of recombination, because they do not consist in a rescaled version of the Kingman n-coalescent. Strong seed banks can generate so-called multiple merger coalescents [3, 50] with signatures differing from our classic results for neutral sites. It is suggested that these models may be better adapted to bacteria species with resting stages that can survive for many years (especially for soil bacteria [37]), which is a much larger amount of time compared to their generation time [18]. As more and more full genome intra-species polymorphism data are becoming available in bacteria, fungi, invertebrates (Daphnia, Mosquito) and plants, it is a fascinating prospect to test the derived predictions and to assess which of the weak or strong seed bank models applies to a given species.