Introduction

A fundamental property affecting the fate of any population is the effective population size (N e ). N e governs the strength of genetic drift and the rate of inbreeding, and determines a population's sensitivity to natural selection (Lande 1988; Wang and Caballero 1999). Because of its key role in population processes, estimating N e is important not only to theoreticians and evolutionary biologists but also to conservation biologists and natural resource managers seeking to monitor and forecast population viability. There is a wide variety of N e estimators available, which until recently required populations to be sampled repeatedly over time (Fisher 1930; Wright 1931; Ewens 1979; Nei and Tajima 1981; Waples 1989; Beerli and Felsenstein 2001; Wang 2001; Kuhner 2006). New methods, referred to as 'single-sample' N e estimators, relax the requirement for temporal sampling. They are therefore of particular interest in conservation biology for monitoring populations and assessing population viability in real time, rather than waiting several generations to produce a single estimate of N e (Waples and Do 2010). Collecting temporally spaced samples from endangered species is particularly challenging because of the inherent scarcity of individuals and the typically long generation times of vulnerable species. Additionally, threats to species survival often require immediate action. Thus, single-sample N e estimates can help population managers make informed and timely recommendations.

Whilst single-sample N e estimation holds great promise for field applications, there is a pressing need to rigorously evaluate the robustness of these methods in non-ideal field scenarios. Many field studies have compared the performance of single-sample N e estimators and have shown that their relative performance is highly situational. Several case studies have found that onesamp and L d N e produce congruent N e estimates (Hoehn et al. 2012; Jansson et al. 2012; Skrbinsek et al. 2012). Other studies provide contrasting evidence that onesamp estimates are more precise than those of L d N e (Beebee 2009; Barker 2011; Phillipsen et al. 2011; Gomez-Uchida et al. 2013). Another case study criticised the accuracy of onesamp because its N e estimates were highly correlated with sample size (Johnstone et al. 2013). This variable performance in wild populations is not unexpected, because simulations have shown that at least one single-sample N e estimator, L d N e (Waples and Do 2008), is very sensitive to declines in population size (Antao et al. 2011), persistent population fragmentation (England et al. 2010) and dispersal (Waples and England 2011). Without prior knowledge of the true N e and of potentially confounding population processes, such as the pattern and rate of dispersal, it is not possible to gain an accurate understanding of the performance of single-sample N e estimators (Chikhi et al. 2010).

In this study we investigated the relative performance of single-sample N e estimators in genetically structured populations, using simulations and replicated populations of Drosophila melanogaster with controlled dispersal that fulfil the assumptions of the Wright–Fisher model. Our controlled Drosophila experiment allows us to predict the expected Total N e and acts as an intermediate scenario between wild populations, whose population parameters are unknown, and simulated populations, which may lack full biological realism but conform to most of the assumptions of the analytical models used to develop N e estimators. Using real organisms in controlled, replicated experiments is an important next step after analytical methods have been evaluated by computer simulation (England et al. 2010; Antao et al. 2011; Waples and England 2011). To our knowledge, no studies of N e estimation methods have been conducted using replicated controlled populations of live organisms.

Here we evaluate whether single-sample N e estimates in real and simulated populations are consistent with the values predicted by the Wright–Fisher model when the experimental populations have been maintained to closely reflect ‘ideal’ Wright–Fisher conditions. We also evaluate whether populations with different rates of dispersal (and thus different levels of population structure) experience altered effects of genetic drift and result in different estimates of effective size. We restrict our evaluation to two single-sample estimators: onesamp (Tallmon et al. 2008) and L d N e (Waples and Do 2008) and apply two statistical approaches to estimate the single-sample effective population size. Our work depicts what may be expected in a study of wild populations when the sampling design is limited and analyses are conducted with incomplete knowledge of the underlying population structure.

Materials and methods

Construction of replicated, genetically structured populations

The source population of D. melanogaster was a large wild population collected from Tyrrell's Winery, Hunter Valley, New South Wales (Australia) in April 2000 (Gunn 2003). Wild-caught individuals were used to establish four laboratory populations, each founded by 100 males and 100 non-virgin females. The four laboratory populations are referred to as lines 3, 4, 17 and 21. All lines were maintained on an instant potato-sugar artificial insect food medium (Holleley et al. 2008).

Each population (line pair) consisted of two subpopulations (s = 2) connected by low levels of symmetrical dispersal (Fig. S1 of supplementary material). All subpopulations had a census size (N) of 50 individuals and an equal sex ratio. Each line pair was initialized at generation zero under one of two contrasting scenarios: from lineages that had previously been isolated for approximately 60 generations and thus showed a variable degree of initial differentiation ('Isolation' scenario, pairs 17_21 and 3_4), or from lineages that were split to form the pair immediately prior to generation zero and thus showed very low initial differentiation ('Split' scenario, pairs 3_3 and 17_17). Each pair was replicated three times. The size and structure of the populations were constant throughout the experiment, and generations were discrete and non-overlapping. Reciprocal exchange of individuals between the two subpopulations was conducted at three fixed dispersal rates: low, m = 0.0025 (1 fly exchanged per 8 generations); moderate, m = 0.01 (1 fly exchanged per 2 generations); and high, m = 0.04 (2 flies exchanged per generation). See Fig. S1 for full details of the Drosophila dispersal regimes. Dispersal was continued for 34, 26 and 12 generations, respectively (called T2 in Holleley et al. 2011), defined as twice the number of generations expected to reach 50 % of the drift–dispersal equilibrium value of the fixation index (F ST ) (Whitlock 1992). Microsatellite and SNP analyses showed that this was adequate time for the two starting scenarios to converge on a common mean trajectory (Dewar et al. 2011; Holleley et al. 2011), and simulations confirmed that this design provides sufficient time for populations to attain drift–dispersal equilibrium (Maio 2008; data not shown). As described fully in Holleley et al. (2011), there was no evidence that dispersing individuals differed in reproductive fitness from resident individuals in the populations comprising this study.

At the conclusion of the experiment we sampled 24 individuals from each subpopulation (thus 48 individuals from the total population). Sample sizes of this order of magnitude are routinely used in studies of wild populations. DNA was extracted from each D. melanogaster individual using a Gentra Puregene DNA extraction kit (Progenz Ltd, Australia) modified for high-throughput processing (Holleley 2007). We then genotyped the sampled individuals at seven autosomal microsatellite loci (Msat 2, Msat 3, Msat 6, Msat 7, Msat 8, Msat 9, Msat 11) using multiplex PCR and a step-down thermal cycling protocol (Holleley and Sherwin 2007; Holleley and Geerts 2009). DNA fragments were sized on an Applied Biosystems 48-capillary 3730 DNA Analyser and scored using GENEMAPPER® Version 3.7 (Applied Biosystems 2004).

Expectations under the Wright–Fisher model

Throughout this manuscript, N e refers to the effective size of an idealised, closed Wright–Fisher population (Wright 1931). The notation Total N e refers to the effective size of a genetically structured population consisting of s subpopulations that are open to dispersal, whereas Local N e refers to the effective size of each subpopulation within the total structured population.

In our experiment all structured populations adhered to the Wright–Fisher model, which allowed us to calculate the expected effective size of the structured population from our experimental population parameters. Specifically, the eigenvalue effective population size (N e ) (Ewens 1979) of isolated populations can be calculated from the change in expected heterozygosity over time (ΔH e ), following Eq. 1.

$$\Delta H_{e} = \frac{H_{t}}{H_{0}} = \left(1 - \frac{1}{2N_{e}}\right)^{t}$$
(1)

where H t /H 0 is the proportion of the original expected heterozygosity (H 0) remaining in a population after t generations (Falconer and Mackay 1996). For closed D. melanogaster populations from the same source, maintained under the same physical conditions and transfer protocols, Gilligan (2001, 2005) used the decay in heterozygosity over time (ΔH e ) to estimate the ratio of eigenvalue effective population size (Ewens 1979) to census population size as N e : N = 0.286 (i.e. N e = 0.286 × 50 = 14.3 for N = 50). This estimate was verified independently by Gunn (2003) using closed D. melanogaster populations collected from the same wild Tyrrell's source population, and we have used the same stock lines for this investigation.
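The inversion of Eq. 1 that underlies these closed-population estimates can be sketched in a few lines of Python. This is a minimal illustration under the idealised Wright–Fisher assumptions of Eq. 1, not the authors' analysis code; the function names are hypothetical, and the 35-generation horizon simply mirrors the length of the closed-population experiment described in the Results.

```python
def heterozygosity_ratio(ne: float, t: int) -> float:
    """Eq. 1: expected H_t / H_0 after t generations in a closed population of size Ne."""
    return (1.0 - 1.0 / (2.0 * ne)) ** t


def ne_from_heterozygosity_ratio(ratio: float, t: int) -> float:
    """Invert Eq. 1: the eigenvalue Ne implied by an observed H_t / H_0 after t generations."""
    if not 0.0 < ratio < 1.0 or t <= 0:
        raise ValueError("need 0 < H_t/H_0 < 1 and t > 0")
    per_generation = ratio ** (1.0 / t)  # equals 1 - 1/(2 Ne)
    return 1.0 / (2.0 * (1.0 - per_generation))


# Ne:N = 0.286 with a census size of N = 50 gives the closed-population Ne.
ne_closed = 0.286 * 50                           # 14.3
remaining = heterozygosity_ratio(ne_closed, 35)  # about 0.288 of H_0 left
print(round(ne_closed, 1), round(remaining, 3),
      round(ne_from_heterozygosity_ratio(remaining, 35), 2))
```

Read in reverse, a population that retains about 29 % of its initial expected heterozygosity after 35 generations is consistent with N e ≈ 14.3.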

The estimation of N e via the decay of heterozygosity assumes that populations are closed to dispersal; thus, for Eq. 1 to hold in our structured populations, we must adjust the relationship to account for dispersal:

$$\Delta H_{e} = \frac{H_{t}}{H_{0}} = \left(1 - \frac{1}{2\,{}^{Total}N_{e}}\right)^{t}$$
(2)

where Total N e is the effective size of the structured population, defined in Eq. 3 in terms of the number of subpopulations (s), each with an idealised effective population size N e and each receiving a proportion m of dispersing individuals per generation (Wright 1943; Wang and Caballero 1999).

$${}^{Total}N_{e} = sN_{e}\left(1 + \frac{(s - 1)^{2}}{4N_{e}ms^{2}}\right)$$
(3)

Eqs. 2 and 3 assume that the size and structure of the population are constant, that there is no local extinction of subpopulations, and that drift–dispersal equilibrium has been attained. Our experiment meets these assumptions, so we can calculate the expected Total N e (Table 1). Thus, under the null hypothesis (H null ), the initial heterozygosity (H 0) is expected to decline at the rate ΔH e over the defined number of generations (t), such that

$$H_{null}: H_{0} \times \Delta H_{e} = H_{t}.$$
(4)
Table 1 Experimentally controlled population parameters, the Wright–Fisher expected effective population size for genetically structured populations and the expected change in expected heterozygosity for genetically structured populations
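The expected values summarised in Table 1 follow directly from Eqs. 2 and 3. The Python sketch below evaluates them for s = 2 and a closed-subpopulation N e of 14.3; the function names are hypothetical. Treating t as the number of generations of dispersal given in the Methods is our assumption; the t used for Table 1 may differ, so the ΔH e values printed here are illustrative only.

```python
def expected_total_ne(ne_local: float, s: int, m: float) -> float:
    """Eq. 3: expected Total Ne of s equal subpopulations, each receiving a proportion m of migrants."""
    return s * ne_local * (1.0 + (s - 1) ** 2 / (4.0 * ne_local * m * s ** 2))


def expected_delta_he(total_ne: float, t: int) -> float:
    """Eq. 2: expected proportion of H_0 remaining after t generations."""
    return (1.0 - 1.0 / (2.0 * total_ne)) ** t


NE_LOCAL = 14.3  # eigenvalue Ne of a closed subpopulation with N = 50
S = 2            # two subpopulations per line pair
# dispersal rate and (assumed) t = generations of dispersal from the Methods
SCENARIOS = {"high": (0.04, 12), "moderate": (0.01, 26), "low": (0.0025, 34)}

for label, (m, t) in SCENARIOS.items():
    total = expected_total_ne(NE_LOCAL, S, m)
    print(f"{label:>8}: m = {m:<7} E(Total Ne) = {total:6.2f}  "
          f"dH_e over {t} generations = {expected_delta_he(total, t):.3f}")
```

With N e = 14.3 this gives E(Total N e ) of about 31.7, 41.1 and 78.6 for the high, moderate and low dispersal rates, respectively.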

We tested this null hypothesis using a two-sample t test with an expectation of a systematic difference (D). In this paper D = ΔH e . The test statistic (t stat ) was calculated as:

$$t_{stat} = \frac{\left|D H_{0} - H_{t}\right|}{Z}$$
(5)

where Z is:

$$Z = \sqrt{\left(\frac{1}{n_{0}} + \frac{1}{n_{t}}\right)\frac{(n_{0} - 1)D^{2}S_{0}^{2} + (n_{t} - 1)S_{t}^{2}}{n_{0} + n_{t} - 2}}$$
(6)

Here n 0 and n t are the sample sizes used to calculate the mean values of H 0 and H t , respectively; in this paper both equal the number of microsatellite loci. \(S_{0}^{2}\) and \(S_{t}^{2}\) are the variances of the per-locus values of H 0 and H t . The degrees of freedom of the FDR-corrected two-tailed t test are calculated as

$$d.f. = \left( {n_{0} + n_{t} - 2} \right) .$$
(7)
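Eqs. 5–7 can be sketched in Python as follows, taking one expected-heterozygosity value per locus at each time point. The function name is hypothetical. The sketch uses the standard pooled-variance form, with the divisor n 0 + n t − 2 inside the square root, and stops at the test statistic.

```python
import math
import statistics


def scaled_two_sample_t(h0_loci, ht_loci, d):
    """Two-sample t statistic for H_null: D * H_0 = H_t (Eqs. 5-7).

    h0_loci, ht_loci: per-locus expected heterozygosities (one value per locus).
    d: expected multiplicative change, D = Delta H_e from Eq. 2.
    Returns (t_stat, degrees_of_freedom).
    """
    n0, nt = len(h0_loci), len(ht_loci)
    mean0, meant = statistics.mean(h0_loci), statistics.mean(ht_loci)
    var0, vart = statistics.variance(h0_loci), statistics.variance(ht_loci)
    df = n0 + nt - 2
    # Pooled variance; scaling H_0 by D scales its variance by D**2.
    pooled = ((n0 - 1) * d ** 2 * var0 + (nt - 1) * vart) / df
    z = math.sqrt((1.0 / n0 + 1.0 / nt) * pooled)
    t_stat = abs(d * mean0 - meant) / z
    return t_stat, df


# Seven loci whose heterozygosity fell exactly as predicted (D = 0.5) give t = 0.
h0 = [0.70, 0.80, 0.75, 0.65, 0.72, 0.78, 0.70]
ht = [0.5 * h for h in h0]
print(scaled_two_sample_t(h0, ht, 0.5))
```

A two-tailed P value could then be obtained from the t distribution, for example as `2 * scipy.stats.t.sf(t_stat, df)`, before correcting across populations; the specific FDR procedure (for example Benjamini–Hochberg) would be an assumption.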

Expectations from simulated populations

The Wright–Fisher model predicts an effective size for structured populations (E(Total N e )) (Table 1); however, other factors, such as non-neutral molecular evolution of markers, incomplete sampling of populations or the initial levels of allelic diversity in the real populations, may cause deviations from this mathematical expectation. To investigate this possibility, we developed an individual-based model in R (www.r-project.org) that simulated the sampling conditions of our Drosophila experiment (source code available upon request).

At the start of each simulation, the individuals forming the initial generation were created by assigning each a sex and then generating its genotype by randomly drawing a pair of alleles at each locus from the initial allele frequencies of the founding lines (Drosophila lines 3, 4, 17, 21) (Holleley and Sherwin 2007). For subsequent generations, 50 offspring were created in each subpopulation (100 offspring in total) by randomly choosing a male and a female from the preceding generation of the same subpopulation to be the parents of each offspring. The offspring's genotype was determined by randomly selecting, with equal probability, one allele from each parent at each locus. Offspring sex was assigned randomly, subject to each subpopulation having an equal sex ratio. Dispersal events were conducted at the same rates and after the same numbers of generations as in the experimental Drosophila populations (Fig. S1). In the simulations, we increased the number of replicates from n = 3 in the Drosophila populations to n = 100 for each scenario in order to more fully account for the variability of Total N e estimates. This model closely approximates the Drosophila experiment: N is known and controlled, but N e is not controlled and varies stochastically because not all individuals necessarily contribute to the next generation. Similarly, m is known and controlled, but effective dispersal is not controlled, because migrants do not necessarily contribute offspring to the next generation.
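The individual-based model was written in R and its source is not reproduced here. The following Python sketch shows one way to implement the same ingredients: founders drawn from allele frequencies, random mating within subpopulations of 50 with a 1:1 sex ratio, Mendelian transmission and reciprocal dispersal. Everything specific is an assumption: the allele frequencies are hypothetical placeholders rather than the line data, migrants are swapped sex-for-sex to keep the sex ratio equal, and dispersal is modelled as a swap of `n_migrants` flies every `interval` generations after reproduction, which is our reading of the Methods rather than the Fig. S1 protocol itself.

```python
import random
from collections import Counter

N_SUB = 50  # census size of each subpopulation, 25 females and 25 males


def make_founders(allele_freqs, n, rng):
    """Founders with a 1:1 sex ratio; two alleles per locus drawn from allele_freqs."""
    founders = []
    for i in range(n):
        genotype = []
        for freqs in allele_freqs:  # one {allele: frequency} dict per locus
            alleles, weights = zip(*freqs.items())
            genotype.append(tuple(rng.choices(alleles, weights=weights, k=2)))
        founders.append(("F" if i < n // 2 else "M", genotype))
    return founders


def next_generation(parents, rng):
    """Random mating within a subpopulation; one random allele from each parent per locus."""
    mothers = [p for p in parents if p[0] == "F"]
    fathers = [p for p in parents if p[0] == "M"]
    n = len(parents)
    offspring = []
    for i in range(n):
        mum, dad = rng.choice(mothers), rng.choice(fathers)
        genotype = [(rng.choice(a), rng.choice(b)) for a, b in zip(mum[1], dad[1])]
        # Sexes fixed at 1:1; parents are random, so assignment order does not bias transmission.
        offspring.append(("F" if i < n // 2 else "M", genotype))
    return offspring


def disperse(deme_a, deme_b, n_migrants, rng):
    """Reciprocal exchange: n_migrants flies move each way, swapped sex-for-sex."""
    moved_in = set()  # positions in deme_b already holding an immigrant
    for ia in rng.sample(range(len(deme_a)), n_migrants):
        sex = deme_a[ia][0]
        candidates = [i for i, p in enumerate(deme_b) if p[0] == sex and i not in moved_in]
        ib = rng.choice(candidates)
        moved_in.add(ib)
        deme_a[ia], deme_b[ib] = deme_b[ib], deme_a[ia]


def simulate(allele_freqs, generations, interval, n_migrants, seed=1):
    """Two subpopulations of N_SUB; dispersal after reproduction every `interval` generations."""
    rng = random.Random(seed)
    demes = [make_founders(allele_freqs, N_SUB, rng) for _ in range(2)]
    for gen in range(1, generations + 1):
        demes = [next_generation(deme, rng) for deme in demes]
        if gen % interval == 0:
            disperse(demes[0], demes[1], n_migrants, rng)
    return demes


def expected_heterozygosity(individuals, locus):
    """Expected heterozygosity, 1 - sum(p_i^2), at one locus."""
    counts = Counter(a for _, g in individuals for a in g[locus])
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Hypothetical starting frequencies at 7 loci (placeholders, not the line data).
freqs = [{"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}] * 7
# High dispersal (m = 0.04): 2 flies each way every generation for 12 generations.
demes = simulate(freqs, generations=12, interval=1, n_migrants=2, seed=42)
sampler = random.Random(7)
sampled = [ind for deme in demes for ind in sampler.sample(deme, 24)]  # 24 + 24 = 48
print([round(expected_heterozygosity(sampled, k), 3) for k in range(7)])
```

Under this reading, the low, moderate and high regimes correspond to (interval, n_migrants) of (8, 1), (2, 1) and (1, 2); the 48 sampled genotypes would then be passed to the N e estimators.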

At the conclusion of the computer simulations, 24 individuals were sampled from each subpopulation (48 for the total population), and the sampled genotypes from all simulated populations were used to estimate Total N e in the programs L d N e and onesamp, using the two statistical approaches described below. These simulations are designed to return the expected output of the N e estimators under neutrality for our experimental conditions, which we calculated as the median Total N e of 100 replicates.

Single-sample methods to estimate N e

In this paper we evaluate two single-sample methods to estimate N e (L d N e and onesamp), both of which use linkage disequilibrium among unlinked loci to assess the strength of genetic drift in populations. The basic premise of both methods is that small population size leads to non-random associations of alleles among unlinked loci; thus, the higher the level of linkage disequilibrium, the smaller the effective population size (and vice versa). Although L d N e and onesamp are based on the same genetic signal, they implement different statistical approaches. L d N e (Waples and Do 2008) estimates effective population size using Burrows' Δ, a linkage disequilibrium measure, with a bias correction for sample size (Weir 1979; Waples 2006). For all L d N e estimates in this manuscript, we assumed a random mating model. onesamp is an approximate Bayesian computation method that uses eight summary statistics and user-defined priors to estimate N e (Tallmon et al. 2008). Both programs assume that genotypic data are obtained from closed populations with discrete generations, using genetic markers that are unlinked and selectively neutral. Following convention in this field, we assume for statistical purposes that microsatellite markers are selectively neutral, although this assumption is discussed later. The experiment intentionally violates the assumption of unstructured populations.
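To make the sample-size bias correction concrete, here is a Python sketch of the r²-based estimator described by Waples (2006), on which L d N e builds. It is a simplification, not the program itself: it takes a single mean r² over pairs of unlinked loci and does not reproduce L d N e's weighting of allele pairs, rare-allele exclusion or confidence intervals. The constants are as we recall them from Waples (2006) and Waples and Do (2008) and should be checked against those sources before use; the function names are hypothetical.

```python
import math


def expected_r2_sample(s: float) -> float:
    """Expected r^2 from sampling S individuals alone (constants as recalled from Waples 2006)."""
    if s >= 30:
        return 1.0 / s + 3.19 / s ** 2
    return 0.0018 + 0.907 / s + 4.44 / s ** 2


def ld_ne(r2: float, s: float) -> float:
    """Bias-corrected LD estimate of Ne from the mean r^2 over pairs of unlinked loci.

    Returns a negative value when r2 is below the sampling expectation, inf when
    the two are equal, and nan when the quadratic has no real root.
    """
    r2_drift = r2 - expected_r2_sample(s)  # r^2': the part attributed to drift
    if r2_drift == 0.0:
        return math.inf
    a, b = (1.0 / 3.0, 2.76) if s >= 30 else (0.308, 2.08)
    disc = a * a - b * r2_drift
    if disc < 0.0:
        return math.nan
    return (a + math.sqrt(disc)) / (2.0 * r2_drift)


for r2 in (0.030, 0.020):
    print(f"S = 48, r^2 = {r2}: Ne = {ld_ne(r2, 48):.1f}")
print(f"sampling r^2 at S = 24: {expected_r2_sample(24):.4f}, "
      f"at S = 48: {expected_r2_sample(48):.4f}")
```

With S = 48, an r² of 0.030 gives an estimate of about 41, whereas an r² of 0.020 lies below the sampling expectation and gives a negative value. The sampling term for S = 24 is roughly twice that for S = 48, so the same drift signal is harder to detect in a single subpopulation sample.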

Statistical approach 1: Non-hierarchical estimation of Total N e disregarding genetic population structure

Approach 1 represents a scenario that may occur in many field studies where there is no prior information about genetic population structure or the pattern of dispersal. In this case, N e estimation is (incorrectly) applied to a genetically structured population as if it were a single panmictic population. Here we estimated Total N e in the programs L d N e and onesamp using this non-hierarchical approach. Subpopulations were equally sampled (24 individuals from each subpopulation of 50), but the data were analysed as if the sample came from a single panmictic population (48 individuals from a population of 100). Our onesamp priors specified that all the microsatellites had dinucleotide repeat motifs and that the lower and upper bounds for N e were 2 and 200, respectively. These priors were appropriate, as 2 is the lowest bound that onesamp accepts, and Tallmon et al. (2008) state that a conservative upper bound of N e for this method is twice the census size. onesamp does not accept monomorphic loci, so any loci that became fixed through the loss of alleles by genetic drift were excluded on a case-by-case basis.

Statistical approach 2: hierarchical estimation of Total N e accounting for genetic population structure

The hierarchical approach for estimating the effective size differs from approach 1 by employing knowledge about the structure of the population and patterns of dispersal in our experiment. The hierarchical approach first uses L d N e or onesamp to estimate the Local N e of each subpopulation. onesamp priors were the same as specified for approach 1 except that the upper bound for N e was lowered to 100, twice the census size of the subpopulations. To estimate Total N e , we summed the two subpopulation Local N e estimates and adjusted for the level of population structure (F ST ), following Eq. 8 (Wright 1943).

$${}^{Total}N_{e} = \frac{\sum_{i} {}^{Local}N_{e,i}}{1 - F_{ST}}$$
(8)

We estimated F ST in the program Genepop (Raymond and Rousset 1995), which implements Weir and Cockerham's (1984) estimator. The usual assumptions of F ST estimation apply to the hierarchical approach for estimating Total N e , including equal population sizes, equal and symmetrical dispersal, and equilibrium between drift and dispersal.
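A short Python sketch of Eq. 8 follows. As a consistency check it uses the equilibrium F ST implied by Eq. 3 under the island model, F ST = (s − 1)²/((s − 1)² + 4N e ms²), in place of a Weir and Cockerham estimate from data; with that F ST and a Local N e of 14.3 in each subpopulation, Eq. 8 returns exactly the Wright–Fisher E(Total N e ) of Eq. 3. The function names are hypothetical.

```python
def total_ne_hierarchical(local_ne, fst):
    """Eq. 8: sum of Local Ne estimates, inflated by 1 / (1 - FST)."""
    if not 0.0 <= fst < 1.0:
        raise ValueError("FST must lie in [0, 1)")
    return sum(local_ne) / (1.0 - fst)


def island_model_fst(ne_local, s, m):
    """Equilibrium FST implied by Eq. 3 (finite island model with s subpopulations)."""
    return (s - 1) ** 2 / ((s - 1) ** 2 + 4.0 * ne_local * m * s ** 2)


for m in (0.04, 0.01, 0.0025):
    fst = island_model_fst(14.3, 2, m)
    print(f"m = {m}: FST = {fst:.3f}, Total Ne = {total_ne_hierarchical([14.3, 14.3], fst):.2f}")
```

With real data the inputs are noisy: a negative or infinite Local N e from L d N e passes straight into the sum, which may help explain the extreme hierarchical L d N e values reported in the Results.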

Estimating systematic and stochastic deviations

We compared the Total N e estimates calculated from the empirical Drosophila dataset and the simulated dataset to two expected values: E(Total N e ) under the Wright–Fisher model (Table 1), and the simulation E(Total N e ) calculated as the median of 100 population replicates (Fig. 1). This comparison was made for both the non-hierarchical and hierarchical approach. We calculated the systematic deviation of mean Total N e estimates (Drosophila dataset and simulated dataset) for each dispersal rate from their expected values (Wright–Fisher or simulation medians) using an equation for bias (Eq. 9). This four-way comparison is presented in Table 2.

$$Bias = \frac{\overline{{}^{Total}N_{e}} - E\left({}^{Total}N_{e}\right)}{E\left({}^{Total}N_{e}\right)}$$
(9)
Fig. 1

Total effective population size of simulated genetically structured populations (grey box plots) (n = 100 replicates for each dispersal scenario) and empirical Drosophila populations (black circles) (n = 3 replicates for each dispersal scenario) using two single-sample N e estimators: L d N e (a, b) and onesamp (c, d). For each of the two programs, effective size was calculated using the non-hierarchical (a, c) and hierarchical statistical approaches (b, d). Dispersal and sampling conditions are as described for the empirical Drosophila experiment (Fig. S1). Simulations were conducted using an individual-based model developed in R. The box plots show the distribution of Total N e estimates for each simulated scenario with the box showing the 25th and 75th percentiles, the solid line in the middle of each box showing the median, and error bars showing the 10th and 90th percentiles. Solid black circles show the empirical estimates of Total N e from the Drosophila experiments. Dashed horizontal lines show the effective population size expected under the Wright–Fisher model (E(Total N e )) for each of the three dispersal rates. Dispersal rates are expressed as the proportion of the subpopulation exchanged per generation: High m = 0.04; Mod m = 0.01; Low m = 0.0025

Table 2 The systematic and stochastic deviations of two single-sample N e estimators (L d N e and onesamp) of two data sets (empirical Drosophila populations and simulated populations) compared to the Wright–Fisher E(Total N e ) and the simulation predicted E(Total N e )

Stochastic deviations were expressed as the coefficient of variation (CV) of estimates over replicates (Table 2). To summarise the combined effects of systematic and stochastic deviations, we used the equation for root mean square error (RMSE) (Table 2), although we acknowledge that our use of the name RMSE may depart from the engineering practice in which it originated. Lastly, because the Wright–Fisher model predicts that Total N e decreases as dispersal increases (Eq. 3), we used Pearson's correlation coefficient (r) to determine whether empirical and simulated estimates of Total N e were correlated with dispersal rate. We also tested whether the level of genetic population structure (F ST ) was correlated with dispersal rate.
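The summary statistics in Table 2 and the correlation tests can be sketched in Python as below. Bias follows Eq. 9, and CV uses the sample standard deviation across replicates. Because the exact RMSE expression is not given in the text, the relative root mean square error used here (the square root of the mean squared relative deviation from the expected value) is an assumption. All function names are hypothetical.

```python
import math
import statistics


def relative_bias(estimates, expected):
    """Eq. 9: (mean estimate - expected) / expected."""
    return (statistics.mean(estimates) - expected) / expected


def coefficient_of_variation(estimates):
    """Stochastic deviation: sample standard deviation / mean, over replicates."""
    return statistics.stdev(estimates) / statistics.mean(estimates)


def relative_rmse(estimates, expected):
    """Assumed definition: root mean square of (estimate - expected) / expected."""
    return math.sqrt(statistics.mean(((x - expected) / expected) ** 2 for x in estimates))


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


replicates = [30.0, 32.0, 34.0]  # hypothetical Total Ne estimates for one scenario
print(relative_bias(replicates, 32.0), coefficient_of_variation(replicates),
      relative_rmse(replicates, 32.0))
```

Because the Eq. 3 expectations fall as m rises, `pearson_r([0.0025, 0.01, 0.04], [78.6, 41.1, 31.725])` is negative; the same sign is what the Wright–Fisher model predicts for estimated Total N e .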

Results

Validation of Wright’s expected Total N e

To make predictions about the expected effective population size in structured populations with ongoing dispersal, we required an estimate of N e in closed single populations under the same environmental conditions and population density as our experiments. This work was previously conducted by Gunn (2003) in a 35-generation experiment, comprising ten closed populations with a controlled census population size of 50 non-virgin individuals. Gunn (2003) genotyped the following autosomal microsatellites: DmAC1, DmAC3, DmAC8 and DmAC9 (England et al. 1996).

Using the rate of decay in heterozygosity over time (ΔH e ), Gunn (2003) demonstrated that closed D. melanogaster populations (N = 50) from our stock lines are consistent with an eigenvalue N e of 14.3 (95 % confidence interval = 8.74–19.8). We did not repeat the closed-population experiments; however, we did confirm that the observed change in expected heterozygosity in all 36 of our independent experimental populations was not significantly different from the value expected under the Wright–Fisher model for structured populations with the same subpopulation N e of 14.3 (see supplementary material: Table S2).

Results of population simulations

We used simulated populations to predict the behaviour of N e estimators in conditions closely approximating our Drosophila experiment. Figure 1 shows the distribution of onesamp and L d N e estimates for 100 replicates using both the non-hierarchical and hierarchical statistical approaches. The median Total N e of these 100 replicates was used as the simulation E(Total N e ) in further analysis (Table 2).

The population simulations did not predict the same values of Total N e as the Wright–Fisher model (Fig. 1). For the non-hierarchical approach, the simulated distribution of L d N e estimates overlapped with Wright’s E(Total N e ) for high and moderate dispersal rates but not for the low dispersal rate (Fig. 1a). The simulated distribution of non-hierarchical onesamp estimates was lower than and did not overlap with Wright’s E(Total N e ) for all dispersal rates (Fig. 1c). In comparison, the hierarchical approach showed different trends. Specifically, the simulated distribution of hierarchical L d N e estimates overlapped with Wright’s E(Total N e ) for all dispersal rates (Fig. 1b). In contrast, the simulated distribution of hierarchical onesamp estimates overlapped with Wright’s E(Total N e ) for high and moderate dispersal rates but not for the low dispersal rate, where simulated Total N e estimates were lower than Wright’s E(Total N e ) (Fig. 1d).

For the simulated populations there was a significant correlation between dispersal and genetic structure (F ST ) (Pearson's correlation coefficient r = −0.775; P < 0.0001) (Table 3). Using the non-hierarchical approach, Total N e was significantly correlated with dispersal, but there was no correlation with dispersal for the hierarchical approach (Table 3).

Table 3 The correlation of dispersal rate with single-sample methods to estimate total effective population size and the correlation of dispersal rate with estimates of population structure in empirical Drosophila populations and simulated populations

In the simulated populations, L d N e produced a much wider distribution of Total N e estimates than onesamp in all scenarios (Fig. 1). Related to this, a substantial proportion of the L d N e estimates were negative, particularly under the hierarchical approach (non-hierarchical, 2.8 %; hierarchical, 36.4 %). This resulted in very high CV and RMSE values when calculating the stochastic deviation of the simulated L d N e estimates (Table 2). Negative N e estimates are a known phenomenon with L d N e and occur when there is no detectable disequilibrium in the sampled individuals beyond that expected from sampling error (Waples and Do 2007). This outcome is strongly influenced by sample size, which determines the power to detect disequilibrium. Negative L d N e estimates were less common in the non-hierarchical approach, where n = 48 individuals, than in the hierarchical approach, where n = 24 individuals. As recommended by Waples and Do (2007), we did not bias the distribution of N e estimates by excluding negative values.

Estimation of Total N e via the non-hierarchical approach in empirical Drosophila populations

We estimated Total N e and the 95 % confidence interval for each of the three replicates of each structured Drosophila population using onesamp and L d N e (Fig. 2a–c). onesamp produced estimates of Total N e that were larger and closer to the Wright–Fisher expected value than those obtained from L d N e. The mean values estimated by L d N e ranged from 1.7 to 43.4. The mean values estimated by onesamp ranged from 10.9 to 34.2. The 95 % confidence interval of L d N e tended to be larger, and in one instance the upper bound exceeded the census population size (Fig. 2b). We calculated the systematic deviation (bias) using the E(Total N e ) (Wright–Fisher and simulation median), stochastic deviation (CV) and RMSE (Table 2). When comparing the Drosophila data to the Wright–Fisher expectation, Total N e estimates obtained from onesamp had a lower bias, lower CV and a lower RMSE than L d N e regardless of statistical approach (Table 2). When comparing the Drosophila data to the simulation median, non-hierarchical empirical estimates of Total N e were largely concordant with the distribution generated by the population simulations, with the exception of the high dispersal scenario, where empirical estimates were lower than the simulation (Fig. 1a, c; Table 2).

Fig. 2

Empirical estimates of total effective population size in controlled, replicated, genetically structured Drosophila melanogaster populations using two single-sample N e estimators: L d N e (circle) and onesamp (square). For each of the two programs, effective size was calculated using the non-hierarchical (a–c) and hierarchical (d–f) statistical approaches. Bars indicate the 95 % confidence interval of the mean. Upper confidence limits that exceeded the known census size of the total population were truncated, and the value is labelled at the top of the graph. The dashed horizontal line is the effective population size expected under the Wright–Fisher model (E(Total N e )) for each of the three dispersal rates. Grey shading indicates the 95 % confidence interval of the Wright–Fisher E(Total N e ).

The empirical data showed a trend contrary to the predictions of the Wright–Fisher model: neither onesamp nor L d N e displayed a significant correlation between Total N e and the dispersal rate between subpopulations (Table 3). However, there was a significant correlation between dispersal and genetic structure (F ST ) (Pearson's correlation coefficient r = −0.601; P = 0.0001) (Table 3).

Estimation of Total N e via the hierarchical approach in empirical Drosophila populations

In the hierarchical approach, we estimated the Local N e of each subpopulation and combined these estimates with the observed population genetic structure (both in Table S4) to estimate Total N e using onesamp and L d N e following Eq. 8 (Fig. 2d–f). The trends observed in the hierarchical approach were congruent with those of the non-hierarchical approach. Specifically, when the Drosophila data were compared with the Wright–Fisher expectation, onesamp produced mean estimates of Total N e that were larger and closer to the Wright–Fisher expected value, with smaller confidence intervals, than L d N e. Again, neither method displayed a correlation of Total N e with dispersal rate (Table 3). The mean Total N e values estimated by L d N e ranged from 5.1 to 273.2, and those estimated by onesamp ranged from 25.6 to 39.3. The 95 % confidence intervals of hierarchical L d N e estimates tended to be large, with the upper bound often including infinity; in 17 of 36 instances the upper bound exceeded the census population size (Fig. 2d–f). The hierarchical Total N e estimates obtained from onesamp had a lower bias, lower CV and lower RMSE than those from L d N e (Table 2). When the Drosophila data were compared with the simulation median, the hierarchical empirical estimates of Total N e were concordant with the distribution generated by the population simulations; however, all empirical estimates were systematically lower than the simulation median (Fig. 1; Table 2).

Discussion

Single-sample Total N e estimates from simulated and empirical Drosophila populations do not adhere to Wright’s model

This study provides an example of how populations may deviate systematically from the expectations of a population genetic model even when most of its assumptions are met, especially if there is undetected genetic structure. Single-sample estimates of Total N e in genetically structured populations (simulated and real) were not consistent with the values predicted by the Wright–Fisher model, even though the populations had been controlled to closely reflect 'ideal' Wright–Fisher conditions. We observed a general trend for the empirical onesamp and L d N e estimates of Total N e to be lower than the value expected under the Wright–Fisher model for both statistical approaches. This discrepancy is most likely because Wright's expected Total N e (Table 1) is based on the rate of loss of heterozygosity, whereas L d N e and onesamp are based on linkage disequilibrium. Comparing N e estimators is difficult because different N e concepts may refer to different time frames and spatial scales (see Luikart et al. 2010 for a review). However, other factors may also be at play, such as dependence on a previous estimate of the eigenvalue closed-population N e , initial levels of allelic diversity and/or incomplete sampling.

The Wright–Fisher estimate of E(Total N e ) used in this study is highly dependent on the eigenvalue closed-population N e estimate from previous studies (Gilligan 2001; Gunn 2003). We empirically confirmed that Eq. 3 predicts the expected effective size of our structured populations when using an eigenvalue closed-population N e of 14.3 (Gilligan 2001; Gunn 2003). This was achieved by demonstrating that the rate of change in expected heterozygosity over the 35-generation closed-population experiment did not differ significantly from the rate of change in expected heterozygosity in our experiments using structured populations initialized from the same stocks (Gunn 2003; see supplementary material: Table S2). This provides evidence that the parameters used to estimate the Wright–Fisher E(Total N e ) were appropriate. However, we must consider how uncertainty in the eigenvalue closed-population N e affects E(Total N e ) and thus our conclusions. After taking into account the 95 % confidence interval of the Wright–Fisher E(Total N e ) (grey shading in Fig. 2), the conclusions for the non-hierarchical approach remain unchanged (empirical estimates are systematically below the Wright–Fisher expectations in all cases). For the hierarchical onesamp estimates, however, 12 of 12 high-dispersal and 6 of 12 moderate-dispersal populations fall within the 95 % confidence interval. A similar trend is seen for the hierarchical L d N e estimates, where 5 of 12 high-dispersal and 2 of 12 moderate-dispersal populations fall within the 95 % confidence interval of the Wright–Fisher expectation, but at low dispersal rates the empirical estimates remain systematically below the Wright–Fisher E(Total N e ). Consistent with the predictions of the Wright–Fisher model, all methods performed best when dispersal rates were high, genetic population structure was low and the population was therefore approaching panmixia.
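A short expansion of Eq. 3 shows how uncertainty in the closed-population N e carries over to E(Total N e ). It assumes, as our reading of the Fig. 2 caption, that the grey band is the 95 % interval for N e (8.74–19.8) propagated through Eq. 3 with s = 2; the middle values use N e = 14.3, and all values are rounded.

$${}^{Total}N_{e} = sN_{e}\left(1 + \frac{(s - 1)^{2}}{4N_{e}ms^{2}}\right) = sN_{e} + \frac{(s - 1)^{2}}{4ms} \quad\Rightarrow\quad {}^{Total}N_{e}\Big|_{s = 2} = 2N_{e} + \frac{1}{8m}$$

$$\begin{aligned} m = 0.04{:}&\quad 2(8.74) + 3.125 \approx 20.6 \;\le\; 31.7 \;\le\; 2(19.8) + 3.125 \approx 42.7\\ m = 0.01{:}&\quad 2(8.74) + 12.5 \approx 30.0 \;\le\; 41.1 \;\le\; 2(19.8) + 12.5 = 52.1\\ m = 0.0025{:}&\quad 2(8.74) + 50 \approx 67.5 \;\le\; 78.6 \;\le\; 2(19.8) + 50 = 89.6 \end{aligned}$$

Only the 2N e term carries the uncertainty, so under this assumption the band has the same absolute width (about 22) at every dispersal rate but is narrowest relative to E(Total N e ) at low dispersal.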

Sampling effects may also have contributed to the apparent downward bias of single-sample Total Ne estimates in genetically structured populations (simulated and real) relative to Wright’s expectation. Low allelic variation could inflate linkage disequilibrium estimates and thus downwardly bias estimates of Total Ne (Gulcher 2012). Simulated and real populations started with the same levels of allelic diversity, which could explain why both were downwardly biased. Incomplete sampling of the populations could also underestimate allelic variation, leading to overestimated linkage disequilibrium and thus downwardly biased estimates of Total Ne. The simulations and real populations shared the same sampling strategy, in which 48 % of the population was sampled. The current experimental design does not allow us to disentangle which of the factors discussed here was responsible for the deviation from Wright’s E(Total Ne).
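
The direction of this bias follows from the first-order relationship underlying LD-based estimators for unlinked loci, E(r²) ≈ 1/(3Ne) + 1/S, where S is the number of individuals sampled. The minimal sketch below simply inverts that approximation; it omits the bias corrections applied by LDNe, and the sample size and r² values are hypothetical. Any excess r² not caused by drift (for example, from low allelic diversity or incomplete sampling) is attributed to drift and lowers the estimate.

```python
def ld_ne(mean_r2: float, sample_size: int) -> float:
    """Invert E(r^2) ~ 1/(3*Ne) + 1/S for unlinked loci (no LDNe bias corrections).

    Only r^2 in excess of the sampling expectation 1/S is attributed to drift;
    a negative result means the observed r^2 was below that expectation.
    """
    drift_r2 = mean_r2 - 1.0 / sample_size
    if drift_r2 == 0:
        return float("inf")  # no detectable drift signal
    return 1.0 / (3.0 * drift_r2)


S = 50                                  # hypothetical number of individuals genotyped
baseline = 1.0 / S + 1.0 / (3.0 * 50)   # r^2 expected if Ne = 50
scenarios = {
    "drift only": baseline,
    "r^2 inflated by 0.002": baseline + 0.002,  # e.g. low diversity or incomplete sampling
    "r^2 below 1/S": 1.0 / S - 0.002,
}
for label, r2 in scenarios.items():
    print(f"{label:>22}: Ne estimate = {ld_ne(r2, S):.1f}")
```

When the observed r² falls below its sampling expectation 1/S, the drift component is negative and so is the estimate; this is the origin of negative LD-based Ne estimates.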

Single-sample Total Ne estimates from simulated populations approximate Total Ne estimates in empirical Drosophila populations

Irrespective of the single-sample Ne estimation program used (onesamp or LDNe) or the statistical approach implemented (non-hierarchical or hierarchical), the simulated and empirical Drosophila population data were largely concordant: in most cases (except those noted in the Results section) the empirical data fell within the distribution of the population simulations (Fig. 1; Table 2). Although we had substantially less control over the lab populations, their rough concordance with the computer-simulated populations suggests that few influences were left unaccounted for in this study. However, there were some key differences between the empirical Drosophila experiment and the simulations. The empirical Drosophila Total Ne estimates were much closer to the simulated E(Total Ne) values than to Wright’s E(Total Ne), but were still systematically lower than both expectations (Fig. 1; Table 2). This suggests that forces not included in the population simulation parameters may be affecting the empirical populations.
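
Whether an empirical estimate falls within the distribution of the simulations can be made explicit with a percentile rank. The sketch below is one simple way to do so, assuming a central 95 % interval of the simulated Total Ne estimates as the criterion; it is not necessarily the comparison behind Table 2, and the simulated values are hypothetical.

```python
from bisect import bisect_right


def percentile_rank(value: float, simulated: list[float]) -> float:
    """Fraction of simulated estimates less than or equal to the empirical value."""
    ordered = sorted(simulated)
    return bisect_right(ordered, value) / len(ordered)


def within_central_interval(value: float, simulated: list[float], coverage: float = 0.95) -> bool:
    """True if value lies inside the central `coverage` fraction of the simulated estimates."""
    tail = (1.0 - coverage) / 2.0
    rank = percentile_rank(value, simulated)
    return tail <= rank <= 1.0 - tail


# Hypothetical simulated Total Ne estimates for one scenario (100 replicates).
simulated = [float(x) for x in range(40, 140)]
for empirical in (35.0, 90.0, 137.0):
    rank = percentile_rank(empirical, simulated)
    inside = within_central_interval(empirical, simulated)
    print(f"empirical {empirical:.0f}: rank {rank:.2f}, inside central 95 %: {inside}")
```

A rank consistently below 0.5 across scenarios, even when inside the interval, is the kind of pattern described above as empirical estimates being systematically lower than the simulations.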

One possible reason for the discrepancy between the simulations and the empirical results is selection. We did not include selection in the simulations, but it may be acting in the live populations. The inevitable simplification of simulations means that there are many other differences; however, if we assume that the simulations portray a selectively neutral version of the Drosophila experiment, the discrepancies between the simulated and real populations could be explained by multilocus balancing selection favouring particular haplotypes in the Drosophila genome. If a selective pressure favours particular combinations of alleles at more than one locus, these combinations will occur more often than expected under random association (i.e. linkage equilibrium), so linkage disequilibrium between these loci is expected to increase (Navarro and Barton 2002). Under this mode of selection, loci need not be physically linked for apparent linkage disequilibrium to increase. We stress that this phenomenon differs from situations where balancing selection operates independently on each of several loci, which would not be expected to increase linkage disequilibrium. Conditions under which multilocus balancing selection could increase linkage disequilibrium have previously been described in Drosophila. For example, disassortative mating based on pheromonal cues can impose balancing selection on multiple pheromone loci scattered throughout the D. melanogaster genome and buffer the genome against the effects of drift (Averhoff and Richardson 1974, 1975; Templeton 2006). Although we did not expect large effects of balancing selection in our Drosophila populations, this phenomenon may help to explain why the empirical data produced lower Total Ne estimates than the simulated populations.
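
The sketch below shows how over-representation of favoured allele combinations registers as linkage disequilibrium, whether or not the loci are physically linked. It computes D and r² for two biallelic loci from the four haplotype frequencies; the frequencies are hypothetical, and the sketch describes a single snapshot rather than simulating selection.

```python
def ld_from_haplotypes(freqs: dict[str, float]) -> tuple[float, float]:
    """Return (D, r^2) for loci A/a and B/b from haplotype frequencies AB, Ab, aB, ab."""
    if abs(sum(freqs.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_A = freqs["AB"] + freqs["Ab"]  # frequency of allele A
    p_B = freqs["AB"] + freqs["aB"]  # frequency of allele B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when either locus is monomorphic")
    # D = p_AB * p_ab - p_Ab * p_aB, equivalent to p_AB - p_A * p_B
    d = freqs["AB"] * freqs["ab"] - freqs["Ab"] * freqs["aB"]
    return d, d ** 2 / denom


# Hypothetical frequencies: random association vs. favoured AB and ab combinations.
random_assoc = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
favoured = {"AB": 0.35, "Ab": 0.15, "aB": 0.15, "ab": 0.35}
for label, freqs in (("random association", random_assoc), ("favoured haplotypes", favoured)):
    d, r2 = ld_from_haplotypes(freqs)
    print(f"{label}: D = {d:.3f}, r^2 = {r2:.3f}")
```

Allele frequencies are identical in both cases (0.5 at each locus); only the way alleles are combined differs, which is why this kind of selection can raise r² without changing single-locus diversity.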

Two further observations indirectly support our hypothesis that multilocus balancing selection favouring haplotypes in the Drosophila genome has increased linkage disequilibrium between physically unlinked loci. First, consider differences in the frequency of negative LDNe estimates. In the simulations we observed a high proportion of negative Total Ne estimates generated by LDNe. Negative LDNe estimates are typically interpreted as showing no evidence of disequilibrium caused by genetic drift (Waples and Do 2007). In contrast, we did not observe a single negative LDNe estimate in our empirical Drosophila populations, suggesting that disequilibrium is more common in the Drosophila populations than in the selectively neutral simulated populations. Second, the rate of decay in expected heterozygosity provides supporting evidence for low levels of balancing selection in our populations. Although the decay in expected heterozygosity did not differ significantly from the rate expected under the Wright–Fisher model (see supplementary material: Table S2), heterozygosity decayed more slowly than the predicted rate in 29 of the 36 structured populations (each difference being individually non-significant), and four line pairs showed no decline in expected heterozygosity at all (Table S2). This may indicate low levels of balancing selection on loci linked to our markers, maintaining polymorphism in our populations. Although balancing selection favouring multilocus haplotypes is consistent with several aspects of our empirical dataset, we cannot assume that the presence or absence of selection is the only variable that differs between the simulated and real populations.
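
The heterozygosity comparison can be framed with the Wright–Fisher expectation H_t = H_0(1 − 1/(2Ne))^t, under which expected heterozygosity declines by a proportion 1/(2Ne) per generation. The minimal sketch below converts an observed change from H_0 to H_t into a per-generation rate and flags lines decaying more slowly than expected; the line names, heterozygosities, Ne and number of generations are hypothetical, and the sketch does not reproduce the significance tests reported in Table S2.

```python
def expected_heterozygosity(h0: float, ne: float, t: int) -> float:
    """Wright-Fisher expectation H_t = H_0 * (1 - 1/(2*Ne))**t."""
    return h0 * (1.0 - 1.0 / (2.0 * ne)) ** t


def observed_decay_rate(h0: float, ht: float, t: int) -> float:
    """Per-generation proportional loss implied by a change from H_0 to H_t over t generations."""
    return 1.0 - (ht / h0) ** (1.0 / t)


def slower_than_expected(lines: dict[str, tuple[float, float]], ne: float, t: int) -> list[str]:
    """Names of lines whose observed decay rate is below the expected rate 1/(2*Ne)."""
    expected_rate = 1.0 / (2.0 * ne)
    return [name for name, (h0, ht) in lines.items()
            if observed_decay_rate(h0, ht, t) < expected_rate]


# Hypothetical (H_0, H_t) pairs for three lines, with Ne = 40 over 20 generations.
lines = {"line_1": (0.60, 0.50), "line_2": (0.60, 0.45), "line_3": (0.55, 0.55)}
print(slower_than_expected(lines, ne=40.0, t=20))
```

A line with no decline at all, like the hypothetical line_3, has an observed rate of zero and is always flagged as slower than expected.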

An alternative hypothesis to explain why empirical Total Ne estimates were systematically lower than the simulations is that dispersal between isolated populations induced temporary linkage disequilibrium among genetic loci (Haliburton 2004). If so, the level of linkage disequilibrium in our populations may have reflected recent dispersal events rather than the amount of genetic drift; linkage disequilibrium attributed to drift would then be overestimated, leading to an underestimate of Total Ne. For example, Waples and England (2011) have shown that pulse dispersal of genetically divergent individuals can depress Ne estimates. Our Drosophila experiment may have failed to detect increasing Total Ne with increased dispersal because the estimators produced increasingly downward-biased estimates of Total Ne owing to linkage disequilibrium from admixture. For admixture-related elevation of linkage disequilibrium to explain our results, the effective dispersal rate would need to be higher in the Drosophila populations than in the simulations, because the simulations did show a significant correlation of m with Total Ne in the hierarchical approach (Fig. 1a, c; Table 2).
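
Admixture-induced disequilibrium can be quantified directly. When a fraction c of a pool comes from one source and 1 − c from another, and each source is in linkage equilibrium, the pooled sample carries D = c(1 − c)(pA1 − pA2)(pB1 − pB2) even for unlinked loci; under random mating, recombination then halves D each generation for unlinked loci, which is why such disequilibrium is temporary. The sketch below checks the closed form against a direct calculation from pooled frequencies; the migrant fraction and allele frequencies are hypothetical.

```python
def admixture_d(c: float, pa1: float, pa2: float, pb1: float, pb2: float) -> float:
    """Closed form D = c(1-c)(pA1 - pA2)(pB1 - pB2) for a pool of two sources in equilibrium."""
    return c * (1 - c) * (pa1 - pa2) * (pb1 - pb2)


def pooled_d(c: float, pa1: float, pa2: float, pb1: float, pb2: float) -> float:
    """Same quantity computed directly from pooled haplotype and allele frequencies."""
    # Within each source the loci are in linkage equilibrium, so p_AB = p_A * p_B there.
    p_AB = c * pa1 * pb1 + (1 - c) * pa2 * pb2
    p_A = c * pa1 + (1 - c) * pa2
    p_B = c * pb1 + (1 - c) * pb2
    return p_AB - p_A * p_B


def d_after(d0: float, generations: int, recombination: float = 0.5) -> float:
    """Decay of D under random mating: D_t = D_0 * (1 - rec)**t, with rec = 0.5 for unlinked loci."""
    return d0 * (1 - recombination) ** generations


# Hypothetical pulse: 10 % migrants from a source with diverged allele frequencies.
d0 = admixture_d(0.1, 0.8, 0.2, 0.7, 0.3)
print(f"D after mixing = {d0:.4f}; after 3 generations = {d_after(d0, 3):.4f}")
```

If the two sources share allele frequencies at either locus, the product term vanishes and no disequilibrium is created, so the effect scales with how genetically divergent the dispersers are.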

Dispersal rate determines genetic population structure, but is not correlated with estimated Total Ne in real populations

As expected, different rates of dispersal among subpopulations (simulated and real) resulted in different levels of genetic population structure (Tables S3 and S4), and dispersal was negatively correlated with population structure (Drosophila populations, r = −0.601, P = 0.0001; simulated populations, r = −0.775, P < 0.0001) (Table 3). We demonstrated in simulated populations that single-sample methods for estimating Total Ne are sensitive to the presence of population structure: both LDNe and onesamp gave lower non-hierarchical estimates of Total Ne for simulations with a lower effective dispersal rate (Fig. 1a, c; Table 3). This pattern was not reproduced in the experimental populations (Fig. 2; Table 3). This suggests that some other factor could affect the effective dispersal rate in the experimental populations, even though dispersal clearly affects allele proportions in the live populations, as demonstrated by the significant correlation of FST with m. The lack of correlation between empirical Total Ne and dispersal among subpopulations (Table 3) suggests that other forces (possibly multilocus balancing selection) not included in the population simulation parameters are affecting the empirical populations.
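
The dispersal–structure relationship rests on a differentiation statistic and a correlation coefficient. The sketch below uses Nei’s G_ST = (H_T − H_S)/H_T for a single biallelic locus with equal-sized subpopulations as a simplified stand-in for FST (the estimator used in this study may differ) and computes a Pearson correlation with m; the dispersal rates and allele frequencies are hypothetical.

```python
from math import sqrt
from statistics import mean


def gst_biallelic(subpop_freqs: list[float]) -> float:
    """Nei's G_ST = (H_T - H_S) / H_T for one biallelic locus, equal-sized subpopulations."""
    h_s = mean(2 * p * (1 - p) for p in subpop_freqs)  # mean within-subpopulation diversity
    p_bar = mean(subpop_freqs)
    h_t = 2 * p_bar * (1 - p_bar)                      # diversity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical dispersal rates and subpopulation allele frequencies at one locus.
dispersal = [0.01, 0.05, 0.10, 0.20]
subpop_freqs = [
    [0.10, 0.90, 0.20, 0.80],
    [0.30, 0.70, 0.35, 0.65],
    [0.40, 0.60, 0.45, 0.55],
    [0.48, 0.52, 0.50, 0.50],
]
gst = [gst_biallelic(f) for f in subpop_freqs]
print([round(g, 3) for g in gst], round(pearson_r(dispersal, gst), 3))
```

In practice the statistic would be averaged over loci and its significance assessed, for example with a permutation or parametric test; neither step is shown here.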

Recommendations for using single-sample estimators in structured populations

Our study has established that the single-sample Ne estimator onesamp generally gives better estimates of Total Ne than LDNe (Figs. 1, 2). This is most likely because onesamp’s approximate Bayesian computation approach uses prior information about the populations when estimating Ne. However, the performance of onesamp in other studies will depend strongly on an appropriate choice of priors. onesamp may also be more accurate because it uses eight summary statistics, whereas LDNe uses only Burrows’ Δ. Finally, the information presented in this study refers to a very specific set of experimental conditions, and altering the sampling design may overcome some of the biases observed here. For example, we would expect sampling a larger proportion of the population and genotyping more loci to improve the performance of both methods.
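
The role of the prior can be illustrated with a toy rejection-ABC sampler: draw Ne from a prior, simulate a summary statistic, and keep draws whose simulated statistic lies within a tolerance of the observed value. This is not onesamp’s implementation; the simulator here is an assumed, noisy version of the approximation E(r²) ≈ 1/(3Ne) + 1/S with a single statistic rather than eight, and the prior bounds, tolerance, noise level and sample size are hypothetical.

```python
import random
from statistics import median


def simulate_r2(ne: float, s: int, rng: random.Random, noise_sd: float = 0.0005) -> float:
    """Toy simulator: 1/(3*Ne) + 1/S plus Gaussian noise (an assumption, not onesamp's model)."""
    return 1.0 / (3.0 * ne) + 1.0 / s + rng.gauss(0.0, noise_sd)


def abc_rejection(observed_r2: float, s: int, prior: tuple[float, float] = (2.0, 200.0),
                  draws: int = 20_000, tolerance: float = 0.001, seed: int = 1) -> list[float]:
    """Rejection ABC: keep prior draws whose simulated statistic is within `tolerance` of the data."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        ne = rng.uniform(*prior)  # candidate Ne drawn from a uniform prior
        if abs(simulate_r2(ne, s, rng) - observed_r2) <= tolerance:
            accepted.append(ne)
    return accepted


S = 50                                  # hypothetical sample size
observed = 1.0 / (3.0 * 50) + 1.0 / S   # pretend data generated with Ne = 50
posterior = abc_rejection(observed, S)
print(f"accepted {len(posterior)} draws; posterior median Ne ~ {median(posterior):.1f}")
```

Restricting the prior to a range that excludes the value that generated the data (for example, 100 to 200 here) leaves few or no accepted draws, and loosening the tolerance widens the accepted sample; both are concrete senses in which an ABC estimator depends on the choice of priors.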

Both the non-hierarchical and hierarchical approaches have limitations, and their results should be interpreted with caution. The non-hierarchical approach produced Total Ne estimates with a low CV, but these estimates were systematically lower than the expected values, whereas hierarchical estimates were closer to the expected values (lower bias) but had a considerably higher CV (Table 2). Additionally, the hierarchical method is extremely susceptible to downward bias caused by incomplete sampling of subpopulations, because it assumes that all subpopulations are represented and summed (Eq. 8). This source of bias did not arise in our study because 100 % of subpopulations were sampled; however, it is very likely to affect studies of wild populations, where the total number of subpopulations is not known. Despite the possibility of biased estimates, field studies would benefit from comparing single-sample Ne estimates from both the non-hierarchical and the hierarchical approaches. It is also worth stressing that the methods evaluated here assume that effective population size is stable over time. We have evaluated scenarios where this holds, but it is unrealistic to assume that fragmented wild populations will have a stable effective population size. Fluctuations in population size and connectivity are likely to increase the variance of Ne estimates and may create unexpected biases (Waples 2010).
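
The sensitivity of the hierarchical approach to unsampled subpopulations follows from its additive form. The minimal sketch below assumes that the hierarchical Total Ne is the plain sum of per-subpopulation estimates, an assumed simplification of Eq. 8 rather than a reproduction of it, and also computes the CV used to compare the precision of the two approaches; all values are hypothetical.

```python
from statistics import mean, stdev


def hierarchical_total_ne(subpop_ne: list[float]) -> float:
    """Total Ne as the sum of per-subpopulation estimates (assumed simplification of Eq. 8)."""
    return sum(subpop_ne)


def coefficient_of_variation(estimates: list[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(estimates) / mean(estimates)


# Hypothetical per-subpopulation Ne estimates for one metapopulation.
subpops = [10.0, 12.0, 14.0, 16.0]
full = hierarchical_total_ne(subpops)
missing_one = hierarchical_total_ne(subpops[:-1])  # one subpopulation never sampled
print(f"all sampled: {full}; one missed: {missing_one} ({missing_one / full:.0%} of full)")

# Hypothetical replicate Total Ne estimates from repeated runs of one scenario.
replicates = [48.0, 61.0, 39.0, 70.0, 55.0]
print(f"CV across replicates: {coefficient_of_variation(replicates):.3f}")
```

Because nothing in the sum signals that a term is missing, an unsampled subpopulation lowers the total silently; this is why knowing the number of subpopulations matters more for the hierarchical approach.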

The sensitivity of single-sample Ne estimates when population subdivision is disregarded, and the unanticipated effects that selection may have in real organisms (as opposed to neutral models), have practical implications, because one of the main appeals of single-sample Ne estimators is their application to wild populations, where temporal sampling is often not feasible. Wild populations frequently have genetic structure for a variety of biological, ecological and geographical reasons, and the selective landscape is generally unknown. Habitat fragmentation is simultaneously a leading cause of population extinction and a major mechanism driving the development of genetic population structure (Tilman et al. 1994; Henle et al. 2004; Banks et al. 2005). This means that single-sample Ne estimators are likely to be least accurate in the situations where they are most needed. However, this pitfall can be managed strategically by characterising genetic population structure and identifying migrant or recently admixed individuals. These analyses indicate which statistical approach is appropriate (hierarchical vs non-hierarchical) and provide the option of removing recent migrants from the analysis. Practitioners should also consider factors that this study did not address, such as age structure, demographic fluctuations, and trade-offs between sample size and the number of genetic loci.
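
Removing recent migrants before estimation can be as simple as filtering on assignment probabilities from a clustering or assignment analysis. The sketch below assumes that each individual has a membership probability for the subpopulation it was sampled in; the data structure, field names and 0.5 threshold are hypothetical and do not correspond to the output format of any particular program.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    sample_id: str
    sampled_in: str  # subpopulation where the individual was collected
    # Hypothetical assignment probabilities, one per candidate subpopulation.
    membership: dict[str, float] = field(default_factory=dict)


def is_putative_migrant(ind: Individual, threshold: float = 0.5) -> bool:
    """Flag individuals poorly assigned to the subpopulation they were sampled in."""
    return ind.membership.get(ind.sampled_in, 0.0) < threshold


def residents_only(sample: list[Individual], threshold: float = 0.5) -> list[Individual]:
    """Subset to use when re-estimating Ne without putative recent migrants."""
    return [ind for ind in sample if not is_putative_migrant(ind, threshold)]


sample = [
    Individual("d01", "pop1", {"pop1": 0.95, "pop2": 0.05}),
    Individual("d02", "pop1", {"pop1": 0.20, "pop2": 0.80}),  # resembles a migrant from pop2
    Individual("d03", "pop2", {"pop1": 0.40, "pop2": 0.60}),  # possibly admixed; kept at 0.5
]
print([ind.sample_id for ind in residents_only(sample)])
```

The threshold trades off bias against sample size: a stricter cut-off removes more admixed individuals but leaves fewer genotypes for estimating Ne, so running the estimation with and without the filtered individuals is a simple sensitivity check.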