Introduction

Patterns of DNA diversity in contemporary populations offer insight into the populations' past. Processes such as migration, geographic dispersal, and admixture leave recognizable marks at the DNA level (von Haeseler et al. 1996; Cavalli-Sforza 1998; Bertranpetit 2000; De Knijff 2000), and rapid changes in population size can be inferred from the distributions of pairwise sequence differences in nonrecombining DNA segments. In particular, theory predicts that demographic changes affect the shape of the gene genealogies. Expansions result in star-like genealogies, and most mutations occurring on those genealogies do not tend to be shared among lineages (Donnelly 1996). The resulting plots of differences between pairs of individuals, or mismatch distributions, are smooth and unimodal; their mode depends on the time passed since the expansion (Rogers and Harpending 1992). Most human mitochondrial mismatch distributions agree with expansion expectations, and the few exceptions have been explained as a result of demographic crises in hunting–gathering communities (Excoffier and Schneider 1999).

For the Y chromosome, the results are not equally straightforward. Two studies (Pritchard et al. 1999; Shen et al. 2000) concluded that Y-chromosome diversity is more compatible with exponentially increasing than with constant population sizes, suggesting a recent population growth. On the contrary, in a previous study we found no evidence of growth in the mismatch distributions inferred from 11 Y-chromosome SNPs in Europe (Pereira et al. 2001). In agreement with what would be expected for populations of constant size, all mismatch distributions had multiple peaks, and statistics sensitive to demographic changes (Tajima's D and Fu's F S) were insignificant. The small number of polymorphic sites available might have reduced the sensitivity of the tests, although we observed that mitochondrial mismatch distributions maintain their unimodal shape even when small subsets of sites are considered (Pereira et al. 2001). We interpreted our results as reflecting either a combination of selective and demographic processes or the fact that the effective population sizes of European males (N m) have really remained low until recently, while female population sizes (N f) increased sharply in prehistoric times. Three problems remained open, namely, (1) whether a greater number of SNPs could have led to differently shaped distributions for Europe; (2) whether the lack of Y SNPs evidence for expansion extends beyond Europe; and (3) whether ascertainment bias and/or low mutation rates typical of Y SNPs but not of STRs (Pritchard et al. 1999) could have concealed an existing signal of expansion.

To address the first two questions, we analyzed two sets of data, comprising, respectively, 1007 Y chromosomes from Europe (Semino et al. 2000) and 1062 Y chromosomes from all continents (Underhill et al. 2000). Mismatch distributions were calculated, and tests of mutation-drift equilibrium were carried out. As for the third question, we simulated the effects of factors such as different mutation rates and different probabilities of ascertainment of polymorphic sites, in both stationary and expanding populations, and we compared the resulting mismatch distributions with those observed in the empirical analyses.

The results we obtained suggest that male population sizes increased substantially later than female population sizes. This observation raises the possibility that a polygamous mating system might have been widespread in prehistoric human populations.

Materials and Methods

The Data

We analyzed the data sets of Y-chromosome single-nucleotide polymorphisms published by Semino et al. (2000) (Europe [EU] data set) and Underhill et al. (2000) (world [WO] data set). They comprised, respectively, 1007 individuals from 25 European populations, typed at 22 polymorphic sites, and 1062 individuals from 21 populations of all continents, who had been typed at 166 polymorphic sites or whose genotype could be inferred with a high degree of confidence, assuming that all SNPs result from mutations that occurred only once in human evolution (Underhill et al. 2000). The only known violation of the assumption that no site mutated more than once is the M116 polymorphism, where three different alleles have been recorded. We chose to disregard that site, and therefore we considered as identical two haplotypes that differ only by a substitution at M116, namely, haplotype 19 (which had been observed only once, in an African individual) and haplotype 22. To evaluate the consequences of the lumping of different populations, in both cases we also jointly analyzed all chromosomes of either data set, regardless of their origin. The European and Near Eastern samples of the WO data set include part of the chromosomes of the EU data set. Therefore, the two data sets are not fully independent.

Mismatch Distributions

Allele genealogies tend to have long internal branches in stationary populations, so that many mutations are shared by several individuals. Rapidly expanding populations, conversely, will show long terminal branches in their gene trees (or star-like genealogies); the mutations occurring along those branches will often be unique to single individuals (Donnelly 1996). These different patterns of substitutions are reflected in the shape of the distribution of pairwise differences between sequences, or mismatch distribution. Population subdivision (Marjoram and Donnelly 1994) and admixture (Bertorelle and Slatkin 1995) may act as confounding factors. As a rule, however, irregular and multimodal mismatch distributions are expected in stationary or shrinking populations, whereas a smooth, unimodal shape is typical of expanding populations (Rogers and Harpending 1992; Rogers et al. 1996; Excoffier and Schneider 1999).

Mismatch distributions were estimated 48 times, namely, for each of the 25 populations of the EU data set, for each of the 21 populations of the WO data set, and for the two entire data sets. They were also estimated for a number of data sets generated by computer simulation, to represent a broad spectrum of evolutionary and demographic scenarios.

In all cases, mismatch distributions and gene diversity, i.e., the probability that two randomly sampled chromosomes differ from each other (Nei 1987), were estimated by ARLEQUIN 2.0 (Schneider et al. 2000), and the observed distribution of mismatches was fitted to expectations relative to an expanding population, by Monte Carlo randomization (Schneider and Excoffier 1999). The null hypothesis was one of expansion, because there is no established expectation for the mismatch distribution in a stationary population (Harpending 1994). The age of the expansion, τ, was also estimated from the data (Rogers and Jorde 1995) when the expansion hypothesis was not rejected and the distribution was unimodal.
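As an illustration of the two quantities defined above, the following minimal Python sketch computes a mismatch distribution and Nei's gene diversity from a list of haplotypes. It is not the ARLEQUIN implementation; the function names and the encoding of haplotypes as equal-length strings of "0" and "1" are assumptions made for this example only.

```python
from collections import Counter
from itertools import combinations


def pairwise_differences(a, b):
    """Number of sites at which two equal-length haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def mismatch_distribution(haplotypes):
    """Relative frequency of each number of pairwise differences (index = differences)."""
    pairs = list(combinations(haplotypes, 2))
    counts = Counter(pairwise_differences(a, b) for a, b in pairs)
    return [counts.get(d, 0) / len(pairs) for d in range(max(counts) + 1)]


def gene_diversity(haplotypes):
    """Probability that two randomly sampled chromosomes differ (unbiased estimator)."""
    n = len(haplotypes)
    homozygosity = sum((c / n) ** 2 for c in Counter(haplotypes).values())
    return n / (n - 1) * (1.0 - homozygosity)


# Hypothetical sample of four chromosomes typed at three biallelic sites.
sample = ["000", "000", "011", "111"]
distribution = mismatch_distribution(sample)
diversity = gene_diversity(sample)
```

In this toy sample the gene diversity equals the fraction of chromosome pairs that are not identical, which is the sense in which the text defines it.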

Tests of Mutation-Drift Equilibrium

Departures from mutation-drift or mutation-selection equilibrium were tested, in each population and in the pooled samples, by Tajima's D and Fu's F S. In Tajima's (1989a, b) test, the parameter θ = 2Nµ (where N is the population size, and µ is the mutation rate) is independently estimated twice, from the number of polymorphic sites and from the average number of pairwise differences (or average mismatch) in the sample. Under equilibrium, the two θ estimates should overlap. Differences between them, measured by the statistic D, may be caused by changes in the population size, or selection, or both. Fu's (1997) F S statistic compares the observed number of alleles in a sample with the number of alleles expected in a population of constant size on the basis of the observed average mismatch. Both D and F S take negative values when the population expands and positive values when it shrinks. Different selective regimes may affect the shape of the underlying gene tree and, hence, mimic the effects of demographic changes.
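The comparison of the two θ estimates can be sketched as follows. This is a minimal Python illustration of Tajima's D using the standard constants of Tajima (1989a), not the code used in our analyses; the string encoding of haplotypes and the choice to return None when no site is polymorphic are assumptions of this example.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotype strings; None if no site is polymorphic."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # S: number of polymorphic (segregating) sites in the sample.
    s = sum(len({h[j] for h in haplotypes}) > 1 for j in range(n_sites))
    if s == 0:
        return None
    # pi: average number of pairwise differences (average mismatch).
    pairs = list(combinations(haplotypes, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    # Constants that depend only on the sample size n.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two theta estimates, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples: only singletons (star-like) versus intermediate-frequency sites.
d_singletons = tajimas_d(["1000", "0100", "0010", "0001"])
d_intermediate = tajimas_d(["11", "11", "00", "00"])
```

The first toy sample, in which every mutation is private to one chromosome, gives a negative D, and the second, in which mutations are shared at intermediate frequency, gives a positive D, in line with the signs described above.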

The significance of D and F S was tested by randomization, in agreement with Simonsen and co-workers' (1995) observation that critical values of the former test based on the beta distribution are too conservative. By the coalescent simulation program implemented in the ARLEQUIN package (Schneider et al. 2000), random samples were repeatedly generated from hypothetical stationary populations whose parameter θ was equal to the average number of pairwise differences observed in the population of interest (Tajima 1989a). In this way, empirical null distributions of the relevant statistics were generated by repeating the simulations 1000 times, each time recording the values of D and F S. It was straightforward to obtain empirical estimates of the probability of the observed D and F S values from these distributions, under the hypothesis of neutrality and constant population size.
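The last step of this randomization procedure amounts to locating the observed statistic within the simulated null distribution. A minimal Python sketch follows; the one-sided (lower-tail) definition, which is the relevant tail when testing for expansion, and the function name are assumptions of this example rather than a description of the ARLEQUIN routine.

```python
def empirical_p_value(observed, null_values):
    """Fraction of simulated null values that are as small as, or smaller than, the observed one."""
    hits = sum(value <= observed for value in null_values)
    return hits / len(null_values)


# Hypothetical null distribution of D from four simulated stationary samples.
p_example = empirical_p_value(-2.0, [-3.0, -1.0, 0.0, 1.0])
```

With 1000 simulated samples, as in our analyses, the smallest nonzero probability that can be estimated in this way is 0.001.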

Monte Carlo Simulations

Computer simulations were run to evaluate the effects of mutation rates and ascertainment bias on the probability of detecting an expansion, once it has occurred. Biallelic Y-chromosome markers mutate slowly; estimated mutation rates per site and per year, µ, range between 1.2 × 10⁻⁹ (Thomson et al. 2000) and 2.5 × 10⁻⁸ (Hammer 1995; Jobling et al. 1997). By contrast, for the hypervariable region of the mitochondrial genome µ estimates can be as high as 3.2 × 10⁻⁷ (Soodyall et al. 1997; Jazin et al. 1998; Sigurgardottir et al. 2000). It is conceivable that low rates of mutation may reduce the ability to detect population growth in studies based on Y-chromosome SNPs.

To evaluate the effects of different mutation rates on the possibility to detect an expansion, we generated samples from stationary and expanding populations by Monte Carlo simulation using the SIMCOAL program (Excoffier et al. 2000). The simulation algorithm was based on the coalescent process with superimposed mutations, as described by Hudson (1990). Each sample was obtained by first generating its genealogy. Mutations were then randomly placed on the genealogy, assuming that they occur according to a Poisson process. More details are given by Pereira et al. (2001).
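The two-step procedure described above (genealogy first, then Poisson-distributed mutations) can be sketched for the simplest case of a stationary population. The following Python code is a minimal illustration and not the SIMCOAL program; the function names, the scaling of time in units of N generations, the use of θ = 2Nµ for the whole set of sites, and the representation of each mutation as a new site that mutates only once are assumptions of this example.

```python
import random
from math import exp


def poisson(lam, rng):
    """Draw a Poisson variate by multiplying uniforms (adequate for small means)."""
    limit, k, p = exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_sample(n, theta, rng):
    """Simulate n haplotypes from a constant-size population under the coalescent."""
    lineages = [frozenset([i]) for i in range(n)]  # sampled chromosomes below each lineage
    mutations = []                                 # each mutation = set of carrier chromosomes
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence, in units of N generations.
        t = rng.expovariate(k * (k - 1) / 2.0)
        # Mutations fall on each lineage as a Poisson process with rate theta / 2.
        for lineage in lineages:
            for _ in range(poisson(theta / 2.0 * t, rng)):
                mutations.append(lineage)
        # Merge two randomly chosen lineages.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] | lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return ["".join("1" if s in m else "0" for m in mutations) for s in range(n)]


rng = random.Random(1)
haplotypes = simulate_sample(10, 5.0, rng)
```

Every column of the simulated haplotypes is a polymorphic site, because a mutation is recorded only on a lineage that is ancestral to some, but not all, of the sampled chromosomes. Extending the sketch to an expanding population would require rescaling the coalescence times, which is not attempted here.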

For four mutation rates (from 5 × 10⁻⁹ to 1 × 10⁻⁷ per site and per year), we simulated 1000 samples of 80 chromosomes under the assumption of a large and constant effective population size (N m = 5000 haploid individuals) and random mating. Each chromosome had 1000 potentially variable SNP sites, and each site could mutate only once. For the same mutation rates we also simulated exponential population expansions using the same coalescent approach. The simulated expansion started 50,000 years (or 2500 generations) ago, a figure commonly estimated in mitochondrial studies (Rogers and Harpending 1992; Excoffier and Schneider 1999). Population size increased by a factor of 100 to a final effective size N 0 = 100,000, corresponding to a rate of exponential growth r = 0.0018. Depending on the mutation rates and on the shapes of population genealogies, variable numbers of sites (in practice, never exceeding 199) mutated and became polymorphic.
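The growth rate quoted above follows from the size of the expansion and its duration. The short Python sketch below makes the arithmetic explicit under the assumption that r is expressed per generation and that population size changes as N(t) = N 0 exp(−rt), with t measured in generations before the present; the function names are hypothetical.

```python
from math import exp, log


def exponential_growth_rate(fold_increase, generations):
    """Per-generation rate r producing the given fold increase over the given time."""
    return log(fold_increase) / generations


def size_at(n_final, r, generations_ago):
    """Population size a given number of generations before the present."""
    return n_final * exp(-r * generations_ago)


generations = 2500                      # 50,000 years, i.e., 20 years per generation
r = exponential_growth_rate(100, generations)
n_start = size_at(100_000, r, generations)
```

Under these assumptions r is close to 0.0018 and the size at the onset of the expansion is 1000, i.e., one-hundredth of the final effective size.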

Tajima's D and Fu's F S statistics were estimated in each of the simulated samples. The fit of the observed distribution of mismatches to a model of population expansion was tested by the bootstrap approach implemented in ARLEQUIN (Schneider et al. 2000). The statistic SSD, a sum of squared deviations from expansion expectations, was estimated and its empirical probability was computed by performing sets of 100 coalescent simulations of expansions. Finally, we defined three basic shapes of the mismatch distribution, namely, unimodal with a peak at zero differences (Type 0), unimodal with a maximum >0 (Type 1), and bi- or multimodal (Type 2), and counted the number of occurrences of each type in each set of 1000 simulations.
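The three shapes can be assigned automatically once a rule for counting peaks is chosen. The Python sketch below treats every local maximum of the distribution as a peak, after collapsing runs of equal consecutive values; that operational rule and the function name are assumptions of this example, since other definitions of a peak are possible.

```python
def classify_mismatch(distribution):
    """Return 0, 1, or 2 for Type 0, Type 1, or Type 2 mismatch distributions."""
    # Collapse plateaus so that a flat top counts as a single peak.
    values = [v for i, v in enumerate(distribution)
              if i == 0 or v != distribution[i - 1]]
    padded = [float("-inf")] + values + [float("-inf")]
    n_peaks = sum(
        padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
        for i in range(1, len(padded) - 1)
    )
    if n_peaks > 1:
        return 2                                   # bi- or multimodal
    # Unimodal: Type 0 if the maximum is at zero differences, Type 1 otherwise.
    return 0 if distribution.index(max(distribution)) == 0 else 1


# Hypothetical distributions (index = number of pairwise differences).
type_a = classify_mismatch([0.6, 0.3, 0.1])
type_b = classify_mismatch([0.1, 0.3, 0.4, 0.2])
type_c = classify_mismatch([0.4, 0.1, 0.3, 0.2])
```

Counting the occurrences of each returned value over a set of simulated samples gives the frequencies of the three types.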

While complete sequences of hundreds or thousands of base pairs are analyzed in mitochondrial studies, and in Shen and co-workers' (2000) Y-chromosome study, in most SNP analyses only sites known in advance to be variable are typed. In this way, some rare or private polymorphisms are likely to be missed, possibly affecting the inferred mismatch distributions. That phenomenon is called selection of sites (a term we prefer to avoid because of the possible confusion with natural selection), or ascertainment bias. A second set of Monte Carlo simulations was run to evaluate the effects of ascertainment bias on the power to detect population growth. We simulated 1000 samples from populations that underwent an ancient (50,000 years ago) or a recent (10,000 years ago) demographic expansion. Each sample was composed of 100 individuals, i.e., 100 sets of 30,000 potentially variable SNP sites. Because of the low mutation rate (µ = 1 × 10⁻⁸, a value close to the estimates for Y-chromosome biallelic markers [Hammer 1995; Jobling et al. 1997]), we raised the number of sites considered to 30,000, so that a sufficiently large number of them would become polymorphic in the course of each simulation.

The effect of ascertainment bias was reproduced by excluding from the analysis the least polymorphic sites, i.e., those that have a higher chance to escape detection in the phase of polymorphism discovery (Underhill et al. 1997). After each run of the simulation, we computed Tajima's D statistic, considering either all sites that mutated and so became polymorphic or (to represent the loss of sites that escape ascertainment) only the sites whose rarer allele had a frequency p > 1% (mutations shared by at least two chromosomes), or p > 2% (mutations shared by at least three chromosomes), or p > 3% (mutations shared by at least four chromosomes).
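The filtering step described above can be sketched as follows. This minimal Python example keeps only the sites whose rarer allele is carried by at least a given number of chromosomes; the function names and the string encoding of haplotypes are assumptions of this example. In a sample of 100 chromosomes, the thresholds p > 1%, p > 2%, and p > 3% correspond to minimum counts of two, three, and four carriers.

```python
def ascertained_sites(haplotypes, min_carriers):
    """Indices of sites whose rarer allele occurs in at least min_carriers chromosomes."""
    n = len(haplotypes)
    kept = []
    for j in range(len(haplotypes[0])):
        derived = sum(h[j] == "1" for h in haplotypes)
        rarer = min(derived, n - derived)
        if rarer >= min_carriers:
            kept.append(j)
    return kept


def restrict(haplotypes, sites):
    """Haplotypes reduced to the retained sites only."""
    return ["".join(h[j] for j in sites) for h in haplotypes]


# Hypothetical sample: site 0 is a singleton, site 1 is shared, site 2 is monomorphic.
sample = ["110", "010", "010", "000", "000"]
all_polymorphic = ascertained_sites(sample, 1)
shared_only = ascertained_sites(sample, 2)
reduced = restrict(sample, shared_only)
```

Statistics such as Tajima's D can then be recomputed on the reduced haplotypes to mimic the loss of rare polymorphisms.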

Results

Mismatch Distributions

A ragged pattern, either bi- or trimodal, is observed in the analysis of all populations in the EU data set (Fig. 1). Despite considering twice as many polymorphic sites as in the previous study of the same continent (Pereira et al. 2001), mismatch distributions with a peak at zero differences are still the most common. Predictably, by doubling the number of sites considered, the average distance between peaks increased. For instance, in the Iberian populations, mismatch distributions based on 11 sites (Pereira et al. 2001) displayed peaks at zero, three, and five differences, whereas, in this study, the peaks are located at zero, four, and seven differences.

Figure 1

Mismatch distributions in the EU data set (data from Semino et al. 2000).

Doubling the number of polymorphic sites analyzed (Table 1; compare with Table 1 of Pereira et al. 2001), the number of different haplotypes increased (from 2–8 per population using 11 SNPs to 3–13 using 22 SNPs). However, gene diversities did not increase as much, in agreement with Semino et al.'s (2000) observation that more than 95% of the chromosomes they typed could be assigned to clades of haplotypes defined by just 10 key mutations. Tajima's and Fu's statistics were insignificant, except for a negative F S for Turks, who also showed a rather smooth distribution. However, this result was no longer significant after Bonferroni's correction for multiple tests (Sokal and Rohlf 1995).
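The Bonferroni correction mentioned above divides the nominal significance level by the number of tests performed. A minimal Python sketch is given below; the function name and the example probabilities are hypothetical, and the number of tests forming one family is a choice left to the analyst.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag the tests that remain significant at level alpha divided by the number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


# Hypothetical probabilities from three tests; the corrected threshold is 0.05 / 3.
flags = bonferroni_significant([0.03, 0.001, 0.2])
```

A test that is significant at the 0.05 level when considered alone, such as the first one in the example, may therefore lose significance once the number of populations tested is taken into account.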

Table 1 Measures of genetic diversity estimated from the EU data set (Semino et al. 2000)

Even when 166 biallelic markers were studied (WO data set) most distributions were multimodal (Fig. 2). The sub-Saharan samples were the ones that displayed the most irregular distributions, with peaks at 16 (Sudan), 17 (Ethiopia), or 18 (Khoisan) differences, confirming extensive divergence of African Y chromosomes. In the European samples there were minor differences between the results of the analysis of 22 (EU data set) and 166 (WO data set) sites, in terms of both the shape of the mismatch distributions and the related statistics (Table 2). Sardinia shows a more irregular shape in the WO data set, but that might reflect the small sample size, 22, in the study by Underhill et al. (2000), presumably a subset of the 77 individuals described by Semino et al. (2000).

Figure 2

Mismatch distributions in the WO data set (data from Underhill et al. 2000).

Table 2 Measures of genetic diversity estimated from the WO data set (Underhill et al. 2000)

Unimodal mismatch distributions were observed in three Central and Eastern Asian populations. Fu's F S was negative and significant (as is the case for mitochondrial data) in these samples, and Tajima's D in one of them, but these significances did not withstand Bonferroni's correction. A unimodal distribution was also observed in the American sample, but the peak is at zero differences (which is what we defined as the Type 0 distribution), reflecting the fact that 78% of the Y chromosomes belong to haplotype 115, a haplotype not found in other continents (Underhill et al. 2000). Tajima's D and Fu's F S are negative but, once again, both insignificant after Bonferroni's correction. We do not know how well the sample considered represents the whole continent. However, based on the evidence available, it seems that Y-chromosome diversity in this American sample reflects a severe bottleneck (Bonatto and Salzano 1997), with most Y-chromosome variation presumably restricted to the more rapidly evolving STR sites (Ruiz-Linares et al. 1999).

To understand the effects of aggregation of individuals from distant populations, we ran two global analyses of the EU and WO data sets (Fig. 3). For the populations of the EU data set, we observed a trimodal distribution, and insignificant, positive Tajima's D and Fu's F S (Table 1). For the more heterogeneous set of populations in the WO data set, the mismatch distribution was still bimodal, but Tajima's D and Fu's F S statistics were negative, and the former remained significant (p = 0.048) even after Bonferroni's correction (Table 2). The simplest explanation of this apparently puzzling result is that, if one picks up chromosomes from different populations, many substitutions become relatively rare. As a consequence, a greater fraction of mutations is likely to appear almost lineage-specific. In turn, that may lead to a mismatch distribution that looks similar to those resulting from expansions, and to values of D and F S compatible with an expansion, even when there is no evidence of expansion in any single population. In agreement with Tajima (1989a), who specified that his test is valid only if applied to a set of chromosomes that evolved together, we interpret as a statistical artifact the significant D value observed in the global analyses.

Figure 3

Mismatch distributions for the pooling of all populations in the EU and WO data sets.

Simulations: Effects of the Mutation Rates

In simulated stationary populations, when considering the whole set of sites, the average mismatch observed is close to the expected value, i.e., the parameter θ used to generate the simulated samples, and thus increased with the mutation rate in the different simulations (Table 3). As expected, in expanding populations both the average mismatch and its standard deviation are reduced.

Table 3 Simulation results

For both stationary and expanding populations, multimodality is more frequent as the mutation rate increases, which does not support the view that the low Y-chromosome mutation rate increases the probability of observing multimodal mismatch distributions. Under stationarity, for example, for µ = 5 × 10⁻⁹ per site and per year, about 60% of the mismatch distributions have a single peak (Types 0 and 1) and 40% of them show a maximum at zero differences. But when µ is 10⁻⁷, all but one of the simulations yield distributions with multiple peaks (Type 2). In expanding populations, multimodality is rare, especially at low mutation rates. When µ < 10⁻⁸, more than 90% of the mismatch distributions are either Type 0 or Type 1. Therefore, the low Y-chromosome mutation rate does not seem to affect the shape of the mismatch distribution much; our results show that, if anything, it may enhance an existing signal of expansion, reducing the frequency of Type 2 distributions.

In the simulated stationary populations, both D and F S show wide distributions centered on zero, regardless of mutation rates. The fraction of significant D values is very close to the nominal level α = 0.05 (in fact, lower than that). By contrast, the F S test seems permissive, at least for low mutation rates. After expansions, D and F S are always negative and, in more than 78% of the cases, significant. As expected, the power of these statistics increases as the mutation rate, and therefore the number of polymorphic sites, increases.

Simulations: Effects of the Ascertainment Bias

We report in Fig. 4 the Tajima D values and the number of times they achieved significance over 1000 simulations of expanding populations. A strong effect of the ascertainment bias on the values (Fig. 4a) and the significance (Fig. 4b) of Tajima's D is evident, especially when the expansion event is recent. In the populations that underwent an ancient expansion, when passing from the total number of polymorphic sites to the analysis of mutations that are shared by at least two chromosomes, the number of significant Tajima D values drops from 1000 to 511, while in the populations that expanded recently it drops drastically, from 940 to 61.

Figure 4

Tajima's D values (A) and number of significant cases (B) for 1000 simulations of expanding populations. Open and filled symbols correspond, respectively, to recent (10,000 years) and ancient (50,000 years) expansion events. Tajima's D statistics were estimated from different subsets of SNP sites, selected on the basis of their polymorphism: all sites, p > 1% (mutations shared by at least two chromosomes), p > 2% (mutations shared by at least three chromosomes), and p > 3% (mutations shared by at least four chromosomes). The average number of sites analyzed within each frequency class is indicated in the inset.

When only mutations that are shared by at least three chromosomes are analyzed, Tajima's D fails to show any evidence of population growth. For old expansions, the D values are not significant, although negative on average; for recent expansions, Tajima's D often takes positive values.

Therefore, recent expansions are more likely to go undetected than ancient expansions, if there is an ascertainment bias. How large is the ascertainment bias in the SNPs of this study? The biallelic polymorphisms considered here were discovered by comparing the Y chromosomes of different individuals by DHPLC and looking for heteroduplexes (Underhill et al. 1997). If p is the frequency of the most common allele at a site, the probability that all n screened chromosomes share that allele is pⁿ, and so the probability of detecting the polymorphism is 1 − pⁿ. The number of individuals screened varied from 53 (Underhill et al. 1997, 2000) to 72 (Shen et al. 2000). If we use these values to define a range of n, the fraction of sites failing to be ascertained is between 2.5 and 6.6% for p = 0.95 and between 48.0 and 58.7% for p = 0.99. In other words, most moderately polymorphic sites have probably been identified as such, whereas one-half or more of the sites whose rarer allele has a frequency <0.01 are likely to appear monomorphic.
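The probabilities discussed above can be tabulated with a few lines of Python. The sketch assumes that the n screened chromosomes are independent draws from the population and that a site is detected whenever at least one screened chromosome carries the rarer allele; the function names are hypothetical.

```python
def probability_missed(p, n):
    """Probability that all n screened chromosomes carry the common allele (frequency p)."""
    return p ** n


def probability_detected(p, n):
    """Probability that the polymorphism is observed among n screened chromosomes."""
    return 1.0 - probability_missed(p, n)


# Fraction of sites expected to escape ascertainment for the screening panels cited above.
for p in (0.95, 0.99):
    for n in (53, 72):
        print(f"p = {p}, n = {n}: missed {100 * probability_missed(p, n):.1f}%")
```

The fraction of missed sites rises steeply as p approaches 1, which is why the rarest polymorphisms are the ones most affected by the size of the discovery panel.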

Discussion

Analyses of Y-chromosome SNPs (Pereira et al. 2001; this paper) do not suggest a rapid growth of the males' effective population. Despite the fact that we considered many polymorphic sites, nearly all mismatch distributions are still multimodal, and there is no statistical support for departures from mutation-drift equilibrium. Y-chromosome SNP diversity provides no evidence for male population expansion in populations around the world, particularly in Africa. Thus, as for the questions listed in the Introduction, it is clear that (1) a greater number of SNPs has not changed the shape of the mismatch distributions, and (2) the apparent nonexpansion pattern is not a European, but a worldwide, feature.

Although there is only one human Y-chromosome tree, and so, in principle, all Y-chromosome markers should lead to the same demographic inferences, two previous studies reached different conclusions. Shen et al. (2000) inferred a rapid growth of the male population from a negative Tajima's test and from the distribution of mutants at independent sites. For these calculations, however, 72 individuals from 46 populations were considered, and that violates the assumptions of Tajima's (1989a, p. 593) test. As confirmed by our mismatch distributions and related statistics, joint analysis of individuals of different origins tends to render D and F S compatible with an expansion (Table 2), even though no population shows evidence for growth when separately analyzed.

A better fit of a model of rapid growth than of constant N m was also found by Pritchard et al. (1999), who studied eight microsatellite loci in 445 individuals. The excess of rare haplotypes they observed suggests an expansion 18,000 years ago for the whole human population (95% confidence interval between 7000 and 41,000; see also Table 4).

Table 4 Estimated times of expansion for mitochondrial and Y-chromosome data (Excoffier and Schneider 1999; Pritchard et al. 1999)

There are doubtless more humans now than in the Pleistocene (Biraben 1979; Weiss 1984; see also Zietkiewicz et al. 1998; Harpending et al. 1998), and so the idea that N m stayed constant is counterintuitive. However, expansions may be difficult to recognize if poorly polymorphic sites are not efficiently ascertained (Nielsen 2000; Wakeley et al. 2001), so that most mutations considered will be ancient and shared by several chromosomes. Our simulations show that an ascertainment bias, in our case leading to lumping rare (p < 0.01) haplotypes with their nearest evolutionary neighbors, may actually conceal the effects of an expansion (question 3 in the Introduction). However, that effect (Fig. 4) was strong for simulations of recent expansions (10,000 years ago), and much less so for older phenomena (50,000 years ago). Therefore, if Pritchard et al.'s (1999) time estimates are approximately correct, the male effective population size might have increased too recently for the expansion to be detected at the SNP level. Because the likely dates of female and male population growth estimated in previous studies do not overlap (see Table 4) (Excoffier and Schneider 1999; Pritchard et al. 1999), the obvious implication is that the two genders had different demographic histories.

In principle, apparent differences between the demographic history of males and that of females may reflect different selective regimes. A mitochondrial selective sweep (Excoffier 1990; Harris and Hey 1999b; Wall and Przeworski 2000) may have led investigators to reject constant N f erroneously. Alternatively, stabilizing selection upon the Y chromosome (Jobling and Tyler-Smith 2000) may have determined patterns compatible with constant N m, when populations, in fact, expanded (Tajima 1989a; Wall and Przeworski 2000; see also Pereira et al. 2001). The effects of selective pressures and demographic changes cannot be discriminated a posteriori from population data (Takahata 1996), and hence the present study does not provide evidence relevant to this question. However, unless selection has really been strong (which would force us to reconsider crucial aspects of human evolution inferred from DNA evidence under the assumption of quasineutrality, such as the age of the most recent human ancestors), the available data suggest at least that the human demographic past cannot be envisaged as a process of linear growth, to which females and males contributed equally and in parallel. Variation in other genome regions does not help clarify the picture. Some nuclear loci seem to support rapid population growth (Reich and Goldstein 1998; Zhao et al. 2000; Alonso and Armour 2001), but others (Takahata et al. 1995; Harding et al. 1997; Harris and Hey 1999a; Beaumont, 1999) do not.

A Recent Shift from Polygyny to Monogamy?

In this section we explore the consequences of a model in which the effective populations of females expanded earlier than those of males. The main genetic consequence would be that the terminal branches in the Y-chromosome tree would be shorter than those in the mitochondrial tree, because of the shorter time elapsed from the expansion. In addition, as we showed, an ascertainment bias causes rare (p < 0.01) mutations in the terminal branches of the tree to be largely missed. In this way, very few haplotypes will differ for mutations in the terminal branches, both because there are not many such mutations and because a fraction of them would not be discovered. Ultimately, that would result in ragged and multimodal Y-chromosome mismatch distributions and in insignificant values of the D and F S statistics (Rogers and Jorde 1995). That is not the case for the STR sites of Pritchard et al. (1999), both because they mutate more rapidly and because all their alleles are identified without any bias.

As a consequence, we propose the following.

  1. Female effective population sizes, in agreement with mitochondrial studies, increased relatively early, at the moments at which archaeological evidence places the human expansions from Africa into the various continents (see Table 4) (Excoffier and Schneider 1999).

  2. Male effective population sizes increased later (see Table 4), and therefore, over much of human prehistory, polygyny was the rule rather than the exception; a high variance in the males' offspring numbers is a necessary consequence of that mating system.

  3. Such recent expansions of male population sizes can be identified at the STR (Pritchard et al. 1999) but not at the SNP (Pereira et al. 2001; this study) level, because the sample sizes used to discover SNP polymorphisms likely prevent the identification of variable sites whose rarer allele has a frequency p ≤ 0.01, i.e., the ones where the expansion has had the greatest chance of leaving a mark.

Under the hypothesis of a delayed increase in N m, most apparent inconsistencies between the mtDNA and Y-chromosome evidence would disappear. It would no longer be necessary to imagine different selection regimes on female- and male-transmitted traits (although these regimes may well have differed) or a different tendency to migrate (although male and female migration rates may well have differed). The greater genetic variances observed for the Y chromosome (Seielstad et al. 1998) could be explained by the greater impact of genetic drift over the smaller populations of males.

In turn, the resulting increased differentiation of the male populations prior to the shift from polygyny to monogamy is also expected to result in a stronger impact of admixture on the males' genetic diversity, which would also contribute to determining multimodal mismatch distributions. Indeed, hybrid populations may be more or less internally heterogeneous, depending on the degree of differentiation between the parental groups from which they derive. If the parental groups differed substantially and the contact is recent, hybrid populations fail to show signatures of past expansions (Marjoram and Donnelly 1994). There is empirical evidence of this phenomenon in some bimodal mitochondrial mismatch distributions observed in Africa, probably reflecting the presence of genes from two different gene pools (Bandelt and Forster 1997; Brakez et al. 2001). If N m increased later than N f, admixture has had a greater chance to induce multimodality in the Y chromosome than in the mitochondrial mismatch distributions, making male population growth difficult to identify.

Is there any evidence of a relatively recent shift from polygyny to monogamy in humans? In the ethnological literature, there is ample consensus that humans evolved in multimale polygynous bands (Lee and DeVore 1968; Flannery 1972b; Keen 1982; Badcock 1991), like gorillas (Pussey 2001). Under those conditions, although the sex ratio at birth is close to one in humans (James 1987), N m was smaller than N f, because some men made a large contribution to the next generation's gene pool, and some did not contribute at all. Note that if the number of offspring per male is highly variable, the N m estimated from genetic data is even lower than the actual number of reproducing males (Crow 1958; Donnelly et al. 1996).

A likely moment for the shift to monogamy is difficult to define exactly, and there is no reason to imagine that it happened at the same moment everywhere. The dates Pritchard et al. (1999) estimated from their different continental and intercontinental groups of samples (Table 4) are clearly averages of processes that probably occurred at different times but affected most human populations. Slightly later than these average dates, but within their confidence interval, namely, between 10,000 and 5000 years ago in Europe and Asia and more recently in Africa and in the Americas, a major change is documented in the archeological record, i.e., the development of technologies for farming and animal breeding, or Neolithic transition (Cavalli-Sforza et al. 1994; Bellwood 2001). Although the shift to food production may have caused an initial decline in nutrition and health (Cohen 1989), there is no doubt that its long-term effect was an increase in population density and growth rates (Weiss 1984; Cavalli-Sforza et al. 1994). Sociobiological studies do suggest that the development of extensive farming resulted in a decrease in the levels of polygyny, although those levels did not appear to be very high among hunters and gatherers either (van den Berghe 1979).

Especially with the shift to farming in the Neolithic period, although not necessarily then, sedentary, more structured communities developed. Nuclear families replaced the polygamous, extended-family compounds typical of hunting–gathering populations, and the household, rather than the band, became the main socioeconomic unit (Flannery 1972a). If so, monogamy may have become widespread, and N m may have started increasing. Nowadays there are examples of polygyny both in farming and in hunting–gathering societies. For the above model to be generally correct, however, it seems necessary only that polygyny (and the related small N m) became increasingly uncommon as time passed.

Then, unless so far unspecified selective pressures have been the main factor shaping human gene genealogies, Pritchard and co-workers' (1999) time estimates reflect a change from a typically polygynous social structure to one in which more males had access to reproduction. Our empirical results and our simulations suggest that that change happened too recently to leave a trace in the variation observed at most Y-chromosome SNPs. Starting at approximately the same time, populations that did not turn to food production began to suffer from competition with farming communities; the resulting bottlenecks are reflected in their current mitochondrial diversity (Excoffier and Schneider 1999). This model seems able to reconcile the results of several, apparently contradictory, analyses of genes inherited through the female and male lines. One could test the model using comparisons of Y-chromosome diversity in contemporary polygynous and monogamous populations. This model may provide a framework for addressing specific questions concerning human demography of the past.