Ancestral Population Sizes and Species Divergence Times in the Primate Lineage on the Basis of Intron and BAC End Sequences

Satta, Yoko; Hickerson, Michael; Watanabe, Hidemi; O’hUigin, Colm; Klein, Jan

doi:10.1007/s00239-004-2639-2

Published: October 2004

Volume 59, pages 478–487, (2004)

Journal of Molecular Evolution

Abstract

The effective sizes of ancestral populations and species divergence times of six primate species (humans, chimpanzees, gorillas, orangutans, and representatives of Old World monkeys and New World monkeys) are estimated by applying the two-species maximum likelihood (ML) method to intron sequences of 20 different loci. Examination of rate heterogeneity of nucleotide substitutions and intragenic recombination identifies five outlier loci (ODC1, GHR, HBE, INS, and HBG). The estimated ancestral polymorphism ranges from 0.21 to 0.96% at major divergences in primate evolution. One exceptionally low polymorphism occurs when African and Asian apes diverged. However, taking into consideration the possible short generation times in primate ancestors, it is concluded that the ancestral population size in the primate lineage was no smaller than that of extant humans. Furthermore, under the assumption of 6 million years (myr) divergence between humans and chimpanzees, the divergence times of humans from gorillas, orangutans, Old World monkeys, and New World monkeys are estimated as 7.2, 18, 34, and 65 myr ago, respectively, which are generally older than traditional estimates. Besides the intron sequences, three other data sets of orthologous sequences are used for the human–chimpanzee comparison. The ML application to these data sets, including 58,156 random BAC end sequences (BES), shows that the nucleotide substitution rate is as low as 0.6–0.8 × 10⁻⁹ per site per year and the extent of ancestral polymorphism is 0.33–0.51%. With such a low substitution rate and short generation time, the relatively high extent of polymorphism suggests a fairly large effective population size in the ancestral lineage common to humans and chimpanzees.

Introduction

Molecular genetic techniques, combined with rigorous statistical methods based on population genetic models that incorporate inherent stochasticity of nucleotide substitution processes or coalescence processes of genes in a population (Kingman 1982; Tajima 1983), allow us to answer questions regarding species divergence times and extents of ancestral polymorphism. Since orthologous genes from two species must have diverged before the divergence of the species, the divergence time of genes always exceeds that of the species. Hence, if we use the gene divergence as a representative of the species divergence and calculate the nucleotide substitution rate, we inevitably overestimate the rate. It is therefore important to determine to what extent the gene divergence time exceeds the species divergence time. The key factor is the effective size of the ancestral species, because the excess is determined by the coalescence process of ancestral lineages of orthologous genes. Several methods have been developed to estimate the ancestral population size together with the species divergence time (for review, see Edwards and Beerli 2000; Takahata and Satta 2002). These include the trichotomy method (Nei 1987; Wu 1991), the two-species maximum likelihood (ML) method (Takahata et al. 1995), and more generalized ML methods (Yang 1997, 2002; Rannala and Yang 2003; Wall 2003).

The trichotomy method uses genealogies of orthologous genes in three closely related species, such as humans, chimpanzees, and gorillas. Depending on the coalescence process of genes in the ancestral population of the two most closely related species, the genealogies may not be identical to the species phylogeny. The mean coalescence time [E(t)] is 2N generations for a pair of genes in the ancestral population with effective size N. If N is large compared to the time interval (T) between the two successive species divergences, the coalescence is likely to have taken place before the three species descended from the common ancestor. If this happens, the gene genealogy can differ from the species phylogeny in two-thirds of the cases so that the incompatibility probability is given by (2/3)e^{−T/(2N)} (Nei 1987). When the incompatibility probability and T are at hand, we can estimate N. In practice, the incompatibility probability must be estimated by comparisons of gene genealogies at a large number of different loci. However, sampling errors as well as recombination and/or multiple hits of nucleotide substitutions may obscure a true gene genealogy. In addition, it is difficult to determine the time interval T accurately. These uncertainties make the N estimate by the trichotomy method suggestive at best.
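As a small illustration of the trichotomy relation quoted above, the following Python sketch evaluates the incompatibility probability (2/3)e^{−T/(2N)} and inverts it for N. The function names and the numerical inputs are hypothetical and chosen only for illustration; T is assumed to be measured in generations, in line with E(t) = 2N generations.

```python
import math

def incompatibility_probability(T_generations, N):
    """Probability that a gene tree disagrees with the species tree:
    (2/3) * exp(-T / (2N)), with T in generations (Nei 1987)."""
    return (2.0 / 3.0) * math.exp(-T_generations / (2.0 * N))

def effective_size_from_incompatibility(p_incompatible, T_generations):
    """Invert the relation above to solve for N from an observed
    fraction of discordant gene trees."""
    if not 0.0 < p_incompatible < 2.0 / 3.0:
        raise ValueError("p_incompatible must lie strictly between 0 and 2/3")
    return -T_generations / (2.0 * math.log(1.5 * p_incompatible))

# Illustrative inputs only (not values from the paper's tables).
T = 50_000   # generations between the two successive species splits
N = 50_000   # ancestral effective population size
p = incompatibility_probability(T, N)
N_back = effective_size_from_incompatibility(p, T)
```

The inversion makes the practical difficulty visible: N depends on the logarithm of the estimated discordance fraction and scales linearly with T, so errors in either quantity carry straight through to the N estimate.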

The two-species ML method uses pairs of orthologous loci sampled from two species. The method divides the nucleotide divergences into two categories: the nucleotide divergences that have occurred before and after the speciation. If the time elapsed since speciation is t_s years and the coalescence time in the ancestral population is t generations, the divergence time of a pair of orthologous loci is t_s + tg years, where the generation time in the ancestral species is g years. If the nucleotide substitution rate per site per year is μ, the number (k_i) of substitutions at the ith locus accumulated in the t_s + tg interval is (t_s + tg) μL_i, where L_i stands for the number of nucleotides compared. For a large number of orthologous locus pairs, t_sμ is constant over all pairs but tgμ differs from locus to locus according to the exponential distribution with both mean and standard deviation 2Ngμ (Takahata et al. 1995; Takahata and Satta 1997). Based on this principle, the most likely estimates of t_s and t can be obtained to fit the variation of k_i among different loci. In order for the method to yield accurate estimates, however, two conditions must be fulfilled: the μ remains constant among different loci and the sites in each pair of orthologous loci are not shuffled by intragenic recombination. In practice, unfortunately, these conditions are often not fulfilled. The heterogeneity of μ across loci enlarges the variance of k_i, leading to an overestimation of the ancestral polymorphism [x = 2E(t) gμ = 4Ngμ] and an underestimation of the species divergence time (y = 2t_sμ). On the other hand, recombination reduces the variance of k_i and underestimates the ancestral polymorphism (Yang 1997, 2002; Takahata and Satta 2002; Wall 2003).
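The fitting principle described above can be sketched in Python as follows. This is not the authors' Mathematica program; it is a minimal sketch that assumes the per-locus count k_i is the sum of a Poisson variable with mean yL_i (substitutions after speciation) and a geometric variable with mean xL_i (substitutions during ancestral coalescence), which is the convolution form attributed to Takahata et al. (1995) later in the text. The function names, the grid search, and the toy data are hypothetical.

```python
import math

def log_prob_k(k, L, x, y):
    """log P(k | L, x, y) for Poisson(y*L) plus a geometric variable with mean x*L."""
    lam = y * L            # expected substitutions accumulated after speciation
    m = x * L              # expected substitutions from ancestral coalescence
    q = m / (1.0 + m)      # geometric parameter: P(n) = (1 - q) * q**n
    terms = []
    for j in range(k + 1):  # j substitutions after the split, k - j before it
        log_pois = -lam + j * math.log(lam) - math.lgamma(j + 1)
        log_geom = math.log(1.0 - q) + (k - j) * math.log(q)
        terms.append(log_pois + log_geom)
    top = max(terms)        # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def log_likelihood(data, x, y):
    """Sum of per-locus log probabilities; data is a list of (k_i, L_i) pairs."""
    return sum(log_prob_k(k, L, x, y) for k, L in data)

def grid_mle(data, xs, ys):
    """Crude ML estimate: the (x, y) grid point with the highest log-likelihood."""
    return max(((x, y) for x in xs for y in ys),
               key=lambda point: log_likelihood(data, *point))

# Made-up (k_i, L_i) pairs, for illustration only.
toy_data = [(8, 800), (15, 800), (6, 800), (12, 800), (21, 800), (9, 800)]
xs = [0.001 * i for i in range(1, 11)]
ys = [0.001 * i for i in range(1, 21)]
x_hat, y_hat = grid_mle(toy_data, xs, ys)
```

A grid search is used here only to keep the sketch dependency-free; a numerical optimizer would be the natural choice for real data.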

To satisfy the assumed constant rate of nucleotide substitutions among different loci, synonymous substitutions may be suitable (Kimura 1983) and be used in the two-species ML method (Takahata and Satta 1997; Takahata 2001). However, it turns out that the use of synonymous substitutions has raised several problems. The nucleotide diversity may vary among loci because of linkage to selected sites (Hartl and Clark 1997, pp. 184–185) or biased mutation pressure (Bielawski et al. 2000 and references therein). In addition, the number of synonymous sites at a locus is generally small and tends to be underestimated by frequently used methods if nucleotide substitutions are biased toward transitions (Nei and Kumar 2000, p. 57). To avoid these problems, we may use intron or intergenic sequences (Chen and Li 2001).

In the present study, we use intron sequences of 20 loci and make two-species ML estimates of the ancestral population size and species divergence time for pairs of six primate species, i.e., humans, chimpanzees, gorillas, orangutans, and representatives of Old World monkeys (OWMs) and New World monkeys (NWMs). For the human–chimpanzee pair, we also apply the two-species ML method to exon sequences of 37 loci (Takahata 2001), 53 intergenic sequences (Chen and Li 2001), and a set of 58,156 human–chimpanzee pairs of BAC end sequences (BES; Fujiyama et al. 2002).

Materials and Methods

Intron Sequences

The intron data set was taken from O’hUigin et al. (2002; Table 1), which contains noncoding sequences such as the 5′ or 3′ untranslated regions, promoter regions, and introns. Since functional constraints against transcription or translation regulation may operate on parts of the nonintron regions, we used only introns. In the case of genes containing more than one intron, we concatenated these intron sequences. There are 20 loci for which intron sequences are available in six primate species (or taxa): humans, chimpanzees, gorillas, orangutans, macaques or baboons, and tamarins or marmosets. Since OWMs and NWMs each form a monophyletic group, sequences from macaques/baboons and tamarins/marmosets were used as representatives of OWMs and NWMs, respectively. For simplicity, humans, chimpanzees, gorillas, orangutans, OWMs, and NWMs are abbreviated H, C, G, O, M, and T, respectively, throughout the text. Although another large set of 53 hominoid intergenic sequences is also available (Chen and Li 2001), we did not use it for two reasons. First, the set lacks OWM and NWM sequences. The divergence time of OWMs and NWMs is still controversial (Pilbeam 1984; Martin 1993; Kumar and Hedges 1998; Goodman et al. 1998; Takahata 2001; Glazko and Nei 2003; Hasegawa et al. 2003) and its estimation is one of the aims of the present study. Second, the sequences we used are about 800 bp long on average. They are longer than those in Chen and Li (2001) (∼500 bp on average) and therefore less prone to stochastic errors.

Table 1 The number of phylogenetically informative sites and the number of nucleotides in introns of 20 loci (sequences from O’hUigin et al. 2002)


Sequence Alignment and Phylogenetic Analysis

Sequences were aligned by the Clustal W program (Thompson et al. 1994) and the resulting alignments were modified manually. In the analysis, sites that include gaps were removed. For phylogenetic analysis, we used the neighbor-joining (NJ) method based on the number of nucleotide differences (the p-distances) as well as the maximum parsimony (MP) method implemented in PHYLIP 3.572 (Felsenstein 1993).

ML Method

The two-species ML method used here is essentially the same as that in Takahata and Satta (1997). One difference is that the present method implements multiple-hit corrections. The largest observed nucleotide difference among the six primates is about 10% at most, so that multiple-hit corrections were made by the Jukes and Cantor (1969) method (see also Nei and Kumar 2000, p. 23). The computer program is written in Mathematica (version 3.0; Wolfram Research Champaign IL) and is available on request.
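For reference, the Jukes and Cantor (1969) multiple-hit correction mentioned above converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3). The Python sketch below is only an illustration of that standard formula (the function name is hypothetical) and is not taken from the authors' Mathematica program.

```python
import math

def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# At the largest divergence quoted in the text (about 10%),
# the correction raises the estimate only slightly, to about 0.107.
d_at_ten_percent = jukes_cantor_distance(0.10)
```

With divergences of at most about 10%, the corrected and uncorrected values stay close, which is consistent with the choice of this simple one-parameter correction.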

Results

Substitution Rates of Intron Sequences

To examine whether the data set of 20 intron sequences (O’hUigin et al. 2002) is representative of the entire genome in terms of nucleotide divergences, we compared those of humans and chimpanzees with a collection of pairs of BES (Fujiyama et al. 2002). The collection consisted of 58,156 BES pairs, from which we chose 20 pairs at random. Repeating this subsampling 1000 times, we obtained the distributions of their mean and variance of nucleotide divergences (Fig. 1). The mean and variance in the 20 pairs of human and chimpanzee intron sequences are 0.0147 and 4.39 × 10⁻⁵, respectively, and both are within the 90% confidence regions of the mean and variance distributions for the BES random subsamples. We therefore concluded that the 20 intron sequences could be regarded as representatives of the human and chimpanzee genome.
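The subsampling check described above can be sketched as follows. The BES divergence values themselves are not reproduced in the text, so the sketch draws a synthetic stand-in from a gamma distribution; that stand-in, the use of the unbiased sample variance, the percentile rule for the 90% region, and all function names are assumptions made for illustration.

```python
import random
import statistics

def subsample_mean_var(values, size, replicates, rng):
    """Draw `replicates` random subsets of `size` values (without replacement)
    and return the list of subset means and the list of subset variances."""
    means, variances = [], []
    for _ in range(replicates):
        sample = rng.sample(values, size)
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))
    return means, variances

def central_interval(values, coverage=0.90):
    """Empirical central interval: cut (1 - coverage)/2 of the values from each tail."""
    ordered = sorted(values)
    tail = (1.0 - coverage) / 2.0
    lo = ordered[round(tail * len(ordered))]
    hi = ordered[round((1.0 - tail) * len(ordered)) - 1]
    return lo, hi

rng = random.Random(0)
# Synthetic stand-in for the 58,156 per-locus BES divergences (mean about 0.0124).
bes_divergences = [rng.gammavariate(2.0, 0.0062) for _ in range(58_156)]

means, variances = subsample_mean_var(bes_divergences, size=20, replicates=1000, rng=rng)
mean_lo, mean_hi = central_interval(means)
var_lo, var_hi = central_interval(variances)

# Intron values quoted in the text; the comparison is meaningful only with the real BES data.
observed_mean, observed_var = 0.0147, 4.39e-5
inside_region = (mean_lo <= observed_mean <= mean_hi) and (var_lo <= observed_var <= var_hi)
```

With the real BES divergences substituted for the synthetic list, `inside_region` corresponds to the check reported in the text.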

In comparisons between M or T and hominoids (H, C, G, or O), rate heterogeneity of nucleotide substitutions is apparent. O’hUigin et al. (2002) showed that 10%–20% of substituted sites have experienced multiple hits, even when the average nucleotide divergence is as low as 10%. Multiple substitutions often result in phylogenetically incompatible sites within a single gene or region, and in the intron sequence data set, several phylogenetically incompatible sites are observed (Table 1). If the extent of this incompatibility differs greatly from locus to locus, the cause might be attributed to rate heterogeneity of nucleotide substitutions among different loci. Therefore we counted the number of sites that experienced multiple substitutions by the maximum parsimony method, assuming the standard phylogenetic relationship among the six primate species, namely, ((((H,C,G)O)M)T). Incompatible sites among H, C, and G were ignored, because they have likely been generated by intragenic recombination (Satta et al. 2000; O’hUigin et al. 2002).

Interlocus rate heterogeneity was examined by the binomial distribution. Based on the average proportion of multiple hits over the 20 loci, the expected number (m_i) of multiple hits at the ith locus was calculated. We then obtained the probability (P_i) of having an equal or larger (smaller) number of multiple hits compared to the observation at the ith locus. Since the number of sites compared is large and m_i is small, we used the Poisson approximation to calculate P_i (Table 1). The result reveals that the insulin (INS) and γ-globin (HBG) introns show more frequent multiple substitutions than the expectation (Table 1; p < 0.05 and p < 0.001), suggesting that the nucleotide substitution rate at these loci is significantly higher than that at other loci.
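The per-locus test described above can be illustrated with a short Python sketch of the Poisson tail probabilities. The function names are hypothetical, and the exact convention used for P_i in Table 1 is not given in the text, so the tail definitions below are assumptions.

```python
import math

def expected_multiple_hits(sites_at_locus, total_hits, total_sites):
    """Expected count m_i at a locus from the average multiple-hit proportion over all loci."""
    return sites_at_locus * total_hits / total_sites

def poisson_pmf(k, mean):
    """P(X = k) for a Poisson variable, computed in log space."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_upper_tail(observed, mean):
    """P(X >= observed): probability of an equal or larger number of multiple hits."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(observed))

def poisson_lower_tail(observed, mean):
    """P(X <= observed): probability of an equal or smaller number of multiple hits."""
    return sum(poisson_pmf(k, mean) for k in range(observed + 1))

# Illustrative numbers only: a locus with 800 sites when 40 multiple hits
# were counted over 16,000 sites in total gives m_i = 2.
m_i = expected_multiple_hits(800, 40, 16_000)
p_excess = poisson_upper_tail(6, m_i)
```

A small upper-tail probability flags a locus with more multiple hits than expected under a uniform rate, which is how INS and HBG are singled out in the text.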

Phylogenetic Relationships

NJ trees at 18 of 20 loci are topologically identical to the standard phylogenetic relationships of the six primates. The two exceptions are the β2 microglobulin (B2M) and complement 4B (C4B) loci. The B2M tree has no substitutions on a branch leading to a cluster of (H,C,G) and in the C4B tree the same thing happens on a branch leading to a cluster of (H,C,G,O). Examination of phylogenetically informative sites (Table 1) and MP analyses (data not shown) confirm the absence of substitutions on these branches. However, since B2M and C4B do not show any significant shortages or excesses of the number of multiple hits (Table 1), it is unlikely that the unusual substitution patterns result from a slowdown or acceleration of the nucleotide substitution rate. Therefore we did not exclude these loci from the following ML analysis.

Intragenic Recombination Within Intron Sequences

To examine linkage between sites within a locus, we analyzed individual informative sites for their support of phylogenetic relationships among H, C, and G. The analysis reveals that four loci (ANP, TNF, DAF, and HBBP1) contain no informative sites with regard to the (H,C,G) relationships. Eleven loci contain informative sites that support one of the three possible relationships: F9, B2M, PAH, LCAT, BOP, and IL3 support the (H,C)G; APOA1, UOX, and C4B, the (C,G)H; and AFP and EPO, the (H,G)C. Finally, the remaining five loci (ODC1, GHR, INS, HBE, and HBG) show that some sites support one relationship, while others favor a different one even at a single locus (Table 1). For example, the HBG locus contains four phylogenetically informative sites, one of which supports the (H,C)G, another the (C,G)H, and two the (H,G)C relationship. In these five loci, intragenic recombination is, therefore, likely to have occurred in the ancestral population of the three species. Since these five loci include INS and HBG, at which rate heterogeneity is apparent, we may exclude them from the two-species ML analysis.

The Two-Species ML Method

The ancestral population size (N) and species divergence time (t_s) are obtained in terms of x = 4Ngμ and y = 2t_sμ in the two-species ML method. To check the reliability of these estimates for the 15 different pairs of the six primates, we divide these pairs into five classes with respect to shared ancestral populations. The classes are [(H,C)], [(H,G), (C,G)], [(H,O), (C,O), (G,O)], [(H,M), (C,M), (G,M), (O,M)], and [(H,T), (C,T), (G,T), (O,T), (M,T)]. We designate the ancestral populations of these classes HC, HCG, HCGO, HCGOM, and HCGOMT, respectively. By definition, members in each class share a common ancestral population immediately before their divergences. Thus, for example, the ancestral population of the (H,O) pair is also the ancestral population of the (G,O) and (C,O) pairs, and these three pairs are in turn all members of the same HCGO class. Because of the sharing of ancestral populations, the estimates of x and y must be the same, at least approximately, for all the pairs in a given class.
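The grouping of the 15 species pairs into five classes follows mechanically from the branching order H, C, G, O, M, T: a pair belongs to the class named after its more distantly branching member. The Python sketch below reproduces the listing given above; the names `SPECIES`, `CLASS_NAMES`, and `pair_classes` are hypothetical.

```python
from itertools import combinations

# Species ordered by increasing distance from humans, as assumed in the text.
SPECIES = ["H", "C", "G", "O", "M", "T"]
# Index of the deeper member of a pair -> name of the shared ancestral population.
CLASS_NAMES = {1: "HC", 2: "HCG", 3: "HCGO", 4: "HCGOM", 5: "HCGOMT"}

def pair_classes(species=SPECIES):
    """Group all species pairs by the ancestral population they share."""
    classes = {name: [] for name in CLASS_NAMES.values()}
    for a, b in combinations(species, 2):
        deeper = max(species.index(a), species.index(b))
        classes[CLASS_NAMES[deeper]].append((a, b))
    return classes

classes = pair_classes()
```

Estimates of x and y for pairs within one entry of `classes` are the ones expected to agree.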

Before excluding the five loci mentioned above, we applied the ML method to the entire data set of 20 intron sequences and estimated x and y for each of 15 pairs of species. The result reveals satisfactory consistency in estimates of y within each class and fairly large estimates of x, ranging from 0.42 to 1.5% (Table 2). With these as a reference, we applied the ML method to the trimmed data set, which excludes ODC1, GHR, INS, HBE, and HBG because of high nucleotide substitution rates or intragenic recombination (Table 1). The ML estimates of y are in good agreement with those for the entire data set. Because t_s remains constant among different loci, the y estimates are not much affected by trimming the data set. However, the x estimates become substantially smaller (Table 2).

Table 2 ML estimates (%) of x = 4Ngμ and y = 2t_sμ based on the entire or the trimmed data set of intron sequences and exon sequences


Discussion

Comparison Between Intron-Based and Exon-Based x and y Estimates

We compared the present estimates with the previous ML estimates based on exon sequences (Table 2). It is interesting that the x estimates based on exon sequences are close to those on the entire data set of intron sequences, suggesting that exon data still contain heterogeneous sequences regarding the nucleotide substitution rate or intragenic recombination. On the other hand, the y estimates based on exons are much larger than those on introns. Considering that the y estimates are not much affected by the exclusion of outlier sequences (Table 2), the relatively large y estimates based on the exon sequences are caused by an overestimation of synonymous divergences, but not by rate heterogeneity of nucleotide substitutions.

The x estimates are generally smaller in the trimmed data set than in the entire data set of 20 intron sequences as well as in the exon data set. In particular, the x estimate for the HCGO class is consistently much smaller than that of any other (Table 2) and is as small as that of extant humans (ca. 0.1%). Although further accumulation of intron sequences is necessary, this may suggest that the primate lineage has experienced a reduction of the population size when Asian apes diverged from African apes (see later).

Patterns of Nucleotide Substitutions in Human–Chimpanzee Comparisons

We examined whether or not the variation of the nucleotide divergences observed in human–chimpanzee comparisons can be explained by factors other than a relatively large ancestral population size. Specifically we focused on the effect of a limited number of sites compared and different substitution rates among different loci. To evaluate the effect, we performed a computer simulation that imitates the human and chimpanzee BES data.

We consider three nucleotide substitution models. The first focuses on the variation of the nucleotide divergence caused by a limited number of sites compared at individual BES loci. We use a constant nucleotide substitution rate and assume that the coalescence time in the ancestral population is negligibly small compared to the species divergence time. Using the observed mean nucleotide divergence per site (d_HC) over 58,156 BES loci, the expected number of nucleotide substitutions at the ith BES locus is estimated as d_HCL_i, where L_i is the number of nucleotides compared. Setting d_HCL_i as a Poisson parameter, we generate a Poisson random variable (k_i) for the ith locus and calculate the number of nucleotide substitutions per site as k_i/L_i. Repeating this process 58,156 times, we obtain the distribution of k_i/L_i (blue line in Fig. 2).

The second model is based on the negative binomial distribution, and, as in the first, we ignore the presence of ancestral polymorphism. The variation of nucleotide divergences is then attributed mainly to the variation in the substitution rate among different loci (Yang 2002). Following equation (5.14) in Takahata and Satta (2002), we estimate the shape parameter α (α = 5.82) of the gamma distribution of the nucleotide substitution rate and calculate the mean substitution rate (r) from d_HC = 2rt_s, assuming that t_s = 6 × 10⁶ years (Brunet et al. 2002). We then generate a gamma variable γ_i for the substitution rate at the ith locus and determine the number of nucleotide substitutions (k_i) by following the Poisson distribution with mean 2t_sL_iγ_i. This procedure is equivalent to generating a random variable that follows the negative binomial distribution. Again repeating this process 58,156 times, we obtain the distribution of k_i/L_i (green line in Fig. 2).

The third model is based on the convolution of the geometric and Poisson distributions, as derived in Takahata et al. (1995), and takes explicit account of ancestral polymorphism. Some extent of the variation in the number of nucleotide substitutions can be attributed to the variation in coalescence times in the ancestral population. To simulate this model, we first estimate x_HC and y_HC from the 58,156 BES data. Since y_HC = 2t_sμ, we generate a Poisson variable with mean y_HCL_i for the number of nucleotide substitutions at the ith locus that can accumulate after the species divergence. We also generate a random variable that is geometrically distributed with mean x_HCL_i for the number of nucleotide substitutions during the phase of ancestral polymorphism. Dividing the sum of these Poisson and geometric random numbers by L_i, we obtain a per-site random variable (x_i + y_i) for each of 58,156 loci and plot the distribution (red line in Fig. 2).
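The three models can be imitated with a short Python simulation. This is a minimal sketch and not the authors' program: the parameter values d_HC = 0.0124, α = 5.82, x = 0.51%, and y = 0.73% are those quoted in the text, whereas the fixed locus length L = 500, the reduced number of loci (20,000), and all function names are assumptions made for illustration. Because the real L_i vary among loci, the variances produced here are not expected to match the published ones.

```python
import math
import random
import statistics

def poisson_sample(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def geometric_sample(mean, rng):
    """Geometric variable on {0, 1, 2, ...} with the given mean, by inverse transform."""
    q = mean / (1.0 + mean)
    u = 1.0 - rng.random()          # uniform on (0, 1]
    return int(math.log(u) / math.log(q))

def simulate(model, n_loci, L, rng, d=0.0124, alpha=5.82, x=0.0051, y=0.0073):
    """Return simulated per-site divergences k_i / L under one of the three models."""
    per_site = []
    for _ in range(n_loci):
        if model == "poisson":              # model 1: constant rate, no ancestral polymorphism
            k = poisson_sample(d * L, rng)
        elif model == "negative_binomial":  # model 2: gamma-distributed rate among loci
            rate_multiplier = rng.gammavariate(alpha, 1.0 / alpha)  # mean 1, shape alpha
            k = poisson_sample(d * L * rate_multiplier, rng)
        elif model == "convolution":        # model 3: Poisson (after split) + geometric (before)
            k = poisson_sample(y * L, rng) + geometric_sample(x * L, rng)
        else:
            raise ValueError(f"unknown model: {model}")
        per_site.append(k / L)
    return per_site

rng = random.Random(1)
models = ("poisson", "negative_binomial", "convolution")
results = {m: simulate(m, n_loci=20_000, L=500, rng=rng) for m in models}
summary = {m: (statistics.fmean(v), statistics.variance(v)) for m, v in results.items()}
```

All three models share the same expected divergence by construction, so only the spread of k_i/L_i distinguishes them, which is why the comparison in the text rests on variances and on the shape of the distributions.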

As expected, the mean of sequence divergences in each of the above three models is the same as the observation (0.0124). However, the variance differs among the models. Whereas the variance in the third convolution model (7.54 × 10⁻⁵) is in good agreement with the observation (7.55 × 10⁻⁵), the variance in both the Poisson and the negative binomial models (4.22 × 10⁻⁵ and 6.85 × 10⁻⁵) is somewhat smaller. In fact, the Kolmogorov–Smirnov test (Sokal and Rohlf 1969, pp. 704–721) reveals that the first and second models do not fit the observation (p < 0.01 for each case). We therefore conclude that the distribution of sequence divergences best fits the observation of the BES data set under the convolution model (Fig. 2; p > 0.05). We also find that at least for humans and chimpanzees, the variation in nucleotide divergences among loci does not appear to be much affected by heterogeneity in nucleotide substitution rates.

Human–Chimpanzee Ancestral Population Size

There are four data sets of nucleotide sequences, which can be used for the ML estimation of the ancestral human and chimpanzee population size (Chen and Li 2001; Takahata 2001; O’hUigin et al. 2002; Fujiyama et al. 2002). They are 53 intergenic regions, 37 exonic regions, 15 introns, and 58,156 BES, respectively. Of these, the ML estimate from the BES data seems the most reliable because of an exceptionally large number of loci examined (Fig. 3). To evaluate the effect of the number of loci on our ML estimates, we resample 20, 50, or 100 loci from the BES data and examine the estimates of x and y based on 1000 such replications. The estimates obtained for the entire BES data are x = 0.51% and y = 0.73%. However, as the number of loci becomes small, the range of both x and y estimates becomes broad. Even for 100 loci and under the condition of 95% confidence limits, the x estimate ranges from 0.25 to 0.76% and the y estimate ranges from 0.59 to 0.99% (data not shown). The 90% confidence region of x and y for BES is extremely small compared with that for other data sets (Fig. 3).

Table 3 Estimates of the extent of polymorphism (x = 4Ngμ) or effective size (N) in the ancestral population and of species divergence (y = 2t_sμ) for humans and chimpanzees by the maximum likelihood (ML) method


It may be noted that the x estimate for the intergenic sequences in Chen and Li (2001) is quite small (Table 3). To make a quantitative assessment, we calculate the mean and variance of nucleotide substitutions over the 53 loci and compare them with those in 1000 replications of 53 resampled BES data sets. The mean of Chen and Li’s data set is 1.23%, which is in good agreement with the 1.24% for the resampled BES data. However, the variance of Chen and Li’s data is only 3.01 × 10⁻⁵, which is significantly smaller (p < 0.01) than that of the BES data (7.55 × 10⁻⁵). Thus, although the cause is unknown, Chen and Li’s data show an unexpected uniformity in the extent of nucleotide substitutions between humans and chimpanzees.

Yang (2002) developed a method for estimating an ancestral population size using ML and Bayesian approaches. Taking into consideration different substitution rates among different loci, he applied these approaches to Chen and Li’s data and obtained x = 0.1%, which is almost the same as that for extant humans (Li and Sadler 1991). However, if the ancestral population size were the same as the extant human population size (10⁴), most pairs of H and C orthologous genes should have coalesced within the ancestral population. Under the assumption of the ancestral population size of 10⁴ individuals and the interval of T = 1 myr between the human–chimpanzee divergence and the (human–chimpanzee)–gorilla divergence, the proportion of discordance between the species and the gene tree becomes 0.1% from the trichotomy method (Nei 1987). In other words, 99.9% of the data should have supported the (H,C)G relationship, but in fact only 42% do (Chen and Li 2001). The small estimated value of x is not owing to the methodology since the simple two-species ML method also gives x = 0.099% (Table 3). Since the small estimate of x cannot be explained by accounting for heterogeneity of nucleotide substitution rates, it must result from an unusually small variance in the number of substitutions.

There still remain differences among the ML estimates of x and y in other data sets (Table 3). Nonetheless, we can draw two conclusions. First, except for the estimate by Takahata (2001), which appears to be affected by overestimation of synonymous divergences, the y estimate for chimpanzees and humans ranges only from 0.73 to 1.04% (Table 3). Assuming the divergence time of 6 myr between the two species (Brunet et al. 2002), we estimate the nucleotide substitution rate as 0.6–0.8 × 10⁻⁹ per site per year. This rate is lower than generally accepted (cf. Li 1997). Second, the x estimate ranges from 0.33 to 0.51%. These values are four to five times larger than the estimate of the extant human population (Li and Sadler 1991). If we further take account of a prolonged generation time of extant humans, the effective size of the ancestral human–chimpanzee population must have been approximately 10 times larger than 10⁴ for extant humans (Takahata and Satta 1997; Takahata 2001).
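The conversions behind these two conclusions follow directly from the definitions y = 2t_sμ and x = 4Ngμ. The Python sketch below works through them for the BES estimates (x = 0.51%, y = 0.73%) with t_s = 6 myr; the ancestral generation time of 15 years is a hypothetical value chosen only for illustration and is not given in the text, and the function names are likewise assumptions.

```python
def substitution_rate(y, t_s_years):
    """Per-site, per-year rate from y = 2 * t_s * mu."""
    return y / (2.0 * t_s_years)

def effective_size(x, generation_time_years, mu):
    """Ancestral effective size from x = 4 * N * g * mu."""
    return x / (4.0 * generation_time_years * mu)

t_s = 6.0e6                   # assumed human-chimpanzee divergence time (years)
x_bes, y_bes = 0.0051, 0.0073 # ML estimates for the entire BES data set

mu = substitution_rate(y_bes, t_s)   # about 6.1e-10 per site per year
g_assumed = 15.0                     # hypothetical ancestral generation time (years)
N_ancestral = effective_size(x_bes, g_assumed, mu)
```

Substituting μ = y/(2t_s) gives N = x·t_s/(2gy), so the resulting N is inversely proportional to the assumed generation time; under the 15-year assumption it comes out on the order of 10⁵.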

Demographic History of Primate Populations During the Last 50 myr

Discrepancies between molecular and paleontological estimates of primate divergence time have been pointed out recently (Martin 1993; Tavaré et al. 2002), and a new statistical approach pushes the last common ancestor of primates back to as early as 81.5 myr ago. Martin (1993) suggested that the divergence time of major nodes in the primate phylogeny was pushed back at least 10 myr, and our results support this view. If we assume that the divergence time between humans and chimpanzees is 6 myr (Brunet et al. 2002), our ML estimates of y (Table 2) suggest that the divergence times of the major nodes in the primate phylogeny become 7.2 myr for (H,C)G, 18 myr for (H,C,G)O, 34 myr for (H,C,G,O)M, and 65 myr for (H,C,G,O,M)T. These divergence times are older than those indicated by fossil records.
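The calibration step used here amounts to scaling the 6-myr human–chimpanzee date by the ratio of y estimates, since y = 2t_sμ and the same μ is assumed for every class. The Python sketch below states that scaling and, because the Table 2 values are not reproduced in the text, works backward from the reported dates to the y ratios they imply; the function and variable names are hypothetical.

```python
def divergence_time(y_pair, y_reference, t_reference_myr=6.0):
    """Scale the calibration date by the ratio of y estimates
    (assumes the same substitution rate mu on all lineages)."""
    return t_reference_myr * y_pair / y_reference

# Dates reported in the text (myr) for the major nodes.
reported_myr = {"(H,C)G": 7.2, "(H,C,G)O": 18.0, "(H,C,G,O)M": 34.0, "(H,C,G,O,M)T": 65.0}

# Ratio y_class / y_HC implied by each reported date under the 6-myr calibration.
implied_y_ratio = {node: t / 6.0 for node, t in reported_myr.items()}
```

Any revision of the 6-myr calibration rescales all four dates by the same factor, which is worth keeping in mind when comparing them with fossil-based estimates.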

Several molecular approaches have recently been used to estimate the divergence times of primate species (Kumar and Hedges 1998; Glazko and Nei 2003; Hasegawa et al. 2003). When we compare our results with these estimates, our estimate of the divergence time of gorillas from humans (7.2 myr) shows good agreement with others (ranging from 7 to 12 myr). Similarly, our estimate of the divergence time of humans from orangutans (18 myr) appears to be in the range of others (ranging from 8 to 18 myr). In addition, this relatively old divergence time of orangutans is consistent with the time when the African continent became joined with Eurasia some 18 myr ago (Waddell and Penny 1996). However, regarding more ancient divergences, there are large discrepancies among various estimates. For instance, Kumar and Hedges (1998) estimated the divergence time of OWMs from humans as 21–24 myr and Glazko and Nei (2003) obtained a similar estimate (21–25 myr). On the other hand, Hasegawa et al. (2003) estimated the divergence to be as old as 31–38 myr. Our estimate is consistent with the latter. Furthermore, Kumar and Hedges (1998) and Glazko and Nei (2003) estimated the date of NWM divergence as 39–56 and 32–36 myr, respectively. On the other hand, our estimate is much older (65 myr). Although this discrepancy may come from the different data and methods used, it is evident that more studies of the primate phylogeny are necessary, especially to reach a consensus about the divergence times.

To convert the amount of ancestral polymorphism (measured by x = 4Ngμ) into the effective size (N) of the ancestral population, information on the generation time in that population is required. Although there are uncertainties about the generation time of nonhuman primates, it is shorter than the generation time of extant humans (Gavan 1953). Under this assumption, the estimated values of x suggest that the ancestral population size has been of the order of 10⁵ throughout most of primate evolution, although there might be an occasional reduction as discussed earlier. It also appears that such a large size of the ancestral population of humans, chimpanzees, and gorillas is consistent with the high extent of DNA polymorphism in extant nonhuman primates (Kaessmann et al. 1999, 2001; Satta 2001).

References

Bielawski JP, Dunn KA, Yang Z (2000) Rates of nucleotide substitution and mammalian nuclear gene evolution: Approximate and maximum-likelihood methods lead to different conclusions. Genetics 156:1299–1308. PMID:11063703
Brunet M, Guy F, Pilbeam D, et al. (2002) A new hominid from the Upper Miocene of Chad, Central Africa. Nature 418:145–151. doi:10.1038/nature00879. PMID:12110880
Chen F-C, Li W-H (2001) Genomic divergence between human and other hominoids and the effective population size of the common ancestor of human and chimpanzee. Am J Hum Genet 68:444–456. doi:10.1086/318206. PMID:11170892
Edwards SV, Beerli P (2000) Perspective: Gene divergence, population divergence, and the variance in coalescence time in phylogeographic studies. Evolution 54:1839–1854. PMID:11209764
Felsenstein J (1993) PHYLIP (Phylogeny Inference Package) version 3.572. Distributed by the author. Department of Genetics, University of Washington, Seattle
Fujiyama A, Watanabe H, Toyoda A, et al. (2002) Construction and analysis of a human-chimpanzee comparative clone map. Science 295:131–134. doi:10.1126/science.1065199. PMID:11778049
Gavan JA (1953) Growth and development of the chimpanzee; A longitudinal and comparative study. Hum Biol 25:93–143. PMID:13096075
Glazko GV, Nei M (2003) Estimation of divergence times for major lineages of primate species. Mol Biol Evol 20:424–434. doi:10.1093/molbev/msg050. PMID:12644563
Goodman M, Porter CA, Czelusniak J, Page SL, Schneider H, Shoshani J, Gunnell G, Groves CP (1998) Toward a phylogenetic classification of primates based on DNA evidence complemented by fossil evidence. Mol Phylogenet Evol 9:585–598. doi:10.1006/mpev.1998.0495. PMID:9668008
Hartl DL, Clark AG (1997) Principles of population genetics, 3rd ed. Sinauer Associates, Sunderland, MA
Hasegawa M, Thorne JL, Kishino H (2003) Time scale of eutherian evolution estimated without assuming a constant rate of molecular evolution. Genes Genet Syst 78:267–283. doi:10.1266/ggs.78.267. PMID:14532706
Jukes TH, Cantor CR (1969) Evolution of protein molecules. In: Munro HN (ed) Mammalian protein metabolism III. Academic Press, New York, pp 21–132
Kaessmann H, Wiebe V, Pääbo S (1999) Extensive nuclear DNA sequence diversity among chimpanzees. Science 286:1159–1162
Kaessmann H, Wiebe V, Weiss G, Pääbo S (2001) Great ape DNA sequences reveal a reduced diversity and an expansion in humans. Nature Genet 27:155–156. doi:10.1038/84773. PMID:11175781
Kimura M (1983) The neutral theory of molecular evolution. Cambridge University Press, Cambridge
Kingman JFC (1982) On the genealogy of large populations. J Appl Prob A19:27–43
Kumar S, Hedges B (1998) A molecular timescale for vertebrate evolution. Nature 392:917–920. doi:10.1038/31927. PMID:9582070
Li W-H (1997) Molecular evolution. Sinauer Associates, Sunderland, MA
Li W-H, Sadler LA (1991) Low nucleotide diversity in man. Genetics 129:513–523. PMID:1743489
Martin RD (1993) Primate origins: Plugging the gaps. Nature 363:223–234. doi:10.1038/363223a0. PMID:8487862
Nei M (1987) Molecular evolutionary genetics. Columbia University Press, New York
Nei M, Kumar S (2000) Molecular evolution and phylogenetics. Oxford University Press, New York
O’hUigin C, Satta Y, Takahata N, Klein J (2002) Contribution of homoplasy and of ancestral polymorphism to the evolution of genes in anthropoid primates. Mol Biol Evol 19:1501–1513. PMID:12200478
Pilbeam D (1984) The descent of hominoids and hominids. Sci Am 250:84–96. PMID:6422549
Rannala B, Yang Z (2003) Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics 164:1645–1656. PMID:12930768
Satta Y (2001) Comparison of DNA and protein polymorphisms between humans and chimpanzees. Genes Genet Syst 76:159–168. doi:10.1266/ggs.76.159. PMID:11569499
Satta Y, Klein J, Takahata N (2000) DNA archives and our nearest relative: the trichotomy problem revisited. Mol Phylogenet Evol 14:259–275. doi:10.1006/mpev.2000.0704. PMID:10679159
Sokal RR, Rohlf FJ (1969) Biometry. W.H. Freeman, New York
Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105:437–460. PMID:6628982
Takahata N (2001) Molecular phylogeny and demographic history of humans. In: Tobias PV, Raath MA, Moggi-Cecchi J, Doyle GA (eds) Humanity from African naissance to coming millennia. Firenze University Press, Johannesburg, pp 299–305
Takahata N, Satta Y (1997) Evolution of the primate lineage leading to modern humans: Phylogenetic and demographic inferences from DNA sequences. Proc Natl Acad Sci USA 94:4811–4815. doi:10.1073/pnas.94.9.4811. PMID:9114074
Takahata N, Satta Y (2002) Pre-speciation coalescence and the effective size of ancestral populations. In: Slatkin M, Veuille M (eds) Modern developments in theoretical population genetics. Oxford University Press, New York, pp 52–71
Takahata N, Satta Y, Klein J (1995) Divergence time and population size in the lineage leading to modern humans. Theor Popul Biol 48:198–221. doi:10.1006/tpbi.1995.1026. PMID:7482371
Tavaré S, Marshall CR, Will O, Soligo C, Martin RD (2002) Using the fossil record to estimate the age of the last common ancestor of extant primates. Nature 416:726–729. doi:10.1038/416726a. PMID:11961552
Thompson JD, Higgins DG, Gibson TJ (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22:4673–4680. PMID:7984417
Waddell P, Penny D (1996) Evolutionary trees of apes and humans from DNA sequences. In: Lock AJ, Peters CR (eds) Handbook of human symbolic evolution. Oxford University Press, Oxford, pp 53–73
Wall JD (2003) Estimating ancestral population sizes and divergence times. Genetics 163:395–404. PMID:12586724
Wu C-I (1991) Inferences of species phylogeny in relation to segregation of ancient polymorphisms. Genetics 127:429–435. PMID:2004713
Yang Z (1997) On the estimation of ancestral population sizes of modern humans. Genet Res Cambr 69:111–116. doi:10.1017/S001667239700270X
Yang Z (2002) Likelihood and Bayesian estimation of ancestral population sizes in hominoids using data from multiple loci. Genetics 162:1811–1823. PMID:12524351

Acknowledgments

We thank N. Takahata for his critical reading of an early version of this paper. We also thank two anonymous reviewers for their helpful comments. This research was supported in part by Japan Society for the Promotion of Science Grant 12304046 and in part by a MEXT summer program grant to M.H.

Author information

Authors and Affiliations

Department of Biosystems Science, Graduate University for Advanced Studies, Hayama, Kanagawa, 240-0193, Japan
Yoko Satta & Michael Hickerson
Department of Biology, Duke University, Durham, NC, 90338, USA
Michael Hickerson
Department of Bioinformatics and Genomics, Nara Institute of Science and Technology, Ikoma, Nara, 630-0101, Japan
Hidemi Watanabe
National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Colm O’hUigin
Abteilung Immungenetik, Max-Planck-Institut fuer Biologie, Corrensstrasse 42, D-72076, Tuebingen, Germany
Jan Klein

Corresponding author

Correspondence to Yoko Satta.

Additional information

[Reviewing Editor: Dr. Magnus Nordborg]

About this article

Cite this article

Satta, Y., Hickerson, M., Watanabe, H. et al. Ancestral Population Sizes and Species Divergence Times in the Primate Lineage on the Basis of Intron and BAC End Sequences. J Mol Evol 59, 478–487 (2004). https://doi.org/10.1007/s00239-004-2639-2

Received: 05 October 2003
Accepted: 14 April 2004
Issue Date: October 2004
DOI: https://doi.org/10.1007/s00239-004-2639-2

Introduction