Introduction

Molecular genetic techniques, combined with rigorous statistical methods based on population genetic models that incorporate inherent stochasticity of nucleotide substitution processes or coalescence processes of genes in a population (Kingman 1982; Tajima 1983), allow us to answer questions regarding species divergence times and extents of ancestral polymorphism. Since orthologous genes from two species must have diverged before the divergence of the species, the divergence time of genes always exceeds that of the species. Hence, if we use the gene divergence as a representative of the species divergence and calculate the nucleotide substitution rate, we inevitably overestimate the rate. It is therefore important to determine to what extent the gene divergence time exceeds the species divergence time. The key factor is the effective size of the ancestral species, because the excess is determined by the coalescence process of ancestral lineages of orthologous genes. Several methods have been developed to estimate the ancestral population size together with the species divergence time (for review, see Edwards and Beerli 2000; Takahata and Satta 2002). These include the trichotomy method (Nei 1987; Wu 1991), the two-species maximum likelihood (ML) method (Takahata et al. 1995), and more generalized ML methods (Yang 1997, 2002; Rannala and Yang 2003; Wall 2003).

The trichotomy method uses genealogies of orthologous genes in three closely related species, such as humans, chimpanzees, and gorillas. Depending on the coalescence process of genes in the ancestral population of the two most closely related species, the genealogies may not be identical to the species phylogeny. The mean coalescence time [E(t)] is 2N generations for a pair of genes in the ancestral population with effective size N. If N is large compared to the time interval (T) between the two successive species divergences, the coalescence is likely to have taken place before the three species descended from the common ancestor. If this happens, the gene genealogy can differ from the species phylogeny in two-thirds of the cases, so that the incompatibility probability is given by (2/3)e^{-T/(2N)}, with T measured in generations (Nei 1987). When the incompatibility probability and T are at hand, we can estimate N. In practice, the incompatibility probability must be estimated by comparing gene genealogies at a large number of different loci. However, sampling errors as well as recombination and/or multiple hits of nucleotide substitutions may obscure the true gene genealogy. In addition, it is difficult to determine the time interval T accurately. These uncertainties make the N estimate by the trichotomy method suggestive at best.
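The relationship between the incompatibility probability, T, and N can be illustrated with a minimal Python sketch. The function names are hypothetical and introduced only for illustration; T is assumed to be expressed in generations, and the inversion is valid only when the probability lies strictly between 0 and 2/3.

```python
import math


def incompatibility_probability(T, N):
    """Probability that a gene tree disagrees with the species tree.

    T: generations between the two successive speciation events.
    N: effective size of the ancestral population.
    """
    return (2.0 / 3.0) * math.exp(-T / (2.0 * N))


def ancestral_size(P, T):
    """Solve P = (2/3) exp(-T / (2N)) for N; requires 0 < P < 2/3."""
    if not 0.0 < P < 2.0 / 3.0:
        raise ValueError("P must lie strictly between 0 and 2/3")
    return -T / (2.0 * math.log(1.5 * P))
```

Because N depends on the logarithm of the estimated probability, small sampling errors in the proportion of incompatible genealogies translate into large errors in N when that proportion is close to 0 or to 2/3.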

The two-species ML method uses pairs of orthologous loci sampled from two species. The method divides the nucleotide divergences into two categories: those that occurred before the speciation and those that occurred after it. If the time elapsed since speciation is t_s years and the coalescence time in the ancestral population is t generations, the divergence time of a pair of orthologous loci is t_s + tg years, where g is the generation time (in years) in the ancestral species. If the nucleotide substitution rate per site per year is μ, the number (k_i) of substitutions at the ith locus accumulated in the t_s + tg interval is (t_s + tg)μL_i, where L_i stands for the number of nucleotides compared. For a large number of orthologous locus pairs, t_sμ is constant over all pairs but tgμ differs from locus to locus according to the exponential distribution with both mean and standard deviation 2Ngμ (Takahata et al. 1995; Takahata and Satta 1997). Based on this principle, the most likely estimates of t_s and t can be obtained to fit the variation of k_i among different loci. For the method to yield accurate estimates, however, two conditions must be fulfilled: μ must be constant among loci, and the sites in each pair of orthologous loci must not be shuffled by intragenic recombination. In practice, unfortunately, these conditions are often not fulfilled. Heterogeneity of μ across loci enlarges the variance of k_i, leading to an overestimation of the ancestral polymorphism [x = 2E(t)gμ = 4Ngμ] and an underestimation of the species divergence time (y = 2t_sμ). On the other hand, recombination reduces the variance of k_i and leads to an underestimation of the ancestral polymorphism (Yang 1997, 2002; Takahata and Satta 2002; Wall 2003).
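The likelihood underlying this principle can be sketched in Python. This is only an illustrative reconstruction, not the program used in the study: it assumes that the substitutions accumulated after speciation are Poisson distributed with mean yL_i, that those accumulated during the exponentially distributed coalescence time are geometrically distributed with mean xL_i, and that the two parts are independent. The function names and the coarse grid search are hypothetical.

```python
import math


def log_prob_k(k, L, x, y):
    """log P(k) when k = Poisson(y*L) + Geometric(mean x*L); needs y*L > 0."""
    lam = y * L                      # expected substitutions after speciation
    theta = x * L                    # expected substitutions in the ancestor
    p = 1.0 / (1.0 + theta)
    q = theta / (1.0 + theta)
    total = 0.0
    for j in range(k + 1):           # j substitutions after speciation
        log_pois = -lam + j * math.log(lam) - math.lgamma(j + 1)
        total += math.exp(log_pois) * p * q ** (k - j)
    return math.log(total)


def log_likelihood(data, x, y):
    """data: iterable of (k_i, L_i) pairs, one per locus."""
    return sum(log_prob_k(k, L, x, y) for k, L in data)


def grid_mle(data, x_grid, y_grid):
    """Return the (x, y) pair on the grid with the highest log likelihood."""
    best = max((log_likelihood(data, x, y), x, y)
               for x in x_grid for y in y_grid)
    return best[1], best[2]
```

A grid search is used here only for transparency; any numerical optimizer could replace it. Multiple-hit corrections and rate heterogeneity are deliberately left out of the sketch.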

Synonymous substitutions may satisfy the assumption of a constant nucleotide substitution rate among different loci (Kimura 1983) and have been used in the two-species ML method (Takahata and Satta 1997; Takahata 2001). However, their use has raised several problems. The nucleotide diversity may vary among loci because of linkage to selected sites (Hartl and Clark 1997, pp. 184–185) or biased mutation pressure (Bielawski et al. 2000 and references therein). In addition, the number of synonymous sites at a locus is generally small and tends to be underestimated by frequently used methods if nucleotide substitutions are biased toward transitions (Nei and Kumar 2000, p. 57). To avoid these problems, we may use intron or intergenic sequences (Chen and Li 2001).

In the present study, we use intron sequences of 20 loci and make two-species ML estimates of the ancestral population size and species divergence time for pairs of six primate species, i.e., humans, chimpanzees, gorillas, orangutans, and representatives of Old World monkeys (OWMs) and New World monkeys (NWMs). For the human–chimpanzee pair, we also apply the two-species ML method to exon sequences of 37 loci (Takahata 2001), 53 intergenic sequences (Chen and Li 2001), and a set of 58,156 human–chimpanzee pairs of BAC End Sequences (BES; Fujiyama et al. 2002).

Materials and Methods

Intron Sequences

The intron data set was taken from O’hUigin et al. (2002; Table 1), which contains noncoding sequences such as the 5′ or 3′ untranslated regions, promoter regions, and introns. Since functional constraints against transcription or translation regulation may operate on parts of the nonintron regions, we used only introns. In the case of genes containing more than one intron, we concatenated these intron sequences. There are 20 loci for which intron sequences are available in six primate species (or taxa): humans, chimpanzees, gorillas, orangutans, macaques or baboons, and tamarins or marmosets. Since OWMs and NWMs are monophyletic to each other and to other primates, sequences from macaques/baboons and tamarins/marmosets were used as a representative of OWMs and NWMs, respectively. For simplicity, humans, chimpanzees, gorillas, orangutans, OWMs, and NWMs are abbreviated H, C, G, O, M, and T, respectively, throughout the text. Although another large set of 53 hominoid intergenic sequences is also available (Chen and Li 2001), we did not use it for two reasons. First, the set lacks OWM and NWM sequences. The divergence time of OWMs and NWMs is still controversial (Pilbeam 1984; Martin 1993; Kumar and Hedges 1998; Goodman et al. 1998; Takahata 2001; Glazko and Nei 2003; Hasegawa et al. 2003) and its estimation is one of the aims of the present study. Second, the sequences we used are about 800 bp long on average. They are longer than in Chen and Li (2001) (∼500 bp on average) and therefore less prone to stochastic errors.

Table 1 The number of phylogenetically informative sites and the number of nucleotides in introns of 20 loci (sequences from O’hUigin et al. 2002)

Sequence Alignment and Phylogenetic Analysis

Sequences were aligned by the Clustal W program (Thompson et al. 1994) and the resulting alignments were modified manually. In the analysis, sites that include gaps were removed. For phylogenetic analysis, we used the neighbor-joining (NJ) method based on the number of nucleotide differences (the p-distances) as well as the maximum parsimony (MP) method implemented in PHYLIP 3.572 (Felsenstein 1993).

ML Method

The two-species ML method used here is essentially the same as that in Takahata and Satta (1997). One difference is that the present method implements multiple-hit corrections. The largest observed nucleotide difference among the six primates is about 10%, so multiple-hit corrections were made by the Jukes and Cantor (1969) method (see also Nei and Kumar 2000, p. 23). The computer program is written in Mathematica (version 3.0; Wolfram Research, Champaign, IL) and is available on request.
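For reference, the Jukes and Cantor correction converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The short Python sketch below is illustrative only (the original program is in Mathematica); the function name is hypothetical.

```python
import math


def jukes_cantor_distance(p):
    """Estimated substitutions per site from the proportion p of differing sites.

    The correction is defined only for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

At p = 0.10 the corrected distance is about 0.107, so the correction changes the divergence by less than one percentage point over the range of differences considered here.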

Results

Substitution Rates of Intron Sequences

To examine whether the data set of 20 intron sequences (O’hUigin et al. 2002) is representative of the entire genome in terms of nucleotide divergences, we compared those of humans and chimpanzees with a collection of pairs of BES (Fujiyama et al. 2002). The collection consisted of 58,156 BES pairs, from which we chose 20 pairs at random. Repeating this subsampling 1000 times, we obtained the distributions of the mean and variance of nucleotide divergences (Fig. 1). The mean and variance in the 20 pairs of human and chimpanzee intron sequences are 0.0147 and 4.39 × 10^{-5}, respectively, and both are within the 90% confidence regions of the mean and variance distributions for the BES random subsamples. We therefore concluded that the 20 intron sequences could be regarded as representative of the human and chimpanzee genomes.
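The subsampling procedure can be sketched as follows. This is a hedged illustration rather than the original analysis: the function names are hypothetical, sampling is assumed to be without replacement within each replicate, the unbiased sample variance is assumed, and the 90% region is taken as the central interval of the resampled values.

```python
import random
import statistics


def subsample_distributions(divergences, n_loci=20, n_reps=1000, seed=0):
    """Means and variances of n_reps random subsamples of n_loci divergences."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(n_reps):
        sample = rng.sample(divergences, n_loci)   # without replacement
        means.append(statistics.fmean(sample))
        variances.append(statistics.variance(sample))
    return means, variances


def within_central_interval(value, draws, level=0.90):
    """True if value lies inside the central `level` interval of draws."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int(round(n * (1.0 - level) / 2.0))]
    upper = ordered[int(round(n * (1.0 + level) / 2.0)) - 1]
    return lower <= value <= upper
```

With the per-locus BES divergences in a list, one would call `subsample_distributions` once and then test the intron mean and variance against the two returned distributions.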

Figure 1

Distribution of means (A) and variances (B) of nucleotide divergences in 20 resampled BES data. The distribution was obtained by 1000 replications. The arrow shows the class which contains the mean and variance in the 20 intron sequences, respectively. On the ordinate and abscissa are plotted the frequency and the range of mean or variance in the resampled data, respectively.

In comparisons between M or T and hominoids (H, C, G, or O), rate heterogeneity of nucleotide substitutions is apparent. O’hUigin et al. (2002) showed that 10%–20% of substituted sites have experienced multiple hits, even when the average nucleotide divergence is as low as 10%. Multiple substitutions often result in phylogenetically incompatible sites within a single gene or region, and several such sites are observed in the intron sequence data set (Table 1). If the extent of this incompatibility differs greatly from locus to locus, the cause might be attributed to rate heterogeneity of nucleotide substitutions among different loci. We therefore counted the number of sites that experienced multiple substitutions by the maximum parsimony method, assuming the standard phylogenetic relationship among the six primate species, namely, ((((H,C,G)O)M)T). Incompatible sites among H, C, and G were ignored, because they have likely been generated by intragenic recombination (Satta et al. 2000; O’hUigin et al. 2002).

Interlocus rate heterogeneity was examined by the binomial distribution. Based on the average proportion of multiple hits over the 20 loci, the expected number (m_i) of multiple hits at the ith locus was calculated. We then obtained the probability (P_i) of having an equal or larger (smaller) number of multiple hits compared to the observation at the ith locus. Since the number of sites compared is large and m_i is small, we used the Poisson approximation to calculate P_i (Table 1). The result reveals that the insulin (INS) and γ-globin (HBG) introns show more frequent multiple substitutions than expected (Table 1; p < 0.05 and p < 0.001), suggesting that the nucleotide substitution rate at these loci is significantly higher than that at other loci.
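The tail probabilities described above can be computed with a few lines of Python. The sketch assumes that m_i is obtained by scaling the pooled proportion of multiple hits by the number of sites at the locus; the function names are hypothetical.

```python
import math


def expected_multiple_hits(sites_i, total_hits, total_sites):
    """m_i: pooled proportion of multiple hits times the sites at locus i."""
    return sites_i * total_hits / total_sites


def poisson_pmf(k, mean):
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def upper_tail(observed, mean):
    """P(X >= observed) for X ~ Poisson(mean): test for an excess of hits."""
    return 1.0 - sum(poisson_pmf(k, mean) for k in range(observed))


def lower_tail(observed, mean):
    """P(X <= observed) for X ~ Poisson(mean): test for a shortage of hits."""
    return sum(poisson_pmf(k, mean) for k in range(observed + 1))
```

A locus would be flagged for an elevated substitution rate when `upper_tail` falls below the chosen significance level, and for a reduced rate when `lower_tail` does.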

Phylogenetic Relationships

NJ trees at 18 of 20 loci are topologically identical to the standard phylogenetic relationships of the six primates. The two exceptions are the β2 microglobulin (B2M) and complement 4B (C4B) loci. The B2M tree has no substitutions on the branch leading to the (H,C,G) cluster, and the C4B tree has none on the branch leading to the (H,C,G,O) cluster. Examination of phylogenetically informative sites (Table 1) and MP analyses (data not shown) confirm the absence of substitutions on these branches. However, since B2M and C4B do not show any significant shortage or excess in the number of multiple hits (Table 1), it is unlikely that the unusual substitution patterns result from a slowdown or acceleration of the nucleotide substitution rate. Therefore we did not exclude these loci from the following ML analysis.

Intragenic Recombination Within Intron Sequences

To examine linkage between sites within a locus, we analyzed individual informative sites for their support of phylogenetic relationships among H, C, and G. The analysis reveals that four loci (ANP, TNF, DAF, and HBBP1) contain no informative sites with regard to the (H,C,G) relationships. Eleven loci contain informative sites that support one of the three possible relationships: F9, B2M, PAH, LCAT, BOP, and IL3 support the (H,C)G; APOA1, UOX, and C4B, the (C,G)H; and AFP and EPO, the (H,G)C. Finally, in the remaining five loci (ODC1, GHR, INS, HBE, and HBG), some sites support one relationship while others at the same locus favor a different one (Table 1). For example, the HBG locus contains four phylogenetically informative sites, one of which supports the (H,C)G, another the (C,G)H, and two the (H,G)C relationship. In these five loci, intragenic recombination is, therefore, likely to have occurred in the ancestral population of the three species. Since these five loci include INS and HBG, at which rate heterogeneity is apparent, we may exclude them from the two-species ML analysis.

The Two-Species ML Method

The ancestral population size (N) and species divergence time (t_s) are obtained in terms of x = 4Ngμ and y = 2t_sμ in the two-species ML method. To check the reliability of these estimates for the 15 different pairs of the six primates, we divide these pairs into five classes with respect to shared ancestral populations. The classes are [(H,C)], [(H,G), (C,G)], [(H,O), (C,O), (G,O)], [(H,M), (C,M), (G,M), (O,M)], and [(H,T), (C,T), (G,T), (O,T), (M,T)]. We designate the ancestral populations of these classes HC, HCG, HCGO, HCGOM, and HCGOMT, respectively. By definition, members in each class share a common ancestral population immediately before their divergences. Thus, for example, the ancestral population of the (H,O) pair is also the ancestral population of the (G,O) and (C,O) pairs, and these three pairs are in turn all members of the same HCGO class. Because of the sharing of ancestral populations, the estimates of x and y should be the same, at least approximately, for all the pairs in a given class.
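The assignment of species pairs to classes follows directly from the order in which the lineages branch off, as the following minimal Python sketch shows. The single-letter codes are those of the text; the function names and the ordering string are introduced here only for illustration.

```python
from itertools import combinations

SPECIES = "HCGOMT"  # ordered so that each added letter is the next outgroup


def ancestral_class(a, b):
    """Name of the ancestral population shared by the pair (a, b)."""
    outermost = max(SPECIES.index(a), SPECIES.index(b))
    return SPECIES[: outermost + 1]


def classes():
    """Group all 15 species pairs by their shared ancestral population."""
    grouped = {}
    for a, b in combinations(SPECIES, 2):
        grouped.setdefault(ancestral_class(a, b), []).append((a, b))
    return grouped
```

Calling `classes()` yields the five classes HC, HCG, HCGO, HCGOM, and HCGOMT, containing one, two, three, four, and five pairs, respectively.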

Before excluding the five loci mentioned above, we applied the ML method to the entire data set of 20 intron sequences and estimated x and y for each of the 15 pairs of species. The result reveals satisfactory consistency in the estimates of y within each class and fairly large estimates of x, ranging from 0.42 to 1.5% (Table 2). With these as a reference, we applied the ML method to the trimmed data set, which excludes ODC1, GHR, INS, HBE, and HBG because of high nucleotide substitution rates or intragenic recombination (Table 1). The ML estimates of y are in good agreement with those for the entire data set. Because t_s remains constant among different loci, the y estimates are not much affected by trimming the data set. However, the x estimates become substantially smaller (Table 2).

Table 2 ML estimates (%) of x = 4Ngμ and y = 2t_sμ based on the entire or the truncated data set of intron sequences and exon sequences

Discussion

Comparison Between Intron-Based and Exon-Based x and y Estimates

We compared the present estimates with the previous ML estimates based on exon sequences (Table 2). It is interesting that the x estimates based on exon sequences are close to those based on the entire data set of intron sequences, suggesting that the exon data still contain sequences that are heterogeneous with respect to the nucleotide substitution rate or intragenic recombination. On the other hand, the y estimates based on exons are much larger than those based on introns. Considering that the y estimates are not much affected by the exclusion of outlying sequences (Table 2), the relatively large y estimates based on the exon sequences are probably caused by an overestimation of synonymous divergences rather than by rate heterogeneity of nucleotide substitutions.

The x estimates are generally smaller in the trimmed data set than in the entire data set of 20 intron sequences as well as in the exon data set. In particular, the x estimate for the HCGO class is consistently much smaller than that of any other (Table 2) and is as small as that of extant humans (ca. 0.1%). Although further accumulation of intron sequences is necessary, this may suggest that the primate lineage has experienced a reduction of the population size when Asian apes diverged from African apes (see later).

Patterns of Nucleotide Substitutions in Human–Chimpanzee Comparisons

We examined whether or not the variation of the nucleotide divergences observed in human–chimpanzee comparisons can be explained by factors other than a relatively large ancestral population size. Specifically we focused on the effect of a limited number of sites compared and different substitution rates among different loci. To evaluate the effect, we performed a computer simulation that imitates the human and chimpanzee BES data.

We consider three nucleotide substitution models. The first focuses on the variation of the nucleotide divergence caused by a limited number of sites compared at individual BES loci. We use a constant nucleotide substitution rate and assume that the coalescence time in the ancestral population is negligibly small compared to the species divergence time. Using the observed mean nucleotide divergence per site (d_HC) over 58,156 BES loci, the expected number of nucleotide substitutions at the ith BES locus is estimated as d_HC L_i, where L_i is the number of nucleotides compared. Setting d_HC L_i as a Poisson parameter, we generate a Poisson random variable (k_i) for the ith locus and calculate the number of nucleotide substitutions per site as k_i/L_i. Repeating this process 58,156 times, we obtain the distribution of k_i/L_i (blue line in Fig. 2).

Figure 2

Per-site difference distribution of BES data and of data obtained by simulations (generating random variables) under three models. The number of sites for each locus is the same as for BES. Bars indicate the observed results and lines represent the results of simulations, respectively. The blue line represents the Poisson distribution; the red line, the geometric + Poisson distribution; and the green line, the negative binomial distribution.

The second model is based on the negative binomial distribution, and, as in the first, we ignore the presence of ancestral polymorphism. The variation of nucleotide divergences is then attributed mainly to the variation in the substitution rate among different loci (Yang 2002). Following equation (5.14) in Takahata and Satta (2002), we estimate the shape parameter α (α = 5.82) of the gamma distribution of the nucleotide substitution rate and calculate the mean substitution rate (r) from d_HC = 2rt_s, assuming that t_s = 6 × 10^6 years (Brunet et al. 2002). We then generate a gamma variable γ_i for the substitution rate at the ith locus and determine the number of nucleotide substitutions (k_i) by following the Poisson distribution with mean 2t_s L_i γ_i. This procedure is equivalent to generating a random variable that follows the negative binomial distribution. Again repeating this process 58,156 times, we obtain the distribution of k_i/L_i (green line in Fig. 2).

The third model is based on the convolution of the geometric and Poisson distributions, as derived in Takahata et al. (1995), and takes explicit account of ancestral polymorphism. Part of the variation in the number of nucleotide substitutions can be attributed to the variation in coalescence times in the ancestral population. To simulate this model, we first estimate x_HC and y_HC from the 58,156 BES data. Since y_HC = 2t_sμ, we generate a Poisson variable with mean y_HC L_i for the number of nucleotide substitutions at the ith locus that can accumulate after the species divergence. We also generate a random variable that is geometrically distributed with mean x_HC L_i for the number of nucleotide substitutions during the phase of ancestral polymorphism. Dividing the sum of these Poisson and geometric random numbers by L_i, we obtain a per-site random variable (x_i + y_i) for each of 58,156 loci and plot the distribution (red line in Fig. 2).
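The three simulation models can be sketched in Python using only the standard library. This is an illustrative reconstruction under stated assumptions, not the original program: the gamma rate is expressed directly as a per-site divergence with mean d_HC (equivalent to drawing γ_i and multiplying by the constant 2t_s), the Poisson sampler is a simple multiplication method suitable only for small means, and all function names are hypothetical. The locus lengths must be supplied by the user.

```python
import math
import random


def poisson(rng, lam):
    """Multiplication (Knuth) method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def geometric(rng, theta):
    """Draw m with P(m) = p * q**m, where q = theta / (1 + theta), mean theta."""
    if theta <= 0.0:
        return 0
    q = theta / (1.0 + theta)
    u = 1.0 - rng.random()                  # uniform on (0, 1]
    return int(math.log(u) / math.log(q))


def simulate_poisson(rng, lengths, d):
    """Model 1: constant rate, no ancestral polymorphism."""
    return [poisson(rng, d * L) / L for L in lengths]


def simulate_negative_binomial(rng, lengths, d, alpha):
    """Model 2: gamma-distributed rate among loci (shape alpha, mean d)."""
    return [poisson(rng, rng.gammavariate(alpha, d / alpha) * L) / L
            for L in lengths]


def simulate_convolution(rng, lengths, x, y):
    """Model 3: Poisson(y*L) after speciation plus Geometric(mean x*L) before."""
    return [(poisson(rng, y * L) + geometric(rng, x * L)) / L for L in lengths]
```

All three generators have the same expected per-site divergence when d = x + y, so differences among the simulated distributions reflect only how each model distributes the variance among loci.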

As expected, the mean of sequence divergences in each of the above three models is the same as the observation (0.0124). However, the variance differs among the models. Whereas the variance in the third (convolution) model (7.54 × 10^{-5}) is in good agreement with the observation (7.55 × 10^{-5}), the variances in the Poisson and negative binomial models (4.22 × 10^{-5} and 6.85 × 10^{-5}, respectively) are smaller. In fact, the Kolmogorov–Smirnov test (Sokal and Rohlf 1969, pp. 704–721) reveals that the first and second models do not fit the observation (p < 0.01 for each case). We therefore conclude that the distribution of sequence divergences best fits the observation of the BES data set under the convolution model (Fig. 2; p > 0.05). We also find that, at least for humans and chimpanzees, the variation in nucleotide divergences among loci does not appear to be much affected by heterogeneity in nucleotide substitution rates.

Human–Chimpanzee Ancestral Population Size

There are four data sets of nucleotide sequences, which can be used for the ML estimation of the ancestral human and chimpanzee population size (Chen and Li 2001; Takahata 2001; O’hUigin et al. 2002; Fujiyama et al. 2002). They are 53 intergenic regions, 37 exonic regions, 15 introns, and 58,156 BES, respectively. Of these, the ML estimate from the BES data seems the most reliable because of an exceptionally large number of loci examined (Fig. 3). To evaluate the effect of the number of loci on our ML estimates, we resample 20, 50, or 100 loci from the BES data and examine the estimates of x and y based on 1000 such replications. The estimates obtained for the entire BES data are x = 0.51% and y = 0.73%. However, as the number of loci becomes small, the range of both x and y estimates becomes broad. Even for 100 loci and under the condition of 95% confidence limits, the x estimate ranges from 0.25 to 0.76% and the y estimate ranges from 0.59 to 0.99% (data not shown). The 90% confidence region of x and y for BES is extremely small compared with that for other data sets (Fig. 3).

Figure 3

Contour plots of the log likelihood function of four data sets which compare human with chimpanzee nucleotide sequences: (A) 15 intron sequences, (B) 35 exon sequences, (C) 53 intergenic sequences, and (D) 58,156 BES. The abscissa and the ordinate give the range of the estimate of x = 4Ngμ and y = 2t_sμ, respectively. The ML estimates for A–D are given in Table 3. The innermost area in A, B, and C represents the 90% confidence region of the x and y estimates. In D, the innermost area shows the 99.9% confidence region.

Table 3 Estimates of the extent of polymorphism (x = 4Ngμ) or effective size (N) in the ancestral population and of species divergence (y = 2tsμ) for humans and chimpanzees by the maximum likelihood (ML) method

It may be noted that the x estimate for the intergenic sequences in Chen and Li (2001) is quite small (Table 3). To make a quantitative assessment, we calculate the mean and variance of nucleotide substitutions over the 53 loci and compare them with those in 1000 replications of 53 resampled BES data sets. The mean of Chen and Li’s data set is 1.23%, which is in good agreement with the 1.24% for the resampled BES data. However, the variance of Chen and Li’s data is only 3.01 × 10^{-5}, which is significantly smaller (p < 0.01) than that of the BES data (7.55 × 10^{-5}). Thus, although the cause is unknown, Chen and Li’s data show an unexpected uniformity in the extent of nucleotide substitutions between humans and chimpanzees.

Yang (2002) developed a method for estimating an ancestral population size using ML and Bayesian approaches. Taking into consideration different substitution rates among different loci, he applied these approaches to Chen and Li’s data and obtained x = 0.1%, which is almost the same as that for extant humans (Li and Sadler 1991). However, if the ancestral population size were the same as the extant human population size (10^4), most pairs of H and C orthologous genes should have coalesced within the ancestral population. Under the assumption of an ancestral population size of 10^4 individuals and an interval of T = 1 myr between the human–chimpanzee divergence and the (human–chimpanzee)–gorilla divergence, the proportion of discordance between the species and the gene tree becomes 0.1% from the trichotomy method (Nei 1987). In other words, 99.9% of the data should have supported the (H,C)G relationship, but in fact only 42% do (Chen and Li 2001). The small estimated value of x is not owing to the methodology, since the simple two-species ML method also gives x = 0.099% (Table 3). Since the small estimate of x cannot be explained by taking heterogeneity of nucleotide substitution rates into account, it must result from an unusually small variance in the number of substitutions.

There still remain differences among the ML estimates of x and y in other data sets (Table 3). Nonetheless, we can draw two conclusions. First, except for the estimate by Takahata (2001), which appears to be affected by overestimation of synonymous divergences, the y estimate for chimpanzees and humans ranges only from 0.73 to 1.04% (Table 3). Assuming a divergence time of 6 myr between the two species (Brunet et al. 2002), we estimate the nucleotide substitution rate as 0.6–0.8 × 10^{-9} per site per year. This rate is lower than generally accepted (cf. Li 1997). Second, the x estimate ranges from 0.33 to 0.51%. These values are four to five times larger than the estimate for the extant human population (Li and Sadler 1991). If we further take account of the prolonged generation time of extant humans, the effective size of the ancestral human–chimpanzee population must have been approximately 10 times larger than the 10^4 of extant humans (Takahata and Satta 1997; Takahata 2001).

Demographic History of Primate Populations During the Last 50 myr

Discrepancies between molecular and paleontological estimates of primate divergence time have been pointed out recently (Martin 1993; Tavaré et al. 2002), and a new statistical approach pushes the last common ancestor of primates back as far as 81.5 myr ago. Martin (1993) suggested that the divergence times of major nodes in the primate phylogeny should be pushed back at least 10 myr, and our results support this view. If we assume that the divergence time between humans and chimpanzees is 6 myr (Brunet et al. 2002), our ML estimates of y (Table 2) suggest that the divergence times of the major nodes in the primate phylogeny become 7.2 myr for (H,C)G, 18 myr for (H,C,G)O, 34 myr for (H,C,G,O)M, and 65 myr for (H,C,G,O,M)T. These divergence times are older than those indicated by fossil records.

Several molecular approaches have recently been used to estimate the divergence times of primate species (Kumar and Hedges 1998; Glazko and Nei 2003; Hasegawa et al. 2003). When we compare our results with these estimates, our estimate of the divergence time of gorillas from humans (7.2 myr) shows good agreement with the others (ranging from 7 to 12 myr). Similarly, our estimate of the divergence time of humans from orangutans (18 myr) falls within the range of the others (8 to 18 myr). In addition, this relatively old divergence time of orangutans is consistent with the time when the African continent became joined with Eurasia some 18 myr ago (Waddell and Penny 1996). However, regarding more ancient divergences, there are large discrepancies among various estimates. For instance, Kumar and Hedges (1998) estimated the divergence time of OWMs from humans as 21–24 myr and Glazko and Nei (2003) obtained a similar estimate (21–25 myr). On the other hand, Hasegawa et al. (2003) estimated the divergence to be as old as 31–38 myr. Our estimate is consistent with the latter. Furthermore, Kumar and Hedges (1998) and Glazko and Nei (2003) estimated the date of NWM divergence as 39–56 and 32–36 myr, respectively, whereas our estimate is much older (65 myr). Although this discrepancy may come from the different data and methods used, more studies of the primate phylogeny are clearly necessary to reach a consensus about the divergence times.

To convert the amount of ancestral polymorphism (measured by x = 4Ngμ) into the effective size (N) of the ancestral population, information on the generation time in that population is required. Although there are uncertainties about the generation time of nonhuman primates, it is shorter than the generation time of extant humans (Gavan 1953). Under this assumption, the estimated values of x suggest that the ancestral population size has been of the order of 10^5 throughout most of primate evolution, although there might have been an occasional reduction, as discussed earlier. It also appears that such a large size of the ancestral population of humans, chimpanzees, and gorillas is consistent with the high extent of DNA polymorphism in extant nonhuman primates (Kaessmann et al. 1999, 2001; Satta 2001).
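The conversion from x to N follows from rearranging x = 4Ngμ. The Python sketch below is illustrative; the function name is hypothetical, and the generation time of 15 years and rate of 0.7 × 10^{-9} per site per year used in the example are assumptions chosen for illustration, not values estimated in this study.

```python
def effective_size(x, generation_years, mu_per_year):
    """N = x / (4 g mu), with x as a proportion (0.0051 for 0.51%)."""
    return x / (4.0 * generation_years * mu_per_year)


# Illustrative values only: x = 0.51%, an assumed generation time of 15 years,
# and an assumed substitution rate of 0.7e-9 per site per year.
N_example = effective_size(0.0051, 15.0, 0.7e-9)
```

Under these assumed inputs N is of the order of 10^5; a longer generation time or a higher substitution rate lowers the estimate proportionally.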