Introduction

The carboxyl/cholinesterases are a particularly variable gene family, with very high rates of molecular evolution and low amino-acid identities among its members (Cygler et al. 1992; Hemila et al. 1994; Oakeshott et al. 1999, 2005). Rapid sequence change has resulted in the diversification of carboxyl/cholinesterases both in sequence identity and biochemical function in correlation with ecological and behavioral changes in their host species (Oakeshott et al. 1999, 2005).

Two clusters of carboxyl/cholinesterase genes have been described in detail in Drosophila spp. These are the α- and β-esterase clusters, denominations that generally reflect the ability of the encoded enzymes to hydrolyze α- and β-naphthyl-acetate, respectively. The α-esterase cluster has been characterized in D. melanogaster (Robin et al. 1996) and in the cactophilic D. buzzatii (Robin et al. 2000). In the latter, the cluster comprises 11 genes encoding functional esterases, 8 of which are orthologous with genes of the α-cluster of D. melanogaster. These results indicate that most duplication events involved in the origin of this family occurred before the divergence of the subgenera Sophophora and Drosophila (to which D. melanogaster and D. buzzatii, respectively, belong), that is, >40 million years ago (Russo et al. 1995).

The EST-2 (esterase-2) protein in D. buzzatii is the major α-naphthyl acetate–staining esterase on starch gels. Like the majority of esterase enzymes, the natural substrate and biological function of EST-2 are unknown. The protein is expressed principally in the alimentary tract, midgut, and malpighian tubules (East 1982). Biochemical and physiological analyses have indicated that it belongs to the carboxyl/cholinesterase multigene family, and it has been suggested that it is orthologous to EST-9 in D. melanogaster (East et al. 1990; Oakeshott et al. 1993). Recently, Campbell et al. (2003) used proteomics and mass spectrometry to show that EST-9 is encoded by the αE5 gene in the α-esterase cluster, and Mascord (2004) has used a genetic approach to show that EST-2 is encoded by the αE5 ortholog in D. buzzatii. Mascord (2004) obtained the full sequence of the αE5 gene from several D. buzzatii lines selected to cover EST-2 starch mobility alleles and showed that the surface charges of the predicted αE5 proteins explained the relative mobilities of the EST-2 allozymes and that an EST-2 null line had a frame-shifting deletion in the αE5 gene.

The α-esterase cluster has been mapped to the second chromosome of D. buzzatii between cytological bands F5e and F6a, a region close to, but outside of, the proximal break points of the two more common inversions, 2j and 2z³ (Ranz et al. 2003). The 2j arrangement is a simple paracentric inversion of the D. buzzatii standard sequence (2st), whereas the 2z³ inversion occurred on a 2j chromosome and gave rise to the 2jz³ arrangement, which includes the segment inverted by 2j. The proximity of the α-esterase cluster to the break points of naturally occurring inversions probably explains, at least in part, the linkage disequilibrium of EST-2 variants with second-chromosome arrangements (Knibb et al. 1987; Betrán et al. 1995; Rodríguez et al. 2001).

The population genetics of allozyme variation of EST-2 in D. buzzatii has been examined in both native populations from Argentina and recently colonized areas of Australia and the Mediterranean Basin. Seven starch gel electromorphs (EST-2a, EST-2a−, EST-2b, EST-2c, EST-2c+, EST-2d, and EST-2e) have been identified in Argentinian populations, with all except EST-2a− detected also in Australia (J. S. F. Barker, 1974–1996, unpublished data). Furthermore, using sequential cellulose acetate gel electrophoresis, a more discriminating electrophoretic technique, the number of alleles that could be distinguished in two Australian populations was extended from 6 to 25 (Barker 1994).

Only one study into the population genetics of the variants detected by sequential cellulose acetate electrophoresis has been conducted, and this could not fully discriminate between natural selection and drift as factors accounting for the differences observed between populations (Barker 1994). However, several studies on the population genetics of the major starch gel variants have indicated that this protein may be the target of natural selection. Some of these major EST-2 variants show parallel latitudinal clines in Australia and Argentina, which are largely independent of inversion frequencies (Knibb and Barker 1988; Rodríguez et al. 2000). Populations are genetically structured for these variants (Sokal et al. 1987), and differentiation is correlated with ecologically relevant variables, particularly with the diversity of phytogeography and cactus in Argentina (Rodríguez et al. 2000). Other lines of evidence include responses to genetic perturbation experiments in natural populations (Barker and East 1980; Barker et al. 1989) and to thermal shocks (Watt 1981). Furthermore, the starch gel variation seems to be associated with differential attraction to and viability on diverse cactus hosts (Fernández-Iriarte et al. 2002) and with oviposition site preferences for different yeast species (Barker et al. 1981, 1986; Barker and Starmer 1999).

The recent confirmation that EST-2 in D. buzzatii is orthologous to EST-9 in D. melanogaster and encoded by αE5 presents an opportunity to examine the evolutionary forces that have shaped the EST-2 variation at a molecular level. We report the results of a comparative study of αE5 nucleotide variation in populations of D. buzzatii and its sibling species D. koepferae. Specifically, we compared nucleotide variability in original and colonized populations of D. buzzatii, to search for signs of a founder effect, and between D. buzzatii and D. koepferae. We also analyzed levels of nucleotide variation within and between the main chromosomal arrangements segregating in natural populations of D. buzzatii and within and between EST-2 electrophoretic alleles. Finally, we searched for evidence of population structure at this locus and investigated if αE5 nucleotide variation is compatible with neutral expectations.

Materials and Methods

Drosophila Lines

Eighty-five D. buzzatii lines isogenic for the second chromosome and derived from wild flies collected in Argentina (n = 39) and Australia (n = 46) were used in the present study. Samples were collected in (1) 11 Argentinian localities spanning 5 phytogeographic regions in autumn 1995, spring 1996, and autumn 1997, and in (2) 2 Australian localities in summer 1984 and spring 1985 (Supplementary Table 1). Isogenic flies were obtained by crossing wild male flies individually with virgin female flies of the balancer stock Antp/Δ5 as described in Barker (1994) and Rodríguez et al. (2000). Each line was characterized cytologically and electrophoretically to ascertain the inversion karyotype and the EST-2 genotype, and specimens were stored at −80°C until DNA extraction. The lines from Argentina were a representative sample of the 2st, 2j, and 2jz³ inversions, and their frequencies were not significantly different (χ² = 0.09; p = 0.96) from global estimates in previous surveys (Rodríguez et al. 2000). The isogenic lines from Australia were randomly selected from the 147 lines characterized by Barker (1994) for additional allozyme variation at EST-2 using sequential gel electrophoresis. Equal numbers of the 2st and 2j inversions for each site were selected at random, except that only 10 sequences from the 2st lines from the Trinkey population were obtained. A constructed random sample (CRS; Hudson et al. 1994) was also analyzed for comparative purposes.

Twenty D. koepferae lines derived from flies collected in autumn 1997 and 1999 in eight localities of Argentina were also analyzed (Supplementary Table 1). Wild flies were captured by means of net sweeping on fermented banana baits, transported to the laboratory, sexed, and female flies placed in individual vials. After four generations of sib mating, progeny individuals of each isofemale line were frozen and stored at −80°C until DNA extraction. Neither the inversion karyotype nor the EST-2 genotype was known for these flies.

DNA Sequencing

Genomic DNA was obtained from individual flies using a Puregene kit (GENTRA Systems) according to the manufacturer’s protocol. αE5 fragments of 1.9 or 1.6 kb were polymerase chain reaction (PCR) amplified using primers 151+ (sense 5′CCGCAGCCTTTACGATGATG3′) and 2082– (antisense 5′AGACTATCCCAGACGAGCAG3′) or primers 5.5b (sense 5′CTAACGGGCATCATAGAAGGCAG3′) and 5.6b (antisense 5′GCAGTTGGGATTGGAAGTAGCTG3′), respectively. If the yield of amplifications was high, two 50-μl reactions were run in a 0.8% agarose gel and the DNA band purified using Qiaquick spin columns (Qiagen). If the yield of the first PCR was not as high, a smaller nested fragment was reamplified with primers 337+ (sense 5′GGACTGCCTTTATCTGAATG3′) and 1998– (antisense 5′TGCTCCTGCTTCGCTGGCTG3′), and this PCR product was purified in spin columns.

All PCR reactions were carried out in final volumes of 50 μl containing 8 μl dNTPs (1.25 mM each), 5 μl 10× reaction buffer, 1 μl MgCl2 (1 mM), 0.2 μl 5 U/μl Taq DNA polymerase (Invitrogen), 90 ng sense and antisense primers, and 50 to 100 ng DNA. Amplifications were performed according to the following PCR profile: one cycle of 1 minute at 94°C; 35 cycles of denaturation (1 minute at 94°C), annealing (1 minute at 60°C), and extension (1.5 minutes at 72°C); and a final extension step of 3.5 minutes at 72°C. Purified DNAs were used as templates for direct sequencing of both strands using primers 5.21 (sense 5′GACCCCAGCTGTGAGGTG3′), 5.11 (antisense 5′GTGGGAGCCGTGGCGTATG3′), and 1866– (antisense 5′TTCCACGACTCCGAGTTACG3′) with the ABI PRISM BigDye Terminator Cycle Sequencing Ready Reaction Kit, version 2.0 (Perkin Elmer) in a Perkin Elmer ABI PRISM 3700 DNA Analyzer Automated Sequencer. This sequencing strategy allowed us to obtain the nucleotide sequence between positions 684 and 1486 of the published αE5 sequence (Robin et al. 2000; GenBank accession no. AF216213). This fragment covers 45% of the coding region of the gene and includes part of the third and fourth exons (699 and 39 bp, respectively) and the complete fourth intron (64 bp). This region was selected for sequencing because it showed high sequence diversity among Australian allozyme alleles, and nucleotide differences in αE5 alleles identified in this region distinguished particular EST-2 sequential gel electromorphs (Mascord 2004).

In D. koepferae, unlike D. buzzatii, individuals were not isogenic, and several heterozygous sites were detected per individual. A nucleotide position was classified as heterozygous if two overlapping signals were consistently observed at the same site in the chromatograms of two or three sequencing reactions. When the analytical software allowed it, data were treated as genotypic with unknown gametic phase in further analyses (Arlequin 2.001; Schneider et al. 2000). Otherwise, two sequences per individual were created by distributing the segregating sites at random between each of them (DnaSP 3.97; Rozas and Rozas 1999).

Partial and complete sequences of each line were aligned manually or using Clustal X (Higgins et al. 1994). Newly reported sequences have been deposited at GenBank under accession nos. DQ192063 to DQ192108 and DQ204622 to DQ204680.

Data Analyses

Estimates of nucleotide diversity were obtained using Watterson’s (θW) and Tajima’s (π) estimators, based on the number of segregating sites and the average number of pairwise differences per site, respectively (Watterson 1975; Tajima 1989). Interspecific divergence was calculated according to the method of Jukes and Cantor (1969).
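The two diversity estimators can be illustrated with a minimal Python sketch. It is not the DnaSP implementation: it assumes a gap-free alignment of equal-length sequences with no missing data, and the function names and the toy alignment are hypothetical.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns that show more than one nucleotide state."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def watterson_theta(seqs):
    """Watterson's estimator per site: S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i."""
    n, length = len(seqs), len(seqs[0])
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / (a_n * length)


def nucleotide_diversity(seqs):
    """pi: average number of pairwise differences per site."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment (not data from this study).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "ACGTACGTAC"]
print(segregating_sites(toy), watterson_theta(toy), nucleotide_diversity(toy))
```

Because θW depends only on the number of segregating sites whereas π weights variants by their frequency, an excess of rare variants lowers π relative to θW.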

The population recombination parameter was estimated using the method of Hudson and Kaplan (1985), based on the minimum number of recombination events in a sample (Rm). Nucleotide differentiation among inversion arrangements in D. buzzatii for the αE5 gene region was analyzed by means of the KST* (Hudson et al. 1992) and Snn (Hudson 2000) statistics. Significant associations between electrophoretic EST-2 variants and αE5 haplotypes were tested using a linkage disequilibrium analysis based on D coefficients (Lewontin and Kojima 1960). Genealogical relationships between haplotypes were estimated as a haplotype network, based on the algorithm proposed by Templeton et al. (1992). Population structure was investigated by means of analysis of molecular variance (AMOVA; Excoffier et al. 1992) for total, nonsynonymous, synonymous, and silent variations. For these analyses, sequences of Argentinian D. buzzatii and D. koepferae were arranged in two hierarchical levels, the first according to the collection site and the second to the phytogeographic region in which each site is located (Supplementary Table 1). For the Australian sample, sequences were assembled according to collection sites only.
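As an illustration of the D coefficient used for the allozyme-haplotype associations, the following Python sketch computes D = P(allozyme, haplotype) − P(allozyme)·P(haplotype) for one allozyme and one haplotype. The function name, the data layout, and the toy labels are assumptions made for the example; the significance testing performed in Arlequin is not reproduced here.

```python
def ld_coefficient(lines, allozyme, haplotype):
    """D coefficient for one allozyme/haplotype pair.

    `lines` is a list of (allozyme, haplotype) tuples, one per isogenic line.
    """
    n = len(lines)
    p_allozyme = sum(1 for a, _ in lines if a == allozyme) / n
    p_haplotype = sum(1 for _, h in lines if h == haplotype) / n
    p_joint = sum(1 for a, h in lines if a == allozyme and h == haplotype) / n
    return p_joint - p_allozyme * p_haplotype


# Hypothetical toy sample (labels are illustrative, not the study's assignments).
toy_lines = [
    ("a", "h1"), ("a", "h1"), ("a", "h2"), ("b", "h3"),
    ("c", "h4"), ("c", "h5"), ("d", "h6"), ("b", "h3"),
]
print(ld_coefficient(toy_lines, "a", "h1"))  # positive: h1 occurs only with allozyme a
print(ld_coefficient(toy_lines, "b", "h1"))  # negative: h1 never occurs with allozyme b
```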

The tests of Tajima (1989), Fu and Li (1993), and Ramos-Onsins and Rozas (2002) were applied to determine whether the frequency spectrum of polymorphic variants departs from neutral expectations. We also employed the McDonald and Kreitman (1991) (MK) test to examine the neutral-theory prediction that the ratio of nonsynonymous to synonymous variation is the same for polymorphism within species and for divergence between species. The neutrality index (NI) proposed by Rand and Kann (1996) was used as a qualitative indicator of the degree and direction of departures from neutrality in the ratio of polymorphism to divergence. The index is defined as NI = (Pr/Fr)/(Ps/Fs), where Pr and Ps are the number of replacement and synonymous polymorphic sites, respectively, and Fr and Fs are the number of replacement and synonymous fixed differences between species, respectively. Under strict neutrality, NI is expected to be equal to 1. A value >1 indicates an excess of amino-acid variation within species, whereas a value <1 indicates an excess of fixed amino-acid differences between species (Rand and Kann 1996).
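The NI definition above, together with a G-test of independence on the 2 × 2 MK table, can be sketched in Python as follows. This is only an illustration: the function names and the counts are hypothetical (they are not the values of Table 4), and the G statistic is computed without any continuity correction, which may differ from the option used in DnaSP.

```python
import math


def neutrality_index(p_r, p_s, f_r, f_s):
    """NI = (Pr/Fr) / (Ps/Fs)."""
    return (p_r / f_r) / (p_s / f_s)


def mk_g_test(p_r, p_s, f_r, f_s):
    """G-test of independence on the MK table; returns (G, p-value) for 1 df."""
    observed = [[p_r, p_s], [f_r, f_s]]
    total = p_r + p_s + f_r + f_s
    row_sums = [p_r + p_s, f_r + f_s]
    col_sums = [p_r + f_r, p_s + f_s]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = observed[i][j]
            if obs > 0:  # empty cells contribute nothing to G
                expected = row_sums[i] * col_sums[j] / total
                g += 2.0 * obs * math.log(obs / expected)
    g = max(g, 0.0)  # guard against tiny negative rounding errors
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(G / 2)).
    p_value = math.erfc(math.sqrt(g / 2.0))
    return g, p_value


# Hypothetical counts: 12 replacement and 10 synonymous polymorphisms,
# 5 replacement and 25 synonymous fixed differences.
print(neutrality_index(12, 10, 5, 25))
print(mk_g_test(12, 10, 5, 25))
```

With these illustrative counts NI is about 6, that is, replacement changes are overrepresented among polymorphisms relative to fixed differences.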

All tests that required an outgroup were performed using the other species (i.e., D. koepferae for D. buzzatii and vice versa) or D. borborema (GenBank accession no. DQ453800), another member of the D. buzzatii complex (Ruiz and Wasserman 1993; Manfrin et al. 2001).

DnaSP 3.97 (Rozas and Rozas 1999) was used for all statistical analyses, except for AMOVAs and linkage disequilibrium analyses, which are subroutines of the program Arlequin 2.001 (Schneider et al. 2000). The haplotype network was built with TCS 1.21 (Clement et al. 2000).

The significance of the AMOVAs and of the KST* and Snn tests was obtained by means of 10,000 permutations of haplotypes (or genotypes) between chromosomes, alleles, populations or regions. The statistical significance of neutrality tests was calculated by 1000 coalescent simulations based on a Monte Carlo process (Hudson 1990), assuming the most conservative option of no recombination.
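The permutation approach can be illustrated with the nearest-neighbor statistic Snn. The Python sketch below follows the usual description of Snn (the average, over sequences, of the fraction of each sequence's nearest neighbors that come from the same group) and obtains a one-sided p-value by shuffling group labels. It is not the DnaSP code; the function names, the toy sequences, and the fixed random seed are assumptions made for the example.

```python
import random


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def snn(seqs, labels):
    """Average fraction of nearest neighbors that share the focal sequence's label."""
    n = len(seqs)
    total = 0.0
    for j in range(n):
        dists = [(hamming(seqs[j], seqs[k]), k) for k in range(n) if k != j]
        d_min = min(d for d, _ in dists)
        nearest = [k for d, k in dists if d == d_min]  # ties are all counted
        total += sum(labels[k] == labels[j] for k in nearest) / len(nearest)
    return total / n


def permutation_p_value(seqs, labels, n_perm=10000, seed=1):
    """Share of label permutations whose Snn is at least the observed value."""
    rng = random.Random(seed)
    observed = snn(seqs, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if snn(seqs, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical toy data: two fully differentiated groups of three sequences.
toy_seqs = ["AAAA", "AAAT", "AATA", "CCCC", "CCCG", "CCGC"]
toy_labels = ["2st", "2st", "2st", "2j", "2j", "2j"]
print(permutation_p_value(toy_seqs, toy_labels, n_perm=2000))
```

With only six sequences, even complete differentiation (Snn = 1) cannot reach a small p-value, which illustrates why comparisons based on small samples call for caution.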

Results

Nucleotide Variation Within and Between Species

The fragment of the αE5 gene region sequenced in this study was polymorphic at 29 and 25 sites in the Argentinian and Australian samples of D. buzzatii, respectively, and at 34 sites in the D. koepferae samples (Table 1; Supplementary Figs. 1 and 2). No shared polymorphisms were detected between the two species.

Table 1 Estimates of nucleotide heterozygosity in D. buzzatii and D. koepferae for all, synonymous, nonsynonymous and noncoding sites in the αE5 gene region

A 28-bp deletion between sites 337 and 364 was detected in the D. buzzatii line Tir-17. Although this deletion results in a frame-shift mutation, it did not appear to be lethal because several adult individuals homozygous for this variant were recovered by means of the balancer stock Antp/Δ5 (data not shown). Although this indicates that EST-2 function is not critical for survival in the laboratory, we note that null alleles only occur in natural populations at low frequencies (0.01% in Australia [Barker 1994] and 0.09% in Argentina [Rodríguez et al. 2001]), suggesting some selective value for EST-2 function in the field environment. This line was excluded from all analyses. We also detected a fixed indel difference of 2 bp (GC) between D. koepferae and D. buzzatii at intron position 725.

Argentinian and Australian D. buzzatii had 18 polymorphic sites in common (Supplementary Figure 1), which means that >60% of the polymorphic variants in the original area were also present in the founder population that reached Australia. Remarkably, the nine parsimony-informative sites detected in Argentina are all present in Australia. These results are in agreement with the predictions of population genetic theory that alleles at lower frequencies are more likely to be lost during a founder event (Hedrick 1983). Alternative hypotheses for these findings involve recent gene flow or recurrent mutation.

Levels of nucleotide heterozygosity estimated by means of θW and π are listed in Table 1. Watterson’s estimate of nucleotide heterozygosity (θW) indicates that total polymorphism was only slightly higher in D. koepferae than in D. buzzatii. However, when the average number of pairwise differences per site (π) was considered, this region of αE5 in D. koepferae showed a level of polymorphism >1.5 times higher than in D. buzzatii. This interspecific difference in αE5 contrasts with the patterns observed in both the Xdh and Est-A gene regions, where D. buzzatii was twice as polymorphic as D. koepferae (Gómez and Hasson 2003; Piccinali et al. 2004). The interspecific difference in αE5 was even more striking when nonsynonymous and synonymous variation were considered separately. The percentage of nonsynonymous polymorphic sites was high in both Argentinian (48%) and Australian D. buzzatii (64%), whereas the percentage of nonsynonymous polymorphisms was much lower in D. koepferae (26%) (Supplementary Figs. 1 and 2). This species, in contrast, was more polymorphic at silent sites (11%, synonymous and noncoding) than D. buzzatii (6%).

Argentinian D. buzzatii exhibited a high proportion of singleton variants (close to 70%) in the αE5 gene (Table 1), similar to the values reported for the Xdh locus in Argentinian D. buzzatii (Piccinali et al. 2004). Conversely, the proportion of unique variants was noticeably lower in Australian D. buzzatii (24% of total variation; 25% of synonymous and 13% of nonsynonymous sites) and in D. koepferae (29%, 33%, and 25%, respectively). Results based on the Australian CRS were similar to those obtained for the complete Australian data set.

Estimates of the recombination parameter according to Rm were equal to 4/gene and 0.0050/site in the total sample, whereas the values for both continents analyzed separately were exactly the same and equal to 3/gene and 0.0037/site.

Haplotype diversity in the D. buzzatii dataset was quite similar in Argentina (0.936) and Australia (0.926), but haplotype composition was very different in the two continents. Only 7 of the 34 haplotypes detected were present in both samples (Supplementary Figure 1). The Argentinian D. buzzatii sample contained 23 haplotypes, with 1 major and widely distributed haplotype (haplotype 10; frequency 21%), recorded from the Northwest (lines Til-3, Quil-7, and Quil-19) and the Central West (lines Chu-2, Chu-20, Chu-38) to the southernmost limit of the species (lines Ota-3 and Ota-4). It was also detected in the two major chromosomal arrangements and in 2jz³ (Supplementary Figure 1). Remarkably, it appeared to be absent in Australia. Conversely, the Australian sample, which contained 18 haplotypes, included 2 haplotypes with moderate frequencies (haplotypes 24 and 30, each with a frequency of 17%) that were not present in Argentina.

Quite apart from the differences in D. buzzatii haplotypes between continents, the high frequencies of the nominated haplotypes in each continent are themselves inconsistent with neutral expectations. To show this, we used coalescent simulations (Hudson et al. 1994) to calculate for each continent the likelihood of having a subset of n sequences with a haplotype diversity equal to zero, given the observed mean heterozygosity in the total sample for that continent. The observed values of haplotypic diversity were much lower than those realized in the coalescent simulations in both Argentina and Australia (p< 0.0001). Thus, the presence of such haplotypes in relatively high frequencies in the respective continents is not compatible with neutral expectations.

Sequence divergence between D. buzzatii and D. koepferae was 4.7% for all sites, 1.5% for nonsynonymous sites, and 15.3% for synonymous sites. These estimates were similar to the divergence between D. koepferae and D. borborema (4.5%, 1%, and 16.6% for total, nonsynonymous, and synonymous sites, respectively). However, divergence between D. buzzatii and D. borborema was slightly higher for all sources of variation (6%, 2%, and 23%). These results concur with previous work in finding that phylogenetic relationships among certain species of the Drosophila buzzatii complex are ambiguous. Some studies suggest that D. buzzatii and D. koepferae are sister groups (Durando et al. 2000; Manfrin et al. 2001) whereas others support a closer relationship between D. koepferae and D. borborema (Rodríguez-Trelles et al. 2000).
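For reference, the Jukes and Cantor (1969) correction used for these divergence estimates converts an observed proportion of differing sites, p, into an estimated number of substitutions per site, d = −(3/4) ln(1 − 4p/3). The Python sketch below shows the calculation; the function name and the input value are hypothetical and are not taken from the data of this study.

```python
import math


def jukes_cantor_distance(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical example: 15% observed differences between two sequences.
print(jukes_cantor_distance(0.15))
```

The correction is small when divergence is low and grows as p increases, because multiple substitutions at the same site become more likely.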

Differentiation Among EST-2 Electrophoretic Variants

Our samples contained six EST-2 starch gel variants in the Argentinian sequences (EST-2a, EST-2a−, EST-2b, EST-2c, EST-2c+, and EST-2d) and six in the Australian sequences (EST-2a, EST-2b, EST-2c, EST-2c+, EST-2cf, and EST-2d). This spectrum is broadly similar to previous findings (Knibb et al. 1987; Rodríguez et al. 2000), except that this sampling did not include the relatively uncommon EST-2e variant but did include EST-2cf, another rare variant that shows faint staining around the position of EST-2c on starch gels (Knibb et al. 1987).

Six significant associations existed between haplotypes and starch gel electrophoretic variants in Argentina, and only two of these involved >1 individual (EST-2c with haplotype 3 and EST-2a with haplotype 9; Table 2). In Australia, 12 significant associations existed, of which 9 involved at least 2 individuals (Table 2). Most of the haplotypes involved in significant associations were only associated with a single electrophoretic variant in each sample, although haplotype 16 was found in both EST-2c and EST-2cf in Australia, and the only associations that were the same in the two samples were those involving EST-2d. The most common Argentinian haplotype (10) was not significantly associated with any particular electrophoretic variant, but the two high-frequency Australian haplotypes (24 and 30) were associated exclusively with EST-2a.

Table 2 Linkage disequilibrium between EST-2 allozymes and αE5 haplotypes

The haplotype network (Fig. 1) shows the genealogical relationships within and between EST-2 electrophoretic classes. Argentinian and Australian haplotypes were mixed throughout the network, showing that some Australian haplotypes are closer to Argentinian variants than to other Australian alleles, as expected because of the recent colonization of Australia by D. buzzatii. Sequences associated with EST-2a, EST-2a− or EST-2b were scattered widely through the network, but haplotypes associated with EST-2c, EST-2cf, EST-2c+ and EST-2d were more clustered.

Fig. 1

Haplotype network for D. buzzatii αE5 sequences. The network was built with the algorithm of Templeton et al. (1992) as implemented in TCS 1.21 (Clement et al. 2000). Each branch represents a mutational step. Small empty circles represent hypothetical haplotypes. The sizes of circles enclosing observed haplotypes are proportional to their frequencies in the whole sample. Blue color: Argentinian haplotypes. Red color: Australian haplotypes. Haplotype 1 was excluded because the number of mutational steps separating it from the closest neighbor was too high, compromising the confidence limits of 95% for the statistical parsimony analysis (Templeton et al. 1992). The Australian variant EST-2c+ has an unusual triple-banding pattern around the position of the EST-2c variant on starch gels, whereas the rare variant EST-2cf shows faint staining at the EST-2c position.

Differentiation Among Chromosomal Arrangements in D. buzzatii

Levels of nucleotide variation within the two major arrangements 2st and 2j were very similar (2st in Australia: θW = 0.0062 ± 0.0025, π = 0.0064 ± 0.0033; 2j in Australia: θW = 0.0080 ± 0.0030, π = 0.0064 ± 0.0032; 2st in Argentina: θW = 0.0076 ± 0.0033, π = 0.0055 ± 0.0029; and 2j in Argentina: θW = 0.0067 ± 0.0027, π = 0.0054 ± 0.0028), whereas the sample of 2jz³ chromosomes was completely monomorphic. Of the total of 38 variable sites in the 2 continents, 9 and 8 were exclusive to 2st and 2j, respectively, and the remaining 21 were found in both (Supplementary Table 1).

We can use the calculations of Innan and Tajima (1997) to show that these results for 2st and 2j are in agreement with theoretical expectations. The authors estimated the expected number of pairwise differences within and between two allelic classes when the ancestral allelic class is known and found that the amount of variability within each allelic class changes with their frequencies. The expected level of nucleotide variation within 2st and 2j in each continent can be obtained from the values reported in Table 2 of Innan and Tajima (1997), taking into account the respective frequencies of both arrangements and excluding 2jz³ chromosomes from the Argentinian sample. According to the accepted chromosomal phylogeny, arrangement 2st is ancestral to 2j (Ruiz and Wasserman 1993), and the average frequencies of 2st in Australia and Argentina are 0.27 and 0.35, respectively. The expected differences within 2st and 2j in Australia are nearly equal (0.449θ and 0.451θ, respectively), which coincides with the observed values. In Argentina, the expected values are 0.546θ and 0.403θ, indicating that 2st is predicted to harbor 35% more variation than 2j, but such a difference is still within the 95% confidence intervals of our estimates.

Innan and Tajima (1999) have also shown that the ratio of the sum of the average number of pairwise differences within allelic classes to the average pairwise number of differences between all sequences is expected to be 1 in the absence of recombination. In the αE5 gene region, the values of the ratio were 1.96 and 1.76 in Australian and Argentinian samples, respectively. Such values suggest negligible levels of differentiation and moderate recombination between arrangements. This is concordant with the high proportion of shared polymorphic variants between 2st and 2j chromosomes.

In fact, differentiation among arrangements within Argentina and Australia was nonsignificant after Bonferroni’s correction (data not shown), although there was a certain degree of differentiation across the continents. Argentinian 2jz³ was significantly differentiated from Australian 2st (KST* = 0.092, p = 0.001; Snn = 0.985, p < 0.0001) and 2j (KST* = 0.053, p = 0.004; Snn = 1.000, p < 0.0001) and Argentinian 2j from Australian 2st (KST* = 0.049, p = 0.004; Snn = 0.832, p < 0.0001). However, comparisons involving 2jz³ should be interpreted with caution because of the absence of variation within it as well as its small sample size (Charlesworth 1998).

Geographic Differentiation

Because our samples consisted of sequences of different geographic origin, and given that electrophoretic surveys of genetic variation have shown that D. buzzatii populations are significantly differentiated for EST-2 variant frequencies (Knibb and Barker 1988; Rodríguez et al. 2000), we also examined nucleotide variation at αE5 in D. buzzatii and D. koepferae for patterns of population subdivision.

The AMOVAs showed that genetic differentiation among populations within and among regions for total, nonsynonymous, synonymous, and silent variation was not significant either in Argentinian or Australian D. buzzatii (results not shown). Because the small number of sequences sampled per population and the large within-population variance can mask a structured pattern, we also repeated the analyses in Argentina, including only those populations with at least four sequences (Termas de Río Hondo, Salta, Catamarca and Tilcara, which belong to four different phytogeographic regions). However, the results were again not significant.

D. koepferae did not exhibit signs of population structure either when total or nonsynonymous variation was considered, but a pattern of population structure emerged from silent variation. Differentiation between regions was significant (σa² = 0.671, ΦCT = 0.166, df = 3; p = 0.044) and accounted for 17% of total silent variation, whereas differentiation between populations within regions was not significant (σb² = −0.286, ΦSC = −0.085, df = 4; p = 0.822).
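The relationship between the variance components and the Φ statistics of a two-level AMOVA (Excoffier et al. 1992) can be checked with a short Python sketch. The within-population component is not reported in the text, so it is backed out here from the reported σa² and ΦCT; that derived value, like the function name, is an assumption of the example rather than a reported result.

```python
def phi_statistics(var_among_regions, var_among_pops, var_within_pops):
    """Return (Phi_CT, Phi_SC, Phi_ST) from the three AMOVA variance components."""
    total = var_among_regions + var_among_pops + var_within_pops
    phi_ct = var_among_regions / total
    phi_sc = var_among_pops / (var_among_pops + var_within_pops)
    phi_st = (var_among_regions + var_among_pops) / total
    return phi_ct, phi_sc, phi_st


# Reported values for silent variation in D. koepferae.
sigma_a2, sigma_b2, reported_phi_ct = 0.671, -0.286, 0.166

# Derived, not reported: total variance and within-population component.
total_variance = sigma_a2 / reported_phi_ct
sigma_c2 = total_variance - sigma_a2 - sigma_b2

phi_ct, phi_sc, phi_st = phi_statistics(sigma_a2, sigma_b2, sigma_c2)
print(round(phi_ct, 3), round(phi_sc, 3), round(phi_st, 3))
```

Under this reconstruction ΦSC comes out close to the reported −0.085, so the reported components are mutually consistent. Negative variance components, such as σb² here, are usually interpreted as zero differentiation at that level.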

Neutrality Tests

Fu and Li’s D and F statistics were negative and highly significant for total, nonsynonymous, and synonymous variation in Argentinian populations of D. buzzatii (Table 3), pointing to an excess of low-frequency variants. These tests were not significant in either Australian D. buzzatii or D. koepferae, and the results were congruent when no outgroup was used (results not shown). The R2 statistic, which compares the observed number of singletons with those expected under a major recent population growth event, was significant only for the total variation in the Argentinian D. buzzatii sample.
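To make the logic of the frequency-spectrum tests concrete, the Python sketch below computes Tajima's D from the sample size, the number of segregating sites, and the mean number of pairwise differences per sequence (not per site), using the constants of Tajima (1989). Fu and Li's statistics and R2 are not shown. The function name and the example values are hypothetical, and significance in this study was assessed by coalescent simulation rather than from the statistic alone.

```python
import math


def tajimas_d(n, num_segregating, mean_pairwise_diff):
    """Tajima's D for n sequences, S segregating sites and mean pairwise differences k."""
    if n < 4 or num_segregating < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    s = num_segregating
    variance = e1 * s + e2 * s * (s - 1)
    # Numerator: difference between the two estimates of theta (k and S / a1).
    return (mean_pairwise_diff - s / a1) / math.sqrt(variance)


# Hypothetical example: 10 sequences, 16 segregating sites, k = 3.888889.
print(tajimas_d(10, 16, 3.888889))
```

A negative value, as in this example, arises when the mean number of pairwise differences is small relative to the number of segregating sites, which is the signature of an excess of low-frequency variants.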

Table 3 Tajima’s (DT), Fu and Li’s (DFL and FFL) and Ramos-Onsins and Rozas’ (R2) tests for D. buzzatii and D. koepferae

The MK test was significant when either Argentinian or Australian D. buzzatii were compared with D. koepferae or D. borborema as outgroups (Table 4). In all of these comparisons, the ratio of nonsynonymous polymorphism to nonsynonymous fixed differences was higher than the ratio of synonymous polymorphism to synonymous fixed differences (NI values 3 to 6.2), pointing to an excess of nonsynonymous polymorphism or a deficit of nonsynonymous fixed differences. The Australian CRS gave results qualitatively similar to those for the whole Australian sample (D. koepferae: G-value: 11.97, p: 0.0005, NI: 7.63; D. borborema: G-value: 13.71, p: 0.0002, NI: 7.53).

Table 4 McDonald and Kreitman test for the αE5 gene region in D. buzzatii and D. koepferae

Discussion

Founder Effect in the Colonization of Australia

D. buzzatii was inadvertently introduced into Australia between 1930 and 1935 during the biological control program for prickly pear cacti (Opuntia spp.) (Barker 1982). Results of surveys of inversion, allozyme and mtDNA restriction site polymorphisms indicate that Australian populations have been subject to a moderate founder effect (Barker et al. 1985; Knibb et al. 1987; Halliburton and Barker 1993), whereas original South American populations of D. buzzatii have undergone a recent population expansion (Rossi et al. 1996; Rodríguez et al. 2000; DeBrito et al. 2002; Piccinali et al. 2004).

Theoretical and experimental studies have shown that founder events can cause the loss of low-frequency alleles and drastic shifts in allele frequencies in newly founded populations (Hedrick 1983; Fontdevila 1989; Rozas et al. 1990; Ridley 1996; Austerlitz et al. 2000; Pascual et al. 2001). The present survey of nucleotide variation at the αE5 gene region indicates that these predictions are met in the case of D. buzzatii. Despite the extensive sharing of nucleotide polymorphisms and the similar levels of nucleotide variation in the two continents, a considerable loss of singletons has accompanied the colonization of Australia by D. buzzatii. In fact, the percentage of singleton variants decreased dramatically from 70% in Argentina to 24% in Australia. Moreover, we also detected important shifts in the frequency of certain variants. For example, singleton variants at positions 348, 501, 532, 574, 575, and 589 in Argentina increased to higher frequencies during colonization of Australia.

Intriguingly, the most frequent and widespread haplotype in Argentina is not present in Australia, although two closely related haplotypes (separated by two mutational steps) are present there. It may be that the genetic diversity of the region from which the Australian population originated was not represented in the Argentinian sample analyzed. However, this seems unlikely for two reasons. First, the sample analyzed included localities such as Ticucho and Termas de Río Hondo, which are close to the research stations from which substantial amounts of cactus material were introduced into Australia during the prickly pear control program (Barker 1982). Second, surveys of molecular genetic variation of mitochondrial and nuclear genes have failed to reveal significant geographic subdivision in Argentinian D. buzzatii (Rossi et al. 1996; Piccinali et al. 2004). The most likely explanation for the changes in haplotypic composition between continents is that they are the result of genetic drift associated with the founder event, but some forms of natural selection cannot be ruled out (see below).

Nucleotide Variation in D. buzzatii and D. koepferae

Our survey of nucleotide variation in αE5 in D. buzzatii and D. koepferae revealed that D. koepferae was more polymorphic at silent sites, whereas D. buzzatii was more polymorphic at nonsynonymous sites. Although stochastic variance can produce great changes in levels of nucleotide variation (see, for example, Machado et al. 2002), this result was surprising because previous studies based on the Xdh and Est-A genes showed that D. buzzatii was twice as polymorphic as D. koepferae for nonsynonymous, synonymous, and silent variation (Piccinali et al. 2004; Hasson unpublished data). These earlier results were attributed to interspecific differences in the historical population size or in the recombinational environment in which a gene is located because of the presence of different polymorphic inversions (Piccinali et al. 2004). However, in view of the departures observed in αE5 from neutral expectations (see the MK test results), the peculiar pattern in this locus may be the outcome of some form of diversifying selection in D. buzzatii populations (see below).

Electrophoretic Alleles

As might be expected, several significant associations were detected between αE5 haplotypes and the EST-2 variants identified by starch gel electrophoresis. The correlations were not perfect because only part of the αE5 gene was sequenced in our study, and starch electrophoresis does not detect all the protein variation observable by the more sensitive sequential cellulose acetate gel electrophoresis (Barker 1994). This latter technique has shown that EST-2a and EST-2b are the more heterogeneous classes on cellulose acetate gels, at least in Australian populations (Barker 1994). In agreement with these observations, the statistical parsimony analysis showed that the haplotypes associated with these two allelic classes were those more widely distributed across the haplotype network. In contrast, most of the EST-2c, EST-2c+, and EST-2d haplotypes from both continents were closely related, strongly suggesting that most of them were derived from one ancestral sequence. These observations may indicate that the EST-2a and EST-2b classes have accumulated a higher number of variants and recombination events with time and that they are ancestral to the EST-2c, EST-2c+, and EST-2d classes.

Finally, the observation of stronger associations between haplotypes and electrophoretic alleles in Australia than in Argentina is probably another consequence of the population bottleneck that occurred during the colonization of Australia.

Differentiation Between Chromosomal Arrangements

Electrophoretic surveys have shown that certain variants of the EST-2 protein are in linkage disequilibrium with second-chromosome inversions (East et al. 1987; Rodríguez et al. 2001), but differentiation in the αE5 region between chromosomal inversions within continents was not significant after Bonferroni’s correction. In addition, levels of nucleotide variation within arrangements were similar for 2st and 2j in both Argentina and Australia. In agreement with theoretical expectations, the levels of variability between and within arrangements suggest a moderate recombination rate between arrangements (Innan and Tajima 1999).
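The Bonferroni correction mentioned above divides the nominal significance level by the number of tests performed. A minimal sketch follows; the function name and the p-values are hypothetical and are not results from this study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests that remain significant at the Bonferroni-adjusted level alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p <= threshold for p in p_values]


# Hypothetical p-values for four pairwise comparisons between arrangements.
flags = bonferroni_significant([0.03, 0.2, 0.004, 0.6])
```

In this toy case the adjusted threshold is 0.05 / 4 = 0.0125, so a comparison with p = 0.03 that would be nominally significant is no longer so after correction.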

Despite the similar levels of nucleotide variation within 2st and 2j, our sample of four 2jz³ chromosomes was monomorphic for the sequenced αE5 gene region. However, the same chromosomes have shown sequence variation in other genes (Gómez and Hasson 2003; Laayouni et al. 2003; Piccinali et al. 2004). There are two alternative explanations for the lack of variation in 2jz³ chromosomes, both based on the proximity of αE5 to the proximal breakpoint of inversion 2z³. One is that this proximity makes the region more vulnerable to selective sweeps. The second, and more plausible, explanation is that 2z³ is a relatively recent inversion (Cáceres et al. 2001; Laayouni et al. 2003), which has simply not existed long enough to acquire variation near its breakpoints by recombination. A similar picture has been reported for 2q⁷, another young inversion in D. buzzatii derived from a 2j chromosome by a single paracentric inversion. All chromosomes sharing this arrangement were identical for the sequences of the regions encompassing the breakpoints of the inversion (Casals et al. 2003).

Population Genetic Structure

Our survey of nucleotide variation in the αE5 gene revealed that natural populations of D. koepferae are genetically structured for silent sites and that variation is hierarchically structured among phytogeographic regions. A similar population genetic structure was detected in a previous study based on synonymous variation in the Xdh gene (Piccinali et al. 2004), indicating that this feature is not locus specific and that populations of D. koepferae behave as semi-isolated populations with limited gene flow. However, the αE5 gene, as well as Xdh and other nuclear and mitochondrial genes (Rossi et al. 1996; Gómez and Hasson 2003; Piccinali et al. 2004), failed to show signs of population structure in D. buzzatii. Although the existence of population subdivision in Argentinian D. buzzatii cannot be excluded, especially because of the small sample size analyzed per population, our results suggest that any structuring is weaker than in D. koepferae. This absence of population structure in D. buzzatii could reflect extensive gene flow between populations and/or recent divergence from an ancestral population (Rodríguez et al. 2000).

Departures from Neutrality in the αE5 Gene in D. buzzatii

Overall, our results suggest that populations of D. koepferae are in drift–mutation equilibrium. However, in Argentinian D. buzzatii, there is evidence that nucleotide variation departs significantly from neutrality toward an excess of singletons at synonymous and nonsynonymous sites (Fu and Li tests; Table 3). As discussed elsewhere (Piccinali et al. 2004), similar departures have been reported in other nuclear loci (Gómez and Hasson 2003; Laayouni et al. 2003) and in the mitochondrial genome (Rossi et al. 1996; DeBrito et al. 2002), and the available evidence suggests that these departures are not merely a sampling artifact. One possible explanation for the genome-wide excess of singletons in Argentina may be a recent population expansion. This interpretation is also supported by the results of the R2 test, which is one of the more powerful tests for detecting changes in population size (Ramos-Onsins and Rozas 2002).

Another pattern emerging from our data is an excess of replacement polymorphism in both native and colonized populations of D. buzzatii. Three alternative explanations have been offered for such a pattern. First, a relaxation of natural selection can lead to an increase in the amount of nonsynonymous polymorphism without affecting interspecific divergence (Kennedy and Nachman 1998). Second, the so-called nearly neutral model contends that nonsynonymous mutations are slightly deleterious: because their selection coefficients are small, they can persist as polymorphisms for long periods of time, but they generally do not contribute to divergence because their fixation is highly unlikely (Ohta 1992). The third hypothesis is Gillespie’s model of episodic selection (Gillespie 1991), which proposes that particular haplotypes carrying initially advantageous amino-acid variants may rapidly increase in frequency but cannot reach fixation because of changes in the intensity and direction of natural selection.

The relaxation of natural selection may be a plausible explanation for αE5 variability in D. buzzatii, but only in Australia, because an excess of nonsynonymous variants is an expected feature in populations that have experienced a recent population bottleneck (Fay and Wu 2002). However, the signature of a population expansion, as seems to have occurred in Argentina, is usually an excess of divergence, which can be confounded with positive selection (Eyre-Walker 2002; Fay and Wu 2002).

Rand and Kann (1996) proposed a method to test the nearly neutral model. One prediction of the nearly neutral theory is that mildly deleterious polymorphisms are not expected to reach appreciable frequencies, whereas the frequencies of neutral variants may range from rare to intermediate (Ohta 1992). Thus, according to this line of reasoning, an appropriate test of the model would be to compare the frequency spectra of silent versus replacement polymorphisms. Because silent polymorphisms may reach high frequencies and exhibit a wide frequency range, Tajima’s D would be expected to be less negative for silent sites than for replacement sites. On one hand, in Argentina, nonsynonymous and synonymous sites have the same average frequency for less common variants (mean = 0.09 for both), and the variance (var) for nonsynonymous sites (var = 0.02) is twice as large as that for synonymous sites (var = 0.01). Similarly, Tajima’s D, although not significant, is quite similar for both types of sites (Table 3). On the other hand, in Australia, the observed pattern is closer to the expectations of the model. In effect, the mean and variance of the frequency of synonymous variants (mean = 0.18; var = 0.03) are greater than those for nonsynonymous variants (mean = 0.12; var = 0.01), and Tajima’s D is positive for synonymous sites and negative for nonsynonymous sites (Table 3). Therefore, the nearly neutral model seems to be another suitable explanation for the excess of nonsynonymous polymorphism at the αE5 locus in Australia but not in Argentina.
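The comparison of frequency spectra described above can be sketched as follows, applied separately to synonymous and nonsynonymous sites. The sketch assumes biallelic sites summarized as minor-allele counts in a sample of n sequences, uses the standard formula for Tajima’s D, and uses the sample variance for the frequencies (whether the values reported in the text are sample or population variances is not stated). Function names and inputs are hypothetical.

```python
import math
from statistics import mean, variance


def minor_frequency_summary(minor_counts, n):
    """Mean and sample variance of the frequency of the less common variant."""
    freqs = [k / n for k in minor_counts]
    return mean(freqs), variance(freqs)


def tajimas_d(minor_counts, n):
    """Tajima's D from per-site minor-allele counts (biallelic sites, n >= 4)."""
    s = len(minor_counts)
    if s == 0 or n < 4:
        raise ValueError("need at least one segregating site and n >= 4")
    # Mean number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in minor_counts)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
```

Under this sketch, a class of sites dominated by rare variants yields a negative D, and a class with many intermediate-frequency variants yields a positive D, which is the contrast the test looks for between replacement and silent sites.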

Finally, we consider whether the excess of replacement polymorphism in the D. buzzatii of both continents is in agreement with the predictions of some form of heterogeneous selective regime. EST-2 is proposed to be involved in digestion and/or detoxification (East 1982) and in habitat selection (Fernández-Iriarte et al. 2002), and D. buzzatii is the only species of the buzzatii complex that lives associated with prickly pear cacti; the other species breed and feed in columnar cacti (Hasson et al. 1992; Manfrin and Sene 2006). Several species of Opuntia serve as hosts for D. buzzatii in both Argentina and Australia, although there is very little overlap in host species between the two continents (Kircher 1982; Hasson et al. 1992). The different Opuntia species also constitute chemically different habitats for D. buzzatii with respect to the kind and quantity of alcohols, ketones, alkaloids, and esters (East 1982; Kircher 1982). Therefore, we suggest that the observed excess of nonsynonymous polymorphism in D. buzzatii on both continents might be related to the shifts to heterogeneous complements of host plants that D. buzzatii has experienced since its divergence from the other species of the buzzatii complex and during the expansion of its range. In particular, given the different complements of Opuntia species on the two continents, such an explanation could also account for the finding of different high-frequency haplotypes in the two data sets. Although the idea clearly requires testing in further, more specific ecological genetic analyses, we propose that the high level of nonsynonymous variation we found in the αE5 gene is related, at least in part, to a role it has played in the diversification of the species into a complex cactophilic niche in the arid lands of South America and, more recently, in Australia.