Introduction

Genes encoding reproductive proteins are often more divergent than genes encoding nonreproductive proteins (e.g., Civetta and Singh 1998; Singh and Kulathinal 2000). This divergence commonly stems from selection for nucleotide substitutions that result in amino acid changes (Swanson and Vacquier 2002). Such positive selection can be identified by comparing relative rates of nonsynonymous and synonymous changes at orthologous loci (McDonald and Kreitman 1991; Hughes and Nei 1988). More recent refinements to these methods allow for the identification of specific residues targeted by positive selection (Yang et al. 2000; Yang and Swanson 2002; Palomino et al. 2002). Such site-specific models have been used to detect positive selection in a variety of reproductive genes (Swanson and Vacquier 2002).

Protein divergence, however, is not brought about solely by nucleotide substitutions. Partial gene duplications contribute to the divergence of some reproductive proteins by producing variation in the number of internal repeats (e.g., bindin, a sperm-borne adhesion protein from sea urchins [Metz and Palumbi 1996; McCartney and Lessios 2004; Zigler and Lessios 2004]) or by altering posttranslational modifications to introduce new coding regions into the mature protein (e.g., TMAP, an acrosomal protein from marine snails [Hellberg et al. 2000]). Homogenization of internal repeats by concerted evolution may also contribute to the rapid divergence of reproductive proteins (e.g., VERL, the egg-borne vitelline envelope receptor for lysin from abalone [Swanson and Vacquier 1998]). Insertions and deletions (indels) are another source of variation upon which positive selection may act. The rate of spontaneous indel mutations may be as high as that for nucleotide substitutions (Britten et al. 2003; Denver et al. 2004), and recent studies have shown positive selection acting on indels in sperm-specific proteins in mammals (Podlaha and Zhang 2003; Podlaha et al. 2005).

Here we evaluate positive selection in some well-characterized examples of rapid divergence in reproductive proteins: the accessory gland proteins (Acps) of Drosophila. During mating, D. melanogaster males transfer 70–106 Acps to females in the seminal fluid that accompanies sperm (Mueller et al. 2005). These Acps elicit many behavioral and physiological changes in the mated female (Wolfner 2002): they increase egg-laying rate (Herndon and Wolfner 1995; Chapman et al. 2001; Heifetz et al. 2001), promote sperm storage (Neubaum and Wolfner 1999; Xue and Noll 2000), reduce female willingness to remate (Chen et al. 1988; Aigaki et al. 1991), reduce female life span (Chapman et al. 1995; Lung et al. 2002), and mediate sperm competition (Harshman and Prout 1994; Clark et al. 1995). The genes underlying these reproductive functions often diverge relatively rapidly: Acps in the D. melanogaster subgroup are on average twice as divergent between species as nonreproductive proteins (Civetta and Singh 1995; Singh and Kulathinal 2000; Swanson et al. 2001).

While the functions of Acps in the D. melanogaster subgroup have been widely studied, along with the role of positive selection on nucleotide substitutions in effecting their divergence, little is known about Acps in other drosophilid lineages. The recent publication of the D. pseudoobscura genome (Richards et al. 2005) permits the comparison of Acp evolution between two lineages (the D. melanogaster and D. pseudoobscura subgroups) that diverged 21–46 MYA (Beckenbach et al. 1993). Wagstaff and Begun (2005) used a combination of computational and molecular approaches to identify five orthologous Acp loci from the D. melanogaster subgroup in D. pseudoobscura (Table 1). Stevison et al. (2004) compared the divergence of four X-linked putative Acps in two species from the D. melanogaster subgroup and two species from the D. pseudoobscura subgroup. One of these putative Acp genes (CG16707) exhibited positive selection in both subgroups and contributed to an overall correlation between dN/dS values (the ratio of nonsynonymous substitutions per nonsynonymous site to synonymous substitutions per synonymous site [Hughes and Nei 1988]) for 12 orthologues compared between the two clades. Genomic comparisons by Mueller et al. (2005) found that, within the D. melanogaster subgroup, dN/dS ratios were lower for Acp genes with recognizable orthologues in D. pseudoobscura than for Acp genes with no identifiable orthologue in D. pseudoobscura. Mueller et al. (2005) also defined Acps more narrowly than previously, thereby excluding CG16707. Thus the degree to which selection on orthologous Acps is correlated in distant clades remains unknown.
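As a concrete illustration of the dN/dS ratio defined above, the sketch below uses a simple counting approach: the proportions of nonsynonymous and synonymous differences per site are corrected for multiple hits with the Jukes-Cantor formula and then divided. This is only an illustration of the ratio itself, not the maximum likelihood method implemented in PAML that we use below; the function names and the site and difference counts in the usage lines are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float):
    """Return (dN, dS, omega) from counts of differences and sites.

    n_diffs / s_diffs: nonsynonymous / synonymous differences between two sequences
    n_sites / s_sites: number of nonsynonymous / synonymous sites
    """
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0.0:
        raise ValueError("no synonymous differences; omega is undefined")
    return dn, ds, dn / ds


# Hypothetical counts: 30 differences over 300 nonsynonymous sites,
# 5 differences over 100 synonymous sites.
dn, ds, omega = dn_ds(30, 300, 5, 100)
print(f"dN = {dn:.4f}, dS = {ds:.4f}, omega = {omega:.2f}")
```

With these made-up counts ω exceeds 1, the pattern interpreted as positive selection; a ratio below 1 would instead indicate purifying selection on replacement changes.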

Table 1 Functions of accessory gland proteins used in this study

Here, we test for positive selection on the five Acps in the D. pseudoobscura subgroup identified by Wagstaff and Begun (2005). In addition to nucleotide substitution rates, we evaluate the role that indels, a source of variation heretofore ignored in studies of Acps in Drosophila, play in the divergence of these proteins. We compare patterns of selected change within the D. pseudoobscura subgroup to those seen for the same Acps in the D. melanogaster subgroup. If the patterns of molecular evolution in these Acps are similar between these two clades, then the conserved functions of these proteins could remain a constant target for similar types of selection over large time scales. Alternatively, different patterns of selection on orthologous reproductive proteins in the two lineages would suggest that selection opportunistically targets different loci among different clades.

Materials and Methods

Fly Stocks

Flies used in this study were obtained from Dr. Mohamed Noor, Dr. Carlos Machado, and the Tucson Stock Center (http://www.stockcenter.arl.arizona.edu) and largely overlap with those used by Machado et al. (2002). We used 20 lines of D. pseudoobscura: 4 lines from Mather, California (Mather17, Mather32, Mather52, and Mather1959); 4 lines from Mt. St. Helena, California (MSH9, MSH21, MSH24, and MSH32); 1 line from James Reserve, California; 4 lines from American Fort Canyon, Utah (AF2, AFC3, AFC7, and AFC12); 4 lines from Flagstaff, Arizona (Flagstaff5, Flagstaff14, Flagstaff16, and Flagstaff18); 1 line from Tucson, Arizona; 1 line from Baja, California (Baja 1); and 1 line from Sonora, Mexico (Sonora 3). We also used 11 lines of D. p. bogotana from near the city of Bogotá in Cundinamarca, Colombia (Bogotá 1960, Bogotá 1976, Potosí2, Potosí3, Susa2, Susa6, Sutatausa3, Sutatausa5, Toro1, Toro6, and Toro7); 7 lines of D. persimilis (3 lines from Mather, California [Mather37, Mather40, and MatherG] and 4 lines from Mt. St. Helena, California [MSH1, MSH3, MSH7, and MSH42]); and 3 lines of D. miranda (MSH22, MSH38, and Mather 1993).

DNA Isolation, PCR Amplification, and Sequencing

DNA was extracted from whole male flies using the single fly squish protocol of Gloor and Engels (1992). PCR primers were designed from D. pseudoobscura Acp sequences from Wagstaff and Begun (2005) using Primer3 (http://www.frodo.wi.mit.edu/cgi-bin/primer3/primer3). The PCR was performed on a PTC-200 (MJ Research, Watertown, MA) using the following conditions: 94°C for 2 min 30 sec, 50°C for 2 min, then 72°C for 2 min, followed by 38 cycles of 94°C for 45 sec, 50°C for 1 min, then 72°C for 1 min 15 sec. Resulting amplicons were purified using either a Strataprep PCR Purification Kit (Stratagene, La Jolla, CA) or a QuickStep2 96-Well PCR Purification Kit (Edge BioSystems, Gaithersburg, MD), then sequenced using both amplification primers on an ABI 377 automated sequencer, using Big Dye Terminators (V3.1; Applied Biosystems, Foster City, CA). Sequences are available from GenBank (DQ368868–DQ369012).

Sequence Analyses

Nucleotide sequences for each Acp were initially assembled and edited with Sequencher 3.0 (Gene Codes, Ann Arbor, MI). Inferred amino acid sequences were then aligned with ClustalW (http://www2.ebi.ac.uk/clustalw/) under default settings. Further alignment modifications were made by hand. Resulting amino acid alignments were then used to align nucleotides.
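Using the amino acid alignment to align nucleotides amounts to threading each coding sequence back onto its aligned protein row, so that every residue column becomes one codon column and every gap becomes a three-base gap. The sketch below shows that idea under stated assumptions: sequences have no internal stop codons or frameshifts, an optional single trailing stop codon is dropped, and the allele names and sequences are hypothetical. It is not the procedure of any particular program used in this study.

```python
def back_translate(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue consumes the next codon of `cds`; each alignment gap ("-")
    becomes a codon-sized gap ("---"), so codons stay in frame across rows.
    A single trailing stop codon in `cds` is tolerated and dropped.
    """
    cds = cds.upper()
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) not in (3 * n_residues, 3 * n_residues + 3):
        raise ValueError("CDS length does not match the number of aligned residues")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


# Two hypothetical rows of a protein alignment and their coding sequences.
protein_alignment = {"allele_1": "MK-V", "allele_2": "MKLV"}
coding_sequences = {"allele_1": "ATGAAAGTTTAA", "allele_2": "ATGAAACTGGTT"}
codon_alignment = {
    name: back_translate(row, coding_sequences[name])
    for name, row in protein_alignment.items()
}
for name, row in codon_alignment.items():
    print(name, row)
```

Because every row is expanded column by column, the resulting nucleotide alignment keeps codons in register, which codon-based models such as those in codeml require.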

Measures of Acp polymorphism and divergence, as well as McDonald and Kreitman’s (1991) test for nonneutrality, were calculated using DnaSP 4.0 (Rozas et al. 2003). The McDonald-Kreitman test can reveal departures from neutrality acting across all sites of a protein by comparing the ratio of replacement to silent fixed differences between species with the ratio of replacement to silent polymorphisms within species. We tested for recombination with DnaSP using the algorithm described by Hudson and Kaplan (1985); no significant recombination was detected. DnaSP 4.0 was also used to calculate Tajima’s D, Fu and Li’s D, and Fay and Wu’s H, with confidence limits for these statistics estimated from 1000 coalescent simulations. We obtained parsimony trees and neighbor-joining trees (Saitou and Nei 1987; based on Kimura two-parameter distances) for alleles in PAUP* v4.0b10 (Swofford 2001). Branch support was estimated by bootstrapping with 1000 replicates.
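The Kimura two-parameter distance used for the neighbor-joining trees separates transitions (P, the proportion of sites differing by A/G or C/T changes) from transversions (Q), giving d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). The sketch below is an illustration of that formula, not PAUP*'s implementation; it assumes that sites with a gap or ambiguous base in either sequence are simply skipped, and the sequences are hypothetical.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases (an assumption of this sketch)
        compared += 1
        if x == y:
            continue
        same_class = (x in PURINES and y in PURINES) or (
            x in PYRIMIDINES and y in PYRIMIDINES
        )
        if same_class:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


# Hypothetical pair: one transition (A/G) and one transversion (A/C) over 10 sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"), 4))
```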

Acp sequences from the D. melanogaster subgroup were downloaded from GenBank. These were chosen based on sequence length (>75% of the protein’s open reading frame had to be available) and uniqueness (identical sequences were not included). These sequences were given initially by Tsaur et al. (2001; AF302208–AF302229), Begun et al. (2000; AY010527–AY010711), Panhuis et al. (2003; AY344246–AY344364), Holloway and Begun (2004; AY635196–AY635290), and Kern et al. (2004; AY505178–AY505293). For these analyses, D. melanogaster (Zimbabwe) and D. p. bogotana were considered as taxa separate from their nominal conspecifics.
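The two inclusion rules for GenBank sequences (coverage of more than 75% of the open reading frame, and no identical duplicates) can be expressed as a small filter. This sketch assumes that coverage is approximated by sequence length relative to ORF length and that "identical" means an exact match after uppercasing; the record names and sequences are hypothetical.

```python
def select_sequences(records, orf_length, min_coverage=0.75):
    """Keep sequences covering > min_coverage of the ORF, dropping exact duplicates.

    records: mapping of a sequence name to its nucleotide sequence
    orf_length: length (bp) of the full open reading frame
    Coverage is approximated here by sequence length / ORF length (an assumption).
    """
    selected = {}
    seen = set()
    for name, seq in records.items():
        seq = seq.upper()
        if len(seq) / orf_length <= min_coverage:
            continue  # too short: must cover more than 75% of the ORF
        if seq in seen:
            continue  # identical to a sequence already kept
        seen.add(seq)
        selected[name] = seq
    return selected


# Hypothetical records for a 100-bp ORF.
records = {
    "seq_A": "ATG" * 27,   # 81 bp, kept
    "seq_B": "ATG" * 20,   # 60 bp, too short
    "seq_C": "atg" * 27,   # identical to seq_A after uppercasing, dropped
    "seq_D": "ATGC" * 20,  # 80 bp, kept
}
print(sorted(select_sequences(records, orf_length=100)))
```

Keeping the first of each set of identical sequences avoids counting the same allele twice, which would otherwise inflate apparent sample sizes without adding information.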

We used the codeml program in PAML 3.14 (Yang 2004) to test for positive selection and to infer amino acid sites under positive selection, using the maximum likelihood methods of Nielsen and Yang (1998) and Yang et al. (2000). A Bayes Empirical Bayes (BEB; Deely and Lindley 1981) approach was subsequently used to calculate the posterior probability that each site fell into each of the dN/dS (ω) classes (Yang et al. 2005).

We performed three tests for positive selection on nucleotide substitutions. First, we used a model (M0) that assumed a single ω value for all sites to estimate the average ω across all codons (Nielsen and Yang 1998). Second, a more robust test for adaptive evolution was performed by comparing the nested models M7 and M8 (Yang et al. 2000). The neutral model M7 allowed ω to take on beta-distributed values between 0 and 1 at each codon (i.e., no positive selection). This was compared with the selection model M8, which used the same beta distribution for neutral codons but added a parameter that allows a proportion of codons to take on ω values greater than 1. Finally, we compared selection model M8 to model M8A, which allows a proportion of sites to equal but not exceed an ω value of 1 (Swanson et al. 2003). Positive selection was inferred if ω > 1.0. Significance was determined by comparing twice the difference between the log-likelihood values of M7 vs. M8 or M8 vs. M8A to the critical values of a chi-square distribution with one degree of freedom. The default starting value of ω in PAML is 0.3 for all models. Because maximum likelihood optimization can converge on a local rather than the global optimum, we also varied the initial ω value (set to 0, 0.5, and 1.0). Results were consistent regardless of these starting values (data not shown); we report values obtained with the default starting value here.
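The nested-model comparisons described above reduce to one calculation: twice the log-likelihood difference between the alternative and null models, compared with a chi-square distribution with one degree of freedom (critical value 3.84 at α = 0.05). A minimal sketch with hypothetical log-likelihoods; in practice the lnL values would come from the codeml output for each model. For one degree of freedom the upper-tail probability has the closed form erfc(√(x/2)), so no statistics library is needed.

```python
import math


def lrt_df1(lnl_null: float, lnl_alt: float):
    """Likelihood ratio test of nested models with one degree of freedom.

    Returns the statistic 2 * (lnL_alt - lnL_null) and its chi-square (1 df)
    upper-tail p-value, computed as erfc(sqrt(stat / 2)).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat < 0:
        # The richer nested model should never fit worse at its optimum; a
        # negative statistic usually signals a local optimum, so rerun the
        # optimization with different starting values.
        raise ValueError("alternative model has lower likelihood than the null")
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Hypothetical log-likelihoods for M7 (null) and M8 (alternative).
stat, p = lrt_df1(lnl_null=-1000.0, lnl_alt=-995.0)
print(f"2dlnL = {stat:.2f}, p = {p:.4g}")
```

The same function applies to the M8A vs. M8 comparison; only the pair of log-likelihoods changes.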

Codon-based maximum likelihood approaches have successfully identified residues under selection, as shown by their ability to recover residues already functionally implicated as targets of positive selection (e.g., Yang and Swanson 2002; Palomino et al. 2002). We used the BEB approach to identify positively selected residues instead of alternative parsimony-based approaches (Suzuki and Nei 2004; Zhang 2004) because (1) although the parsimony methods have a low rate of false positives, they also have little power for detecting positive selection or identifying positively selected sites (Wong et al. 2004), and (2) although the older naive empirical Bayes (NEB) approach can have high false-positive rates, the BEB approach corrects for these problems and reduces the false-positive rate considerably (Yang et al. 2005). With the BEB approach, sites under positive selection can be identified even if the average dN/dS over all sites is <1. Sites with a high posterior probability of belonging to the class with ω > 1 are likely to be under positive selection.
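The core of a site's posterior probability is Bayes' rule: the proportion of sites in each ω class times the likelihood of that site's data under the class, normalized over all classes. The sketch below shows only this naive empirical Bayes step with fixed parameter values; BEB additionally accounts for uncertainty in the estimated model parameters (Yang et al. 2005), which is omitted here. The class proportions, per-site likelihoods, and function names are hypothetical (codeml computes these quantities internally), and the 0.8 threshold mirrors the cutoff used for Fig. 2.

```python
def site_posteriors(proportions, site_likelihoods):
    """Posterior probability of each omega class at each site (Bayes' rule).

    proportions: prior proportion p_k of sites in each omega class (sums to 1)
    site_likelihoods: for each site h, the likelihoods f(x_h | omega_k)
    """
    posteriors = []
    for likelihoods in site_likelihoods:
        joint = [p * f for p, f in zip(proportions, likelihoods)]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors


def flag_selected_sites(posteriors, selected_class, threshold=0.8):
    """Indices of sites whose posterior for the omega > 1 class exceeds threshold."""
    return [h for h, post in enumerate(posteriors) if post[selected_class] > threshold]


# Hypothetical two-class example: class 0 has omega < 1, class 1 has omega > 1.
proportions = [0.9, 0.1]
site_likelihoods = [
    [1e-5, 2e-4],
    [1e-4, 9e-3],
    [3e-4, 1e-5],
]
post = site_posteriors(proportions, site_likelihoods)
print(flag_selected_sites(post, selected_class=1))
```

Note that a site can be flagged even when the ω > 1 class is rare overall, which is why sites under positive selection can be found when the average dN/dS is below 1.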

Determining whether positive selection promotes indels is not as straightforward as for nucleotide substitutions, because there is no natural within-gene comparison analogous to synonymous substitutions. To test for positive selection on indels, Podlaha and Zhang (2003) compared the rate of indel substitution in the reproductive protein of interest to that in neutral (noncoding) sequences, using the simple ratio (number of nucleotide indels)/(total number of base pairs)/(divergence time). Ideally, such comparisons would involve the same genomes, but such estimates are not always available. For their comparisons among primates, Podlaha and Zhang (2003) used estimates of neutral indel rates between humans and chimpanzees, even though this pair showed no indels for the Catsper1 gene of interest. We estimated indel substitution rates in Drosophila noncoding DNA from differences in intronic, 5′-intergenic, and 3′-intergenic regions between D. simulans and D. sechellia (Halligan et al. 2004). Note that this comparison is conservative, because indels in exonic sequences must occur in multiples of 3 bp so as not to disrupt open reading frames, a constraint not present for noncoding regions. Indels were counted without regard to their size.
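The rate ratio of Podlaha and Zhang (2003) needs two ingredients: a count of indel events and the ratio itself. The sketch below counts indels in a pairwise alignment, scoring each run of consecutive gap columns in the same sequence as one event regardless of length (matching the size-independent counting described above), and then computes indels per base pair per unit of divergence time. The alignment, length, and divergence time in the usage lines are hypothetical, and skipping columns where both sequences carry a gap is an assumption of this sketch.

```python
def count_indel_events(seq_a: str, seq_b: str) -> int:
    """Count indel events in a pairwise alignment, ignoring their length.

    A run of consecutive columns with a gap in the same sequence counts as one event.
    Columns where both sequences have gaps are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    events = 0
    previous = None
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        if a == "-":
            state = "gap_in_a"
        elif b == "-":
            state = "gap_in_b"
        else:
            state = None
        if state is not None and state != previous:
            events += 1  # a new gap run starts here
        previous = state
    return events


def indel_rate(n_indels: int, total_bp: int, divergence_time: float) -> float:
    """Indels per base pair per unit of divergence time."""
    return n_indels / total_bp / divergence_time


# Hypothetical pairwise alignment with three gap runs of different lengths.
a = "ATG---AAACCC--T"
b = "ATGTTTAAA---GGT"
n = count_indel_events(a, b)
print(n, indel_rate(n, total_bp=300, divergence_time=1.5))
```

The same counting and ratio would be applied to the coding and noncoding comparisons, so that the two rates are expressed in the same units before being compared.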

Results

Intraspecific Variation

Intraspecific sequence variation at the five Acp loci examined (Table 2) was comparable to the range of values reported for neutral sequence regions by Machado et al. (2002) for these same taxa. For both Θw and Nei’s π, D. pseudoobscura had the highest levels of nucleotide variation at Acp26Aa, Acp32CD, and Acp62F, while D. miranda had the highest nucleotide variation for Acp53Ea and Acp70A.
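Θw and π summarize the same polymorphism data in two ways: Watterson's estimator scales the number of segregating sites S by a1 = Σ 1/i (for i from 1 to n − 1), whereas π averages the number of pairwise differences, and Tajima's D standardizes their difference. A minimal sketch, assuming an alignment with no gaps or missing data and at least four sequences, and returning per-sequence rather than per-site values (dividing θw and π by the number of sites gives per-site figures). It is not DnaSP's implementation, and the toy haplotypes are hypothetical.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Count alignment columns with more than one nucleotide (no missing data assumed)."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def nucleotide_diversity(seqs):
    """Nei's pi: average number of pairwise differences per sequence pair."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / len(pairs)


def tajimas_d(seqs):
    """Return (theta_w, pi, D); D is None when there are no segregating sites."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    s = segregating_sites(seqs)
    pi = nucleotide_diversity(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = s / a1
    if s == 0:
        return theta_w, pi, None
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    d = (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return theta_w, pi, d


theta, pi, d = tajimas_d(["AAAA", "AAAT", "AATT", "ATTT"])
print(f"theta_w = {theta:.3f}, pi = {pi:.3f}, D = {d:.3f}")
```

A D near zero, as in this toy example, is what neutral evolution at constant population size predicts; significance is usually judged against coalescent simulations rather than a closed-form distribution.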

Table 2 Acp polymorphism in the D. pseudoobscura subgroup

For all phylogenetic analyses of Acps, alleles from D. miranda fell basal to the other taxa (not shown). Alleles from the same taxon generally grouped together, in a topology consistent with the generally accepted relationships within this subgroup, although support was weak. Acp26Aa (Fig. 1) provided the strongest exception to this pattern, with many D. persimilis alleles grouping with D. p. pseudoobscura alleles, to the exclusion of a basal group of D. p. pseudoobscura and D. p. bogotana alleles.

Figure 1

Neighbor-joining tree (using Kimura two-parameter distances) for alleles of Acp26Aa of the Drosophila pseudoobscura subgroup. Numbers above branches indicate bootstrap support values. Geographic origin and line numbers are shown in parentheses.

Tests for Neutrality

Tajima’s (1989) D was not significantly different from zero in any taxon within the D. pseudoobscura subgroup for any of the Acp loci (Table 2). Fu and Li’s D and Fay and Wu’s H were also not significantly different from zero. Therefore, using these tests, we cannot reject the hypothesis that these loci are evolving neutrally. McDonald-Kreitman tests also failed to reveal any departure from neutrality at Acp53Ea, Acp62F, or Acp70A (Supplementary Table 1). The only comparisons that deviated from neutrality were between D. p. bogotana and D. miranda for Acp26Aa (p = 0.005) and between D. persimilis and D. miranda for Acp32CD (p = 0.0079); all other interspecific comparisons for Acp26Aa and Acp32CD were consistent with neutrality under this test (Supplementary Table 1). These two results remained significant after applying the Williams correction to the G-test of independence (Sokal and Rohlf 1995).
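The McDonald-Kreitman comparison reduces to a 2 × 2 contingency table of replacement and silent changes, fixed between species versus polymorphic within species. The sketch below tests such a table with a two-sided Fisher's exact test and reports the neutrality index NI = (Pn/Ps)/(Dn/Ds), where values below 1 indicate an excess of replacement fixations. It swaps in Fisher's exact test for illustration: it is not DnaSP's implementation and differs from the Williams-corrected G-test mentioned above. The counts and function names are illustrative assumptions.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one (tolerance for rounding).
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def mk_test(dn: int, ds: int, pn: int, ps: int):
    """McDonald-Kreitman table: fixed (dn, ds) vs. polymorphic (pn, ps) counts."""
    p_value = fisher_exact_2x2(dn, ds, pn, ps)
    # Neutrality index is undefined if any denominator would be zero.
    ni = (pn / ps) / (dn / ds) if ps and dn and ds else None
    return p_value, ni


p_value, ni = mk_test(dn=7, ds=17, pn=2, ps=42)  # illustrative counts
print(f"p = {p_value:.4f}, NI = {ni:.3f}")
```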

Tests for Positive Selection on Nucleotide Substitutions

Raw sequence comparisons suggest the possibility of positive selection, with more replacement than silent polymorphisms in at least some comparisons for Acp26Aa, Acp32CD, and Acp62F (Table 2). dN/dS ratios (ω) averaged across lineages and sites (M0) were <1 for all Acps in both subgroups, with the exception of Acp32CD in the D. melanogaster subgroup (Table 3). The selection model (M8) estimated a class of sites with ω > 1 for Acp26Aa, Acp32CD, and Acp62F in both the D. pseudoobscura and the D. melanogaster subgroups, and for Acp53Ea in the D. melanogaster subgroup as well (Table 3).

Table 3 LRTs of positive selection for Acps in the D. pseudoobscura and D. melanogaster subgroups

To identify the particular residues underlying this positive selection, we used the BEB approach of Yang et al. (2005). Many residues were subject to positive selection in three of the Acps examined: Acp26Aa, Acp32CD, and Acp62F (Table 3). For Acp26Aa, a higher proportion of sites underwent positive selection in the D. pseudoobscura subgroup than in the D. melanogaster subgroup. For Acp62F, a similar number of sites underwent positive selection in the two subgroups. Acp53Ea also had an ω > 1 in the D. melanogaster subgroup, but this was not significant (Table 3). The extensive divergence between orthologous loci prevented us from determining whether the same residues were under selection in the two clades.

Acp26Aa had the highest dN/dS ratio in the D. pseudoobscura subgroup and was under strong selection in the D. melanogaster subgroup. Acp62F was also under significant positive selection in both subgroups, but at fewer sites and with lower ω values. No significant positive selection was detected in Acp53Ea or Acp70A in either subgroup. Acp32CD was under significant positive selection in the D. melanogaster subgroup but not in the D. pseudoobscura subgroup, although positive selection was suggested at more sites in this Acp than in either Acp53Ea or Acp70A.

Indel Substitutions

Nucleotide substitutions were not the only source of variation in Acp26Aa. Amino acid alignments of Acp26Aa revealed several indels in both the pseudoobscura and the melanogaster subgroups, including polymorphisms within species (Fig. 2a). In contrast to these exonic indels, there were no indels in an immediately adjacent 68-bp intron of Acp26Aa in any of the seven individuals from the pseudoobscura subgroup (obtained from GenBank; our sequencing started immediately after the intron). In addition, in the D. pseudoobscura subgroup, 22 of the 29 positively selected sites (posterior probabilities >0.8) fell within the indel regions of Acp26Aa (Fig. 2a), even though these regions constituted only 39% of the total aligned protein-coding region. In contrast, only four of seven positively selected sites (posterior probabilities >0.8) fell within indel regions in the D. melanogaster subgroup (Fig. 2b). The indels sometimes prevented unambiguous alignment of sequences (especially a 12-residue repeat shared by some D. p. pseudoobscura and D. p. bogotana sequences; Fig. 2a); however, analyses of several alternative alignments produced very similar results in terms of the number of residues under selection and overall values of dN/dS (not shown).

Figure 2

Amino acid alignment of insertions and deletions in part of Acp26Aa from (a) the Drosophila pseudoobscura subgroup and (b) the D. melanogaster subgroup. Positively selected sites with posterior probabilities >0.8 are highlighted in gray. D. pseudoobscura subgroup sites are numbered starting immediately after the sole intron. D. melanogaster subgroup sites are numbered as the sequences appear in GenBank.

Comparisons of indel substitution rates in Acp26Aa to those in noncoding regions of Drosophila genomes suggest that indels may be under positive selection. The indel substitution rates in Acp26Aa are higher than, or of the same order of magnitude as, those in noncoding regions of Drosophila genomes (Table 4).

Table 4 Estimated indel substitution rates for Acp26Aa, intronic, and gene flanking regions

Acp32CD also contained several indels, including a single indel polymorphism within D. pseudoobscura. Alignments of Acp32CD revealed one 6-bp indel between D. melanogaster (United States and Zimbabwe) and D. simulans. This indel did not fall in a positively selected region of Acp32CD. No indels were present in Acp53Ea, Acp62F, or Acp70A in either subgroup.

Discussion

We have shown that the accessory gland proteins Acp26Aa and Acp62F have sites that are undergoing positive selection in the D. pseudoobscura subgroup. Similar proportions of positively selected sites are found in these same two Acps in the D. melanogaster subgroup, and in Acp32CD as well. Two additional Acps, Acp53Ea and Acp70A, were not subject to positive selection in either of these subgroups. In addition to positive selection acting on nucleotide substitutions, we also found several indel substitutions and polymorphisms in Acp26Aa and Acp32CD. For Acp26Aa, these indels occur in the same regions that harbor positively selected nucleotide substitutions in the D. pseudoobscura subgroup, but not in the D. melanogaster subgroup. The deep divergence in Acps from the two subgroups prevented us from determining whether the same residues are subject to positive selection in both subgroups, as Acps from the different subgroups could not be aligned. Acp26Aa has already been demonstrated to undergo positive selection in the D. melanogaster subgroup (Tsaur and Wu 1997; Tsaur et al. 1998; Begun et al. 2000) and in the D. pseudoobscura subgroup (Wagstaff and Begun 2005). However, this is the first study to identify positive selection at particular sites for Acp26Aa or any other drosophilid Acp or to note extensive indel variation or high rates of indel substitution within any Acp.

Mueller et al. (2005) suggested that, because most Acps from D. melanogaster could not be detected in D. pseudoobscura, Acps might be undergoing different evolutionary paths in these divergent lineages. Stevison et al. (2004), however, found that dN/dS values were correlated for 12 orthologous genes in the melanogaster and pseudoobscura subgroups, 4 of which were putative Acps (although only 2 of these would qualify as Acps under the definitions of Mueller et al. [2005]). For the subset of five Acps where orthologues in the two subgroups had been recognized by Wagstaff and Begun (2005), we found that the relative strength of positive selection on nucleotide substitutions is similar. This suggests that the presumably conserved functions of these proteins have remained targets for the same type of selection, diversifying or stabilizing, over long periods of time.

The functions of the two Acps shown here to be under positive selection suggest a potential role in some observed reproductive incompatibilities within the two subgroups. Acp62F is a protease inhibitor (Lung et al. 2002), and by inhibiting proteolysis it could potentially protect sperm in the female’s reproductive tract. The protease inhibitor class to which Acp62F belongs was noted by Mueller et al. (2005) as being especially lacking in orthologues between the melanogaster and the pseudoobscura subgroups. However, whether the action of Acp62F is species specific in the D. melanogaster subgroup remains unknown.

Acp26Aa (ovulin) increases egg laying (Herndon and Wolfner 1995; Heifetz et al. 2001). In addition, Clark et al. (1995) showed that Acp26Aa genotypes correlate with sperm displacement ability within D. melanogaster. If these intraspecific effects carried over to interactions between subspecies, allelic variation at Acp26Aa might play a role in the conspecific sperm precedence observed between subspecies of D. pseudoobscura (Dixon et al. 2003). Here, we found that Acp26Aa alleles from the same D. p. pseudoobscura populations used by Dixon et al. (2003) fell into two different (modestly supported) phylogenetic groups: one basal and including all alleles from D. p. bogotana, the other derived and containing all D. persimilis alleles but none from D. p. bogotana (Fig. 1). Studies that simultaneously genotype Acp26Aa alleles and evaluate mating success (as in Clark et al. 1995) may reveal whether some of the conspecific sperm precedence seen between D. p. pseudoobscura and D. p. bogotana (Dixon et al. 2003) owes to divergence at this locus (possibly from introgressed D. persimilis alleles).

Previous studies evaluating positive selection on Acps have examined only nucleotide substitutions. Two recent studies, however, have shown positive selection acting on indels in a sperm-specific protein (Catsper1) in both primates (Podlaha and Zhang 2003) and rodents (Podlaha et al. 2005). Catsper1 encodes a voltage-gated calcium ion channel that is necessary for proper sperm motility (Ren et al. 2001) and may help mediate sperm competition. Positive selection on nucleotides also occurs in indel-rich regions of the gamete recognition protein bindin from sea urchins (Metz and Palumbi 1996; McCartney and Lessios 2004; Zigler and Lessios 2004). Previous studies of the molecular evolution of Acps in Drosophila, however, have either implicitly or explicitly excluded indels from their analyses (e.g., Tsaur and Wu 1997; Begun et al. 2000), although Mueller et al. (2005) noted that two Acp loci (CG14560 and CG9074) contained repetitive regions. Our results suggest that indel substitutions play a significant role in the divergence of some Acps. In the D. pseudoobscura subgroup, indels appear to be concentrated in the same part of Acp26Aa as most of the residues under positive selection. This correlation between positively selected residues and indel sites should not arise as an artifact of the PAML analysis (and indeed is not present in the D. melanogaster subgroup), because gaps are treated as ambiguities and dropped from the analysis in pairwise fashion. Further, the high rates of indel substitution in Acp26Aa (Table 4) suggest that positive selection may act on the indels themselves.

Positive selection often drives the rapid evolution of reproductive proteins (Swanson and Vacquier 2002). We have demonstrated that the strength of positive selection on nucleotide substitutions acting on five orthologous Acps is similar in two drosophilid lineages that split 21–46 MYA (Beckenbach et al. 1993). In addition, indels also contribute to the divergence of some Acps and may even be promoted by positive selection.