Introduction

Very few evolutionary analyses have been conducted on the molecules involved in avian reproduction despite the large amount of data available regarding avian sexual behavior. Indeed, the study of birds has given rise to many of the hypotheses invoked to explain the pattern of adaptive evolution of reproductive proteins, which is observed in taxa as diverse as Arabidopsis, abalone and Drosophila (Darwin 1871; Andersson 1994; Clark et al. 2006). It is this fact, as well as the unusual nature of the avian fertilization system, with its tolerance of physiological polyspermy, that makes the study of avian reproductive proteins especially intriguing. Here we present one of the first studies of the evolutionary pattern of an avian egg surface protein, zona pellucida 3 (ZP3).

Darwinian adaptive evolution of reproductive proteins has been detected in invertebrates (sea urchin, abalone, Drosophila, crickets), vertebrates (mammals), and plants (Arabidopsis) (Clark et al. 2006 and references therein). A variety of hypotheses have been proposed to explain this apparently widespread phenomenon. These hypotheses primarily involve forces of sexual selection such as mate choice, intrasexual competition and sexual conflict. Conflict over polyspermy, in particular, is one of the most frequently discussed hypotheses for the rapid evolution of sperm and egg surface proteins (Gavrilets 2000; Swanson and Vacquier 2002; Haygood 2004). In this scenario, competition among sperm variants for more rapid binding with egg proteins leads to a decreased ability for the egg to protect itself from polyspermy. Since, in the vast majority of species, polyspermy results in egg death, females benefit from decreasing sperm efficiency through various blocks to polyspermy (Frank 2000; Tarin and Cano 2000). In contrast, because of anisogamy, the benefit to sperm of outcompeting the sperm cells of other males outweighs any costs of polyspermy to that male. This conflict between male and female fitness trajectories is predicted to result in a cyclical antagonistic coevolutionary process (Frank 2000).

Sexual conflict over polyspermy would not be expected to drive rapid evolution of egg surface proteins in birds, however, since they are one of a few taxa that tolerate physiological polyspermy (Tarin and Cano 2000; other taxa include some species of ctenophores and urodeles). In the avian fertilization system, the penetration of the outer membranes of the egg, including the inner perivitelline layer (IPVL) and plasma membrane, by multiple sperm does not result in a selective disadvantage to the egg, and may even ensure fertilization and proper development (Wishart 1987; Birkhead and Fletcher 1998). Thus, if polyspermy avoidance is the primary force driving rapid evolution in reproductive proteins, avian IPVL proteins should not show a pattern of divergence via adaptive evolution.

Rapid evolution of reproductive proteins may be observed in birds if forces other than polyspermy avoidance are acting. Sperm storage and regular multiple mating in many bird species create a scenario in which egg and sperm proteins might face pressures analogous to sexual selection at the organism level. For example, a female that mates with several males may possess eggs whose surface protein binds more readily with the sperm of those males with more similar proteins, thus resulting in assortative mate choice at the molecular level (Eberhard 1996; Palumbi 1999). Presently, the factors dictating the ultimate selection of a sperm nucleus to fuse with the egg are unknown, but it is possible that processes early on in the fertilization cascade influence this process.

The complex stages of avian fertilization offer up a series of opportunities for sperm and egg proteins to interact, and for selection to dictate success or failure. The initial stage of fertilization involves the penetration of the IPVL, the structure analogous to the mammalian zona pellucida, by multiple sperm, most of which enter through the germinal disk (Tarin and Cano 2000). Approximately fifteen minutes after sperm penetrate the IPVL, the outer perivitelline layer (OPVL) is laid down, apparently blocking further penetration by sperm (Bellairs et al. 1963). A proportion of sperm that have not acrosome-reacted are trapped by this outer layer (Tarin and Cano 2000). Untrapped sperm either fuse with the oolemma and form pronuclei, or are engulfed completely (Tarin and Cano 2000). Ultimately, only a single sperm pronucleus fuses with the egg pronucleus; the others migrate to the periphery and are degraded. Both the manner in which egg and sperm proteins act at these later stages of fertilization and the actual mechanism by which the successful sperm nucleus is selected are unknown.

Five proteins homologous to zona pellucida proteins in other taxa have been identified in the avian egg. We analyzed the evolutionary patterns of one of these proteins, ZP3, the avian homologue of the mammalian egg coat protein ZP3 (Takeuchi et al. 1999). In birds, it is a primary component of the egg IPVL, along with ZP1 and ZPD, and all three appear to be involved in sperm activation and entry (Bausek et al. 2004; Okumura et al. 2004). If the patterns of rapid evolution observed in mammalian ZP3 and other reproductive proteins arise out of protection against polyspermy threats resulting from sperm competition, we expect to see no evidence of rapid evolution in the avian form of this protein. However, if other forces result in adaptive evolution of reproductive proteins, we would expect to see rapid evolution in avian ZP3 similar to the pattern seen in other taxa.

We tested whether ZP3 sequence evolution shows a pattern of adaptive change by comparing the nucleotide sequences among 15 galliform and a single anseriform species (Table 1). We used a maximum likelihood algorithm that uses the ratio of nonsynonymous to synonymous substitution rates (dN/dS, or ω) to test for rapid adaptive divergence in ZP3 sequences across taxa, and to predict the protein residues most likely to be under selection. We also obtained the sequence of two intronic/exonic regions for several individuals within a population of California quail (Callipepla californica) and tested for signs of recent selective pressures on this protein. The California quail is a free-ranging species likely to demonstrate normal levels of polymorphism, and is part of a larger study we are conducting on the reproductive behavior and molecular evolution of the Callipepla species group.

Table 1 Study species, sample source and accession number for the analyzed sequence

Methods

Data Collection

In addition to several avian sequences of the gene ZP3 available on GenBank (Gallus gallus reference sequence NM_204389; see Table 1 for accession numbers), we sequenced ZP3 for three galliform species and one anseriform species not included in earlier analyses (California quail; Gambel’s quail, Callipepla gambelii; mountain quail, Oreortyx pictus; and Muscovy duck, Cairina moschata). The Muscovy duck (Anseriformes) represents the first nongallinaceous bird sequenced for this gene.

We used blood samples of California quail acquired from a population in Encinitas, CA between 1995 and 1998. For the diversity study we used tissue samples of mountain quail acquired from the University of Washington Burke Museum genetic resources collection (UWBM 66924 and 46830) and blood samples of Gambel’s quail collected by J. Gee in Palm Desert, CA. We also used a tissue sample from a whole, frozen Muscovy duck (source: Grimaud Farms) purchased at Seattle’s Finest Exotic Meats, Seattle WA. Although the sample was whole, we ensured that it was indeed duck by sequencing part of the CO1 region of mitochondrial DNA (mtDNA), using methods detailed for ZP3 sequencing below, and comparing the result against other sequences available on GenBank using BLAST. The CO1 region has been shown to be an effective species-specific identifier, or genetic barcode (Hebert et al. 2004). There is no recorded sequence of this gene for Muscovy duck, but a BLAST search against all GenBank entries returned significant matches to two waterfowl species, Anas crecca and Aix sponsa, thus supporting the claim that this sample is from Anseriformes.

We extracted DNA from the whole blood (California quail and Gambel’s quail) and tissue samples (mountain quail and Muscovy duck) using the Puregene DNA purification kit (Gentra Systems, Minneapolis, MN). We used polymerase chain reaction (PCR) primers designed from the chicken sequence using PRIMER3 version 0.2 to amplify exons, 3′ untranslated region (UTR) and 5′ UTR (Rozen and Skaletsky 2000; for primer sequences, see Table 1 in the supplementary information). PCR products were diluted 5:1 and 5 μl of the dilution was added to a cycle sequence reaction that included BigDye version 3.1. The products of the sequencing reaction were ethanol precipitated then run on an ABI 3100 sequencer.

Data Analysis

We used Sequencher 4.2 (Gene Codes, Ann Arbor, MI) to manually assemble sequences used in the divergence analysis, and visually checked the results using Se-Al version 2 (Rambaut 1996). We used a maximum likelihood species tree generated with DNAML version 3.5c (Felsenstein 1981) for analysis in CODEML (Fig. 1). This tree resembled other species trees generated for this group (Zink and Blackwell 1998; Kimball et al. 1999; Armstrong et al. 2001; Dimcheff et al. 2002; Dyke and Tuinen 2004; Crowe et al. 2006).

Fig. 1

Phylogenetic tree of studied species used in the CODEML analysis. The tree was generated using PHYLIP and a maximum-likelihood analysis. Branch lengths indicate evolutionary distance. Bootstrap values > 50 are included along corresponding branches

For the polymorphism study, we imported sequences into Phred, Phrap and PolyPhred for base-calling, assembly and single nucleotide polymorphism (SNP) discovery (Nickerson et al. 1997; Ewing and Green 1998; Ewing et al. 1998). Assemblies were visualized using Consed (Gordon et al. 1998). We used PHASE to infer haplotypes from exported sequences (Stephens et al. 2001), and DnaSP version 4.0 to estimate population genetic parameters and to test for neutrality (Rozas and Rozas 1999).

(i) Divergence Among Species

The CODEML program in the PAML 3.15 package allows for maximum likelihood estimation of the dN/dS ratio (the ω parameter) at sites along a protein. As a dN/dS ratio greater than 1 indicates positive Darwinian selection, this analysis allows for the detection of particular sites or lineages subject to divergence via positive selection. Two models, a neutral model that limits ω to a value between 0 and 1 and a second model with selection that allows for an additional class of ω that can take on any positive value, are generated by CODEML, each with an associated likelihood value. These likelihood values are then compared and statistical significance is determined using a likelihood ratio test.

We analyzed the site-specific ω across ZP3 for the 16 species listed in Table 1 by comparing the following models: M1 to M2, M7 to M8 and M8a to M8. M1 is a neutral model that allows for two classes of codons, one with ω = 0 and one with ω = 1; M2 includes these classes along with a third class with a freely estimated ω value. M7 is a neutral model that includes a beta distribution of ω over the interval 0 to 1, while M8 incorporates this distribution with an additional class of codons assigned a freely estimated ω. Finally, M8a is identical to M8 except that ω for the additional class of codons is fixed at one (Swanson et al. 2003). For all selection models, a Bayes empirical Bayes approach was used to calculate the posterior probabilities that sites with ω > 1.0 are under positive selection (Yang et al. 2005). We conducted the tests without removing sequences with missing data and ran the CODEML analysis several times with different starting values of ω to ensure convergence.
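The idea behind assigning a codon to a selection class can be sketched with a simplified calculation. The snippet below applies Bayes' rule to one site, given the estimated proportion of codons in each ω class and the likelihood of the site's data under each class. It is only an illustration: the Bayes empirical Bayes procedure additionally accounts for uncertainty in the parameter estimates, which this sketch ignores, and the function name and all numbers are hypothetical rather than taken from our analysis.

```python
def class_posteriors(proportions, site_likelihoods):
    """Posterior probability of each omega class for a single codon site.

    proportions      -- estimated fraction of codons in each class (sums to 1)
    site_likelihoods -- likelihood of the site's data under each class
    """
    if len(proportions) != len(site_likelihoods):
        raise ValueError("one likelihood is needed per site class")
    joint = [p * lik for p, lik in zip(proportions, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical values for three classes: omega < 1, omega = 1, omega > 1.
proportions = [0.7, 0.2, 0.1]
site_likelihoods = [1e-6, 2e-6, 8e-6]
posteriors = class_posteriors(proportions, site_likelihoods)
print([round(p, 3) for p in posteriors])
```

A site is reported as putatively selected when the posterior probability of the ω > 1 class is high; the threshold used is a matter of convention.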

(ii) Polymorphism Within the Population

To describe ZP3 polymorphism we used DnaSP version 4.0 to analyze patterns of diversity in exons and introns adjacent to residues identified as being under selection by CODEML (Rozas and Rozas 1999). We examined nucleotide diversity (π) and tested for significant deviations from neutral frequency spectrums using Tajima’s D, and Fu and Li’s D* and F* tests. Tajima’s D allows for the comparison of the number of segregating sites and the allele frequencies at these sites. Excess rare alleles indicate a selective sweep whereas excess intermediate frequency variants indicate balancing selection (Tajima 1989). We used data obtained from Gambel’s quail as an outgroup.
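For readers unfamiliar with the statistic, the following is a minimal sketch of how Tajima's D is computed from an alignment, using the standard formulas of Tajima (1989). It is not the DnaSP implementation: it assumes gap-free sequences of equal length, and the function name and the two toy alignments are hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Return (S, pi, D) for aligned, gap-free sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("the variance term is zero for fewer than 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites in the alignment.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # pi: mean number of pairwise differences (per region, not per site).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)

    if s == 0:
        return s, pi, float("nan")  # D is undefined without variation

    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n**2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    d = (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))
    return s, pi, d


# Toy alignments: every variant is a singleton vs. every variant is at 50%.
rare = ["TAAAAA", "ATAAAA", "AATAAA", "AAATAA", "AAAATA", "AAAAAT"]
intermediate = ["AAAA"] * 3 + ["TTTT"] * 3
print(tajimas_d(rare))
print(tajimas_d(intermediate))
```

The first alignment, in which all variants are rare, yields a negative D; the second, in which all variants are at intermediate frequency, yields a positive D.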

Fu and Li’s D* and F* tests, like Tajima’s D, examine the pattern of allele frequency relative to the number of segregating sites. A negative D* and/or F* indicates a high frequency of “singleton” or rare alleles that results from either a selective sweep or demographic effects. A positive D* and/or F* indicates a lack of singletons as a result of balancing selection, demographic effects, or, in some cases, a selective sweep (Fu and Li 1993).

As an exploratory analysis, we also examined values across a sliding window of the Tajima’s D and the Fu and Li’s D* and F* tests.
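A sliding window analysis of this kind can be sketched as follows. The window width of 100 sites matches the windows reported in the Results; the step size and the helper name are assumptions made for illustration, since the step is not stated in the text. Coordinates are inclusive positions within the genomic sequence, and the midpoint is rounded down.

```python
def sliding_windows(start, end, width, step):
    """Yield (first, last, midpoint) for each full window within start..end.

    All coordinates are inclusive site positions; partial windows that would
    run past `end` are not produced.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    first = start
    while first + width - 1 <= end:
        last = first + width - 1
        yield first, last, (first + last) // 2
        first += step


# Hypothetical region around exon 8, scanned in 100-site windows, 25-site steps.
windows = list(sliding_windows(3156, 3280, width=100, step=25))
for first, last, midpoint in windows:
    print(first, last, midpoint)
```

Each window would then be passed to the test of interest (Tajima's D, or Fu and Li's D* and F*), and the statistic plotted against the window midpoint.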

Results

The chicken ZP3 gene consists of 3634 nucleotides and nine exons coding for a protein 446 amino acids long (Fig. 2). We acquired sequences for 16 avian species covering 91% of the total target sequence. Four of the sequences downloaded from GenBank (red-legged partridge, Alectoris rufa; hazel grouse, Bonasa bonasia; rock ptarmigan, Lagopus mutus; willow ptarmigan, Lagopus lagopus) included a single nucleotide insertion in the ninth exon, at nucleotide position 1191 of the coding region. To ensure a conservative analysis of the data, we excluded the sequences from this base pair to the end of the gene in these species.

Fig. 2

Schematic of the ZP3 protein for both birds and mammals with mapped positively selected sites, as predicted by Bayes empirical analysis. Residues predicted to be under positive selection for birds are indicated by upward facing triangles below the protein. Residues predicted to be under positive selection in mammals are represented by arrows above the protein (after Swanson et al. 2001)

Divergence Among Species

Three model comparisons were made using results from CODEML (Table 2). The neutral model M1 was not significantly different from the selection model M2 (−2ΔlnL = 3.92, df = 2, p > 0.10). However, in the more sensitive tests, the selection model M8 fit the data significantly better than the neutral models M7 and M8a (M7 vs. M8, −2ΔlnL = 15.6, df = 2, p < 0.01; M8a vs. M8, −2ΔlnL = 4.4, df = 1, p < 0.05). Detailed results from all models, including putative selected sites, as determined through the Bayes empirical Bayes analysis for models M2 and M8, are presented in Table 2. Most of the putative selected sites clustered at the N and C termini of the protein in a pattern similar to that found in mammalian ZP3 (Swanson et al. 2001; Fig. 2).
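The significance levels quoted for the three likelihood ratio tests can be checked directly from the reported test statistics and degrees of freedom. The sketch below compares each statistic with a chi-square distribution, using the closed-form upper-tail probabilities for 1 and 2 degrees of freedom so that only the standard library is needed. The helper name is hypothetical, and the sketch covers only the degrees of freedom used here.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable, for df = 1 or 2 only."""
    if stat < 0:
        raise ValueError("the test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


# Test statistics and degrees of freedom as reported in the text.
comparisons = {
    "M1 vs. M2": (3.92, 2),
    "M7 vs. M8": (15.6, 2),
    "M8a vs. M8": (4.4, 1),
}

p_values = {name: chi2_sf(stat, df) for name, (stat, df) in comparisons.items()}
for name, p in p_values.items():
    print(f"{name}: p = {p:.4f}")
```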

Table 2 Results of CODEML analysis of ZP3 across species

Polymorphism Within Population

Using samples from a population of California quail, we examined the patterns of variation for the region around and including exons 1 and 8 of ZP3 using the Tajima’s D and Fu and Li’s D* and F* tests. For exon 1, no test showed a significant deviation from expected values, although all test statistics were positive (N = 27). For exon 8, all test statistics were negative and, although Tajima’s D was not significant, Fu and Li’s D* and F* values deviated significantly from those expected for a model of neutrality (N = 19; Table 3). A sliding window analysis of these test statistics in the regions under focus showed positive values in coding and noncoding regions around exon 1, with a significant Tajima’s D in a window within the coding region (genomic sequence, 1258–1357; midpoint, 1307; Tajima’s D = 2.11; p < 0.05; Fu and Li, NS). The same type of analysis resulted in negative values for Tajima’s D for exon 8 and the surrounding noncoding regions (genomic sequence, 3181–3280; midpoint, 3230; Tajima’s D = −1.88; p < 0.05); Fu and Li’s D* and F* tests yielded negative values for a larger region around this exon (Table 4).

Table 3 Patterns of polymorphism for the ZP3 gene in a population of Callipepla californica. Sites analyzed include exon and surrounding bases
Table 4 Sliding window values for Fu & Li’s D* and F* for exon 8. Window values represent site within genomic sequence: range, midpoint. Exon range includes 3165–3254

Discussion

The results of this study are the first to demonstrate positive adaptive evolution in an avian reproductive protein. Indeed, ZP3 is one of only two avian proteins found to be adaptively evolving (see also Ceplitis and Ellegren 2004). The divergence analysis found a pattern of adaptive change in ZP3, especially at the 5′ and 3′ ends of the gene. Consistent with this pattern across species, the California quail population we analyzed also indicated a selective sweep at the protein ends, as predicted by a Bayes empirical analysis.

A previous analysis of ZP3 sequence evolution in birds found no evidence of positive Darwinian selection along the gene (Berlin and Smith 2005). This earlier study included 12 species and, although the statistical tests approached significance, the authors concluded that selection was not influencing ZP3. Both phylogenetic branch length and total sample size influence the resolution of CODEML’s estimation of positive selection, and developers recommend that a minimum of 16 species be included in analyses with CODEML (Anisimova et al. 2001). It is therefore not surprising that our study, which included a larger number of species and more taxonomic diversity, showed significant positive selection acting on ZP3.

Although Berlin and Smith (2005) suggest that comparisons between M7 and M8 have a high risk of false positive results, they found no evidence of false positive results with M8 and M8a model comparisons. In our analysis, comparisons between M8 and M8a significantly indicated that ZP3 is under positive selection in birds. Further support for positive selection in avian ZP3 is provided by our analysis of variation patterns in a wild California quail population. In contrast, Berlin and Smith (2005) found no evidence of selection in their genetic analysis of captive breeder stock chickens (G. gallus), which, given their status as captive inbred populations, are likely to have lost any imprint of reproduction-based selection.

The observation that ZP3 is rapidly evolving in birds means that Aves can be added to the growing list of taxa with adaptively evolving reproductive proteins (Swanson and Vacquier 2002). This inclusion of birds amongst the species that show positive Darwinian evolution of reproductive proteins also indicates that conflict over polyspermy is not the sole force driving the pattern of evolution found in these proteins. These results do confirm the prediction that for most forces hypothesized to drive adaptive evolution in reproductive proteins, female-expressed reproductive proteins will evolve by positive Darwinian selection in the same manner as male-expressed proteins (Swanson and Vacquier 2002). Though a pattern of positive Darwinian evolution has been found for male-expressed proteins, many fewer studies have tested this pattern in female-expressed proteins (Swanson et al. 2001; Swanson and Vacquier 2002). Our results indicate that detailed studies of avian reproductive proteins will help elucidate the evolutionary forces driving the observed pattern of positive Darwinian selection in reproductive proteins across taxa.

The Bayes empirical analysis predicted that 13 sites of the ZP3 protein are under positive selection. These putative selected sites cluster at the N and C termini and correspond roughly to the sites predicted to be under selection in the orthologous mammalian ZP3 (Swanson et al. 2001; Fig. 2). Given this result, it is especially intriguing that, in mammals, the area of ZP3 predicted to bind sperm is localized at the C terminus, i.e., where several residues are predicted to be under selection in birds. The patterning of putative selected sites in birds indicates that avian ZP3 may function in a manner akin to mammalian ZP3, with the sperm-combining site in a similar location at the C terminus. These sites warrant further examination, especially since the function of individual residues along ZP3 in birds is unknown.

Glycosylation patterns may also indicate sites for further study since, in mammalian species, residues on the ZP3 protein that serve as potential glycosylation sites have been suggested to play a role in species-specific sperm-egg interactions. Three of the residue sites indicated to be under selection in birds—30, 120 and 381—are serines in some species and are therefore possible glycosylation sites. Other species analyzed here possess alternative amino acids at these sites, which potentially changes the glycosylation status and, possibly, the manner in which this protein interacts with its ligands.

In the analysis of variation patterns within a population of California quail, we observed a skewing towards rare derived alleles in the C terminus region, exon 8 and flanking sequence, of ZP3, where many of these putative selected sites reside. Other reproductive proteins such as lysin in abalone and semenogelin 1 in chimpanzees show a pattern of reduced polymorphism within populations, indicating a selective sweep (Lee et al. 1995; Kingan et al. 2004). A selective sweep in the ZP3 C terminus region in California quail, coupled with the observation of divergence among species, may indicate strong selective pressure, such as that for species specificity. The N terminus of the protein, exon 1 and 2, and putative selected sites therein, showed a weak signal of excess intermediate frequency alleles. This pattern has been observed in a variety of species’ reproductive proteins (e.g., sea urchin bindin, Drosophila Acps; Swanson and Vacquier 2002 and references therein) and has been predicted for female proteins by models of sexual conflict (Gavrilets and Waxman 2002).

The tests of selection we used for this study—Fu and Li’s, and Tajima’s—are vulnerable to demographic effects: excessive rare alleles may result from population growth while excessive intermediate alleles may signal population subdivision (Tajima 1989; Fu and Li 1993). Further tests of demographic patterns in the population will aid in sorting selective versus demographic effects. However, the difference in patterns observed on either end of ZP3, with excessive rare alleles on the 3′ end and excessive intermediate alleles on the 5′ end of the same gene, is less easily resolved through a demographic explanation than by a hypothesis that includes selection on these regions.

To explain why avian ZP3 is undergoing adaptive evolution we must invoke hypotheses other than that of sexual conflict via polyspermy avoidance. These alternative hypotheses include cryptic mate choice (including species specificity and reinforcement), sperm competition, other forms of sexual conflict, and pathogen avoidance. We do not explicitly test these hypotheses here, although our results indicate that further study of avian proteins is warranted. Comparative studies that exploit the wealth of avian-related reproductive data (e.g., testis size, mating system, extra pair copulation rates) will likely prove useful in discriminating among the various hypotheses. This approach has already been applied in mammals, with intriguing results (Dorus et al. 2005; Herlyn and Zischler 2007).

Similarly, the discovery and analysis of sperm proteins that interact with ZP3 will help to test the remaining hypotheses, which predict different evolutionary trajectories for the sperm and the egg proteins relative to one another. For example, if cryptic mate choice is driving the evolution of these proteins, ZP3 is expected to evolve more rapidly than its sperm cognate, while in a situation of sperm competition, we might expect the reverse (Civetta 2003). Sexual conflict would result in either the sperm protein or ZP3 evolving more rapidly, depending on the point in the antagonistic cycle at which the proteins are sampled; however, if a surveyed population showed a pattern of one protein driving the diversification of the other, sexual conflict would be supported over the other two sexual selection hypotheses (Hayashi et al. 2007).

We conducted a preliminary test of evolution along lineages of hybridizing species, California and Gambel’s quails, and European (Coturnix coturnix) and Japanese quails (C. japonica). We did not observe the pattern of increased rates of evolution along these lineages that would be expected under reinforcement. However, this test was not particularly sensitive and expanding the investigation of ZP3 to include allopatric and sympatric populations of hybridizing species will offer more conclusive evidence for or against the importance of reinforcement on the evolution of ZP3.

The impact reinforcement could have on the evolution of ZP3 is related to the intensity of species specificity or selectivity in sperm-egg interactions in birds (Vacquier et al. 1995). In a noncompetitive study of sperm-egg interactions, Stewart et al. (2004) found little evidence for differential penetration rates among con- and heterospecific crosses of egg and sperm, thus suggesting that species specificity is unlikely to drive the evolution of egg coat proteins in birds (Stewart et al. 2004; Birkhead and Brillard 2007). However, further studies examining the competition between different species’ sperm to fertilize an egg are necessary to truly determine the importance of species-specific interactions in ZP3 evolution (Vacquier et al. 1995).

The influence of these sexual selection forces—cryptic mate choice, sperm competition and sexual conflict—on the evolution of ZP3 is affected by the amount of sperm that the egg itself encounters. This number is limited by the majority of the sperm in a male’s ejaculate being ejected by the female bird prior to storage in the sperm storage tubules (SSTs). Ultimately, only 1–2% of more than a million sperm typically released during a single ejaculation are stored in the female’s SSTs (Birkhead and Brillard 2007). Despite this limitation and the further reduction of representative sperm exiting the SSTs, more than 60 sperm can still reach and penetrate the egg at the level of the IPVL (Birkhead and Fletcher 1994). The exposure of the egg to more than a single sperm sets up the opportunity for some form of competition, to be resolved through cryptic choice by the egg or competition by the sperm. Therefore, some level of sexual selection acting on ZP3 on the avian egg coat is quite possible.

Ultimately, the discovery that avian reproductive proteins do evolve by positive Darwinian selection opens a large field of potential studies aimed at elucidating the specific causes of this evolutionary pattern. These studies include functional analyses of ZP3, determination of the interacting sperm proteins and their evolutionary trajectories, in-depth population analyses of these proteins in hybrid zones, and determination of the avian mechanism of fertilization and final selection of a single sperm nucleus. Expanding the analysis of ZP3 across a broad taxonomic range of birds in light of the extensive knowledge about avian reproduction at the behavioral level will also help reveal the evolutionary relationship between forces occurring at the level of the organism and patterns occurring at the level of the molecule. Clearly, increasing our knowledge of avian reproduction at the cellular level will be invaluable to the general discussion of the forces that drive positive Darwinian evolution in reproductive proteins.