Introduction

Reproductive proteins, which mediate gamete generation, storage, usage, signal transduction, and fertilization, diverge rapidly, often as a result of adaptive evolution (Civetta and Singh 1995; Swanson and Vacquier 2002). This rapid divergence may contribute to the establishment of reproductive barriers leading to speciation, as well as to other important biological functions. Accordingly, there has been intense interest in identifying reproduction-related genes with accelerated amino acid changes (Dorus et al. 2004; Swanson and Vacquier 2002; Wang and Zhang 2004; Wyckoff et al. 2000).

NYD-SP12 is a newly reported testis-specific gene (Xu et al. 2003) involved in the formation of the acrosome during spermatogenesis (Lu et al. 2006). It is located on human chromosome 3q26.32, encodes a protein of 569 amino acids, and is highly expressed in the testis during human sexual maturation (Xu et al. 2003). Furthermore, patients with spermatogenic arrest and Sertoli cell-only syndrome have reduced expression of NYD-SP12 transcripts in the testis, indicating a possible role of NYD-SP12 in spermatogenesis (Xu et al. 2003).

In this study, we conducted an evolutionary survey of sequence substitution patterns of NYD-SP12 in diverse primate species to detect potential positive selection. We also compared the sequence variants of NYD-SP12 in worldwide human populations, as well as in three clinical groups with different sperm traits, to detect regional selection and functional association. Our data demonstrate adaptive amino acid changes in the primate lineages leading to human and chimpanzee, which were likely caused by Darwinian positive selection. There is also significant interpopulation divergence, which might be due to varied demographic histories. Among the clinical subjects, we observed a significant difference in the frequency of one single-nucleotide polymorphism.

Materials and Methods

DNA Samples

We sequenced 72 human individuals from the major continental populations, including 19 Africans, 13 Europeans, 30 East Asians (20 Chinese and 10 Cambodians), and 10 Melanesians. We also sequenced 124 Chinese clinical samples, whose sperm number, motility, and morphology were analyzed according to the World Health Organization (1999) criteria. The clinical samples were categorized into three groups: (1) the asthenozoospermia group (AST), whose sperm counts were >20 million/ml but with low sperm motility (A+B < 50%; 43 samples); (2) the normal group (NOR), with normal sperm counts and motility (20 million–200 million/ml; A+B > 50%; 48 samples), and (3) the donor group (DON), with high sperm counts (>200 million/ml; A+B > 50%; 33 samples). In addition, six nonhuman primate species were also sequenced, reflecting a 45 million-year history of primate evolution (Goodman et al. 1998). The nonhuman primate panel included three great ape species (one chimpanzee, Pan troglodytes; one gorilla, Gorilla gorilla; and one orangutan, Pongo pygmaeus), one lesser ape species (white-browed gibbon, Hylobates hoolock), one Old World monkey species (rhesus monkey, Macaca mulatta), and one New World monkey species (black-handed spider monkey, Ateles geoffroyi). All the DNA samples were from collections at the Kunming Cell Bank of CAS, Kunming Blood Center, Shanghai Renji Hospital, and Stanford University. Informed consent was obtained from the clinical subjects.
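The grouping rules for the clinical samples can be written as a small decision function. The following is only a sketch of the thresholds quoted above; the function name, argument names, and the handling of samples that match no group (returned as None) are assumptions made for illustration and are not part of the original protocol.

```python
from typing import Optional


def classify_sample(count_million_per_ml: float, motility_ab_percent: float) -> Optional[str]:
    """Assign a sample to AST, NOR, or DON using the thresholds quoted in the text.

    count_million_per_ml: sperm count in million/ml.
    motility_ab_percent: percentage of grade A+B motile sperm.
    Returns None when the sample fits none of the three groups (an assumption;
    the text does not say how such samples were treated).
    """
    if count_million_per_ml > 20 and motility_ab_percent < 50:
        return "AST"  # asthenozoospermia: count >20 million/ml, A+B < 50%
    if count_million_per_ml > 200 and motility_ab_percent > 50:
        return "DON"  # donor: count >200 million/ml, A+B > 50%
    if 20 <= count_million_per_ml <= 200 and motility_ab_percent > 50:
        return "NOR"  # normal: 20 million-200 million/ml, A+B > 50%
    return None


if __name__ == "__main__":
    for count, motility in [(60, 30), (80, 60), (250, 70), (10, 60)]:
        print(count, motility, classify_sample(count, motility))
```

Borderline values (for example, a motility of exactly 50%) are left unclassified here because the text gives only strict inequalities.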

Sequencing Analysis

All human and nonhuman primate samples were sequenced for the full-length coding region of NYD-SP12 (1707 bp). Primers for all primates were designed from an alignment of the published human and mouse sequences (Ensembl genome browser; http://www.ensembl.org). The primer sequences are listed below.

  • Exon2_f: 5′ CCACTGGAATTCAAAATATTGG 3′

  • Exon2_r: 5′ TTCATAGTTTCCTGTCACCAATG 3′

  • Exon3_f: 5′ TAAGCAAGCATAAGCAAAACATT 3′

  • Exon3_r: 5′ AAGCTCAGATAGTACCCTCCACA 3′

  • Exon4_f: 5′ ATTTCATTCCATTGCACTTGTGC 3′

  • Exon4_r: 5′ AATTGCTGGTTTGACGTTGATAGT 3′

  • Exon5_f: 5′ AAAAATCCCTACTTTCAACACTCTT 3′

  • Exon5_r: 5′ CCAGAGCTACCTTTTTCTTCAGTG 3′

  • Exon6_f: 5′ ATATGTTATCTTTGCAGTACAGTAA 3′

  • Exon6_r: 5′ CAGAAACAATAAACTAATAAACCAC 3′

  • Exon7_f: 5′ GATATGTTATCTATTTGACCTTTTC 3′

  • Exon7_r: 5′ CTAACCCACAAAGAAGATTAGAT 3′

  • Exon8_f: 5′ CATCTCCTCTGCCTTTCTCCG 3′

  • Exon8_r: 5′ CGTTTCCAAAATCCCCTAACAG 3′

  • Exon9_f: 5′ ATCCGTCTCAACAGTTCTCCAAT 3′

  • Exon9_r: 5′ TGTTGGCTCTGGGCCTATGT 3′

  • Exon10_f: 5′ CAATAATATTCGGCAATGACAAGG 3′

  • Exon10_r: 5′ TATTTGATTTTGGTCACACTTCCAC 3′

  • Exon11_f: 5′ CCTGCACTTCTGCTTTGTGTTAT 3′

  • Exon11_r: 5′ GTGGATGCCCTTGCCTCTTG 3′

PCRs were performed with rTaq under conditions recommended by the manufacturer (Takara Company).

Sequencing was performed in both directions with forward and reverse primers using the BigDye terminator sequencing kit on an ABI 3130 automated sequencer. There are 11 exons in the NYD-SP12 gene; the first exon is untranslated and was therefore not sequenced in this study.

Data Analysis

DNA sequences were aligned using DNASTAR (DNASTAR, Inc.) and the MEGA3.0 program (Kumar et al. 2004) and checked manually. The Z-test was performed in MEGA3 (Kumar et al. 2004). The PAML package (Yang 1997) was used to identify positive selection and individual sites under selection. To detect selection, we first inferred the ancestral sequences of the internal nodes of the phylogenetic tree with Baseml and calculated lineage-specific dN/dS (ω) with yn00. We also used the branch models and branch-site models in Codeml to identify branches and sites under selection without inferring the ancestral sequences. The sliding window calculation was performed with K-Estimator (Comeron 1995, 1999).

McDonald-Kreitman’s (1991) neutrality test was performed with DnaSP (Rozas and Rozas 1999) to test for selection. DnaSP was also used to perform three further tests: Tajima’s (1989) D statistic, which considers the difference between θW and π; Fay and Wu’s (2000) H test, which considers the difference between θH and π; and Fu and Li’s (1993) D and F tests, which compare the observed number of singleton polymorphisms with the number expected under a neutral model. Statistical significance was determined by 5000 simulations. FST was estimated using the program Arlequin (http://www.lgb.unige.ch/arlequin). Phase I of the HapMap project (Altshuler et al. 2005) provides SNP information from 90 individuals of 30 trios of Yoruba in Ibadan, Nigeria (YRI); 90 individuals of 30 trios of Utah residents with ancestry from northern and western Europe (CEU); 45 unrelated Han Chinese individuals from Beijing, China (CHB); and 44 unrelated Japanese individuals from Tokyo, Japan (JPT). The web tool Haplotter (http://www.pritch.bsd.uchicago.edu/data.html), which pools the CHB and JPT samples into a single sample, was used to query the HapMap data for evidence of selection by calculating the empirical distributions of the iHS, Tajima’s D, Fay and Wu’s H, and FST statistics (Voight et al. 2006). We used Haploview (http://www.broad.mit.edu/mpg/haploview/index.php) to calculate linkage disequilibrium, and the SNP data with location information were downloaded from the HapMap Project.
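To make the comparison between θW and π concrete, the sketch below computes Tajima's D from a small set of aligned sequences using the standard formulas of Tajima (1989). It is an illustration only and not the DnaSP implementation: the function name and the toy alignment are hypothetical, gaps and missing data are not handled, and significance testing by simulation is omitted.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Return (pi, theta_w, D) for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("at least four sequences are needed for a nonzero variance")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites; D is undefined")

    # pi: mean number of pairwise differences between sequences.
    pairs = list(combinations(sequences, 2))
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    pi = total_diffs / len(pairs)

    # Watterson's estimator theta_W = S / a1.
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    theta_w = seg_sites / a1

    # Variance terms from Tajima (1989).
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    d = (pi - theta_w) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))
    return pi, theta_w, d


if __name__ == "__main__":
    toy_alignment = ["AAA", "AAT", "ATT", "TTT"]  # hypothetical data
    print(tajimas_d(toy_alignment))
```

A positive D indicates an excess of intermediate-frequency variants (π > θW), and a negative D an excess of rare variants.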

Results

Positive Selection in Human and Chimpanzee

We sequenced the complete coding region of NYD-SP12 in seven primate species, including human. The protein sequence alignment is shown in Fig. 1. To investigate the general pattern of nucleotide substitutions, the well-established phylogenetic tree of the major primate lineages was used (Goodman et al. 1998; Page and Goodman 2001) (Fig. 2). Following the strategy in Messier and Stewart’s (1997) work on primate lysozyme, we reconstructed the ancestral sequences. The nonsynonymous substitution number (N) and synonymous substitution number (S), as well as the dN/dS ratio (ω), for each branch of the tree were calculated (Fig. 2). Our results identified episodes of adaptive evolution in both the human lineage (ω = 4.17, P < 0.05, Z-test) and the chimpanzee lineage (ω = ∞, P = 0.01). Since the reconstructed ancestral sequences involve random errors and systematic biases (Yang 1998), we confirmed the results with the likelihood approach embodied in the PAML package (Yang 1997), which averages over all possible ancestral sequences at each interior node in the tree and weights them according to their relative likelihoods of occurrence. The log likelihood values and maximum-likelihood estimates of parameters under the branch-specific models, the site-specific models, and the branch-site models (Yang 1998; Yang and Nielsen 2002; Zhang et al. 2005) are reported in Table 1. In the branch-specific models, the two-ratio model assigns one ω ratio to the human, chimpanzee, and gorilla branches (ωHCG), which were shown to have ω values >1, and ω0 to all other branches. This model is significantly better than the one-ratio model (2Δl = 16.82; P = 0.00004, df = 1). When the foreground ωC, ωG, and ωH were fixed to 1 in the two-ratio model, the constrained model was significantly worse than the two-ratio model without the constraint ωC = ωG = ωH = 1 (2Δl = 7.33; P = 0.007, df = 1), indicating that the great ape clade evolved more rapidly than other primate lineages due to a higher level of selection. We also conducted a series of tests (Table 1) to identify which branch or branches have ω significantly >1, and the data indicate that both the human and the chimpanzee lineages are likely under positive selection (P = 0.03 and P = 0.01, respectively). To rule out the possibility that the high ω ratio might result from a decrease in Ks, we compared the average Ks in NYD-SP12 to the average value of more than 10,000 genes in the genome (Nielsen et al. 2005); no significant decrease in Ks was observed for NYD-SP12 (data not shown). Therefore, the high ω values of human and chimpanzee are most likely caused by Darwinian positive selection.

Fig. 1. Protein sequence alignment of NYD-SP12 in human and nonhuman primates. The amino acid sites under positive selection by the branch-site model are shaded.

Fig. 2. The ω values and numbers of nonsynonymous (N) and synonymous (S) substitutions of the primate species tested. The ω value is shown above each branch, and the N/S ratio below. A–E refer to internal nodes.

Table 1 Parameter estimates and log-likelihood values under the branch-specific model and the branch-site model

With the use of the site-specific models, we can predict whether there are certain sites under selection. We first compared the M2 model (selection) with the M1 model (neutral), and the former is significantly better than the latter (2Δl = 12.28; P = 0.002, df = 2). The M8 model (β and ω > 1) is also much better than the M7 model (β) (2Δl = 12.46; P = 0.002, df = 2). Both comparisons imply that there is selection in NYD-SP12 among the primate lineages.
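The P values attached to the likelihood ratio tests follow from comparing 2Δl with a χ² distribution whose degrees of freedom equal the difference in the number of free parameters. A minimal sketch is given below, assuming the closed forms of the χ² tail probability for df = 1 and df = 2 (the only cases reported here); the function name is hypothetical, and PAML itself is not called.

```python
from math import erfc, exp, sqrt


def lrt_p_value(two_delta_l: float, df: int) -> float:
    """Upper-tail chi-square probability for a likelihood ratio statistic.

    Only df = 1 and df = 2 are supported in this sketch, because these
    have simple closed forms and are the only cases used in the text.
    """
    if two_delta_l < 0:
        raise ValueError("2*delta-l must be non-negative")
    if df == 1:
        # P(Z^2 > x) = erfc(sqrt(x / 2)) for a standard normal Z.
        return erfc(sqrt(two_delta_l / 2.0))
    if df == 2:
        # Chi-square with 2 df is exponential with mean 2.
        return exp(-two_delta_l / 2.0)
    raise ValueError("only df = 1 or df = 2 are supported in this sketch")


if __name__ == "__main__":
    # Test statistics reported in the text.
    print("two-ratio vs one-ratio:", lrt_p_value(16.82, 1))
    print("M2 vs M1:", lrt_p_value(12.28, 2))
    print("M8 vs M7:", lrt_p_value(12.46, 2))
```

The values returned for the reported statistics agree with the P values quoted in the text after rounding.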

The modified branch-site models (Zhang et al. 2005) were applied to test whether there are amino acid sites under positive selection along the branches of interest. Only the results from the Bayes empirical Bayes (BEB) method were considered because it performs better than the naïve empirical Bayes (NEB) method by taking sampling errors into account (Yang et al. 2005). The human and chimpanzee branches were used as the foreground branches. The modified model A (Zhang et al. 2005) is significantly better than the null model with ω2 fixed to 1 (2Δl = 7.24; P = 0.007, df = 1). The parameter estimates under model A suggested that 52% of the sites are under strong positive selection along the foreground branches with ω = 10.95. At the posterior probability P > 95% level, we identified the following sites under positive selection: 35A, 78E, 82F, 97Q, 300E, 335I, 356E, 392D, 459L, 547T, 567Q, and 569R (shaded in Fig. 1).

When conducting McDonald-Kreitman’s test, we detected 12 polymorphic sites (8 nonsynonymous and 4 synonymous) in the coding region of NYD-SP12 in human populations. The interspecies nonsynonymous/synonymous substitution ratio (20/1) is higher than the nonsynonymous/synonymous polymorphism ratio in human populations (8/4) (P = 0.047, Fisher’s two-tailed exact test), again supporting the presence of Darwinian positive selection between the human and the chimpanzee lineages.
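The McDonald-Kreitman comparison reduces to a 2 × 2 contingency table of fixed versus polymorphic and nonsynonymous versus synonymous changes. The sketch below recomputes the two-tailed Fisher exact P value from the counts given in the text, using only the hypergeometric distribution; it is an illustration, not the DnaSP routine, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher exact P value for the 2 x 2 table [[a, b], [c, d]].

    The P value is the total probability of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Rows: fixed differences, polymorphisms.
    # Columns: nonsynonymous, synonymous (counts from the text).
    print(fisher_exact_two_tailed(20, 1, 8, 4))
```

With the counts 20/1 and 8/4, the result is about 0.047, in agreement with the value reported above.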

We next sought to identify the specific domains in NYD-SP12 that might be subject to positive selection because this might reveal the driving force and facilitate further functional analysis. For instance, studies on major histocompatibility complex proteins suggested that positive selection is confined to the antigen-recognition sites (Hughes and Nei 1988; Yang and Swanson 2002), which participate in protein-protein interaction. NYD-SP12 contains a known functional domain called TPR (tetratricopeptide repeat; amino acids 169–280), which is found in chaperone, cell cycle, transcription, and protein transport complexes and may function in protein-protein interaction. Using sliding window analysis, we compared the ω ratios between ancestral node A and human or chimpanzee, respectively. We did not observe a signature of positive selection on the TPR domain. Instead, the other segments of the gene, which could not be classified into any known domain, have undergone positive selection (data not shown). This calls for further studies to understand the biological meaning of the observed positive selection on these segments.

NYD-SP12 Sequence Polymorphisms in Human Populations

In many cases, selection increases the degree of differentiation among populations. A selective sweep can have dramatic impacts on the level of population subdivision, especially when the sweep has not spread to all populations (Majewski and Cohan 1999; Nielsen 2005; Slatkin and Wiehe 1998; Yang and Swanson 2002). The allele frequency differences among subpopulations can be evaluated by FST (Wright 1950). Under evolutionary neutrality, all loci in the genome would have the same expected degree of differentiation, which can be used to detect the action of selection (Cavalli-Sforza 1966; Lewontin and Krakauer 1973). If allele-frequency data are available for a set of putatively neutral loci, then an empirical distribution of FST values can be constructed and used to identify loci with unusual patterns of differentiation due to positive selection (Cavalli-Sforza et al. 1994; Fullerton et al. 2002). In our population samples, we measured the interpopulation frequency divergences (Table 2), and we conducted the FST analysis to assess population differentiation. As shown in Table 3, four of the six pairwise comparisons have significant interpopulation differences. Both regional selection and demographic factors, such as population growth and random genetic drift, could account for the observed differentiation.
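As an illustration of how FST summarizes allele frequency differences, the sketch below computes a simple heterozygosity-based FST = (HT − HS)/HT for one biallelic site in two equally weighted populations. This is a simplified textbook estimator chosen for clarity, not the procedure implemented in Arlequin; the function name and the example frequencies are hypothetical, and no sample-size correction or significance test is included.

```python
def fst_two_populations(p1: float, p2: float) -> float:
    """Heterozygosity-based FST for a biallelic site in two populations.

    p1, p2: frequency of the same allele in population 1 and population 2.
    The two populations are weighted equally (an assumption of this sketch).
    """
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie between 0 and 1")

    # HS: mean expected heterozygosity within populations.
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    # HT: expected heterozygosity in the pooled population.
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)

    if h_t == 0.0:
        return 0.0  # site is monomorphic overall; no differentiation to measure
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(fst_two_populations(0.2, 0.8))  # hypothetical frequencies
```

Identical frequencies give FST = 0, and alternatively fixed alleles give FST = 1.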

Table 2 The NYD-SP12 polymorphic sites in human populations
Table 3 Pairwise FST values in human populations

We conducted several neutrality tests, including Tajima’s D test, Fu and Li’s D and F tests, and Fay and Wu’s H test (data not shown). None of the tests rejected the neutral assumptions. Since our sample size is relatively small (72 individuals), we used the newly available single-nucleotide polymorphism (SNP) data from the International HapMap Project (Altshuler et al. 2005) to confirm our results by empirical studies in a larger sample pool (209 individuals) in the NYD-SP12 region. Besides the tests mentioned above, we also conducted the iHS (integrated haplotype score) test, which was developed to test for recent selection using the HapMap data (Voight et al. 2006). The Haplotter program was used to calculate the iHS (Voight et al. 2006), as well as several other test statistics (Tajima’s D, Fay and Wu’s H, FST). Our results indicate that in the NYD-SP12 region, none of the statistics fell within the top 5% of the empirical distributions (the cutoff defined as significant; data not shown), implying no or very weak recent positive selection. However, this could also be due to a low density of SNPs in the Phase I HapMap data or to the selective sweep on amino acid changes occurring well before the split of modern human populations.

Functional Association of the NYD-SP12 Polymorphisms in the Clinical Subjects

As NYD-SP12 is related to spermatogenesis and shows a higher expression level in adults than in fetuses (Xu et al. 2003), we investigated the SNP frequency spectrum in the three clinical groups (Table 4). Although the majority of the SNPs showed no significant frequency discrepancy among the groups, the 675 C/T synonymous polymorphism shows a different pattern. The DON group, which has the highest sperm count and motility level, has the highest 675T frequency, and this frequency is significantly higher than in the other two groups (P = 0.0006 for DON versus AST, P = 0.016 for DON versus NOR; Fisher’s two-tailed exact test after Bonferroni correction). However, we did not observe a comparable difference between AST and NOR.

Table 4 Allele frequency distribution of the 675T allele in the clinical samples

Discussion

In this study, we have demonstrated that NYD-SP12 evolves rapidly under positive selection in primates and that a synonymous polymorphism in human populations is associated with sperm traits in the clinical subjects. It was suggested that NYD-SP12 is likely involved in the formation of the acrosome during spermatogenesis (Lu et al. 2006); hence, the observed adaptive evolution in human and chimpanzee could cause functional modification in sperm traits.

Genes expressed exclusively or preferentially in testis are likely involved in male reproduction and have been shown to evolve rapidly under positive selection in previous studies (Podlaha and Zhang 2003; Rooney and Zhang 1999; Wang and Zhang 2004). Our observation of rapid evolution in NYD-SP12 is consistent with the previous findings. There are varied biological driving forces causing adaptive evolution; one possible force is sexual selection because sperm traits are critical for male reproduction. For example, it was shown that the sperm trait-related gene SEMG2 was involved in sperm competition (Dorus et al. 2004). In the analysis of NYD-SP12, chimpanzee, the most promiscuous species, showed the strongest signal of positive selection (ω = ∞), and human, whose mating system is considered less promiscuous, also had a strong signature of positive selection (ω = 4.20) (Harcourt et al. 1995). However, the monandrous gorilla also has a relatively high ω ratio (ω = 1.67), while the rhesus monkey has a low ratio (ω = 0.26) though it is highly promiscuous (Anderson and Dixson 2002). Therefore, sperm competition may not be the major force leading to the rapid evolution of NYD-SP12 in primates. An alternative hypothesis is that the adaptive evolution might lead to establishing reproductive barriers during speciation, which seems to fit the NYD-SP12 data better (Yang and Swanson 2002), but more evidence is needed.

A previous study showed that the NYD-SP12 protein was localized to the Golgi apparatus, and it was involved in the formation of the acrosome during spermatogenesis (Xu et al. 2003). This is consistent with the potential function of the TPR domain of NYD-SP12, and it is not surprising to see strong purifying selection on the TPR region. However, the biological significance of the observed positive selection is not clear. One possibility is that NYD-SP12 gained new or modified function along the human and chimpanzee lineages. Further functional evidence is needed to address this issue.

The neutrality tests for human populations do not show significant deviation from the neutral expectation, though the FST comparisons show significant interpopulation divergence. Considering the small sample size, we also analyzed the newly available HapMap data, in which none of the statistics was exceptionally high. Therefore, the significant differences in FST may be caused by demographic factors, though regional selection as well as the ascertainment bias in the HapMap data cannot be totally ruled out.

The significant association of the 675T allele frequency with sperm traits among the three clinical groups is intriguing. Since the C-to-T mutation at this site is a synonymous change, it seems unlikely that this polymorphism will encounter selection at the protein level. To test whether this site has any effect on gene expression, we analyzed the expression of NYD-SP12 in eight human testis samples with different genotypes, i.e., 675 C/C and 675 C/T. No significant difference was observed (data not shown). Hence, the 675 C/T polymorphism may be a nonfunctional site that is linked to a nearby functional site due to genetic hitchhiking. To locate the potential functional site, we conducted linkage disequilibrium analysis in this gene using SNP data downloaded from the HapMap project. The 675 C/T polymorphism falls into a region showing high linkage disequilibrium, which spans about 30 kb covering exon 3 and intron 3, implying that there might be other sites with clinical significance. However, due to the low density of SNPs in the Phase I HapMap data, further resequencing and analysis of this region are needed.

Electronic Database Information

The URL for data in this article is http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html. The GenBank accession numbers are DQ141860–DQ142639.