Introduction

Several studies in a wide variety of taxa have demonstrated that genes with a role in reproduction are often rapidly evolving and in many cases rapid evolution is driven by adaptive selection (Civetta and Singh 1998; Swanson and Vacquier 2002; Civetta 2003a; Clark et al. 2006; Panhuis et al. 2006). In mammals, fertilization is internal and complex in the sense that it requires the sperm to cross a series of barriers that involve complex interactions with egg proteins (Primakoff and Myles 2007). There are many examples of mammalian sperm surface proteins that show evidence of rapid interspecific sequence divergence driven by positive selection (Torgerson et al. 2002; Swanson et al. 2003; Glassey and Civetta 2004; Dorus et al. 2010). However, few studies have directly tried to link episodes of positive selection with changes that can be used as a measure of sexual selection. One of the best examples comes from studies of genes coding for components of the semen coagulum. Population genetics analysis of Semenogelin 1 (Semg1) has shown evidence of low polymorphism but high rate of evolution leading to species with promiscuous mating systems (Kingan et al. 2003) but studies using a phylogenetic based approach have found no evidence of long-term selection for Semg1 along more distant lineages leading to primates for whom sperm competition is common (Ramm et al. 2008; O’Connor and Mundy 2009). The discrepancy in results simply highlights the fact that the two studies have measured selection at different time points in the gene’s evolutionary history. Associations between bouts of positive selection and differences in mating types have been found consistently for Semg2 in both primates and rodents but the evidence is less clear or even lacking for other genes expressed in reproductive tissue (Dorus et al. 2004; Hurle et al. 2007; Ramm et al. 2008; O’Connor and Mundy 2009).

While adaptive evolution of reproductive genes is widespread, it is not always clear whether sexual selection, species-specific adaptations to fertilization, or other factors such as species diversification in immune-related function have triggered rapid evolution (Good and Nachman 2005; Dean et al. 2008). The ADAM gene family is an ideal group of genes to address whether positive selection at reproductive genes is likely driven by postmating selective pressures imposed by differences in species mating habits. The family comprises 35 characterized genes with a variety of functions, both tissue-specific and ubiquitous patterns of expression, and a common evolutionary history. More importantly, the ADAMs are transmembrane proteins involved in cell adhesion, proteolysis, and signaling (Primakoff and Myles 2000; Edwards et al. 2008), and so well-characterized proteolytic and adhesion motifs can be found within their amino acid sequences. Therefore, evolutionary studies can be aimed not only at attempting to associate evolutionary rate heterogeneity and positive selection episodes with differences in mating systems but also at mapping the distribution of positively selected sites across domains with different cellular functions. In mammals, ADAM1, ADAM2, and ADAM3 are the best characterized in terms of their role during fertilization. They play a role during sperm zona pellucida binding as well as egg membrane recognition and fusion (Blobel et al. 1992; Wolfsberg et al. 1995; Primakoff and Myles 2000; Evans 2002). ADAMs 4, 5, 6, 18, 24, and 32 are also sperm surface proteins, while other ADAMs have been identified only as testes expressed (Frayne et al. 2002; Kim et al. 2006; Zhu et al. 2009). Molecular evolution studies have suggested that the evolution of Adam1, Adam2, and Adam32 has been driven by positive selection and that selection is limited to ADAMs with a role in fertilization (Civetta 2003b; Glassey and Civetta 2004). 
However, formal tests of positive selection of many ADAM genes were limited by the low number of species for which sequences were available (Glassey and Civetta 2004). A recent sperm proteome study has found 23 sperm surface genes to be under positive selection when comparing mouse sperm genes with orthologs in rat, human, chimpanzee or macaque, and dog or cow. Of the 23, five were ADAM genes including Adam2 (Dorus et al. 2010).

Here we revisited the molecular evolution of the mammalian ADAM gene family, taking advantage of the larger number of sequences currently available in public databases. We also analyzed the distribution of positively selected sites across protein domains. We found that only sperm surface ADAM proteins show evidence of positive selection localized to their adhesion domain. We further tested lineage-specific rates of evolution within the adhesion domain of ADAM proteins in primates and found evidence of lineage-specific rate heterogeneity and a higher proportion of nonsynonymous relative to both synonymous and noncoding sequence substitutions for the adhesion domains of Adams 2 and 18 in chimpanzee and Adams 2 and 23 in macaques.

Methods

Mammalian Sequence Data Collection and Analysis

We retrieved nucleotide and amino acid sequence data from GenBank for 25 Adam genes (for accession numbers see Supplementary Table 1). Amino acid sequences were aligned using the global alignment program ClustalX (Thompson et al. 1997) and the local alignment program DiAlign2 (Morgenstern et al. 1996), and the generated alignments were used as reference for nucleotide sequence alignments using Pal2Nal (Suyama et al. 2006). To confirm that the gene sequences labeled as members of a specific ADAM gene family were members of a monophyletic group, the entire ADAM gene phylogeny was reconstructed using a Neighbor-Joining distance method (MEGA 4.0) and a Maximum Parsimony (MP) approach (PAUP 4.0) (Swofford 2003; Tamura et al. 2007). For the MP tree reconstruction, gaps were treated as a fifth character state and all characters were weighted equally. A heuristic search was conducted using the Tree Bisection-Reconnection (TBR) branch-swapping algorithm and the search was allowed to swap to completion. The strict consensus tree was chosen. The reliability of the inferred distance and MP trees was assessed by bootstrapping with 1,000 replicates (Felsenstein 1985).
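The protein-guided nucleotide alignment step can be illustrated with a minimal sketch. This is not the Pal2Nal implementation; it only shows the core idea of threading codons under an existing amino acid alignment. The function name `thread_codons` and the toy sequences are hypothetical, and the sketch assumes that stop codons have been removed and that each coding sequence matches its protein exactly.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Place codons from an unaligned coding sequence under an aligned protein.

    Each residue receives its codon and each alignment gap ('-') becomes '---',
    so the reading frame is preserved in the nucleotide alignment.
    """
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) % 3 != 0 or len(codons) != n_residues:
        raise ValueError("CDS length does not match the protein sequence")
    next_codon = iter(codons)
    return "".join("---" if aa == "-" else next(next_codon)
                   for aa in aligned_protein)


# Hypothetical toy input, not data from the study.
protein_alignment = {"sp1": "MK-LV", "sp2": "MKTLV"}
coding_sequences = {"sp1": "ATGAAACTGGTT", "sp2": "ATGAAGACCCTGGTG"}

codon_alignment = {
    name: thread_codons(protein_alignment[name], coding_sequences[name])
    for name in protein_alignment
}
```

Aligning at the protein level and then back-translating keeps gaps in multiples of three, which is what codon-based likelihood models require.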

Tests of selection were performed for each ADAM gene using the codeml program within Phylogenetic Analysis by Maximum Likelihood (PAML; v4.2) (Yang 1997). We tested for evidence of positive selection by comparing the likelihood of a pair of nested models. Model M7 uses a beta distribution to fit ω values to classes between 0 and 1, while model M8 adds a class with an ω ratio higher than 1. Twice the log-likelihood difference between the two models was compared to a chi-square threshold value with the appropriate degrees of freedom (Yang et al. 2000). To validate that any signal of positive selection resulting from an M8 vs. M7 model comparison was not the result of relaxed selection, the likelihood of model M8 was also compared to the likelihood of null model M8a, where ω is fixed at one (Swanson et al. 2003; Wong et al. 2004). We adjusted all P value thresholds to reduce the number of false positives to less than one. In cases where we detected evidence of codons under positive selection, the Bayes Empirical Bayes (BEB) method was used to identify specific codon sites under selection and to assign a posterior probability (Yang et al. 2005). Positively selected sites were mapped within protein domains identified by scanning amino acid sequences against the PROSITE patterns and profiles database (http://www.expasy.ch/prosite/).
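The likelihood ratio tests described above can be sketched as follows. The log-likelihood values are hypothetical placeholders, not results from this study. The degrees of freedom (2 for M8 vs. M7, 1 for M8 vs. M8a) are the conventional choices and are an assumption here, since the text only refers to the appropriate degrees of freedom. The threshold adjustment is one possible reading of keeping the expected number of false positives below one, using the 24 genes tested as the number of comparisons.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (closed forms, df 1 or 2)."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are implemented in this sketch")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, P value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for one gene (illustration only).
lnl_m7, lnl_m8a, lnl_m8 = -5230.4, -5226.0, -5221.1

stat_m7_m8, p_m7_m8 = likelihood_ratio_test(lnl_m7, lnl_m8, df=2)
stat_m8a_m8, p_m8a_m8 = likelihood_ratio_test(lnl_m8a, lnl_m8, df=1)

# Assumed adjustment: with n tests at threshold alpha, the expected number of
# false positives is n * alpha, so alpha = 1 / n is the boundary value.
n_tests = 24
alpha = 1.0 / n_tests

significant = p_m7_m8 < alpha and p_m8a_m8 < alpha
```

Requiring both comparisons to be significant mirrors the logic in the text: M8 vs. M7 detects an extra class of sites, and M8 vs. M8a checks that this class has ω above one rather than merely relaxed constraint.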

Primate Sequence Data Collection and Analysis

We classified species into two groups based on their mating system classification (Dixson 1998). Species with a multimale–multifemale or a dispersed mating system, and hence more likely to experience postmating sexual selection, included chimpanzee (Pan troglodytes) and macaques (Macaca mulatta or Macaca fascicularis). Species with a polygynous or monogamous mating system, and hence less likely to experience postmating sexual selection, included orangutan (Pongo abelii) and gorilla (Gorilla gorilla). Humans (Homo sapiens) were classified as experiencing less intense postcopulatory sexual selection than chimpanzee and macaques, and were grouped with the monogamous/polygynous species (Dixson 1998, p. 219).

DNA sequences from the adhesion domain of gorilla and orangutan ADAM genes were retrieved from GenBank by using the amino acid sequence data available from humans and chimpanzee to tblastn the gorilla and orangutan Whole-Genome Shotgun reads (WGS) database. The hits were selected on the basis of alignment similarities, and the sources used for assemblies are provided in Supplementary Table 2. We calculated the ω ratio, dN, and dS using the free ratio model within PAML (Yang 1997). For ADAM genes showing evidence of ω higher than one in any lineage, we also tested rates of nucleotide divergence within the coding regions of the adhesion domain relative to noncoding introns or flanking sequence (i.e., a neutral control). The positions of introns within adhesion domains of gorilla and orangutan sequences were delineated by gaps between the tblastn hits listed in Supplementary Table 2. Human and chimpanzee intron sequence data were retrieved by aligning the nucleotide sequence coding for each ADAM adhesion domain to the chromosome genome reference sequence assembly within GenBank (Adams 2, 18, and 32 NC_000008.10, Adam12 NC_000010.10, Adam23 NC_000002.11 for Homo sapiens and Adams 2, 18, and 32, NC_006475.2, Adam12 NC_006477.2, Adam23 NC_006470.2 for Pan troglodytes). For macaques, we retrieved intron sequence using the adhesion domain amino acid sequence to tblastn the Macaca mulatta WGS database (Supplementary Table 2). We found no intron sequence for Adam30, and so noncoding sequence flanking the gene was retrieved as outlined above using tblastn searches against WGS databases or retrieval from the chromosome genome reference sequence assembly within GenBank (NC_000001.10 for Homo sapiens and NC_006468.2 for Pan troglodytes).

We used the baseml program within PAML to reconstruct ancestral sequences under models that assume rate variation over nucleotide sites and different patterns of nucleotide substitutions. We used model HKY85, which assumes different frequencies for the four nucleotides (pi) and different transition/transversion rate ratios. Simpler models assume no differences in transition/transversion rate ratios (F81), equal frequencies for all nucleotides (K80), or both equal nucleotide frequencies and no difference in transition/transversion rate ratios (JC69) (Yang et al. 1994). The log-likelihood of the species tree topology for each ADAM gene was calculated under the different models, and twice the log-likelihood difference between models was compared to a chi-square distribution with degrees of freedom given by the difference in the number of parameters estimated for each model. The model that better explained the data was used to reconstruct a sequence state at each internal node of the tree. The ancestral sequence was used to estimate noncoding sequence divergence along each branch using MEGA (v4.0).
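The nesting of the four substitution models, and the degrees of freedom each comparison implies, can be made explicit with a short sketch. The parameter counts follow the standard definitions of the models (one transition/transversion parameter, three free nucleotide frequencies); branch lengths and any among-site rate parameter are shared by both models in a pair and cancel out. The log-likelihoods are hypothetical and the 5% critical values are the usual tabulated ones.

```python
# Free substitution-model parameters, excluding those shared by all models.
FREE_PARAMS = {"JC69": 0, "K80": 1, "F81": 3, "HKY85": 4}

# Standard 5% chi-square critical values for the df that occur below.
CHI2_CRITICAL_5PCT = {1: 3.841, 3: 7.815}

# (simpler model, more general model) pairs that are properly nested.
NESTED_PAIRS = [("JC69", "K80"), ("F81", "HKY85"),
                ("JC69", "F81"), ("K80", "HKY85")]


def compare(lnl: dict, simple: str, general: str):
    """Return (df, 2 * delta lnL, rejects_simpler_model_at_5pct)."""
    df = FREE_PARAMS[general] - FREE_PARAMS[simple]
    stat = 2.0 * (lnl[general] - lnl[simple])
    return df, stat, stat > CHI2_CRITICAL_5PCT[df]


# Hypothetical log-likelihoods for one gene (illustration only).
lnl = {"JC69": -2105.7, "K80": -2098.2, "F81": -2101.3, "HKY85": -2093.5}

results = {pair: compare(lnl, *pair) for pair in NESTED_PAIRS}
```

Note that K80 and F81 are not nested within each other, so they cannot be compared directly with this test; each is compared with JC69 below it or HKY85 above it.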

We tested for evidence of heterogeneity in evolutionary rate by using the branch model within PAML. We compared the likelihood of a two-ratio model (M2) with foreground branches being those likely experiencing postmating selective pressures to the likelihood of the one-ratio model which assumes equal rates of evolution across branches. We also tested for evidence of positive selection along the chimpanzee and macaque foreground branches using the mixed branch-site model (model = 2; NSsites = 2) within codeml (Yang and Nielsen 2002). The log-likelihood of the branch-site model was compared to the same model but fixing the ω value of the foreground branches to 1, so that any significant variation in ω between foreground and background branches could be attributed to positive selection as opposed to differences in selective constraints (Zhang et al. 2005). All tests were also conducted using sequence data from Hominidae and macaques available for SEMG2 (Supplementary Table 2) to serve as a positive control.
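The codeml settings that distinguish these analyses can be summarized in a small sketch that renders control-file text. The `model`, `NSsites`, `fix_omega`, and `omega` variables are codeml control-file options, and `#1` is the PAML label for foreground branches. The file names, the `CodonFreq` choice, the starting ω values, and the helper `render_ctl` are assumptions made for illustration, not settings reported in the text.

```python
# Settings shared by all runs (file names and CodonFreq are hypothetical).
COMMON = {
    "seqfile": "adam2_adhesion.phy",
    "treefile": "primates_labeled.tre",
    "runmode": 0,    # evaluate the user-supplied tree
    "seqtype": 1,    # codon sequences
    "CodonFreq": 2,  # F3x4 codon frequencies (assumed)
    "fix_kappa": 0,  # estimate kappa
    "kappa": 2,      # starting value
}

# Per-analysis settings; omega is a starting value unless fix_omega = 1.
MODELS = {
    "one_ratio":        {"model": 0, "NSsites": 0, "fix_omega": 0, "omega": 0.4},
    "two_ratio":        {"model": 2, "NSsites": 0, "fix_omega": 0, "omega": 0.4},
    "free_ratio":       {"model": 1, "NSsites": 0, "fix_omega": 0, "omega": 0.4},
    "branch_site":      {"model": 2, "NSsites": 2, "fix_omega": 0, "omega": 1.5},
    "branch_site_null": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},
}

# Unrooted species tree with chimpanzee and macaque marked as foreground.
LABELED_TREE = "(((human,chimpanzee #1),gorilla),orangutan,macaque #1);"


def render_ctl(name: str) -> str:
    """Render control-file text for one of the analyses in MODELS."""
    settings = {**COMMON, "outfile": f"{name}.out", **MODELS[name]}
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


ctl_text = render_ctl("branch_site_null")
```

The two-ratio and one-ratio runs form the branch test, and the branch-site run and its null (foreground ω fixed at 1) form the test of positive selection; in both cases the pair differs by a single free parameter.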

Results

ADAM Genes Phylogeny

We used the reconstructed phylogeny of the whole ADAM gene family as a tool to identify true monophyletic groups for analysis of selection. We found consistent grouping of genes regardless of the approach used. The reconstruction of the ADAM gene phylogeny using both distance and MP methods (not shown) confirmed orthology for most ADAM genes, with the exception of Adam20 and Adam29. Bos taurus and Canis familiaris Adam20 genes did not group with the other three Adam20 genes, and thus Adam20 was not included in phylogenetic tests of rate heterogeneity or positive selection. Bos taurus Adam29 did not cluster with other Adam29 genes and was excluded from the PAML analysis of Adam29 (Fig. 1). The distance method tree identified all testes expressed ADAM genes as having a common monophyletic origin and subdivided them into two clades. One such clade (Adams 1, 4, 6, 20, 21, 29, and 30) is in agreement with one of the three major clades resolved in a previous study that phylogenetically grouped ADAM genes, as is the grouping of Adam17 and Adam10 (see clades A and C in Huxley-Jones et al. 2007). Both the distance and MP trees placed one of the testes expressed ADAM groups (Adams 2, 3, 5, 18, and 32) as a common monophyletic cluster with high bootstrap support (100 and 94, respectively) (Fig. 1).

Fig. 1

Neighbor-Joining Poisson corrected phylogenetic tree. All ADAM genes, with the exception of ADAM20, form monophyletic groups and the gene terminal nodes are collapsed. Black nodes are testes expressed ADAM genes. Bootstrap values are shown

Positive Selection Within the Adhesion Domain of ADAM Sperm Surface Genes in Mammals

We tested for evidence of positive selection among mammals for 24 genes of the ADAM family for which we could find sequence data from four or more species. Fifteen genes showed evidence of positive selection (significant likelihood ratio test: M8 vs. M7 and M8 vs. M8a) (Table 1). Seven of the 15 (Adams 1, 2, 3, 4, 5, 6, and 32) are known sperm surface ADAM genes, while Adams 21, 28, 29, and 30 are testes expressed genes and Adam7 is an epididymis expressed gene possibly involved in sperm maturation (Oh et al. 2005). Only three of the 15 genes detected to be under positive selection are not reproductive-related genes (Adams 8, 11, and 15), and nine other genes showed no evidence of positive selection (Table 1). In summary, all reproductive ADAM genes (both testes and epididymis expressed) showed some evidence of positive selection, but positive selection was not restricted to ADAM genes with a role in reproduction (Table 1).

Table 1 Results from codon tests of positive selection using mammalian ADAM genes

ADAM genes are characterized by the presence of identifiable metalloprotease and adhesion (disintegrin and cysteine-rich) domains within their protein sequences. The family is also known as the MDC family, a name that highlights the three protein domains found in all members of the gene family (metalloprotease, disintegrin, and cysteine-rich). An analysis of the distribution of positively selected sites across domains revealed that only sperm surface ADAMs 1, 2, 3, 4, and 32 showed evidence of positive selection within the adhesion domain of the protein (Fig. 2).

Fig. 2

Distribution of codon sites for 11 out of 15 Adam genes showing evidence of positive selection. Adam genes with a BEB posterior probability lower than 0.95 of being under positive selection are not shown (Adams 6, 8, 15, and 21). The proportion of positively selected sites is shown for the amino end (white), metalloprotease domain (dashed), adhesion domain (black), and carboxy-tail end (gray) in that order for each gene. The name of sperm surface proteins is marked with an asterisk

Changes in Evolutionary Rate and Selection Within the Adhesion Domain of Primate ADAM Genes

The fact that only some mammalian sperm surface ADAMs showed evidence of positively selected sites within the protein adhesion domain led us to hypothesize that this domain might be critical in terms of male × male and/or male × female interactions for fertilization. Therefore, it is likely that the evolution of the sperm surface protein adhesion domains is driven by sexual selection at the molecular level. To test this hypothesis, we decided to focus our analysis of selection on the adhesion domain of a group of species with different mating types. Moreover, in order to avoid false positives due to ancient bouts of selection unrelated to reproductive differences, and to avoid losing true positives due to the inclusion of highly diverged species, we restricted our analysis to relatively closely related species of primates.

We used the branch model within PAML to test for variation in rates of evolution by comparing all species of the Hominidae family as well as macaques and looking for associations between acceleration in rates of evolution and differences in mating system types. We found that a model of evolution assuming a ωA ratio for terminal branches leading to species with multimale–multifemale or dispersed mating systems (macaques and chimpanzee) and a different ωB ratio for other branches as opposed to a single-ratio (ω0) model, fit the data better for Adam2 (2Δ = 6.06; P = 0.014; ωA = ∞ vs. ωB = 0.30) and Adam23 (2Δ = 6.38; P = 0.012; ωA = 1.94 vs. ωB = 0.07) (Table 2). We further used a mixed branch-site model to test for positive selection linked to the two terminal branches more likely experiencing high postmating selective pressures and found significant results for Adam18 (2Δ = 5.60; P = 0.018), Adam23 (2Δ = 50.44; P < 0.001), and the positive control Semg2 (2Δ = 8.66; P = 0.003).
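As a consistency check, the P values quoted above can be recomputed from the reported test statistics. The sketch below assumes a chi-square distribution with one degree of freedom, which the text does not state explicitly for these tests; under that assumption the recomputed values agree with the reported ones to within rounding.

```python
import math


def chi2_p_df1(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


# (test statistic, reported P value) as quoted in the text.
reported = {
    "Adam2, branch model": (6.06, 0.014),
    "Adam23, branch model": (6.38, 0.012),
    "Adam18, branch-site model": (5.60, 0.018),
    "Semg2, branch-site model": (8.66, 0.003),
}

recomputed = {name: chi2_p_df1(stat) for name, (stat, _) in reported.items()}

# Adam23 under the branch-site model is reported only as P < 0.001.
p_adam23_branch_site = chi2_p_df1(50.44)

for name, (stat, p_reported) in reported.items():
    print(f"{name}: 2*delta = {stat:.2f}, reported P = {p_reported}, "
          f"recomputed P = {recomputed[name]:.4f}")
```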

Table 2 Likelihood ratio tests for the adhesion domain of primate ADAM proteins using the branch and branch-site models within PAML

Estimates of ω ratios for each branch in the tree showed Adams 2, 12, 18, 23, 28, 30, and 32 having at least one branch with a ratio higher than one (Table 3). Since this ratio can be inflated by a low number of substitutions with no synonymous changes, we also compared the proportion of nonsynonymous substitutions to the proportion of substitutions in noncoding regions for genes showing ω higher than 1 along any given branch. We reconstructed ancestral sequences using models that assume different patterns of nucleotide substitutions. Log-likelihood estimates under different models of nucleotide substitution consistently showed that model K80 fitted the data better than model JC69 and that model HKY85 was significantly better than model F81 (Supplementary Table 3). Because all ADAM genes tested had higher frequencies of AT compared to GC, we used model HKY85 for inferring ancestral nucleotide sequences (note that the log-likelihood estimates for HKY85 are lower than those for F81) (Supplementary Table 3). A consistent pattern of a higher proportion of nonsynonymous substitutions relative to both synonymous and noncoding substitutions was found along the chimpanzee branch for Adam2 and Adam18 and along the macaque branch for Adam2 and Adam23 (Table 4).

Table 3 Results for lineage-specific ω ratio as well as proportions of nonsynonymous (dN) to synonymous (dS) substitutions per site
Table 4 Proportion of nonsynonymous to synonymous, as well as nonsynonymous to noncoding substitutions, per site across lineages

Discussion

Our results expand on previous findings on the molecular evolution of ADAM family genes (Glassey and Civetta 2004) to show that all ADAM genes expressed in male reproductive tissue show evidence of positive selection. In fact, it has recently been shown that cell-surface sperm genes have a significantly higher dN than dS in mouse, and positive selection was detected for Adams 1, 2, 4, 6, and 24 using phylogenetic comparisons among five different species of mammals (Dorus et al. 2010). These results can be driven either by a strong selective signal across all or most of the evolutionary history of the gene, which might be likely for a gene like Adam2, or by some lineages experiencing bouts of selection. We also confirmed our prior findings of positive selection of codon sites within the disintegrin/cysteine-rich adhesion domains of Adams 1, 2, and 32 (Civetta 2003b; Glassey and Civetta 2004) and expanded the result to establish that the signal of selection is also found in the adhesion domain of other sperm surface ADAM proteins (ADAMs 3 and 4). We know that gene disruption of Adams 1, 2, and 3 impairs the ability of sperm to move from the uterus to the oviduct and that knockouts for these genes are defective in sperm binding to the egg zona pellucida (Cho et al. 1998; Shamsadin et al. 1999; Kim et al. 2006; Yamaguchi et al. 2009). It is also known that ADAMs 1, 2, and 3 form protein complexes in both testicular cells and sperm (Cho et al. 2000; Nishimura et al. 2004, 2007). ADAMs 4, 5, 6, 30, and 32 are expressed in the sperm head, and the use of knockout mice has revealed that both ADAMs 4 and 6 associate with ADAMs 1 and 2 in the formation of sperm protein complexes (Kim et al. 2006; Han et al. 2009). 
Our results provide evidence that only some mammalian sperm ADAM genes, many of which are known to associate to form protein complexes with known functions in sperm movement within the female reproductive tract and in sperm–egg binding, have evolved by adaptive diversification of their protein adhesion domain. This result raises the possibility that this signal is driven by male × female postcopulatory interactions prior to fertilization. Alternatively, positive selection at adhesion domains in mammals could be driven by ancient bouts of selection associated with other aspects of an adhesion protein domain.

The results obtained from the analysis of molecular evolution of adhesion ADAM domains in primates using both the branch model and the branch-site model allowed us to identify signals of selection (Adam18 and Adam23) and accelerated dN/dS ratios (Adam2 and Adam23) along the two terminal branches leading to promiscuous species. The detection of positive selection for our positive control Semg2 in the foreground branches is reassuring. However, the use of both the branch and branch-site models required grouping together different tree branches that might have experienced different rates and patterns of evolution. This can lead to a single branch or a few branches producing a positive result of selection or rate heterogeneity, particularly if a small number of species is used in the analysis. It is also worth noting that if only some lineages have experienced positive selection, as in the case of Semg1 (Kingan et al. 2003), then the signal might not be detected using a phylogenetic approach. Clearly, lineage-specific sexual selection bouts occur in sperm genes of primates, and caution should be exercised in extrapolating conclusions about the role of sexual selection along specific lineages from broad phylogenetic tests of selection. In fact, the estimates of dN/dS ratios along each branch show accelerated evolution for Adam2 and Adam18 along both chimpanzee and macaque, while acceleration is restricted to macaques in Adam23. It is unclear why Adam23 shows accelerated evolution in macaque and such a strong signal of positive selection when using the branch-site model. ADAM23 has an active adhesive motif and the gene is expressed in the brain (Evans 2001), so given that Adam23 has no known function or expression in sperm, the signal along the macaque branch is more likely unrelated to species mating habits. 
Particularly interesting is that, contrary to Adams 2 and 18, Adam23 shows no nonsynonymous substitutions along the promiscuous chimpanzee branch, further suggesting that the signal picked up for Adam23 is not related to a dispersed or promiscuous mating system.

Our analysis rules out selection linked to species differences in primate mating systems as a driver of the evolution of the sperm surface ADAM32 protein. The finding of accelerated evolution and a higher proportion of nonsynonymous relative to both synonymous and noncoding substitutions in Adam2 and Adam18 in primates more likely experiencing postcopulatory reproductive challenges is quite interesting in light of what we know about these two proteins. Contrary to other sperm proteins that have become non-functional in some primates (e.g., ADAM1, ADAM3, and ADAM5) (Jury et al. 1997, 1998; Frayne and Hall 1998; Frayne et al. 1999), both ADAMs 2 and 18 are functional and localized in the sperm of primates as diverged as humans and macaques (Frayne et al. 2002). Besides being expressed on the sperm surface, primate ADAM2 and ADAM18 also show the same developmental expression and contain an identical putative integrin-binding motif (Frayne et al. 2002). Moreover, both genes are located in adjacent genome positions in primates. In humans, Adams 2 and 18 are adjacent to each other (13.7 kb apart) on chromosome 8p11.22. The two genes are also adjacent on chromosome 8 in chimpanzee (13.8 kb apart) and the rhesus monkey (20.9 kb apart). It is possible that, in primates, Adam18 might perform complementary or similar postcopulatory roles to Adam2 and that not only their likely functional similarities but also their close chromosome position might explain why both genes show similar patterns of evolution.