Introduction

Understanding how new biological functions arise represents an area of active research in evolutionary biology. From a genetic standpoint, gene duplication is recognized as a key biological event in generating the raw material for evolution to act on (Ohno 1970). After duplication, the functional and regulatory divergence between the resulting gene copies is thought to contribute to the generation of evolutionary novelties (Ohno 1970; Zhang 2003; Hahn 2009), and positive selection has been recognized as a potentially important evolutionary force in generating these new functions. Morris Goodman (1981, 1982) postulated that in the first stages of functional differentiation the amino acid changes that remodel the molecule could be driven to fixation by positive Darwinian selection, whereas at later stages the previously fixed changes would be preserved by purifying selection so as to maintain the newly acquired physiological role. In this regard the relaxin gene family represents an interesting model system to explore the interplay between gene duplication and functional differentiation in originating a group of genes with a variety of physiological functions (Sherwood 2004; Wilkinson et al. 2005; Arroyo et al. 2012; Hoffmann and Opazo 2011).

Members of the relaxin gene family are found at three different locations in the genome, which have been called relaxin family locus (RFL) A, B, and C (Park et al. 2008a). The number and nature of genes in these three loci are well conserved in most mammalian groups. The RFLA locus contains an ortholog of the human INSL5 gene, which has remained as a single-copy gene in all species surveyed, and the RFLC locus contains orthologs of the human INSL3 and RLN3 genes, which have also remained as single-copy genes in all species surveyed (Park et al. 2008a; Arroyo et al. 2012; Hoffmann and Opazo 2011). Copy number variation is also minimal in the RFLB locus in most mammalian species, with the exception of primates (Fig. 1; Park et al. 2008a; Arroyo et al. 2012; Hoffmann and Opazo 2011). In most mammals the RFLB locus contains two genes: a single copy of a relaxin-like gene, labeled as either RLN or RLN2, and a single copy of an ortholog of the human INSL6 gene, but primates show additional variation in the number and nature of genes in this locus (Fig. 1; Bièche et al. 2003; Park et al. 2008a, b; Arroyo et al. 2012; Hoffmann and Opazo 2011). For example, haplorhine primates, the group that includes tarsiers, New World monkeys, Old World monkeys, and apes, also possess a copy of INSL4 at this locus (Bièche et al. 2003; Park et al. 2008a, b; Arroyo et al. 2012), and apes also possess duplicated RLN1 and RLN2 paralogs. Interestingly, phylogenetic reconstructions indicate that both the INSL4 gene of haplorhines and the duplicate RLN1/RLN2 genes of apes are actually older than their phyletic distributions would suggest (Arroyo et al. 2012; Hoffmann and Opazo 2011).

Fig. 1

Genomic structure of the RFLB in mammals. Stars indicate gaps in genomic coverage, while diagonal slashes indicate that the genes were identified in different genomic pieces. The orientation of the clusters is from 5′ (on the left) to 3′ (on the right). Phylogenetic relationships are based on recently published literature (Hallstrom and Janke 2008, 2010).

From a functional standpoint, the RLN2 genes of primates and the single-copy relaxin gene in the RFLB locus of non-primate mammals have physiological roles mainly associated with the reproductive system, such as softening the connective tissue of the interpubic ligament, growth of the mammary gland and nipples, follicle development, and embryo implantation, among others (Burger and Sherwood 1995; Graham and Dracy 1953; Hall 1947; Krantz et al. 1950; O’day et al. 1989; Sherwood and O’Byrne 1974; Zhao et al. 2000; Samuel et al. 2007; Shabanpoor et al. 2009). In addition to their role in reproductive physiology, recent studies have also identified non-reproductive functions of these genes associated with fibrosis (Bathgate et al. 2006; Samuel 2005; Samuel and Hewitson 2006; Sherwood 2004), wound healing (Bathgate et al. 2006; Sherwood 2004), cardiac protection (Bani et al. 2005; Dschietzig et al. 2006; Samuel and Hewitson 2006; Samuel et al. 2006), allergic responses (Bani 1997), and cancer (Kamat et al. 2006; Silvertown et al. 2003). Despite the variety and apparent importance of the physiological roles in which the product of this gene is involved, it is not indispensable, as species of the family Bovidae (e.g., cow and sheep) have secondarily lost it during their evolutionary history (Park et al. 2008a). Because of the functional similarities between the relaxin gene in the RFLB locus from most mammals and the RLN2 gene from apes, the name RLN2 is generally applied to these two genes, although they are not orthologs (Shabanpoor et al. 2009). We will follow this convention to facilitate comparisons with the extensive body of functional studies already published. Thus, the term RLN2 will be used to refer both to the RLN2 gene of primates and to the relaxin gene in the RFLB locus from non-primate mammals.

In the European rabbit (Oryctolagus cuniculus), in addition to the classical physiological functions in the reproductive system (Fields et al. 1995), the RLN2 gene is also expressed in tracheobronchial epithelial cells, and its expression has been linked to squamous differentiation (Jetten et al. 1992). In principle, by comparing the sequence and genomic structure of the genes in the RFLB locus of the European rabbit, we could gain insights into the changes and evolutionary forces that are responsible for the acquisition of this new physiological function, which appears to be an autapomorphy restricted to the European rabbit lineage. Accordingly, the goals of this study are to apply the tools of molecular evolution and comparative genomics to better understand the evolution of the RFLB locus in the European rabbit. In particular, we want to evaluate the putative role of positive Darwinian selection in the fixation of the specific amino acid replacements that distinguish the European rabbit RLN2 from other mammalian RLN1 and RLN2 genes from the RFLB locus, and make inferences about the sites that are likely to be involved in the acquisition of this new function in this species. Results of our study indicate that the European rabbit possesses five tandemly arranged RLN2 gene copies in the RFLB locus, which contrasts with the single copy of the RLN2 gene found in all mammals other than primates, and indicate that positive Darwinian selection played a significant role in their pre-duplicative history.

Materials and Methods

Data

We obtained genomic DNA sequences for structural genes in the Relaxin Family Locus B for 35 mammals, one bird, one amphibian, and two fish species from the Ensembl database (Supplementary Table 1). We identified RLN/INSL-like genes by comparing annotated exon sequences to unannotated genomic sequences, using the program Blast2seq (Tatusova and Madden 1999) available from NCBI (www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi). We also included sequences from shorter records based on genomic DNA or cDNA, in order to attain a broad taxonomic coverage. To distinguish among tandemly arrayed gene copies, we index each gene copy with the symbol T followed by a number that corresponds to the linkage order in the 5′–3′ orientation; thus, the first gene in the cluster is labeled T1, the second T2, and so forth. It is worth reminding the reader that, for historical reasons, the name RLN2 is applied to two sets of genes that are not 1:1 orthologs: the RLN2 gene of primates and the single-copy RLN gene in the RFLB locus in all other mammals.

Alignment

To explore the sensitivity of our phylogenetic analyses to changes in the alignment method, sequences were aligned using Dialign-TX (Subramanian et al. 2008), Kalign2 (Lassmann et al. 2009), the E-INS-i, G-INS-i, and L-INS-i strategies from Mafft v.6 (Katoh et al. 2009), MUSCLE v3.5 (Edgar 2004), Probcons (Do et al. 2005), and Tcoffee (Notredame et al. 2000). The biological accuracy of the alignments was assessed using the program MUMSA (Lassmann and Sonnhammer 2005), which ranks the quality of each alignment by calculating a multiple overlap score (MOS) for each of the different alignments. The MOS ranges from 0 to 1, with 1 being the highest quality. We then selected the alignment with the highest MOS for all ensuing analyses.
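The idea behind ranking alignments by their agreement with one another can be illustrated with a small sketch. The code below is a simplified illustration only and is not MUMSA's actual MOS formula: it scores each alignment by the average fraction of its aligned residue pairs that are also found in each of the other alignments. The function names, method labels, and toy sequences are all hypothetical.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` maps sequence name -> gapped string (all of equal length).
    A residue is identified by (sequence name, ungapped residue index).
    """
    names = sorted(alignment)
    length = len(alignment[names[0]])
    counters = {name: 0 for name in names}
    pairs = set()
    for col in range(length):
        present = []
        for name in names:
            if alignment[name][col] != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def overlap_scores(alignments):
    """Mean fraction of each alignment's pairs recovered by the other alignments."""
    pair_sets = {method: aligned_pairs(aln) for method, aln in alignments.items()}
    scores = {}
    for method, pairs in pair_sets.items():
        others = [p for m, p in pair_sets.items() if m != method]
        if not pairs or not others:
            scores[method] = 0.0
            continue
        shared = [len(pairs & other) / len(pairs) for other in others]
        scores[method] = sum(shared) / len(shared)
    return scores


# Toy input (made up): three hypothetical methods aligning the same two sequences.
alignments = {
    "method_a": {"s1": "AC-GT", "s2": "ACTGT"},
    "method_b": {"s1": "AC-GT", "s2": "ACTGT"},
    "method_c": {"s1": "ACGT-", "s2": "ACTGT"},
}
scores = overlap_scores(alignments)
best = max(scores, key=scores.get)  # alignment that agrees most with the others
```

As with the MOS, the score lies between 0 and 1, and the alignment with the highest value would be carried forward.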

Phylogenetic Inference

We estimated phylogenetic relationships among the different RFLB genes in all major groups of mammals (monotremes, marsupials, and placentals), using sequences from chicken (Gallus gallus), western clawed frog (Xenopus tropicalis), fugu (Takifugu rubripes), and medaka (Oryzias latipes) as outgroups. We used a maximum likelihood and a Bayesian approach, as implemented in the programs Treefinder version October 2008 (Jobb et al. 2004) and MrBayes v3.1.2 (Ronquist and Huelsenbeck 2003), respectively. The best-fitting model for each structural domain was estimated separately using the propose model routine from Treefinder version October 2008 (Jobb et al. 2004). In the case of maximum likelihood, we estimated the best tree under the selected models, and assessed support for the nodes with 1,000 bootstrap pseudoreplicates. In the Bayesian analysis, two simultaneous independent runs were performed for 30 × 10⁶ iterations of a Markov Chain Monte Carlo algorithm, with six simultaneous chains sampling trees every 1,000 generations. Support for the nodes and parameter estimates were derived from a majority rule consensus of the last 15,000 trees sampled after convergence. The average standard deviation of split frequencies remained at 0.01 after the burn-in threshold.
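The sampling scheme above implies a simple piece of bookkeeping, sketched here. The text does not state whether the 15,000 retained trees are counted per run or pooled across both runs, so both readings are shown as assumptions; the variable names are illustrative, and the initial tree that MrBayes also samples is ignored.

```python
# MCMC settings as stated in the text.
iterations = 30 * 10**6   # generations per run
sample_every = 1_000      # sampling frequency
n_runs = 2                # simultaneous independent runs
trees_retained = 15_000   # trees used for the consensus

samples_per_run = iterations // sample_every   # trees sampled in one run
total_samples = samples_per_run * n_runs       # trees sampled across both runs

# Fraction of samples kept under each reading (assumption, not stated in the text).
kept_if_per_run = trees_retained / samples_per_run
kept_if_pooled = trees_retained / total_samples

print(samples_per_run, total_samples, kept_if_per_run, kept_if_pooled)
```

Under the per-run reading, the last half of each run is retained and the first half is discarded as burn-in.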

Natural Selection Analysis

To investigate the possible role of positive Darwinian selection on the evolution of the multiple copies of the RLN2 genes in the European rabbit, we explored variation in ω, the ratio of the rate of non-synonymous substitutions (dN) to the rate of synonymous substitutions (dS), in a maximum likelihood framework using the codeml program from PAML v4.4 (Yang 2007). In brief, if amino acid replacements are neutral, then dN and dS would be very similar and ω = dN/dS ≈ 1; under purifying selection most amino acid substitutions would be deleterious, and so ω < 1; whereas under positive selection amino acid replacements would be advantageous, and so ω > 1. All these analyses were based on a phylogeny that includes a representative panel of RLN2 genes from all major mammalian lineages (Supplementary Fig. S1). We compared two sets of models: the first set focused on comparing changes in ω along the branches of the tree, and the second set focused on comparing changes in ω along the different sites in the alignment between background and foreground sets of branches. These different models were implemented to explore changes in ω relative to the duplication in the European rabbit RLN2 paralogs. In all cases three starting ω values (0.5, 1, and 2) were used to check for the existence of multiple local optima. Nested models were compared using the likelihood ratio test (LRT). We first compared the following four branch models (Supplementary Fig. S2): (i) a 1-ω model, which fitted a single ω estimate to all branches in the tree; (ii) a 2-ω model, which assigned one ω to all European rabbit branches and a second ω to all non-rabbit branches in the tree; (iii) a 3-ω model, with one ω for the ancestral branch of the European rabbit RLN2 clade, a second ω for the European rabbit RLN2 clade branches, and a third ω for non-rabbit branches; and (iv) a 7-ω model that assigns a separate ω to each European rabbit RLN2 branch, plus a separate ω estimate for non-rabbit branches (Supplementary Fig. S2). We also implemented branch-site models, which explore changes in ω for a set of sites in a specific branch of the tree in order to assess changes in their selective regime (Yang and Dos Reis 2011). In this case the ancestral branch of the European rabbit clade was labeled as the foreground branch. We compared the modified model A (Yang et al. 2005; Zhang et al. 2005), in which some sites are allowed to change to an ω > 1 in the foreground branch, with the corresponding null hypothesis of neutral evolution. The Bayes Empirical Bayes (BEB) method was used to identify sites under positive selection (Nielsen and Yang 1998; Yang et al. 2000). Because the branch-site analysis estimates rates of evolution on a codon-by-codon basis, its implementation is particularly useful in cases when different gene segments evolve at different rates, as is the case with the different domains of the RLN2 genes.
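The likelihood ratio test used to compare nested models can be sketched as follows. This is a minimal illustration for comparisons with one degree of freedom, where the statistic 2ΔlnL is compared against a chi-square distribution. The log-likelihood values below are placeholders chosen only so that 2ΔlnL equals 5.24; they are not values from this study, and the function name is hypothetical.

```python
import math


def lrt_df1(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models that differ by one parameter.

    Returns (2 * delta lnL, p-value). For one degree of freedom, the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0.0:
        return stat, 1.0
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Placeholder log-likelihoods (assumed, for illustration only).
lnl_one_ratio = -1000.00   # simpler (null) model
lnl_two_ratio = -997.38    # model with one additional omega parameter

stat, p_value = lrt_df1(lnl_one_ratio, lnl_two_ratio)
print(f"2*delta lnL = {stat:.2f}, P = {p_value:.4f}")
```

A statistic of 5.24 with df = 1 gives a P value between 0.02 and 0.025, so the model with the extra parameter would be preferred at the 0.025 level.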

Results and Discussion

Genomic Structure of the RFLB in the European Rabbit

The number and nature of genes located on the RFLB locus are well conserved among most mammals other than primates (Park et al. 2008a, b; Hoffmann and Opazo 2011; Arroyo et al. 2012). Most species examined are characterized by having an INSL6 gene at the 5′ end of the RFLB locus, and a relaxin-like gene at the 3′ end, which is generally referred to as RLN2. The RFLB locus is flanked by janus kinase 2 (JAK2), RNA terminal phosphate cyclase-like 1 (RCL1), and adenylate kinase 3 (AK3) on the 5′ side, and by the chromosome 9 open reading frame 46 (C9Orf46), CD274 molecule (CD274), and programmed cell death 1 ligand 2 (PDCD1LG2) on the 3′ side (Fig. 1; Park et al. 2008a, b). Previous studies on the RFLB locus in the European rabbit suggested that it conformed to the general pattern described above (Fields et al. 1995; Jetten et al. 1992). However, our examination of the RFLB locus in the European rabbit reveals the presence of five tandemly arranged RLN2 gene copies, in addition to a single copy of INSL6 at the 5′ end of the cluster (Fig. 1). Each of these five tandemly arranged RLN2 genes is characterized by the canonical two exons/one intron gene structure typical of mammalian relaxins, and has an intact open reading frame. Experimental evidence indicates that these genes are mainly expressed in the uterus, placenta, and tracheobronchial epithelial cells (Jetten et al. 1992; Fields et al. 1995), and utilized by the endometrium, uterine cervix, and mammary glands (Fields and Fields 2005). Thus, the RFLB locus in the European rabbit includes six putatively protein-coding genes, whereas the corresponding locus includes two putatively coding genes in all rodents, the order sister to Lagomorpha (Fig. 1). The rabbit RFLB is located on chromosome 1 and extends for 194 kb from the start codon of INSL6 on the 5′ end to the stop codon of the RLN2 paralog on the 3′ end. Synteny is conserved relative to other mammals, as genes found up- and downstream of the RFLB locus in the European rabbit are orthologous with the genes found in the corresponding locations in most other mammals (Fig. 1).

Thus far, duplications of the ancestral gene located on the RFLB locus of vertebrates have only been reported for mammals. The oldest duplication gave rise to the INSL6 gene in the common ancestor of placental mammals, a gene that has remained as a single-copy gene in all species examined (Fig. 1; Park et al. 2008a, b; Arroyo et al. 2012). A second duplication event gave rise to INSL4, which is only found in primates (Bièche et al. 2003; Hoffmann and Opazo 2011). Even though INSL4 was thought to derive from a primate-specific duplication, phylogenetic reconstructions suggest that INSL4 derives from a duplication in the last common ancestor of Euarchontoglires (Arroyo et al. 2012; Hoffmann and Opazo 2011). There are also several more recent duplications of the RLN2 and INSL4 genes, which are restricted to primates for the most part (Arroyo et al. 2012). The RLN2 gene duplicated in the last common ancestor of anthropoids, but the two resulting primate duplicates, RLN1 and RLN2, have only been retained as a pair in hominoids (Arroyo et al. 2012).

Orthologous Relationships of the RFLB Genes in Mammals

We conducted a phylogenetic analysis to resolve orthologous relationships between the European rabbit RLN2 paralogs and the different genes on the RFLB locus in other mammals, focusing on the European rabbit and primates, the groups that possess multiple copies of relaxin-like genes in the RFLB. At the deeper level, the phylogenies we obtained are concordant with the evolutionary scenarios proposed for the genes located on the RFLB locus (Fig. 2). The INSL6 clade was recovered as sister to the clade that includes the RLN1, RLN2, and INSL4 genes of placental mammals (Arroyo et al. 2012; Hoffmann and Opazo 2011), as expected for a gene that derives from a duplication that predates the radiation of the group (Fig. 2). The INSL4 clade was embedded within the clade that includes the RLN1 and RLN2 genes (Fig. 2), confirming that it derives from a relatively old duplication of an RLN-like ancestor (Bièche et al. 2003; Arroyo et al. 2012; Hoffmann and Opazo 2011). With regard to the European rabbit RLN2 paralogs, maximum likelihood and Bayesian phylogenies place the five gene copies in a strongly supported monophyletic clade sister to the single-copy RLN2 gene from the ground squirrel, embedded within the clade that includes the RLN2 genes from all other rodents (Fig. 2).

Fig. 2

Maximum likelihood phylogram depicting phylogenetic relationships among the RFLB genes in mammals. The RLN2 paralogous genes identified in the European rabbit are in bold. Numbers above the nodes correspond to maximum likelihood bootstrap support values, and numbers below the nodes correspond to Bayesian posterior probabilities. Sequences from the chicken (Gallus gallus), western clawed frog (Xenopus tropicalis), fugu (Takifugu rubripes), and medaka (Oryzias latipes) were used as outgroups.

This pattern suggests that the duplications that gave rise to the multiple RLN2 paralogs in the European rabbit are independent of those that gave rise to the duplicate RLN1 and RLN2 genes of apes. However, this pattern could also be due to gene conversion, which can overwrite genetic information and result in phylogenies where paralogous genes from the same species are more similar to each other than they are to their orthologs in closely related species. Because interparalog gene conversion is typically restricted to coding regions, analyzing variation in non-coding regions has the potential to retrieve the true orthologous relationships, even in the presence of gene conversion. Accordingly, we conducted phylogenetic analyses based on 1 kb of upstream flanking sequence, 1 kb of downstream flanking sequence, and intron 1 for all relaxin genes of boreoeutherian mammals, including the five tandemly arranged genes of the European rabbit. As shown in Fig. 3, all phylogeny reconstructions recovered the five tandemly arranged RLN2 gene copies of the European rabbit in a monophyletic group with strong support, suggesting that these genes are the product of duplications specific to the European rabbit lineage. In agreement with these analyses, gene conversion analyses using GeneConv (Sawyer 1989) also failed to detect any evidence of gene conversion.

Fig. 3

Maximum likelihood phylograms depicting relationships among relaxin genes in boreoeutherian mammals based on 1 kb of 5′ flanking sequence (left), intron 1 (center), and 1 kb of 3′ flanking sequence (right). Numbers above the nodes correspond to maximum likelihood bootstrap support values, and numbers below the nodes correspond to Bayesian posterior probabilities.

Natural Selection at the Molecular Level

Functional studies indicate that the European rabbit RLN2 gene has acquired a novel biological role relative to other mammals (Jetten et al. 1992). Because this functional change coincided with the emergence of the five European rabbit RLN2 paralogs, we explored the potential role of positive Darwinian selection in this process in a maximum likelihood framework. These analyses focused on assessing whether positive selection played a role in the preservation of the European rabbit RLN2 duplicates, and on exploring when positive selection acted relative to the duplicative history of these paralogs. We started by comparing the dN/dS ratio (=ω) among the different RLN2 paralogs of the European rabbit in a pairwise manner using the YN00 method (Yang et al. 2000). Results from this approach are compatible with positive selection in two comparisons (T5/T2 and T5/T4), whereas in six other comparisons only non-synonymous substitutions were estimated, and in the last two the ω ratio was lower than 1 (Supplementary material). We then incorporated phylogenetic information into the analyses and estimated variation in ω along a tree built with a subset of the sequences included in Fig. 2 that included representatives of all major groups of mammals (Supplementary Fig. S1). Overall, the simplest model, which assigned a single ω for all branches (the 1-ω model), indicates that all RLN1 and RLN2 genes in the study have relatively high ω estimates (Table 1), in line with previous findings from primates (Wilkinson et al. 2005). When we compared variation in ω among different sets of branches, we found that a 2-ω model that assumes one ω value for the European rabbit branches and a second one for the non-rabbit branches was significantly better than the 1-ω model, which assumes a single ω value for all branches on the tree (Table 1; 2ΔlnL = 5.24, df = 1, P < 0.025). Under the 2-ω model, European rabbit branches had an ω of 1.37, whereas non-rabbit branches had an ω of 0.71 (Table 1). The 3- and 7-ω models that allowed separate ω estimates were not significantly better than the 2-ω model. In all cases the synonymous and non-synonymous rate estimates obtained for the post-duplication branches were at least an order of magnitude lower than the estimates for the pre-duplication branch obtained by branch models.

Table 1 Log likelihood values and parameter estimates under different branch and branch-site models

We further explored the putative role of positive Darwinian selection in the European rabbit pre-duplication branch using branch-site analyses, which are summarized in Table 1. Here we set the European rabbit pre-duplication branch as foreground, and all other portions of the tree as background. The likelihood ratio test indicates that the model that estimates a class of sites with an ω value of 3.94 (Model A) had a significantly better fit than the null model in which the ω value was fixed to 1 (2ΔlnL = 5.52, df = 1; P < 0.025). The branch-site analysis indicates that 27% of the residues switched to a class of sites with an ω value of 3.94 in the European rabbit pre-duplication branch, and the BEB analysis placed 17 codons in this class: three located in the signal peptide, seven in the region encoding the B chain, one in the region encoding the C chain, and six in the region encoding the A chain (Table 1). Four of these sites, one in each region, are over the 90% posterior probability threshold of the BEB analysis (Table 1). From a biochemical standpoint, two of the putatively selected sites on the B chain map to functionally relevant portions of the protein. The residues in the B12 and B13 positions, which are part of the receptor recognition and binding site (Fig. 4), have been replaced by amino acids with different physicochemical properties relative to rodents and human. In the case of B12, glycine, a non-polar amino acid, was replaced by arginine, a positively charged residue, and in the case of B13, arginine was replaced by asparagine, a polar uncharged residue (Fig. 4). Interestingly, all these substitutions are shared by the different European rabbit RLN2 paralogs, which is consistent with the results of the PAML analyses, which indicate that positive selection in the European rabbit pre-duplication branch was responsible for the remodeling of this protein.

Fig. 4

An alignment of RLN2 amino acid sequences from human and the European rabbit. Positively selected amino acid sites according to the branch-site analyses are shaded. The box with a discontinuous line denotes the receptor recognition and binding site.

Taken together, these analyses indicate that positive Darwinian selection has played a strong role in the evolution of the European rabbit RLN2 genes, especially in the pre-duplication branch. Even though our analyses also estimate elevated non-synonymous substitution rates for the European rabbit post-duplication branches, the estimates of ω are not reliable for these branches because of the relatively low estimates of dN and, in particular, of dS. As a result, it is difficult to distinguish between relaxation of purifying selection and positive selection. However, pairwise comparisons between the RLN2-T2 and T5 paralogs, and between the T4 and T5 paralogs, leave open the possibility that positive selection might still be acting in the European rabbit post-duplication branches.

Although the duplication and functional divergence of protein-coding genes has long been recognized as an important process in the origin of biological novelties (Ohno 1970; Zhang 2003; Hahn 2009), the putative role of adaptive changes in promoting these innovations remains unclear. Most of the models that invoke positive selection posit that post-duplication changes that lead to the differentiation of the resulting duplicates are driven by natural selection (Innan and Kondrashov 2010), but our results do not match the expectations of these models. The codon-based models implemented in this study suggest that there was a burst of positive Darwinian selection in the proto-RLN2 gene, prior to the duplication events that gave rise to all the European rabbit RLN2 paralogs. Bringing together the results of our evolutionary analyses and the functional data from Jetten et al. (1992), it is possible to make an association between the positive selection regime mainly inferred for the ancestral gene of the European rabbit and the gain of function observed in this species, which is related to cell differentiation in tracheobronchial epithelia. It remains to be seen whether the different rabbit RLN2 paralogs have diverged in their pattern of gene expression, and thus could have partitioned some of their functional roles among them, and whether the elevated dN/dS ratios reported for primates (Wilkinson et al. 2005) might be related to currently unknown functional changes.