Introduction

The Chinese alligator (Alligator sinensis), one of the most endangered of the 23 crocodilian species in the world, has come close to extinction over the past decades because of habitat loss and serious illegal hunting [1]. Surveys indicate that the number of Chinese alligators in the wild has decreased from 500 individuals in the 1980s to fewer than 120–150 individuals at present [1, 2].

Knowledge of genetic variability in this relict species is essential for the genetic management of captive alligators. Until now, studies of genetic diversity in the Chinese alligator have mostly focused on neutral DNA markers, such as RAPD [3], AFLP [4], mtDNA D-loop sequences [5, 6] and microsatellites [7–10]. However, because of selective processes involving the environment or the capacity for future adaptive change, the genetic structure of adaptive regions may differ from that of neutral regions [11–14]. Adaptive, non-neutral markers have therefore become especially valuable [15–17]. One example is the genes of the major histocompatibility complex (MHC), which are found in all jawed vertebrates and play a critical role in an organism’s immune response [18, 19]. Shi et al. [20] analyzed fragments of MHC class IIb exon2 in three Chinese alligators from ARCCAR, and Liu et al. [21] studied partial exon3 sequences in 14 Chinese alligators. Both studies found high polymorphism of MHC class IIb genes in Chinese alligators.

In this study, we defined the complete exon3 sequence of the MHC IIb gene and expanded on these earlier studies by examining genetic variation across populations based on the complete sequences and by investigating selection at the sequence level through patterns of sequence substitution. Our specific goals were (1) to define the complete exon3 sequence of the MHC class IIb gene and analyze selection on it, and (2) to provide more detailed genetic information for conservation and management strategies for this endangered population [22].

Materials and methods

Samples and DNA extraction

The individuals sampled are listed in Table 1. Sample collection, transportation and storage were the same as in previous studies [3, 5]. DNA was extracted following a conventional phenol/chloroform procedure [23], and genomic DNA was dissolved in ddH2O. The extracted DNA was examined on 1 % agarose gels stained with 10 mg/ml ethidium bromide and stored at −20 °C for further use.

Table 1 Samples of Chinese alligator used for this study

PCR procedure

The primers for amplifying the MHC were designed from the sequences of Caiman crocodilus (AF256651, AF256652 and AF277661) using Primer Premier 5.0 and Oligo 6.0 [24]. Amplification reactions (30 μl) contained 30 ng genomic DNA, 3 μl 10× PCR buffer (Sangon, Shanghai), 2 μl 25 mM MgCl2 (Sangon, Shanghai), 1 μl 25 mM dNTP (Sangon, Shanghai), 2 μl 10 mM primer (synthesized by Genscript, Nanjing) and 1 U Taq DNA polymerase (Sangon, Shanghai). PCR was performed in a Mastercycler gradient (Eppendorf, Germany). Initial denaturation at 95 °C for 5 min was followed by 35 cycles of 94 °C for 30 s, 47 °C for 30 s (primer annealing) and 72 °C for 1 min (primer extension). A final extension at 72 °C for 10 min was followed by cooling to 4 °C until recovery of the samples. PCR products were separated on a 1.5 % agarose gel, stained with 10 mg/ml ethidium bromide and visualized under UV light.
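The final concentrations implied by the reaction recipe follow from the dilution rule C1·V1 = C2·V2. The short sketch below works this out for the buffer, MgCl2 and dNTP volumes listed above; the function name is illustrative and not part of the original protocol.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=30.0):
    """Dilution rule C1*V1 = C2*V2; the result is in the units of stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


# (stock concentration, volume added in ul), as listed for the 30 ul reaction.
components = {
    "PCR buffer (x)": (10.0, 3.0),
    "MgCl2 (mM)": (25.0, 2.0),
    "dNTP (mM)": (25.0, 1.0),
}

for name, (stock, volume) in components.items():
    print(f"{name}: {final_concentration(stock, volume):.2f}")
```

Under these inputs the buffer is at 1×, MgCl2 at about 1.67 mM and dNTP at about 0.83 mM in the final reaction.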

Cloning and sequencing

Purified PCR products were cloned into pMD18-T vectors (Takara, Dalian, China) and transformed into DH5α competent cells following the manufacturer’s recommendations. After overnight incubation at 37 °C on LB agar-ampicillin plates, at least 15 clones per individual were checked for an insert by PCR. Between 8 and 11 insert-positive clones per individual were sequenced with the M13 forward and reverse primers by Genscript (Nanjing).

Only sequences obtained from at least two clones of the same individual were used in the subsequent data analysis; sequences of different lengths within the same individual were kept. This strategy should remove potential artificial polymorphisms caused by occasional mis-incorporation by Taq DNA polymerase.
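The clone-filtering rule can be sketched as follows. This is a minimal illustration, assuming each clone is recorded as an (individual, sequence) pair; the identifiers and toy sequences are hypothetical and are not data from this study.

```python
from collections import Counter, defaultdict


def confirmed_sequences(clones, min_clones=2):
    """Keep, per individual, only sequences seen in at least min_clones clones.

    clones: iterable of (individual_id, sequence) pairs, one per sequenced clone.
    Returns {individual_id: sorted list of confirmed sequences}.
    """
    counts = defaultdict(Counter)
    for individual, seq in clones:
        counts[individual][seq.upper()] += 1
    return {
        individual: sorted(s for s, n in counter.items() if n >= min_clones)
        for individual, counter in counts.items()
    }


# Hypothetical toy data: the singleton "ACGA" in A1 is treated as a possible
# Taq error, while both length variants in A2 are kept.
clones = [
    ("A1", "ACGT"), ("A1", "ACGT"), ("A1", "ACGA"),
    ("A2", "ACGA"), ("A2", "ACGA"), ("A2", "ACG"), ("A2", "ACG"),
]
print(confirmed_sequences(clones))
```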

Statistical analysis

After assembly with the ContigExpress software and correction against the chromatogram peaks in Chromas [25], all sequences were compared (using the BLASTn algorithm) with those available in the NCBI database (http://www.ncbi.nlm.nih.gov) [26]. Multiple nucleotide alignments were performed with ClustalX [27].

Nucleotide polymorphism and diversity were calculated using DnaSP v5 [28]. Analysis of molecular variance (AMOVA) was performed using Arlequin v3.1 [29] to assess genotypic variation across all the populations studied. Relationships among populations were estimated using the neighbor-joining (NJ) method.
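For readers unfamiliar with the two diversity statistics reported later, the sketch below computes nucleotide diversity (mean pairwise differences per site) and haplotype diversity from a set of aligned, equal-length sequences using the standard definitions of Nei. It is an illustration only, not the DnaSP implementation, and the toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length


def haplotype_diversity(seqs):
    """Hd = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


# Hypothetical four-sequence sample with three distinct haplotypes.
sample = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(nucleotide_diversity(sample), 4))
print(round(haplotype_diversity(sample), 4))
```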

Statistical analysis of nucleotide and amino acid sequences was performed with MEGA v4 [30]. The non-synonymous (dN) and synonymous (dS) substitution rates between alleles were calculated for the entire sequence using the modified Nei–Gojobori method with the Jukes–Cantor correction [31]. In addition, we used MEGA v4 to perform Z tests of positive, neutral and purifying selection on the entire exon3.
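The two numerical steps behind these tests can be sketched as follows: the Jukes–Cantor correction converts an observed proportion of differences p into a distance d = −(3/4)·ln(1 − 4p/3), and the Z statistic compares dN and dS relative to their combined variance. Counting synonymous and non-synonymous sites is not shown, and the variances (estimated by bootstrap in MEGA) are passed in as plain numbers; all input values in the example are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for an observed proportion of differences p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1 - 4 * p / 3)


def z_statistic(d_n, d_s, var_n, var_s):
    """Z = (dN - dS) / sqrt(Var(dN) + Var(dS)); negative values mean dN < dS."""
    return (d_n - d_s) / math.sqrt(var_n + var_s)


# Hypothetical proportions and variances, for illustration only.
d_n = jukes_cantor(0.02)
d_s = jukes_cantor(0.08)
print(z_statistic(d_n, d_s, var_n=0.0001, var_s=0.0003))
```

A negative Z, as in this toy case, points toward an excess of synonymous change, which is the pattern associated with purifying selection.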

Pairwise analyses of transitions and transversions and the computation of DNA and amino acid entropy were performed with the program DAMBE [32, 33]. To see how the alleles were related, we reconstructed a nucleotide phylogeny by the NJ method with Kimura’s two-parameter model, using MEGA v4 [30]. Bootstrap values were calculated from 1,000 replicates, and branches corresponding to partitions reproduced in less than 50 % of bootstrap replicates were collapsed.
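Kimura’s two-parameter distance between two aligned sequences depends only on the proportions of transitions (P) and transversions (Q): d = −(1/2)·ln(1 − 2P − Q) − (1/4)·ln(1 − 2Q). The sketch below is a minimal illustration of that formula, not the MEGA or DAMBE code; sites containing anything other than A, C, G or T are skipped, which is an assumption of this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(a, b):
    """Return (transitions, transversions, compared sites) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1
    return ts, tv, sites


def k2p_distance(a, b):
    """Kimura two-parameter distance between two aligned sequences."""
    ts, tv, sites = count_ts_tv(a, b)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Toy pair with one transition (A/G) and one transversion (A/C) over ten sites.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAC"), 4))
```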

Results

Definition of MHC IIb gene exon3 complete sequences

Sequencing yielded sequences of three length types in Chinese alligators: 913, 928 and 943 bp. A similarity search against the NCBI database showed 86–94 % similarity with C. crocodilus (AF256651, AF256652 and AF277661). Based on the GT–AG rule for intron–exon boundaries and on alignment with the sequences of C. crocodilus (AF256651, AF256652 and AF277661), we found that the sequences included the complete 282 bp MHC IIb exon3. The exon sequence showed no gaps and corresponded to a putative peptide of 93 amino acids. None of the sequences contained insertions/deletions or stop codons, suggesting that all sequences might come from functional molecules in the genome. The sequences of the 928 and 913 bp length types could be trimmed to the same exon sequence.

Nucleotide and amino acid variation

For simplicity, different sequences are tentatively referred to as alleles even though they may be from different loci [34]. In total, 206 clones from 21 Chinese alligators were sequenced, an average of ten clones (range 8–11) per individual. We identified 43 alleles from the 21 individuals. These sequences have been deposited in GenBank (accession nos. JQ048623–JQ048665). Within these sequences, 21 alleles were identified in the complete exon3 DNA sequence and 16 alleles in the complete exon3 amino acid sequence. Of the nucleotide sites, 11.7 % (33 out of 282) were variable, and of the amino acid sites, 16.1 % (15 out of 93) were variable. Entropy, used as an index of variation at each site, is shown in Fig. 1. Nucleotide diversity in the Wild population was higher than in the two captive populations, and it was higher in the Xuangzhou population than in the Changxing population. Haplotype diversity was highest in the Wild population, followed by the Changxing and Xuangzhou populations (Table 2).
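The proportion of variable sites can be reproduced from an alignment by counting columns that contain more than one state. The sketch below uses a hypothetical toy alignment and then simply recomputes the two percentages from the counts reported above.

```python
def variable_sites(alignment):
    """Return the 0-based indices of alignment columns with more than one state."""
    return [i for i, column in enumerate(zip(*alignment)) if len(set(column)) > 1]


# Hypothetical toy alignment: columns 0 and 3 vary.
toy = ["ACGT", "ACGA", "TCGA"]
print(variable_sites(toy))

# Percentages from the counts reported in the text.
print(f"nucleotide sites: {33 / 282:.1%}")
print(f"amino acid sites: {15 / 93:.1%}")
```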

Fig. 1 a Entropy as an index of variation at each base site in the exon3 sequences, b entropy as an index of variation at each amino acid site. Hi = −Σ Pi·log2(Pi); Hi: 0–2; Pi: the substitution frequencies of different nucleotides per site within the allelic sequences sampled
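The entropy index in the caption can be computed per alignment column as in the sketch below. It assumes that Pi is the relative frequency of each state observed at the site, which is one reading of the caption; it is not the DAMBE implementation.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (base 2) of the states observed in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def entropy_profile(alignment):
    """Entropy at every column of a list of aligned, equal-length sequences."""
    return [site_entropy(column) for column in zip(*alignment)]


# Hypothetical toy alignment: an invariant column, a two-state column and a
# column with all four nucleotides at equal frequency.
toy = ["AAA", "AAC", "ACG", "ACT"]
print(entropy_profile(toy))
```

An invariant site scores 0 and a site with all four nucleotides at equal frequency scores 2, which matches the 0–2 range given for nucleotide sites.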

Table 2 Analysis of genetic diversity of different populations

Selection on MHC IIb exon3 complete sequence

The Z test of positive selection (dN > dS) was not significant (Z = −3.167, P = 1.000), which allowed us to exclude the possibility that exon3 experiences balancing selection. The Z test of neutrality (dN = dS) was rejected (Z = −2.990, P = 0.003), with dN lower than dS, and the test of purifying selection (dN < dS) was significant (Z = 3.134, P = 0.001) (Table 3).

Table 3 Summary of nucleotide substitution rates in the MHC IIb gene exon3 of Chinese alligator

Substitution saturation analysis

The results of the substitution saturation analysis are shown in Fig. 2. When transitions occur much more frequently than transversions, substitution is not considered saturated. When transversions gradually outnumber transitions, however, saturation is suspected because multiple substitutions may have occurred at the same site. We therefore compared the regression slopes of transitions and transversions against evolutionary distance. The slope for transitions was significantly steeper than that for transversions, so substitution was considered not to be saturated.
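The slope comparison can be illustrated with ordinary least squares. The sketch below only estimates the two slopes; it does not perform the significance test of their difference, and the pairwise values are hypothetical rather than taken from Fig. 2.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical pairwise comparisons: evolutionary distance and the number of
# transition-type and transversion-type differences for each sequence pair.
distances = [0.01, 0.02, 0.03, 0.04]
transitions = [2, 4, 6, 8]
transversions = [1, 2, 3, 4]

slope_ts = ols_slope(distances, transitions)
slope_tv = ols_slope(distances, transversions)
print(slope_ts > slope_tv)  # a steeper transition slope suggests no saturation
```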

Fig. 2 Substitution saturation analysis. Transition-type and transversion-type nucleotide differences are plotted against the evolutionary distance calculated with Kimura’s two-parameter method

Population structure

The genetic distances between the Wild population and each of the two captive populations were greater than the distance between the two captive populations. The NJ tree (Fig. 3) showed that the Xuangzhou and Changxing populations were genetically closely related, whereas the Wild population was the most divergent of the three.

Fig. 3 Clustering of the three Chinese alligator populations based on the NJ method

The difference among the three populations was not significant (P > 0.05) in AMOVA. Variation within populations accounted for 99.53 % of the total, whereas variation among populations accounted for only 0.47 % (Table 4). These results indicate that the genetic differences occurred mainly within populations.

Table 4 Hierarchical analysis of molecular variance (AMOVA) within/among populations

Alleles phylogeny

In this study, the sequences included the complete exon3. The NJ phylogenetic tree showed that the alleles derived from the 928 bp sequences clustered relatively closely (Fig. 4). We also found trans-species polymorphism within exon3: the three alleles of C. crocodilus were scattered among the alleles of the Chinese alligator, suggesting that exon3 is highly variable both within and between species (Fig. 4).

Fig. 4 a NJ phylogeny of 21 alleles based on the complete exon3 DNA sequence, b NJ phylogeny of 16 alleles based on the exon3 amino acid sequence. The NJ phylogenies are based on genetic distances under the Kimura two-parameter model with 1,000 bootstrap replicates. The trees also include three alleles from Caiman crocodilus (AF256651, AF256652 and AF277661). Bootstrap values >50 % are indicated above branches

Discussion

Genetic variation of MHC IIb exon3 complete sequence

In previous studies, 38 variable sites among ten nucleotide sequences and 23 variable sites among amino acid sequences were detected in partial exon2 sequences of the MHC IIb gene, which were 166 bp long (except for one of 160 bp) [20]; 34 sequence haplotypes of exon3 were detected in the sampled Chinese alligators, and 83 polymorphic (variable) sites were found within partial exon3 sequences (260 bp) of the MHC IIb gene [21]. In this study, 11.7 % (33 out of 282) of the nucleotide sites and 16.1 % (15 out of 93) of the amino acid sites were variable. Analysis of the MHC IIb gene is therefore a powerful method for assessing genetic diversity in the Chinese alligator, and it could be used in future breeding and release programs.

Among the three populations, the Wild population had the highest level of MHC IIb diversity. This difference in diversity is interesting, because we would expect a uniform distribution of alleles throughout the range. Different patterns of diversity might be explained by differences in population size, as larger populations tend to have more genetic diversity. The Changxing population, however, is smaller than the Xuangzhou population yet showed higher haplotype diversity (Table 2), so population size alone does not account for the pattern.

The population fixation index (Fst) represents the genetic differentiation among populations; the larger the Fst value, the higher the degree of differentiation among populations [35]. In this study, the Fst value was very low (Table 4), indicating a lack of isolation and no differentiation among the three populations. This result agrees with those of Wu et al. [3] and Liu et al. [21]. Usually, a gene flow value (Nm) of less than one indicates limited gene flow between groups, whereas an Nm value higher than one may represent a high level of gene flow and genetic exchange now or in the past [36]. In this study, gene flow (Nm) was 5.77, suggesting that gene flow between groups may have occurred.
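The link between Fst and Nm is often expressed through Wright’s island-model approximation, Fst ≈ 1/(4Nm + 1). The text does not state which estimator was used to obtain Nm, so the sketch below is only an illustration of that common relation, with illustrative input values.

```python
def nm_from_fst(fst):
    """Gene flow Nm implied by Fst under the approximation Fst = 1 / (4*Nm + 1)."""
    if not 0 < fst < 1:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


def fst_from_nm(nm):
    """Fst implied by Nm under the same approximation."""
    return 1 / (4 * nm + 1)


# Under this approximation the Nm = 1 threshold corresponds to Fst = 0.2,
# and lower Fst values imply more gene flow.
print(fst_from_nm(1.0))
print(nm_from_fst(0.05))
```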

Evidence for purifying selection

One explanation for the high diversity of MHC genes is that it is driven by balancing selection imposed by diverse pathogens [11, 37]. However, patterns of diversity differ somewhat across exons of the same gene; for instance, exon3 of class II loci encodes an extracellular domain close to the transmembrane region, which may experience purifying selection [38].

Selection leaves a characteristic mutational signature in coding regions. In particular, codons that have experienced balancing selection should show more non-synonymous than synonymous changes, whereas codons that have experienced purifying selection will show very few non-synonymous changes relative to the number of synonymous changes.

In this study, the Z test rejected neutrality, with dS exceeding dN, and the test of purifying selection was significant (P = 0.001) (Table 3). This result indicates that exon3 is under purifying selection.

Conclusion

In this study, we defined the complete exon3 sequence of the MHC IIb gene. The sequence comprised 282 bp without gaps and corresponded to a putative peptide of 93 amino acids. Z tests rejected neutrality, with dS exceeding dN, and the test of purifying selection was significant, indicating that exon3 is under purifying selection. In addition, analysis of the MHC IIb gene proved to be a powerful method for assessing genetic diversity in the Chinese alligator, and it could be used in future breeding and release programs.