Introduction

The generation and maintenance of genetic variation within or among natural populations is a central issue in evolutionary and conservation biology (Piertney and Oliver 2006). Information on adaptive genetic variation at different geographical scales can be used to inform conservation management, for example by identifying adaptive units (AUs, special conservation units based on patterns of adaptive differentiation) (Funk et al. 2012; Zhu et al. 2013). Typically, studies of genetic variation occurring over different geographical scales have used neutral or nearly neutral markers, such as mitochondrial DNA (mtDNA), microsatellites or SNPs (Martínez-Cruz et al. 2004; Morin et al. 2004; Hull et al. 2008; Addis et al. 2015; Corrêa et al. 2015). Although these neutral markers can be used to infer demographic events or population history in natural populations (Fabiani et al. 2003; Ekblom et al. 2007; Zhou et al. 2010; Corrêa et al. 2015), their ability to demonstrate adaptive variation is limited (Meyers and Bull 2002; Alcaide et al. 2008; Witzenberger and Hochkirch 2011). Generally, neutral markers reflect demographic history, whereas adaptive markers show how populations adapt to their environment (Ekblom et al. 2007; Alcaide et al. 2008; Vásquez-Carrillo et al. 2013). Neutral genetic variation is influenced by demographic factors, such as genetic drift and gene flow, whereas adaptive genetic variation is affected by both demographic and selective factors (Bichet et al. 2015; Lillie et al. 2015). Therefore, adaptive genetic markers are well suited to studying adaptive variation among populations in different environments. Recently, major histocompatibility complex (MHC) genes have been used to study adaptive variation at different geographical scales, because variation in pathogens across regions may lead to differential selection pressure on MHC proteins (Sommer 2005; Piertney and Oliver 2006; Spurgin and Richardson 2010).

The MHC is a multi-gene family that plays important roles in susceptibility or resistance to many vertebrate diseases, principally by recognizing foreign peptides and presenting them to T cells, initiating the adaptive immune response (Klein 1986; Klein et al. 1993; Sommer 2005; Janeway et al. 2008). Traditionally, MHC genes are classified into two major classes, class I and class II, which typically present peptide antigens that arise from intracellular and extracellular proteins to CD8+ and CD4+ T cells, respectively (Bevan 1987; Germain et al. 1996; Janeway et al. 2008). MHC class II genes encode heterodimers composed of alpha- and beta-chains (Klareskog et al. 1977), and thus can be further subdivided into A (e.g., DRA) and B (e.g., DAB, DQB and DPB) genes, of which the B genes are responsible for the majority of the polymorphism. Which foreign peptides an individual can respond to is largely determined by genetic variation at specific regions of MHC genes (exons 2 and 3 of class I, and exon 2 of class II), which in turn influences individual fitness and the long-term survival of populations (Klein 1986; Hughes 1991; Hughes and Nei 1992; Janeway et al. 2008). Genetic variation of the MHC is postulated to be generated by gene duplication and deletion, intra- and inter-locus recombination or gene conversion, and the accumulation of de novo mutations (Nei and Rooney 2005; Balakrishnan et al. 2010; Spurgin et al. 2011; Promerová et al. 2013; Nguyen-Phuc et al. 2016). MHC polymorphism is maintained by several forms of balancing selection, such as frequency-dependent selection (Takahata and Nei 1990), heterozygote advantage (Doherty and Zinkernagel 1975) and MHC-dependent mate choice (Penn and Potts 1999).

Recently, studies of geographical variation at MHC loci have been reported for several groups of Aves, including Passeriformes (Miller and Lambert 2004; Aguilar et al. 2005; Schut et al. 2011; Jones et al. 2014; Bichet et al. 2015), Galliformes (Piertney 2003; Nguyen-Phuc et al. 2016; Zeng et al. 2016), Charadriiformes (Ekblom et al. 2007; Vásquez-Carrillo et al. 2013), Strigiformes (Kohyama et al. 2015), and Falconiformes (Alcaide et al. 2008). However, as far as we know, there has hitherto been no study of MHC geographical variation conducted in Ciconiiformes, particularly in threatened ardeid birds.

The Chinese egret (Ciconiiformes, Ardeidae, Egretta eulophotes) is a migratory colonial waterbird, wintering in the south of Asia while breeding on offshore islands in Russia, North Korea, South Korea and China. This egret was named by Swinhoe after he collected the type specimen from Xiamen (formerly known as Amoy), China in 1863. Its populations have been declining dramatically since the nineteenth century (Kushlan and Hancock 2005; BirdLife International 2015; IUCN 2015). Currently, this egret is listed as a vulnerable species with an estimated global population of 2600–3400 individuals (BirdLife International 2015; IUCN 2015). In our previous MHC studies on this vulnerable species, we isolated and characterized three classical single-copy loci of the MHC class II DAB gene (named Egeu-DAB1, -DAB2, and -DAB3), and established an efficient locus-specific MHC genotyping technique (Li et al. 2011; Wang et al. 2013; Lei et al. 2015). Furthermore, our previous study (Zhou et al. 2010) found a relatively high level of mtDNA genetic diversity in three populations of Chinese egret in China, and these populations had low but significant genetic differentiation with little geographical structure. To expand upon our previous work, and to provide the first study of population genetic diversity and differentiation at the MHC in ciconiiform birds, we had the following specific aims: (1) to analyze the allelic distribution at exon 2 of three Egeu-DAB genes in five populations of the Chinese egret in China; (2) to examine the genetic diversity of these genes in these five populations; and (3) to analyze the genetic differentiation of Egeu-DAB genes among these five populations, which span the entire distribution range of this species in China. Our results provide fundamental population information for the conservation genetics of the vulnerable Chinese egret.

Materials and methods

Sample collection and DNA extraction

Sample collection from Chinese egret was conducted during the morning (06:00–08:00), and visits to the breeding colonies were restricted to a maximum of 2 h every day. A total of 172 feather samples of nestlings, originating from 172 nests, were individually collected from five archipelago populations: Xingrentuo (Xrt; 39°31′N, 123°03′E; n = 38), Hailvdao (Hld; 37°26′N, 122°40′E; n = 28), Mantoushan (Mts; 30°13′N, 121°53′E; n = 35), Riyu (Ry; 27°01′N, 120°25′E; n = 34) and Xiaocaiyu (Xcy; 23°48′N, 117°45′E; n = 37). The locations of these sampling archipelagoes span the entire Chinese distribution range of the Chinese egret, and can be taken to represent the different geographical populations across China (Fig. 1). The feather samples were preserved in 95 % ethanol and frozen at −80 °C. Genomic DNA was extracted using the Universal Genomic DNA Extraction Kit Ver. 3.0 (Takara, Dalian, China) following the manufacturer’s protocols, and then was kept at −80 °C until further use.

Fig. 1 Geographical locations of the sampled Chinese egret populations in China. Xrt Xingrentuo, Hld Hailvdao, Mts Mantoushan, Ry Riyu, Xcy Xiaocaiyu

PCR and SSCP genotyping

Genetic polymorphism of exon 2 sequences at the three single-copy Egeu-DAB loci was examined by semi-nested asymmetric polymerase chain reaction (PCR) combined with single-strand conformation polymorphism (SSCP) analysis, as previously described (Lei et al. 2015). Briefly, three rounds of PCR, including the asymmetric PCR, were first carried out to produce single-stranded amplicons. The single-stranded amplicons were then separated by electrophoresis on 10 % non-denaturing polyacrylamide gels, and visualized by a sensitive silver-staining procedure. Finally, SSCP bands were excised from the gels, re-amplified and sequenced following the protocols of Wang et al. (2013). To avoid the inclusion of PCR artifacts, every allele was directly sequenced in both directions from at least two individuals, or from two independent PCRs from one individual. Throughout this study, the word “allele” is used to describe the full-length exon 2 sequence (270 bp) of any of the three Egeu-DAB loci, derived from SSCP genotyping.

Data analyses

Exon 2 sequences obtained from the 172 individuals were aligned and edited using BioEdit v7.0.5.3 (Hall 1999). Allele frequencies, observed heterozygosity and expected heterozygosity were estimated, and deviation from Hardy–Weinberg equilibrium was tested, using GENEPOP 4.0 (Rousset 2008). Gene diversity and nucleotide diversity were calculated in FSTAT 1.2 (Goudet 1995) and DnaSP v5 (Librado and Rozas 2009), respectively. The pairwise nucleotide distance (p-distance) among haplotypes was calculated using MEGA 6.0 (Tamura et al. 2013). Allelic richness, a measure of the number of alleles independent of sample size, was estimated with FSTAT 1.2 (Goudet 1995).
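To make the heterozygosity statistics concrete, the following minimal Python sketch computes observed and expected heterozygosity for one locus from diploid genotypes. It is an illustration only: the function name and the example genotypes are hypothetical, the expected heterozygosity uses the common small-sample correction 2n/(2n − 1), and it is not claimed to reproduce the exact estimators implemented in GENEPOP.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) pairs, one pair per individual.
    Ho is the share of heterozygous individuals.
    He is 1 - sum(p_i^2), scaled by 2n / (2n - 1) as a small-sample correction.
    """
    n = len(genotypes)
    gene_copies = 2 * n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    sum_p_squared = sum((c / gene_copies) ** 2 for c in allele_counts.values())
    he = (gene_copies / (gene_copies - 1)) * (1.0 - sum_p_squared)
    ho = sum(1 for a, b in genotypes if a != b) / n
    return ho, he


# Hypothetical genotypes for four nestlings (allele labels are placeholders).
example = [("01", "03"), ("01", "01"), ("03", "06"), ("02", "03")]
ho, he = heterozygosity(example)
print(f"Ho = {ho:.3f}, He = {he:.3f}")
```

A deficit of heterozygotes, as reported for several populations in the Results, corresponds to Ho falling below He.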

Calculations of fixation index (ϕST) and pairwise comparison FST, analysis of molecular variance (AMOVA) and Mantel test were carried out with Arlequin 3.5 (Excoffier and Lischer 2010). ϕST, pairwise FST and AMOVA were based on both the haplotype frequencies and the molecular distances among haplotypes, using the Kimura two-parameter model. Statistical significance of the observed variance was determined using 1023 haplotype permutations. A Mantel test was carried out with 10,000 permutations, to investigate the isolation-by-distance relationship between the estimate of FST/(1 − FST) and the natural logarithm of geographic distance. Geographical distance (in km) was measured using Google Earth (http://earth.google.com), based on a straight line connecting each pair of sampled populations.
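The isolation-by-distance test described above can be sketched as follows. This is a simplified stand-in for the Arlequin procedure, not a reproduction of it: geographic distances are approximated with the haversine formula from the colony coordinates given in the sampling section (the study measured straight-line distances in Google Earth), the pairwise FST values are hypothetical placeholders rather than the values in Table 4, and the helper names are our own.

```python
import math
import random
from itertools import combinations

# Colony coordinates (latitude N, longitude E) from the sampling section.
COORDS = {
    "Xrt": (39 + 31 / 60, 123 + 3 / 60),
    "Hld": (37 + 26 / 60, 122 + 40 / 60),
    "Mts": (30 + 13 / 60, 121 + 53 / 60),
    "Ry": (27 + 1 / 60, 120 + 25 / 60),
    "Xcy": (23 + 48 / 60, 117 + 45 / 60),
}


def haversine_km(p, q, radius_km=6371.0):
    """Great-circle distance between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(gen, geo, names, permutations=10000, seed=1):
    """One-sided Mantel test; gen and geo map frozenset({a, b}) -> value."""
    pairs = list(combinations(names, 2))
    y = [geo[frozenset(p)] for p in pairs]
    r_obs = pearson([gen[frozenset(p)] for p in pairs], y)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement is counted once
    for _ in range(permutations):
        shuffled = names[:]
        rng.shuffle(shuffled)  # permute population labels of one matrix
        relabel = dict(zip(names, shuffled))
        x = [gen[frozenset((relabel[a], relabel[b]))] for a, b in pairs]
        if pearson(x, y) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


names = list(COORDS)
ln_geo = {frozenset(p): math.log(haversine_km(COORDS[p[0]], COORDS[p[1]]))
          for p in combinations(names, 2)}

# Hypothetical pairwise FST values, NOT taken from Table 4.
fst = dict(zip(ln_geo, [0.010, 0.020, 0.045, 0.060, 0.005,
                        0.030, 0.050, 0.015, 0.035, 0.012]))
gen = {pair: f / (1 - f) for pair, f in fst.items()}  # FST / (1 - FST)
r, p = mantel(gen, ln_geo, names)
print(f"r = {r:.3f}, P = {p:.3f}")
```

With only five populations there are 120 distinct label permutations, so the 10,000 random permutations resample the same arrangements many times and the smallest attainable P value is bounded accordingly.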

Phylogenetic relationships of exon 2 nucleotide sequences were analyzed separately for each of the three Egeu-DAB loci. PartitionFinder v1.1.1 (Lanfear et al. 2012) was used to determine the best-fit nucleotide substitution model, according to the Bayesian information criterion (BIC) and a “greedy” algorithm with branch lengths estimated as “unlinked”. For each of the three locus-specific datasets, the best-fit model was the Kimura two-parameter model with gamma-distributed substitution rates and a proportion of invariable sites (BIC scores: 1600.76, 1454.87, and 1258.68, respectively). These models were then implemented in phylogenetic analyses, which were conducted using the maximum likelihood method with 1000 bootstrap replicates in MEGA 6.0 (Tamura et al. 2013).

The Bayesian population clustering program STRUCTURE 2.3.3 (Falush et al. 2003) was used to investigate differentiation across the five Chinese egret populations. The analysis was conducted using an admixture model with correlated allele frequencies, and was run from K = 1 to 10 with 10 runs per K, each with a burn-in of 100,000 steps followed by 1,000,000 repetitions. The results were then uploaded to the Structure Harvester server (http://taylor0.biology.ucla.edu/structureHarvester/), which selects the number of clusters by simultaneously evaluating posterior probability and the Delta K statistic of Evanno et al. (2005).
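For readers unfamiliar with the Delta K statistic, the sketch below shows one way to compute it from replicate log-likelihoods. It assumes that the second-order difference is taken on the per-K means and divided by the sample standard deviation across runs at that K; the log-likelihood values are hypothetical, and the calculation performed by Structure Harvester may differ in detail.

```python
from statistics import mean, stdev


def delta_k(ln_prob):
    """Delta K for each K that has both neighbours K - 1 and K + 1.

    ln_prob maps K -> list of Ln P(D) values, one per replicate run.
    """
    result = {}
    for k in sorted(ln_prob):
        if k - 1 not in ln_prob or k + 1 not in ln_prob:
            continue  # undefined at the smallest and largest K tested
        second_diff = abs(mean(ln_prob[k + 1])
                          - 2 * mean(ln_prob[k])
                          + mean(ln_prob[k - 1]))
        result[k] = second_diff / stdev(ln_prob[k])
    return result


# Hypothetical Ln P(D) values for three replicate runs per K.
runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-950.0, -951.0, -949.0],
    3: [-948.0, -952.0, -950.0],
    4: [-951.0, -947.0, -955.0],
}
dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs both neighbouring values of K, it cannot be evaluated at K = 1, so a Delta K peak at K = 2 does not by itself exclude a single cluster.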

Results

Allelic distribution within different populations

In the 172 examined individuals from five Chinese egret populations, a total of eight, eight and four exon 2 alleles were identified at Egeu-DAB1, -DAB2 and -DAB3 loci, respectively (Tables 1, S1, Supplementary material). For each locus, no stop codons or insertions or deletions were observed, and typically 1–2 alleles were identified per individual, suggesting that for each locus only one gene copy was sequenced with the primer sets. Sequences of these confirmed alleles were submitted to GenBank, with names according to the nomenclature proposed by Klein et al. (1990), denoted by the species’ gene prefix (Egeu-DAB), with a suffix comprising a locus number (1–3) and two sequential allele numbers (01–12). Their accession numbers are listed in Table 1.

Table 1 Allelic distributions of Egeu-DAB1–3 loci within the five Chinese egret populations

Deviation from Hardy–Weinberg equilibrium was statistically significant at each locus in each population (all P < 0.05). Allelic distributions of the three Egeu-DAB loci varied substantially among the five Chinese egret populations (Table 1). Egeu-DAB1*12 was detected in only one population (Hld), while the remaining 19 alleles were shared between at least two populations. The most frequent shared alleles were Egeu-DAB1*01, *02, *03, *06, Egeu-DAB2*01, *02, *04, and Egeu-DAB3*01, *02, which were found in all populations at different frequencies. No single Egeu-DAB1 allele was most common in all populations. Egeu-DAB1*01 was most common in Hld (allele frequency: 0.357) and Xcy (0.351), while Egeu-DAB1*03 was most common in Xrt (0.329), Mts (0.257) and Ry (0.353). However, a single Egeu-DAB2 allele (Egeu-DAB2*01) and a single Egeu-DAB3 allele (Egeu-DAB3*01) were most common across all five studied populations, with frequencies higher than 0.45 (Table 1). More specifically, Egeu-DAB3*01 was highly abundant (allele frequency: >0.75) in all populations, and its frequency (0.776 → 0.786 → 0.871 → 0.912 → 0.959) increased with decreasing latitude (39°31′ → 37°26′ → 30°13′ → 27°01′ → 23°48′).
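The latitudinal trend in Egeu-DAB3*01 frequency can be checked with a simple rank correlation. The sketch below is an illustrative calculation on the five frequencies and latitudes quoted above, not an analysis reported in this study; the function names are our own, and the exact permutation P value assumes no tied values.

```python
from itertools import permutations

# Latitude (degrees N) and Egeu-DAB3*01 frequency for Xrt, Hld, Mts, Ry, Xcy.
latitude = [39 + 31 / 60, 37 + 26 / 60, 30 + 13 / 60, 27 + 1 / 60, 23 + 48 / 60]
frequency = [0.776, 0.786, 0.871, 0.912, 0.959]


def ranks(values):
    """Ranks starting at 1; assumes all values are distinct."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


rho = spearman_rho(latitude, frequency)

# Exact two-sided P value from all 5! = 120 orderings of the frequencies.
all_rhos = [spearman_rho(latitude, list(p)) for p in permutations(frequency)]
p_two_sided = sum(abs(r) >= abs(rho) for r in all_rhos) / len(all_rhos)
print(rho, p_two_sided)
```

The relationship is perfectly monotonic, but with only five populations the smallest two-sided P value attainable is 2/120, so the trend should be interpreted cautiously.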

Genetic diversity in different populations

Several parameters were calculated to estimate genetic diversity of the vulnerable Chinese egret, including gene diversity (Gd), nucleotide diversity (π), allelic richness (AR) and expected heterozygosity (He). Detailed diversity statistics of the three Egeu-DAB loci within the five populations are summarized in Table 2. At each Egeu-DAB locus, the total number of haplotypes (N), Gd, π, AR, observed heterozygosity (Ho) and He were all found to vary slightly among the five populations. The mean AR across populations was 6.192 ± 0.202 for Egeu-DAB1, 6.022 ± 0.531 for Egeu-DAB2 and 2.734 ± 0.192 for Egeu-DAB3. A relatively high level of genetic diversity was found at the Egeu-DAB1 locus, as indicated by high values of Gd (0.744–0.812), π (0.056–0.067), AR (5.984–6.999) and He (0.737–0.806) (Table 2). Observed heterozygosity was significantly (P < 0.05) lower than expected for all five populations at Egeu-DAB1; for Hld and Ry at Egeu-DAB2; and for Xrt, Mts and Ry at Egeu-DAB3. The average pairwise nucleotide distances among the haplotypes of the three Egeu-DAB genes were 0.077 ± 0.005, 0.061 ± 0.005 and 0.083 ± 0.008, respectively.

Table 2 Genetic diversity statistics of the Egeu-DAB1–3 loci in the five Chinese egret populations

Genetic differentiation among different populations

The AMOVA revealed low but highly significant ϕST values for the Egeu-DAB1 locus, whether based on haplotype frequencies (0.029, P < 0.01) or molecular distances (0.036, P < 0.01), with the majority of variance being found within populations (haplotype-based: 97.11 % and distance-based: 96.40 %) rather than among populations (2.89 and 3.60 %) (Table 3). Similar AMOVA results were shown for Egeu-DAB2, -DAB3, and all three loci combined (Egeu-DAB), suggesting that low but significant genetic differentiation was present among the five populations (haplotype-based ϕST: 0.020 for -DAB2, 0.042 for -DAB3, 0.007 for -DAB; and distance-based ϕST: 0.027, 0.043, 0.005, respectively; all P < 0.05) (Table 3).
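The correspondence between the ϕST values and the variance percentages reported above follows from the definition of ϕST in a two-level AMOVA, where it is the among-population share of the total variance. A minimal sketch, using the Egeu-DAB1 percentages quoted in the text (the function name is our own):

```python
def phi_st(var_among, var_within):
    """Two-level AMOVA: among-population share of the total variance."""
    return var_among / (var_among + var_within)


# Percentages of variance reported for Egeu-DAB1 (among, within populations).
reported = {
    "haplotype-based": (2.89, 97.11),
    "distance-based": (3.60, 96.40),
}
phi = {label: phi_st(among, within) for label, (among, within) in reported.items()}
for label, value in phi.items():
    print(f"{label}: phi_ST = {value:.3f}")
```

To three decimal places these ratios match the reported ϕST values of 0.029 and 0.036.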

Table 3 Analysis of molecular variance (AMOVA) of the Egeu-DAB1–3 loci in the Chinese egret

To further assess the genetic differentiation between populations, pairwise FST values were calculated (Table 4). Across the three Egeu-DAB loci, pairwise FST values ranged from −0.016 to 0.097 when based on haplotype frequencies, and from −0.016 to 0.110 when based on molecular distances. Forty-two of the 60 FST values (70.00 %) were lower than 0.05 (Table 4), indicating low genetic differentiation among populations.
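The total of 60 pairwise values is consistent with one value per population pair, locus and estimator. The short check below makes that arithmetic explicit; it assumes this is how the values in Table 4 are organized.

```python
from math import comb

n_populations = 5
n_loci = 3
n_estimators = 2  # haplotype-frequency-based and molecular-distance-based

pairs_per_locus = comb(n_populations, 2)  # unordered population pairs
total_comparisons = pairs_per_locus * n_loci * n_estimators
share_below_threshold = 42 / total_comparisons  # values with FST < 0.05

print(pairs_per_locus, total_comparisons, f"{share_below_threshold:.2%}")
```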

Table 4 Population pairwise FST values, based on estimated haplotype frequencies (below the diagonal) and based on estimated molecular distances (above the diagonal)

The Mantel test indicated a significant isolation-by-distance pattern at the Egeu-DAB1 locus when FST/(1 − FST) based on estimated molecular distances was compared with the natural logarithm of the geographic distance (r = 0.410, P = 0.047); however, no significant isolation-by-distance relationship was found when FST/(1 − FST) was based on estimated haplotype frequencies (r = 0.275, P = 0.132) (Fig. 2a). No significant isolation-by-distance pattern was found at the Egeu-DAB2 locus, based on either estimated haplotype frequencies (r = 0.462, P = 0.129) or estimated molecular distances (r = 0.420, P = 0.146) (Fig. 2b). Significant evidence for an isolation-by-distance pattern was detected at the Egeu-DAB3 locus (haplotype-based: r = 0.903, P = 0.009 and distance-based: r = 0.893, P = 0.008) (Fig. 2c).

Fig. 2 Isolation-by-distance, with pairwise comparisons of the five Chinese egret populations. a Comparisons at the Egeu-DAB1 locus; b Comparisons at the Egeu-DAB2 locus; c Comparisons at the Egeu-DAB3 locus. Filled triangles represent pairwise comparison values based on estimated haplotype frequencies, while open triangles represent values based on estimated molecular distances

Maximum likelihood trees of each of the three Egeu-DAB exon 2 nucleotide sequences showed little internal resolution, with sequences not grouped according to sampling location. This suggests that our phylogenetic analysis of the sequences does not demonstrate geographical structure of the genetic differentiation of the Chinese egret (Fig. 3). This absence of geographical structure was also suggested by the Bayesian clustering analysis, based on the three Egeu-DAB loci (Fig. 4). Although the Delta K showed one peak at K = 2 (Fig. S1, Supplementary material), the STRUCTURE analysis indicated very weak subdivision, where most egrets showed high levels of admixture between the two putative genetic clusters.

Fig. 3 Maximum likelihood trees showing the phylogenetic relationships between Egeu-DAB alleles among the five Chinese egret populations. a Egeu-DAB1 alleles; b Egeu-DAB2 alleles; c Egeu-DAB3 alleles. Each allele is first denoted by the species’ gene prefix (Egeu), followed by a locus number (1–3) and two sequential allele numbers (01–12). Bootstrap values greater than 50 %, and the names of sampling locations (Xrt Xingrentuo, Hld Hailvdao, Mts Mantoushan, Ry Riyu, Xcy Xiaocaiyu) are shown

Fig. 4 Genetic structure of the five Chinese egret populations in China. The genetic structure is based on the three Egeu-DAB loci and inferred by Bayesian clustering analysis, with the sampling location as prior information. Each bar represents the probability (P, y-axis) that an individual belongs to a particular color-coded population

Discussion

In this study, many Egeu-DAB alleles were found to be shared among populations, while only one population-specific allele was detected (Egeu-DAB1*12 in Hld, Table 1), suggesting a recent shared history of these populations (Corrêa et al. 2015). On the other hand, our previous study found significant signs of positive selection at all three Egeu-DAB loci, with dN/dS (dN, rate of non-synonymous substitution; dS, rate of synonymous substitution) ratios of the putative peptide-binding region being significantly greater than one (Lei et al. 2016). These findings indicate that most MHC alleles of the Chinese egret might have been conserved by positive selection (Weber et al. 2004; Luo et al. 2012), while Egeu-DAB1*12 might be the consequence of adaptive evolution that occurred in Hld (e.g., in response to a new pathogen variant). In addition, common alleles differed in frequency between populations. These common alleles might represent ancient sequences that have been selectively maintained in populations (Koutsogiannouli et al. 2014). The frequency of one of these common alleles, Egeu-DAB3*01, increased from north to south, which may correlate with the intensity of selective pressure from pathogens to which Egeu-DAB3*01 could respond. To test this interpretation, characterization of the pathogen community of the Chinese egret is needed.

Among the five Chinese egret populations, there was low but significant genetic differentiation, with little geographical structure, based on the following findings: (1) the AMOVA revealed low but highly significant ϕST values for all three Egeu-DAB loci, using either haplotype frequencies or molecular distances; (2) the pairwise population analysis returned mostly FST values lower than 0.05, indicating low genetic differentiation among populations; (3) the phylogenetic analyses showed no obvious geographical separation of Egeu-DAB alleles in any of the maximum likelihood trees; (4) the Bayesian clustering analysis indicated very weak population subdivision at the three MHC loci. These MHC-based results are in accord with our previous mitochondrial data in the Chinese egret, which showed that three (Xrt, Ry, and Xcy) of the five populations sampled in this study had low but significant genetic differentiation with little geographical structure (Zhou et al. 2010). The Mantel test further suggested that the low but significant MHC genetic differentiation among our studied populations was likely due to an isolation-by-distance pattern, a commonly observed phenomenon in natural populations. Isolation by distance is defined as a decrease in the genetic similarity among populations as the geographic distance between them increases (Slatkin 1993; Jensen et al. 2005; Alcaide et al. 2008).

Population genetics data collected from adaptive loci, such as the MHC, can provide significant information on adaptive variation, making them useful for identifying AUs in threatened species (Funk et al. 2012; Zhu et al. 2013). Ideally, combining information from neutral and adaptive genetic markers can provide a clearer delineation of conservation units (Vásquez-Carrillo et al. 2013). The results of this new MHC approach, together with those of the previous study of neutral mtDNA loci (Zhou et al. 2010), consistently suggest that all Chinese egret populations in China might be considered a single AU for conservation management. Although the Bayesian clustering analysis indicated that there were two possible genetic clusters, the five populations can be considered a single AU based on our data, because most egrets in these two clusters showed high levels of admixture. This admixture could be related to the migratory colonial life pattern of the Chinese egret, which winters in the south of Asia and breeds in Russia, North Korea, South Korea, and China (Kushlan and Hancock 2005; Zhou et al. 2010; IUCN 2015). Recently, it has been suggested that diversifying selection acts in a non-uniform manner across the entire MHC region (Smith et al. 2011), and pathogen data within wild populations would directly reflect the different characteristics of possible adaptive groups (Zhu et al. 2013). Nevertheless, MHC loci are unlikely to represent overall adaptive variation, and immune molecules derived from both the innate and adaptive immune systems are significant in mounting an immune response to pathogens (Acevedo-Whitehouse and Cunningham 2006; Poelstra et al. 2014). Therefore, in future studies, it will be necessary to further clarify the genetic variation of Chinese egret populations using additional innate or adaptive immune system loci (e.g., Toll-like receptor, TLR; MHC class I UAA, class II DRA and DQB) combined with population pathogen data (e.g., parasite load), as well as using more comprehensive population sampling from a worldwide geographical range.