1 Introduction

The Eastern honeybee (Apis cerana Fabricius) is native throughout Asia and nests in distinct habitats of complex topography, divergent climate, and varied flora (Abrol 2013b). This species is a vital component of natural ecosystems and pollinates numerous flowering plants and agricultural crops (Sasaki et al. 1991; Verma and Partap 1993; Partap and Verma 1994; Klein et al. 2007; Kremen et al. 2007). Moreover, management of A. cerana is widely practiced in Asian countries and provides financial support to local people (Songram et al. 2006). However, despite its ecological and economic importance, Eastern honeybee colonies have undergone considerable decline in recent times (Abrol 2013a), and knowledge of the intra-specific structure of A. cerana is urgently required to guide conservation of this crucial species (Zayed 2009).

The within-species variability of A. cerana was explored originally using morphological methods, and more recently using molecular markers. Morphometric measurements were combined with multivariate analysis in a pioneering approach for determining honeybee diversity (DuPraw 1964; DuPraw 1965), and this methodology was used in a comprehensive examination of this species (Ruttner 1988). Extensive morphological investigations have now been performed on populations from most of the native habitats in which this species is found (Hepburn et al. 2001; Tan et al. 2003; Radloff et al. 2005a; Radloff et al. 2005b; Tan et al. 2008). Radloff et al. (2010) summarized these studies and inferred six highly differentiated major morphoclusters, establishing a robust within-species systematics. However, inconsistencies between regional populations reported in different studies remain unexplained. For example, in mainland China, Yang et al. (1986) identified five subspecies: Eastern (A. c. cerana), Tibetan (A. c. skorikovi), Aba (A. c. abansis), Indian (A. c. indica), and Hainan (A. c. hainana), whereas morphological analyses by other groups identified four morphoclusters (Tan et al. 2008) and three morphoclusters (Radloff et al. 2010). In particular, reports on bee populations from Hainan island differed markedly between studies.

More recently, molecular markers mostly based on mitochondrial DNA sequences have been used to investigate A. cerana diversity, and a framework structure consisting of four deeply divergent groups was suggested (Smith 2011b). Although mtDNA and morphological analyses do not always agree entirely, data on A. cerana differentiation from these approaches are in broad agreement (Smith 2011a). The mainland Asian group comprises three major morphoclusters: Northern, Himalayan, and Indo-Chinese. The Yellow Indian group corresponds to the Indian Plains morphocluster, the Sundaland group corresponds to the Indo-Malayan morphocluster, and the Oceanic Philippines group corresponds to the Philippine morphocluster. Sampling and molecular characterization of many regional populations of mainland China and its neighboring islands remain inadequate (Tan et al. 2007), as is the case for islands of the Sunda shelf (Smith 2011a). Most mtDNA studies on honeybee species to date have focused on the intergenic region between the 3′ end of cytochrome oxidase subunit I and the 5′ end of cytochrome oxidase subunit II (COI-COII), and this region has proven to be highly informative for elucidating divergence within Apis mellifera (Arias and Sheppard 1996; Franck et al. 2000; Palmer et al. 2000; Clarke et al. 2001; de la Rua et al. 2001; Iiyasov et al. 2011). However, this fragment is not optimal for examining genetic variation of the Eastern honeybee as it is relatively small in size and is completely absent in samples from some A. cerana populations (Smith and Hagen 1996). Therefore, further studies using more suitable molecular markers (either mitochondrial or nuclear) are required.

Microsatellites, also referred to as simple sequence repeats (SSRs), are stretches of DNA sequence consisting of tandem repeats of motifs 1–6 nucleotides long (Rasmus and Per 1999; Li et al. 2002). This type of molecular marker is abundant in eukaryotic genomes (Hiroshi et al. 1982; Tautz and Renz 1984), genetically codominant, and highly polymorphic (Tautz 1989; Hans 2004). By virtue of these genomic and genetic characteristics, it is widely used to interpret within-species variability in a variety of organisms (Paetkau et al. 1995; Van Hooft et al. 2000; Juan et al. 2008; Serrano et al. 2009; Boykin et al. 2010; Tashima et al. 2010). However, to the best of our knowledge, relatively few population genetic analyses have been performed on A. cerana using SSRs (Sittipraneed et al. 2001; Ji et al. 2011; Rueppell et al. 2011).
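To make the definition concrete, the following minimal Python sketch scans a sequence for tandem repeats of motifs 1–6 nucleotides long. It is an illustration only and is not part of the marker screening used in this study; the function name `find_ssrs`, the `min_repeats` threshold, and the example sequence are assumptions introduced here.

```python
import re

def find_ssrs(seq, min_repeats=4):
    """Return (start, motif, repeat_count) for tandem repeats of 1-6 nt motifs.

    min_repeats is an arbitrary illustrative threshold, not a value from the paper.
    """
    # Lazy {1,6}? tries the shortest motif first, so "CACACACA" is reported
    # with motif "CA" rather than "CACA".
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

# Hypothetical sequence: a (CA)5 dinucleotide repeat with short flanks.
example = "TTG" + "CA" * 5 + "GGT"
ssr_hits = find_ssrs(example)
```

Allele size differences at an SSR locus typically reflect different repeat counts of such a motif, which is what fragment sizing by capillary electrophoresis detects.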

In the present study, a set of 10 SSR markers from A. mellifera were genotyped against Eastern honeybee DNA samples from a previous study (Zhao et al. 2014). Genetic variation within and between these geographic populations was investigated at the molecular level, and combined analysis of mitochondrial and microsatellite data was used to probe phylogeographic structure.

2 Materials and methods

2.1 Sampling and DNA extraction

Bee samples were identical to those previously described in Zhao et al. (2014). A total of 360 A. cerana colonies representing 12 geographic populations (30 for each population) were collected from natural nests or semi-managed log-hives in Guangdong province (GD), Guangxi autonomous region (GX), and Hainan island (HN), China. Geographic coordinate information is displayed in Figure 1. For each bee colony, one worker bee was randomly selected and total genomic DNA was extracted from its thorax using a standard phenol-chloroform extraction protocol (Smith and Hagen 1996). DNA was treated with ribonuclease A (Roche, Basel, Switzerland) and frozen at −75 °C until needed.

Figure 1.

a Location of main sampling regions: GD Guangdong province, GX Guangxi autonomous region, HN Hainan island. b Location of sampling sites within the main sampling regions. Circles represent the following sampling sites: GDJL (N24.67°, E116.06°), Jiaoling County in GD; GDLM (N23.77°, E114.29°), Longmen County in GD; GDYD (N24.21°, E113.77°), Yingde City in GD; GXBH (N21.88°, E109.44°), Beihai City in GX; GXCZ (N22.02°, E107.52°), Chongzuo City in GX; GXGL (N24.98°, E110.42°), Guilin City in GX; GXHZ (N23.88°, E111.04°), Hezhou City in GX; GXLB (N23.87°, E109.31°), Laibin City in GX; HNBS (N19.34°, E109.46°), Baisha County in HN; HNHK (N19.95°, E110.30°), Haikou City in HN; HNTC (N19.35°, E110.07°), Tunchang County in HN; and HNWN (N18.75°, E110.35°), Wanning City in HN. Patterns within circles represent phylogeographical analysis of 12 bee populations based on individuals defined by mitochondrial mito-type (M) and microsatellite genotype (G). CH denotes the mainland group, HN denotes the island group.

2.2 Screening of microsatellite markers

A set of 48 microsatellite markers (listed in Supplementary file 1) originally developed for A. mellifera (Solignac et al. 2007) was prescreened. Ten of these (Table I) showed superior data quality and polymorphism when applied to A. cerana samples and were used in this study. For each of the 10 loci, the 5′ end of the corresponding forward primer was labeled with fluorescent dye (6-FAM) by Sangon Biotech Co., Ltd, Shanghai, China.

Table I Details and genetic polymorphism of the 10 selected SSR loci.

2.3 PCR and SSR genotyping

PCR was performed in a final volume of 20 μL, and reactions contained 11–11.4 μL sterile deionized water (depending on SSR primers), 2 μL of 10 × PCR buffer (Mg2+-free), 1.2–1.6 μL 25 mM MgCl2 (depending on SSR primers), 1.6 μL dNTPs (2.5 mM each), 0.8 μL each primer (10 μM), 0.2 μL Taq DNA polymerase (5 U/μL, Takara), and 2 μL DNA template (50 ng/μL). All PCR amplifications were carried out on a Bio-Rad T100 thermocycler with the following conditions: initial denaturation at 95 °C for 3 min, followed by 25 cycles of denaturation at 94 °C for 30 s, annealing at 56–61 °C (depending on SSR primers) for 30 s, and elongation at 72 °C for 30 s, and a final elongation at 72 °C for 10 min. A mixture of 0.5 μL fluorescent PCR product, 0.25 μL size standard (GeneScan 500LIZ, Applied Biosystems, USA), and 9.25 μL highly deionized formamide was prepared, denatured at 95 °C for 10 min, cooled at −20 °C for 5 min, and then subjected to capillary electrophoresis using an ABI 3730xl DNA analyzer (Applied Biosystems Inc., USA).
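The component volumes can be checked arithmetically. The short Python sketch below sums the reaction mix and converts the MgCl2 volume to a final concentration. It assumes that the water and MgCl2 volumes vary inversely (11 μL water with 1.6 μL MgCl2, 11.4 μL water with 1.2 μL MgCl2) so that the total stays at 20 μL; this pairing is an inference from the stated ranges, and the function names are hypothetical.

```python
def reaction_volume(water_ul, mgcl2_ul):
    """Total PCR volume (uL) from the component volumes listed in the text."""
    components = {
        "water": water_ul,
        "10x buffer": 2.0,
        "MgCl2 (25 mM)": mgcl2_ul,
        "dNTPs": 1.6,
        "forward primer": 0.8,
        "reverse primer": 0.8,
        "Taq polymerase": 0.2,
        "DNA template": 2.0,
    }
    return round(sum(components.values()), 2)

def final_mg_mM(mgcl2_ul, total_ul=20.0, stock_mM=25.0):
    """Final Mg2+ concentration after dilution of the 25 mM stock."""
    return stock_mM * mgcl2_ul / total_ul

# Assumed pairings of water and MgCl2 volumes (see lead-in).
low_mg = (reaction_volume(11.4, 1.2), final_mg_mM(1.2))
high_mg = (reaction_volume(11.0, 1.6), final_mg_mM(1.6))
```

Under this assumption both extremes give a 20 μL reaction, with final Mg2+ concentrations of about 1.5 mM and 2.0 mM, respectively. The electrophoresis loading mix likewise sums to 0.5 + 0.25 + 9.25 = 10 μL.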

2.4 Data analysis

GeneMapper v.3.7 (Applied Biosystems, USA) was used to score allelic sizes according to the 500LIZ size standard, and allelic data were imported into Microsoft Office Excel 2007, which was used to estimate diversity indicators including the mean number of alleles per locus (S) and per geographic population (Na). The observed (Ho) and expected (He) heterozygosity were calculated using GenAlEx 6.5 (Peakall and Smouse 2006), while the polymorphic information content (PIC) and null allele frequency F(Null) were computed with Cervus 3.0 (Kalinowski et al. 2007). Using Genepop ver.4.0 (Rousset 2008), we performed linkage disequilibrium tests (with Fisher’s method and gametic phase unknown) for each pair of loci, with linkage equilibrium as the null hypothesis. We also used Genepop ver.4.0 to determine the inbreeding coefficient (Fis) of each population and to perform Hardy-Weinberg equilibrium (HWE) tests with heterozygote deficit as the alternative hypothesis. Four analysis of molecular variance (AMOVA) tests were carried out in Arlequin 3.11 (Excoffier et al. 2005), and fixation indices (FCT, FSC, FST) were determined for the corresponding levels (among groups, among populations within groups, within populations), with grouping patterns as follows: (1) mainland populations vs. island populations; (2) GD populations (including GDJL, GDLM, and GDYD) vs. GX populations (including GXBH, GXCZ, GXGL, GXHZ, and GXLB); (3) HNHK vs. island populations HNBS, HNTC, and HNWN; and (4) mainland populations vs. HNHK vs. island populations HNBS, HNTC, and HNWN. To further explore patterns of divergence, we performed Bayesian inference using STRUCTURE 2.3.1 (Evanno et al. 2005) with the following parameters: length of burn-in period = 2,000,000; number of MCMC repeats after burn-in = 3,000,000; “Admixture model” and “Allele frequency correlated” selected; K values predefined from 1 to 12; and number of replicate runs for each K = 20.
The results were processed with Structure Harvester v0.6.94 (http://taylor0.biology.ucla.edu/structureHarvester/; Earl 2012) to generate log probability diagrams, and with CLUMPP v1.1.2 (Jakobsson and Rosenberg 2007; “Greedy algorithm” employed and repeats = 10,000) and DISTRUCT v1.1 (Rosenberg 2004) to draw a multicolored figure. MEGA v.6.06 (Tamura et al. 2013) was used to reconstruct neighbor-joining (NJ) relationships based on genetic distance matrices from the present microsatellite data (Nei 1972) and the previous mitochondrial DNA data (Kimura 2-parameter model). Further individual assignment tests were performed using GENECLASS 2.0 (Piry et al. 2004). The previously determined mtDNA clustering pattern (all mainland populations represented as the CH group, HNBS + HNWN represented by the HN group) was treated as the reference, and Bayesian (Rannala and Mountain 1997), frequency-based (Paetkau et al. 1995), and genetic distance-based (Nei et al. 1983) algorithms were employed. Final assignment of an individual was based on the mean probability of the three methods. Detailed phylogeographic characterization could then be performed once both the microsatellite genotype and mitochondrial mito-type of each individual were determined.
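The final assignment rule (mean probability over the three methods) can be sketched as follows. This is a hedged illustration only: the exact score format exported by the assignment software and the way the scores were averaged are not described in the text, so the input layout, the function name `assign_individual`, and the example scores are all assumptions.

```python
from statistics import mean

def assign_individual(scores):
    """Assign one individual to the reference group with the highest mean score.

    scores: {method_name: {group: probability}} for a single individual.
    Returns (assigned_group, {group: mean_probability}).
    """
    groups = next(iter(scores.values())).keys()
    mean_prob = {g: mean(s[g] for s in scores.values()) for g in groups}
    best = max(mean_prob, key=mean_prob.get)
    return best, mean_prob

# Hypothetical scores for one bee (not data from this study).
example_scores = {
    "bayesian": {"CH": 0.9, "HN": 0.1},
    "frequency": {"CH": 0.8, "HN": 0.2},
    "distance": {"CH": 0.4, "HN": 0.6},
}
group, probs = assign_individual(example_scores)
```

In this invented example the distance-based method alone would favor HN, but the mean over the three methods (0.7 vs. 0.3) assigns the individual to CH, which illustrates why averaging can differ from any single criterion.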

3 Results

3.1 Polymorphism at microsatellite markers

Individuals GDJL-4 and GXGL-5 could not be genotyped at more than eight loci and were excluded from subsequent analysis. A total of 151 alleles were identified across the 10 SSR loci in 358 bee samples (the frequency and distribution of each allele are listed in Supplementary file 2). The number of alleles per locus varied from 7 (AP226) to 24 (AC139), with an overall mean of 15.1 (Table I). All markers proved highly informative (PIC > 0.5) with the exception of AP226 (PIC = 0.231) and are therefore useful for estimating genetic diversity. The linkage disequilibrium tests (Supplementary file 3) gave p values > 0.05 for all locus pairs except AC139 and AT170 (p = 0.003762), suggesting random association among microsatellite markers apart from this pair. Polymorphism at these markers was also assessed by estimating heterozygosity: observed (Ho) and expected (He) heterozygosity ranged from 0.230 to 0.879 and from 0.227 to 0.870, respectively. The estimated null allele frequency F(Null) peaked at 9.95 % for AP187 and averaged 6.55 % across all loci. The HWE test with heterozygote deficit as the alternative hypothesis was significant at eight loci, but not at AP226 (PHWE = 0.4831) or AT141 (PHWE = 0.2808).
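As a minimal sketch of how these per-locus indicators are defined, the Python function below computes the number of alleles, Ho, He (uncorrected, 1 − Σp²), and PIC (using the standard Botstein et al. 1980 formula) from diploid genotype calls. It is not the GenAlEx or Cervus implementation, and the genotypes in the example are hypothetical allele sizes rather than data from this study.

```python
from collections import Counter

def locus_stats(genotypes):
    """Per-locus diversity from diploid calls given as (allele_a, allele_b)."""
    n = len(genotypes)
    counts = Counter(allele for g in genotypes for allele in g)
    freqs = [c / (2 * n) for c in counts.values()]
    # Observed heterozygosity: share of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / n
    # Expected heterozygosity under HWE (no small-sample correction).
    he = 1.0 - sum(p * p for p in freqs)
    # PIC = He - sum over allele pairs i<j of 2 * p_i^2 * p_j^2.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return {"Na": len(counts), "Ho": ho, "He": he, "PIC": he - cross}

# Hypothetical genotypes (allele sizes in bp) at one locus.
stats = locus_stats([(100, 102), (100, 100), (102, 102), (100, 102)])
```

For two equally frequent alleles this gives He = 0.5 and PIC = 0.375, which shows why a locus dominated by one or two alleles falls below the PIC > 0.5 threshold even when several rare alleles are present.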

3.2 Genetic variation within populations

The overall level of variability across the 10 loci was investigated in the 12 A. cerana populations and in the mainland and island populations (Table II). The mean number of alleles (Na) ranged from 6.8 to 13.8, and observed and expected heterozygosity were 0.603–0.661 and 0.646–0.712, respectively. These narrow ranges indicated homogeneously distributed variation among different populations. All populations except GXLB deviated significantly from Hardy-Weinberg equilibrium (PHWE < 0.05) when heterozygote deficiency was chosen as the alternative hypothesis. This result implied some degree of inbreeding in most populations. Although the inbreeding coefficients (Fis) pointed in the same direction, the Wahlund effect could also explain the present results.
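The Wahlund effect mentioned here can be illustrated numerically. In the sketch below, each subpopulation is itself in Hardy-Weinberg equilibrium at a biallelic locus, yet the pooled sample shows a heterozygote deficit, and therefore a positive Fis = 1 − Ho/He, purely because allele frequencies differ between subpopulations. The subpopulation sizes and allele frequencies are hypothetical, and the function name `pooled_fis` is introduced for illustration.

```python
def pooled_fis(subpops):
    """Fis of a pooled sample made of subpopulations that are each in HWE.

    subpops: list of (n_individuals, frequency_of_allele_A) at a biallelic locus.
    """
    total = sum(n for n, _ in subpops)
    # Allele frequency in the pooled sample.
    p_bar = sum(n * p for n, p in subpops) / total
    # Observed heterozygosity: weighted mean of within-subpopulation 2pq.
    ho = sum(n * 2 * p * (1 - p) for n, p in subpops) / total
    # Expected heterozygosity if the pooled sample were one panmictic unit.
    he = 2 * p_bar * (1 - p_bar)
    return 1 - ho / he

structured = pooled_fis([(50, 0.2), (50, 0.8)])  # differing frequencies
uniform = pooled_fis([(50, 0.5), (50, 0.5)])     # identical frequencies
```

With these invented frequencies the pooled Fis is 0.36 without any inbreeding, whereas identical frequencies give Fis = 0. A positive Fis in a sampled population is therefore compatible with hidden substructure as well as with inbreeding.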

Table II Statistical estimation of genetic variation in Apis cerana populations.

3.3 Phylogeographic structure

The results of AMOVA tests (Table III) revealed that variation within populations always contributed the largest proportion (93.78–98.84 %), while smaller but significant variation (FCT = 0.0494**) was detected between mainland and island population groups, indicating genetic partitioning between populations on either side of the strait. A moderate level of differentiation (FCT = 0.0241) was apparent between one Hainan island population (HNHK) and the other three island populations (HNBS, HNTC, and HNWN), indicating three main groups (mainland, HNHK, and HNBS + HNTC + HNWN). Bayesian analysis (Figure 2) gave the highest Ln probability L(K) of −11633.3 (averaged over 20 replicates) when three groups were assumed, in agreement with the AMOVA results. Unrooted NJ trees based on either microsatellite or mitochondrial DNA data displayed a similar topology (Figure 3). Specifically, mainland populations clustered together into one branch, while the HNHK population formed an independent branch. The only marked difference between the two trees was the position of the HNTC population, which clustered with HNBS + HNWN in the microsatellite tree but was more closely associated with mainland populations in the mitochondrial DNA tree. Supplementary file 4 combines the individual assignment tests with the previous mtDNA data (Zhao et al. 2014) to list the mitochondrial mito-type and microsatellite genotype of each individual bee, giving four bee types (Figure 1b): CH mito-type and genotype, M(CH) + G(CH); CH mito-type and HN genotype, M(CH) + G(HN); HN mito-type and genotype, M(HN) + G(HN); and HN mito-type and CH genotype, M(HN) + G(CH). Almost all mainland individuals belonged to M(CH) + G(CH), whereas most island bees (mostly from HNBS and HNWN) were M(HN) + G(HN), indicating a clear phylogeographic structure. However, 13 individuals from HNHK and four from HNTC belonged to M(CH) + G(CH), consistent with genetic introgression in the two island populations.
Interestingly, 10 individuals from HNTC and three from HNHK were M(CH) + G(HN), a hybrid between CH and HN, which may explain why HNTC clustered together with HNBS + HNWN in the microsatellite tree but was closer to mainland populations in the mitochondrial tree (Figure 3). The other hybrid type, M(HN) + G(CH), was seldom detected.
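The four-way classification can be expressed as a simple cross-tabulation of mito-type against genotype. The Python sketch below is illustrative only: the record layout and function names are assumptions, and the example records are invented rather than taken from Supplementary file 4.

```python
from collections import Counter

def bee_type(mito, geno):
    """Label an individual by mito-type (M) and microsatellite genotype (G)."""
    return f"M({mito}) + G({geno})"

def is_hybrid(mito, geno):
    """An individual is a hybrid type when the two markers disagree."""
    return mito != geno

def tally(individuals):
    """Count bee types per population.

    individuals: iterable of (population, mito_group, genotype_group).
    """
    table = Counter()
    for pop, mito, geno in individuals:
        table[(pop, bee_type(mito, geno))] += 1
    return table

# Hypothetical records for illustration.
records = [
    ("HNTC", "CH", "HN"),
    ("HNTC", "CH", "HN"),
    ("HNTC", "CH", "CH"),
    ("HNBS", "HN", "HN"),
]
counts = tally(records)
```

Tabulating by population in this way makes it easy to see where the maternally inherited and nuclear markers disagree, which is the pattern used in the text to interpret the differing position of HNTC in the two trees.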

Table III Molecular variance of the 12 Apis cerana populations.
Figure 2.

a Log probability (L(K)) of K values ranging from 1 to 12 for the admixture and correlated frequencies model based on 20 replicates from Apis cerana populations (K = number of predefined groups). Length of burn-in = 2,000,000. MCMC repeats after burn-in = 3,000,000. Vertical lines indicate the standard deviation. b Graphic result of structure analysis with ‘Greedy algorithm’ employed and repeats = 10,000 in CLUMPP, K = number of predefined groups, and colors indicate the predefined groups. Populations are separated by thin black lines and each of the 358 bee individuals is represented by a thin vertical line.

Figure 3.

Unrooted neighbor-joining trees of 12 geographic populations based on different genetic distance matrices: a based on 10 microsatellite loci using Nei’s standard genetic distance (1972), shown in the lower triangle of c; b based on mitochondrial DNA sequences using the Kimura 2-parameter distance, shown in the upper triangle of c; and c genetic distance matrices based on both microsatellite DNA (lower triangle) and mitochondrial DNA (upper triangle). Circles and triangles represent mainland and island populations, respectively.
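For reference, Nei’s (1972) standard genetic distance between two populations is D = −ln(Jxy / √(Jx·Jy)), where Jx, Jy, and Jxy are sums over loci of the squared or cross-multiplied allele frequencies. The following Python sketch computes it from per-locus allele frequency tables. It is a minimal illustration, not the routine used to build the matrices in panel c, and the function name and example frequencies are hypothetical.

```python
import math

def nei_1972_distance(pop_x, pop_y):
    """Nei's standard genetic distance between two populations.

    pop_x, pop_y: lists with one {allele: frequency} dict per locus,
    in the same locus order.
    """
    jx = jy = jxy = 0.0
    for fx, fy in zip(pop_x, pop_y):
        alleles = set(fx) | set(fy)
        jx += sum(fx.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(fy.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(fx.get(a, 0.0) * fy.get(a, 0.0) for a in alleles)
    identity = jxy / math.sqrt(jx * jy)  # Nei's normalized identity I
    return -math.log(identity)

# Hypothetical single-locus frequencies.
same = nei_1972_distance([{"A": 0.5, "B": 0.5}], [{"A": 0.5, "B": 0.5}])
different = nei_1972_distance([{"A": 0.5, "B": 0.5}], [{"A": 1.0}])
```

Identical frequency profiles give D = 0, and D grows as the profiles diverge. A pairwise matrix of such values over the 12 populations is the kind of input from which a neighbor-joining tree is built.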

4 Discussion

Microsatellite markers developed for A. mellifera have been applied successfully to other Apis species in the past (e.g., Oldroyd et al. 1997; Sittipraneed et al. 2001; Cao et al. 2012). In the present paper, we employed 10 SSRs from A. mellifera to investigate A. cerana, of which all except AP226 (PIC = 0.231) are highly polymorphic (PIC > 0.5) according to a previous report (Vanhala et al. 1998) and are therefore suitable for evaluating genetic variation. A much larger PIC value (0.8966) was reported previously for the AP226 locus, and 21 alleles were described (Ji et al. 2011). This apparent difference from the present study may be explained by different sampling strategies, alternative methods of allele discrimination, or a combination of both factors. It should be noted that our results have little direct comparability with previous SSR studies, since different microsatellites were used for analysis, with only AP226 in common with Ji et al. (2011). All comparisons are therefore tentative.

The genetic diversity (average expected heterozygosity) of the geographic populations ranged from 0.646 to 0.712 (average = 0.685), indicating homogeneously distributed variation among different populations. Island populations generally exhibit lower genetic variation than their mainland counterparts, presumably due to the initial loss of diversity upon foundation, and a more constrained population size following foundation (Frankham 1997). However, our results do not fit such a model, suggesting gene flow between mainland and island populations has reduced inter-population differences.

We observed notable differences in the distribution of alleles between mainland and island populations (Supplementary file 2), implying genetic partitioning between the mainland (CH) and island (HN) groups. The results of AMOVA (Table III) together with unrooted NJ trees (Figure 3) also exhibited this pattern of genetic partitioning, illustrating the peculiarity of Hainan bee populations that was first noted in a morphometric study (Yang et al. 1986) and subsequently confirmed using mtDNA analysis (Zhao et al. 2014). The present study therefore reconfirms this peculiarity at the level of nuclear markers. Hainan island is separated from mainland China by the Qiongzhou Strait, and it has been suggested that various island species have been separated from and reunited with their mainland counterparts several times due to rising and falling sea levels during the Pleistocene (Voris 2000). Differentiation between species on either side of the strait may therefore reflect independent evolution since the last rise in sea level.

Within the Hainan island population, a moderate level of variation was detected between HNHK and HNBS + HNTC + HNWN (Table III; Figure 3). Given that A. cerana colonies have been transported to the island from mainland China on numerous occasions, this intra-island divergence is likely a consequence of gene flow that has resulted in genetic introgression in the HNHK population. We noticed that over half of all HNHK individuals were of the CH (13/30) and hybrid (5/30) types (Supplementary file 4; Figure 1b). Similar introgression has been reported in A. mellifera populations (Garnery et al. 1998; Franck et al. 2000; De la Rúa et al. 2001; Dall’Olio et al. 2007) and is considered a potential threat to indigenous genetic diversity (De la Rua et al. 2003; Cánovas et al. 2011). The unusual Hainan race is particularly sensitive to such introgression owing to its narrow distribution and limited population size. We therefore urge that importation of exotic bee colonies cease and that conservation zones for existing Hainan populations be created and protected.

Previous mtDNA data identified a large introgression (14/30) of the CH mito-type in HNTC individuals (Supplementary file 4; Figure 1b), but the present SSR data detected the CH genotype in only 4/30 HNTC individuals. Using the same samples but different molecular markers can reveal distinct introgression patterns, as has been shown previously (Franck et al. 1998; De la Rúa et al. 2001). It was suggested that repeated importation may result in introgression that is faster in mtDNA than in nuclear genes (Garnery et al. 1998), presumably because all daughters of a newly introduced queen share her mtDNA, while nuclear genes inherited from exotic bees are diluted due to polyandrous mating. Our HNTC results are consistent with this interpretation; however, it should be noted that the discrepancy in the rate of introgression may affect the results to some extent. Sample size (number of individuals per colony and population) may also be an important factor. Further investigation is needed in this regard.

Our microsatellite data indicated no significant variation among mainland populations and therefore no inter-population subdivisions. The eight populations sampled essentially share the same gene pool, and this was the conclusion from both mtDNA and SSR data.

In summary, we reexamined microsatellite polymorphism data from A. cerana populations on Hainan island and southern mainland China and detected a distinct phylogeographic structure between mainland and island populations. These results further confirm the peculiarity of the Hainan island race. Our results revealed genetic introgression resulting from gene flow at both population and individual levels. Combined analysis of microsatellites and mitochondrial DNA identified two distinct introgression patterns in HNTC individuals. Together, these results allow some tentative conclusions on A. cerana phylogeographic structure to be made and will inspire further studies.