Introduction

Citrus is the most important fruit crop in the world, with a production of over 128 million tons and a cultivated area of 9.2 million hectares (FAOSTAT 2013). Among the commercial citrus fruits, mandarins are the second most important group in the fresh fruit market worldwide. ‘Mandarin’ is a common name given to most small, easy-peeling citrus fruits. This term includes interspecific hybrids, which make mandarins the most genetically and phenotypically polymorphic group of true Citrus (Nicolosi et al. 2000; Barkley et al. 2006; Garcia-Lor et al. 2012, 2013a). Moreover, a recent phylogenetic study (Garcia-Lor et al. 2013a) revealed a close relationship between the genus Fortunella and the mandarin group. Mandarin germplasm was classified as Citrus reticulata Blanco by Swingle and Reece (1967) and Mabberley (1997). In contrast, Webber (1943) classified mandarin genotypes into four different groups: king, satsuma, mandarin and tangerine. Tanaka (1954) divided the mandarins into five groups that included 36 species, based on morphological differences in the tree, leaves, flowers and fruits. Group 1 included C. nobilis Lour. (‘King’), C. unshiu Marc. (satsumas) and C. yatsushiro Hort. ex Tanaka; group 2 included C. keraji Hort. ex Tanaka, C. oto Hort. ex Yuichiro and C. toragayo Hort. ex Yuichiro; group 3 included 14 species, including some of the most economically important varieties, such as C. reticulata (‘Ponkan’), C. deliciosa Tenore (‘Willowleaf’ or ‘common mandarin’), C. clementina Hort. ex Tanaka (clementines) and C. tangerina Hort. ex Tanaka (‘Dancy’); group 4 included C. reshni Hort. ex Tanaka (‘Cleopatra’), C. sunki Hort. ex Tanaka (‘Sunki’) and C. tachibana (Mak.) Tanaka; and group 5 included the species C. depressa Hayata (‘Shekwasha’) and C. lycopersicaeformis (Lush.) Hort. ex Tanaka. Hodgson (1967) divided the mandarins into four species: C. unshiu (satsumas), C. reticulata (‘Ponkan’, ‘Dancy’, clementines), C. deliciosa (‘Willowleaf’) and C. nobilis (‘King’).

None of these citrus classification systems is perfect, but the Tanaka system seems better adapted to the horticultural features of each group, whereas the Swingle system simplifies it to the extreme. At present, C. reticulata (mandarin) is considered to be one of the four ancestral groups of the cultivated citrus (Nicolosi et al. 2000; Barret and Rhodes 1976; Krueger and Navarro 2007), along with C. maxima (Burm.) Merr. (pummelo), C. medica L. (citron) and C. micrantha Wester (papeda). The centre of diversification of C. reticulata is located in Asia, from Vietnam to Japan (Tanaka 1954). This group is highly polymorphic, as revealed by molecular markers (Coletta-Filho et al. 1998; Ollitrault et al. 2012a), chromosomal banding patterns (Yamamoto and Tominaga 2003) and phenotypic characters, such as fruit pomology and the chemical variability of peel and leaf oils (Lota et al. 2000; Fanciullino et al. 2006), as well as tolerance to biotic and abiotic stresses. Several germplasm collections have been characterised by morphological characteristics and/or molecular markers (Barkley et al. 2006; Koehler-Santos et al. 2003; Tapia Campos et al. 2005). This phenotypic and genetic variability reflects a long history of cultivation, in which many mutations and natural hybridisations have given rise to the existing diversity within this mainly facultative apomictic group, including the recently published introgression of ancestral genomes, like C. maxima, in some mandarins (Wu et al. 2014; Curk et al. 2014, 2015). The intraspecific organisation of mandarins and the determinants of the group’s phenotypic diversity remain poorly understood.

In addition to the taxonomic complexity of the mandarin group, the genotypes included in citrus germplasm collections are sometimes of doubtful origin. These genotypes may originate from plant explorations in regions of natural genetic diversity (mainly Asia for mandarins), from the selection of new materials arising from hybridisations or mutations, or from exchanges between germplasm collections (Krueger and Navarro 2007). A cultivar name and/or species membership may be assigned arbitrarily, with no molecular basis, leading to possible misassignment or duplication of material (Krueger and Navarro 2007). For these reasons, molecular studies are important for the detection of misidentifications and redundancies (Krueger and Roose 2003).

To clarify to which genetic group we refer by using the term ‘mandarin’ in the text of this work, we have used the following nomenclature: C. reticulata for ‘mandarin’ as a true species (one of the four ancestors of the cultivated citrus); C. reticulata (Sw) for ‘mandarin’ according to the Swingle classification; C. reticulata (Tan) for one of the 17 ‘mandarin’ species represented in this work according to the Tanaka classification; and ‘mandarin-like’ for genotypes that are phenotypically similar to mandarins.

The goals of this work were (1) to characterise the mandarin germplasm using nuclear (simple sequence repeats (SSRs), indels, single nucleotide polymorphisms (SNPs)), chloroplastic SSRs (cpDNA) and mitochondrial indel markers (mtDNA); (2) to evaluate its genetic diversity and detect redundancies; (3) to detect introgressions of other citrus ancestral taxa into the mandarin germplasm; and (4) to determine the genetic structure within the mandarin group.

In this study, we observed introgression of several ancestral genomes into genotypes previously considered pure mandarins, and found that many mandarins appear to have a very complex genetic organisation (a mixture of mandarin genomes).

Materials and methods

Analysed germplasm

One hundred ninety-one genotypes were studied to determine their nuclear diversity. Throughout the text, these genotypes are referred to by the identification number (ID) shown in Online Resource 1. Genotypes were classified according to the Swingle and Reece (1967) and Tanaka (1954) systems. A summary of the genotypes used is shown in Table 1. Plant material for the analysis was collected from the germplasm collections of the Instituto Valenciano de Investigaciones Agrarias (IVIA, Valencia, Spain), mainly obtained from American and Mediterranean sources, and the Station de Recherches Agronomiques (CIRAD-INRA, Corsica, France), which include many genotypes of Asiatic origin (China, Japan, Philippines, India, Vietnam…). The databases of both collections are based on the Tanaka system of classification. Fifty genotypes represented the four ancestral taxa and Fortunella: 28 mandarins (26 C. reticulata (Sw), C. indica and C. tachibana), ten C. maxima (pummelos), six C. medica (citrons), two Papeda (C. hystrix D.C. and C. micrantha) and four Fortunella (kumquats: F. crassifolia Swing., F. hindsii (Champ.) Swing., F. japonica (Thunb.) Swing. and F. margarita (Lour.) Swing.). The 26 mandarin genotypes considered as C. reticulata by Swingle and Reece (1967) were considered by Tanaka (1977) as 15 species. The other genotypes (141 ‘mandarin-like’ accessions, intra- and interspecific hybrids) were not assigned to any of the previously mentioned main taxa, in order to decipher their structure and determine whether the Tanaka classification recorded in the germplasm bank databases was correct. Severinia buxifolia (Poir.) Tenore was added as an out-group for neighbour-joining analysis.

Table 1 Summary of genotypes employed in the study

For the maternal phylogeny, besides the ancestral species and interspecific hybrids detailed before, eight extra genotypes (secondary species) were analysed (two C. sinensis (L.) Osb., two C. aurantium L., two C. paradisi Macf., one C. aurantifolia (Christm.) Swing. and one C. limon Osb.).

Genotyping

Fifty SSR markers (Kijas et al. 1997; Froelicher et al. 2008; Luro et al. 2008; Aleza et al. 2011; Cuenca et al. 2011; Kamiri et al. 2011; Garcia-Lor et al. 2012, 2013a, b) located along the nine linkage groups of the reference genetic map of clementine (Ollitrault et al. 2012b), 24 indel markers identified in a discovery panel representative of the genus Citrus (Garcia-Lor et al. 2012, 2013a) and 67 SNP markers mined in 27 nuclear genes (Garcia-Lor et al. 2013a) and in clementine BAC-ends (Ollitrault et al. 2012a) were used (Online Resource 2). To assess the maternal origin of the mandarin germplasm, eight chloroplastic SSR markers (Cheng et al. 2005) (ccmp1, ccmp2, ccmp4, ccmp5, ccmp6, NTCP7, NTCP9 and NTCP28) and four mitochondrial indel markers (Froelicher et al. 2011) (nad2, nad5, nad7, rrn5/rrn18) were used. After the initial analysis, nad5 (no polymorphisms were found in our population) and three chloroplastic markers (NTCP28, ccmp1 and ccmp4) were discarded. The chloroplastic markers were discarded because of poor amplification or because their polymorphism consisted of a difference of just one base, which could be more confusing than clarifying.

For SSR and indel markers, amplifications by polymerase chain reaction (PCR) and analyses with a capillary genetic fragment analyser (CEQ/GeXP Genetic Analysis System; Beckman Coulter, Fullerton, CA, USA) were performed as described in Garcia-Lor et al. (2012). The Genetic Analysis System software (GenomeLab GeXP, v. 10.0) was used for data collection and analysis.

SNPs were genotyped by competitive allele-specific PCR as described by Garcia-Lor et al. (2013b).

Data analysis

The allelic data obtained with the SSR, indel, SNP, cpDNA and mtDNA markers were used to calculate genetic dissimilarity matrices at the nuclear, chloroplastic, mitochondrial and cytoplasmic levels, using the simple matching dissimilarity index (d_ij) between pairs of accessions (units), with the DARwin5 software, version 5.0.159 (Perrier and Jacquemond 2006). Weighted neighbour-joining (NJ) analyses (Saitou and Nei 1987) were computed to describe the organisation of population diversity, and the robustness of branches was tested using 1000 bootstraps.
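As an illustration of the dissimilarity index, the following minimal Python sketch computes a simple matching dissimilarity for allelic data. It assumes the usual form d_ij = 1 − (1/L) Σ (m_l / π), where L is the number of loci scored in both accessions, m_l the number of matching alleles at locus l and π the ploidy. The function names, the data layout and the handling of missing data are hypothetical; this is not the DARwin implementation.

```python
from collections import Counter

def shared_alleles(g1, g2):
    """Number of alleles two genotypes have in common at one locus (multiset overlap)."""
    return sum((Counter(g1) & Counter(g2)).values())

def simple_matching_dissimilarity(acc_i, acc_j, ploidy=2):
    """d_ij = 1 - (1/L) * sum over loci of (matching alleles / ploidy).

    acc_i, acc_j: lists of per-locus genotypes, e.g. [(150, 154), ("A", "G")].
    Assumption: loci with missing data (None) in either accession are skipped.
    """
    scores = [
        shared_alleles(g1, g2) / ploidy
        for g1, g2 in zip(acc_i, acc_j)
        if g1 is not None and g2 is not None
    ]
    if not scores:
        raise ValueError("no loci scored in both accessions")
    return 1.0 - sum(scores) / len(scores)

# Hypothetical accessions scored at three loci (two SSRs and one SNP).
acc_a = [(150, 154), ("A", "G"), (200, 200)]
acc_b = [(150, 150), ("A", "G"), (198, 202)]
print(simple_matching_dissimilarity(acc_a, acc_b))
```

For haploid cytoplasmic markers the same function could be called with `ploidy=1` and one-element genotypes.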

Population structure was inferred with the programme Structure, v. 2.3.3 (http://cbsuapps.tc.cornell.edu/structure), which implements a model-based clustering method using genotype data (Pritchard et al. 2000; Falush et al. 2003). When the population structure is known, the programme can estimate the contribution of each population to the genomes of genotypes of unknown origin. When the population structure is unknown, it helps to identify the optimal number of populations within the data set under study, based on the parameters of Evanno et al. (2005).

F-statistics were calculated with the programme GENETIX, v. 4.03 (Belkhir et al. 2002), based on the parameters of Wright (1969) and Weir and Cockerham (1984). Some other genetic population statistics were estimated from the allele data using the programme PowerMarker, v. 3.25 (Liu and Muse 2005).

Results

Nuclear genetic diversity parameters

Genetic diversity statistics were calculated for each SSR, indel and SNP marker for the 191 genotypes analysed (Online Resource 3).

Using the SSR markers, we detected 529 alleles. Allele numbers varied between 4 (MEST107) and 18 (MEST192). The average number of alleles and the expected heterozygosity (He) per locus were 10.6 ± 0.508 and 0.63 ± 0.021, respectively. The whole population had an observed heterozygosity (Ho) of 0.60 ± 0.024. Wright fixation index (Fw) values varied from −0.40 (CAC23) to 0.48 (CimCrCIR01D11). The average Fw value over all SSR loci was close to 0 (0.04 ± 0.024).
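The per-locus statistics reported here can be illustrated with a short Python sketch. It assumes the textbook definitions: expected heterozygosity as 1 − Σ p_i², observed heterozygosity as the fraction of heterozygous genotypes, and the fixation index as 1 − Ho/He. GENETIX and PowerMarker may apply sample-size corrections that are not reproduced here, and the function name and data layout are hypothetical.

```python
from collections import Counter

def locus_stats(genotypes):
    """Return (number of alleles, Ho, He, Fw) for one diploid locus.

    genotypes: list of 2-tuples of alleles; None marks missing data.
    He is the simple (uncorrected) estimate 1 - sum(p_i^2); Fw = 1 - Ho/He.
    """
    scored = [g for g in genotypes if g is not None]
    n = len(scored)
    if n == 0:
        raise ValueError("no scored genotypes at this locus")
    counts = Counter(allele for g in scored for allele in g)
    total = 2 * n
    he = 1.0 - sum((c / total) ** 2 for c in counts.values())
    ho = sum(1 for a, b in scored if a != b) / n
    # Fw is undefined for a monomorphic locus (He = 0).
    fw = 1.0 - ho / he if he > 0 else float("nan")
    return len(counts), ho, he, fw

# Hypothetical locus in which every accession is heterozygous:
# Ho exceeds He, so the fixation index is negative.
print(locus_stats([(150, 154)] * 4))
```

A negative value arises whenever heterozygotes are in excess relative to the expectation from allele frequencies, which is the pattern expected in a population rich in hybrids.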

We detected a total of 74 alleles with the indel markers. Allele number per locus ranged from 2 (10 markers) to 7 (IDDFR), with an average of 3.1 ± 0.182. The He values ranged from 0.01 (IDINVA2) to 0.66 (IDDFR), with a median value of 0.20. Fw values varied from −0.52 (IDF’3H) to 1.00 (IDINVA1). The overall Ho and Fw values among all loci were 0.19 ± 0.033 and 0.29 ± 0.067, respectively.

All SNP markers were biallelic (134 alleles identified). The whole population had an observed (Ho) and expected (He) heterozygosity of 0.27 ± 0.023 and 0.26 ± 0.016, respectively. The minimum Ho value was 0.01 (4 markers) and the maximum 0.52 (CCC1-M85). The overall Fw value among all loci was 0.12 ± 0.059.

Genetic population statistics within the whole population (AG, [ID 1–191]), all ‘mandarin-like’ genotypes of unknown or supposed hybrid origin (AM, [ID 51–191]) and the 28 genotypes selected from all Tanaka species represented in our collections (MT, [ID 1–28]) (Online Resource 1) are summarised in Table 2. Gene diversity (GD) and Ho values were higher for SSR markers than for SNP and indel markers, reflecting the higher maximum allele frequencies (MAF) of the indel and SNP markers. Comparing the whole population (AG), all ‘mandarin-like’ genotypes (AM) and mandarins from Tanaka species (MT), the mean allele number decreased at each step for SSR and indel markers (SSRs AG = 10.58 > AM = 6.84 > MT = 6.76; indels AG = 3.08 > AM = 2.25 > MT = 2.02) and for the SNPs (AG = 2 > AM = 1.90 > MT = 1.79). The GD was higher in AG than in AM or MT for both kinds of markers. For AG, Ho was slightly lower than He, leading to slightly positive Fw for SSR, indel and SNP markers. In AM and MT, Ho values were higher than He, giving negative Fw values for both kinds of markers. Thirteen loci were monomorphic in AM and 14 in MT.

Table 2 Genetic population statistics within the whole population, all ‘mandarin-like’ genotypes and Tanaka mandarin species

Rare alleles

The ‘mandarin-like’ population included 20 genotypes with unique alleles (Online Resource 4), ranging from 1 (11 genotypes) to 10 (‘Nicaragua’; [ID 112]) unique alleles per genotype.
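One way to tabulate such alleles is sketched below in Python. It assumes that a unique allele is one carried by a single accession in the population examined. The function name and the nested-dictionary layout are hypothetical and are not taken from the study.

```python
from collections import defaultdict

def unique_alleles(population):
    """Map each accession to the (locus, allele) pairs found in no other accession.

    population: dict accession -> dict locus -> tuple of alleles (None if missing).
    Accessions without any unique allele are omitted from the result.
    """
    carriers = defaultdict(set)
    for acc, loci in population.items():
        for locus, genotype in loci.items():
            if genotype is None:
                continue
            for allele in genotype:
                carriers[(locus, allele)].add(acc)
    result = defaultdict(list)
    for key, accs in carriers.items():
        if len(accs) == 1:
            result[next(iter(accs))].append(key)
    return dict(result)

# Hypothetical three-accession example at one SSR locus.
demo = {
    "acc1": {"SSR1": (150, 154)},
    "acc2": {"SSR1": (150, 150)},
    "acc3": {"SSR1": (150, 158)},
}
print(unique_alleles(demo))
```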

Classifications by NJ analysis

For the whole data set (SSR, indel and SNP markers), NJ analysis (Fig. 1) revealed a clear differentiation between the five main taxa studied, the four ancestral Citrus groups (papeda, citron, pummelo and mandarin) and kumquat, with very high bootstrap support. The combination of the data from SSR, indel and SNP markers revealed high intraspecific diversity in the mandarin group, which was not well resolved (low bootstrap support in many branches). From the whole data set, 22 genotypes were reduced to 8 multilocus genotypes (MLGs; Online Resource 5). Some of these originated from bud-sport mutations (different cultivars of C. unshiu or C. clementina), and others are possibly redundant genotypes that were collected and named differently in different locations.
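The reduction of accessions to multilocus genotypes can be sketched as a grouping step: accessions with identical genotypes at every marker fall into the same MLG. The Python below is a minimal illustration under the assumption of complete data (no missing genotypes); the function name and input layout are hypothetical.

```python
from collections import defaultdict

def multilocus_groups(population):
    """Group accessions whose genotypes are identical at every locus.

    population: dict accession -> list of per-locus genotypes (tuples of alleles).
    Alleles within a genotype are sorted so that (150, 154) equals (154, 150).
    Returns only the groups with at least two members.
    """
    groups = defaultdict(list)
    for acc, loci in population.items():
        key = tuple(tuple(sorted(g)) for g in loci)
        groups[key].append(acc)
    return [sorted(members) for members in groups.values() if len(members) > 1]

# Hypothetical accessions: the first two share one multilocus genotype.
demo = {
    "satsuma_1": [(150, 154), ("A", "G")],
    "satsuma_2": [(154, 150), ("A", "G")],
    "ponkan": [(150, 150), ("A", "A")],
}
print(multilocus_groups(demo))
```

Identity at the markers analysed does not show that two accessions are the same clone; as noted above, bud-sport mutants are expected to share an MLG.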

Fig. 1

NJ analyses with the SSR, indel and SNP data for the entire data set and all the ‘mandarin-like’ genotypes (1000 bootstraps). Numbers correspond to the ID numbers in Online Resource 1. Bootstrap values over 50 are shown. Entire data set (191 genotypes) representing the four ancestral Citrus species (C. reticulata, C. maxima, C. medica, Papeda) and Fortunella

Contribution of the ancestral taxa to the mandarin group and modern hybrids: analysis with the Structure software

The SSR, indel and SNP data were analysed with the Structure software to assess the contribution to the mandarin germplasm of the four ancestral Citrus taxa (C. reticulata (Sw), C. maxima, C. medica and Papeda) and Fortunella, using an admixture model and the option of correlated allele frequencies between populations. The degree of admixture alpha was inferred from the data. The burn-in period was set to 500,000, and MCMC (Markov Chain Monte Carlo) repetitions were set to 1,000,000; 10 runs of Structure with K = 5 (5 populations assumed) were performed. These populations were as follows: mandarin (Sw), C. indica and C. tachibana (28 samples, representing 17 Tanaka species), pummelo (10 samples), citron (6 samples), papeda (2 samples) and kumquat (4 samples). The other samples analysed (141) were assumed to have been derived from these ancestral populations (Online Resource 1). Assuming an admixture model between the four ancestral citrus species and Fortunella (Online Resource 1, genotypes 1–50), the relative proportion of these genomes in the mandarin group and recent hybrids was inferred from the complete data set (SSR + indel + SNP) (Fig. 2). Contributions lower than 5 % were not considered, since they may be artefacts or due to homoplasy in the SSR markers (Barkley et al. 2009).

Fig. 2

Structure analysis of 191 genotypes representing the four ancestral Citrus species (C. reticulata, C. maxima, C. medica, Papeda) and Fortunella. Dark blue, C. reticulata (Sw.) (1); dark red, C. maxima (2); green, C. medica (3); purple, Papeda (4); pink, Fortunella (5). Genotypes 51–191 are genotypes without assigned populations. Numbers correspond to the IDs in Online Resource 1

Thirteen of the 50 genotypes assumed to belong to one of the ancestral citrus populations or to Fortunella appeared to contain a certain degree of contribution from other ancestors. This was particularly the case for genotypes considered as mandarin species by Tanaka. The two C. amblycarpa (Hassk.) Ochse (only differing by five SSR markers) had a very high contribution from the Papeda genome (∼55 %), with the remainder (∼45 %) from C. reticulata. Citrus deliciosa [ID 3] had a 4 % contribution from other ancestors, and C. depressa [ID 4] had introgression from Fortunella (∼15 %). According to the Structure analysis, C. indica Tan. [ID 7] had a tri-hybrid genome origin (25 % from C. reticulata, 45 % from C. medica and 30 % from Papeda). The two C. nobilis [ID 10 and ID 11] had around 15 % introgression from C. maxima. Citrus suavissima Hort. ex Tan. [ID 18] and C. succosa Hort. ex Tan. [ID 19] presented introgression from C. maxima (∼7 %). Citrus tachibana [ID 24] appeared to have introgression from Fortunella (∼19 %). The two C. unshiu [ID 27 and ID 28] had introgression from the C. maxima genome (∼14 %). One ancestral Papeda, C. hystrix DC., had a 6 % contribution from Fortunella.

The contribution of mandarin to the genomes of the 141 ‘mandarin-like’ genotypes that were not included in any of the five pre-assumed populations (Online Resource 1, genotypes ID 51–ID 191) was on average ∼87 %. Pummelo contributed on average 10 %, and papeda, kumquat and citron contributions were lower than 5 % (Fig. 3). Contributions in individual genotypes lower than 5 % were not considered for the calculations.
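The averaging with a 5 % cut-off can be sketched as follows. The text does not state how discarded values were treated, so this Python example makes an explicit assumption: contributions below the threshold are set to zero and the remaining values are not rescaled. The function name and input layout are hypothetical.

```python
def mean_contributions(q_rows, threshold=0.05):
    """Average ancestry proportions after zeroing values below the threshold.

    q_rows: list of dicts, ancestral group -> proportion for one genotype.
    Assumption: discarded values are set to zero and the rest are not rescaled.
    """
    if not q_rows:
        raise ValueError("no genotypes supplied")
    groups = sorted({g for row in q_rows for g in row})
    totals = dict.fromkeys(groups, 0.0)
    for row in q_rows:
        for g in groups:
            q = row.get(g, 0.0)
            if q >= threshold:
                totals[g] += q
    return {g: totals[g] / len(q_rows) for g in groups}

# Hypothetical admixture proportions for two genotypes.
rows = [
    {"mandarin": 0.96, "pummelo": 0.04},
    {"mandarin": 0.80, "pummelo": 0.20},
]
print(mean_contributions(rows))
```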

Fig. 3

Contributions of the ancestral genomes (mandarin, pummelo, citron, papeda) and kumquat to the ‘mandarin-like’ genotypes under study. (Mand, blue) mandarin, (Pum, dark red) pummelo, (Cit, green) citron, (Pap, purple) papeda, (For, pink) kumquat

In the whole data set, only the citrus ancestors (C. maxima, C. medica and Papeda) and Fortunella did not exhibit any contribution from the mandarin genome. The 141 genotypes analysed with no assumed population had at least 60 % contribution from C. reticulata. Seventy-seven genotypes had a C. maxima contribution of at least 5 %, with a maximum of 50 %. Papeda contributed 5–10 % to 11 genotypes, C. medica contributed 10–20 % to only one genotype and Fortunella contributed 5–20 % to five genotypes.

Inferring clusters in the mandarin population

The statistics used to select the correct K value were those of Evanno et al. (2005): the mean likelihood, L(K); the mean difference between successive likelihood values of K, L′(K); the absolute value of the difference between successive values of L′(K), |L″(K)|; and ΔK, which is the mean of the absolute values of L″(K) divided by the standard deviation of L(K). The likelihood distribution L(K) and ΔK were the main values used to choose the optimal K value of the population. One analysis was performed to obtain the correct number of groups within the mandarin germplasm.
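The ΔK statistic described above can be computed from the log-likelihoods of replicate Structure runs. The Python sketch below follows the definition given in the text, with L″(K) taken as L(K+1) − 2L(K) + L(K−1). Pairing replicate runs by their position in each list is an assumption of this sketch, and the function name and input values are hypothetical.

```python
from statistics import mean, stdev

def evanno_delta_k(lnp):
    """Compute delta K from replicate Structure runs.

    lnp: dict K -> list of ln P(D) values, one per run (same run count for each K).
    Returns dict K -> delta K for every K that has both K-1 and K+1 available.
    """
    delta = {}
    for k in sorted(lnp):
        if k - 1 not in lnp or k + 1 not in lnp:
            continue
        # |L''(K)| per replicate: |L(K+1) - 2 L(K) + L(K-1)|
        second_diff = [
            abs(nxt - 2 * cur + prev)
            for prev, cur, nxt in zip(lnp[k - 1], lnp[k], lnp[k + 1])
        ]
        sd = stdev(lnp[k])
        delta[k] = mean(second_diff) / sd if sd > 0 else float("inf")
    return delta

# Hypothetical ln P(D) values for two replicate runs at K = 1..4.
runs = {
    1: [-1000, -1002],
    2: [-900, -902],
    3: [-700, -702],
    4: [-690, -694],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because ΔK needs both neighbours of K, it cannot be evaluated at the smallest or largest K tested.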

Fig. 4

DARwin analysis comparing the Structure populations found to the groups observed with DARwin within the mandarin germplasm (166 genotypes). Bootstrap values over 50 are represented. One thousand resamplings were performed. Seven mandarin groups were identified at the nuclear level. (N) Nuclear group. N1, light blue (all C. deliciosa); N2, purple (C. nobilis); N3, orange (mandarin hybrids); N4, red (all C. unshiu); N5, green (C. reticulata); N6, light red (C. tangerina and others); N7, grey (C. depressa, C. reshni, C. sunki, C. tachibana)

One hundred sixty-six genotypes were selected from the first Structure analysis (191 genotypes, K = 5, Fig. 2), discarding the genotypes of the four ancestral taxa and three of the mandarin genotypes considered by Tanaka as species, but that showed strong introgression from other ancestors (the two C. amblycarpa [ID 1 and ID 2] and C. indica [ID 7]). The Structure analysis was performed with no prior population assignment. The ΔK statistic indicated an optimal K of 7 (Online Resource 6).

Structure analysis was compared with an NJ tree (Fig. 4) to validate the clustering. Genotypes included in each of the seven mandarin groups identified are presented in Table 3 in relation to the Tanaka classification. Nuclear group 1 (N1) included seven genotypes (all C. deliciosa); nuclear group 2 (N2) included three genotypes, two C. nobilis and one unknown; nuclear group 3 (N3) was formed by two genotypes of unknown origin [ID 83 and ID 188]; nuclear group 4 (N4) included seven genotypes (all C. unshiu); nuclear group 5 (N5) included 18 genotypes (94.44 % C. reticulata); nuclear group 6 (N6) included 13 genotypes (mainly C. tangerina and C. reticulata); and nuclear group 7 (N7) included ten genotypes (small mandarins, like C. depressa, C. tachibana, C. reshni or C. sunki). The genotypes included in N3 exhibited a high degree of mutual similarity, shared a high percentage of their molecular marker data with C. sinensis (ID 83, 85.33 %; ID 188, 96 %) and exhibited high heterozygosity. Therefore, they are very probably interspecific hybrids, similar to sweet orange.

Fig. 5

Mandarins’ genome structure. Contribution of the seven parental mandarin groups (N1–N7) into the mandarin genome portion of each ‘mandarin-like’ genotype under study. Contributions lower than 5 % were discarded. N1, light blue (all C. deliciosa); N2, purple (C. nobilis); N3, orange (mandarin hybrids); N4, red (all C. unshiu); N5, green (C. reticulata); N6, light red (C. tangerina and others); N7, grey (C. depressa, C. reshni, C. sunki, C. tachibana)

Table 3 Parental mandarin groups identified at the nuclear level within the mandarin germplasm

Contribution of the various mandarin groups to the constitution of the other mandarin genomes

The contributions of the seven mandarin groups identified within the mandarin germplasm to the other mandarins under study are summarised in Fig. 5 and their population statistics in Table 4. Ho was higher than He for the seven groups for SSR, indel and SNP markers, leading to negative Fw values. The whole ‘mandarin-like’ population exhibited a similar pattern. The mandarin-like genotypes exhibited complex hybrid structures with contributions from more than two genomes (Online Resource 7).

Table 4 Summary statistics of the whole mandarin population and the seven groups identified at the nuclear level (N1–N7)

The average contribution of each nuclear group to the genotypes not included in any defined population was 15 % for N1, 9 % for N2, 30 % for N3, 9 % for N4, 15 % for N5, 10 % for N6 and 13 % for N7.

Data from hybrids with known parents were checked in order to validate the analysis of the contributions from the mandarin groups identified (accessions ID 144 to ID 171). Most of them agreed with their known origins; therefore, the origins of other genotypes can be accepted from this analysis. For example, the hybrid mandarin ‘Palazzeli’ ([ID 165]; C. clementina × ‘King’) had contributions from three different ‘mandarin-like’ genomes, defined as groups N1, N2 and N3 of the present Structure analysis, which come from its supposed parents, clementines ([ID 54, ID 55 and ID 56]; genomes from groups N1 and N3) and tangor ‘King’ ([ID 178]; included in group N2).

Another example is the hybrid mandarin ‘Simeto’ [ID 168], which was obtained from a cross between C. unshiu and C. deliciosa. Our study confirms this cross (almost 50 % from each parent).

On the other hand, some discrepancies between the Structure results and the supposed parental origin can be explained by misidentified parentage. The ‘Fortune’ mandarin [ID 155] was reported to come from a cross between a clementine and ‘Dancy’ [ID 26], made by Furr (1964). However, the Structure analysis showed that ‘Fortune’ has almost no contribution from the N5 genome, while ‘Dancy’ has a 75 % contribution. The false parental origin was confirmed by checking individual loci: at 15 of the 50 SSR markers, one indel and four SNP markers, ‘Fortune’ possesses an allele absent from both ‘Dancy’ and clementine. Barry et al. (2015) already showed that ‘Dancy’ is not a parent of ‘Fortune’ mandarin. Similar observations were made for ‘Fremont’ [ID 156], a supposed hybrid between C. clementina and C. reticulata [Tan.] ‘Ponkan’ [ID 15], made by Furr (1964). Indeed, for 11 SSR markers, this hybrid possesses alleles that are not observed in its supposed parents. Moreover, ‘Fremont’ has almost no contribution from the N5 genome, while ‘Ponkan’ belongs to this nuclear group.
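The locus-by-locus check of a proposed parentage can be illustrated with a minimal Python sketch. It assumes diploid codominant markers and simple Mendelian inheritance: at each locus the offspring must carry one allele found in one parent and the other allele found in the second parent. Null alleles, mutation and genotyping error are ignored. The function name and the genotypes in the example are hypothetical.

```python
def incompatible_loci(offspring, parent1, parent2):
    """Return indices of loci where the offspring cannot derive from the two parents.

    Each argument is a list of diploid genotypes (2-tuples), one per locus;
    None marks missing data and such loci are skipped.
    A locus is compatible if one offspring allele is found in parent1 and the
    other in parent2 (in either order).
    """
    bad = []
    for i, (o, p1, p2) in enumerate(zip(offspring, parent1, parent2)):
        if o is None or p1 is None or p2 is None:
            continue
        a, b = o
        ok = (a in p1 and b in p2) or (a in p2 and b in p1)
        if not ok:
            bad.append(i)
    return bad

# Hypothetical genotypes at three loci.
hybrid = [(150, 154), (200, 204), ("A", "A")]
mother = [(150, 150), (200, 202), ("A", "G")]
father = [(154, 158), (200, 202), ("G", "G")]
print(incompatible_loci(hybrid, mother, father))
```

In this example the second locus fails because the hybrid carries an allele absent from both candidate parents, and the third fails because only one of the candidates could have transmitted the allele for which the hybrid is homozygous.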

Cytoplasmic analysis (mitochondria and chloroplast)

The summary of the maternal origin information is shown in Online Resource 1. In the whole population (199 genotypes, including the secondary species), the mitochondrial markers allowed discrimination of six mitotypes (Mito; Fig. 6a), previously described by Froelicher et al. (2011). One Fortunella genotype (F. hindsii; ID 48) was associated with the Papeda (C. micrantha) mitotype. In the mandarin group (166 genotypes), five mitotypes were distinguished: two of mandarins (Mito1 and Mito2), one identical to C. maxima (Mito3), one identical to C. medica (Mito4) and one identical to C. micrantha (Papeda, Mito5). The first mandarin mitotype (Mito1) included most of the genotypes studied. In the second mitotype (Mito2), 18 genotypes were present; 11 of them were acid mandarins (four C. depressa [ID 4, ID 67, ID 68, ID 69], three C. sunki [ID 22, ID 23, ID 128], C. reshni [ID 14], C. daoxianensis [ID 52] and two ‘Sun chu sha’ [ID 16, ID 122]), and seven were sweet genotypes (C. tankan Hay. [ID 53], C. kinokuni [ID 71], C. tangerina [ID 134] and four C. reticulata [ID 92, ID 94, ID 108 and ID 110]). The C. maxima mitotype (Mito3) included mandarin cultivars [ID 83, ID 118, ID 140, ID 124, ID 127, ID 74, ID 18, ID 164 and ID 103], as well as a tangor ([ID 180]; C. sinensis × ‘Dancy’) and the tangelos. The C. medica mitotype (Mito4) included C. indica [ID 7]. The Papeda mitotype (Mito5) included ‘Nicaragua’ [ID 112] and the two C. amblycarpa [ID 1 and ID 2].

Fig. 6

NJ trees of 199 varieties of Citrus and Severinia buxifolia used as an out-group with mitochondrial and chloroplastic markers. a Mitochondrial markers. b Chloroplastic markers. c Combination of mitochondrial and chloroplastic markers

The chloroplastic markers discriminated two main mandarin chlorotypes (Chloro1 and Chloro2; Fig. 6b). Chloro1 included most of the mandarins, and Chloro2 included the same 18 genotypes as Mito2. Besides these two main chlorotypes, there were other minor ones, each including only one genotype, e.g. mandarin ‘Suntara’ ([ID 118]; shared with two secondary species, C. aurantium and C. limon), C. tachibana [ID 24] and C. indica [ID 7].

In the whole population, the combination of chloroplastic and mitochondrial markers (Fig. 6c) differentiated 10 cytotypes (Cyto). Four of them corresponded to C. maxima, C. medica, Papeda and Fortunella. The C. maxima cytotype included eight mandarin cultivars [ID 83, ID 140, ID 124, ID 127, ID 74, ID 18, ID 164 and ID 103], as well as tangors and tangelos. The Papeda cytotype included ‘Nicaragua’ [ID 112] and C. amblycarpa [ID 1, ID 2]. Two main cytotypes were found within mandarins. The first (Cyto1) included most of the mandarins, mainly sweet genotypes. The second (Cyto2) included 18 genotypes: C. sunki [ID 22, ID 23, ID 128], ‘Sun chu sha’ [ID 16, ID 122], C. daoxianensis [ID 52], C. depressa [ID 4, ID 67, ID 68, ID 69], C. reshni [ID 14], C. tankan [ID 58] and other genotypes [ID 80, ID 92, ID 94, ID 108, ID 110 and ID 134]. Mandarin ‘Suntara’ [ID 118] appeared with an independent cytotype (shared with the secondary species C. aurantium and C. limon), as did C. tachibana [ID 24] and C. indica [ID 7].

Discussion

Genetic structure of the studied population

SSR markers are more polymorphic than indel and SNP markers. The average numbers of alleles, gene diversity and heterozygosity were all higher in SSR markers. The combination of the three types of markers allowed differentiation of the mandarin group from the other ancestors and revealed diversity within the mandarin group (mainly from SSR markers), as reported by Garcia-Lor et al. (2012).

The clear differentiation of mandarins from C. maxima, C. medica, C. micrantha and Fortunella (Fig. 1) has been described in several studies (Nicolosi et al. 2000; Barkley et al. 2006; Garcia-Lor et al. 2012, 2013a). Moreover, as previously observed by Federici et al. (1998) and Barkley et al. (2006), the mandarin group was not well resolved (low bootstrap support in many branches; Fig. 1), perhaps due to the large number of hybrids.

Some accessions displayed the same genotype for the analysed markers. Among the groups of accessions with identical genotypes, the groups MLG1, MLG2 and MLG8 are known to have been diversified by the selection of natural mutations and are probably distinguished only by point mutations (satsumas and clementines). Therefore, the probability of distinguishing them by analysis of molecular markers such as SSRs, indels or SNPs is very low. On the other hand, the clusters MLG3, MLG4, MLG5, MLG6 and MLG7 include genotypes for which there is no clear prior information about their origin; therefore, they may represent either derivative mutants of this kind or simply redundancies within the germplasm collections.

The overall Fw value among all loci and all genotypes was relatively low (0.02; 0.04 for SSR, 0.29 for indel and 0.12 for SNP) when compared to the high structuration (positive Fw) observed by Garcia-Lor et al. (2012) in the citrus genus. This may be due to the large proportion of mandarin hybrids within the population under study. The Fw values observed for all the mandarin-like genotypes and the representatives of the Tanaka mandarin species were close to 0. This is a favourable situation for using the Structure software, which assumes that the populations are in Hardy-Weinberg equilibrium (Pritchard et al. 2000).

Cytoplasmic and nuclear data reveal interspecific hybridisation and introgression of ancestral genomes into mandarin varieties

Mitochondrial and chloroplastic markers have been previously used to reveal maternal phylogeny in Citrus (Green et al. 1986; Yamamoto et al. 1993; Bayer et al. 2009; Morton 2009). In our study, six mitotypes were found (pummelo, micrantha, citron, mandarin mitotype Mito1, mandarin mitotype Mito2 and Fortunella), all of them observed by Froelicher et al. (2011), who proposed a distinction between the acid mandarin and sweet mandarin mitotypes. The mandarin germplasm (166 genotypes) was represented in five of the six identified mitotypes; two of them included mandarin and ‘mandarin-like’ genotypes (Mito1, Mito2), and three corresponded to other ancestral species (Mito3, Mito4, Mito5). Our results, obtained with a large mandarin panel, show that the denomination of acid mandarin and sweet mandarin mitotypes proposed by Froelicher et al. (2011) may not be appropriate, since we found sweet mandarin genotypes that share the supposed acid mitotype (7 of the 18 genotypes in mandarin mitotype 2 are sweet mandarins).

Three ‘mandarins’ (‘Nicaragua’ [ID 112] and the two C. amblycarpa [ID 1 and ID 2]) have a Papeda mitotype (Mito5), and nine have a C. maxima mitotype (Mito3). For example, ‘Bendiguangju’ mandarin ([ID 163]; C. unshiu, according to Tanaka classification) exhibited a pummelo rather than mandarin cytoplasm, as reported by Cheng et al. (2005) in a chloroplast DNA analysis and Froelicher et al. (2011) in a mitochondrial DNA analysis. At the nuclear level, however, we observed a close relationship between ‘Bendiguangju’ and satsumas, confirming the data of Nicolosi et al. (2000). The genotypes included in the Papeda and C. maxima mitotypes are interspecific hybrids, and not true mandarins, according to the Structure analysis.

At the chloroplast level, most mandarins formed a main cluster, as already observed by Nicolosi et al. (2000), Cheng et al. (2005) and Yamamoto et al. (2013). We also observed a second chlorotype formed mainly by acid mandarins, as observed by Froelicher et al. (2011) and Yamamoto et al. (2013). Mandarins are well differentiated from the other citrus ancestral species (C. maxima, C. medica and Papeda), which are separated into different clusters in our study and others (Cheng et al. 2005; Bayer et al. 2009), but not in Yamamoto et al. (2013). Some genotypes appeared outside the main cluster: C. tachibana [ID 24] appeared in a different sub-cluster (its own chlorotype), as found by Nicolosi et al. (2000), Penjor et al. (2013) and Yamamoto et al. (2013), and mandarin ‘Suntara’ [ID 118] appeared with an independent chlorotype that may be related to C. aurantium, as observed by Garcia-Lor et al. (2012).

Among the mandarin species considered by Tanaka, we identified some interspecific hybrids, such as C. amblycarpa, which appears to be a cross between the papeda and mandarin gene pools with a maternal phylogeny from papeda, as already observed by Froelicher et al. (2011). By contrast, Federici et al. (1998) and Barkley et al. (2006) considered C. amblycarpa to be the result of a cross between C. reticulata and C. aurantifolia. The latter study observed contributions of three genomes: C. reticulata (∼60 %), C. medica (∼25 %) and Papeda (∼15 %). Our results show that the C. amblycarpa genotypes had approximately equal (∼50 %) nuclear contributions from Papeda and C. reticulata, suggesting an origin from direct interspecific hybridisation between them, as proposed from a large SNP analysis (Ollitrault et al. 2012a).

Citrus indica clustered within the citron group at the nuclear level. It had contributions of 46 % from citron, 30 % from papeda and 24 % from mandarin genomes, as well as a very high observed heterozygosity (61.33 %), indicating that it originated as an interspecific hybrid. In our study, C. indica had a citron mitotype, whereas Nicolosi et al. (2000) clustered C. indica with the citron on the basis of cpDNA markers. It has its own chlorotype, and therefore its own cytotype, separated from that of citron, although close to it. This may be because the female citron parent (pure or hybrid) is not present in our collection.

Citrus tachibana was considered native to Japan by Hirai et al. (1990) and to be a wild species of mandarin by Swingle and Reece (1967). It was later clustered with the mandarins by Nicolosi et al. (2000). Our results indicate that it is not a pure mandarin. Indeed, C. tachibana clustered with the mandarin mitotype Mito1, but it has its own chlorotype and cytotype, and displays contributions from different genomes, mainly C. reticulata (Sw) and Fortunella genomes at the nuclear level. It is included in the nuclear group 7 together with C. sunki, C. reshni and C. depressa. The high H o (54.67 %) also suggests that C. tachibana is an interspecific hybrid. The C. tachibana genotype present in our collections may not be the original tachibana from Japan and may be a hybrid with a Chinese genotype (Hirai et al. 1990).

Citrus daoxianensis is mostly of C. reticulata origin (92 %, considering the first Structure analysis), with small introgressions of three different genomes (pummelo, papeda and kumquat), each lower than 5 % and therefore not considered significant. This result is in agreement with Li et al. (1992), who considered C. daoxianensis to be a wild mandarin, and with Curk et al. (2015), who did not find introgression from other genomes. On the other hand, in the second Structure analysis, we found an 80 % contribution of the acidic group and the rest from other mandarin groups, which indicates that it is a hybrid between mandarins.

Introgressions from other genomes were also suggested by the Structure analysis for other genotypes that Tanaka considered to be pure mandarin species and that come from the area of origin. The Fortunella genome (∼15 %) is present in C. depressa and the C. maxima genome (∼8 %) in C. succosa. Similar genome contributions, albeit at different percentages, were found by Barkley et al. (2006). Those authors reported that the genome of C. depressa is shared between C. reticulata and Papeda in equal proportions, whereas we observed a higher contribution from C. reticulata (∼81 %) and a contribution from Fortunella (∼15 %). It is also remarkable that some other genotypes, like C. unshiu and C. tankan, have contributions from C. maxima. Citrus nobilis was also considered a species by Tanaka (group 1), but other authors (Coletta-Filho et al. 1998; Nicolosi et al. 2000; Garcia-Lor et al. 2012, 2013a) considered it a tangor, with introgression from the C. maxima genome. Our results confirm this pummelo introgression in the C. nobilis genotypes analysed (‘Geleking’ ∼22 % and ‘Campeona’ ∼13 %), which was also shown by 454 amplicon sequencing (Curk et al. 2014). Cytoplasmic data include it within the mandarins. At the nuclear level, it appears as an independent group (N2).

This introgression of other taxa in varieties considered as pure mandarins was previously found in molecular marker studies (Nicolosi et al. 2000; Barkley et al. 2006) and more recently from NGS data (Curk et al. 2014; Wu et al. 2014) and in a 454 sequencing work (Curk et al. 2015). Because our markers are discretely dispersed throughout the genome (124 markers), it is probable that we have missed small introgressions in some varieties, such as the 4 % C. maxima introgression identified by whole-genome sequencing (WGS) (Wu et al. 2014) in ‘Ponkan’ and ‘Willowleaf’, which was not detected in our study. On the other hand, many more mandarin genotypes have been used in this study than in previous molecular marker studies and the recent WGS studies.

The other mandarin hybrids, tangors, tangelos and clementines that we analysed exhibited similar contributions from ancestral genomes to those reported by Garcia-Lor et al. (2012), with higher introgression of pummelo in tangelos than in tangors. Some genotypes of unknown origin included in the study, and not related to the mandarin species defined by Tanaka, exhibited complex genomic structures.

Our Structure analysis showed that many ‘mandarin-like’ genotypes are introgressed by other ancestral species, as reported by Barkley et al. (2006). In our work, the ancestor with the highest contribution to the mandarin germplasm was C. maxima (∼9 %), instead of the Papeda/Fortunella group reported by Barkley et al. (2006). Moreover, we extended the study to a much larger number of genotypes than in previously published works.

Organisation of the mandarin germplasm

The two main Citrus classification systems of Swingle and Reece (1967) and Tanaka (1954) differ greatly in their treatments of the mandarins. The former system placed all mandarins in one species, C. reticulata, whereas the latter divided them into 36 species. Neither of the two systems is completely right, as discussed in many reports (Nicolosi et al. 2000; Barkley et al. 2006; Federici et al. 1998). Different studies have tried to define groups within the mandarins. Coletta-Filho et al. (1998) studied 35 accessions of mandarins and divided them into two main groups consisting of two and seven subgroups, which agreed partially with Tanaka’s (1954) and Webber’s (1943) taxonomic groups. Koehler-Santos et al. (2003) characterised 34 different genotypes from a Brazilian collection and described five groups, different from the ones found by Coletta-Filho et al. (1998). Kaçar et al. (2013) characterised 65 mandarin genotypes of the Tuzcu Citrus Variety Collection in Turkey, using 14 SSRs and 21 SRAP markers, resulting in two main groups: one including only tangelo ‘Orlando’ and the other including the rest (clementines, other tangelos, etc.).

In this work, a broad range of samples representing the mandarin germplasm (ancient cultivars from Asia, old and recent natural hybrids and human-made hybrids) were analysed to clarify the structure of this highly diversified group. After two analyses with the Structure software (Figs. 2 and 4), it is clear that there is introgression of ancestral genomes within many mandarins, mainly from C. maxima. Moreover, seven groups were identified within the mandarin germplasm at the nuclear level (N1–N7; Fig. 4, Table 3), although there is not a strong differentiation between nuclear groups, as confirmed by the F st value (0.445). These mandarin groups exhibited higher allelic diversity for SSRs than for indel and SNP markers. The negative F w values observed in these groups are probably due to fixed heterozygosity resulting from apomixis and vegetative propagation of citrus varieties. The global mandarin population has an F w value close to 0, reflecting strong intergroup gene flow.
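To illustrate how a differentiation statistic of this kind behaves, the sketch below computes a Nei-type F st for one locus from group allele frequencies, as (H T - H S) / H T with equal group weights. This estimator is an assumption made for illustration only; the estimator used in this study may differ, and the allele frequencies shown are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity, 1 - sum(p^2), from a dict allele -> frequency."""
    return 1.0 - sum(p * p for p in freqs.values())


def nei_fst(group_freqs):
    """Nei-type F st for one locus.

    group_freqs: list of dicts (allele -> frequency), one dict per group.
    Groups are weighted equally, which is a simplifying assumption.
    """
    k = len(group_freqs)
    # H S: mean expected heterozygosity within groups.
    hs = sum(expected_het(f) for f in group_freqs) / k
    # H T: expected heterozygosity computed from the mean allele frequencies.
    alleles = set().union(*group_freqs)
    mean_freqs = {a: sum(f.get(a, 0.0) for f in group_freqs) / k for a in alleles}
    ht = expected_het(mean_freqs)
    # No diversity at all means no differentiation can be measured.
    return (ht - hs) / ht if ht > 0 else 0.0


# Hypothetical groups with contrasting allele frequencies.
divergent = [{"A": 0.9, "B": 0.1}, {"A": 0.1, "B": 0.9}]
# Hypothetical groups with identical allele frequencies.
identical = [{"A": 0.6, "B": 0.4}, {"A": 0.6, "B": 0.4}]

print(nei_fst(divergent))
print(nei_fst(identical))
```

With this definition, groups sharing the same allele frequencies give a value of 0, and the value rises towards 1 as the groups become fixed for different alleles.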

Five nuclear groups, N1, N2, N4, N5 and N6, share the same mandarin mitotype (Mito1). The N3 group is clustered with the pummelo mitotype. Genotypes from N7 were present in three mitotypes (Mito1, Mito2 and Papeda). Most of the genotypes sharing the mandarin mitotype Mito2 are also differentiated at the nuclear level, having a high contribution of mandarin nuclear group N7, like C. daoxianensis (80 %). The N7 group was dispersed across different chlorotype and cytotype groups, making it the most heterogeneous group at the chloroplast level. Tanaka (1954) divided the acid mandarin genotypes into two groups, with C. reshni, C. sunki and C. tachibana in group 4 and C. depressa in group 5. These are joined in our analysis at the nuclear level but separated into two different groups at the cytoplasmic level, with C. reshni, C. sunki and C. depressa in one cytotype and C. tachibana in another.

Tanaka (1954) grouped the 36 mandarin species that he considered into five clusters. These clusters are compared with the nuclear groups found in our study (Table 5). One cluster included the species C. nobilis and C. unshiu, which seem to be of interspecific hybrid origin (mandarin × pummelo) in our analysis and are separated into two nuclear groups (N2 and N4). The second cluster included species not analysed in our study. The third cluster had 14 species, including C. clementina (considered in our study as a hybrid and not a pure mandarin species), C. deliciosa, C. reticulata (Tan.) and C. tangerina, which appear in our work as different mandarin groups (N1, N5 and N6). The fourth Tanaka group was formed by C. reshni, C. sunki and C. tachibana, and the fifth group included C. depressa and C. lycopersicaeformis. These two groups are included in a single nuclear group in our study (N7). Of these species, C. tachibana is separated at the cytoplasmic level. Other Tanaka species, such as C. erythrosa and C. suhuiensis, seem to have originated from hybridisation between mandarin groups.

Table 5 Comparison of the mandarin organisation between the Tanaka groups and the nuclear groups found in this study

Hodgson (1967) divided the mandarins into four groups: C. unshiu, C. reticulata (Tan.) (‘Ponkan’, ‘Dancy’, clementines), C. deliciosa and C. nobilis (‘King’). Three groups are fully in agreement with our results: C. deliciosa (N1), C. nobilis (N2) and C. unshiu (N4). The fourth group defined by Hodgson, C. reticulata (Tan.), included a known hybrid (C. clementina, a mix of groups N1 and N3) and two genotypes: ‘Ponkan’, which is within the C. reticulata (Tan.) group (N5), and ‘Dancy’, which has a large contribution from this group (75 %).

The contributions of the seven mandarin groups identified in the mandarin germplasm (Fig. 4), besides the contributions of the other ancestral taxa and Fortunella (Fig. 2), were estimated for the entire ‘mandarin-like’ collection. These analyses revealed that the genomes of most ‘mandarin-like’ genotypes are complex admixtures of the mandarin groups and even include contributions from the other ancestral populations.

Most of the hybrids with known origins displayed admixture coherent with the genomic structures of their supposed parents. Because most of these parents are themselves heterozygous and admixed, the proportion of each genome in the hybrid variety is not inherited in an additive way (i.e. the sum of half shares of each parent) but instead depends on the recombination and segregation occurring in each parental gamete (Motohashi et al. 1992).
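The effect of segregation on inherited genome proportions can be illustrated with a small simulation. This is a minimal sketch under strong simplifying assumptions: loci are treated as unlinked (no recombination blocks are modelled), each allele is labelled only by its ancestral origin ("M" for mandarin, "P" for pummelo), and the two parents are hypothetical. The locus count of 124 simply mirrors the size of the marker panel.

```python
import random


def make_gamete(parent, rng):
    # One allele per locus drawn at random: independent segregation, no linkage.
    return [rng.choice(locus) for locus in parent]


def cross(parent1, parent2, rng):
    # A hybrid receives one gamete from each parent.
    return list(zip(make_gamete(parent1, rng), make_gamete(parent2, rng)))


def share(genome, label):
    # Proportion of alleles in the genome carrying the given ancestry label.
    alleles = [allele for locus in genome for allele in locus]
    return alleles.count(label) / len(alleles)


rng = random.Random(1)

# Hypothetical parent 1: heterozygous mandarin/pummelo at 30 of 124 loci.
parent1 = [("M", "P")] * 30 + [("M", "M")] * 94
# Hypothetical parent 2: mandarin alleles at all 124 loci.
parent2 = [("M", "M")] * 124

# Additive expectation: the sum of half shares of each parent.
additive = (share(parent1, "P") + share(parent2, "P")) / 2
# Pummelo share actually inherited by 1000 simulated hybrids.
offspring = [share(cross(parent1, parent2, rng), "P") for _ in range(1000)]

print("additive expectation:", additive)
print("range among hybrids:", min(offspring), max(offspring))
```

In this sketch the additive value is only the average over many hybrids. Any single hybrid can inherit anything from none to all of the pummelo alleles carried in the heterozygous state by its parent, which is why the genome proportions of an individual variety can depart from the half shares of its parents.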

For some accessions, the admixture structure did not agree with their supposed parents. In these cases, allele checking confirmed that the supposed parental origins were erroneous. Further analyses could provide more clues toward the identification of parents for these hybrids.
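A check of this kind can be sketched as a simple exclusion test: at each locus, the two alleles of the hybrid must be explainable as one allele from each supposed parent. The code below is only an illustration of that principle and not the procedure used in this study; the function name, marker names and allele sizes are all hypothetical. Null alleles and genotyping errors are ignored.

```python
def incompatible_loci(hybrid, parent1, parent2):
    """Return the loci at which the hybrid cannot derive from the two parents.

    Each argument maps a locus name to a tuple of two alleles.
    """
    excluded = []
    for locus, (a, b) in hybrid.items():
        p1, p2 = parent1[locus], parent2[locus]
        # One allele must come from parent 1 and the other from parent 2.
        compatible = (a in p1 and b in p2) or (b in p1 and a in p2)
        if not compatible:
            excluded.append(locus)
    return excluded


# Hypothetical SSR genotypes (allele sizes in base pairs).
parent_a = {"ssr1": (150, 154), "ssr2": (200, 200), "ssr3": (98, 102)}
parent_b = {"ssr1": (152, 154), "ssr2": (204, 208), "ssr3": (98, 98)}
hybrid_ok = {"ssr1": (150, 152), "ssr2": (200, 208), "ssr3": (98, 102)}
hybrid_bad = {"ssr1": (150, 152), "ssr2": (204, 208), "ssr3": (100, 102)}

print(incompatible_loci(hybrid_ok, parent_a, parent_b))
print(incompatible_loci(hybrid_bad, parent_a, parent_b))
```

A supposed parentage that produces exclusions at several loci is unlikely to be correct, whereas a single exclusion may also reflect a null allele or a scoring error.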

In summary, this work shows that the mandarin horticultural varietal group is highly polymorphic and that many genotypes believed to be pure mandarins have introgressions from other basic taxa in their genomes. Moreover, some of them exhibited non-mandarin maternal phylogeny. Another characteristic of the mandarin group is that many genotypes originated from crosses between mandarins. These data indicate that many mandarins (including ancient genotypes coming from the mandarin area of origin) are really hybrids and not pure mandarins. This idea is in agreement with the recent results obtained from the sequenced genomes of a few mandarins, and it leaves open the question of which genotype or genotypes represent the ancestral mandarin, which has not yet been identified.

This work has provided new insights into the structure of the mandarin germplasm. It will help to improve the management of the two germplasm collections studied and will allow the implementation of a database containing all the molecular characterisation data. It is already helping to select appropriate genotypes for new crosses in order to generate new diversity, and to define new strategies for the ongoing citrus triploid breeding programme.