Introduction

Brassica crops are important sources of food, feed, and fuel in many parts of the world and provide a unique opportunity for investigating plant genome evolution. The six widely cultivated Brassica species, including three diploid species B. rapa (A genome), B. nigra (B genome), and B. oleracea (C genome) and three amphidiploid species B. juncea (A and B genomes), B. napus (A and C genomes), and B. carinata (B and C genomes), are a classic example of the importance of polyploidy in plant evolution, as illustrated by the triangle of U (Nagaharu, 1935). Furthermore, the B. rapa crops, especially the two subspecies ssp. pekinensis (Chinese cabbage) and ssp. chinensis (Pak choi) (Labana and Gupta, 1993), play critical roles in vegetable production and supply in China and other Asian countries. In China, B. rapa is the top-ranked vegetable in terms of cultivated area and total yield and is extremely important to the country’s agricultural economy.

B. rapa crops are predominantly cross-pollinated diploids (2n = 20), with an estimated genome size of 485 Mb (Wang et al. 2011). As a hybrid crop, B. rapa is a model plant for genetic studies due to its high recombination rate and rich genetic diversity. In terms of breeding, the selection of diverse genetic resources possessing different agronomic characteristics and an understanding of the genetic relationships between these breeding materials are crucial for cultivar improvement. However, in many cases, we know very little about the ecology and population structure of these genetic materials. In addition, centuries of artificial selection for desirable traits have resulted in an overall loss of genetic diversity in many of the early self-pollinated inbred lines, which are important materials for B. rapa crop breeding. Therefore, it is imperative to understand the genetic diversity present within the available breeding lines using genome-wide molecular markers.

Over the past three decades, several different DNA marker technologies have been used to detect genetic diversity in the cultivated B. rapa gene pool, such as random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and simple sequence repeats (SSRs) (Song et al. 1988; Powell et al. 1996; Chen et al. 2000; Guo et al. 2002; He et al. 2003). At present, SNPs are the markers of choice for genome-wide analyses because of their high density across the genome, their high genetic stability, and the ease with which they can be adapted to automated genotyping methods. A number of high-throughput, cost-effective SNP genotyping platforms have been developed, such as the GoldenGate (Fan et al. 2003) and Infinium platforms (Steemers and Gunderson 2007), TaqMan (Livak et al. 1995), and the KASPar platform (KBiosciences, www.kbioscience.co.uk). Many of these platforms have been applied to important crop species such as barley, wheat, maize, soybean, cowpea, and pea (Allen et al. 2011; Cortés et al. 2011; Hiremath et al. 2012). KASPar is a user-friendly system that provides flexibility in the numbers of SNPs and genotypes to be used for assays. Because KASPar assays can genotype variable numbers of samples with variable numbers of SNPs, they have been developed for wheat, common bean, and chickpea (Allen et al. 2011; Cortés et al. 2011; Hiremath et al. 2012).

The aim of this study was to use the newly developed KASPar technology to establish a flexible and cost-effective SNP genotyping platform for B. rapa. In particular, using a broad panel of 231 accessions across four subspecies, B. rapa ssp. pekinensis, B. rapa ssp. chinensis, B. rapa ssp. rapifera, and B. rapa ssp. oleifera, the newly developed SNP markers were used to genotype the germplasm and to characterize marker polymorphism. Moreover, their utility for marker-associated research, germplasm characterization, and the assessment of population structure and phylogenetic relationships across the four subspecies was evaluated.

Materials and methods

Germplasm and DNA isolation

A total of 231 B. rapa inbred lines, which were collected from different areas in China, as well as Japan, Korea, Thailand, Pakistan, Nepal, Vietnam, Russia, England, the USA, and Australia, were used in this study (Supplementary Table 1).

Total DNA was extracted from two to three young leaves following a standard DNA isolation protocol (Su et al. 2014). DNA quality and concentration were measured with a NanoDrop 2000 UV spectrophotometer (Thermo Scientific, MA, USA), and working solutions were prepared at a concentration of 10 ng/μL.

Single nucleotide polymorphism and KASPar genotyping

SNPs were identified by using Illumina GA IIx technology on 10 genotypes, including two autumn-, two summer- and one spring-type accessions from ssp. pekinensis, two accessions from ssp. chinensis var. communis, and one each from subspecies chinensis var. communis (naibaicai), chinensis var. communis (heiyebaicai) and rapifera, which were selected from the 231 inbred lines. A total of 10.8 GB of 36-base short single-end DNA sequencing reads was generated from these genotypes. The identification of SNPs between the 10 genotypes and the reference genome of Chiifu-401-42 (v1.5) (Wang et al. 2011) was performed using the GATK software (McKenna et al. 2010).

High-quality SNP candidates were then selected for the KASPar assays. The strict criteria used for the selection of high-quality SNPs for KASPar assays were as follows: (1) the minor allele frequency (MAF) among the 10 genotypes must be ≥ 0.1, (2) the read depth must be ≥ 5, and (3) the potential SNP candidates must be evenly distributed across the whole genome. Finally, a total of 3183 high-quality SNPs passed these filters and were found suitable for KASPar assays.

For each SNP, two allele-specific forward primers and one common reverse primer were designed by LGC (Laboratory of the Government Chemist). Using these primers, KASPar assays were performed in final reaction volumes of 1 μL in 1536-well plates (No. KBS-0751-001, KBioscience), containing 1× KASP reaction mix (KBS-1016-011, KBioscience), 12 nM each allele-specific forward primer, 30 nM reverse primer, and 4 ng of genomic DNA. The Gene Pro Thermal cycler (Hydrocycler) was used for amplification with the following cycling conditions: 15 min at 94 °C, 10 touch-down cycles of 20 s at 94 °C and 60 s at 65–57 °C (the annealing temperature being reduced by 0.8 °C per cycle), and 26–42 cycles of 20 s at 94 °C and 60 s at 57 °C. Fluorescence detection of the reactions was performed using an Omega Fluorostar scanner (BMG PHERAstar), and the data were analyzed using KlusterCaller 1.1 software (KBioscience). Detailed instructions can be downloaded at www.kbioscience.co.uk.
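As an illustration of the touch-down phase, the short Python sketch below lists the annealing temperature of each of the 10 touch-down cycles. It assumes that the first cycle anneals at 65 °C and that each subsequent cycle is 0.8 °C cooler, so that the constant-temperature phase begins at 57 °C. The function name and its parameters are illustrative and are not part of the KASPar protocol.

```python
def touchdown_annealing_temps(start_c=65.0, step_c=0.8, cycles=10):
    """Return the annealing temperature (deg C) of each touch-down cycle.

    Assumption: cycle 1 anneals at start_c and every later cycle is
    step_c cooler than the one before it.
    """
    # Work in tenths of a degree so repeated subtraction cannot drift.
    start_tenths = round(start_c * 10)
    step_tenths = round(step_c * 10)
    return [(start_tenths - i * step_tenths) / 10 for i in range(cycles)]


temps = touchdown_annealing_temps()
print(temps)
```

Under this reading, the tenth touch-down cycle anneals at 57.8 °C, and one further 0.8 °C step gives the 57 °C used for the remaining 26–42 cycles.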

All markers were screened across the 231 genotypes, and 1167 SNPs (36.7%) yielded scorable allele calls. However, after excluding markers that (1) were monomorphic, (2) had > 20% missing values across all genotypes, (3) had ambiguous SNP calls, (4) had a MAF < 0.1, or (5) showed the “ab” (heterozygous) genotype in > 10 inbred samples, the total number of usable SNP markers was reduced to 568 (17.8%). Polymorphic markers were classified into eight segregation patterns (ab × cd, ef × eg, hk × hk, lm × ll, nn × np, aa × bb, ab × cc and cc × ab). The normal segregation pattern for an SNP marker was aa × bb.
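The exclusion criteria can be expressed as a simple per-marker filter. The following Python sketch is illustrative only: it assumes that genotype calls are encoded as the strings "aa", "bb", and "ab", that missing or ambiguous calls are stored as None, and that the function and parameter names (all hypothetical) stand in for whatever script was actually used.

```python
def passes_qc(calls, max_missing=0.20, min_maf=0.10, max_het=10):
    """Return True if one marker passes the exclusion criteria.

    calls: one entry per inbred line, "aa", "bb", "ab", or None
    (None = missing or ambiguous call; an assumed encoding).
    """
    n = len(calls)
    called = [c for c in calls if c is not None]
    # Criteria (2) and (3): too many missing or ambiguous calls.
    if not called or (n - len(called)) / n > max_missing:
        return False
    # Criterion (5): too many "ab" calls among inbred lines.
    if sum(c == "ab" for c in called) > max_het:
        return False
    # Criteria (1) and (4): a monomorphic marker has MAF = 0.
    total_alleles = 2 * len(called)
    a_count = sum(c.count("a") for c in called)
    minor_count = min(a_count, total_alleles - a_count)
    return minor_count / total_alleles >= min_maf


good_marker = ["aa"] * 50 + ["bb"] * 50
monomorphic_marker = ["aa"] * 100
print(passes_qc(good_marker), passes_qc(monomorphic_marker))
```

Counting alleles as integers before dividing avoids a floating-point edge case at exactly MAF = 0.1.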

Marker polymorphism and population structure analysis

The polymorphic information content (PIC) and gene diversity values for the SNP markers in this study were calculated by using PowerMarker software (http://statgen.ncsu.edu/powermarker/). To assess genetic diversity within different subspecies or variant clusters, we used Genalex 6.3 (Peakall and Smouse, 2006) to estimate MAF, observed heterozygosity (ObsHET), and the FST values.
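For reference, gene diversity and PIC can be computed from allele frequencies as sketched below in Python. The formulas are the standard ones (expected heterozygosity, and the PIC of Botstein et al.), which we assume correspond to what PowerMarker reports; the function names are illustrative.

```python
def gene_diversity(freqs):
    """Expected heterozygosity: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphic information content for one locus.

    PIC = 1 - sum(p_i^2) - sum over i < j of 2 * p_i^2 * p_j^2
    """
    cross = 0.0
    for i in range(len(freqs)):
        for j in range(i + 1, len(freqs)):
            cross += 2.0 * freqs[i] ** 2 * freqs[j] ** 2
    return gene_diversity(freqs) - cross


# A biallelic SNP with equal allele frequencies is the most informative case.
balanced = [0.5, 0.5]
print(gene_diversity(balanced), pic(balanced))
```

For a biallelic locus, both statistics peak when the two alleles are equally frequent, at 0.5 for gene diversity and 0.375 for PIC.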

Population structure was investigated using two different methods: principal coordinates analysis (PCoA) and STRUCTURE (Pritchard et al. 2000). PCoA was carried out based on the modified Rogers’ distances (Wright 1978). The modified Rogers’ distance between two accessions was calculated as \( d_w = \frac{1}{\sqrt{2m}} \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n_i} \left( p_{ij} - q_{ij} \right)^2} \), where \( p_{ij} \) and \( q_{ij} \) are the frequencies of the jth allele at the ith locus in the two accessions being compared, \( n_i \) is the number of alleles at the ith locus, and m is the number of loci. Genetic similarities were calculated as \( 1 - d_w \) to produce principal coordinate scores, which were then used to investigate different population groups within the collection of 231 germplasm accessions. Patterns revealed by the first three coordinates of each accession were plotted using the G3D procedure.
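The modified Rogers’ distance is easy to compute for biallelic SNPs. The Python sketch below assumes that each accession is summarized by the frequency of one reference allele at every locus (0, 0.5, or 1 for a single line), so that the second allele has frequency one minus that value; the function name and the input encoding are illustrative assumptions.

```python
import math


def modified_rogers_distance(p, q):
    """Modified Rogers' distance between two accessions at biallelic loci.

    p, q: per-locus frequency of the same reference allele in each
    accession (an assumed encoding: 0, 0.5 or 1 for one inbred line).
    """
    if len(p) != len(q) or not p:
        raise ValueError("p and q must be non-empty and of equal length")
    m = len(p)
    total = 0.0
    for p_ref, q_ref in zip(p, q):
        # Two alleles per locus contribute (p - q)^2 each, because
        # ((1 - p) - (1 - q))^2 equals (p - q)^2.
        total += 2.0 * (p_ref - q_ref) ** 2
    return math.sqrt(total / (2.0 * m))


line_a = [1.0, 1.0, 0.0, 0.0]
line_b = [0.0, 0.0, 1.0, 1.0]
print(modified_rogers_distance(line_a, line_a))
print(modified_rogers_distance(line_a, line_b))
```

With this scaling the distance lies between 0 (identical genotypes) and 1 (alternative homozygotes at every locus), so the similarity 1 − d_w also lies between 0 and 1.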

STRUCTURE was run on the full dataset of 568 SNPs using an admixture model and the default settings. The most likely value of k was determined by the delta k method (Evanno et al. 2005) implemented in STRUCTURE HARVESTER (Earl and VonHoldt 2011).
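The delta k statistic can be sketched in a few lines of Python. The version below assumes that several replicate STRUCTURE runs are available for each k and that the second-order difference is taken on the mean log likelihoods, divided by the standard deviation across replicates at k. The log-likelihood values in the example are invented purely for illustration and are not results from this study.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp_by_k):
    """Delta k for every k that has both neighbours k - 1 and k + 1.

    lnp_by_k: dict mapping k to a list of Ln P(D) values from
    replicate runs (at least two replicates per k).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if (k - 1) not in means or (k + 1) not in means:
            continue  # the second-order difference needs both neighbours
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined without run-to-run variation
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical replicate log likelihoods, for illustration only.
example_runs = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-790.0, -792.0, -788.0],
    4: [-785.0, -780.0, -790.0],
}
dk = evanno_delta_k(example_runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Because the statistic needs both neighbours, it cannot be evaluated at the smallest or largest k tested.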

The matrix of genetic distances was used to create a neighbor-joining (N-J) tree with Mega 5 (Tamura et al. 2011). Based on the groups identified by these analyses, population and subpopulation genetic structure were further analyzed by conducting an analysis of molecular variance (AMOVA) using Arlequin (Excoffier et al. 1992; Peakall and Smouse, 2006). To assess the significance of differentiation, alleles were randomly permuted 1000 times among individuals (Edgington 1995).
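The logic of such a permutation test can be illustrated with a deliberately simplified Python sketch. It is not the hierarchical Arlequin procedure: it computes a one-level Φ_ST from a pairwise distance matrix and shuffles group labels among individuals, which is a swapped-in simplification of permuting alleles. All names and the toy data are hypothetical.

```python
import random


def phi_st(dist, labels):
    """One-level AMOVA Phi_ST from a symmetric distance matrix."""
    n = len(labels)
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    g = len(groups)
    # Sums of squares derived from squared pairwise distances.
    sst = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ssw = 0.0
    for members in groups.values():
        k = len(members)
        ssw += sum(dist[members[a]][members[b]] ** 2
                   for a in range(k) for b in range(a + 1, k)) / k
    ssa = sst - ssw
    var_within = ssw / (n - g)
    n0 = (n - sum(len(m) ** 2 for m in groups.values()) / n) / (g - 1)
    var_among = (ssa / (g - 1) - var_within) / n0
    return var_among / (var_among + var_within)


def permutation_p_value(dist, labels, n_perm=1000, seed=0):
    """Observed Phi_ST and the share of label shuffles that match or exceed it."""
    rng = random.Random(seed)
    observed = phi_st(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if phi_st(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: two well-separated groups of five individuals on a line.
positions = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
toy_labels = ["A"] * 5 + ["B"] * 5
toy_dist = [[abs(x - y) for y in positions] for x in positions]
observed_phi, p_value = permutation_p_value(toy_dist, toy_labels)
print(observed_phi, p_value)
```

If the observed statistic lies in the extreme tail of the permuted distribution, the grouping explains more variance than expected by chance.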

Results

Identification of candidate SNPs and development of SNP markers for the KASPar assays

From the 10 resequenced genotypes, a total of 149,100,294 usable single-end reads (each 36 bp in length) were generated, with an average depth of 3.9× and coverage of 81.1% (Supplementary Table 2). The Q20 ratio averaged 90.3%, and the guanine+cytosine (GC) content was 40.5%. Through comparisons with the reference genome of the inbred Chiifu-401-42 (v1.5), we detected a total of 709,037 SNPs, which translated to an overall density across the genome of 2488 SNPs/Mb.

A total of 3183 high-quality SNPs were filtered for the development of KASPar assays, and finally, 568 SNP markers that satisfied the criteria described above were selected for further analysis of the 231 genotypes. Although these SNP markers were developed from ssp. pekinensis, ssp. chinensis, and ssp. rapifera genotypes, they were also polymorphic and useful for ssp. oleifera accessions (Supplementary Table 3). The physical distribution of the 568 loci on the 10 chromosomes was determined from their mapped positions on the Chiifu-401-42 genome sequence (Fig. 1). Most of the SNP loci were distributed evenly throughout the genome, with an average density of 2.0 SNPs/Mb. Only two gaps (intervals > 3 Mb) were present on chromosome 5.

Fig. 1 Distribution of the 568 SNP marker loci on the 10 chromosomes of Brassica rapa

Marker polymorphism analysis in B. rapa

Data obtained from the 231 B. rapa genotypes were used to calculate the PIC value of each SNP marker. The PIC for the 568 markers across all accessions ranged from 0.10 to 0.38, with an average of 0.34. In particular, 87.9% of the PIC values were between 0.3 and 0.4 (Fig. 2a; Supplementary Table 4), which suggested that these markers were highly polymorphic. Of the loci, 46.1% had MAF values of 0.4–0.5, 49.8% had MAF values of 0.3–0.4, and 4.6% had values of 0–0.3 (Fig. 2b; Supplementary Table 4). The ObsHET of the 568 variable loci ranged from 0 to 0.54, with an average of 0.10. As the 231 lines included in this study have been selfed for many generations and can all be expected to be largely homozygous, very little heterozygosity should be present in these lines. Indeed, only 2.9% of the lines had ObsHET values > 0.2 (Fig. 2c; Supplementary Table 4). The genetic diversity within the germplasm collection was also assessed and was found to range from 0.11 to 0.5, with an average of 0.45 (Fig. 2d; Supplementary Table 4).

Fig. 2 PIC (a), MAF (b), observed heterozygosity (c), and genetic diversity (d) values for the 568 SNP markers based on data from 231 inbred lines

Population classification analysis

PCoA was initially performed based on the 568 high-quality SNPs to investigate population structure in the entire dataset of 231 genotypes. The proportions of genotypic variance explained by the first three principal coordinates were 14.21, 5.07, and 3.92%, respectively (Supplementary Fig. 1). Both the 3D (Supplementary Fig. 1) and 2D (Fig. 3a) PCoA plots confirmed the presence of four major populations, which is in agreement with traditional classification schemes. Population I corresponded to ssp. pekinensis and included 99 accessions, population II included 85 accessions from ssp. chinensis, and populations III and IV included 30 and 17 inbred lines from ssp. rapifera and ssp. oleifera, respectively.

Fig. 3 Analyses of population structure in B. rapa. a Principal coordinates analysis of population structure for the 231 B. rapa accessions. b Neighbor-joining tree of all inbred lines calculated from 568 SNP markers. The 12 divergent groups are shown in colored shapes. The scale bar indicates the simple matching distance. c Population structure analysis. All 231 germplasm accessions were further divided into 12 subpopulations. Pop1, ssp. pekinensis; Pop2, ssp. chinensis; Pop3, ssp. rapifera; Pop4, ssp. oleifera. Subpop1, spring-type Chinese cabbage lines of ssp. pekinensis (Spr); Subpop2, autumn-type (Aut); Subpop3, summer-type (Sum); Subpop4, ssp. chinensis var. parachinensis (P); Subpop5, ssp. chinensis var. communis (naibaicai) (N); Subpop6, ssp. chinensis var. communis (heiyebaicai) (H); Subpop7, ssp. chinensis var. communis (C); Subpop8, ssp. chinensis var. japonica (J); Subpop9, ssp. chinensis var. taicai (T); Subpop10, ssp. chinensis var. narinosa (Nar); Subpop11, ssp. rapifera (R); Subpop12, ssp. oleifera (O)

To further assess relationships among these accessions, we used STRUCTURE and observed a gradual increase in log likelihood from k = 2 to k = 14. Delta k was highest at k = 5, followed by k = 3 and then k = 4 (Supplementary Fig. 2). At k = 3, the 231 genotypes were divided into three groups: ssp. pekinensis, ssp. chinensis, and a mixed group of ssp. rapifera and ssp. oleifera (Fig. 3b). At k = 4, STRUCTURE divided the materials into the four traditional subspecies, ssp. pekinensis, ssp. chinensis, ssp. rapifera, and ssp. oleifera, which was consistent with the classification obtained from PCoA (Fig. 3b). At k = 5, ssp. pekinensis was further divided into two groups relative to k = 4: a spring-ecotype group and a mixed group of autumn- and summer-ecotypes (Fig. 3b).

We also reconstructed the phylogeny as an unrooted N-J tree calculated from pairwise genetic distances (Fig. 3c). All 231 genotypes were classified into four main groups with 12 lower-level clusters (Fig. 3c). Group I, comprising 99 inbred lines from ssp. pekinensis, was further grouped into three clusters based on cultivation season: a spring-ecotype cluster (29 spring-ecotype lines), an autumn-ecotype cluster (23 autumn-ecotype lines, one summer-ecotype line, and one spring-ecotype line), and a summer-ecotype cluster (34 summer-ecotype, 11 autumn-ecotype, and two spring-ecotype inbred lines) (Fig. 3c). Group II, comprising 85 inbred lines from ssp. chinensis, was further divided into seven variety clusters: ssp. chinensis var. communis (36 lines), ssp. chinensis var. narinosa (13 lines), ssp. chinensis var. parachinensis (17 lines), ssp. chinensis var. taicai (five lines), ssp. chinensis var. japonica (two lines), ssp. chinensis var. communis (seven lines), and ssp. chinensis var. communis (five lines) (Fig. 3c). Groups III and IV included 30 accessions of ssp. rapifera and 17 accessions of ssp. oleifera, respectively (Fig. 3c).

Marker diversity analysis among the different populations

The polymorphism of these 568 SNP markers was further analyzed among the four subspecies. As shown in Table 1, 567, 556, 475, and 480 SNP markers were polymorphic in the subspecies pekinensis, chinensis, rapifera, and oleifera, respectively. The polymorphism rates of the SNP markers for these four groups were 99.8, 97.9, 83.6, and 84.5%, respectively. Notably, four SNP markers were found to be polymorphic exclusively in ssp. pekinensis (Table 1).
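The polymorphism rates follow directly from the marker counts, as the brief Python check below shows (the variable and function names are illustrative).

```python
TOTAL_MARKERS = 568

# Number of polymorphic markers per subspecies, as reported in Table 1.
polymorphic = {
    "pekinensis": 567,
    "chinensis": 556,
    "rapifera": 475,
    "oleifera": 480,
}


def polymorphism_rate(n_polymorphic, n_total=TOTAL_MARKERS):
    """Percentage of markers that are polymorphic, to one decimal place."""
    return round(100.0 * n_polymorphic / n_total, 1)


rates = {ssp: polymorphism_rate(n) for ssp, n in polymorphic.items()}
print(rates)
```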

Table 1 Summary of statistics for each population, calculated with an in-house Perl script

The average PIC values of ssp. chinensis, ssp. rapifera, and ssp. oleifera were 0.28, 0.28, and 0.25, respectively, while a higher value of 0.32 was detected in ssp. pekinensis. In the same order (chinensis, rapifera, oleifera, and pekinensis), the MAFs were 0.26, 0.27, 0.24, and 0.33 and the gene diversities were 0.35, 0.36, 0.32, and 0.41, while the observed heterozygosity showed a different trend, with values of 0.09, 0.24, 0.17, and 0.05 (Table 1).

Population structure in B. rapa crops

The 231 genotypes were divided into four primary populations and then into 12 subpopulations. To test whether the populations differed significantly, we partitioned the variation hierarchically using AMOVA. The results indicated that 49.82% of the genetic variation resided between germplasms within different subpopulations, followed by 20.71, 15.78, and 13.68%, which were attributable to variation within samples, variation between populations, and variation between subpopulations within populations, respectively (Supplementary Table 5). Thus, we concluded that most of the variance was observed at the subpopulation level.

To test whether populations and subpopulations differed significantly, we performed a randomization test. The output comprised four histograms representing the distributions of the randomized strata. The observed values showed that there was significant population/subpopulation structure at all levels of the population/subpopulation strata (Fig. 4). Furthermore, pairwise estimates of FST showed that the highest level of genetic differentiation was between the ssp. pekinensis and ssp. chinensis populations (FST = 0.15) and the lowest was between the ssp. chinensis and ssp. oleifera populations (FST = 0.07) (Supplementary Table 6).

Fig. 4 Significance testing of population and subpopulation differentiation. The black line represents the observed data. The graphs show significant population differentiation at all levels, given that the observed line does not fall within the distribution expected from the permutations

Discussion

SNP selection and KASPar assay in B. rapa

A number of marker systems, such as RAPD, AFLP, SSR, diversity array technology markers (DArT), and single feature polymorphism (SFP), have been developed for germplasm characterization of different crops (Song et al. 1988; Powell et al. 1996; Chen et al. 2000; Guo et al. 2002; He et al. 2003). Recently, SNP markers have also been developed and converted to cost-effective genotyping platforms such as KASPar and BeadXpress assays (Allen et al. 2011; Cortés et al. 2011; Hiremath et al. 2012; Roorkiwal et al. 2013). KASPar assays provide flexibility in terms of the number of SNPs used for genotyping, which gives them an advantage over other SNP genotyping assays. KASPar assays have been found suitable for diversity estimation in common bean, chickpea, and peanut (Allen et al. 2011; Cortés et al. 2011; Hiremath et al. 2012); however, this assay has not been used for large-scale germplasm characterization in B. rapa.

In this study, candidate SNPs for KASPar assays were initially selected on the basis of reproducibility, signal strength, and their utility for defining the different genotypes. The candidate loci were further screened based on PIC, MAF, missing data rate, and uniform genetic distribution. Moreover, we also considered polymorphism bias, which will be present if the genetic background of the selected materials is narrow. Here, genomic sequence data from 10 genotypes with a broad genetic base were analyzed and used for SNP selection (Supplementary Table 2). Although we did not use oleifera genotypes for SNP selection, some markers were also polymorphic and useful for oleifera accessions (Supplementary Table 3).

Of the 3183 selected SNPs, 1167 were successfully converted into KASPar assays (36.7%) and 568 were finally selected for germplasm characterization (17.8%). The failure of the remaining SNP markers to be validated is likely due to technical issues, incorrect primer design, and/or the need to optimize PCR conditions. The average PIC, MAF, genetic diversity, ObsHET, and PredHET values were 0.35, 0.37, 0.45, 0.10, and 0.45, respectively (Fig. 2; Supplementary Table 4). In particular, the average PIC value is higher than those reported for recently developed KASPar assays or Illumina SNP arrays for pigeonpea, maize, and wheat (0.16, 0.09, and 0.33, respectively) (Saxena et al. 2012; Tian et al. 2015; Tobias et al., 2013). All these parameters suggest a high discriminatory ability and high resolution for these SNPs. In addition, the higher PIC value of the 231 inbred lines may indicate a higher diversity in our experimental set but must also be attributed in part to the larger number of lines included in our study. Previously, SSR markers were assessed within morphotypes represented by multiple accessions, and the mean PIC values were 0.48 (Brussels sprouts), 0.54 (broccoli), 0.57 (cauliflower), and 0.65 (cabbage) (Federico et al. 2008). It must be noted that for biallelic markers such as SNPs, the PIC cannot exceed 0.375 (and gene diversity cannot exceed 0.5), whereas for multiallelic markers like SSRs, the PIC values can exceed 0.5 and approach 1. SSR markers have been used for variety identification for more than 10 years because of their high discriminatory power and relatively simple experimental techniques. Compared with SSRs, SNPs are biallelic and amenable to high-throughput genotyping, making them easy to read, compare, and integrate between different data sources. In addition, with the development of a variety of SNP genotyping platforms, SNPs are ideal for DNA fingerprinting, genetic diversity analysis, and molecular marker-assisted selection (MAS) in breeding.

Utility of KASPar assays for marker-associated research and germplasm characterization

The 568 markers now available for B. rapa could provide sufficient marker density in many populations to allow a thorough scan of the genome for QTL discovery, association analysis, map-based cloning, and anchoring of the genome sequence to the genetic map. Seven of the 231 accessions in this study are reported to be parents of several mapping populations segregating for various economically important traits, such as heading color and downy mildew resistance (with QDX (Br053) and ZDJ (Br062) as parents), leaf color (ZYC (Br164) and ZDJ (Br062)), verticillium wilt resistance (CR-WM (Br020) and JDY (Br049)), and club-root resistance (ZL6 (Br048) and 20395SD (Br051)) (Yu et al. 2011; Zhang et al. 2012; Wang et al. 2014; Su et al. 2014). For instance, a total of 213 SNP markers showed polymorphism between the inbred lines QDX and ZDJ (data not shown). Therefore, our study provides a list of polymorphic markers that can not only be used to assess the genetic diversity of B. rapa germplasm resources but will also be helpful in enriching the recently developed AFLP/SSR/InDel-based genetic linkage maps for intraspecific mapping populations.

Assessing the relationships within germplasm collections can assist in the selection of more distantly related lines for use in breeding programs. Here, SNP genotyping data were used to quantify the genetic diversity and distances within the B. rapa germplasm collection (Fig. 3b, c). A detailed individual-by-individual genetic distance matrix was constructed, which could be of great use to plant breeders (Supplementary Table 7). Pairwise genetic distances among the 231 accessions ranged from 0.01 to 2.46, with an average of 0.68. In modern breeding systems, the selection of excellent germplasm resources and the use of hybrid vigor are effective ways to develop improved varieties of B. rapa crops. For instance, Xin No. 3 is a hybrid between the inbred lines JDY (Br049) and QDX (Br053), neither of which shows outstanding agronomic characters. However, Xin No. 3 is one of the most popular Chinese cabbage varieties cultivated in autumn, accounting for 90 and 50% of the planting area in Beijing and North China, respectively. Here, we noticed that the pairwise evolutionary distance between JDY and QDX reached 0.66, which is well above the average value among autumn-ecotype lines (0.52). Thus, we speculate that although many factors account for hybrid vigor, a distant phylogenetic relationship between the parents is likely to be one of them. These results, which agree with our current hybrid breeding practices, support the significance of germplasm characterization in heterosis breeding.

Population differentiation and phylogenetic characterization

Using the SNP genotyping data, the overall genetic diversity among the 231 accessions was measured as 0.45 (Supplementary Table 4), indicating that extensive genetic variation is present within this species. Furthermore, AMOVA is a powerful tool to test hypotheses of population structure (Grünwald and Hoheisel 2006). Our AMOVA revealed that most of the variance (49.82%) arose from within different subpopulations and that there is significant population/subpopulation structure at all levels of the population/subpopulation strata (Fig. 4), which provides further evidence of population and subpopulation structure within B. rapa.

The major allele frequency (as opposed to the MAF) averaged 0.63 (Supplementary Table 4), suggesting that a large number of loci are not fixed in the population. In addition, the FST between different groups, which indicates the level of population differentiation, was estimated at between 0.07 and 0.15 (Supplementary Table 6), which is much lower than that in rice (0.55) (Huang et al. 2012) and cucumber (0.41) (Qi et al. 2013) but is comparable to that in maize (0.11) (Hufford et al. 2012). Wright (1978) suggested that FST values of 0–0.05 indicate a low level of population differentiation, values of 0.05–0.15 a modest level, and values > 0.15 a high level. Thus, we believe that a large amount of variation is present in B. rapa crops and that genetic differences do exist among B. rapa populations, although the degree of differentiation is relatively low. One possible reason is that all subspecies of B. rapa can mate freely with each other, as in maize, which reduces the effect of genetic isolation. On the other hand, all the germplasms used in the study were inbred lines that have been self-fertilized for at least six generations. Selfing tends both to reduce the level of genetic variability within populations and to increase the amount of genetic differentiation among populations, which would keep population differentiation at a modest level.
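Wright’s qualitative scale can be written as a small lookup, sketched below in Python. How values lying exactly on a boundary (0.05 or 0.15) are assigned is an assumption made here for illustration, as is the function name.

```python
def classify_fst(fst):
    """Map an FST value to Wright's qualitative differentiation level.

    Assumption: boundary values 0.05 and 0.15 are assigned to the
    lower category.
    """
    if fst < 0:
        raise ValueError("FST is expected to be non-negative here")
    if fst <= 0.05:
        return "low"
    if fst <= 0.15:
        return "modest"
    return "high"


# Lowest and highest pairwise FST values reported in this study.
levels = {value: classify_fst(value) for value in (0.07, 0.15)}
print(levels)
```

Under this convention, the whole range of pairwise values observed among the four subspecies falls in the modest category, whereas the values cited for rice and cucumber would be classed as high.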

In our study, three complementary methods, PCoA, STRUCTURE, and an N-J tree, were used to analyze the population structure and individual relationships within our germplasm collection, and the 231 accessions were divided into four groups, in agreement with the traditional classification (Fig. 3b, c). Pairwise FST statistics between subspecies showed that genetic differentiation was not evenly distributed across the four populations (Supplementary Table 6). Only the FST value between ssp. pekinensis and ssp. chinensis reached 0.15, which suggested considerable genetic differentiation between the two populations. However, considering the large phenotypic differences between ssp. pekinensis and ssp. chinensis (heading vs. non-heading), this FST value is quite low. We therefore infer that the large phenotypic differences might be controlled by only a handful of genes or by genes acting in a single genetic pathway.

Artificial selection of spring-ecotype of Chinese cabbage

Chinese cabbage was first recorded in China in the eighteenth century. As recently as two decades ago it was grown as an autumn crop, but it is now grown all year round as autumn-, summer-, and spring-ecotypes (Ke 2010). About a century ago, some germplasms from Chiifu, ShanDong, China, were spread to Japan and Korea. Owing to differences in ecological conditions and consumer demand and to long-term artificial selection, those germplasms possessing bolting-resistance genetic resources were further domesticated into the spring-ecotype in the second half of the twentieth century.

To investigate genetic relationships among the three clusters and to search for evidence of selection in spring-ecotype Chinese cabbage, we conducted the STRUCTURE analysis on the Chinese cabbage accessions. At k = 4, the spring-ecotype was separated from the rest of the Chinese cabbage group, showing that the spring-ecotype had a genetic composition quite different from that of the autumn- and summer-ecotypes. The summer- and autumn-ecotypes were further separated from each other at k = 6, but a certain level of mixed genetic composition was still found within the two clusters. These results showed that the three clusters were clearly distinguished, although different degrees of introgression were detected in these groups.

To further delineate the evolutionary history of the spring-ecotype, we examined the N-J tree. From the observed genetic distances (Fig. 3c), the traditional autumn- and summer-ecotypes were closer to the root of ssp. pekinensis, and the spring-ecotype was positioned at the most distant point from the root. This indicated that the spring-ecotype was the most modern ecotype and was selected from the other two traditional ecotypes. In the future, more detailed genotyping data or resequencing data from the traditional landraces of ShanDong, China, will be valuable for exploring the impact of genomic selection on domestication.

Molecular characterization of ssp. chinensis var. taicai and ssp. chinensis var. japonica

B. rapa ssp. chinensis var. taicai was reported to have originated from wild ssp. rapa in Europe but now exists only in ShanDong and JiangSu, China (Ke 2010). The highly variable characteristics of var. taicai lead many researchers to regard it as a separate subspecies, rather than a variety, of B. rapa. To examine this argument, pairwise FST values between var. taicai and each of the four B. rapa subspecies, pekinensis, chinensis, rapifera, and oleifera, were calculated. The FST values of var. taicai vs. ssp. oleifera and var. taicai vs. ssp. rapifera were 0.094 and 0.056, respectively, while the values of var. taicai vs. ssp. pekinensis and var. taicai vs. ssp. chinensis were 0.021 and 0.017, respectively. On the basis of the comparison with subspecies rapifera and oleifera, we believe that ssp. chinensis var. taicai does show enough genetic differentiation to be treated as a separate subspecies of B. rapa (Supplementary Table 8; Fig. 3c). In addition, more detailed genotyping data or resequencing data for ssp. chinensis var. taicai will be valuable for exploring its origin. Its genetic distinctiveness also makes it a valuable source for introducing genetic diversity into new varieties in B. rapa breeding programs.

It is interesting that the two var. japonica accessions, JSJ (Br165) and JSC (Br166), previously considered to be a variety of ssp. chinensis, clustered with two Japanese ssp. rapifera accessions (Br205, Br206) and one ssp. chinensis var. communis accession (Br163) (Fig. 3c). A remarkable characteristic of var. japonica crops is tillering: the stem branches at the basal region without elongation, and numerous leaves then grow (Hirai and Matsumoto, 2007). We speculate that var. japonica either has experienced complicated genetic introgression from ssp. rapifera and ssp. chinensis var. communis or is a product of hybridization between the two groups.

In summary, this study provides an extensive resource of cost-effective and polymorphic KASPar markers for B. rapa and demonstrates their application in population structure characterization. The 568 SNP markers, coupled with KASPar markers to be developed in the future, will make it possible for breeders to genotype thousands of accessions rapidly and economically and will be of great help in MAS breeding.