Introduction

Brassica campestris L. ssp. chinensis Makino (Pakchoi), also known as non-heading Chinese cabbage (chromosomal group AA, 2n = 2x = 20), is a member of the Brassicaceae family. As one of the most economically important leafy vegetables in East Asia, non-heading Chinese cabbage is a valuable source of vitamin C, dietary fiber, and other health-enhancing factors (Cheng et al. 2009; Liu et al. 2014a). High temperature is the major limiting factor in vegetable production (Laurie and Stewart 2006; Sánchez et al. 2014). Non-heading Chinese cabbage does not grow well at high temperatures (exceeding 25 °C). Therefore, the identification and isolation of thermo-tolerance genes and linked markers are very important. However, the publicly available data for elucidating the molecular mechanisms and gene expression underlying the regulation of heat tolerance in cabbage are limited. Moreover, genetic research on non-heading Chinese cabbage has been hindered by the lack of efficient genetic markers.

Microsatellites, or simple sequence repeats (SSRs), comprise genomic SSRs and expressed sequence tag (EST)-SSRs. They are useful for genetic diversity and population structure analysis, as well as for marker-assisted breeding, owing to their co-dominant inheritance, dispersal throughout the whole genome, polymorphism, abundance, locus-specificity, and high reproducibility (Powell et al. 1996; Wang et al. 2010). In addition, single-nucleotide polymorphisms (SNPs), identified through comparisons of genome or transcriptome sequences, are useful for constructing high-resolution genetic maps, investigating population evolutionary history, analyzing genetic diversity, and verifying marker-trait linkages (Chopra et al. 2015).

In this study, a large expressed sequence dataset based on RNA-seq was generated from two non-heading Chinese cabbage cultivars with different heat tolerances. In addition to identifying genes that confer heat tolerance, microsatellite and SNP markers were developed. This research may lay a theoretical basis for breeding heat-resistant non-heading Chinese cabbage cultivars.

Materials and methods

Plant materials

Sterile seeds of two non-heading Chinese cabbage cultivars (2n = 2x = 20), the heat-sensitive cultivar “GHA” and the heat-resistant cultivar “XK”, were sown in pots and germinated in a growth chamber at 20 °C and 75% relative humidity. Seedlings of the two cultivars were then grown under controlled conditions (25 °C, 75% relative humidity) in the growth chamber. For heat stress treatment, three-week-old seedlings (five-leaf stage) were grown at 37 °C for 24 h. Then, tender leaves were collected from five plants and pooled for total RNA extraction and transcriptome sequencing. In addition, 36 accessions with morphological differences were collected for validation of microsatellite markers (Supplementary file 1). All these materials were conserved at the Wuhan Vegetable Research Institute (Wuhan, Hubei, China).

Total RNA extraction and transcriptome sequencing

Total RNA was extracted from tender leaves with the TRIzol kit (Takara) according to the manufacturer’s instructions. The mRNA was enriched with magnetic beads coated with oligo (dT)n, then fragmented and reverse transcribed into first-strand cDNA. Double-stranded cDNA was synthesized with random hexamer primers, purified, and ligated to sequencing linkers. Fragments of the correct size were purified with the Universal DNA Purification Kit, and the sequencing libraries were prepared with the RNA-Seq Library Construction Kit. The quality and quantity of the libraries were verified using an Agilent 2100 Bioanalyzer and an ABI real-time RT-PCR system. The qualified cDNA libraries were used for paired-end sequencing on an Illumina HiSeq 2000 platform. The raw sequencing data were deposited in the Sequence Read Archive (SRA) database at the NCBI website.

Data processing and annotation

Raw sequence reads were filtered by the Illumina pipeline. The clean reads were then subjected to the standard TopHat2-Cufflinks-Cuffmerge-Cuffdiff pipeline for identification of differentially expressed genes (DEGs). These DEGs were searched by BLAST against the Gene Ontology (GO) database, the Kyoto Encyclopedia of Genes and Genomes (KEGG) database, and the Clusters of Orthologous Groups of proteins (COG) database for enrichment analysis.

Development of polymorphic microsatellite markers

High-quality filtered transcriptome reads were obtained by Illumina HiSeq 2000 sequencing. Contigs were assembled with Trinity (Grabherr et al. 2011). Then, non-redundant unigenes were created from paired-end reads and used for the development of EST-SSR markers with MicroSAtellite (MISA, http://pgrc.ipk-gatersleben.de/misa). The minimum numbers of repeats were set to 10, 6, 5, 5, 4, and 4 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively. For compound SSRs, the maximum distance between the two SSRs was 50 bp. Within these SSR-containing sequences, unique microsatellites with sufficient flanking regions were chosen for primer pair design using the online software Primer3 (Wang et al. 2010).
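The repeat thresholds above can be illustrated with a minimal sketch. This is not the MISA script itself: it is a simplified regular-expression search for perfect repeats, and the function names (find_ssrs, compound_pairs) and the example sequence are hypothetical. It assumes that "distance" for compound SSRs means the gap between the end of one repeat and the start of the next.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 4, 6: 4}
# Maximum gap (bp) between two SSRs for them to count as compound (from the text).
MAX_COMPOUND_GAP = 50


def _is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if _is_redundant(motif):
                continue  # already reported under the shorter motif length
            repeats = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), motif, repeats))
    return sorted(hits)


def compound_pairs(hits, max_gap=MAX_COMPOUND_GAP):
    """Pairs of neighbouring SSRs separated by at most max_gap bp."""
    return [(a, b) for a, b in zip(hits, hits[1:]) if b[0] - a[1] <= max_gap]


# Hypothetical unigene fragment with one mono-, one di-, and one trinucleotide SSR.
example = "CCGT" + "A" * 10 + "GGC" + "AG" * 6 + "TTC" + "AAG" * 5 + "C"
ssrs = find_ssrs(example)
```

Here ssrs holds three entries (A×10, AG×6, AAG×5), and compound_pairs(ssrs) pairs neighbouring repeats because each gap is only 3 bp. A production tool also handles imperfect repeats and reverse-complement motif classes, which this sketch ignores.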

Polymorphism of these EST-SSR primers was assessed in 36 individuals of non-heading Chinese cabbage. DNA was extracted from fresh leaves with a modified CTAB (cetyltrimethyl ammonium bromide) method and quantified by agarose gel electrophoresis. After optimizing amplification conditions for each primer pair, PCR amplification was performed in a volume of 15 μL consisting of 1.5 μL 10× PCR buffer, 0.3 μM of each primer, 50 ng genomic DNA, 250 μM of each dNTP, and 0.5 U Taq DNA polymerase (Tiangen, Beijing, China). The PCR program consisted of an initial denaturation at 95 °C for 10 min, followed by 35 cycles (94 °C for 30 s, annealing for 30 s at the optimal temperature listed in Table 1, and 72 °C for 40 s), and a final 10 min elongation step at 72 °C. PCR products were separated on 6% (w/v) denaturing polyacrylamide sequencing gels and visualized by silver staining. The size of DNA fragments was determined by comparison with a 20 bp DNA ladder marker of 20–600 bp (Tiangen, Beijing, China) (Pan et al. 2007).

Table 1 Transcriptome reads and distribution information in two B. rapa cultivars

Data analysis

PCR products were scored manually, and a 0/1 binary matrix was built according to the presence or absence of the corresponding amplified bands. Genetic diversity parameters, including number of alleles (Na) per locus, expected heterozygosity (HE), and observed heterozygosity (HO), as well as tests for linkage disequilibrium (LD) and deviation from Hardy–Weinberg equilibrium (HWE), were calculated with the ARLEQUIN 3.01 software (Wang et al. 2010). In addition, null allele frequencies (NAF) were estimated with the MICROCHECKER 2.2.3 software (van Oosterhout et al. 2004).
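For readers unfamiliar with these diversity parameters, the following minimal sketch shows how the number of alleles and the observed and expected heterozygosity can be computed for one locus. It assumes that bands have been converted to diploid genotypes (pairs of allele sizes), and it uses the common unbiased gene diversity estimator, n/(n − 1) × (1 − Σp²), where n is the number of gene copies. It is not claimed to reproduce ARLEQUIN's exact procedure, and the genotype data are hypothetical.

```python
from collections import Counter


def locus_diversity(genotypes):
    """Return (Na, Ho, He) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    """
    n_ind = len(genotypes)
    # Observed heterozygosity: fraction of individuals carrying two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n_ind
    # Allele counts over all gene copies (two per individual).
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    n_copies = 2 * n_ind
    sum_p2 = sum((c / n_copies) ** 2 for c in counts.values())
    # Unbiased expected heterozygosity (gene diversity).
    h_exp = n_copies / (n_copies - 1) * (1 - sum_p2)
    return len(counts), h_obs, h_exp


# Hypothetical allele sizes (bp) for four individuals at one EST-SSR locus.
example_genotypes = [(150, 150), (150, 154), (154, 154), (150, 154)]
na, ho, he = locus_diversity(example_genotypes)
```

With these four hypothetical individuals, two alleles occur at equal frequency, half of the individuals are heterozygous, and the unbiased expected heterozygosity is 4/7.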

Results

RNA-seq of two non-heading Chinese cabbage cultivars

After stringent quality assessment and data filtering, a total of 2.3 and 2.1 Gb of reads were produced from the cultivar “GHA” and cultivar “XK” cDNA libraries, respectively (Table 1). The raw data were deposited in the SRA database (NCBI) under the accession number SRP064703 (“XK”: SRR5169348; “GHA”: SRR5169349). After filtering low-quality sequences and trimming low-quality bases, 23,519,534 and 29,253,759 reads were aligned to the reference genome (http://brassicadb.org/brad/downloadOverview.php) for cultivars “GHA” and “XK”, respectively. Of these, 13,116,870 and 18,288,026 reads were uniquely mapped for “GHA” and “XK”, accounting for 55.8% and 62.5%, respectively.

In the heat-sensitive cultivar “GHA”, 96.8% of the reads were located in intragenic regions, while intergenic reads made up only 3.2%. Similarly, in the heat-resistant cultivar “XK”, the percentages of reads located in intragenic and intergenic regions were 96.9% and 3.1%, respectively. As expected, most reads (92.8% for “GHA” and 92.7% for “XK”) were distributed in exon regions. In contrast, the percentage of reads located in intronic regions was only 4% and 4.2% for cultivars “GHA” and “XK”, respectively. After optimization of both the start and stop sites, a total of 29,037 non-redundant unigenes were assembled (“GHA”: 23,794, “XK”: 24,524). Notably, 3736 novel transcripts were obtained, corresponding to 3267 novel genes.

In total, 139 genes were considered to be differentially expressed between the two cultivars. Compared with “GHA”, the expression levels of 87 DEGs were higher in “XK”, while those of the other 52 DEGs were lower (Fig. 1). NAC domain-containing proteins, the MYB protein family, heat shock proteins, ATP binding proteins, superoxide dismutase, and other hormone-responsive elements were the main members of these DEGs, suggesting that these genes might confer heat tolerance in non-heading Chinese cabbage. These DEGs can be categorized into three main divisions: biological processes, cellular components, and molecular functions (Fig. 2). Among the 26 COG categories, signal transduction mechanisms was the largest group, general function prediction only was the second largest, and translation, ribosomal structure and biogenesis was the third (Fig. 3).

Fig. 1 A pie chart showing the fraction of genes conferring heat tolerance traits in non-heading Chinese cabbage. (Color figure online)

Fig. 2 Histogram of the GO classification of genes from B. rapa transcriptome sequences. The blue bars and red bars represent DEGs and all unigenes, respectively. (Color figure online)

Fig. 3 COG function classification of consensus sequences (the x axis represents the 26 different groups, and the y axis shows the percentage of genes). (Color figure online)

Characteristics and development of EST-SSR markers

Using the Perl script MISA (Wang et al. 2013), 19,522 SSR loci were found in 14,653 sequences. A set of 3627 sequences contained more than one EST-SSR locus. In total, 2105 compound SSRs were identified. Mononucleotide repeats were the main type, with a frequency of 58.124% (11,347), followed by trinucleotide (22.211%, 4336), dinucleotide (19.02%, 3714), tetranucleotide (0.482%, 94), pentanucleotide (0.082%, 16), and hexanucleotide repeats (0.077%, 15) (Fig. 4).
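As a quick consistency check, the reported frequencies can be recomputed from the counts given above. The short sketch below only uses the counts stated in the text; the function name ssr_type_percentages is hypothetical.

```python
# SSR counts per repeat type, as reported in the text.
SSR_COUNTS = {
    "mononucleotide": 11347,
    "trinucleotide": 4336,
    "dinucleotide": 3714,
    "tetranucleotide": 94,
    "pentanucleotide": 16,
    "hexanucleotide": 15,
}


def ssr_type_percentages(counts):
    """Return (total, {repeat_type: percentage of all SSR loci})."""
    total = sum(counts.values())
    return total, {name: 100 * n / total for name, n in counts.items()}


total_ssrs, percentages = ssr_type_percentages(SSR_COUNTS)
for name, pct in percentages.items():
    print(f"{name}: {pct:.3f}%")
```

The six counts sum to the 19,522 loci stated above, and each recomputed percentage agrees with the reported value to the precision given.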

Fig. 4 Frequency distribution of different EST-SSRs in non-heading Chinese cabbage (a repeat types; b repeat number)

The most common mononucleotide repeat motif was A/T (55.916%). Among the dinucleotide repeat motifs, AG/CT, AT/TA, and AC/GT were the most abundant, with frequencies of 14.9%, 2.23%, and 1.87%, respectively, while the CG/GC repeat accounted for only 0.026%. For trinucleotides, AAG/CTT, AGG/CCT, and ATC/GAT were the most common types, with frequencies of 7.535%, 3.55%, and 3.05%, respectively. The number of repeats per SSR locus ranged from 5 to 25, and EST-SSRs with ten repeats were the most abundant, followed by those with five, six, and eleven repeats (Fig. 4).

From the 19,522 primer pairs, 80 were randomly chosen for further validation (Supplementary file 2). Among these EST-SSR primer pairs, 44 produced specific amplification products, while the other 36 failed to amplify the target products even when the annealing temperature was reduced by 10 °C. Of the 80 primer pairs tested, 37 were polymorphic, a proportion of 46.25%.

These 37 primer pairs were used for genetic diversity analysis in 36 accessions (Table 2). A total of 104 alleles were obtained, with an average of 2.81 alleles per locus (Table 2). CaSSR-26, CaSSR-32, and CaSSR-51 each yielded five alleles. The observed heterozygosity (HO) and expected heterozygosity (HE) ranged from 0.0000 to 1.0000 and from 0.0881 to 0.7738, respectively (Table 3). The average HO and HE were 0.4452 and 0.4354, respectively. However, 17 EST-SSR loci deviated significantly from Hardy–Weinberg equilibrium (Table 3), which might be due to directional selection for economically important traits. Further analysis with the MICROCHECKER software (van Oosterhout et al. 2004) showed that null alleles existed at these microsatellite loci. Moreover, no significant linkage disequilibrium (LD) was found among these polymorphic EST-SSR loci after Bonferroni correction (Rice 1989).

Table 2 Polymorphic primer pairs used in genetic diversity analysis

Detection of SNP markers

In total, 285,573 putative SNPs were identified, of which 247,708 were heterozygous and 37,865 were homozygous. The numbers of synonymous, missense, stopgain, and stoploss SNPs were 162,900, 65,321, 284, and 42, respectively. A set of 228,546 SNPs was found in exon regions, and 62 SNPs were in splicing regions. However, no SNPs were found in sequences coding for ncRNA or in the 5′UTR and 3′UTR. In addition, 6053 and 9168 SNPs were in intronic and intergenic regions, respectively, and 15,021 and 21,295 SNPs were present in upstream and downstream flanking regions, respectively. Among these SNP loci, the C:G → T:A mutation had the highest rate, followed by the T:A → C:G and T:A → A:T mutation types (Fig. 5).
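The strand-collapsed notation used for mutation types (for example C:G → T:A) can be made concrete with a small sketch. It assumes each SNP is given as a reference and an alternative base on one strand, and it collapses complementary changes onto the pyrimidine reference so that G → A and C → T fall into the same class. The function names and the example SNP list are hypothetical, and ">" is used in place of the arrow.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
PURINES = {"A", "G"}


def mutation_class(ref, alt):
    """Return a strand-collapsed label such as 'C:G>T:A'."""
    if ref == alt:
        raise ValueError("reference and alternative bases are identical")
    if ref in PURINES:
        # Report the change on the complementary strand so ref is C or T.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}:{COMPLEMENT[ref]}>{alt}:{COMPLEMENT[alt]}"


def is_transition(ref, alt):
    """Transitions swap purine for purine or pyrimidine for pyrimidine."""
    return (ref in PURINES) == (alt in PURINES)


# Hypothetical (ref, alt) calls, for illustration only.
example_snps = [("C", "T"), ("G", "A"), ("A", "G"), ("T", "A"), ("G", "T")]
class_counts = Counter(mutation_class(r, a) for r, a in example_snps)
```

Under this scheme the two most frequent classes reported above, C:G → T:A and T:A → C:G, are both transitions, while T:A → A:T is a transversion.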

Fig. 5 Distribution of mutation types. (Color figure online)

A total of 23,788 heterozygous and 5079 homozygous InDel markers were found. Among them, 1813 InDels were frameshift insertions and 2303 InDels were non-frameshift insertions. Likewise, the numbers of InDels corresponding to frameshift deletions and non-frameshift deletions were 1976 and 2430, respectively. No frameshift or non-frameshift block substitution InDels were found. The numbers of stopgain and stoploss InDels were 111 and 34, respectively. A total of 8667 InDels were located in exon regions, and 44 InDels were found in splicing regions. Additionally, 14 InDels were found in 5′UTR sequences. The numbers of InDels located in intronic, upstream, downstream, and intergenic regions were 1469, 6426, 8496, and 1217, respectively. InDels with a length of 1 nt were the most frequent type, followed by InDels of 2–3 nt. InDels longer than 10 nt were very rare.
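The distinction between frameshift and non-frameshift InDels follows from the triplet reading frame: within coding sequence, a length change that is not a multiple of 3 nt shifts the frame. The sketch below illustrates only this rule, assuming each InDel is given as reference and alternative allele strings. It ignores stopgain and stoploss effects, which depend on the sequence context, and the function name classify_indel is hypothetical.

```python
def classify_indel(ref, alt):
    """Classify a coding InDel by the length difference between its alleles."""
    diff = len(alt) - len(ref)
    if diff == 0:
        raise ValueError("alleles have equal length; not an insertion or deletion")
    kind = "insertion" if diff > 0 else "deletion"
    # A length change divisible by 3 keeps the downstream reading frame intact.
    frame = "non-frameshift" if abs(diff) % 3 == 0 else "frameshift"
    return f"{frame} {kind}", abs(diff)


# Hypothetical alleles, for illustration only.
examples = [("A", "AT"), ("A", "ATGC"), ("ATG", "A"), ("ATGC", "A")]
labels = [classify_indel(ref, alt) for ref, alt in examples]
```

The second element of each result is the InDel length in nucleotides, which is the quantity summarized in the length distribution above.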

Table 3 Results of initial primer screening in B. rapa

Discussion

With the advent and rapid development of next generation sequencing (NGS) technologies, RNA sequencing (RNA-seq) has proven to be a powerful and reliable tool for gene expression profiling, identification of candidate genes for target traits, genetic map construction, and development of molecular markers in many species, including the genus Brassica (Shendure and Ji 2008; Kucuktas et al. 2009; Manuel et al. 2011; Liu et al. 2014b; Xiao et al. 2015). Recently, a large amount of genomic and transcriptomic data has been made available for both model and non-model organisms, including Oryza sativa, Arabidopsis, cucumber, and Chinese cabbage. These high-throughput data are helpful for understanding the complexity of plant growth, development, and responses to environmental stress. However, to our knowledge, few data on heat stress in non-heading Chinese cabbage have been reported. To identify genes conferring heat tolerance and to isolate SSR markers linked with these genes in non-heading Chinese cabbage, a heat-sensitive cultivar “GHA” and a heat-resistant cultivar “XK” were chosen for transcriptome sequencing.

The BcHSP81-4 gene, which encodes a heat shock protein, was identified from a suppression subtractive hybridization cDNA library in non-heading Chinese cabbage (Brassica campestris ssp. chinensis Makino) and is also responsive to salt stress and cold stress (Liu et al. 2011). In this study, heat shock proteins were also found to be involved in heat tolerance. In addition, NAC domain-containing proteins, the MYB protein family, and hormone-responsive elements were important gene products in the response to heat stress, which is in agreement with reports in soybean (Irsigler et al. 2007), citrus (Oliveira et al. 2011), and rice (Fang et al. 2015).

Compared with genomic SSRs identified from random genomic sequences, EST-SSR markers are potentially more efficient for gene targeting, QTL mapping, and marker-assisted breeding because of their potential linkage with particular transcribed regions contributing to agronomic phenotypes (Bozhko et al. 2003; Zheng et al. 2013; Scott et al. 2000). The distribution density of EST-SSR markers in non-heading Chinese cabbage (one locus per 2.37 kb) was higher than that of other reported species, such as rice (one SSR locus per 3.4 kb), Amorphophallus (one SSR locus per 3.63 kb), wheat (one SSR locus per 5.4 kb), soybean (one SSR locus per 7.4 kb), Arabidopsis (one SSR locus per 14 kb), and cotton (one SSR locus per 20 kb) (Zheng et al. 2013; Peng and Lapitan 2005; Varshney et al. 2002). The small genome size might have contributed substantially to the high frequency of EST-SSR loci in non-heading Chinese cabbage.

The trinucleotide repeat motif (22.21%) was more frequent than the dinucleotide type (19.02%), which is consistent with the EST-SSR distribution reported in radish (Wang et al. 2012). However, dinucleotide repeats were the most abundant in spruce (Rungis et al. 2004), Cucurbita pepo (Gong et al. 2008), Momordica charantia (Wang et al. 2010), pigeonpea (Dutta et al. 2011), and Amorphophallus (Zheng et al. 2013). Moreover, the AG/CT repeat motif was the most common dinucleotide type, in agreement with most plant species (Pan et al. 2007; Wang et al. 2010). Most of these SSR loci had two alleles, suggesting relatively low polymorphism in non-heading Chinese cabbage.

SNPs are a virtually unlimited, bi-allelic, and co-dominant resource that is evenly distributed along the genome, which makes them highly valuable for research and modern breeding. Until now, only a few SNP markers have been described in non-heading Chinese cabbage. Rahman et al. developed 24 SNPs within more than 2 kb of sequence for the major seed coat color gene in Brassica rapa (Rahman et al. 2007). Moreover, 151 SNP markers have been developed from B. rapa and used to construct a linkage map (Li and Hinaba 2009). Chung et al. also constructed a genetic map with SNP markers and mapped a TuMV resistance locus in B. rapa (Chung et al. 2014).

The EST-SSRs and SNPs characterized in this study have important implications for genetic studies and molecular breeding in non-heading Chinese cabbage. Moreover, these markers will be useful for constructing high-density genetic linkage maps, mapping quantitative trait loci, assessing germplasm polymorphism and evolution, marker-assisted selection, and cloning functional genes in Chinese cabbage.