Introduction

Dendrobium, a genus of Orchidaceae, comprises approximately 1450 species, of which 74 have been identified in China (Takamiya et al. 2011). Stems of some Dendrobium species contain various types of polysaccharides that exhibit anti-inflammatory, immune-enhancing, antioxidant, and anti-glycation activities (Hsieh et al. 2008; Ng et al. 2012; Pan et al. 2014), which makes them valuable in traditional medicine for tonifying the stomach and promoting fluid production, nourishing “yin” and clearing heat, improving eyesight, and relieving sore throat. Dendrobium officinale, a well-known traditional Chinese medicinal herb officially listed in the Chinese Pharmacopeia (The State Pharmacopeia Commission of P. R. China 2010), has both ornamental value and a broad range of therapeutic effects. Owing to over-exploitation and habitat deterioration, wild D. officinale, like many other Dendrobium species, has been driven to near extinction and is now listed as critically endangered by the IUCN (http://www.incnredlist.org/search).

Microsatellites, or simple sequence repeats (SSRs), consist of tandem arrays of short nucleotide motifs and are distributed throughout prokaryotic and eukaryotic genomes. SSR markers are highly reproducible, polymorphic, and codominant. These features make them a powerful, high-resolution tool for population genetics, molecular ecology, and marker-assisted selection (MAS) in plants.

Genomic resources for D. officinale are still limited, and only a small number of SSR markers are available for this species (Gu et al. 2007; Xu et al. 2011; Lu et al. 2013). Given the urgency of germplasm conservation for this species and its important role in phylogenetic studies of Dendrobium, more genic SSR markers are needed to support conservation of the species and related genetic research in the genus. In this study, 17 polymorphic genic SSR markers for D. officinale were isolated and characterized from transcriptome sequences generated on the Ion Torrent sequencing platform.

Materials and methods

Plant materials

Two common cultivated varieties, D. officinale “Brown Stem” (BS) and D. officinale “Green Stem” (GS), were used to generate transcriptome sequences. BS is a low-growing, disease-resistant cultivar with brown stems, whereas GS is a fast-growing, disease-susceptible cultivar with green stems. For marker validation, young leaves of 31 D. officinale individuals were collected and individually ground into powder in liquid nitrogen, and DNA was extracted using the DNeasy Plant Mini Kit (Qiagen).

RNA extraction, library preparation, and sequencing

For transcriptome sequencing, total RNA was extracted from 2-year-old stems of the two cultivated varieties with the RNeasy Plant Mini Kit (Qiagen), and poly(A) RNA was enriched by two rounds of oligo(dT) selection using the MicroPoly(A) Purist™ Kit (Ambion). Two transcriptome sequencing libraries were generated using the Ion Total RNA-Seq Kit v2 (Thermo Fisher Scientific) and sequenced on the Ion Torrent PGM™ platform (Ion PGM™ sequencer with 318 chip, Thermo Fisher Scientific) following the manufacturer’s recommended procedures.

Data processing, assembly, and annotation

The raw reads were cleaned by removing reads containing poly-N and low-quality reads. All downstream analyses were based on the resulting high-quality clean data. Trinity (http://trinityrnaseq.github.io) was used for assembly (Grabherr et al. 2011). Clean reads derived from the various isoforms of one gene were assembled into distinct transcripts within the same subcomponent, which can be regarded as a gene, and the longest transcript of each subcomponent was defined as the “unigene” for annotation.
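The unigene definition above (the longest transcript of each subcomponent) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes that isoforms of one subcomponent share an identifier and differ only in a trailing "_seqN" or "_iN" suffix, as in common Trinity naming schemes; the helper names and the toy sequences are hypothetical.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumption: isoform suffixes look like "_seq1" (older Trinity releases)
# or "_i1" (newer releases). Everything before the suffix is the gene ID.
_ISOFORM_SUFFIX = re.compile(r"_(?:seq|i)\d+$")


def gene_id(transcript_id: str) -> str:
    """Strip the isoform suffix to obtain the subcomponent (gene) ID."""
    return _ISOFORM_SUFFIX.sub("", transcript_id)


def longest_per_gene(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, Tuple[str, str]]:
    """Keep, for each gene ID, the (transcript ID, sequence) pair with the
    longest sequence. On ties, the first transcript seen is kept."""
    best: Dict[str, Tuple[str, str]] = {}
    for transcript_id, sequence in records:
        gid = gene_id(transcript_id)
        if gid not in best or len(sequence) > len(best[gid][1]):
            best[gid] = (transcript_id, sequence)
    return best


# Toy input: two isoforms of one subcomponent and a single-isoform gene.
transcripts = [
    ("comp1_c0_seq1", "ATGCATGC"),
    ("comp1_c0_seq2", "ATGCATGCATGC"),
    ("comp2_c0_seq1", "ATGC"),
]
unigenes = longest_per_gene(transcripts)
```

In this toy input, the second isoform of comp1_c0 is retained as the unigene because it is the longer of the two.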

All assembled unigenes of the two cultivars were searched against the NCBI non-redundant protein sequences (Nr) database to identify putative mRNA functions using the BLAST algorithm with an E-value cut-off of 10⁻⁶ (Korf et al. 2003). Gene ontology (GO) terms were extracted from the best BLASTx hits against the Nr database using the Blast2GO program (Conesa and Götz 2008). The BLAST algorithm was also used to align unique sequences to the NCBI non-redundant nucleotide sequences (Nt), Swiss-Prot, KOG, and KO databases to predict possible functional classifications and signaling pathways.
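As an illustration of the best-hit selection described above, the following Python sketch filters BLAST results by E-value and keeps one hit per unigene. It assumes the standard 12-column BLAST tabular output, in which columns 11 and 12 hold the E-value and bit score; whether the study applied the threshold inclusively is not stated, so the inclusive comparison here is an assumption, and the function name and example rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple

E_VALUE_CUTOFF = 1e-6  # cut-off stated in the text

Hit = Tuple[str, float, float]  # (subject ID, E-value, bit score)


def best_hits(lines: Iterable[str],
              cutoff: float = E_VALUE_CUTOFF) -> Dict[str, Hit]:
    """Return the best hit per query from 12-column tabular BLAST output.

    A hit is kept only if its E-value is at or below the cut-off
    (assumption: inclusive threshold). "Best" means lowest E-value,
    with ties broken by the higher bit score.
    """
    best: Dict[str, Hit] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > cutoff:
            continue
        current = best.get(query)
        if current is None or (evalue, -bitscore) < (current[1], -current[2]):
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical rows: unigene u1 has two hits, u2 has only a weak hit.
rows = [
    "u1\tP2\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-10\t120",
    "u1\tP1\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t200",
    "u2\tP3\t40.0\t50\t30\t0\t1\t50\t1\t50\t0.01\t30",
]
annotated = best_hits(rows)
```

Here u1 is annotated with P1, while u2 remains unannotated because its only hit exceeds the cut-off.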

Differential expression analysis

Gene expression levels were estimated with RSEM (Li and Dewey 2011) for each sample (two chips per sequenced library). Differential expression analysis of the two samples was performed using the DEGseq R package (Wang et al. 2010). Genes with an adjusted P value of <0.05 in the DEGseq analysis were designated differentially expressed genes (DEGs). A volcano plot of the DEGs was drawn from the log10(padj) and log2(fold change) values. GO and KEGG enrichment analyses of the DEGs were performed using the GOseq R package (Young et al. 2010) and KOBAS (Mao et al. 2005), respectively.
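The DEG criterion and the volcano-plot coordinates can be sketched in Python as follows. This is an illustrative sketch rather than the code used in the study: the record type and function names are hypothetical, the fold change is taken as BS relative to GS, and the y-axis is assumed to be the negative log10 of the adjusted P value, which is the usual volcano-plot convention.

```python
import math
from typing import NamedTuple, Tuple


class GeneResult(NamedTuple):
    gene: str
    log2_fold_change: float  # assumption: BS relative to GS
    padj: float              # adjusted P value


def classify(result: GeneResult, alpha: float = 0.05) -> str:
    """Label a gene as 'up', 'down', or 'not significant' at padj < alpha."""
    if result.padj >= alpha or result.log2_fold_change == 0:
        return "not significant"
    return "up" if result.log2_fold_change > 0 else "down"


def volcano_point(result: GeneResult) -> Tuple[float, float]:
    """Return (x, y) = (log2 fold change, -log10 padj).

    padj is floored at a tiny positive value so that padj == 0
    does not raise a math domain error.
    """
    return (result.log2_fold_change, -math.log10(max(result.padj, 1e-300)))


# Hypothetical results for three genes.
results = [
    GeneResult("unigene_a", 2.0, 0.001),
    GeneResult("unigene_b", -1.5, 0.01),
    GeneResult("unigene_c", 3.0, 0.20),
]
labels = {r.gene: classify(r) for r in results}
points = {r.gene: volcano_point(r) for r in results}
```

With these hypothetical values, unigene_a is labeled up, unigene_b down, and unigene_c not significant despite its large fold change.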

Isolation of microsatellite markers

Perl scripts were developed to search for SSRs using the MIcroSAtellite (MISA) search module. The parameters were set to detect perfect mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs with a minimum of 10, 6, 5, 5, 5, and 5 repeats, respectively. Primer pairs were designed using online software (http://www.genscript.com/), and the forward primers were labeled with the fluorescent dye 6-FAM.
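The repeat thresholds above can be illustrated with a short regular-expression search in Python. This is a simplified sketch of perfect-SSR detection under those thresholds, not a reimplementation of MISA: the function names are hypothetical, compound SSRs are not handled, and a motif that is itself a repetition of a shorter unit (for example "AA" or "ATAT") is skipped so that each repeat is reported once under its shortest motif.

```python
import re
from typing import List, Tuple

# Minimum repeat number per motif length, as stated in the text:
# mono 10, di 6, tri 5, tetra 5, penta 5, hexa 5.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repetition of a shorter unit."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(sequence: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_primitive(motif):
                repeats = len(match.group(0)) // unit_len
                hits.append((match.start(), motif, repeats))
    return sorted(hits)


# Toy sequence with an (AT)7 repeat and an (A)12 repeat.
toy = "GG" + "AT" * 7 + "CC" + "A" * 12 + "G"
ssrs = find_ssrs(toy)
```

For the toy sequence, the search reports the (AT)7 repeat at position 2 and the (A)12 repeat at position 18; a run of only five AT units would fall below the di-nucleotide threshold and go unreported.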

PCR was carried out under the following conditions: an initial denaturation at 94 °C for 3 min, followed by 30 cycles of 30 s at 94 °C, 30 s at 56 °C, and 30 s at 72 °C, and a final extension of 3 min at 72 °C. A typical 20-μl reaction contained 1 × PCR buffer, 1.6 mM MgCl2, 0.2 mM of each dNTP, 0.5 μM of each primer, 0.25 U of Taq DNA polymerase (Takara), and 10–25 ng of genomic DNA. Polymorphic PCR products were detected by capillary electrophoresis using the ABI 3730 DNA Analyzer (Applied Biosystems).

SSR data analysis

Genetic diversity indices, including the number of alleles per locus (Na), the number of effective alleles per locus (Ne), observed heterozygosity (Ho), and expected heterozygosity (He), were estimated using GenAlEx version 6.5 (Peakall and Smouse 2012).
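For readers unfamiliar with these indices, the following Python sketch computes them for a single locus using the standard definitions He = 1 − Σp² and Ne = 1/Σp², where p is the frequency of each allele. It is a minimal illustration for diploid genotypes without missing data, not a substitute for GenAlEx; the function name and the example genotypes are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) of one diploid individual


def locus_stats(genotypes: List[Genotype]) -> Dict[str, float]:
    """Compute Na, Ne, Ho, and He for one locus.

    Na: number of distinct alleles
    Ne: 1 / sum(p_i^2)
    Ho: proportion of heterozygous individuals
    He: 1 - sum(p_i^2)  (no small-sample correction)
    """
    if not genotypes:
        raise ValueError("at least one genotype is required")
    n = len(genotypes)
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    sum_p2 = sum((count / (2 * n)) ** 2 for count in allele_counts.values())
    heterozygotes = sum(1 for a, b in genotypes if a != b)
    return {
        "Na": len(allele_counts),
        "Ne": 1 / sum_p2,
        "Ho": heterozygotes / n,
        "He": 1 - sum_p2,
    }


# Hypothetical locus scored in four individuals with alleles of 150 and 154 bp.
example = [(150, 150), (150, 154), (154, 154), (150, 154)]
stats = locus_stats(example)
```

In this example both alleles have a frequency of 0.5, so Na = 2, Ne = 2, and Ho = He = 0.5.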

Results and discussion

Sequencing and assembly

Two transcriptome libraries were constructed from 2-year-old stems of the two cultivated varieties and sequenced on the Ion Torrent PGM™ platform. After removal of adaptor sequences, ambiguous nucleotides, and low-quality sequences, 5,721,513 clean reads remained for BS and 5,107,465 for GS. These clean reads were used for de novo assembly. Trinity generated 51,683 transcripts with an average length of 505 bp and 40,405 unigenes with an N50 of 644 bp (Supplementary File S1). Of the unigenes, 28,952 (71.65%) were 200–500 bp, 7719 (19.10%) were 500 bp–1 kb, 3309 (8.19%) were 1–2 kb, and the remaining 425 (1.05%) were >2 kb (Table 1).
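The N50 statistic reported for the assembly is the length L such that sequences of length L or longer account for at least half of the total assembled bases. A minimal Python sketch of this calculation is given below; the function name and the toy lengths are illustrative only.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first length where the cumulative sum reaches half.
        if 2 * running >= total:
            return length
    return ordered[-1]  # unreachable for non-empty input; kept for clarity


# Toy assembly of five sequences totalling 20 bases.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = n50(toy_lengths)
```

For the toy lengths, the two longest sequences (6 and 5) together cover 11 of 20 bases, so the N50 is 5.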

Table 1 Length distribution of assembled transcripts and unigenes

Functional annotation and classification

All 40,405 assembled unigenes were annotated by sequence alignment against several public databases (Supplementary Table S1). In total, 23,773 unigenes (58.83%) had significant hits in the Nr database, 9076 (22.46%) in the Nt database, and 16,319 (40.38%) in the Swiss-Prot database. Overall, 24,955 unigenes (61.76%) were successfully annotated in at least one of the seven public databases (Supplementary Table S1), and 2945 unigenes (7.28%) were annotated in all seven.

For GO analysis, the 13,510 matched unigenes were divided into three functional categories: biological process, cellular component, and molecular function (Fig. 1). Within the biological process category, unigenes involved in “cellular process” (7150), “metabolic process” (6960), and “single-organism process” (5317) were highly represented. The cellular component category mainly comprised unigenes assigned to “cell” (4099), “cell part” (4097), and “organelle” (2482). Within the molecular function category, “binding” (7355), “catalytic activity” (6018), and “transporter” (803) were highly represented.

Fig. 1

Gene ontology classification of assembled unigenes. The 13,510 matched unigenes were classified into three functional categories: biological process, cellular component, and molecular function

All unigenes were aligned to the KOG database for functional prediction and classification. In total, 8241 unigenes (20.40%) were assigned to KOG classifications and divided into 25 specific categories (Supplementary Fig. S1). “General function prediction only” (1274) was the largest group, followed by “posttranslational modification, protein turnover, chaperones” (1136), “signal transduction mechanisms” (766), “translation, ribosomal structure and biogenesis” (673), “RNA processing and modification” (629), and “intracellular trafficking, secretion, and vesicular transport” (569).

Metabolic pathway analysis was also conducted by aligning all unigenes against the KEGG Orthology (KO) database (Supplementary Fig. S2). This analysis assigned a total of 9643 unigenes to 264 KEGG pathways, which fall into five top-level KEGG categories (Hierarchy1): Cellular Processes (a), Environmental Information Processing (b), Genetic Information Processing (c), Metabolism (d), and Organismal Systems (e). The second-level pathways (Hierarchy2) with the highest numbers of unique transcripts were “translation” (943), followed by “signal transduction” (851), “carbohydrate metabolism” (766), and “folding, sorting and degradation” (739).

Differentially expressed genes

To detect differentially expressed genes (DEGs), the expression levels of the unigenes in the two cultivars were estimated. A volcano plot shows the differences in gene expression between the two cultivars (Fig. 2). A total of 3224 genes (7.98% of all 40,405 genes) were identified as significant DEGs between the two cultivars, comprising 1679 upregulated genes (52.08% of all significant DEGs) and 1545 downregulated genes (47.92%) in BS compared to GS (Fig. 2, Supplementary Table S2). GO and KEGG enrichment statistics are provided in Supplementary Tables S3 and S4, respectively.

Fig. 2

Volcano plot of the differentially expressed genes (DEGs) between BS and GS. The red and blue dots indicate genes that are significantly upregulated and downregulated, respectively, in BS compared to GS

Frequency and distribution of SSR motifs

The MISA search module was used to search for SSRs in the 40,405 unigenes. Only mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide repeats with minimum repeat numbers of 10, 6, 5, 5, 5, and 5, respectively, were considered SSR loci. In total, 8527 potential genic SSRs were identified in 7332 (18.15%) unigene sequences, of which 1023 (2.53%) unigenes contained more than one SSR (Table 2).

Table 2 Occurrence and number of repeats of SSR motifs in Dendrobium officinale

Development, validation, and transferability of genic SSR markers

After adaptor removal and assembly, more than 40,000 unigenes were generated. A total of 68 locus-specific primer pairs were designed, and 17 of them produced polymorphic products in eight D. officinale individuals. These 17 loci were subsequently tested on 31 individuals for allele polymorphism. The number of alleles (Na), the number of effective alleles (Ne), the observed heterozygosity (Ho), and the expected heterozygosity (He) per polymorphic locus were estimated using GenAlEx version 6.5 (Peakall and Smouse 2012). As shown in Table 3, 2 to 15 alleles were detected per polymorphic locus, and the observed and expected heterozygosity estimates ranged from 0.194 to 0.903 and from 0.360 to 0.893, respectively.

Table 3 Characteristics of 17 polymorphic genic SSR markers in Dendrobium officinale, including loci name, repeat motif, primer sequences, expected size of alleles (S), the number of alleles (Na), the number of effective alleles (Ne), the observed heterozygosity (Ho), and the expected heterozygosity (He)

These loci were further examined in D. huoshanense for cross-species transferability. All 17 primer pairs successfully amplified locus-specific PCR products in D. huoshanense. These genic SSR markers show high levels of polymorphism and transferability and are therefore potentially valuable tools not only for germplasm conservation of D. officinale but also for phylogenetic studies of Dendrobium.