Introduction

With more than 55,000 known compounds, terpenoids (or isoprenoids) are regarded as the largest class of secondary metabolites. A number of terpenoids are essential to biological functions and processes, including membrane fluidity, electron transport chains, respiration, photosynthesis, plant defense, and regulation of growth and development (Chang et al. 2015; Son et al. 2014). Terpenoids also affect our daily lives as important ingredients in spices, cosmetics, food, and drugs. As such, terpenoids possess significant commercial value (Vranová et al. 2013).

All terpenoids are derived from condensation of two simple common C5 precursors, isopentenyl diphosphate (IPP) and its isomer dimethylallyl diphosphate (DMAPP) (Rodríguez-Concepción and Boronat 2002). Both are synthesized by two separate pathways in plants, namely, the mevalonate (MVA) pathway in the cytosol (Buhaescu and Izzedine 2007; Zenoni et al. 2010) and the 2-C-methyl-D-erythritol 4-phosphate (MEP) pathway in plastids (Heuston et al. 2012). The C5 units from the MEP pathway mostly participate in the synthesis of diterpenes, monoterpenes, hemiterpenes, and polyterpenes (Schilmiller et al. 2009), whereas the MVA pathway predominantly provides precursors for sterols, sesquiterpenes, triterpenes, and ubiquinones (Liu et al. 2006; Zhao et al. 2012). Several terpenoid biosynthesis-linked genes have been identified and isolated in recent years (Kalita et al. 2015). The biosynthesis of terpenoids is a complex process that involves a series of chemical reactions such as prenylation, oxidation, reduction, and isomerization. Furthermore, the genes encoding the enzymes responsible for these reactions are differentially expressed in different plant organs. Therefore, sporadic studies on a few genes in the terpene pathway are not sufficient to clarify the control mechanism of terpenoid biosynthesis.

Next-generation sequencing (NGS) has recently brought about breakthroughs in molecular genetics (Mardis 2008; Shendure and Ji 2008). NGS enables the sequencing of up to 1 million kilobases of DNA in a short time, providing sequence data for comprehensive analysis of genomes, transcriptomes, and interactomes (Pop and Salzberg 2008; Shendure and Ji 2008). RNA sequencing (RNA-Seq) is an NGS application for transcriptome analysis that gives a snapshot of the changing cellular transcriptome (Libault et al. 2010). RNA-Seq is particularly powerful for understanding biological processes because it identifies the genes participating in them through annotation of differentially expressed genes (DEGs) (Benedito et al. 2008; Lange et al. 2000).

Chamomile (Matricaria chamomilla L., synonym M. recutita), also known as German chamomile, is an annual herb of substantial economic value owing to its volatile essential oil (Sayadi et al. 2014). M. chamomilla is one of the most important medicinal plant species and is widely cultivated in Asia and Europe. The herb accumulates numerous terpenoid secondary metabolites, with (-)-α-bisabolol and chamazulene as major ingredients (Su et al. 2015). Spiroethers, (-)-α-bisabolone oxide A, anthecotulid, and apigenin are also found in the essential oil from chamomile (Murti et al. 2012; Srivastava et al. 2010). Its bioactivities, including antiphlogistic, anti-inflammatory, antiseptic, and spasmolytic properties, have been exploited in the biopharmaceutical, cosmetics, perfume, aromatherapy, and health food industries (Formisano et al. 2015). Despite the importance of M. chamomilla as a medicinal plant, the genome and transcriptome of the plant have yet to be analyzed in detail.

To identify genes involved in sesquiterpenoid biosynthesis in chamomile, we adopted a high-throughput RNA-Seq method supplemented by quantification of the expression levels of the unigenes involved. In parallel, we determined the content and composition of the essential oil from M. chamomilla samples. We also examined the transcriptional regulation of the MVA pathway and the steps downstream of it in sesquiterpenoid biosynthesis. The putative genes examined were acetyl-CoA C-acetyltransferase (AACT), 3-hydroxy-3-methylglutaryl-CoA synthase (HMGS), 3-hydroxy-3-methylglutaryl-CoA reductase (HMGR), mevalonate kinase (MK), phosphomevalonate kinase (PMK), isopentenyl diphosphate delta-isomerase (IPPI), farnesyl diphosphate synthase (FPPS), geranylgeranyl diphosphate synthase (GGPPS), and (-)-germacrene D synthase (GDS). The transcriptome data and gene expression profiles are expected to provide valuable resources for the study of various aspects of M. chamomilla, especially terpenoid biosynthesis in the plant.

Materials and methods

Sample collection and RNA extraction

Matricaria chamomilla L. (Asteraceae) was grown in the botanical garden at Yangtze University in Jingzhou, China (30.35°N, 112.14°E). When the flowers of 4-month-old M. chamomilla opened, roots, stems, leaves, and flowers were collected for transcriptome and sesquiterpenoid analyses. Each sample was cut into small pieces, quick-frozen in liquid nitrogen, and stored at −80 °C until use. The total RNA of each sample was extracted using the MiniBEST Plant RNA Extraction Kit (Dalian, China) in accordance with the manufacturer’s instructions. Integrity of the total RNA was assessed through electrophoresis on 1% agarose gels, and the RNA was quantified using a NanoPhotometer® (Implen, CA).

Library construction, deep sequencing, and de novo assembly

The transcriptome libraries of the plant organs were prepared using a NEBNext® Ultra™ RNA Library Prep Kit for Illumina (NEB, USA) (Li et al. 2016) and were sequenced on an Illumina HiSeq 2500 system at Biomarker Technologies (Beijing, China). The clean reads, generated by removing empty reads, adaptor sequences, and low-quality sequences, were assembled into contigs with Trinity (Grabherr et al. 2011) to recover more full-length transcripts across a broad range of expression levels. Trinity constructed transcripts by connecting the contigs on the basis of the paired-end information of the sequences. Paired-end reads were then applied for gap filling of transcripts, and the longest transcripts thus obtained were defined as unigenes.

Functional annotation and metabolic pathway analysis

For functional annotation, BLASTx (Version 2.2.26) was used with a cut-off E value of 10⁻⁵ to search all of the assembled unigenes of M. chamomilla against the NCBI Nr protein database (Deng et al. 2006), Swiss-Prot (Apweiler et al. 2004), KEGG (Kanehisa et al. 2004), Gene Ontology (GO) (Ashburner 2000), and Clusters of Orthologous Groups of proteins (COG) (Tatusov et al. 2000). Hits with an E value exceeding 10⁻⁵ were excluded from the analysis, and the best-matching results were used to annotate the assembled unigenes (Altschul et al. 1997). Using the GO program, GO annotations based on molecular function, biological process, and cellular component were obtained. The search of unigenes against the COG database assigned possible functions, and the KEGG database allowed analysis of intracellular metabolic pathways.
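The E-value filter and best-hit selection described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the BLAST results are available in the common 12-column tab-separated format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E value, bit score), and the function name, the bit-score tie-breaking rule, and the example rows are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cut-off stated in the text


def best_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue, bitscore)}.

    Hits whose E value exceeds the cut-off are discarded; among the
    remaining hits the one with the highest bit score is kept per query
    (the choice of bit score as the ranking criterion is an assumption).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > cutoff:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Hypothetical hits for two unigenes (identifiers are made up).
example = "\n".join([
    "unigene1\tprotA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-50\t350",
    "unigene1\tprotB\t60.0\t180\t72\t0\t1\t540\t1\t180\t1e-20\t150",
    "unigene2\tprotC\t40.0\t50\t30\t0\t1\t150\t1\t50\t0.01\t30",
])
hits = best_hits(example)
print(hits)
```

In this toy input, the only hit for unigene2 has an E value above the cut-off, so that unigene remains unannotated.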

Analysis and validation of the DEGs

To compare the expression characteristics of the genes, the read counts were normalized as fragments per kilobase of transcript per million mapped reads (FPKM) to obtain relative expression values (Trapnell et al. 2010). Differential expression between organs was assessed using the DESeq R package (version 1.10.1), which provides a statistical approach for assessing differential expression based on a negative binomial distribution model (Anders and Huber 2010). The p values were adjusted using the Benjamini and Hochberg method to control the false discovery rate (FDR) (Haynes 2013). A gene was considered differentially expressed when |log2 (fold change)| ≥ 1 and the FDR was lower than 0.01.
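The quantities named in this section can be illustrated with a short sketch. It shows the standard FPKM formula, the Benjamini and Hochberg adjustment, and the two thresholds stated in the text. It does not reproduce DESeq, which fits a negative binomial model in R; the function names and the example numbers are hypothetical.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so monotonicity can be enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_deg(log2_fold_change, fdr, lfc_cutoff=1.0, fdr_cutoff=0.01):
    """Apply the thresholds from the text: |log2 FC| >= 1 and FDR < 0.01."""
    return abs(log2_fold_change) >= lfc_cutoff and fdr < fdr_cutoff


# Hypothetical example: 500 fragments on a 2000-bp unigene, 10 million mapped.
print(fpkm(500, 2000, 10_000_000))
fdrs = benjamini_hochberg([0.001, 0.02, 0.03, 0.5])
print(fdrs)
print([is_deg(lfc, q) for lfc, q in zip([2.3, -1.5, 0.4, 3.0], fdrs)])
```

Only the first hypothetical gene passes both thresholds; the second has a sufficient fold change but an adjusted p value above 0.01.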

Simple sequence repeats

In this study, the frequency and distribution of simple sequence repeats (SSRs) were analyzed in all obtained unigenes. SSRs were detected using the MISA software (http://pgrc.ipk-gatersleben.de/misa/). The minimum repeat length was set as 12 for di-, tri-, tetra-, and hexa-nucleotide repeats, and 15 for penta-nucleotide repeats. FASTA-formatted sequences were used as the input file (Chen et al. 2016; Reddy et al. 2015).
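The length thresholds above can be illustrated with a small regular-expression search. This is a simplified sketch and not MISA itself: it assumes the stated minimum lengths are measured in nucleotides (so, for example, a di-nucleotide motif must repeat at least six times), it handles only perfect repeats, it ignores mono-nucleotide and compound SSRs, and the function name is hypothetical.

```python
import re

# Minimum total SSR length in nucleotides per motif size, as read from the
# text (interpreting "length" as nucleotides is an assumption).
MIN_LENGTH_BP = {2: 12, 3: 12, 4: 12, 5: 15, 6: 12}


def find_ssrs(sequence):
    """Return a list of (start, motif, repeat_count) with 0-based starts."""
    sequence = sequence.upper()
    found = []
    for unit, min_len in sorted(MIN_LENGTH_BP.items()):
        min_repeats = -(-min_len // unit)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit,
            # e.g. GTGT (reported as GT) or AAA (a homopolymer run).
            if any(
                unit % k == 0 and motif == motif[:k] * (unit // k)
                for k in range(1, unit)
            ):
                continue
            found.append((match.start(), motif, len(match.group(0)) // unit))
    return found


# Hypothetical sequence containing (GT)6 and (TGA)4.
seq = "CC" + "GT" * 6 + "AAC" + "TGA" * 4 + "G"
print(find_ssrs(seq))
```

A run of five GT repeats (10 nt) would fall below the 12-nt threshold and is therefore not reported by this sketch.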

Gas chromatography–mass spectrometry

For identifying the sesquiterpenes in M. chamomilla, 2 µL of the pentane extract of each organ (roots, leaves, stems, and flowers) was directly analyzed by gas chromatography–mass spectrometry (Agilent 5975B GC–MS system with a 6890N gas chromatograph) under the following temperature program: injection at 250 °C, with the oven ramped from 40 to 250 °C at a rate of 10 °C min⁻¹. The column was DB1-MS (0.25 µm film thickness, 250 µm × 30 m). Helium was the carrier gas at 1 mL min⁻¹. The sesquiterpenes were identified by comparing their measured retention times and mass spectra with those of authentic standards. The reference compounds were purchased from Roth (Karlsruhe, Germany), Sigma-Aldrich (Steinheim, Germany), and Herbfine (Jiangxi, China). The mass spectra were searched against the NIST11 database. A previously published GC–MS method and a chiral column were used for chiral analysis (Irmisch et al. 2012).

qRT-PCR validation

The RNAs of the leaves, stems, roots, and flowers were isolated as described above. First-strand cDNA was synthesized from 1 µg of high-quality RNA using the PrimeScript™ Reagent Kit (Dalian TaKaRa, China). The 10-fold diluted cDNA was used as the template for quantitative real-time PCR (qRT-PCR). qRT-PCR was carried out using a SYBR Premix ExTaq™ II Kit (Dalian TaKaRa, China) on the Bio-Rad Mini Opticon™ Real-time PCR system according to the manufacturer’s instructions. The qRT-PCR conditions were the same as those described by Xu et al. (2014). The qRT-PCR data were normalized using 18S rRNA (18SU: 5′-ACCGAGCGTCGAGTGGATTAA-3′ and 18SD: 5′-CTAGTTCGTGCGTCCGTCAAA-3′) as the reference gene (Tao et al. 2016). Gene primers used in qRT-PCR validation are listed in Table S1. The relative expression levels of the target genes in each sample were calculated by the 2^(−ΔΔCt) method (Livak and Schmittgen 2001). The qRT-PCR experiments were repeated using two biological replicates for each organ and three technical replicates for each sample.
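The 2^(−ΔΔCt) calculation can be written out as a short worked example. The Ct values below are hypothetical, and the flower sample is used as the calibrator only for illustration; technical replicates are averaged before the calculation, which is one common convention rather than a detail stated in the text.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative expression by the 2^(-ddCt) method (Livak and Schmittgen 2001).

    ct_target / ct_reference: mean Ct of the target and reference gene in the
    sample of interest; *_cal: the same values in the calibrator sample.
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values from three technical replicates per reaction.
leaf_target = mean([23.0, 23.5, 22.5])    # target gene in leaves
leaf_18s = mean([12.0, 12.0, 12.0])       # 18S rRNA in leaves
flower_target = mean([24.0, 24.0, 24.0])  # target gene in flowers (calibrator)
flower_18s = mean([12.0, 12.0, 12.0])     # 18S rRNA in flowers

leaf_vs_flower = relative_expression(leaf_target, leaf_18s, flower_target, flower_18s)
flower_vs_flower = relative_expression(flower_target, flower_18s, flower_target, flower_18s)
print(leaf_vs_flower, flower_vs_flower)
```

With these made-up numbers the target gene reaches threshold one cycle earlier in leaves than in flowers after normalization, which corresponds to a twofold higher relative expression.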

Results and discussion

Illumina sequencing and de novo assembly

Recent developments in NGS technology have made significant progress in gene discovery possible (Reisfilho 2009). In the present study, four cDNA libraries, derived from root, stem, flower, and leaf samples of M. chamomilla, were sequenced on an Illumina HiSeq 2500 platform. The sequences were deposited in the NCBI Sequence Read Archive under accession numbers SRR3990144 (roots), SRR3990143 (stems), SRR5557958 (flowers), and SRR3961779 (leaves). The libraries produced 21,561,283,776 bases (21 Gb) of clean data with 90.82% Q30 (sequencing error rate of 0.1%). After quality control, the reads were de novo assembled using Trinity to yield 139,471 transcripts with a mean length of 950 nt. The clean reads were assembled into 83,741 unigenes with an N50 of 1277 nt. The number of unigenes exceeding 1000 bp was 20,875 (Fig. 1a, b). The average unigene length of 704 bp was longer than the values from previous reports on medicinal plants, such as Davidia involucrata Baill (Li et al. 2016), Siraitia grosvenorii (Tang et al. 2011), and Panax quinquefolius (Wu et al. 2010).
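Because the N50 and the mean length are both quoted for the assembly, a brief sketch of how the two statistics differ may help. The N50 is the length L such that sequences of length L or longer contain at least half of the total assembled bases. The function name and the toy lengths below are hypothetical and unrelated to the chamomile assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no sequence lengths given")


# Toy assembly of six sequences (lengths in nt).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)
print(toy_n50, round(toy_mean, 1))
```

As in the toy data, the N50 is typically larger than the mean length because it is weighted toward long sequences, which is why an N50 of 1277 nt can accompany a mean unigene length of 704 bp.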

Fig. 1

Statistics of Illumina short read assembly quality

Functional annotation of unigenes

BLASTX searches annotated 42,138 (50.3%) of the above-mentioned 83,741 unigenes at a cut-off E value of 10⁻⁵ (Table 1). Among the unigenes annotated against the NCBI non-redundant (nr) protein database, the largest share displayed close homology to Vitis vinifera (10.11%), followed by Sesamum indicum (6.42%), Coffea canephora (5.83%), Nicotiana sylvestris (4.20%), Nicotiana tomentosiformis (4.13%), Theobroma cacao (3.14%), Nelumbo nucifera (2.66%), Solanum tuberosum (2.34%), Citrus sinensis (2.25%), and Erythranthe guttata (2.15%) (Fig. 2).

Table 1 Annotation summary of unigenes from Matricaria chamomilla
Fig. 2

No. of M. chamomilla transcripts homologous to genes from other species

Gene ontology classification

To describe the functions of the predicted genes in M. chamomilla, we performed GO assignments in the three main domains (cellular component, biological process, and molecular function), and the results were plotted using WEGO (Ashburner 2000; Ye et al. 2006). A total of 21,626 sequences were categorized into 52 functional groups (Fig. 3). “Cell”, followed by “cell part”, was the most highly represented GO term under cellular component. In the molecular function and biological process categories, “catalytic activity” and “metabolic process” were the most highly represented, respectively. The same trend in the most represented GO categories was previously reported for the Cassia angustifolia Vahl transcriptome (Li et al. 2013). We also observed high percentages of genes under the categories of “organelle”, “binding”, and “cellular process”, as well as several genes under the terms “extracellular matrix part”, “protein tag”, “translation regulator activity”, and “cell killing”. Under the biological process category, the maximum number of unigenes was associated with “metabolic process”, suggesting the possibility that novel genes involved in important metabolic activities in M. chamomilla could be identified in the present study.

Fig. 3

Gene Ontology categories of the M. chamomilla unigenes. The unigenes are summarized in three categories: cellular component, biological process, and molecular function

Clusters of orthologous group classification

To further validate the transcriptome library and the annotation process, 18,929 sequences out of the 42,138 nr hits were classified using the Clusters of Orthologous Groups (COG) database. The COG annotation distributed the sequences into 25 categories according to their biological function (Fig. 4). The largest group was the cluster “general function prediction only” (3433, 18.14%), followed by “replication, recombination, and repair” (1806, 9.54%) and “transcription” (1695, 8.95%). In contrast, few unigenes were classified under extracellular and nuclear structures. A total of 725 unigenes (3.83%) were assigned to the category of secondary metabolite biosynthesis, transport, and catabolism, and several of these were involved in sesquiterpene biosynthesis (see below).

Fig. 4

Clusters of orthologous groups (COG) classification of the M. chamomilla unigenes. Out of 36,021 nr hits, 16,008 sequences have a COG classification among the 25 categories

Functional classification by the Kyoto encyclopedia of genes and genomes

Sesquiterpenes have been regarded as pharmacologically active constituents of M. chamomilla. Insight into sesquiterpene biosynthesis in this plant can thus accelerate future engineering of the pathway to achieve high sesquiterpene content. We adopted KEGG, which has been widely used to identify unigenes involved in biological pathways (Reddy et al. 2015), to analyze the unigenes of the terpenoid pathway. All of these unigenes would be important resources for genetic manipulation of M. chamomilla.

The metabolic pathways of sesquiterpene biosynthesis in plants are under active study (Lange et al. 2000; Son et al. 2014) because of the myriad variations in structure and modification. To construct the biological pathways in M. chamomilla, 15,746 sequences were assigned to 128 KEGG pathways. The most predominant among these pathways was ribosome (867, 5.51%), followed by carbon metabolism (658, 4.18%), biosynthesis of amino acids (545, 3.46%), protein processing in the endoplasmic reticulum (525, 3.33%), and spliceosome (430, 2.73%) (Table S2). Regarding the biosynthesis of secondary metabolites, 1115 unigenes were found to be involved (Table 2). Among these unigenes, the cluster of “phenylpropanoid biosynthesis [PATH: ko00940]” represented the largest group (348, 31.21%), followed by “flavonoid biosynthesis [PATH: ko00941]” (90, 8.07%) and “terpenoid backbone biosynthesis [PATH: ko00900]” (81, 7.26%). Most terpene-related enzymes were mapped by KEGG to the terpenoid backbone biosynthesis [PATH: ko00900] and sesquiterpenoid and triterpenoid biosynthesis [PATH: ko00909] groups.

Table 2 The unigenes related to secondary metabolites

In plants, the building blocks for isoprenoids are synthesized via the MVA and MEP pathways (Seemann et al. 2002; Vranová et al. 2013). The BLASTX search identified and classified 61 enzymes as terpenoid synthases in the ko00900 metabolic pathway (Table 3). All of these unigenes are important genetic resources for studying the MVA and MEP pathways in M. chamomilla. These results again confirmed the usefulness of high-throughput sequencing in identifying metabolic pathway genes.

Table 3 The number of unigenes involved in sesquiterpenoid biosynthesis

Identification of differentially expressed genes

To estimate the expression difference of genes in different M. chamomilla organs, the FDR and absolute log2 ratio value were used as thresholds for assessing the significance of differential gene expression (Gao et al. 2015). A total of 29,975 genes had substantial expression differences between the flower, stem, root, and leaf libraries (Fig. 5).

Fig. 5

Differentially expressed genes (DEGs) in different M. chamomilla organs. a Hierarchical cluster analysis of common DEGs. A scale indicating the color assigned to log2 FPKM is shown to the right of the cluster. Yellow colors indicate high expression, blue colors indicate low expression, and each horizontal bar represents a single gene. b The number of upregulated and downregulated genes between flowers and stems, flowers and roots, flowers and leaves, stems and roots, stems and leaves, and roots and leaves

Figure 5 summarizes the DEGs between different M. chamomilla organs. Tables S3 through S8 list the details of the DEGs between flower and stem (Table S3), flower and root (Table S4), flower and leaf (Table S5), stem and root (Table S6), stem and leaf (Table S7), and root and leaf (Table S8). As shown in Fig. 5, 5355 DEGs were detected between flowers and stems, with 1004 genes upregulated and 4351 downregulated (Table S3). Between flowers and roots, a total of 5793 DEGs, 1912 upregulated and 3881 downregulated, were detected (Table S4). Exactly 3000 DEGs, which included 896 upregulated and 2104 downregulated genes, were found between flowers and leaves (Table S5). Then, 5342 DEGs, including 3084 upregulated and 2258 downregulated genes, were observed between stems and roots (Table S6). A total of 4552 DEGs, which included 2707 upregulated and 1845 downregulated genes, were identified between stems and leaves (Table S7). Finally, 5933 DEGs, which included 3245 upregulated and 2688 downregulated genes, were observed between roots and leaves (Table S8). Based on these counts, the number of DEGs was highest for the root and leaf pair (5933), closely followed by the flower and root pair (5793), whereas the lowest number was found for the flower and leaf pair (3000). These findings show that a large number of genes are differentially expressed among the organs of M. chamomilla.
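The pairwise counts reported in this paragraph can be tabulated and cross-checked with a few lines of code. The sketch below only re-enters the numbers given in the text; the dictionary layout and function names are hypothetical. It verifies that the upregulated and downregulated counts add up to each reported total and ranks the organ pairs by total number of DEGs.

```python
# (upregulated, downregulated, total) for each organ pair, as reported above.
deg_counts = {
    ("flower", "stem"): (1004, 4351, 5355),
    ("flower", "root"): (1912, 3881, 5793),
    ("flower", "leaf"): (896, 2104, 3000),
    ("stem", "root"): (3084, 2258, 5342),
    ("stem", "leaf"): (2707, 1845, 4552),
    ("root", "leaf"): (3245, 2688, 5933),
}


def inconsistent_pairs(counts):
    """Return the pairs whose up + down counts do not match the total."""
    return [pair for pair, (up, down, total) in counts.items() if up + down != total]


def rank_pairs(counts):
    """Return the organ pairs sorted by total DEGs, largest first."""
    return sorted(counts, key=lambda pair: counts[pair][2], reverse=True)


print("inconsistent:", inconsistent_pairs(deg_counts))
ranking = rank_pairs(deg_counts)
for pair in ranking:
    up, down, total = deg_counts[pair]
    print(f"{pair[0]} vs {pair[1]}: {total} DEGs ({up} up, {down} down)")
```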

DEGs in terpenoid pathway among four organs

To clarify the differential expression of terpenoid biosynthetic genes in the four M. chamomilla organs, the DEGs involved in terpenoid metabolism were further analyzed. Gene candidates that may correlate with the terpenoid pathway were examined by cluster analysis of the gene expression pattern with HemI software. One gene cluster for sesquiterpene biosynthesis was identified in the cluster analysis of the intersection of DEGs (Fig. 6a). Comparison of stems with roots revealed that the expression levels of 20 unigenes were significantly higher in stems than in roots, and those of eight were significantly lower in stems than in roots. Similarly, the expression levels of 19 unigenes were considerably higher in leaves than in stems, and five unigenes showed substantially lower expression in leaves relative to stems. The expression patterns of 28 unigenes in roots and leaves were comparable, and seven unigenes presented considerably higher expression in roots than in leaves. Figure 6a shows that 24 unigenes exhibited lower expression in flowers than in leaves, 23 unigenes lower expression in flowers than in stems, and 21 unigenes lower expression in flowers than in roots. These results suggested that the unigenes involved in the MVA pathway were expressed at higher levels in leaves and stems than in roots and flowers. These differential expression patterns of terpene genes may give rise to the differences in the types and contents of sesquiterpenes in various organs.

Fig. 6

Expression patterns of genes related to terpenoids. a Expression patterns of MVA pathway-related genes. b Expression patterns of MEP pathway-related genes. Each column shows a pair of organs used in comparison, and the names are shown at the bottom. Each row shows a unigene, and expression differences are observed in different colors. Expression undetected is in black

Another cluster, containing 31 unigenes related to the MEP pathway, was obtained as the union of DEGs (Fig. 6b). Comparison of the expression levels of these genes showed a spatial expression pattern similar to that of the genes in the MVA pathway, namely higher expression in leaves and stems than in roots and flowers. Among the 31 unigenes, 22 were expressed at higher levels in stems than in flowers, 15 higher in leaves than in flowers, 18 higher in stems than in roots, and 15 higher in leaves than in roots.

Putative sesquiterpene biosynthesis genes in different organs

Because sesquiterpenoid profiles in various organs of this chamomile cultivar are not available, flowers, roots, stems, and leaves were analyzed for their terpene composition and content. Flowers had the highest sesquiterpene content, which was nearly 2.5 times that in roots (Table 4). The flowers accumulated bisabolol oxide A (27.9%), α-bisabolol (22.8%), bisabolol oxide B (17.3%), and β-farnesene (22.5%) as the major sesquiterpenoid components. In addition, small amounts of isocomene and β-caryophyllene were detected, accounting for 0.02 and 0.27% of the total sesquiterpenes, respectively. By contrast, the sesquiterpene blend of the leaves was dominated by α-farnesene (55.4%) and germacrene D (27.5%), whereas α-bisabolol and bisabolol oxides A and B were not detected. Stems and roots displayed a comparable bouquet of sesquiterpenes, with β-farnesene as the major compound (stems, 88.6%; roots, 77.2%). Our analysis confirmed the organ-selective production of essential oils in chamomile described in previous studies (Presibella et al. 2006).

Table 4 The content of sesquiterpenoids in various organs of M. chamomilla (µg/g fresh weight)

In plants, sesquiterpene synthases are localized in the cytosol (Degenhardt et al. 2009), and the MVA pathway supplies their precursors (Vranová et al. 2013). The entry reaction of the MVA pathway is catalyzed by AACT to produce acetoacetyl-CoA, which is then converted into 3-hydroxy-3-methylglutaryl-CoA (HMG-CoA) by HMGS. HMG-CoA is subsequently reduced to MVA by HMGR. Successive phosphorylation reactions catalyzed by MK and PMK convert MVA into MVA 5-diphosphate. ATP-dependent decarboxylation of MVA 5-diphosphate by mevalonate diphosphate decarboxylase (MVD) yields IPP. Finally, reversible conversion of IPP into DMAPP by IPPI completes the MVA pathway. IPP and DMAPP are then used for synthesizing isoprenoids in the cytosol and the mitochondria (Vranová et al. 2013). Condensation of DMAPP and IPP by FPPS produces farnesyl diphosphate (FPP). Subsequently, sesquiterpene skeletons are generated from the precursor FPP by sesquiterpene synthases (Fig. 7a).

Fig. 7

Spatial expression analysis of DEGs related to sesquiterpene biosynthesis. a MVA pathway for sesquiterpene biosynthesis in M. chamomilla. Heat-Map of the hierarchical clustering of 12 gene expression profiles in four different organs F flower, R root, S stem, L leaf. Expression ratios are expressed as log2 values. b Expression analysis of the 12 genes in various organs. Relative expression level in flowers was set to 1.0

The economic and pharmacological importance of chamomile justifies a detailed study of terpene-related genes in M. chamomilla. We thus selected 10 genes that are known to be responsible for sesquiterpenoid biosynthesis (AACT, HMGS, HMGR1-3, MK, PMK, IPPI, FPPS, and GDS) and, in addition, two for diterpenoid synthesis (GGPPS1 and GGPPS2) in this plant. Homologs of these terpenoid biosynthesis-linked genes have been isolated and identified from various plant species. In Ginkgo biloba, the AACT and MVK genes are suggested to act in terpene trilactone (TTL) biosynthesis. On salicylic acid and methyl jasmonate treatment, the expression levels of GbAACT and GbMVK correlate positively with TTL production in G. biloba seedlings (Chen et al. 2017). TmHMGS was expressed in the needles and the stems at similar levels in Taxus × media (Kai et al. 2006). In M. chamomilla, the highest and lowest expression levels of McHMGS were observed in flowers and stems, respectively (Tao et al. 2016). CaHMGS is strongly expressed in hypocotyls and cotyledons, exhibiting good correlation with camptothecin content in the tested tissues (Kai et al. 2013). The HMGR gene has been isolated from numerous plants and its function has been studied. CmHMGR is involved in determining the fruit size of melons (Kobayashi et al. 2002). In Artemisia annua L., the co-overexpression of the HMGR and FPPS genes enhances artemisinin content (Wang et al. 2011). When HMGR is overexpressed, the production level of β-sesquiphellandrene in Lactococcus lactis increases by 1.25- to 1.60-fold (Song et al. 2012). Two enantioselective GDSs involved in the biosynthesis of (+)- and (−)-germacrene D are known in Solidago canadensis (Schmidt et al. 1998). In German chamomile, four sesquiterpene synthases (TPSs) and one monoterpene synthase were identified, and the expression patterns of the TPSs in different organs of chamomile positively correlate with the content of terpenoid products in the corresponding organs (Irmisch et al. 2012).

In the present study, qRT-PCR demonstrated the organ-specific expression of the above-mentioned genes, reflecting the composition and content of sesquiterpenoids in the respective plant organs. The expression of seven unigenes operating in the MVA pathway, namely, AACT, HMGR1, HMGR2, HMGR3, MK, PMK, and IPPI, was significantly higher in flowers than in other organs (Fig. 7b), which may explain the highest sesquiterpene content in flowers (Table 4). A previous study of Litsea cubeba also showed the highest expression levels of six genes (AACT, HMGR, HMGS, PMVK, MVK, and MVD) in flowers (Han et al. 2013). Among the MVA pathway enzymes, HMGR is the rate-limiting enzyme in isoprenoid biosynthesis, catalyzing the conversion of HMG-CoA into MVA (Liu et al. 2014). HMGR could thus serve as a marker representing overall MVA pathway flux. In the present study, HMGR was highly expressed in flowers and abundantly expressed in leaves and roots, displaying a pattern similar to that of Panax quinquefolius HMGR (Wu et al. 2012). AACT, MK, PMK, and IPPI are also involved in the MVA pathway and provide building blocks for the sesquiterpene skeleton. Accordingly, the expression levels of these genes positively correlated with the total sesquiterpenoid content in the present study. These results suggested that flowers are crucial to total sesquiterpenoid biosynthesis in M. chamomilla. In practice, only the flowers of chamomile have been harvested for essential oil extraction (Raal et al. 2012). However, the present study demonstrated that the leaves are promising sources of α-farnesene and germacrene D.

FPPS catalyzes the condensation of DMAPP with two molecules of IPP to form FPP, the precursor of all sesquiterpenes (Zhao et al. 2015). FPPS and GGPPS showed an identical expression pattern in M. chamomilla (Fig. 7). The highest expression of FPPS and GGPPS appeared in the leaves, followed by flowers, roots, and stems. These data suggested that the organ-specific expression of FPPS and GGPPS regulated the biosynthesis of α-farnesene and germacrene D. This interpretation is consistent with previous studies showing that organ-specific expression of terpene synthases parallels the organ-specific accumulation of essential oils in chamomile (Irmisch et al. 2012).

As the second enzyme involved in the MVA pathway, HMGS combines acetyl-CoA with acetoacetyl-CoA to form HMG-CoA (Liu et al. 2014). In our study, HMGS was predominantly expressed in stems, where β-farnesene is the major sesquiterpene compound. These results suggested that HMGS played an important role in the accumulation of β-farnesene by regulating the flux of the MVA pathway. GDS was detected in all the tested M. chamomilla organs at a comparable level. The functional diversity of terpenoids calls for highly complex regulation of their synthesis and metabolism. Combining the transcriptome analysis with the metabolite profile of sesquiterpenoids in M. chamomilla, we concluded that flowers, leaves, stems, and roots each synthesize and accumulate a particular composition of sesquiterpenoids. This study confirmed that the organ-specific expression of these genes correlated with the dominant sesquiterpenoid compounds in the respective organs of M. chamomilla. Mining and identification of the sesquiterpenoid biosynthesis genes in M. chamomilla would be useful not only for understanding the regulation of terpenoid biosynthesis but also for providing molecular information for the genetic improvement of this important medicinal plant.

Identification of unigene-derived microsatellite markers

Microsatellites or SSRs, as important genetic markers, have been extensively employed in the assessment of genome organization and phenotypic diversity (Murat et al. 2011). A total of 5555 SSRs were identified out of the 82,946 unigenes generated in the present study (Table S9). The SSR types were not evenly represented among the unigenes. The proportion of mono-nucleotide SSRs was the largest (2872 or 51.70%), followed by the tri- (1482 or 26.68%), di- (771 or 13.88%), tetra- (75 or 1.35%), penta- (13 or 0.23%), and hexa-nucleotide (9 or 0.16%) SSRs (Fig. 8). (GT/TG)n, (CA/AC)n, and (TA/AT)n were the three major motif types prevailing among the di-nucleotide SSRs, with frequencies of 4.2, 3.4, and 2.9%, respectively. Among the 20 types of tri-nucleotide SSRs, TGA (1.4%) was the most common, followed by TCA (1.2%), GAA (1.1%), and ATC (1.1%). The unigene-derived markers identified in the present study can provide a useful genetic resource for M. chamomilla and other Asteraceae species.

Fig. 8

Distribution of different types of SSRs identified in the Matricaria chamomilla unigenes. The scale at the bottom is the amount of repeated nucleotides

Conclusions

Although M. chamomilla, a non-model plant, has not been accessible for whole-genome analyses, transcriptome analysis as shown in the present study offers an efficient and cost-effective approach to gain access to genetic data for genomic study. This is the first report to identify and assess genes with emphasis on terpene metabolism in M. chamomilla using NGS technology. A total of 1115 unigenes were assigned to specific secondary metabolite pathways using KEGG. Genes in the sesquiterpenoid synthesis pathway exhibited significant differential expression in different organs. The obtained transcriptome resources would provide a foundation for identifying functional genes involved in sesquiterpenoid biosynthesis in M. chamomilla. Several genes of agronomical and medicinal importance as well as numerous microsatellite markers were also found. Our data provide useful information for identifying the genes involved in secondary metabolism in Asteraceae species, especially M. chamomilla.

Author contribution statement

FX and JC designed and conceived the experiments. WWZ, TTT and XML finished the experiments. WWZ, TTT, and FX analyzed the data. WWZ, TTT, XML, FX, and YLL contributed reagents/materials/analysis tools. WWZ, TTT and FX prepared the manuscript. All authors reviewed and approved the manuscript.