De novo assembly and comparative transcriptome analysis: novel insights into terpenoid biosynthesis in Chamaemelum nobile L.

Liu, Xiaomeng; Wang, Xiaohui; Chen, Zexiong; Ye, Jiabao; Liao, Yongling; Zhang, Weiwei; Chang, Jie; Xu, Feng

doi:10.1007/s00299-018-2352-z

De novo assembly and comparative transcriptome analysis: novel insights into terpenoid biosynthesis in Chamaemelum nobile L.

Original Article
Published: 15 November 2018

Volume 38, pages 101–116, (2019)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Plant Cell Reports Aims and scope Submit manuscript

De novo assembly and comparative transcriptome analysis: novel insights into terpenoid biosynthesis in Chamaemelum nobile L.

Download PDF

Xiaomeng Liu¹,
Xiaohui Wang²,
Zexiong Chen³,
Jiabao Ye¹,
Yongling Liao¹,
Weiwei Zhang¹,
Jie Chang^4,5 &
…
Feng Xu ORCID: orcid.org/0000-0003-3212-6284¹

1253 Accesses
21 Citations
3 Altmetric
Explore all metrics

Abstract

Key message

Analysis of terpenoids content, transcriptome from Chamaemelum nobile showed that the content of the terpenoids in the roots was the highest and key genes involved in the terpenoids synthesis pathway were identified.

Abstract

Chamaemelum nobile is a widely used herbaceous medicinal plant rich in volatile oils, mainly composed of terpenoids. It is widely used in food, cosmetics, medicine, and other fields. In this study, we analyzed the transcriptome and the content and chemical composition of the terpenoids in different organs of C. nobile. Gas chromatography–mass spectrometry analysis showed that the total content of the terpenoids among C. nobile organs was highest in the roots, followed by the flowers. Illumina HiSeq 2500 high-throughput sequencing technology was used to sequence the transcripts of roots, stems, leaves, and flowers of C. nobile. We obtained 139,757 unigenes using the Trinity software assembly. A total of 887 unigenes were annotated to secondary metabolism. In total, 55,711 differentially expressed genes were screened among different organs of C. nobile. We identified 16 candidate genes that may be involved in the terpenoid biosynthesis from C. nobile and analyzed their expression patterns using real-time PCR. Results showed that the expression pattern of these genes was tissue-specific and had significant differential expression levels in different organs of C. nobile. Among these genes, 13 were expressed in roots with the highest levels. Furthermore, the transcript levels of these 13 genes were positively correlated with the content of α-pinene, β-phellandrene, 1,8-cineole, camphor, α-terpineol, carvacrol, (E,E)-farnesol and chamazulene, suggesting that these 13 genes may be involved in the regulation of the synthesis of the volatile terpenoids. These results laid the foundation for the subsequent improvement of C. nobile quality through genetic engineering.

De novo assembly and comparative transcriptome analysis: novel insights into sesquiterpenoid biosynthesis in Matricaria chamomilla L.

Article 23 June 2018

Transcriptome analysis identifies putative genes involved in triterpenoid biosynthesis in Platycodon grandiflorus

Article 21 July 2021

Transcriptome and metabolite analyses reveal the complex metabolic genes involved in volatile terpenoid biosynthesis in garden sage (Salvia officinalis)

Article Open access 22 November 2017

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Roman chamomile (Chamaemelum nobile L.) is a perennial herb that belongs to the Asteraceae family and is native to southwestern Europe and now is distributed throughout Europe, South Africa and Southwest Asia. (Ma et al. 2007). Chamaemelum nobile is rich in volatile aromatic oil, and its active components mainly include ester volatile oil, flavonoids, and terpenoids (Farhoudi 2013). Among these active components, terpenoids, especially sesquiterpenoids, are the main pharmacological components. At present, various terpenoids, such as β-pinene, germacrene D, and chamazulene (Newall et al. 1996), have been isolated and identified. These terpenoids possess anti-inflammatory, bacteriostatic, antioxidant, and anticancer properties and are widely used in pharmaceuticals, cosmetics, spices, and other fields (Yu and Utsumi 2009). However, the essential oil content of C. nobile is immensely low, and the medicinal active ingredients extracted from the plant are far from meeting the market demand. Therefore, improving the terpenoid content in C. nobile is one of the goals in the research field of C. nobile cultivation. With the progress in isolation and identification of genes related to the synthesis of compounds in metabolic pathways, the use of genetic engineering to alter the transcript level of key genes to increase the yield and species of compounds has become a means of improving plant quality (Banyai et al. 2010). However, the detailed analysis of transcriptome and genomic information of C. nobile has not yet been reported.

Terpenoids are a class of natural organic compounds composed of isoprene (C5) structural units widely found in nature. At present, more than 60,000 terpenoids, including their derivatives, are known (Bohlmann and Keeling 2008). These compounds can be classified into monoterpene (C10), sesquiterpene (C15), diterpene (C20), and triterpene (C30), according to the number of isoprene units, and can also be categorized into chain, monocyclic, bicyclic, and tricyclic terpenes, according to their carbocycle number. In plants, the terpene compounds have two synthetic routes: the mevalonate (MVA) pathway (Buhaescu and Izzedine 2007; Zenoni and Delledonne 2010) and the 2-methyl-D-erythritol 4-phosphate (MEP) pathway (Heuston et al. 2012). The MVA pathway mainly occurs in the w of eukaryotes to provide isoprene units that eventually form sesquiterpenes and triterpenes (Liu et al. 2006; Zhao et al. 2013); the MEP pathway exists in some prokaryotes and plant plastid and leads to synthesis of monoterpene, diterpenoids, and some polyterpenes (Schilmiller et al. 2009). In recent years, several genes involved in biosynthetic pathway of terpenoid genes have been identified from C. nobile (Yan et al. 2017; Meng et al. 2016; Cheng et al. 2016). However, terpenoid biosynthetic pathway is a complex process that involves various genes specifically expressed in different organs. Thus, the regulation mechanism of terpenoid biosynthesis in a given organism cannot be fully understood just by identifying and characterizing a few genes.

In this study, we used Illumina HiSeq 2500 high-throughput sequencing technology to sequence the transcriptomes in roots, stems, leaves, and flowers of C. nobile. The transcriptome database, namely the unigene library, thus constructed was functionally annotated. In addition, the composition and content of terpenoid in C. nobile were analyzed by gas chromatography–mass spectrometry (GC–MS), and several genes involved in the biosynthesis of terpenoids were identified by correlation analysis between transcript level of genes and terpenoid content. Our transcriptome data and related findings would provide valuable genetic resources for studying various biochemical processes of C. nobile, particularly the biosynthesis of terpenoids.

Materials and methods

Plant materials

We planted the C. nobile plants in the botanical garden at Yangtze University, located in Jingzhou, Hubei province, China (around N30.35, E112.14). Then, we collected the roots, stems, leaves, and flowers of C. nobile after one week of flowering for terpenoid content and transcriptome analysis.

Gas chromatography–mass spectrometry (GC–MS)

The essential oil of each organ (roots, stems, leaves, and flowers) from C. nobile was extracted using the method described in Chinese Pharmacopoeia Appendix XD and Soxhlet extraction (Misra et al. 2003). The terpenoid composition and content of the pentane extract were directly analyzed on GC–MS system (Agilent 5975B with 6890N gas chromatograph) with authentic standards under the following temperature program: injection at 250 °C and ramped from 40 to 250 °C at a rate of 10°C·min⁻¹ (Irmisch et al. 2012). We used the DB1-MS (0.25 µm film thickness, 250 µm × 30 m) as the column with helium at 1 ml·min⁻¹ as carrier gas. Injection volume was 2 µL. The terpenoids were identified based on comparison of their measured retention times and mass spectra with those of authentic standards. The reference compounds were purchased from Roth (Karlsruhe, Germany), Sigma-Aldrich (Steinheim, Germany), and Herbfine (Nanchang, China). In addition, we used the NIST11 database to search the fragmentation pattern to identify the corresponding terpenoid compounds.

RNA extraction

After cleaning and cutting the bulk materials into small pieces, we collected the roots, stems, leaves, and flowers separately and immediately frozen in liquid nitrogen and stored at − 80 °C for later use. According to the manufacturer’s instructions, we isolated the total RNA from each sample using TaKaRa MiniBEST plant RNA extraction kit (Dalian, China). We then assessed the quality and quantity of the total RNA using 1% agarose gels and a NanoPhotometer® spectrophotometer (Implen, CA, USA).

Library construction, deep sequencing, and de novo assembly

We conducted library construction and RNA-Seq by Biomarker Biotechnology Corporation (Beijing, China). We then prepared the transcriptome libraries of roots, stems, leaves and flowers using the NEBNext^® Ultra^™ RNA library prep kit for Illumina (NEB, USA) (Li et al. 2016). And the libraries were sequenced using the Illumina HiSeq 2500 high-throughput sequencing platform to obtain a large number reads referred to as raw data. We pooled and assembled all the clean reads using the Trinity de novo assembly program (Grabherr et al. 2011). First, we broke the sequenced reads into smaller segments (k-mer), and then we extended these small segments into contigs. Next, we used the overlapping portions of these fragments to obtain a collection of fragments. Finally, we relied on genome alignments to construct transcripts, and we obtained the UniGene database by aggregating the assembled sequences.

Functional annotation and metabolic pathway analysis

We used the BLAST (Version 2.2.26) (Kent 2002) software to obtain information on protein function annotation to compare the sequence of the UniGene databases and the National Center for Biotechnology Information non-redundant (NCBI nr) (Deng et al. 2006), Swiss-Prot (Apweiler et al. 2004), Gene ontology (GO) (Ashburner 2000), Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa et al. 2004), and Cluster of Orthologous Groups of proteins (COG) (Tatusov et al. 2000) databases. In addition, we set the E value ≤ 10⁻⁵ to obtain the protein sequences that were highly similar to the C. nobile unigene sequences and finally, to acquire function annotations. We utilized the GO program to obtain GO annotation, according to molecular function, biological process, and cellular component. Meanwhile, we searched the assembled unigenes against the COG and KEGG databases to predict possible functional classifications and molecular pathways and to classify possible COG functions and KEGG pathways, respectively.

Identification of differentially expressed genes (DEGs)

For all the comparisons, we normalize the read counts by calculating the fragments per kilobase of transcript, per million mapped read (FPKM) (Trapnell et al. 2010) and the false discovery rate (FDR) to obtain relative expression levels. The DESeq R package (1.10.1) for different expression analyses in different organs provided statistical routines that determine the differential expression in digital gene expression data using a model based on negative binomial distribution (Anders and Huber 2010). P value corresponds to differential gene expression test. The threshold of P value in multiple tests was determined through manipulating the FDR value (Benjamini and Yekutieli 2001). An FDR < 0.01 and fold change ≥ 2 were used as the threshold to determine the significance of gene expression differences.

Simple sequence repeats (SSRs)

All the unigenes obtained were searched to determine the frequency and distribution of SSRs. We detected the SSRs using an SSR identification tool, called the Perl script MISA (microsatellite searching tool) (http://pgrc.ipk-gatersleben.de/misa/). This tool accepts FASTA-formatted sequence files and reports the GenBank ID, microsatellite motifs (dimers to hexamers), number of repeats, and sequence coordinates for each microsatellite (Chen et al. 2016; Rama et al. 2015). We set the search parameters for the maximum motif-length group to hexamer, and the minimum number of repeats was set at five.

Co-expression analysis

To study the interaction between genes, we analyzed the correlation coefficient between each two genes and used correlation coefficients to determine the correlation between gene expressions. We set the thresholds (> 0.9 or < − 0.9) to screen certain genes for network analysis to ensure a strong gene co-expression relationship. We constructed the gene co-expression network using the Cytoscape software because network elements represent the ways in which genes might regulate other genes; gene networks can be divided into subgraphs, called k-core networks, where all genes are linked to at least k in other genes in the subgraph (Huber et al. 2007).

qRT-PCR validation

We prepared the samples using the same method mentioned above and isolated the total RNA from the plant in different stages of development. Experiments were performed on two independent biological replicates, each containing three technical replicates for each sample. We synthesized first-strand cDNA from total RNA using PrimeScriptTM One Step RT-PCR kit ver. 2 (Takara, Japan) and diluted tenfold as template. We normalized the data of qPCR using 18S gene as the reference gene. Since we were mainly concerned with the key genes involved in the synthesis of sesquiterpenes, we selected 16 relevant genes excluding the monoterpene synthase genes for RT-PCR and designed specific primers based on the open reading frame of the sequence database. Primers used in qPCR are shown in Supplementary Table S1. We also performed real-time assays by SYBR Premix ExTaq^™ II kit (Dalian TaKaRa) and Bio-Rad Mini OpticonTM Real-time PCR Mini Cycler (BioRad, Hercules, CA, USA), with the following cycle conditions: 95 °C for 5 min, followed by 40 cycles of 95 °C for 10 s, 60 °C for 30 s; 95 °C for 15 s, 60 °C for 60 s, and 95 °C for 15 s (Xu et al. 2014). Furthermore, we calculated the relative expression fold of each sample by its Ct value normalized to the Ct value of the reference gene using the 2^−ΔΔCt method (Livak and Schmittgen 2001).

Results and discussion

Composition and content of terpenoid in different organs of C. nobile

As shown in Table 1, the total terpenoid content in C. nobile was significantly highest in roots, followed by flowers, being significantly low in stems and leaves. The total terpenoids in the stems and leaves were less than 5% and 10% of the roots and the flowers, respectively. The main terpenoids in the roots were chamazulene (71.31%), 1,8-cineole (9.33%), (E,E)-farnesol (4.00%), and germacrene D (3.91%). Similarly, the main terpenoids in flowers were chamazulene (65.93%), germacrene D (19.87%), and terpinen-4-ol (3.17%). In stems and leaves, the main components were germacrene D (26.36%) and carvacrol (25.39%), respectively. Recently, Farhoudi (2013) reported the composition and content of terpenoids using the whole C. nobile seedlings as sample, but the composition and content of the terpenoids were not separately measure in different organs of C. nobile. In this study, we compared the composition and content of terpenoids among different organs. Compared with the results of Farhoudi (2013), the composition of terpenoids was similar, and the main component of total terpenoids was chamazulene. Furthermore, we found that the content of chamazulene, germacrene D and (E,E)-Farnesol were higher in roots and flowers compared to stems and leaves.

Table 1 The content of terpenoids (µg/g fresh weight) in the different organs of Chamaemelum nobile

Full size table

Sequencing analysis and de novo assembly

We constructed separate cDNA libraries from C. nobile root, stem, leaf, and flower samples each. In total, the libraries produced 23,572,568,706 clean reads. Each sample produced no less than 4 Gb data, the percentage of Q30 reached more than 88.03%, and the GC content was 42.88–43.90% (Table 2). We brought all the clean reads together and assembled them de novo using Trinity (Grabherr et al. 2011). The sequencing reads deposited with the National Center for Biotechnology Information (NCBI) under the SRA accession numbers SRS1610818 (T01, roots), SRS1610819 (T02, stems), SRS1610820 (T03, leaves), and SRS1610821 (T04, flowers). We obtained a total of 40,201 transcripts with an average length of 856 nt and an N50 length of 1280 nt. A total of 139,757 unigenes, of which 22,198 (15.88%) were longer than 1000 bp (Supplementary Fig. S1), were obtained with a total length of 83,444,804 nt with an average length of 597 nt and an N50 length of 924 nt. These data illustrated that the results of the assembly were favorable and applicable for subsequent studies. We compared the unigene sequence with the Nr, Swiss-Prot, GO, COGs of proteins, KOG, Pfam, and KEGG databases using BLAST. As shown in Supplementary Table S2, we obtained 37,644 unigenes with annotated information, of which 10,642, 17,427 and 14,383 were annotated against COG, GO, and KEGG databases, respectively.

Table 2 Sample sequencing data evaluation

Full size table

Nr annotation analysis

By comparing the Nr database with non-redundant proteins, the species distribution of the best match results in Nr was obtained (Fig. 1). The C. nobile unigenes showed the closest matches with Vitis vinifera (3054, 11.10%), followed by Sesamum indicum (2046, 7.43%), Coffea canephora (1817, 6.60%), Nicotiana sylvestris (1363, 4.95%), Nicotiana tomentosiformis (1291, 4.69%), Theobroma cacao (1063, 3.86%), Nelumbo nucifera (774, 2.81%), Solanum tuberosum (744, 2.70%), Erythranthe guttata (739, 2.69%), and Citrus sinensis (701, 2.55%).

Functional classification

GO is a functional classification system of genes and have three ontologies that describe the molecular function of a gene, its cellular components, and biological processes in which the gene is involved. To understand macroscopically the functional distribution of unigenes, we used WEGO to categorize the GO annotation results of unigenes into GO functions and to draw a histogram (Ashburner 2000; Ye et al. 2006). A total of 83,658 unigenes were noted in the 49 GO functional categories (Fig. 2), among which 16 (25%), 13 (25.1%) and 20 (49.9%), respectively, belonged to the molecular function, cellular components, and biological processes. Catalytic activity, cell part, and metabolic process were most highly represented, respectively, in these three subclasses.

The unigenes of C. nobile were compared with those in the COG database. We predicted the possible unigene functions, which were classified statistically. A total of 10,642 unigenes were annotated to 25 protein families, accounting for 7.61% of the total unigenes (Fig. 3). In the 25 COG categories, the “general function prediction only” accounted for the largest proportion (3002 unigenes, 28.21%), followed by “replication, recombination, and repair” (1787 unigenes, 16.79%) and “transcription” (1644 unigenes, 15.45%). Only three sequences were annotated to the “nuclear structure”, and no sequences were annotated into the “extracellular structures” category. Notably, 599 (5.63%) of the functional unigenes were annotated to “secondary metabolite biosynthesis, transport, and catabolism” category.

To identify unigenes involved in metabolic pathway of C. nobile, we compared all unigene sequences with the reference canonical pathways in KEGG (Kanehisa et al. 2004). The results showed that 29,056 unigene sequences were successfully assigned to 335 KEGG pathways (Supplementary Table S3). The number of unigenes involved in the carbon metabolism pathway was the largest (473, 1.63%), followed by protein processing in the endoplasmic reticulum (460, 1.58%), spliceosome (439, 1.51%), and biosynthesis of amino acids (436, 1.50%). In addition, 19 KEGG pathways included 887 unigenes associated with secondary metabolism (Table 3). Among them, the cluster of “Phenylpropanoid biosynthesis (PATH: ko00940)” was predominant (294, 33.15%), followed by “Terpenoid backbone biosynthesis (PATH: ko00900)” (118, 13.30%) and “Flavonoid biosynthesis (PATH: ko00941)” (91, 10.26%). These annotations could help further study in the metabolic processes and offer the possibility of identifying genes involved in the secondary metabolism in C. nobile. We screened several genes in the terpene biosynthetic pathway by RNA-seq technique to identify 24, 27, and 27 genes involved in the MVA pathway, the MEP pathway, and the branch point, respectively. At the same time, we identified 12 monoterpene synthases by local blast based on the protein sequence of the previously known monoterpene synthases (Dudareva et al. 1996, 2003; Landmann et al. 2007; Nagegowda et al. 2008) and our transcriptome data (Table 4). These genes will be important resources for subsequent molecular breeding of C. nobile to improve terpenoid accumulation.

Table 3 The unigenes related to secondary metabolite of C. nobile

Full size table

Table 4 The number of unigenes involved in terpenoid biosynthesis

Full size table

Identification of DEGs

To analyze the specificity of unigene expression in various organs of C. nobile, we mapped the reads from the roots, stems leaves and flowers libraries to the assembled transcriptome. Most of the unigenes (11,244, 18.6%) were commonly expressed in all four organs of C. nobile (Fig. 4). We detected 18,273 tissue-specific unigenes (30.2%) from different tissues of C. nobile. Among these unigenes, 5324 (8.8%), 3012 (5.0%), 4129 (6.8%), and 5808 (9.6%) were expressed exclusively in roots, stems, leaves, and flowers of C. nobile, respectively.

To assess differential expression level of genes in various C. nobile organs, we used a FDR and a fold change (FC) in the log10 ratio (Gao et al. 2015). According to the relative expression level between the two samples, the DEGs can be divided into up-regulated and down-regulated genes (Fig. 5a, b). In total, 55,711 DEGs were discovered among different organs of C. nobile (Fig. 6a), and we compared the two combinations between the four tissue samples of C. nobile (Fig. 6b), respectively, to obtain the up-regulated and down-regulated genes in the latter relative to the former. Tables S4 through S9 list detail of the DEGs between root and stem (Supplementary Table S4), root and leaf (Supplementary Table S5), root and flower (Supplementary Table S6), stem and leaf (Supplementary Table S7), stem and flower (Supplementary Table S8), and leaf and flower (Supplementary Table S9). The results show that the number of DEGs between flower and root pairs is the highest, and interestingly, the number of annotated DEGs is also the most, with a total of 9125 DEGs (Table 5), indicating that relatively larger number of gene plays an important role in the process of growth and development of the roots or flowers of C. nobile.

Table 5 Different expression gene annotation statistic

Full size table

DEGs in terpenoid pathway among four organs

Terpenoids are diverse in type and structure and are widely found in higher plants. To elucidate the expression patterns of terpenoid biosynthetic genes in four organs, we screened out all unigenes using RNA-seq data. We identified 78 unigenes that were involved in terpenoid synthesis, and their expression level was presented as a heat map (Fig. 7). Furthermore, 28 DEGs were present in the biosynthetic pathway of terpenoids. The number of DEGs between organs is shown in Table 6. The gene ID and gene name of the DEGs between the organs were shown in Supplementary Table S10. In detail, the number of DEGs between roots and stems was the highest, with 11 up-regulated and 5 down-regulated genes in the stem, followed by the roots and leaves with 13 DEGs. The number of DEGs in leaves and flowers was the least, with only two down-regulated genes.

Table 6 Statistics on the number of differential genes between different tissues

Full size table

Construction and analysis of DEG co-expression network for terpenoid synthesis in various organs of C. nobile

Most of the gene co-expression networks have been constructed on the basis of the correlation of gene expression data (Nikiforova and Willmitzer 2007) in an undirected graph. The nodes in the network represent genes or gene products, and genes with similar expression profiles are linked to form networks. The edge represents a significant association between a pair of nodes in gene expression, that is, co-expression between genes. The construction of a co-expression network is conceptually simple and intuitive. Using RNA-seq data, we obtained the gene expression data and analyzed the relationship, including both correlation coefficients and correlations, between each two genes. The correlation coefficient represents the degree of similarity between the expression profiles of the two genes, with higher values indicating greater similarity (Carlson et al. 2006). The relationship indicates whether the expression of the two genes is positively or negatively correlated. The similarity of gene expression enabled us to analyze possible interactions between genes or gene products to understand the interactions between genes and find core genes (Barabasi and Oltvai 2004), which are important hubs and play a key role in the network. In the present study, 28 DEGs involved in terpenoid biosynthesis were screened based on RNA-seq data of four C. nobile organs. We constructed the co-expression network based on the FPKM values of these genes (Fig. 8). The octagon represents the gene, and the straight line represents the regulatory relationship of the gene. The rank of k-core values describes the complexity of gene-related relationships (Supplementary Table S11). The co-expression network of 28 DEGs was divided into three subnetworks (Fig. 8). The k-core value of DXR2, DXR3, MDS, GPPS1, and GPPS2 genes was 7; 6 for HDR, HMGR3, FPPS1, and FPPS3 genes; and 5 for HMGR2, HMGR4, and FPPS6. These genes might play a key role in terpenoid biosynthesis of C. nobile.

Putative terpenoid biosynthesis genes in different organs

There are two pathways to provide isoprene units in terpene biosynthesis: the MVA and the MEP pathways (Fig. 9a). We further profiled the expression of 16 selected genes that are known to participate in the terpenoid biosynthesis of C. nobile using qRT-PCR. We then generated the differential expression heat maps of these genes based on the qRT-PCR data (Fig. 9a). Overall, RNA-seq data did not significantly correlated with qRT-PCR data from most of the genes (Fig. 9b). The consistency between the FPKM value and the qRT-PCR value was not desirable possibly because we did not repeat the sequencing of the C. nobile transcriptome. Therefore, we chose the qRT-PCR data for assessing the expression level of selected tissue-specific genes. Most of the tested genes, including AACT, HMGS, HMGR1, HMGR2, MDPS, MDS, MK1, MK2, PMK, IPPI, FPPS1, DXS2, and GPPS, had the highest expression levels in roots. The expression levels of these genes positively correlated with the total terpenoid content, indicating that these genes may play an important regulatory role in terpenoid synthesis. We attempted to confirm the relationship between the transcript level of these genes and the accumulation of various components of terpenoids in different organs of C. nobile. We edited the script using the software EditPlus and RStudio to calculate the correlation coefficient (Supplementary Table S12). As indicated in Fig. 9c, the expression levels of most terpenoid genes (AACT, HMGS, HMGR1, HMGR2, MDPS, MDS, MK1, MK2, PMK, IPPI, FPPS1, DXS2, and GPPS) positively correlated with the content of most of the monoterpenes (α-pinene, β-phellandrene, 1,8-cineole, camphor, α-terpineol, and carvacrol) and the sesquiterpenes ((E,E)-farnesol and chamazulene). Therefore, these 13 genes extensively contributed in regulating the biosynthesis of terpenoids in C. nobile. Moreover, these 13 genes are involved in MEP, MVA and common downstream pathways of terpenoid biosynthesis, respectively, suggesting cross-talk between MEP and MVA pathways in C. nobile. Interestingly, GDS1 was mainly expressed in flowers, and germacrene D was higher in flowers than other organs. Therefore, GDS1 was likely to be one of the key genes in the accumulation of germacrene D. Several genes involved in the synthesis of terpenoids have been isolated and identified from C. nobile. For example, CnHMGR showed different expression patterns in roots, stems, leaves, and flowers, with higher expression in flowers and the lowest expression in stems (Meng et al. 2016). HMGS gene of C. nobile was also cloned and identified from C. nobile, CnHMGS had the highest expression level in flowers, followed by roots, and low expression levels in stems and leaves (Cheng et al. 2016). In the present study, CnHMGR1 had the highest expression level in roots, followed by flowers and stems, and low expression levels in leaves. CnHMGR2 also had the highest expression level in roots, followed by flowers, and low expression levels in leaves and stems. Similarly, CnHMGS had the highest expression level in roots, followed by stems and leaves, and with the lowest level of expression in flowers. These findings were different from the tissue expression patterns of the genes in our previous studied, which may be related to the different functions of different members of the HMGS and HMGR gene families in the synthesis of terpenoids and growth in C. nobile.

Overall, the accumulation levels of terpenoids significantly varied among the four organs of C. nobile. The differential expression of the terpene-related genes in different organs may be one of the reasons for this differential accumulation. Therefore, analysis of the biosynthetic genes involved in terpenoids provides deeper understanding into terpene synthesis as well as molecular basis for improvement in the quality of this medicinal plant through biotechnology.

Identification of unigene-derived microsatellite markers

SSRs are a series of simple, repetitive short-sequence composition, widely distributed in the eukaryotic genome. SSR, as a molecular marker, is widely used in crossbreeding, population genetic diversity, genetic linkage map construction, and other research fields. SSR plays an important role in genomic and phenotypic diversity and is broadly applied in molecular marker technology (Murat et al. 2003). At present, molecular markers of C. nobile are considerably limited. The present study identified 4614 SSRs from all unigenes (Supplementary Table S13), and the mononucleotide repeats were the most abundant (2241), accounting for 48.57% of the total SSRs, followed by tri- (1416, 30.69%), di- (657, 14.24%), tetra- (57, 1.24%), penta- (10, 0.02%), and hexa-nucleotide, with 8 SSRs (Fig. 10). In the di-nucleotide repeats, AC/CA, GT/TG, AT/TA type distribution ratios were predominant, accounting for 27.25%, 25.11%, and 27.25%, respectively; in the tri-nucleotide, GAT was the most widely distributed, accounting for 5.72%. The study of C. nobile SSRs will provide important resource in genetic marker research of C. nobile.

Conclusion

In this study, we determined the composition and content of terpenoids in different organs of C. nobile and found that the roots were the main synthesis site of terpenoids with chamazulene being the main component among the terpenoids. Transcriptome datasets, established from roots, stems, leaves, and flowers of C. nobile, generated basic molecular information for identifying genes involved in terpenoid synthesis through the MVA and MEP pathways. We screened 78 unigenes, of which 28 DEGs were identified, putatively involved in terpenoid biosynthesis. In addition, we found that 13 genes (AACT, HMGS, HMGR1, HMGR2, MDPS, MDS, MK1, MK2, PMK, IPPI, FPPS1, DXS2, and GPPS) among them had the highest expression levels in roots. The expression levels of these 13 genes positively correlated with the content of most terpenoid compounds in different organs. These findings would be helpful for understanding of the anabolic pathways of terpenoids, which are the active ingredients of many medicinal plants, and provide diverse genetic resources for further study.

Author contribution statement

FX and JC conceived the idea and designed the experiment. XL, XW, JY, ZC, YL and WZ carried out various experiments. XL, XW, JY and FX analyzed the data. XL and FX wrote the manuscript.

References

Anders S, Huber W (2010) Differential expression analysis for sequence count data. Genome Biol 11:1–12
Article Google Scholar
Apweiler R, Bairoch A, Wu CH, Barker WC, Boeckmann B, Ferro S (2004) UniProt: the universal protein knowledgebase. Nucleic Acids Res 32:115–119
Article Google Scholar
Ashburner M (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25:25–29
Article CAS Google Scholar
Banyai W, Kirdmanee C, Mii M, Supaibulwatana K (2010) Overexpression of farnesyl pyrophosphate synthase (FPS) gene affected artemisinin content and growth of Artemisia annua L. Plant Cell Tiss Org 103:255–265
Article CAS Google Scholar
Barabasi AL, Oltvai ZN (2004) Network biology: understanding the cell’s functional organization. Nat Rev Genet 5:101–113
Article CAS Google Scholar
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–1188
Article Google Scholar
Bohlmann J, Keeling CI (2008) Terpenoid biomaterials. Plant J 54:656–669
Article CAS Google Scholar
Buhaescu I, Izzedine H (2007) Mevalonate pathway: A review of clinical and therapeutical implications. Clin Biochem 40:575–584
Article CAS Google Scholar
Carlson MR, Zhang B, Fang Z, Mischel PS, Horvath S, Nelson SF (2006) Gene connectivity, function, and sequence conservation: predictions from modular yeast co-expression networks. BMC Genom 7:40
Article Google Scholar
Chen H, Xin C, Jing T, Yong Y, Liu Z, Hao X, Wang L, Wang S, Liang J, Zhang L, Yin F, Cheng X (2016) Development of gene-based SSR markers in rice bean (Vigna umbellata L.) based on transcriptome data. PLoS One 11:e0151040
Article Google Scholar
Cheng S, Wang X, Xu F, Chen Q, Tao T, Lei J, Zhang W, Liao Y, Chang J, Li X (2016) Cloning, expression profiling and functional analysis of CnHMGS, a gene encoding 3-hydroxy-3-methylglutaryl coenzyme a synthase from Chamaemelum nobile. Molecules 21:316
Article Google Scholar
Deng YY, Li JQ, Wu SF, Zhu YP, Chen YW, He FC (2006) Integrated nr database in protein annotation system and its localization. Comput En 32:71–74
Google Scholar
Dudareva N, Cseke L, Blanc VM, Pichersky E (1996) Evolution of floral scent in Clarkia: novel patterns of S-linalool synthase gene expression in the C. breweri flower. Plant Cell 8:1137–1148
Article CAS Google Scholar
Dudareva N, Martin D, Kish CM, Kolosova N, Gorenstein N, Faldt J, Miller B, Bohlmann J (2003) (E)-beta-ocimene and myrcene synthase genes of floral scent biosynthesis in snapdragon: function and expression of three terpene synthase genes of a new terpene synthase subfamily. Plant Cell 15:1227–1241
Article CAS Google Scholar
Farhoudi R (2013) Chemical constituents and antioxidant properties of Matricaria recutita and Chamaemelum nobile essential oil growing wild in the south west of Iran. J Essent Oil Bear Plant 16:531–537
Article CAS Google Scholar
Gao B, Zhang D, Li X, Yang H, Zhang Y, Wood AJ (2015) De novo transcriptome characterization and gene expression profiling of the desiccation tolerant moss Bryum argenteum following rehydration. BMC Genom 16:1–14
Article Google Scholar
Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I (2011) Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol 29:644–652
Article CAS Google Scholar
Heuston S, Begley M, Davey MS, Eberl M, Casey PG, Hill C, Gahan CGM (2012) HmgR, a key enzyme in the mevalonate pathway for isoprenoid biosynthesis, is essential for growth of Listeria monocytogenes EGDe. Microbiology 158:1684–1693
Article CAS Google Scholar
Huber W, Carey VJ, Long L, Falcon S, Gentleman R (2007) Gentleman R: Graphs in molecular biology. BMC Bioinformatics 8:S8
Article Google Scholar
Irmisch S, Krause ST, Kunert G, Gershenzon J, Degenhardt J, Köllner TG (2012) The organ-specific expression of terpene synthase genes contributes to the terpene hydrocarbon composition of chamomile essential oils. BMC Plant Biol 12:1–13
Article Google Scholar
Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M (2004) The KEGG resource for deciphering the genome. Nucleic Acids Res 32:277–280
Article Google Scholar
Kent WJ (2002) BLAT-the BLAST-like alignment tool. Genome Res 12:656–664
Article CAS Google Scholar
Landmann C, Fink B, Festner M, Dregus M, Engel KH, Schwab W (2007) Cloning and functional characterization of three terpene synthases from lavender (Lavandula angustifolia). Arch Biochem Biophys 465:417–429
Article CAS Google Scholar
Liu Z, Li X, Simoneau AR, Jafari M, Zi X (2006) Isoprenoid biosynthesis in plants: pathways, genes, regulation and metabolic engineering. J Biol Sci 351:371–374
Google Scholar
Livak KJ, Schmittgen TD (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^–∆∆C _T method. Methods 25:402–408
Article CAS Google Scholar
Ma CM, Winsor L, Daneshtalab M (2007) Quantification of spiroether isomers and Herniarin of different parts of Matricaria matricarioides and flowers of Chamaemelum nobile. Phytochem Anal 18:42–49
Article Google Scholar
Meng XX, Yan JP, Liu XM, Liao YL, Chang J, Xu F (2016) Cloning and expression analysis of HMGR gene from Chamaemelum nobile. Acta Agric Boreali Sin 31:68–75
Google Scholar
Misra I, Wang CZ, Miziorko HM (2003) The influence of conserved aromatic residues in 3-hydroxy-3-methylglutaryl-CoA synthase. J Biol Chem 278:26443–26449
Article CAS Google Scholar
Murat C, Riccioni C, Belfiori B, Cichocki N, Labbé J, Morin E, Tisserant E, Paolocci F, Rubini A, Martin F (2011) Distribution and localization of microsatellites in the Perigord black truffle genome and identification of new molecular markers. Fungal Genet Biol 48:592–601
Article CAS Google Scholar
Nagegowda DA, Gutensohn M, Wilkerson CG, Dudareva N (2008) Two nearly identical terpene synthases catalyze the formation of nerolidol and linalool in snapdragon flowers. Plant J 55:224–239
Article CAS Google Scholar
Newall CA, Anderson LA, Phillipson JD (1996) Herbal medicines. A guide for health-care professionals. Pharm Press 296
Nikiforova VJ, Willmitzer L (2007) Network visualization and network analysis. Plant Syst Biol 97:245–275
Article CAS Google Scholar
Rama Reddy NR, Mehta RH, Soni PH, Makasana J, Gajbhiye NA, Ponnuchamy M, Kumar J (2015) Next generation sequencing and transcriptome analysis predicts biosynthetic pathway of sennosides from Senna (Cassia angustifolia Vahl.), a non-model plant with potent laxative properties. PLoS One 10:e0129422
Article Google Scholar
Schilmiller AL, Schauvinhold I, Larson M, Xu R, Charbonneau AL, Schmidt A, Wilkerson C, Last RL, Pichersky E (2009) Monoterpenes in the glandular trichomes of tomato are synthesized from a neryl diphosphate precursor rather than geranyl diphosphate. Pro Natl Acad Sci USA 106:10865–10870
Article CAS Google Scholar
Tatusov RL, Galperin MY, Natale DA, Koonin EV (2000) The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28:33–36
Article CAS Google Scholar
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, Van Baren MJ, Salzberg LS, Wold BJ, Pachter L (2010) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28:511–515
Article CAS Google Scholar
Xu F, Ning Y, Zhang W, Liao Y, Li L, Cheng H, Cheng S (2014) An R2R3-MYB transcription factor as a negative regulator of the flavonoid biosynthesis pathway in Ginkgo biloba. Funct Integr Genomic 14:177–189
Article Google Scholar
Yan JP, Meng XX, Li Z, Zhang WW, Jie C, Xu F (2017) Molecular cloning and expression analysis of germacrene A synthase gene in Chamaemelum nobile. Chinese Tradit Herb Drugs 48:1851–1859
Google Scholar
Ye J, Fang L, Zheng H, Zhang Y, Chen J, Zhang Z, Wang J, Li S, Li R, Bolund L, Wang J (2006) WEGO: a web tool for plotting GO annotations. Nucleic Acids Res 34:293–297
Article Google Scholar
Yu F, Utsumi R (2009) Deversity, regulation, and genetic manipulation of plant mono-and sesquiterpenoid biosynthesis. Cell Mol Life Sci 66:3043–3052
Article CAS Google Scholar
Zenoni S, Delledonne M (2010) Characterization of transcriptional complexity during berry development in Vitis vinifera using RNA-SEq. Plant Physiol 152:1787–1795
Article CAS Google Scholar
Zhao L, Chang W, Xiao Y, Liu H, Liu P (2013) Methylerythritol phosphate pathway of isoprenoid biosynthesis. Annu Rev Biochem 82:497–530
Article CAS Google Scholar

Download references

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant no. 31400603).

Author information

Authors and Affiliations

College of Horticulture and Gardening, Yangtze University, Jingzhou, 434025, Hubei, China
Xiaomeng Liu, Jiabao Ye, Yongling Liao, Weiwei Zhang & Feng Xu
Enshi Autonomous Prefecture Academy of Agricultural Sciences, Enshi, 445000, Hubei, China
Xiaohui Wang
Research Institute for Special Plants, Chongqing University of Arts and Sciences, Yongchuan, Chongqing, 402160, China
Zexiong Chen
Hubei Collaborative Innovation Center of Targeted Antitumor Drug, Jingchu University of Technology, Jingmen, 448000, Hubei, China
Jie Chang
College of Chemical Engineering and Pharmacy, Jingchu University of Technology, Jingmen, 448000, Hubei, China
Jie Chang

Authors

Xiaomeng Liu
View author publications
You can also search for this author in PubMed Google Scholar
Xiaohui Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zexiong Chen
View author publications
You can also search for this author in PubMed Google Scholar
Jiabao Ye
View author publications
You can also search for this author in PubMed Google Scholar
Yongling Liao
View author publications
You can also search for this author in PubMed Google Scholar
Weiwei Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Jie Chang
View author publications
You can also search for this author in PubMed Google Scholar
Feng Xu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Feng Xu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Salim Al-Babili.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 346 KB)

Supplementary material 2 (DOCX 15 KB)

Supplementary material 3 (DOCX 14 KB)

Supplementary material 4 (XLSX 171 KB)

Supplementary material 5 (XLSX 478 KB)

Supplementary material 6 (XLSX 485 KB)

Supplementary material 7 (XLSX 727 KB)

Supplementary material 8 (XLSX 489 KB)

Supplementary material 9 (XLSX 546 KB)

Supplementary material 10 (XLSX 468 KB)

Supplementary material 11 (XLSX 10 KB)

Supplementary material 12 (XLSX 11 KB)

Supplementary material 13 (XLSX 12 KB)

Supplementary material 14 (XLSX 973 KB)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liu, X., Wang, X., Chen, Z. et al. De novo assembly and comparative transcriptome analysis: novel insights into terpenoid biosynthesis in Chamaemelum nobile L.. Plant Cell Rep 38, 101–116 (2019). https://doi.org/10.1007/s00299-018-2352-z

Download citation

Received: 16 July 2018
Accepted: 07 November 2018
Published: 15 November 2018
Issue Date: January 2019
DOI: https://doi.org/10.1007/s00299-018-2352-z

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

De novo assembly and comparative transcriptome analysis: novel insights into terpenoid biosynthesis in Chamaemelum nobile L.

Abstract

Key message

Abstract

Similar content being viewed by others

Introduction

Materials and methods

Plant materials

Gas chromatography–mass spectrometry (GC–MS)

RNA extraction

Library construction, deep sequencing, and de novo assembly

Functional annotation and metabolic pathway analysis

Identification of differentially expressed genes (DEGs)

Simple sequence repeats (SSRs)

Co-expression analysis

qRT-PCR validation

Results and discussion

Composition and content of terpenoid in different organs of C. nobile

Sequencing analysis and de novo assembly

Nr annotation analysis

Functional classification

Identification of DEGs

DEGs in terpenoid pathway among four organs

Construction and analysis of DEG co-expression network for terpenoid synthesis in various organs of C. nobile

Putative terpenoid biosynthesis genes in different organs

Identification of unigene-derived microsatellite markers

Conclusion

Author contribution statement

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Electronic supplementary material

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation