1 Introduction

For several years, plant metabolomics has been of great importance because plants show far greater chemical diversity than animals and microbes. This extensive chemical diversity has been explored to develop well-targeted therapeutic strategies offering better protection, fewer side effects, and decreased relapse. The chemical constituents involved are small organic molecules with specific biological activity, often restricted to a particular plant species, genus, or family. They are termed secondary metabolites; they comprise the major physiologically active constituents distributed across distinct taxonomic groups, and many are used as high-value chemicals, pharmaceuticals, flavors, and fragrances (Fabricant and Farnsworth 2001; Oksman-Caldentey and Saito 2005). Unlike primary metabolites, which function in fundamental processes such as plant growth and development, secondary metabolites chiefly mediate plant-environment interactions, including pollinator attraction and defense responses to plant pathogens (Xiao et al. 2013). These specialized metabolites are distinguished by diverse chemical structures and can be grouped into five major categories according to their biosynthetic origin: polyketides, isoprenoids, alkaloids, phenylpropanoids, and flavonoids. Polyketides are produced through the acetate-malonate pathway; isoprenoids such as terpenoids and steroids are produced from the five-carbon precursor isopentenyl diphosphate (IPP) through the mevalonate pathway or the more recently described MEP (non-mevalonate or Rohmer) pathway; alkaloids are derived from various amino acids; and flavonoids are synthesized from a combination of phenylpropanoid and polyketide precursors (Robert Verpoorte 2000). Most categories contain thousands of known compounds, and new ones continue to be discovered and characterized.
The production of both primary and secondary metabolites proceeds through complex, enzymatic, multistep biosynthetic pathways (Nicolaou and Chen 2011). Plant phenolics such as flavonoids and phenols are synthesized through various routes and thus form a heterogeneous group. Two fundamental pathways are involved: the shikimic acid pathway and the malonic acid pathway. The shikimic acid pathway participates in the biosynthesis of most plant phenolics, with soluble carbohydrates serving as the basic substrate for phenolic production. This pathway converts simple carbohydrate precursors derived from glycolysis and the pentose phosphate pathway into aromatic amino acids (Herrmann and Weaver 1999). The shikimic acid and acetate-malonate pathways are the prominent metabolic routes of polyphenol synthesis in plants, and two precursors, acetate and phenylalanine, are required for flavonoid synthesis (Van Soest 1982; Jung and Fahey 1983). The potential application of diverse secondary metabolites in plant breeding has been studied extensively (Wink 1988). Figure 10.1 outlines how metabolites from photosynthesis, glycolysis, and the Krebs cycle lead to the synthesis of secondary metabolites (Ghasemzadeh and Jaafar 2011).

Fig. 10.1 Elucidation of biosynthetic pathways synthesizing secondary metabolites (Adopted from Ghasemzadeh and Jaafar 2011)

2 Functions and Applications of Secondary Metabolites

Essential Oils

Essential oils, also known as volatile oils, are complex mixtures of terpenoid hydrocarbons, oxygenated terpenes, and sesquiterpenes that are responsible for the characteristic aroma of plants and also serve as internal messengers (Harrewijn et al. 2001). Owing to their powerful fragrance, they attract pollinating insects and act as defensive volatiles against diseases and predators (Evans and Mitch 1982). The location of essential oils varies among plants and plant parts. They occur in leaves, as in eucalyptus (Eucalyptus citriodora) and citronella (Cymbopogon nardus); in roots, as in vetiver grass (Vetiveria zizanioides); in stems, as in peteribi wood (Cordia trichotoma), incense, chinchilla (Tagetes minuta), and lemongrass (Cymbopogon citratus); in flowers, as in lavender (Lavandula officinalis); in the fruits of citrus species (lemon, orange); and in seeds such as anise (Pimpinella anisum), coriander (Coriandrum sativum), and pepper (Piper nigrum), among others (Baser and Buchbauer 2015). Major essential oil constituents include α-pinene, β-pinene, limonene, α-ocimene, geraniol, anethole, germacrene D, α-terpineol, γ-cadinene, δ-cadinene, and myrcene, which occur in varying concentrations in different plants (Shexia Ma 2012) (Fig. 10.2).

Fig. 10.2 Chemical structure of major essential oil components in plant species (Adopted from Shexia Ma 2012)

Several research findings showed the multipurpose application of essential oil in perfumery, food industry, sweets and beverage preparation, and aroma therapeutic products of plant origin (Bernáth and Fuleky 2009). The role of essential oil in pharmacological effects such as antibacterial, antifungal, and antiviral properties was reported (Böhme et al. 2014). Several reports demonstrated that plants, particularly diverse bioactive components of essential oils, protect crops from contamination by various mold species (Kitic et al. 2013).

Terpenoids

The terpenes, or isoprenoids, constitute one of the most diverse classes of metabolites and are isoprene derivatives synthesized from acetate through the mevalonate pathway. Steroids, gibberellic acid, and carotenoids are a few members of this class. Terpenoid biosynthesis proceeds through the condensation of isoprene (C5) units, and terpenoids are categorized by the number of five-carbon units in the core structure (Mahmoud and Croteau 2002). Two pathways are involved: the mevalonate pathway in the cytosol and the MEP (2-C-methyl-D-erythritol-4-phosphate) pathway in plastids. The basic building block of terpenes is the isoprene unit, derived from isopentenyl diphosphate (IPP) and dimethylallyl diphosphate (DMAPP) (McGarvey and Croteau 1995). Terpenes include monoterpenes (constituents of essential oils, floral scents, and defensive resins, particularly in aromatic plants), sesquiterpenes (less volatile than monoterpenes), diterpenes (a large group with a wide range of biological activities, e.g., phytol, a reduced form of geranylgeraniol), and sesterterpenes (the least common group) (Wang et al. 2005). Plants use terpenoids for growth and development and, predominantly, for more specific chemical interactions and responses to abiotic and biotic environmental stress. Plant-derived terpenoids have long been used in the food, pharmaceutical, and chemical industries, and more recently they have been explored for biofuel production. The ecological role of terpenoid metabolites has also attracted great attention for developing novel strategies for sustainable pest control and plant stress protection (Tholl 2015). Terpenoid-derived drugs such as artemisinin and its derivatives have had significant effects on the treatment and prevention of human disease, benefiting both patients and the pharmaceutical industry (Wang et al. 2005).
Reports have also shown that terpenoid compounds possess anticancer properties, which promises to open new opportunities for cancer therapy (Huang et al. 2012) (Table 10.1).

Table 10.1 Classification and application of terpenes according to number of isoprene units
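
The classification by number of isoprene units can be illustrated with a short sketch. The function name `classify_terpene` and the lookup table are hypothetical and introduced only for illustration; the class names follow standard terpene nomenclature, and the carbon count is assumed to refer to the unmodified core skeleton (many terpenoids gain or lose carbons through later modification, which this sketch does not handle).

```python
# Terpene classes keyed by the number of C5 isoprene units in the core skeleton.
# Hypothetical helper for illustration; not taken from the cited sources.
TERPENE_CLASSES = {
    1: "hemiterpene",
    2: "monoterpene",
    3: "sesquiterpene",
    4: "diterpene",
    5: "sesterterpene",
    6: "triterpene",
    8: "tetraterpene",
}


def classify_terpene(carbon_count: int) -> str:
    """Return the terpene class for a core skeleton with `carbon_count` carbons."""
    if carbon_count <= 0 or carbon_count % 5 != 0:
        raise ValueError("core skeleton must contain a positive multiple of 5 carbons")
    units = carbon_count // 5
    if units in TERPENE_CLASSES:
        return TERPENE_CLASSES[units]
    # More than eight units is conventionally called a polyterpene;
    # seven units has no commonly used class name.
    return "polyterpene" if units > 8 else "unclassified"


if __name__ == "__main__":
    # Compounds mentioned in this chapter, with their core carbon counts.
    for name, carbons in [("limonene", 10), ("germacrene D", 15), ("phytol", 20)]:
        print(name, "->", classify_terpene(carbons))
```

The lookup deliberately rejects carbon counts that are not multiples of five, so that modified terpenoids such as C27 steroids are flagged rather than silently misclassified.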

Alkaloids

Alkaloids can be defined as pharmacologically active, nitrogen-containing basic compounds of plant origin that block ion channels, inhibit enzymes, or interfere with neurotransmission, and can cause hallucinations, loss of coordination, convulsions, vomiting, and death. Previous studies demonstrated that alkaloids play a major role in ecochemical functions, either in plant defense against invading pathogens and herbivores or, as in the case of pyrrolizidine alkaloids, as pro-toxins for insects, which further modify the alkaloids and then incorporate them into their own defense secretions (Hartmann 1991). Morphine, from Papaver somniferum, was the first alkaloid isolated; other prominent true alkaloids include quinine from Cinchona species, coniine from poison hemlock (Conium maculatum), and reserpine from Rauwolfia serpentina. Alkaloid biosynthesis usually starts from one of the common amino acids, specifically aspartic acid, lysine, tyrosine, or tryptophan (Pearce et al. 1991) (Table 10.2).

Table 10.2 Source and pharmacological properties of major alkaloids

Flavonoids

Flavonoids are low molecular weight (Heim et al. 2002; Fernández et al. 2006) bioactive polyphenols (Hollman and Katan 1999) that are synthesized by the phenylpropanoid pathway and play a pivotal role in photosynthesizing cells. They form a large class of polyphenolic compounds bearing a γ-benzopyrone structure and are ubiquitous in plants. Numerous studies have demonstrated the key role of phenolic secondary metabolites such as flavonoids in a wide variety of pharmacological activities (Pandey 2007; Mahomoodally et al. 2008). Plants produce flavonoids and hydroxylated phenolic substances in response to pathogen attack, particularly microbial infection (Dixon et al. 1983). Although flavonoids have numerous biochemical properties, the most frequently studied is their antioxidant activity, which depends on the arrangement of functional groups around the nuclear structure. The total number, substitution, and configuration of hydroxyl groups critically influence the mechanisms of antioxidant activity, such as metal ion chelation and free-radical scavenging (Heim et al. 2002; Pandey et al. 2012). Flavonoids such as catechin, apigenin, quercetin, naringenin, rutin, and venoruton have been reported to have hepatoprotective properties (Tapas et al. 2008). Various clinical findings have shown the significance and safety of flavonoids in treating hepatobiliary dysfunction and digestive complaints including a sensation of fullness, loss of appetite, abdominal pain, and nausea. Equisetum arvense flavonoids, as well as hirsutrin and avicularin obtained from different sources, were observed to protect HepG2 cells against chemically induced hepatotoxicity (Spencer et al. 2009; Kim et al. 2011). Flavonoid extracts from various plant species have also been observed to exert antibacterial activity (Mishra et al. 2013).
Other flavonoids such as flavones, apigenin, galangin, isoflavones and flavanones, flavonol glycosides, and chalcones were also reported to possess potent antibacterial activity (Cushnie and Lamb 2005). Flavonoids such as luteolin, apigenin, hesperidin, and quercetin were observed to exert anti-inflammatory and analgesic effects. Flavonoids act particularly on enzyme systems significantly associated with inflammatory processes, specifically tyrosine and serine-threonine protein kinases (Nishizuka 1988; Hunter 1995) (Table 10.3).

Table 10.3 Flavonoid classes and their dietary sources

Phenolics

Phenolic compounds are plant secondary metabolites that constitute the largest group of substances synthesized through the shikimate-phenylpropanoid pathway, which yields phenylpropanoids, or through the polyketide acetate-malonate pathway, which yields simple phenols. They form a large reservoir of natural chemical diversity, encompassing a wide variety of compounds and enzymes together with mechanisms for gene regulation and for the transport of metabolites and enzymes. Phenols exhibit antioxidant, anti-inflammatory, anticarcinogenic, and other biological properties and may provide protection against oxidative stress and some infectious diseases (Park et al. 2001). The two main groups of phenolic compounds found in most plants are hydroxybenzoic and hydroxycinnamic acids.

Phenolic acids are present in plants as esters or glycosides attached with other natural compounds like flavonoids, alcohols, hydroxyfatty acids, sterols, and glucosides (Dai and Mumper 2010). The total phenolic compounds in tea, coffee, berries, and fruits were estimated up to 103 mg/100 g fresh weight (Manach et al. 2004).

Tannins

Tannins are phenolic compounds comprising a diverse group of oligomers and polymers synthesized via the shikimic acid pathway (phenylpropanoid pathway) and can be grouped into two major categories: hydrolysable tannins and condensed (non-hydrolysable) tannins (Athanasiadou et al. 2001; Chaichisemsari et al. 2011; Hassanpour et al. 2011; Maheri-Sis et al. 2011). Condensed tannins are the most common tannins in forage legumes, trees, and stems (Barry and McNabb 1999) and are widely distributed in legume pasture species including Lotus corniculatus, in some types of acacia, and in numerous other plant species (Degen et al. 1995). Tannins differ from other secondary metabolites in forming complexes with proteins, starch, minerals, other basic and large molecular compounds, pigments, and metallic ions. Among the polyphenols evaluated, some tannins (ellagitannins and their oxidized analogs, pentagalloyl glucose, and EGCG) effectively suppressed tumor initiation in the second stage of two-stage chemical carcinogenesis (Ito et al. 1999). The cancer-preventing activity of tannins through inhibition was substantially validated for EGCG, revealing its potential in this area (Yoshizawa et al. 1987). Tannins also have beneficial effects on protein metabolism in ruminants, lowering rumen degradation of dietary protein and enhancing amino acid absorption in the small intestine.

Glycosides

Glycosides can be phenol, alcohol, or sulfur compounds and are widely distributed in the plant kingdom. A glycoside consists of a sugar part, known as the glycone, which can occur in cyclic form, and a non-sugar group called the aglycone. The sugar part is covalently attached to the aglycone through a hydroxyl group. The aglycone may be a terpene, a flavonoid, or any other natural product. Many plants store chemicals as inactive glycosides, which can be activated by enzymatic hydrolysis (Polt 1995). Glycosides have found major therapeutic applications, including anticancer (Zhou et al. 2013), expectorant (Fernández et al. 2006), sedative, and digestive activities (Galvano et al. 2004).

Saponins

Saponins are high molecular weight glycosides containing one or more sugar units attached to a triterpene or steroid aglycone. Most saponins have detergent properties: they lower the surface tension of aqueous solutions and therefore form stable foams in contact with water. Saponins have been reported to cause hemolysis, often taste bitter, and may be toxic to cold-blooded animals (Guçlu-Ustundag and Mazza 2007). Several plant-derived drugs and traditional medicines, particularly in Asia, contain saponins, and hence there is considerable interest in characterizing their pharmacological and biological properties (Hostettmann and Marston 2005). Owing to their physicochemical properties (emulsification, sweetness, and bitterness) and biological properties (antimicrobial, antioxidant, insecticidal, and ichthyocidal), saponins are applied in the food, cosmetics, and pharmaceutical industries and in soil bioremediation (Kabera et al. 2014).

3 Bioresources Mining Through Gene Manipulation

Available genomic resources and emerging tools in synthetic biology promote the metabolic engineering of high-value secondary metabolites in plants. Exploring the molecular biology of complex multistep biosynthetic pathways may reveal opportunities for developing plants with better disease resistance, altering the levels of health-promoting compounds (nutraceuticals) in food crops, and introducing genetically modified plants with improved secondary metabolite profiles for pharmaceutical production (Dixon and Arntzen 1997). The molecular characterization of numerous classes of plant natural product biosynthetic enzymes, such as terpene cyclases and polyketide synthases, has aided the development of transgenic plants, with specific plant genes cloned through PCR strategies or identified from EST database information (Dixon 1999). More recent developments in transgenic technology for pharmacological studies include the use of transgenic organ cultures to increase the production and biotransformation of important secondary metabolites. Hairy roots induced by the Ri plasmid of Agrobacterium rhizogenes have been demonstrated to be an effective means of producing various secondary metabolites that are normally synthesized in differentiated plant roots (Lam et al. 1984). De novo biosynthesis and biotransformation of secondary products that are normally produced in leaves can be achieved either through shooty teratomas induced by the tumor-forming Ti plasmid or through a shooty mutant of Agrobacterium tumefaciens (Escobar and Dandekar 2003). A more precise way to alter secondary metabolite pathways is to transfer and express modified genes in specific plant cells using vector systems such as Agrobacterium (Chung et al. 2006).
Transcriptional control of metabolism in response to developmental and environmental signals is a key element of plant metabolic regulation (Gaudinier et al. 2015). Genetic engineering has become a promising strategy for improving secondary metabolite production and for producing pharmaceuticals such as interferon, growth hormones, and growth factors (Goddijn and Pen 1995). The approach can also be used to obtain rare compounds that are restricted to wild plants, which are difficult to culture, or that occur only in low concentrations, by introducing key genes of the biosynthetic pathway of interest into cultivated plant species. Another application is the use of transcriptional regulation of metabolic pathways to increase secondary metabolite production, through overexpression of specific genes or by blocking an undesirable step of a pathway (Giulietti and Ertola 1997). Genetic engineering of plant genomes has made it possible to alter plant metabolism directly and to manipulate the amount and nature of commercially valuable secondary metabolites. Hence, plants are now regarded as potential factories for diverse useful products (Kishore and Somerville 1993; Ap Rees 1995). The major aims of generating transgenic plants with these approaches are to modify secondary metabolic pathways so as to increase low molecular weight compounds and polymers, to enhance resistance to biotic and abiotic stresses, to modify food quality (including altered carbohydrate, protein, and lipid levels), and to produce polypeptides for pharmaceutical, medical, and industrial use (Ap Rees 1995; Herbers and Sonnewald 1999; Miflin 2000; Ye et al. 2000; Kumar 2001; Veronese et al. 2001; Rohini and Rao 2001; Lessard et al. 2002; Sharma et al. 2002).
Several plant-derived products such as taxol (anticancer), vinblastine/vincristine (anticancer), artemisinin (antimalarial), reserpine (antihypertensive), and quinine (antimalarial) have emerged as potent drugs for treating human disorders. Genetic manipulation can also be used to decrease the production of an unwanted compound or group of compounds in a secondary metabolic pathway (Verpoorte and Memelink 2002). In a study on modifying terpenoid biosynthesis in Arabidopsis, the gene encoding 1-deoxy-D-xylulose-5-phosphate synthase (DXPS), which catalyzes the initial enzymatic step of terpenoid metabolism, was overexpressed. The enhanced gene expression raised enzymatic activity and increased terpenoid production in Arabidopsis (Estévez et al. 2001). Antioxidants such as flavonoids and anthocyanins are beneficial to human health and are therefore key targets for genetic manipulation. In an attempt to increase flavonoid levels in tomato, overexpression of the Petunia chalcone isomerase (CHI) gene produced a 78-fold increase in flavonoid levels in tomato fruits (Muir et al. 2001). In Catharanthus roseus, overexpression of the gene encoding tryptophan decarboxylase (TDC) increased tryptamine levels without changing the total alkaloid content (Verpoorte and Memelink 2002), whereas overexpression of strictosidine synthase (STR) increased total alkaloid content (Canel et al. 1998). In contrast, overexpression of the transcription factor ORCA3, which regulates major steps of the alkaloid metabolic pathway in C. roseus, did not enhance alkaloid production (van der Fits and Memelink 2000). Anthocyanin biosynthesis in maize has been reported to be regulated through the combinatorial action of two transcription factors, R and C1.
The whole flavonoid biosynthetic pathway was induced by overexpressing the transcription factors R and C1 in undifferentiated maize cell cultures in vitro (Grotewold et al. 1998). In rice, anthocyanin biosynthesis was activated by overexpressing the maize transcription factors C1 and R in combination with the chalcone synthase gene, resulting in increased fungal resistance (Gandikota et al. 2001). In Arabidopsis, overexpression of a single MYB-type transcription factor (PAP1) produced plants with intense purple pigmentation throughout development (Borevitz et al. 2000). These reports show that the strict and specific genetic control of natural product accumulation during normal plant development can be altered by overexpressing one or a few transcription factors, even in heterologous plant species, for better production and resistance (Table 10.4).

Table 10.4 Plant-derived secondary metabolites and their pharmaceutical properties

In some cases transcription factors prevent plants from accumulating natural compounds. In Arabidopsis, inhibition of the MYB4 gene increased sinapate ester levels in leaves and enhanced tolerance to UV-B irradiation (Jin et al. 2000). Likewise, a similar manipulation of expression in strawberry led to reduced flower pigmentation and lower anthocyanin and flavonol levels, indicating that in strawberry fruit a MYB-type transcription factor acts as a repressor of particular steps of the flavonoid metabolic pathway (Aharoni et al. 2001). The introduction of β-carotene biosynthesis into rice by overexpressing genes encoding phytoene synthase, phytoene desaturase, and lycopene β-cyclase is a major achievement of transcription-based metabolic engineering (Ye et al. 2000).

One study demonstrated the transfer of an entire secondary metabolite biosynthetic pathway into a heterologous plant species by expressing cyanogenic glucoside biosynthesis genes from Sorghum bicolor in Arabidopsis (Tattersall et al. 2001). Upon tissue damage, the cyanogenic glucoside dhurrin present in sorghum is hydrolyzed by a β-glucosidase, and the cyanide released acts as a pest deterrent and insecticide. Dhurrin is synthesized from tyrosine by two multifunctional cytochrome P450 enzymes (CYPs) and a UDPG-glucosyltransferase. Expression of the first cytochrome P450 enzyme of the pathway in Arabidopsis resulted in the formation of p-hydroxybenzylglucosinolates, which are normally absent from this species (Bak et al. 1999; Petersen et al. 2001). Expressing the sorghum-specific glucosyltransferase together with the two CYP genes induced dhurrin synthesis, yielding transgenic Arabidopsis that released increased levels of cyanide upon tissue damage. As a result, the transgenic leaf tissue was rejected by larvae of the flea beetle Phyllotreta nemorum, and larvae that fed on transgenic leaves died. This study showed that high levels of a foreign metabolite can be induced in a plant species without affecting its growth and development while improving resistance to invading pests. In another example, a synthetic three-gene cluster was expressed in Escherichia coli to produce plant flavanones such as pinocembrin and naringenin from the amino acids phenylalanine and tyrosine (Hwang et al. 2003). Yeast has also been engineered with two plant-specific genes to produce the stilbene resveratrol, an antimicrobial agent, from fed 4-coumaric acid (Becker et al. 2003). E. coli strains have been developed by expressing a synthetic amorpha-4,11-diene synthase gene together with the mevalonate isoprenoid pathway from Saccharomyces cerevisiae to increase production of the artemisinin precursor amorphadiene (a sesquiterpene olefin). Strains developed in this way can serve as platform hosts for the synthesis of any terpenoid compound for which a terpene synthase gene is available, because isopentenyl and dimethylallyl pyrophosphates are the universal precursors of all isoprenoids (Martin et al. 2003) (Table 10.5).

Table 10.5 List of some factors that can be utilized for enhanced secondary metabolite production

Several approaches can be considered for knocking down an enzymatic step in a pathway: lowering the level of the corresponding mRNA via antisense, co-suppression, or RNA interference (RNAi) technologies, or overexpressing an antibody against the enzyme. Essential oil quality in mint plants was improved by downregulating synthesis of the undesirable component menthofuran through expression of an antisense version of the menthofuran synthase gene (Mahmoud and Croteau 2001). In Papaver somniferum, codeinone reductase, a terminal enzyme of morphine biosynthesis, was silenced by RNAi targeting all members of the gene family, which notably reduced morphine and codeine levels. The latex of the genetically modified poppy showed a drastic change in alkaloid pattern, with accumulation of rare alkaloids (Allen et al. 2004). In California poppy (Eschscholzia californica), RNAi suppression of berberine bridge enzyme (BBE) activity led to the accumulation of (S)-reticuline, a major intermediate in isoquinoline alkaloid biosynthesis (Fujii et al. 2007). In transgenic Panax ginseng, the gene encoding dammarenediol synthase (DDS) was silenced using the pK7GWIWG2 vector and Agrobacterium tumefaciens-based transformation, and the lowered DDS expression reduced ginsenoside production in roots to 84.5%. The study inferred that DDS expression plays a major role in ginsenoside synthesis in P. ginseng, and no evident morphological changes were observed in DDS-RNAi transgenic plants relative to the wild type (Han et al. 2006). To date, RNAi has been used as a rapid reverse-genetics tool for generating valuable crops and medicinal plants with new chemical phenotypes and for identifying the genes responsible for synthesizing pharmacologically relevant secondary metabolites.
The advent of further RNAi techniques for genome-wide screening may speed the identification of genes involved in the production of novel compounds and facilitate their exploitation for commercially valuable plant-derived products such as drugs and flavoring agents (Table 10.6).

Table 10.6 RNAi-mediated regulations reported in some plants

In Western countries, plant-derived drugs command a high market value. Examples include vinblastine and vincristine from Madagascar periwinkle (Catharanthus roseus) (Mukherjee et al. 2001); the anticancer drug paclitaxel (Taxol), the analgesic morphine, podophyllotoxin, and camptothecin (Mukherjee et al. 2001); and semisynthetic steroid hormones obtained from diosgenin (Robert Verpoorte 2000). Genetic engineering of the lignin biosynthesis pathway has attracted interest because of the availability of model plants for exploring metabolic pathways and because lignin content in biomass is related to forage digestibility and to quality requirements in the pulping industry (Eudes et al. 2014). Findings strongly suggest that removing lignin from developing plants without affecting their development is crucial. Attempts to downregulate particular genes encoding lignin biosynthetic enzymes in species such as poplar, maize, pine, and switchgrass, using natural mutants or gene silencing (RNAi) techniques, have failed. However, some reports showed that limited genetic modifications succeeded in moderately reducing lignin content, modifying biomass composition, and modestly increasing saccharification efficiency, forage digestibility, and pulping yield (Li et al. 2008). Genetically modified plants have also been used to develop antibodies against dental caries, rheumatoid arthritis, malaria, virus-related cancers, cholera, diarrhea, HIV, rhinovirus, influenza viruses, hepatitis B virus, herpes simplex virus, etc. (Thomas et al. 2002). It has become clear that the chemical diversity of plants has greater potential than any human-made chemical library, and the plant kingdom thus holds an extensive resource of pharmacologically valuable compounds yet to be discovered.
The major challenge in plant metabolic profiling is the complexity and diversity of the chemical compounds involved (Oksman-Caldentey et al. 2004). The diverse chemical properties of these compounds limit the analytical tools that can be used to measure many secondary metabolites in parallel (Trethewey 2004).

4 Transcriptome Sequencing for Exploring Secondary Metabolite Pathways

Transcriptomic data open substantial opportunities for identifying novel genes and for assembling collections of expressed sequence tags (ESTs) present in a sample, which can be used for molecular marker development, particularly in non-model organisms lacking a reference genome (Wang et al. 2009). The large volume of sequence data produced by transcriptome sequencing has also proved successful in elucidating metabolic pathways, global gene expression, and differential expression. Because extensive genomic data are lacking for most economically significant plants, microarray-based transcriptome analysis is inconvenient in these systems, as it requires large EST or cDNA clone collections. First-generation Sanger (chain-termination) sequencing of model crops was successful, but its limited throughput and high cost restricted the sequencing of more plant species, particularly those with complex genomes, and this created demand for new and improved sequencing technologies. Besides, numerous non-model plants are important energy resources with characteristics unique to them, so they need to be studied directly rather than through a model plant (Carpentier et al. 2008). Genomics in non-model species therefore remained challenging until the progress made by second-generation sequencing platforms with high throughput and comparatively low cost, generally termed next-generation sequencing (NGS) technologies. With the advent of NGS, RNA sequencing (RNA-seq) has been applied extensively to plants lacking complete genomic information, providing transcriptional evidence for various downstream applications (Duan et al. 2012). RNA-seq on NGS platforms is usually less costly than genome sequencing and facilitates de novo assembly because transcriptomes are 10–100 times smaller than genomes (Gayral et al. 2013). RNA-seq can be used for studies of population genetics (Neil et al. 2010; Renaut et al. 2012), phylogenetics (Chiari et al. 2012; Timme et al. 2012), and molecular adaptation (Elmer et al. 2010; Künstner et al. 2010; Gayral et al. 2013). Differences in expression level among tissues, genotypes, and populations help in understanding functional and evolutionary relationships (Wolf et al. 2010; Gayral et al. 2013; Nadiya et al. 2017).

The first consideration before starting an RNA-seq experiment is the choice of NGS platform, because the data generated by different platforms vary, and this variation can influence the interpretation of an experiment. Sample preparation procedures also differ between platforms, so selecting a platform suited to the downstream application is a prerequisite for experimental success. Different NGS platforms are available commercially, and most of them are under active development (Metzger et al. 2011). Many are based on sequencing by synthesis, using DNA polymerase or ligase as the major component. Roche 454, Illumina, Helicos, and PacBio (Pacific Biosciences) initiate their sequencing reaction with DNA polymerase, while SOLiD (Life Technologies) and Complete Genomics use DNA ligase. The platforms can additionally be classified as single-molecule based (sequencing a single molecule, such as Helicos and PacBio) or ensemble based (sequencing multiple identical copies of a DNA molecule, such as Illumina and SOLiD). Illumina has come to dominate the sequencing market; its sequencing-by-synthesis approach applies fluorescently labeled reversible-terminator nucleotides to clonally amplified cDNA templates immobilized on an acrylamide coating on the surface of a glass flow cell (Bentley et al. 2008). The Illumina Genome Analyzer and the later HiSeq 2000 set the standard for high-throughput massively parallel sequencing, and in 2011 Illumina launched the MiSeq, a lower-throughput, fast-turnaround instrument suited to smaller laboratories and clinical diagnostics. Two other NGS platforms that revolutionized genomic sequencing are the Ion Torrent Personal Genome Machine (PGM) and the Pacific Biosciences (PacBio) system. 
The Ion Torrent PGM uses semiconductor technology that detects the protons released as each nucleotide is incorporated during synthesis (Rothberg et al. 2011), and PacBio is based on single-molecule real-time (SMRT) sequencing (Eid et al. 2009). Transcriptome sequencing can proceed with single-end or paired-end reads. In the single-end method, the sequencer reads a fragment from one end only; in paired-end reading, it reads from one end up to the specified read length and then starts a second round of reading from the opposite end of the fragment. Paired-end reads are more expensive and time consuming to generate than single-end reads. After the reads are generated, transcriptome assembly is required to convert individual reads into complete mRNA sequences or individual transcripts. The longer the individual reads, the simpler it is to assemble transcripts unambiguously, but the leading NGS platforms usually produce short reads that must be assembled into contigs. Analysis of the massive amount of RNA-seq data generated is challenging and time consuming. For instance, the HiSeq 2000 (Illumina) generates up to 200 million 100-nt reads (approximately 50 GB of data) per lane in one sequencer run. The data must be analyzed not only to detect transcriptome similarities but also for contig assembly, quantification, and differential expression, so that the findings can be given biological meaning (Chu and Corey 2012). A plant species lacking a good-quality reference genome requires de novo assembly, which is essential for downstream RNA-seq analyses to obtain an accurate overview of the transcriptome (Duan et al. 2012). 
For a model plant species with available genome information, the transcriptome can be assembled against the reference genome, and the success of reference-based assemblers depends on the quality of the reference genome used. For a plant species without a well-characterized reference genome, de novo transcriptome assembly should be performed before further analysis. Several de novo transcriptome assemblers have been developed, such as SOAPdenovo-Trans (Xie et al. 2014), Trans-ABySS (Robertson et al. 2010), Velvet (Zerbino and Birney 2008) followed by Oases (Schulz et al. 2012), Trinity (Haas et al. 2013), and MIRA (Mimicking Intelligent Read Assembly; Chevreux et al. 2004). RNA-seq has been applied to hundreds of non-model plants (Johnson et al. 2012; Schliesky et al. 2012). Still, more extensive coverage of selected plant species is needed to better characterize the biosynthetic pathways of particularly important specialized metabolites. To this end, the PhytoMetaSyn Project (www.phytometasyn.ca) targeted 75 non-model plants that produce terpenoids, alkaloids, and polyketides (Facchini et al. 2012). Six subgroups are the main targets, namely sesquiterpenes, diterpenes, triterpenes, monoterpenoid indole alkaloids, benzylisoquinoline alkaloids, and polyketides, with the aim of identifying novel biosynthetic genes responsible for the diversity of specialized compounds present in the 75 species (Table 10.7).
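Several of the assemblers named above, including Velvet and Trinity, are built on de Bruijn graphs of k-mers. The following is a minimal sketch of that idea, not the implementation of any of these tools: the function names, the toy reads, and the choice of k = 4 are assumptions made for illustration, and real assemblers additionally handle sequencing errors, reverse complements, coverage, and splice variants.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph: each (k-1)-mer points to the
    (k-1)-mers that follow it in at least one read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def extend_contig(graph, start):
    """Walk forward from `start` while the path is unambiguous,
    i.e. exactly one successor that itself has exactly one predecessor."""
    indegree = defaultdict(int)
    for successors in graph.values():
        for node in successors:
            indegree[node] += 1

    contig = start
    node = start
    visited = {start}
    while len(graph.get(node, ())) == 1:
        nxt = next(iter(graph[node]))
        if nxt in visited or indegree[nxt] != 1:
            break  # stop at cycles and at branch points
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical overlapping short reads drawn from one transcript.
reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]
graph = build_de_bruijn(reads, k=4)
print(extend_contig(graph, "ATG"))  # prints ATGGCGTGCAAT
```

The walk stops at any branch point, which is why longer reads (and hence larger usable k) make it easier to assemble transcripts unambiguously.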

Table 10.7 List of 75 plant species included in PhytoMetaSyn Project for transcriptome analysis using NGS

Using NGS-based transcriptome sequencing, several genetic and genomic studies have examined the molecular mechanisms behind the diverse chemical composition of commercially and pharmacologically relevant plant species. Large-scale unigene identification in Polygala tenuifolia by Illumina sequencing showed that several transcripts were involved in the biosynthesis of triterpene saponins and phenylpropanoids. The study also laid a foundation for modifying the synthesis of active compounds in P. tenuifolia through marker-assisted breeding or genetic engineering, and inferred that the genes enriched in secondary metabolite biosynthetic pathways could broaden the applications of P. tenuifolia in the pharmaceutical industry (Tian et al. 2015). A transcriptome-based study on pear revealed that the major genes expressed under salt stress were significantly involved in fundamental biological processes, secondary metabolite biosynthetic pathways, and signal transduction mechanisms (Xu et al. 2015). The genus Panax comprises about nine species, generally called ginsengs, which are reported to contain antidiabetic, anticancer, anti-inflammatory, immunomodulatory, and anti-allergic compounds (Xiao et al. 2013). Accordingly, various candidate genes responsible for secondary metabolite biosynthesis have been identified from Panax species using different NGS platforms (Sun et al. 2010; Luo et al. 2011; Li et al. 2013; Jayakodi et al. 2014). Another report notes that Podophyllum species contain podophyllotoxin and aryltetralin lignans, which are extensively used in the partial synthesis of anticancer drugs, and the biosynthetic pathway and the putative genes responsible for producing these compounds were identified through NGS-based transcriptome analysis with various bioinformatics tools (Marques et al. 2014). 
Transcriptome sequencing of young and mature leaf tissue of Cassia angustifolia on the Illumina MiSeq platform identified genes involved in various secondary metabolite pathways, including terpenoid, sennoside, and polyketide metabolism (Reddy et al. 2015). Another study prepared a high-quality EST database from Glycyrrhiza uralensis using the 454 GS FLX platform; from these ESTs, novel putative candidate genes related to the glycyrrhizin biosynthetic pathway, including genes for cytochrome P450s and glycosyltransferases, were identified. Using organ-specific expression patterns, the study singled out three unigenes encoding cytochrome P450s and six unigenes encoding glycosyltransferases as the most probable candidates involved in the glycyrrhizin biosynthetic pathway (Li et al. 2010). The medicinal plant Withania somnifera is widely used in Ayurvedic and other native medical treatments because it contains the bioactive withanolides. To understand the molecular mechanism of the withanolide biosynthetic pathway, the transcriptomes of Withania leaf and root, which mainly produce withaferin A and withanolide A, respectively, were sequenced. Transcript annotation, gene ontology, and KEGG analyses gave a comprehensive view of the enzymes associated with withanolide production. The study also identified members of the cytochrome P450, glycosyltransferase, and methyltransferase gene families that are differentially expressed in leaf and root, suggesting the presence of tissue-specific withanolides. The transcriptome data developed for Withania may thus open new ways to elucidate the complete biosynthetic pathway of tissue-specific secondary metabolites in a non-model plant and to introduce modifications that increase withanolide biosynthesis through emerging biotechnological approaches (Gupta et al. 2013). 
Transcriptome data of Calotropis gigantea (Sodom apple), a significant medicinal shrub and well-known fiber resource, provided abundant gene transcript resources for evaluating the molecular characteristics of fiber and secondary metabolite biosynthetic pathways. The putative fiber-related genes were identified and experimentally validated by real-time PCR (Muriira et al. 2015). Curcuma longa L., a widely used herbal medicine and important spice, is a rich source of the biologically active compound curcumin. Understanding the molecular mechanism underlying curcumin biosynthesis may aid in modifying curcumin content and in maintaining stable growth under different cultivation conditions. One study investigated the candidate genes responsible for curcuminoid biosynthesis using de novo assembly of the rhizome transcriptomes of C. longa and C. aromatica. Differential expression analysis showed that two new polyketide synthase genes (clpks1 and clpks2) were more highly expressed in C. longa than in C. aromatica, implying major roles for these transcripts in curcuminoid biosynthesis. The study also provided useful data for altering the curcumin biosynthetic pathway in Curcuma and related species to develop new turmeric traits (Sheeja et al. 2015). The RNA-seq transcriptome of Hedychium coronarium provided an important resource for functional genomic studies. The study showed the existence of a number of candidate scent-related genes, such as the flower-specific HcDXS2A, HcGPPS, and HcTPSs, which play major roles in regulating the biosynthesis of floral volatile terpenes. The results also suggested that, by uncovering the molecular mechanism of floral scent formation and pathway regulation in H. coronarium, breeding techniques and genetic manipulations can be developed to generate scent-associated Hedychium traits of higher commercial value (Yue et al. 2015).

Another study, motivated by the anticancer, antioxidant, and antimalarial properties of curcumin, used rhizome transcriptome sequencing of three C. longa varieties and de novo transcriptome assembly. The study elucidated the terpenoid biosynthesis pathway, other secondary metabolite pathways, and genes associated with the biosynthetic pathways of several anticancer compounds (taxol, curcumin, and vinblastine), antimalarial compounds (artemisinin), acridone alkaloids, and other prominent metabolites such as the sesquiterpenes capsidiol, gossypol, phaseic acid, bergamotene, germacrene, and farnesene. The assembled data for this phytochemically important herb provide information for developing fast-growing cultivars with enhanced terpenoid profiles and the associated anticancer, antimalarial, and antioxidant properties (Annadurai et al. 2013). Another study used transcriptome sequencing and analysis to examine how the status and quality of oil accumulation vary in the fruits of different avocado cultivars. The transcriptomic data provided information on molecular genetics and functional genomics and helped identify the pathways and genes associated with the production of diverse essential nutrients and beneficial phytochemicals. The study thus gave a detailed picture of the transcriptomic variation during ripening of the Mexican avocado fruit, offering an effective view of the genes associated with fatty acid biosynthesis and fruit ripening (Ibarra-Laclette et al. 2015). Boesenbergia rotunda, a food ingredient and medicinal plant, is a rich source of panduratin A, a flavonoid with a wide range of medicinal properties, including anticancer, anti-dengue, anti-HIV, anti-inflammatory, and antioxidant activity. Transcriptome sequencing together with digital gene expression profiling of native and phenylalanine-treated B. rotunda identified differentially expressed genes involved in the panduratin A biosynthetic pathway. 
The study showed that several genes were upregulated and others downregulated; the upregulated genes included two phenylalanine ammonia-lyase (PAL), three 4-coumaroyl-coenzyme A ligase (4CL), and one chalcone synthase (CHS) genes, which play significant roles in the phenylpropanoid pathway that leads to the synthesis of panduratin A (Md-Mustafa et al. 2014).
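Calling a gene upregulated or downregulated rests on comparing normalized expression between two conditions. The sketch below illustrates one simple way to do this, counts-per-million normalization followed by a log2 fold change. The read counts are hypothetical and are not taken from any of the studies cited here; the gene labels PAL and CHS are borrowed from the text only as names, and REF is an invented placeholder. Published analyses generally rely on dedicated statistical tools with biological replicates rather than a bare ratio like this.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each library sums to one million."""
    total = sum(counts.values())
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2(treated / control) on CPM values; the pseudocount avoids
    division by zero for genes with no reads in one condition."""
    cpm_control = counts_per_million(control)
    cpm_treated = counts_per_million(treated)
    return {
        gene: math.log2(
            (cpm_treated[gene] + pseudocount) / (cpm_control[gene] + pseudocount)
        )
        for gene in control
    }


# Hypothetical read counts per gene (illustration only).
control = {"PAL": 100, "CHS": 100, "REF": 800}
treated = {"PAL": 400, "CHS": 100, "REF": 500}

for gene, lfc in log2_fold_change(control, treated).items():
    label = "up" if lfc > 0 else "down" if lfc < 0 else "unchanged"
    print(f"{gene}: log2FC = {lfc:+.2f} ({label})")
```

A positive value indicates higher expression in the treated sample, and a value near +2 corresponds to roughly a fourfold increase.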

Uncaria rhynchophylla produces two terpene indole alkaloids of great medical importance, rhynchophylline (RIN) and isorhynchophylline (IRN), which hinder and destabilize the formation of amyloid β-protein, the pathological indicator of Alzheimer’s disease. NGS-based transcriptome sequencing, de novo assembly, and differential expression analysis identified candidate genes coding for enzymes involved in the biosynthesis of the relevant secondary metabolites. Genes encoding cytochrome P450, methyltransferase, and isomerase were identified as putative participants in the late biosynthesis of RIN and IRN. The transcriptome data from this study therefore serve as an important resource for understanding how to alter the levels of specific bioactive compounds in Uncaria extracts (Guo et al. 2014). Centella asiatica (L.) is a medicinal herb used widely in traditional therapeutic systems, particularly Ayurveda. The plant exhibits memory-enhancing, antiaging, antipyretic, diuretic, and wound-healing properties (Mangas et al. 2009) and is also reported to help in treating varicose veins, ulcers, lupus, eczema, and mental retardation (Brinkhaus et al. 2000; James and Dubery 2009). Phytochemical analysis suggests that leaves of C. asiatica contain diverse secondary metabolites, including triterpenoids, volatile monoterpenes and sesquiterpenes, flavonoids, and alkaloids (Suntornsuk and Anurukvorakun 2005; Zainol et al. 2003; Zhu et al. 1997). Based on these observations, a transcriptome assembly resource for C. asiatica was developed using NGS-based transcriptome sequencing and de novo assembly of pooled leaf samples. Genes responsible for various primary and secondary metabolites, including isoprenoids, and for several cellular and molecular functions were characterized in the study. 
The information provided by the study on phytochemicals, the responsible genes, and their metabolic pathways can therefore be used in biotechnological manipulations for enhanced metabolite production (Sangwan et al. 2013).

High-throughput sequencing technologies have become an exceptionally significant source of reference sequence data for non-model plants, and transcriptome data are being used to characterize candidate genes and networks associated with diverse secondary metabolite production in plants. Tea plants (Camellia sinensis) have healthy nutritional properties owing to chemical constituents such as polyphenols (chiefly catechins), theanine, and caffeine. In a study comparing secondary metabolites in tea and oil tea, the levels of major secondary metabolites, including theanine, were lower in oil tea than in tea, and the genes coding for key enzymes regulating these pathways were comparatively highly expressed in tea (Tai et al. 2015). The differential expression between tea and oil tea revealed how their secondary metabolite patterns differ, which helps describe the molecular basis for the biosynthesis of specific metabolites in tea; this information can be used in gene manipulation techniques for developing novel high-quality breeds. Roots of Euphorbia fischeriana contain 12-deoxyphorbol-13-acetate (prostratin, a phorbol ester of the tigliane diterpene class). Prostratin is a protein kinase C activator that can be used in human immunodeficiency virus (HIV) treatments. Transcriptome sequencing and de novo assembly of the E. fischeriana root transcriptome identified 26 unigenes encoding enzymes of different biosynthetic pathways, including the casbene biosynthesis pathway, which produces the prostratin precursor. The study also revealed increased expression of ent-kaurene oxidase and tRNA dimethylallyltransferase, which initiate the production of kaurenol and cis-zeatin-O-glucoside required for casbene biosynthesis. 
The transcriptomic resources developed in this study may facilitate further functional studies to enhance the production of prostratin and other phorbol esters of interest for HIV research and therapeutics (Barrero et al. 2011).

A study exploring temporal and spatial transcriptome, miRNA, and mRNA expression in developing bamboo culms revealed the molecular mechanisms underlying the sequential elongation of internodes from the base to the top. Various significant pathways, including environmental adaptation, signal transduction, translation, transport, and other metabolism, and gene annotations such as cell growth, hormone-mediated signaling, protein modification, primary shoot apical meristem specification, xylem and phloem pattern formation, response to stimuli, metabolic process, and biological regulation, were found to be responsible for the rapid growth of culms in bamboo. Furthermore, the combined analysis of transcriptome, miRNA, posttranslational, and proteomic data gave an overall characterization of, and molecular insight into, the complex processes during rapid culm growth in moso bamboo (He et al. 2013). Transcriptome sequencing and de novo assembly of Suaeda fruticosa, a non-conventional crop, provided information on differentially expressed and unique genes, which were categorized using gene ontology terms and their corresponding pathways. Analysis of the predicted genes shed light on the complex genetics of the plant, providing comprehensive information on the mechanism of salt tolerance, novel gene discovery, and comparison of differential expression profiles without salt and at normal salt concentration (Diray-Arce et al. 2015). Nicotiana benthamiana, a model plant, has been used extensively in gene expression studies, and exploring its genomic resources might advance further studies on the developmental, metabolic, and defense pathways of N. benthamiana and improve understanding of the molecular mechanisms at work in this frequently used model plant.

A study on transcriptome data generated from nine different tissues of N. benthamiana revealed putative genes, RNAi-associated pathways, genome coding capacity, high-level transient transgene expression, and susceptibility to virus infections (Nakasugi et al. 2013). This study may provide information for developing virus-resistant plants through exploration of the RNA silencing pathway. A study comparing the responses of ginger (Zingiber officinale Rosc.) and mango ginger (Curcuma amada Roxb.) to bacterial wilt infection found overexpression of genes associated with the MEP (non-mevalonate) pathway for terpene/isoprene biosynthesis in C. amada relative to Z. officinale. The severalfold upregulation of MEP pathway genes in mango ginger was linked to its resistance to Ralstonia solanacearum (bacterial wilt) through the secretion of phenolic compounds and terpenoids; other highly expressed transcripts were annotated as genes associated with pathogen recognition, biotic and abiotic stress resistance, transcription factors, and signaling. Transcripts differentially expressed in C. amada were associated with the plant defense response to pathogens, i.e., genes responsible for bacterial defense response, oxidative stress, formation of a physical barrier to inhibit pathogen progression, and systemic resistance. Finally, genes found to participate in mango ginger resistance against the bacterium code for proteins associated with lignin synthesis, including cytochrome P450, succinyl-CoA ligase, and S-adenosylmethionine synthase, suggesting that lignin accumulation during the interaction between C. amada and the bacterium serves as a physical barrier to pathogen invasion. The study also reported the overexpression of other resistance genes involved in the phenylpropanoid biosynthetic pathway in C. amada (Prasath et al. 2014). 
Using 454 pyrosequencing, analysis of the leaf and root transcriptomes of Avena barbata (wild oat) identified root-specific genes involved in secondary metabolic pathways, such as isoflavone 7-O-methyltransferase and cytochrome P450, and in protein catabolism, including aspartic proteinase nepenthesin-2, vignain cysteine endopeptidase, and serine carboxypeptidase. Since the wild variety of this crop was subjected to allelic change for desirable agronomic traits, A. barbata has been investigated to explore further genetic mechanisms of adaptation to adverse environmental conditions (Swarbreck et al. 2011).

Several biosynthetic pathways have now been elucidated using NGS-based transcriptome sequencing, illustrating specialized metabolic pathways in numerous plants; examples include the synthesis of the mild sedative valerenic acid by valerian (Valeriana officinalis), production of the natural sweetener hernandulcin by Lippia dulcis, and synthesis of the anticancer drug thapsigargin by Thapsia garganica (Pickel et al. 2012; Pyle et al. 2012). The major genes responsible for producing pharmacologically relevant monoterpenoid indole alkaloids (MIAs) through the MIA pathway were elucidated in Catharanthus roseus, Tabernaemontana elegans, and Amsonia hubrichtii using transcriptome sequencing. Transcripts from C. roseus, A. hubrichtii, and T. elegans were annotated as geraniol-10-hydroxylase, 10-hydroxygeraniol oxidoreductase, loganic acid O-methyltransferase, secologanin synthase, tryptophan decarboxylase, and strictosidine synthase, while genes encoding 16-methoxy-2,3-dihydro-3-hydroxytabersonine N-methyltransferase, desacetoxyvindoline 4-hydroxylase, and deacetylvindoline acetyltransferase, required for the biosynthesis of vindoline, were observed exclusively in Catharanthus, supporting the unique origin of vindoline in the genus Catharanthus (Xiao et al. 2013). Polyketides are another group of structurally diverse and biologically active metabolites. Hypericum perforatum contains the polyketides hypericin and hyperforin, used in the treatment of depression.

The biosynthetic pathway of the prenylated acylphloroglucinol hyperforin, usually concentrated in leaf and flower glands, has been partially elucidated (Karppinen and Hohtola 2008; Karppinen et al. 2008). The study revealed that the acylphloroglucinol core of hyperforin is formed by a polyketide synthase that condenses isobutyryl-CoA with three molecules of malonyl-CoA to generate phlorisobutyrophenone (PIBP), which is then prenylated with dimethylallyl diphosphate (DMAPP) as the donor and, with geranyl diphosphate (GPP), converted to hyperforin. Because blackberries have a high flavonoid content, their transcriptome data may provide a significant novel resource for molecular research on biosynthetic pathways. A study on blackberry transcriptome data identified putative genes encoding enzymes of flavonoid biosynthetic pathways, genes related to other metabolic processes, and transcripts for resistance genes against RNA viruses and fungal and bacterial pathogens (Garcia-Seco et al. 2015). The study may provide more insight into gene discovery for developing genomic tools to breed improved varieties with enhanced health benefits. Banana (Musa acuminata), an economically important fruit crop, undergoes ethylene-induced ripening, and over-ripening causes heavy postharvest losses to farmers and consumers. Fruit ripening involves diverse physiological and biochemical changes, and genes from several metabolic pathways contribute to producing ripe, edible fruit. To uncover the molecular mechanisms underlying banana ripening, one study sequenced the transcriptomes of ripe and unripe banana fruit pulp and detected genes responsible for ripening processes such as softening and the synthesis of aroma volatiles.

Transcripts from the banana transcriptome data were annotated as genes encoding acyltransferases, which are involved in the production of aroma volatiles and flavor components. The study also revealed the role of expansins, pectate lyase (PL), and xyloglucan endotransglucosylase/hydrolase (XTH) in fruit softening, genes responsible for cell wall degradation, and further differentially expressed novel genes that could play a major role in banana ripening; these may serve as candidates for gene manipulation methods aimed at reducing postharvest loss (Asif et al. 2014). To profile expression during chickpea flower development, RNA-seq analysis was performed, and the transcriptome data identified transcripts differentially expressed in the shoot apical meristem, during floral development, and in various metabolic processes. The study also provided information on molecular mechanisms, regulatory networks, and the genes of particular metabolic pathways active during the developmental stages of this legume (Singh and Jain 2014). A comprehensive transcriptome data set was developed for sugarcane challenged by Sporisorium scitamineum as a resource for exploring the molecular mechanisms of the sugarcane response to S. scitamineum. The data analysis identified differentially expressed genes associated with plant-pathogen interaction, hormone signal transduction, phenylalanine metabolism, flavonoid biosynthesis, phenylpropanoid biosynthesis, and other metabolic pathways related to the pathogen response (Que et al. 2014). Saponins, major amphipathic glycosides, offer several health benefits through their biological and medicinal properties (anticancer and antioxidant). Roots of Asparagus racemosus contain steroidal saponins, which prompted comparative studies using high-throughput transcriptome sequencing and de novo assembly of root and leaf tissues. 
The study identified novel transcripts involved in the saponin biosynthetic pathway and root-specific expression of genes encoding enzymes of the MVA pathway for triterpene biosynthesis, relative to the A. racemosus leaf transcriptome; the data may serve as a resource for functional characterization at the biochemical, cellular, and molecular levels to manipulate saponin biosynthetic pathways (Upadhyay et al. 2014). Transcriptome analyses also support gene discovery, transcript quantification, molecular marker development, small RNA profiling, and studies of negative gene regulation, so sequencing strategies have displaced EST-based microarray experiments, which were comparatively tedious. Transcriptome sequencing and de novo assembly of wild and five cultivated varieties of Elettaria cardamomum Maton (small cardamom) revealed differentially and uniquely expressed putative genes involved in various secondary metabolite synthesis pathways, including terpenoid and flavonoid biosynthesis.

Challenges in Transcriptome Sequencing

Although transcriptome sequencing has advanced the molecular discovery of many putative genes, data analysis faces several restrictions and challenges, such as the generation and de novo assembly of large numbers of short reads, the annotation of large data sets, and low transcript abundance, especially in non-model organisms such as medicinal and phytochemically important plants. The first challenge in plant transcriptome sequencing is the need for high-quality RNA, which is difficult to isolate in the quantities required for library construction. Another challenge is the dependence of reference-based transcriptome assembly on the quality of the chosen reference genome. In most cases the reference contains hundreds to thousands of misassemblies and genomic deletions (Salzberg and Yorke 2005), which lead to incomplete transcriptome assembly.

In some cases it is reasonable to rely on the reference genome of a closely related species, such as the use of the strawberry reference genome for raspberry transcriptome assembly (Ward and Weber 2011), but divergent genomic regions can still be missed. Finally, reference-based assembly may also misassemble trans-spliced genes, in which separate pre-mRNAs are spliced into a single mature mRNA; these transcripts can be crucial for obtaining information on metabolic pathways. Reference-based assembly can generate a comprehensive transcriptome profile provided that a high-quality reference genome is available. Biological sequences are highly complex, and sequencing error rates of 1–4% per nucleotide are possible, resulting in mismatches (Claros et al. 2012). For instance, Illumina sequencing has been reported to generate sequence-specific miscalls, GC-biased errors (Nakamura et al. 2011), and more substitution-type than indel-type miscalls (Hoffmann et al. 2009), while 454 pyrosequencing produces more indel-type than substitution-type miscalls because of its inaccuracy in determining homopolymer length (Gilles et al. 2011). Assemblers designed for Sanger reads proved unsuitable for NGS data, and in response new assemblers with more sophisticated approaches were developed (Imelfort and Edwards 2009). The recently introduced assemblers require servers or clusters with >500 GB of RAM and several terabytes of disk space. Although low-cost servers, supercomputing centers, and emerging cloud computing have met most of these requirements, upcoming sequencing projects, including loblolly pine (Neale et al. 2004) and maritime pine (Díaz-Sala and Cervera 2011) with 22–30 Gbp genomes, will place still greater computational demands on data analysis. Another challenge is the specificity of the currently available de novo assemblers toward particular sequencing platforms. For example, the Trinity de novo assembler (Haas et al. 2013) strongly supports Illumina paired-end reads, and the MIRA assembler (Chevreux et al. 2004) is designed with error correction for short reads generated by 454 pyrosequencing and Ion Torrent. Although these assemblers work with almost all platforms, they may introduce errors such as misassemblies or incomplete assemblies, which hinder downstream applications. De novo assembly of eukaryotic transcriptomes is much more challenging, not only because of the large data volume but also because of the presence of alternatively spliced variants (Kumar and Khurana 2014). Generally, de novo transcriptome assembly needs greater sequencing depth for full-length transcript assembly than the reference-guided strategy. Moreover, de novo transcriptome assemblers are sensitive to sequencing errors, low transcript abundance, and chimeric molecules, which are commonly observed in data sets from non-model plants.
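As a rough illustration of why the per-nucleotide error rates cited above matter for de novo assembly, the following sketch estimates how many miscalls a read is expected to carry and what fraction of short subsequences (k-mers) would be entirely error free. It is a minimal model under stated assumptions, not a description of how any particular assembler works: errors are treated as independent and uniform across positions (which the sequence-specific and GC-biased errors described above violate), and the read length of 100 bases and k-mer size of 25 are hypothetical values chosen only for the example.

```python
def error_free_fraction(error_rate: float, length: int) -> float:
    """Probability that a stretch of `length` bases contains no miscall.

    Assumes independent errors at a uniform per-nucleotide rate, which is
    a simplification of real sequencing error profiles.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if length < 0:
        raise ValueError("length must be non-negative")
    return (1.0 - error_rate) ** length


def expected_miscalls(error_rate: float, read_length: int) -> float:
    """Expected number of miscalled bases in a read of `read_length` bases."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    if read_length < 0:
        raise ValueError("read_length must be non-negative")
    return error_rate * read_length


if __name__ == "__main__":
    READ_LENGTH = 100  # hypothetical read length, for illustration only
    KMER_SIZE = 25     # hypothetical k-mer size, for illustration only

    # 1% and 4% bracket the per-nucleotide error range quoted in the text.
    for rate in (0.01, 0.04):
        miscalls = expected_miscalls(rate, READ_LENGTH)
        clean = error_free_fraction(rate, KMER_SIZE)
        print(
            f"error rate {rate:.0%}: "
            f"~{miscalls:.1f} miscalls per {READ_LENGTH}-base read, "
            f"{clean:.1%} of {KMER_SIZE}-mers error free"
        )
```

Under this simplified model, the fraction of error-free k-mers falls steeply as the error rate rises, which is one way to see why low-abundance transcripts, covered by few reads, are hard to reconstruct without greater sequencing depth.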

5 Conclusions

Plants are an important source of diverse specialized metabolites, most of which play significant roles in pharmaceuticals, flavors, fragrances, and other industrial applications. The commercial value of some secondary metabolites has driven great interest in their production and in practical ways to enhance it through biotechnological approaches. The major focus of the present review is the application of next-generation transcriptome sequencing to the identification of putative genes involved in the synthesis of diverse secondary compounds present in varying concentrations in different plant species. Most of these secondary metabolites, however, occur in non-model plants that lack a well-characterized or well-assembled reference genome sequence. Nevertheless, the generation of huge amounts of sequence data through NGS technologies has become rapid and relatively inexpensive, particularly with 454 pyrosequencing, the Ion Personal Genome Machine, and the latest Illumina sequencers with improved base-calling efficiency. Hence the pattern of specialized secondary metabolite biosynthetic pathways and the putative candidate genes involved can be studied through a data-mining framework that combines NGS technology with various computational algorithms. Several studies have reported the exploration of these bioresources through sequencing technologies to alter biosynthetic pathways in plant systems for enhanced production of existing phytochemicals and to design libraries of biologically active products that can be screened for new drug applications. The advent of transcriptome sequencing thus opens the way to pathway discovery, along with gene expression studies, functional annotation, and plant genetic engineering to explore gene functions in various plant species, providing a most promising approach to exploiting plant biodiversity for the development of economically valuable products.
The wider availability of metabolite-profiling techniques may increase our knowledge of metabolic networks by detecting unexpected correlations and relationships among metabolites. Such studies should therefore improve significantly and must be able to quantify distinct categories of metabolite fluxes and pools while causing little or no disturbance to metabolism. With all this information, metabolic engineers are expected to be able to build predictive models of plant secondary metabolism.