
2.1 Introduction

Functional genomics is a field of molecular biology that integrates genomic and transcriptomic data to describe gene (and protein) functions and interactions. Genomics is the study of the structure and function of the genome, the complete set of genes, regulatory sequences, and non-coding regions within an organism's DNA. This discipline relies on sequencing and bioinformatics approaches to sequence, assemble, and analyse both coding and non-coding sequences, and to work out how these genetic components interact to produce an organism and all its functions. Transcriptomics, in turn, is the study of the transcriptome (the complete set of RNA transcripts, including mRNA, rRNA, tRNA, and other non-coding RNA, produced by the genome under specific circumstances or in a specific cell) using high-throughput methods. Genomics is sometimes used as an umbrella term encompassing genome-wide studies in many subdisciplines, including transcriptomics, proteomics, metabolomics, bioinformatics, systems biology, and synthetic biology. Hence, genomics provides not only a suite of methods and analytical techniques but also a perspective for studying an organism as a whole.

Functional genomics, on the other hand, focuses on the dynamic regulation of gene expression and on protein-protein interactions, to elucidate DNA function at the levels of genes, transcripts, and proteins in a genome-wide context. The term "genomics" was coined in 1986 by the geneticist Tom Roderick during a meeting on mapping the human genome, 66 years after the word "genome" was used by the German botanist Hans Winkler [1]. In 2000, the genome sequences of the first model flowering plant, Arabidopsis thaliana [2], and of the fruit fly Drosophila melanogaster [3] were published. The following year, two independent draft human genome sequences were reported one day apart in February 2001 [4, 5], followed by the mouse genome in 2002 [6]. The International Human Genome Project, initiated in 1990, was officially completed in April 2003, 5 years after full-scale sequencing started in 1998 and 2 years ahead of schedule. It provided a reference human genome, a composite sequence derived from several individuals selected from nearly 100 anonymous donors. Since then, many more genomes of individuals from different nations have been sequenced [7]. The field has advanced to the cataloguing of regulatory elements and epigenomic mapping. Like genome sequencing, these efforts are not undertaken by individual labs but continue as international collaborations, e.g. the ENCODE Consortium [8] and the Roadmap Epigenomics Mapping Consortium [9].

These genome projects not only drove the emergence of new methods for genome-wide investigation but also, through the advent of sequencing techniques, provided a framework for global views of biology [10]. This propelled myriad genome sequencing projects for non-model organisms, including the "Genome 10K Project" for vertebrates [11]. Cancer (epi)genomics is another ongoing research hotspot [12]. All of this is made possible by the parallel development of bioinformatics tools and resources, such as the Gene Ontology (GO) for the unification of biology [13] and KEGG [14]. High-throughput data generation demands advances in large-scale statistical analysis, such as those developed for genome-wide association studies (GWAS) [15], while clustering has become an integral tool for partitioning large datasets into more easily digestible conceptual pieces [16]. Furthermore, visualisation of genome data is paramount for comprehending emerging patterns [17], with tools such as the Integrative Genomics Viewer [18] and Circos [19]. This genomic transformation of biology into a data-intensive field has recruited many engineers, physicists, mathematicians, and computer scientists into biological research.

2.1.1 Different Aspects in Genomic Research

Genomic research has bloomed into diverse scopes, from the molecular and cellular levels to whole populations. Molecular genomics, such as structural genomics [20], glycogenomics [21], toxicogenomics [22], chemogenomics [23], and pharmacogenomics [24], studies particular genomic characteristics from a molecular biology perspective. Cellular genomics, such as single-cell genomics [25], investigates cellular behaviour in the context of genomic content. Higher-level research scopes encompass complex interactions between multiple genomes, as in epigenomics [12], metagenomics [26], comparative genomics [27], phylogenomics [28], GWAS [29], and the translational research of genomic medicine [30]. Genomic research therefore spans the continuum of basic and applied research and can be classified into comparative, functional, and translational genomics (Fig. 2.1).

Fig. 2.1

A continuum of diverse research fields in genomics in addressing different levels of biological questions

Transcriptomics and proteomics are key parts of functional genomics. These varied genomic platforms allow researchers to address global, general, and specific biological questions with respect to the genome under study. For example, GWAS have revealed variation in the human genome in the form of numerous single nucleotide polymorphisms (SNPs) linked to disease risk [31].

2.1.2 Functional Genomics in the Context of Systems Biology

The advancement of genomics has provided a critical boost to systems biology by facilitating the prediction of complex systems' behaviours, properties, and active processes. For example, human genomic networks permit the prediction of the best candidate drug target genes and guide the design of new therapies for complex diseases [24]. The ultimate aim is to produce more predictive, preventive, personalised, and participatory (P4) medicine for everyone (further described in Sect. 2.2.1).

Genomics in the broadest sense includes both structural and functional aspects. Genome assembly and read mapping are considered structural, whereas analyses of read abundance and exon usage are functional. These different aspects raise the question of to what extent genomic analyses qualify as systems biology [32]. For example, genome assembly is critical to genomic analysis and poses some of the greatest algorithmic challenges, but the resulting assembly on its own provides little direct insight into the biological system without further analysis.

To address this question, we apply the definition of systems biology as the study of interactions between system parts, which involves (i) experimental perturbation, (ii) quantitative measurement, (iii) data integration, and (iv) modelling [33]. For instance, while a genome survey performed solely for genome size and heterozygosity estimation would not fall within the realm of systems biology, a comparative analysis of genome sequences from different cancer cell types to study genetic variation would qualify. The identification of mutations causing specific cancers represents a systems approach of finding one part of the system that affects the whole system's behaviour.

Reference genome assembly and annotation, as well as comparative genomics, do not shed light on system behaviour on their own but serve as blueprints for systems-level analyses, such as gene regulatory network (GRN) inference. Functional genomics studies of the genome "in action", such as tissue-specific gene expression and the dynamics of transcriptional regulation, generally fall within the systems biology framework. Lastly, various genome-wide experiments, including chromatin immunoprecipitation-sequencing (ChIP-seq) interactomics, RNA-seq differentially expressed gene (DEG) analysis, and population-level genome variation analysis, also fall under this operational definition of systems biology. These analyses typically associate called peaks, expression levels, or variants with specific genes to infer functional enrichment in pathways. Figure 2.2 illustrates how genomics can fit into the context of systems biology through integration with other "omics" platforms.

Fig. 2.2

Genomics in the context of systems biology

The integration of genomic data annotation, through functional and comparative genomic analyses, with that of proteomics and fluxomics allows systems-level pathway analysis, which helps in GRN inference. This contributes towards model development in pharmacogenomics and genomic medicine to identify drug targets. Metagenomics expands the study system beyond a single organism towards a community understanding of associated microbes at a functional level.

2.2 Applications of Functional Genomics

Over the past decades, genome sequencing technology has evolved from first-generation Maxam-Gilbert and Sanger sequencing to current next-generation sequencing (NGS) methods of sequencing by synthesis and single-molecule real-time sequencing [34]. Genome data analyses, such as sequence mapping, assembly, genome annotation, and pathway mapping, have also advanced greatly with the advent of supercomputers, database development, and bioinformatics tools. While genome sequences provide only a one-dimensional view of the genetic compendium of a cell, combined with systems biology they can provide multidimensional insights into the dynamics of biological processes. In this section, example genomic applications in the study of humans, plants, and microbes are presented.

2.2.1 Comparative Genomics in the Study of Human Genetic Variation

A genome library provides an overview of the genetic makeup of a single organism. Comparing multiple genome libraries from a single species provides insights into its genetic variation. The 1000 Genomes Project Consortium characterised the genetic variation of 2,504 individuals from 26 human populations across five continents [7]. More than 88 million variants were found, of which approximately 96% were SNPs, followed by short insertions and deletions (indels) and structural variants. The comparative analysis also showed that every individual harbours four to five million variant sites, of which more than 99% are SNPs.
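As an illustration of these variant classes, the short sketch below (pure Python, with a hypothetical input file name) tallies the records of a standard variant call format (VCF) file as SNPs, indels, or other variants by comparing allele lengths; it is a minimal example, not a substitute for dedicated variant analysis tools.

```python
# Minimal sketch: classify variants in a VCF file as SNPs, indels, or other.
# Assumes a standard VCF 4.x text file; "variants.vcf" is a hypothetical filename.
from collections import Counter

def classify_variant(ref: str, alt: str) -> str:
    """Classify a single REF/ALT allele pair by length comparison."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "indel"
    return "other"  # e.g. multi-nucleotide substitutions

counts = Counter()
with open("variants.vcf") as vcf:
    for line in vcf:
        if line.startswith("#"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):       # one record may list several ALT alleles
            counts[classify_variant(ref, alt)] += 1

total = sum(counts.values())
for kind, n in counts.most_common():
    print(f"{kind}: {n} ({100 * n / total:.1f}%)")
```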

With the accessibility of genome sequencing, diseases caused by single-gene mutations (Mendelian or monogenic diseases), such as cystic fibrosis, fragile X syndrome, and Huntington disease, can be readily identified in the genomes of individuals or families at risk of inheritance. Many genetic diseases are caused by single nucleotide variants that affect protein function through amino acid substitution [35]. Genome analysis can also identify SNPs that are only indirectly associated with the disease phenotype yet important in disease development. For instance, many SNPs in non-coding regions are known to play regulatory roles in gene transcription and expression [36]. Some SNPs found in transcriptional elements have been associated with β-thalassemia [37], tumour formation [38], melanoma [39], and retinal vasculature defects [40]. Genome analysis has also helped to identify loci responsible for disease susceptibility and severity, for example in type 2 diabetes, coronary heart disease, systemic lupus erythematosus, and hypertriglyceridemia, and even in infectious diseases such as trypanosomiasis, malaria, and Lassa fever [41, 42].

Comparative genomics has thus extended our understanding of human population history at the molecular level and helps link disease phenotypes with genetic variants. In the foreseeable future, more comprehensive genomic data and analyses will be available to guide disease prevention, diagnosis, and treatment according to personal genetic profiles, moving us towards the era of P4 medicine.

2.2.2 Plant Functional Genomics for Crop Improvement

Food security is one of the biggest challenges of this century, as the current world population of 7.3 billion is projected to reach 9.7 billion by 2050 (UN DESA Report 2015). Given the added impacts of climate change and water scarcity, crop production urgently needs improvement through new technologies. One such technology is functional genomics, applied in place of time-consuming and laborious traditional plant breeding.

Since the genome of the model plant Arabidopsis thaliana was fully sequenced [2], more than 100 plant genome sequences have become available, notably for important crop plants such as rice [43, 44], maize [45], soybean [46], potato [47], and bread wheat [48]. Other important commodity plants, such as African oil palm [49] and rubber [50], have also been sequenced. The availability of crop genome information has allowed the identification of genes related to important traits, including yield, disease resistance, and stress tolerance. Crop breeders can now accelerate hybrid-breeding programmes via marker-assisted selection with genotyping-by-sequencing to produce higher-quality crops [51].

A recent example of how genomic studies can benefit the plantation industry is the oil palm genome study of Elaeis guineensis and Elaeis oleifera [49], with the identification of the MANTLED locus responsible for the mantled phenotype through epigenome-wide association studies [52]. Methylation of the Karma long interspersed nuclear element (LINE) retrotransposon was found in clones with normal fruit yield, whereas mantled clones showed hypomethylation [52]. This marker is therefore useful for screening somaclonal epigenetic alterations during in vitro cloning, allowing mantled plantlets to be culled early and preventing commercial and land use losses.

Apart from the development of molecular markers for the selection of superior traits, precision genome engineering can now improve crops using genome editing tools such as transcription activator-like effector nucleases (TALENs) and the CRISPR/Cas9 system [53]. Genetically modified crops with disease and pest resistance as well as higher yields could become more acceptable with recent genome editing techniques that alter specific genes precisely without introducing foreign DNA. In 2014, hexaploid bread wheat resistant to powdery mildew was generated by simultaneously introducing three targeted mutations into homoeoalleles of the mildew resistance locus o (Mlo) using both TALEN and CRISPR/Cas9 technologies [54]. Genome editing with CRISPR/Cas9 is now possible in major crops such as sorghum, rice, maize, and soybean [55,56,57,58]. Despite current technical challenges such as inefficient delivery methods [59], genome editing tools, being relatively cheap and easy to apply, will revolutionise crop improvement.

2.2.3 Metagenomics: A New Approach for Antibiotic Discovery

The discovery of the antibiotic penicillin by Alexander Fleming in 1928 from the mould Penicillium rubens has saved millions of lives in the fight against bacterial infection. Since then, various kinds of antibiotics have been discovered from bacteria during the prosperous age of antibiotic discovery (1940–1990), primarily by culturing bacteria and testing the sensitivity of candidate compounds. New classes of antibiotics are now hard to find in existing bacterial collections. For instance, the discovery rate for new antibiotics in actinomycetes, which include the genus Streptomyces responsible for over 80% of antibiotics discovered so far, is only ~1% [60]. Furthermore, the pressing need for new antibiotics is compounded by the emergence of multiresistant pathogenic bacteria.

Metagenomics has revolutionised the microbiology of uncultured microflora [26]. It is estimated that 99% of bacterial species have not yet been cultured in the laboratory [61] and that 70% of the prokaryotic phyla existing in seawater, freshwater, and soil remain unexplored [62, 63]. While studying these uncultured bacteria remains a big challenge, metagenomics has made it possible to decode the genomes of not one bacterium but a whole community. This genome information opens up a new source for the discovery of novel bioactive metabolites as antibiotic or drug candidates. Over the past two decades, metagenomic approaches have successfully identified novel antibiotics and new derivatives of known antibiotics (Table 2.1). For example, teixobactin, a new antibiotic that inhibits cell wall synthesis without eliciting resistance in pathogenic bacteria, was discovered this way [64].

Table 2.1 New antibiotics or known antibiotic derivatives identified from metagenomics approaches

More importantly, many antibiotic resistance genes and pathways have also been discovered through metagenomic studies [71]. Metagenomics has also revealed events of antibiotic resistance gene exchange between environmental bacteria and clinical pathogens [72]. Moreover, single-cell and metagenomic studies coupled with bioinformatics, metabolomics, and chemical analysis by Wilson and colleagues uncovered unique bioactive chemical compounds (polyketides and peptides) from two uncultivated phylotypes of marine sponges [73]. These findings encourage further exploitation of uncultivated bacteria for drug discovery via a systems biology approach.

2.3 Transcriptomics Workflow

Here we describe the main aspects of a general workflow for transcriptomic analysis based on RNA-sequencing (RNA-seq), with a focus on protein-coding mRNA analysis; these include experimental considerations, sequencing approaches, and data analysis. Comprehensive descriptions are beyond the scope of this chapter; readers are recommended to refer to recent literature for other aspects of transcriptomics [74], such as small RNA-seq, degradome-seq [75], translatome-seq [76], targeted RNA-seq [77], single-cell RNA-seq [78], and epitranscriptomics [79]. News on the latest developments in the field can be obtained by following the RNA-Seq Blog (https://www.rna-seqblog.com/).

2.3.1 Experimental Considerations

Various aspects need to be considered when designing a transcriptomic experiment, some of which are listed in Table 2.2. The first and most critical is the experimental design for addressing a specific biological question, which not only determines the strategy of all the downstream transcriptomic analyses but is also key to the validity of the experimental results. This includes the purpose of the study: whether it targets expression/differential expression, alternative splicing events (gene isoforms), or the discovery of novel transcripts.

Table 2.2 Different aspects of transcriptomic experimental considerations

Different purposes require different RNA-seq strategies. For example, differential expression analysis of genes with high transcript abundance does not require as high a sequencing depth as the analysis of genes expressed at very low abundance; the same applies to the profiling of common versus rare transcripts. The purpose also influences the number of biological replicates required for the statistical power needed to achieve the targeted significance level for differentially expressed genes (DEGs). In general, more biological replicates at lower sequencing depth are better than fewer samples at higher depth with an equivalent total number of reads [80]. Pseudo-replication from technical replicates should be avoided in order to identify true DEGs with biological significance. Furthermore, power analysis is recommended to estimate the minimum number of biological replicates under the budget constraints of RNA-seq experiments [81].
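To make the replicate-number consideration concrete, the following minimal simulation estimates the power to detect a twofold change in a single gene under a negative binomial count model. All parameter values (mean count, dispersion, significance level) are illustrative assumptions, and dedicated power analysis tools should be used for real experimental design.

```python
# Rough power simulation for detecting a 2-fold change in one gene, comparing
# different numbers of biological replicates. A simple t-test on log counts
# stands in for a proper DEG test; all parameter values are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def nb_counts(mean, dispersion, size):
    """Draw negative binomial counts parameterised by mean and dispersion."""
    n = 1.0 / dispersion          # NB 'size' parameter
    p = n / (n + mean)
    return rng.negative_binomial(n, p, size)

def power(n_reps, mean=100, fold=2.0, dispersion=0.1, alpha=0.05, n_sim=2000):
    hits = 0
    for _ in range(n_sim):
        a = np.log2(nb_counts(mean, dispersion, n_reps) + 1)
        b = np.log2(nb_counts(mean * fold, dispersion, n_reps) + 1)
        if stats.ttest_ind(a, b).pvalue < alpha:
            hits += 1
    return hits / n_sim

for reps in (3, 5, 8):
    print(f"{reps} replicates/group -> estimated power {power(reps):.2f}")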

In most cases, obtaining a sufficient amount of good-quality RNA from limited samples can be a bottleneck to increasing the number of biological replicates. This relates to the second consideration, RNA isolation: deciding the most suitable method of extracting high-quality RNA (RNA integrity number, RIN > 8) for cDNA library preparation. The choice of library preparation method depends on whether the target is only protein-coding RNA (mRNA), non-coding RNA, small RNA (size selection), or total RNA (no selection). Furthermore, conventional mRNA-seq of whole fragmented transcripts can be replaced by tag-based sequencing [82], such as 5′ or 3′ end sequencing, if one is interested only in the expression of annotated genes [83]. Tag-based sequencing also improves gene quantification, with higher sensitivity for rare transcripts and no need for gene length normalisation during downstream analysis, because it avoids longer transcripts crowding out shorter, low-abundance ones. Most cDNA library kits are now strand-specific, providing more accurate estimates of transcript expression and better genome annotation.
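The gene-length effect mentioned above is what length-normalised units such as transcripts per million (TPM) correct for in conventional mRNA-seq; 3′ tag counts, by contrast, can be compared directly. A toy computation, with invented gene names, lengths, and counts:

```python
# Toy illustration of transcripts-per-million (TPM): read counts are first
# divided by transcript length (in kb), then scaled so each sample sums to 1e6.
# Gene names, lengths, and counts below are invented for illustration.
import numpy as np

lengths_kb = np.array([0.5, 1.0, 4.0])   # transcript lengths in kilobases
counts     = np.array([100, 200, 800])   # mapped reads per transcript

rate = counts / lengths_kb               # length-normalised read rate
tpm  = rate / rate.sum() * 1e6

for name, t in zip(["geneA", "geneB", "geneC"], tpm):
    print(f"{name}: {t:,.0f} TPM")
```

Note that geneC, despite having eight times the reads of geneA, ends up with the same TPM because it is eight times longer.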

The choice of sequencing platform will be described in the next section; briefly, it depends on the purpose of the study, with considerations of cost, accuracy, read length, required throughput (depth), and application. For example, Ion Torrent is suited to transcript variant analysis in model organisms but not to de novo transcriptome profiling. Lastly, the downstream analyses to be performed once all sequencing data are gathered are a major consideration, depending on the biological questions to be answered. In general, these include preprocessing of raw reads by quality control (QC) and trimming of poor-quality reads, followed by examination of the consistency among biological replicates through correlation or multivariate analysis such as principal component analysis (PCA), as sketched below. Choosing the most suitable software/tools, and even versions, requires informed judgement based on the intended goal. The rule of thumb is to use the most established and up-to-date versions of the software/tools relevant to the topic of study, as described in the latest literature. It is important to report the versions of software/tools and databases used throughout the data analysis for reproducibility, as different versions might influence the outcome.
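As a sketch of the replicate-consistency check, the example below runs PCA on a simulated log-count matrix; with real data, the matrix would come from the quantification step, and replicates of the same condition should cluster together on the first components.

```python
# Minimal sketch: PCA on log-transformed counts to check that biological
# replicates cluster by condition. Counts here are simulated; in practice
# the genes-by-samples matrix would come from the quantification step.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n_genes = 2000
base = rng.lognormal(mean=5, sigma=1, size=n_genes)

def sample(effect=1.0):
    """Simulate one library: baseline expression times noise (and an effect)."""
    return base * effect * rng.lognormal(0, 0.1, n_genes)

# three control replicates, three treated replicates (a subset of genes shifted)
effect = np.ones(n_genes)
effect[:200] = 2.0
X = np.array([sample() for _ in range(3)] + [sample(effect) for _ in range(3)])
X = np.log2(X + 1)

pcs = PCA(n_components=2).fit_transform(X)
for label, (pc1, pc2) in zip(["ctrl"] * 3 + ["treated"] * 3, pcs):
    print(f"{label}: PC1={pc1:8.2f}  PC2={pc2:8.2f}")
```

Replicates that fail to cluster with their own condition on such a plot flag batch effects or sample swaps before any DEG analysis is attempted.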

The general workflow of transcript reconstruction is described in Sect. 2.3.2. For more details on transcriptomic analysis and experimental considerations, readers can also refer to the RNA-seqlopedia website (https://rnaseq.uoregon.edu/) or a recent review [84]. Many specialised analysis software/tools, either proprietary, such as Ingenuity Pathway Analysis (IPA), or open source, such as Web Gene Ontology Annotation Plot (WEGO), have been developed for various downstream analyses; readers should keep abreast of the latest developments in the field, as many intensively used software/tools are regularly updated. However, the most up-to-date version is not necessarily the best option; it is therefore important to understand the detailed functions of a software/tool and the changes made by updates. Older versions generally remain available for download if required. Many of these software/tools are available on GitHub, such as Trinity, with detailed documentation. There is currently a trend towards distribution via Docker Hub, which hosts Trinity together with all the dependent software used for downstream analyses within the Trinity framework. This allows easy implementation of the analysis pipeline on a server or for cloud computing. Reproducibility will be further improved by recent efforts to move transcriptomic analysis towards customisable automated pipelines in cloud computing [85,86,87].

2.3.2 Data Acquisition

Nowadays, transcriptomic analysis largely depends on data generated by RNA-sequencing technology, especially for non-model organisms. However, many studies on model organisms such as human and rice still apply the established Affymetrix microarray approach [88], which is not covered in this section. Some of the current sequencing platforms used in RNA-seq are summarised in Table 2.3. The Roche 454 platform is not included, as its development has been terminated for applications in transcriptomics. In general, there are two categories of RNA-seq platforms: short-read sequencers, such as Illumina and Ion Torrent, generating reads generally <400 bp, and long-read single-molecule sequencers, such as Pacific Biosciences (PacBio) and Oxford Nanopore, generating reads >10 kb in real time.

Table 2.3 Summary on the different state-of-the-art sequencing platforms for transcriptomics

The choice of platform depends on the purpose of the experimental study, as described above. It is now possible to generate a full-length, high-quality transcriptome reference with PacBio isoform sequencing (Iso-Seq) [89], giving unprecedented confidence in the identification of novel transcripts and allele-specific gene expression. However, Iso-Seq is still limited by its relatively high cost and a throughput too low to provide sufficient read depth for statistical gene expression analysis. On the other hand, if the study concerns only the expression of known genes in an annotated genome or an organism with a high-quality transcriptome, a 3′ mRNA-sequencing approach that generates only one fragment per transcript, such as QuantSeq [90], can greatly increase the depth of sequencing and allow more sample multiplexing, improving statistical power at lower cost with simplified analysis. It is therefore foreseeable that both sequencing platforms will remain relevant, for transcriptome profiling and differential expression analysis, respectively. To date, the Illumina platform still leads the field of transcriptomics, offering the most cost-effective option for both applications.

2.3.3 Data Analysis

For RNA-seq data analysis, the major difference between analysis pipelines lies in the method chosen for transcript reconstruction (Fig. 2.3). There are two main strategies of transcriptome assembly, namely, align-then-assemble and assemble-then-align. These two strategies can also be combined to construct a more comprehensive reference transcriptome.

Fig. 2.3

Different analysis pipelines for transcript reconstruction from RNA-seq analysis based on a reference, de novo, or combined approach. Major differences in the approach are underlined. Some of the popular software/tools used for each step of analysis are listed and colour-coded according to different assembly approaches. Common analysis or general tools are in black font

For reference-guided (also called ab initio) assembly, the reference can be an annotated genome or a reference transcriptome. Splice-aware aligners or gapped mappers are used when mapping reads to a genome, to account for reads spanning exon junctions. Novel transcripts or gene structures can be discovered from non-annotated transcripts, which can then be functionally annotated. If novel transcripts are not of interest, reads can instead be mapped to a reference transcriptome using unspliced or ungapped mappers for more accurate mapping. Transcript identification and quantification can be achieved simultaneously, for example using Cufflinks. A minimal genome-guided pipeline is sketched below.
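The sketch chains the splice-aware aligner HISAT2 with StringTie (a widely used counterpart to the Cufflinks approach named above) from Python. It is a rough outline under stated assumptions: the three tools are assumed to be installed on the PATH, and all file names are placeholders.

```python
# Sketch of a genome-guided pipeline: splice-aware alignment with HISAT2,
# coordinate sorting with samtools, then annotation-guided transcript
# assembly/quantification with StringTie. File names are placeholders.
import subprocess

steps = [
    # build the genome index once
    ["hisat2-build", "genome.fa", "genome_idx"],
    # spliced alignment of paired-end reads (--dta tailors output for assemblers)
    ["hisat2", "--dta", "-x", "genome_idx",
     "-1", "reads_1.fq.gz", "-2", "reads_2.fq.gz", "-S", "aligned.sam"],
    # sort alignments by coordinate
    ["samtools", "sort", "-o", "aligned.bam", "aligned.sam"],
    # assemble transcripts guided by the reference annotation
    ["stringtie", "aligned.bam", "-G", "annotation.gtf", "-o", "transcripts.gtf"],
]

for cmd in steps:
    subprocess.run(cmd, check=True)   # stop at the first failing step
```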

When no reference is available, reference-free or de novo assembly can be performed using two different types of assembly algorithm, namely, overlap-layout-consensus (OLC) and De Bruijn graphs (DBG). DBG is based on a much faster k-mer indexing approach that works well with short reads, whereas the more compute-intensive OLC infers consensus sequences from a layout of all the reads and their overlap information. The DBG approach is currently more popular because most RNA-seq studies use Illumina short-read sequencing; OLC remains useful for longer sequences generated by Sanger or 454 sequencing. The assembled contigs or transcripts are then used as a reference for read mapping to estimate transcript abundance, which can be quantified at the "transcript/isoform" or "gene" level. Transcripts generated from de novo assembly are often subjected to further clustering, using software such as TGICL, before downstream analysis.
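The core DBG idea can be shown in a few lines: reads are decomposed into overlapping k-mers, and the graph connects each k-mer's prefix to its suffix, so assembled transcripts correspond to paths through the graph. The toy sketch below illustrates only this core; real assemblers such as Trinity add error correction, graph simplification, and path resolution on top.

```python
# Toy De Bruijn graph construction: each read is decomposed into overlapping
# k-mers, and edges connect (k-1)-mer prefixes to (k-1)-mer suffixes.
from collections import defaultdict

def de_bruijn(reads, k=5):
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])   # prefix (k-1)-mer -> suffix (k-1)-mer
    return graph

# two toy reads that overlap; a walk through the graph recovers the transcript
reads = ["ATGGCGTGCA", "GCGTGCAATG"]
for node, targets in sorted(de_bruijn(reads).items()):
    print(node, "->", ", ".join(sorted(targets)))
```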

For a more comprehensive assembly, reads that fail to align to the genome can be assembled de novo, whereas unassembled reads from de novo assembly can be used to scaffold and extend contigs based on the reference genome [91]. This combined approach helps to generate a comprehensive transcriptome that maximises the utilisation of sequencing reads. The final assembled transcriptome serves as a reference for expression quantification, which can then be subjected to DEG analysis using various statistical software packages [80] suited to the experimental design or the nature of the datasets. For functional annotation, assembled transcripts are searched with BLAST against public databases, such as NR and Swiss-Prot, and can then be further categorised according to Gene Ontology (GO) or clusters of orthologous groups (COG). This functional information and the results of DEG analysis can then be combined to answer biological questions through gene set enrichment analysis (GSEA) or pathway analysis, giving an overview of the affected metabolic pathways.
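Pathway or GO over-representation in a DEG list is commonly assessed with a hypergeometric (one-sided Fisher) test; the minimal sketch below shows the calculation with invented numbers, and real analyses would additionally correct for multiple testing across all pathways examined.

```python
# Minimal sketch of over-representation analysis: given N annotated genes,
# K of which belong to a pathway, and n DEGs of which x hit the pathway,
# the enrichment p-value is the hypergeometric upper tail. Numbers invented.
from scipy.stats import hypergeom

N = 20000   # annotated genes in the background
K = 150     # genes annotated to the pathway
n = 500     # differentially expressed genes
x = 15      # DEGs that fall in the pathway

# P(X >= x) under random sampling without replacement
p = hypergeom.sf(x - 1, N, K, n)
print(f"enrichment p-value: {p:.3e}")
```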

2.4 Case Study: Functional Genomics Study of Polygonum minus

Polygonum minus Huds. (syn. Persicaria minor) is rich in secondary metabolites of medicinal and pharmaceutical importance [92]. Functional genomics study of P. minus started in 2011 with the identification of cDNAs for jasmonic acid-responsive genes in roots by suppression subtractive hybridisation [93]. The first leaf, stem, and root expressed sequence tag (EST) library was established in 2012 [94]. This was followed by leaf transcriptome profiling of genes induced by salicylic acid and methyl jasmonate (MeJA) through a cDNA-amplified fragment length polymorphism (cDNA-AFLP) approach [95]. All of these studies relied on low-throughput Sanger sequencing. More recently, de novo RNA-seq using a hybrid NGS approach was undertaken to construct a more comprehensive transcriptome profile from leaf and root tissues, using Illumina sequencing and Roche 454 pyrosequencing, respectively [96, 97]. Table 2.4 summarises the statistics of the EST library and the NGS transcriptome, showing the great improvement in currently available transcript sequences. Furthermore, DEG analyses of the mRNA [98] and small RNA [99] transcriptomes of MeJA-treated leaves help in understanding how elicitation reprograms global gene expression, resulting in compositional changes in volatile organic compounds (VOCs) [36].

Table 2.4 Statistics of P. minus EST library and NGS transcriptome

The general workflow of RNA-seq analysis, particularly for generating a transcriptome profile, involves the following steps in order: preprocessing of raw reads, with filtering and trimming of low-quality reads and contaminant sequences; assembly and clustering; annotation; functional classification; and pathway mapping (Fig. 2.4). A toy example of the trimming step is sketched below.
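The sketch cuts each read at the first low-quality base and discards reads that become too short. It is an illustration only: real pipelines use dedicated trimmers such as Trimmomatic, the quality threshold and length cut-off are arbitrary assumptions, and the file names are placeholders.

```python
# Toy illustration of the preprocessing step: trim a read's 3' end once the
# Phred quality drops below a threshold, and drop reads that become too short.
def phred(qual_char, offset=33):
    """Convert one FASTQ quality character to a Phred score (Sanger encoding)."""
    return ord(qual_char) - offset

def trim_read(seq, qual, min_q=20):
    """Cut the read at the first base whose quality falls below min_q."""
    for i, q in enumerate(qual):
        if phred(q) < min_q:
            return seq[:i], qual[:i]
    return seq, qual

with open("reads.fastq") as fin, open("trimmed.fastq", "w") as fout:
    while True:
        header = fin.readline().rstrip()
        if not header:                   # end of file
            break
        seq  = fin.readline().rstrip()
        plus = fin.readline().rstrip()
        qual = fin.readline().rstrip()
        seq, qual = trim_read(seq, qual)
        if len(seq) >= 30:               # discard reads shorter than 30 nt
            fout.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
```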

Fig. 2.4

Comparison of analysis workflow in transcriptomic studies of P. minus

In the EST analysis, a preprocessing step was carried out using SeqClean, followed by assembly using CAP3 and StackPACK and open reading frame (ORF) prediction using ESTScan, whereas for the RNA-seq analysis, the full Trinity pipeline was followed for de novo assembly. Both required sequence similarity searching with NCBI BLAST and functional classification with Blast2GO based on Gene Ontology and Clusters of Orthologous Groups (COG). Lastly, pathway mapping was performed using KEGG Mapper.

Transcriptome profiling has not only contributed to the identification of genes responding to simulated stresses but also allowed the discovery of genes involved in secondary metabolite biosynthesis. Several genes from secondary metabolite biosynthetic pathways have been studied. One example is the functional characterisation of a sesquiterpene synthase (PmSTS), which has been successfully expressed in both Lactococcus lactis [100] and Arabidopsis thaliana [101]. More recently, a recombinant β-sesquiphellandrene synthase from P. minus was expressed and characterised [102]. Furthermore, the transcript sequence database serves as an important reference for protein identification in proteomics studies. The increasing availability of genetic information on P. minus will help future exploration of this plant for biotechnological applications.