11.1 Introduction

Environmental factors such as heat, cold, salinity, and drought severely affect plant growth and development, which in turn significantly reduces production and productivity. Plants often trigger defense mechanisms to mitigate these unfavorable abiotic conditions. Understanding plant defense systems at the molecular level requires comprehensive studies that decode the underlying mechanisms using bioinformatics tools and techniques. Advanced DNA sequencing technology has accelerated the pace of genomic and transcriptomic studies in plants and animals. With progress in omics approaches (viz. genomics, transcriptomics, proteomics, metabolomics, and phenomics) and their use in agriculture, molecular biology and biotechnology laboratories have generated a huge amount of data that can be used to identify novel genetic and chemical elements controlling the physiological processes and pathways of the plant defense system. However, a single approach is not sufficient to understand the complexity of the stress response in plants. Recent developments in next generation sequencing technology (i.e., high-throughput data generation at reduced cost) have produced a huge volume of molecular data in the omics era. Together, the omics approaches provide a holistic view of molecular pathways at the cellular, tissue, or organism level. Integrating different omics-based approaches multiplies the available biological information and has given rise to a new branch of life science known as systems biology (Hong et al. 2016; Chaudhary et al. 2019). However, analyzing high-throughput data from the various omics-based approaches remains one of the biggest challenges in interpreting plant defense mechanisms.
Several tools, techniques, and databases are available in the public domain for each type of omics analysis independently. To handle the challenge posed by the generation and availability of multi-omics data, these tools must be used in a more judicious and integrated way to gain deeper and novel biological insights. This chapter discusses the omics techniques (genomics, transcriptomics, proteomics, metabolomics, and phenomics) used to explore and understand the molecular defense mechanisms of plants against abiotic stresses. It also lists some important and widely used tools that can integrate the results of these omics approaches to draw meaningful conclusions.

11.2 OMICS Approaches to Study Plant Defense Mechanism

11.2.1 Genomics

  1. (i)

    Whole genome sequencing and resequencing: Genomics is the study of the complete genetic makeup of an organism or individual. The field has grown exponentially in the past 20 years, since the first draft of the human genome was announced in 2001. Advances in Next Generation Sequencing (NGS) technologies have reduced sequencing cost and time, which has accelerated whole genome sequencing and produced a flood of sequencing data (Fig. 11.1). This has led to the development of advanced and efficient bioinformatics tools and techniques to handle such large-scale sequencing data for deeper and novel biological insights. Genomics can be divided into two main groups: structural genomics and functional genomics. Structural genomics deals with locating mapped genes and markers on individual chromosomes, which produces a physical map of the genome, whereas functional genomics relates genome sequences to the transcriptome and proteome (encoded proteins) to describe gene functions and interactions. The most efficient way to study molecular mechanisms in plants is to decode the whole genome sequence. Among plants, Arabidopsis was the first genome to be sequenced, by an international consortium (Berardini et al. 2015). A plant genome sequence helps to explain the organization, regulation, and evolution of the genome. NGS technologies allow millions of molecules to be sequenced simultaneously, which has made whole genome sequencing substantially cheaper and faster than traditional sequencing methods (Goodwin et al. 2016). High-quality whole genome sequence data and a well-annotated reference genome are crucial for genomics- and transcriptomics-based research.
The more apparent advantages gained from genome sequencing are a catalogue of annotated gene models, knowledge of genome organization, synteny, and repeats, and, most notably, a basis for distinguishing genetic variants. The reference genome also serves as the basis for annotating the genomes of closely related species. For species that already have a high-quality reference genome, resequencing the whole genome is faster and more cost-effective than sequencing it de novo. Several bioinformatics tools are available for aligning or assembling sequenced reads, such as Bowtie and SOAP2 (read alignment) and MIRA, ABySS, SOAPdenovo, and Velvet (de novo assembly) (Wee et al. 2019).

  2. (ii)

    Identification of molecular markers: Whole genome sequences can be mined extensively to discover molecular markers. One promising marker system suitable for laboratories is microsatellites, or simple sequence repeats (SSRs). Genome-wide identification of microsatellites supports marker development and provides a valuable resource for upcoming breeding programs. MISA and GMATA are two popular and widely used bioinformatics tools for identifying SSRs in genomic data. SNP genotyping approaches are now gaining mainstream acceptance with the introduction of cost-efficient, high-throughput genotyping techniques. SAMtools, GATK, and Picard are among the tools used in variant calling workflows to identify SNPs from whole genome sequence data. Among the SNP-based genotyping approaches, genotyping by sequencing (GBS) is a highly multiplexed framework for building reduced representation libraries (RRL), finding molecular markers, and genotyping for crop improvement (Eltaher et al. 2018; Elbasyoni et al. 2018). Because of its low cost and innovative technology, GBS has been applied to many crop varieties (Poland and Rife 2012; Kim et al. 2016). For example, a tomato study based on an NGS approach led to the discovery of 8784 SNPs, 88 percent of which are commonly found in tomato germplasm (Sim et al. 2012). GBS is a simple and cost-effective solution, but its use is still limited because it requires specialized skills in computation and data analysis. It may become a commonly used approach as easy-to-use computational packages and pipelines become available.

  3. (iii)

    QTL mapping and GWAS: Linkage mapping (LM) and association mapping (AM), both of which identify marker–trait associations, have contributed to the identification of quantitative trait loci (QTL) (Cockram and Mackay 2018). In many plant species, emphasis has been placed on mapping QTLs for abiotic stresses such as heat, salinity, drought, and cold. For example, QTLs controlling seed germination under various stress conditions have been identified. Advances in genomics have also encouraged more complex approaches to mapping loci that regulate stress resistance, involving multi-parental populations such as nested association mapping (NAM) and multi-parent advanced generation inter-cross (MAGIC). A genome-wide association study (GWAS), which relies on linkage disequilibrium (LD), has an advantage over linkage mapping because it examines the genetic variation and recombination events in germplasm collections and therefore offers higher mapping precision (Fukushima et al. 2009). Such a germplasm set is designed to capture the genetic variability for the trait of interest and represents the products of hundreds of historic recombination cycles, providing higher resolution during QTL mapping (Mackay et al. 2009). GWAS is systematically used to detect SNPs associated with agronomic characteristics in a germplasm collection (Pasam et al. 2012). However, associations detected in AM are often spurious because they are based on LD, which depends not only on linkage but also on population stratification and relatedness among individuals. Efforts have therefore been made to combine linkage-based QTL mapping with LD-based AM in joint linkage association mapping (JLAM), which overcomes the limitations and exploits the benefits of each of the two approaches.

  4. (iv)

    Genomic Selection: The declining cost of SNP assays has made it possible to genotype vast numbers of experimental lines, which has allowed Genomic Selection (GS) to be introduced into stress-tolerant crop breeding programs. GS simultaneously accounts for all the loci that contribute to a trait, regardless of the magnitude of their individual effects. It thereby overcomes a disadvantage of QTL mapping-based breeding, in which small-effect QTLs are difficult to identify and track. Importantly, small-effect QTLs can collectively have large effects on economically significant abiotic stress traits. Most economically significant traits are complex and, because of epistatic interactions, can show unexpected trait expression (Deshmukh et al. 2014). GS therefore predicts genetic values for selection by using all available molecular markers together with the phenotypic data of a training population. A model is built from the genotypic and phenotypic data to evaluate phenotypic variation on the basis of whole-genome genotypes (Yan et al. 2009). To estimate breeding values, many studies have used GS models such as nonlinear regressions (RKHS and RF), Bayesian approaches (Bayes A and B), and penalized regressions (RR, LASSO, and EN).
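As an illustration of the penalized-regression idea behind GS, the following minimal sketch fits ridge regression (RR) to a small training population and predicts a genetic value for an ungenotyped-phenotype candidate line. It is not the method of any cited study: the function names (`fit_ridge`, `predict_gebv`), the 0/1/2 marker coding, the penalty value, and the toy data are all assumptions made for this example.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def fit_ridge(genotypes, phenotypes, lam=1.0):
    """Estimate marker effects from a training population.

    genotypes: one row per line, markers coded 0/1/2 (assumed coding).
    lam: ridge penalty; larger values shrink all marker effects harder.
    """
    n, p = len(genotypes), len(genotypes[0])
    col_means = [sum(row[j] for row in genotypes) / n for j in range(p)]
    y_mean = sum(phenotypes) / n
    xc = [[row[j] - col_means[j] for j in range(p)] for row in genotypes]
    yc = [y - y_mean for y in phenotypes]
    # Penalized normal equations: (X'X + lam * I) beta = X'y
    xtx = [[sum(xc[i][a] * xc[i][b] for i in range(n)) + (lam if a == b else 0.0)
            for b in range(p)] for a in range(p)]
    xty = [sum(xc[i][a] * yc[i] for i in range(n)) for a in range(p)]
    beta = solve_linear_system(xtx, xty)
    return col_means, y_mean, beta


def predict_gebv(model, genotype):
    """Predict the genetic value of a line from its marker genotype."""
    col_means, y_mean, beta = model
    return y_mean + sum((g - m) * b for g, m, b in zip(genotype, col_means, beta))


if __name__ == "__main__":
    # Toy training population (hypothetical): 6 lines, 3 markers.
    train_genotypes = [[0, 0, 1], [1, 0, 0], [2, 1, 0],
                       [0, 2, 2], [1, 2, 1], [2, 1, 2]]
    train_phenotypes = [9.0, 12.0, 14.0, 8.0, 11.0, 12.0]
    model = fit_ridge(train_genotypes, train_phenotypes, lam=1.0)
    print(predict_gebv(model, [2, 0, 0]))
```

Because every marker receives an effect estimate, small-effect loci contribute to the prediction instead of being filtered out by a significance threshold. In practice the penalty would be chosen by cross-validation or estimated from variance components, and the number of markers usually far exceeds the number of lines.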

Fig. 11.1 Workflow diagram of omics approaches for study of plant defense mechanism. The workflow starts at the omics level, which comprises genomics, transcriptomics, proteomics, metabolomics, and phenomics. At the data processing level, quality control and transformation occur. Data analysis is carried out by a statistical model, either by enrichment or network analysis, which includes protein modeling. The output is the discovery of desirable genes.

11.2.2 Transcriptomics

Understanding the gene regulatory cascades that govern stress responses is very important for the efficient management of abiotic stress. The best strategy for investigating the regulation of plant responses and identifying genes involved in stress tolerance is to collect and compare the transcriptomes of different tissue types at various developmental stages. This comparison leads to a better understanding of the associated phenotypic variation. Several tools and techniques are available for expression profiling, both gene by gene and for many genes at a time.

  1. (i)

    Microarray

    Microarray technology is based on hybridization between target DNA and probe DNA designed from known sequences. Because it can cover tens of thousands of genes at a time, it has made a significant contribution to research. It is well developed and is still used as a major platform for transcriptome analysis of sequenced species, despite its limitations in the variety of target transcripts and the dynamic range of quantification compared with NGS-based RNA-seq technology (Wang et al. 2009; Jazayeri et al. 2014). Microarrays are used to identify genes that are differentially expressed in response to abiotic stresses, including salinity, heat, cold, drought, and oxidative stress. Numerous microarray studies in several plant species have identified genes with significant roles in stress tolerance and have helped to explain diverse molecular mechanisms (Kumar et al. 2018, 2019; Nagaraju et al. 2019, 2020).

  2. (ii)

    RNAseq

    This approach is based on high-throughput next generation sequencing. RNAseq relies on high-speed sequencing of short cDNA fragments (typically 30–400 bp) reverse-transcribed from mRNAs. The number of cDNA fragments aligned to the reference sequence indicates the abundance of the corresponding mRNA. With the rapid advancement of next generation sequencing, RNA sequencing (RNAseq) has become the most cost-effective, reliable, and high-throughput transcriptomic technology. In contrast to microarrays, RNAseq is not confined to comparing transcript levels; it is also useful for discovering novel genes and splice forms, especially in non-model plants. Numerous reports on the application of RNAseq in plants are available (Ye et al. 2017; Xiong et al. 2017; Guan et al. 2019). RNAseq has also been applied to unsequenced organisms (Ekblom and Galindo 2010), because several computational tools enable de novo assembly of reads without a reference genome (Oshlack et al. 2010; Grabherr et al. 2011). Although managing the huge data sets generated poses many challenges, this technology is becoming the mainstream of transcriptome analysis.

  3. (iii)

    HiCEP

    High-coverage gene expression profiling (HiCEP) is based on the amplified fragment length polymorphism (AFLP) technique and can detect changes in transcript expression with high coverage (Fukumura et al. 2003). DNA fragments derived from mRNA are amplified and then separated by capillary electrophoresis. Their abundances are estimated from the peaks observed in the electropherogram. The relevant peaks are then fractionated and sequenced.
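The RNAseq item above notes that the number of fragments aligned to a reference indicates mRNA abundance. The sketch below shows one simple way to turn raw read counts into comparable values: counts-per-million (CPM) scaling followed by a log2 fold change between two samples. The gene names, counts, and function names are hypothetical, and the calculation is only a descriptive comparison; a real differential expression analysis would use biological replicates and a statistical model of count variability.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


def log2_fold_change(control, stress, pseudocount=1.0):
    """Return log2(stress / control) per gene on CPM-scaled counts.

    The pseudocount (an assumed default of 1 CPM) avoids division by zero
    for genes with no reads in one of the samples.
    """
    cpm_control = counts_per_million(control)
    cpm_stress = counts_per_million(stress)
    genes = sorted(set(control) | set(stress))
    return {
        gene: math.log2(
            (cpm_stress.get(gene, 0.0) + pseudocount)
            / (cpm_control.get(gene, 0.0) + pseudocount)
        )
        for gene in genes
    }


if __name__ == "__main__":
    # Hypothetical read counts for three genes in two libraries
    # that were sequenced to different depths.
    control = {"geneA": 100, "geneB": 300, "geneC": 600}
    stress = {"geneA": 800, "geneB": 600, "geneC": 600}
    for gene, lfc in log2_fold_change(control, stress).items():
        print(gene, round(lfc, 2))
```

Scaling by library size matters here: geneB has twice as many raw reads in the stress library, but because that library is also twice as deep, its relative abundance is unchanged.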

11.2.3 Proteomics

Proteomics is the large-scale study of the proteins in an organism or system. The proteome is the complete set of proteins produced by that organism or system. Proteomics has made it possible to identify and validate ever-increasing numbers of proteins, which are important to living organisms because they perform a variety of functions. Modern proteomic technologies can detect vast numbers of proteins in plant samples easily and simultaneously (Vanderschuren et al. 2013). Over the last few years, high-throughput quantitative proteomics has gained considerable significance in plant science for characterizing proteomes and their differential regulation during plant growth and under biotic and abiotic stresses. Proteomics experiments have often found that many insect attack-responsive proteins are associated with the tricarboxylic acid (TCA) cycle and are involved in carbon metabolism, which suggests that carbon metabolism is altered during insect attack as part of the defense. The high abundance of proteins such as ribulose-1,5-bisphosphate carboxylase/oxygenase (Rubisco) creates considerable difficulties for whole-proteome characterization by shotgun plant proteomics. To study defense mechanisms during plant–insect interactions, an enhanced proteomic method called Polyethyleneimine Assisted Rubisco Cleanup (PARC) has been used (Zhang et al. 2013). George et al. (2011) reported differential protein expression in maize (Zea mays L.) in response to infestation by a chewing insect (Spodoptera littoralis) and a boring insect (Busseola fusca).

  1. (i)

    Gel-Based Electrophoresis:

    In two-dimensional (2D) gel electrophoresis, proteins are separated in the first dimension by isoelectric focusing, either on an immobilized pH gradient strip or in a tube gel, and in the second dimension by SDS polyacrylamide gel electrophoresis (Komatsu et al. 2007, 2012, 2013a, b, 2015). After staining, protein spots are excised from the gel, reduced with dithiothreitol, alkylated with iodoacetamide, and digested with trypsin. The peptide mixtures are then analyzed by a form of mass spectrometry (MS), such as nano-liquid chromatography (LC) tandem MS (nano-LC MS/MS). Although 2D gel-based methods offer a visual description of proteins, including intact protein profiles, they are not sufficient for detecting and identifying proteins with low abundance or with extreme molecular weights, isoelectric points, or hydrophobicity.

  2. (ii)

    Gel-free proteomics:

    Gel-free proteomics includes both label-free and labeling methods. In the label-free method, protein samples are purified by chloroform-methanol extraction, reduced with dithiothreitol, alkylated with iodoacetamide, and digested with trypsin and lysyl endopeptidase. The resulting peptides are analyzed by nano-LC MS/MS (Komatsu et al. 2013b). Differentially expressed proteins are identified by searching the acquired spectra against a peptide database with the MASCOT Daemon client software. To identify and annotate homologous proteins, positive matches are searched with BLASTP against the protein databases available at NCBI (www.ncbi.nlm.nih.gov). Gel-free proteomics is now commonly used because its protocol is simple and it helps to identify proteins on a large scale.
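Both the gel-based and gel-free workflows above digest proteins with trypsin before MS analysis, and database searches compare observed peptides with peptides predicted from protein sequences. The sketch below illustrates that prediction step with an in-silico tryptic digest. It assumes the commonly used rule that trypsin cleaves after lysine (K) or arginine (R) unless the next residue is proline (P), ignores missed cleavages and the additional lysyl endopeptidase step, and uses a made-up sequence; the function name is hypothetical and is not part of MASCOT or any other search engine.

```python
def tryptic_digest(protein, min_length=1):
    """Split a protein sequence into predicted tryptic peptides.

    Assumed rule: cleave on the C-terminal side of K or R,
    except when the following residue is P.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Keep the C-terminal remainder if the sequence does not end in K or R.
    if start < len(protein):
        peptides.append(protein[start:])
    # Very short peptides are rarely informative, so allow filtering them out.
    return [peptide for peptide in peptides if len(peptide) >= min_length]


if __name__ == "__main__":
    # Hypothetical sequence; the "RP" site is not cleaved.
    for peptide in tryptic_digest("MKWVTFRPSLLKAAR"):
        print(peptide)
```

A search engine would next compute the mass of each predicted peptide and its fragment ions and score them against the measured spectra; that scoring step is outside the scope of this sketch.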

11.2.4 Metabolomics

Metabolomics is a promising approach that provides a biochemical snapshot of an organism's phenotype. It makes it possible to systematically classify and measure low-molecular-weight molecules that are closely related to essential toxicological and nutritional features. Information on genes, transcripts, and proteins is not adequate to characterize a cell thoroughly; the broad variety of primary and secondary metabolites in the cell must also be examined. Numerous studies have examined the function of metabolites in plants under biotic and abiotic stresses. Plant chemical compounds that are not active in photosynthesis or core metabolic processes are linked to the evolution of chemical defense mechanisms against stress (Mithöfer and Boland 2012; Gjindali et al. 2021). These compounds are classified as secondary metabolites. They do not play a significant role in the plant’s growth, development, or reproduction; rather, they serve as signaling molecules or direct defense chemicals and include alkaloids, terpenoids, cyanogenic glycosides, glucosinolates, and phenolics (Bennett and Wallsgrove 1994; Zebelo and Maffei 2012). A special area called “chemical ecology” has developed to study the chemicals involved in the interactions of living organisms, including the chemical defense system during plant–insect contact (Mithöfer and Boland 2012). When a plant activates the defense response controlled by secondary metabolites against insect attack, it has to sacrifice some of its central metabolism by allocating energy to this defense. Along with secondary metabolism, the primary metabolism of a plant is often differentially affected during an insect or pathogen attack (Barah et al. 2013). Metabolomics has been used over the past decades to study the selective regulation of primary and secondary metabolites during plant–insect interactions (Salem et al. 2020).

High-throughput metabolite profiling methods and advanced combinatorial protocols now available in plant metabolomics include liquid chromatography–mass spectrometry (LC-MS), gas chromatography–mass spectrometry (GC-MS), Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR-MS), ultra-performance liquid chromatography tandem mass spectrometry (UPLC-MS), flow-injection electrospray ionization mass spectrometry (FI-ESI-MS), and nuclear magnetic resonance (NMR). However, the enormously diverse plant metabolite data produced by these methods are computationally difficult to analyze (Allwood et al. 2008; Ernst et al. 2014). Bioinformatics therefore plays a crucial role in processing and analyzing such highly complex biological data.

11.2.5 Phenomics

Phenomics is the high-throughput study of phenotypic variation, which arises from a complex web of genotype, phenotype, and environment interactions. The phenome represents the set of phenotypes. Studies of the genome and the phenome, in individuals or in large populations, complement each other (Yasunori and Sinha 2014). Plants with stable phenotypes are strong genomic tools and are also targets for identifying alleles by high-throughput sequencing. Advances in sequencing technology have increased genotyping efficiency, whereas phenotypic characterization has progressed more slowly over the past decade, which restricts the characterization of quantitative traits, especially those associated with stress tolerance (White et al. 2012). Recent developments in phenotyping methods allow specific characteristics to be identified. Phenomics requires advanced imaging systems, sensors, automation, and computational resources for plant phenotyping. These make phenomics a high-throughput approach capable of handling thousands of genotypes and evaluating hundreds of phenotypic parameters simultaneously (White et al. 2012; Ubbens and Stavness 2017; Tardieu et al. 2017). Various phenomics platforms, such as Scanalyzer 3D, are available for investigating physiological parameters in plants under different stress conditions. Because phenomic data collection is expensive and time-consuming, integrated technological developments would help to reduce the associated costs and increase phenomic throughput.

11.3 Bioinformatics Tools and Techniques for Integration of Multi-OMICS Data

The availability of large-scale multi-omics data in the public domain, e.g., in various databases and repositories, poses a major challenge for the bioinformatics community: different tools and techniques must be integrated to draw biologically useful inferences, because a single approach used in isolation cannot explain the defense mechanism robustly. Despite much development in this area, integrating heterogeneous omics data to draw meaningful biological inferences remains a major challenge (Keurentjes et al. 2011). Nevertheless, efficient integration of different tools, techniques, and approaches appears to be a promising strategy for developing end products such as climate-smart cultivars. For example, GWAS and QTL mapping both identify a genomic region or marker associated with the trait of interest, which helps in discovering candidate genes, while gene expression profiles from RNAseq data give an idea of the functions of unknown genes. Relating GWAS and QTL results to the corresponding transcriptome therefore helps to identify differentially expressed candidate genes.
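The GWAS/QTL and transcriptome example in the preceding paragraph can be sketched as a simple interval overlap: keep the genes that lie within a trait-associated region and also show a sizeable expression change. Everything in the sketch is an assumption for illustration, including the coordinates, gene identifiers, the |log2 fold change| cutoff of 1, and the function names; it is not the interface of any platform listed in this chapter.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic region with inclusive start and end coordinates."""
    chrom: str
    start: int
    end: int


def overlaps(a, b):
    """Return True if two intervals share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def candidate_genes(qtl_regions, gene_positions, log2_fold_changes, min_abs_lfc=1.0):
    """List genes inside a QTL/GWAS region that are differentially expressed.

    qtl_regions: intervals reported by linkage or association mapping.
    gene_positions: gene identifier -> Interval from the genome annotation.
    log2_fold_changes: gene identifier -> log2 fold change from RNAseq.
    min_abs_lfc: assumed expression cutoff, not a recommended standard.
    """
    candidates = []
    for gene, position in gene_positions.items():
        lfc = log2_fold_changes.get(gene)
        if lfc is None or abs(lfc) < min_abs_lfc:
            continue  # no expression evidence for this gene
        if any(overlaps(position, region) for region in qtl_regions):
            candidates.append(gene)
    return sorted(candidates)


if __name__ == "__main__":
    # Hypothetical drought-tolerance QTL and annotated genes.
    qtl = [Interval("chr1", 1000, 5000)]
    genes = {
        "g1": Interval("chr1", 900, 1100),
        "g2": Interval("chr1", 2000, 3000),
        "g3": Interval("chr2", 2000, 3000),
        "g4": Interval("chr1", 6000, 7000),
        "g5": Interval("chr1", 4500, 5500),
    }
    lfc = {"g1": 2.5, "g2": 0.2, "g3": -3.0, "g4": 4.0, "g5": -1.5}
    print(candidate_genes(qtl, genes, lfc))
```

Requiring both lines of evidence narrows a mapped region, which may contain many genes, to the few whose expression responds to the stress. Down-regulated genes are kept as well, since the cutoff is applied to the absolute fold change.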

A large number of user-friendly computational platforms have been developed for the integration of multi-omics data. A few such tools and software packages are listed in Table 11.1.

Table 11.1 Computational platforms used for the integration of multi-omics data

11.4 Concluding Remarks

Recent developments in high-throughput sequencing technologies have flooded the web with biological data from various platforms. Current efforts to integrate omics data are not yet sufficient to make sense of such vast biological data. An integrative, system-based approach, i.e., integrating multi-omics data generated on heterogeneous platforms by using various bioinformatics tools, techniques, and approaches, is the most promising route to meaningful biological conclusions. However, the efficient adoption of bioinformatics tools and techniques depends on their availability and user-friendliness. There is therefore a need to develop more accessible and easy-to-use bioinformatics tools and pipelines for end users, with easy-to-follow tutorials and manuals and interactive options for analyzing multi-platform data. This will help researchers to understand the biological system more realistically and to translate that understanding into better crop varieties with improved defense mechanisms.