3.1 Introduction

In nature, plants are simultaneously exposed to a wide range of abiotic and biotic stresses, which pose a major threat to plant life. These stresses cause physiological and metabolic changes that hinder the growth, development and productivity of plants (Tardieu and Tuberosa 2010). Climate change studies suggest that the occurrence and severity of stresses will increase, resulting in the loss of nearly 70% of agricultural production (Ghosh and Xu 2014). An important route to plant protection and yield increase is therefore to design plants on the basis of a molecular understanding of gene function and of the regulatory networks involved in stress tolerance, growth and development (Shafi et al. 2014, 2015a, b, 2017). Technological advancement has offered a holistic view of systems organization and functionality; however, the ever-growing volume of data poses great challenges for efficient analysis and interpretation and, finally, for integration into different crop improvement schemes (Esposito et al. 2016). Recent ultrahigh-throughput computational studies are crucial for understanding the molecular crosstalk through which stress conditions affect agricultural crop production. The challenge now is to integrate multidimensional biological information into networks and models, which has led to the development of systems biology. Most plant systems biology strategies rely on four main axes, viz. genomics, proteomics, transcriptomics and metabolomics, which provide a better platform to identify and understand the molecular systematics and mechanisms under stress conditions (Yuan et al. 2008).
Genomics deals with the study of the genome; transcriptomics includes structural and functional analyses of coding and non-coding RNA (the transcriptome); proteomics deals with proteins and their post-translational modifications along with their regulatory pathways; and metabolomics is a powerful tool for analysing metabolites. Analysed in an integrated way, these approaches help to identify the complex networks involved in stress tolerance. The multifaceted molecular regulatory systems and biochemical properties specifically involved in stress tolerance and adaptation in plants can be deciphered more readily with the help of combined ‘omics’ studies (Chawla et al. 2011). Further, bioinformatics has many practical applications in current plant disease management: the study of host-pathogen interactions and the understanding of disease genetics, pathogenicity factors and plant-pathogen biological networks ultimately help in designing the best disease management options (Koltai and Volpin 2003).

Plants adjust their ‘omics’ profiles to cope with a changing environment for survival, tolerance and growth. The main aim of the ‘omics’ approach is to uncover molecular interactions and their relationship with signalling cascades, and to process the information that connects specific signals with specific molecular responses (Esposito et al. 2016). The era of genomics, proteomics, metabolomics and phenomics in crop stress biology involves transformation, mining and functional ontology annotation, promoter and SNP analysis, gene expression, pathway enrichment analysis, microRNA prediction, subcellular localization, gene structure analysis, comparative analysis, interactome analysis, protein function analysis, tissue-specific and developmental stage-specific expression analysis and simulation, and focuses on morpho-molecular differences between stress-exposed and stress-affected crop/model plants. These omics approaches can provide new insights and open new horizons for understanding stresses and responses as well as for improving plant responses and resistance to stresses (Duque et al. 2013).

Little is known about the ‘omics’ characterization of combined abiotic and biotic stresses, although several recent reports have addressed this issue (Suzuki et al. 2014; Kissoudis et al. 2014). The three main domains that must be addressed to take full advantage of plant systems biology are the development of omics technology, the integration of data in a usable format and the analysis of data within the domain of bioinformatics. This omics knowledge can subsequently be harnessed by researchers to develop crop plants with improved quality and productivity and with enhanced abiotic stress tolerance and disease resistance (Singh et al. 2011). In the present chapter, we introduce the key omics technologies and contemporary innovative technologies employed in plant biology and the bioinformatics platforms associated with them (Fig. 3.1). Since the focus of this chapter is integrated omics approaches in plant stress tolerance, we describe some of the key concepts, techniques and databases used in bioinformatics, with an emphasis on those relevant to plant stress. The chapter also covers the application of bioinformatics in current plant disease management strategies, in particular the molecular detection and diagnosis of plant pathogenic microorganisms.

Fig. 3.1 Schematic outline of main ‘omics’ approaches, their technologies and databases as well as expected outcomes in plant biology and stress research

3.2 Plant Genomics-Related Computational Tools and Databases Under Abiotic Stress

Developments over the past decade, arising predominantly from the Human Genome Project, have led to a new phase of plant genetics known as genomics. Genomics, the study of all the genes in a given genome, includes the identification of gene sequences, intragenic sequences, gene structures and annotations (Duque et al. 2013). The field applies the newly available vast amounts of genomic DNA sequence using a range of novel high-throughput, parallel and other technologies. High-throughput sequencing methods give scientists the ability to explore the structure of the genetic material at the molecular level. Genome sequencing began with first-generation technologies (the 1970s), followed by next-generation sequencing (NGS) technologies (1990s) and the latest third-generation sequencing technologies (El-Metwally et al. 2014b, c). NGS technologies have had a huge impact on plant genome research, both for the improvement of economically important crops and for the understanding of model plant biology. Substantial innovations in platforms for omics-based research and application development provide crucial resources to promote stress-related research in model and applied plant species (Feuillet et al. 2011). Recent advances in plant genomics have allowed us to discover and isolate important genes and to analyse functions that regulate yield and tolerance to environmental stress (Govind et al. 2009). Functional genomic approaches in particular have helped to identify the functional relevance of genes involved in abiotic and biotic stress responses in plants (Ramegowda and Senthil-Kumar 2015).
A combinatorial approach that uses multiple omics platforms and integrates their outcomes is now an effective strategy for clarifying the molecular systems integral to improving plant stress tolerance and productivity. This combined approach has helped plant breeders to create new varieties that tolerate several biotic and abiotic stresses and, consequently, show increased crop yields as well as pathogen resistance (Shankar et al. 2013; Agarwal et al. 2014). The understanding of plant responses to stresses is thus enhanced by genomic techniques such as high-throughput analysis of expressed sequence tags (ESTs), large-scale parallel analysis of gene expression, targeted or random mutagenesis and gain-of-function or mutant complementation (Cushman and Bohnert 2000).

Plant genomics has expanded rapidly and has become a major area of plant research owing to the rapid increase in available plant genomic sequences (Govindaraj et al. 2015). This era began with whole-genome sequencing of Arabidopsis thaliana (The Arabidopsis Genome Initiative 2000), followed by draft genome sequences of rice, both japonica and indica (Yu et al. 2002). Afterwards, the genome sequence of japonica rice was completed and published by the International Rice Genome Sequencing Project (International Rice Genome Sequencing Project 2005). The National Science Foundation (NSF) Arabidopsis project (USA) was also launched, with the stated goal of determining the functions of the 25,000 genes of Arabidopsis by 2010. This accumulation of nucleotide sequences of model plants, as well as of applied species such as crops, has provided fundamental information for the design of sequence-based research applications in functional genomics (Somerville and Dangl 2000). Technologies included under the canopy of ‘genomics’ are:

  • Automatic DNA sequencing (the machine can read two million base pairs a day)

  • Microarrays and DNA chips (tens of thousands of genes can be scanned for activity levels at the same time)

  • Automated genotyping machines (assay tens of thousands of DNA diagnostic points a day)

Bioinformatics is indispensable to projects that seek to decipher the whole genome of an organism. In fact, it will soon be possible to monitor whole genomes for gene expression on single chips. Once a genome has been sequenced, the next aims are to identify and delineate the functionally relevant genomic elements it contains (‘structural annotation’) and to assign biological functions to these elements (‘functional annotation’).

3.2.1 Genomics Applications in Relation to Abiotic Stress Tolerance

To apply genomics to the problems of abiotic stress in mandate crops and model plants, approaches such as genome-scale expressed sequence tag (EST) analysis, genomic sequencing and cDNA microarray analysis have tremendous potential for rapidly isolating candidate genes involved in tolerance mechanisms under stress conditions. Some of the techniques used for genomic analysis under stress conditions are as follows:

  • Expressed sequence tags (ESTs) are created by partial ‘one-pass’ sequencing of randomly picked gene transcripts that have been converted into cDNA (Adams et al. 1993). ESTs are often generated as comparative collections from stressed and non-stressed plant tissues. Comparing the ESTs of stressed and non-stressed samples identifies genes that are up-regulated in the stressed tissues and those that are down-regulated or switched off.

  • Full-length cDNA resources provide key functionalities in omic space, so it is also essential to establish information resources that provide gateways to them and that integrate related datasets derived from other omics fields and species (Sakurai et al. 2005).

  • cDNA libraries also serve as primary sequence resources for designing microarray probes and as clone resources for genetic engineering to improve crop efficiency (Futamura et al. 2008). Further, stress-induced candidate genes that emerge from microarray analyses are ideal for comparative analysis.

  • Mini-arrays are built from collections of ESTs assembled from random cDNA libraries, or from more targeted collections of cDNAs obtained from stressed tissues. Even more targeted are the special ‘stress arrays’ made up of all the expressed genes for which there is any evidence of involvement in stress responses.
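
The EST comparison described in the first item above can be sketched in a few lines of Python. This is only an illustration, assuming that each EST has already been assigned to a gene; the function name `est_log2_ratios`, the gene labels and the pseudocount of 1 are hypothetical choices and do not come from any of the cited studies.

```python
import math
from collections import Counter


def est_log2_ratios(stressed_ests, control_ests, pseudocount=1.0):
    """Compare EST abundance per gene between two libraries.

    Each argument is a list of gene labels, one entry per sequenced EST.
    Counts are scaled by library size, and a pseudocount (an assumption of
    this sketch) avoids division by zero for genes absent from one library.
    Positive values suggest up-regulation in the stressed library.
    """
    if not stressed_ests or not control_ests:
        raise ValueError("both EST libraries must be non-empty")
    stressed, control = Counter(stressed_ests), Counter(control_ests)
    ratios = {}
    for gene in sorted(set(stressed) | set(control)):
        freq_stressed = (stressed[gene] + pseudocount) / len(stressed_ests)
        freq_control = (control[gene] + pseudocount) / len(control_ests)
        ratios[gene] = math.log2(freq_stressed / freq_control)
    return ratios


# Hypothetical libraries: gene_A dominates under stress, gene_B in the control.
stressed_library = ["gene_A"] * 3 + ["gene_B"]
control_library = ["gene_A"] + ["gene_B"] * 3
print(est_log2_ratios(stressed_library, control_library))
```

With real EST collections a statistical test would be needed before calling a gene up- or down-regulated; the log ratio alone only ranks candidates.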

Omics platforms and their associated databases are also essential for designing approaches that make the best use of genomic resources, including resource integration. Bioinformatics software and tools are increasingly used to maintain, analyse and retrieve massive-scale molecular data from stressed and non-stressed conditions (Table 3.1). Some of the resources that deal specifically with stress are as follows:

  • Plant Stress Gene Database (PSGD): It provides information about the genes involved in stress conditions in plants (Prabha et al. 2011). This database includes 259 stress-related genes of 11 species along with all the available information about the individual genes. Stress-related ESTs were also found for Phaseolus vulgaris. The database also includes orthologs and paralogs of the proteins encoded by stress-related genes.

  • Stress-Responsive Transcription Factor Database (STIFDB V2.0): It is a comprehensive collection of biotic and abiotic stress-responsive genes in Arabidopsis thaliana and Oryza sativa L. with options to identify probable transcription factor binding sites in their promoters. Ten specific families of transcription factors in Arabidopsis thaliana and six in Oryza sativa L. are known to be involved in responses to biotic stress, such as bacterial infection, and to abiotic stresses such as ABA, drought, cold, salinity, dehydration, UV-B, high light, heat and heavy metals (Shameer et al. 2009).

  • Stress-Responsive Transcription Factor Database (STIFDB2): Currently it has 38,798 associations of stress signals, stress-responsive genes and transcription factor binding sites predicted using the Stress-responsive Transcription Factor (STIF) algorithm, along with various functional annotation data. As a unique plant stress regulatory genomics data platform, STIFDB2 can be utilized for targeted as well as high-throughput experimental and computational studies to unravel principles of the stress regulome in dicots and Gramineae (Naika et al. 2013).

  • STIF (Hidden Markov Model-Based Search Algorithm): It is used for the recognition of binding sites of stress-upregulated transcription factors and genes in Arabidopsis (Ambika et al. 2008).

  • PESTD: It is a dataset of orthologous sequences built for comparative genomics studies of plant responses to abiotic stresses. A large amount of sequence information, including sequences derived from stress cDNA libraries, is used for the identification of stress-related genes and orthologs associated with the stress response. The availability of annotated plant abiotic stress ortholog sets will be a valuable resource for researchers studying the biology of environmental stresses in plant systems, molecular evolution and genomics (Jayashree et al. 2006).

  • Arabidopsis Stress-Responsive Gene Database (ASRGD): It is a powerful means for the manipulation, comparison, search and retrieval of records describing the nature of various stress-responsive genes in Arabidopsis thaliana. The database covers about 44 types of stress factors related to Arabidopsis thaliana and contains 636 gene entries related to stress response, with associated information such as gene ID, nucleotide and protein sequences and cross-response. The database is based exclusively on published stress-responsive and stress-tolerant genes associated with plants (Borkotoky et al. 2013).

  • The Arabidopsis Information Resource (TAIR): It contains genetic and molecular biology data for Arabidopsis thaliana. Its scope extends well beyond the stress response, which makes it difficult to search for stress-related genes alone (Swarbreck et al. 2008).

  • Pathogen Receptor Genes Database (PRGDB): It allows easy access not only for the plant science research community but also for breeders who want to improve plant disease resistance. It offers 153 reference resistance genes and 177,072 annotated candidate pathogen receptor genes (PRGs). It displays useful information on plant diseases linked to genes and genomes, connecting complementary data to better address specific needs. Through a revised and enlarged collection of data, the development of new tools and a renewed portal, PRGdb 3.0 engages the plant science community in developing a consensus plan to improve knowledge and strategies to fight diseases that afflict main crops and other plants (Osuna-Cruz et al. 2018).

  • Rice SRTFDB: It provides comprehensive expression information on rice transcription factors (TFs) during drought and salinity stress conditions and various stages of development. It is useful for identifying the target TF(s) involved in the stress response at a particular stage of development. It also provides curated information on the cis-regulatory elements present in their promoters, which is important for studying the binding proteins. This database aims to accelerate functional genomics research on rice TFs and the understanding of the regulatory mechanisms underlying abiotic stress responses (Priya and Jain 2013).

  • QlicRice: This database is designed to host publicly accessible, abiotic stress-responsive quantitative trait loci (QTLs) in rice (Oryza sativa) and their corresponding sequenced gene loci. It provides a platform for the data mining of abiotic stress-responsive QTLs, as well as for browsing and annotating associated traits, their location on a sequenced genome, mapped expressed sequence tags (ESTs) and tissue- and growth stage-specific expression on the whole genome. An intuitive user interface has been designed to retrieve associations with agronomically important QTLs for abiotic stress response in rice (Smita et al. 2011).

  • Drought Stress Gene Database (DroughtDB): It is a manually curated compilation of molecularly characterized genes that are involved in the drought stress response. It includes information about the originally identified gene, its physiological and/or molecular function and mutant phenotypes, and provides detailed information about computed orthologous genes in nine model and crop plant species. Thus, DroughtDB is a valuable resource and information tool for researchers working on drought stress and will facilitate the identification, analysis and characterization of genes involved in drought stress tolerance in agriculturally important crop plants (Alter et al. 2015).
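
Several of the resources above report probable transcription factor binding sites in promoters. The following Python sketch shows the simplest form of such a search, a pattern scan of a promoter sequence. It is not the HMM-based STIF algorithm: it uses plain regular expressions, scans the forward strand only, and the two motif patterns (commonly cited consensus forms of the DRE core and the ABRE) and the promoter string are illustrative assumptions rather than entries taken from any database.

```python
import re

# Commonly cited consensus forms, used here purely as illustrative patterns.
MOTIFS = {
    "DRE_core": "[AG]CCGAC",  # dehydration-responsive element core
    "ABRE": "ACGTG[GT]",      # ABA-responsive element
}


def scan_promoter(sequence, motifs=MOTIFS):
    """Return (motif name, 0-based start, matched site) for every hit.

    A lookahead is used so that overlapping sites are also reported.
    Only the given strand is scanned.
    """
    sequence = sequence.upper()
    hits = []
    for name, pattern in motifs.items():
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start(), match.group(1)))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical promoter fragment containing one site of each kind.
promoter = "TTGACCGACATTTACGTGGCAA"
for name, start, site in scan_promoter(promoter):
    print(name, start, site)
```

A probabilistic model such as an HMM additionally scores how well each position fits the motif, which a fixed pattern cannot do.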

Table 3.1 Genomic repositories and stress-related databases

3.2.2 Platforms and Resources in the Transcriptome of Plants Under Abiotic Stress/Plant Transcriptomics-Related Computational Tools and Databases

The transcriptome (the RNA expression profile of an organism) is highly dynamic, and transcriptomics involves capturing the RNA expression profile across plant organs, tissues and cells in space and time under particular conditions (Duque et al. 2013; El-Metwally et al. 2014a). In response to various abiotic stresses, plants constantly adjust their transcriptome profiles. Transcriptomics thus assists in finding genes that are associated with alterations in the plant phenotype under different abiotic or biotic stress conditions (Kawahara et al. 2013). Comprehensive, high-throughput RNAseq analysis finds applications in plant stress response and tolerance, such as searching for abiotic stress candidate genes, predicting tentative gene functions, discovering cis-regulatory motifs and providing a better understanding of the plant-pathogen relationship (De Cremer et al. 2013; Agarwal et al. 2014). The recent boom in the availability of online resources, databases and archives of transcriptome data allows novel genome-wide analyses of plant stress responses and tolerance (Duque et al. 2013). Several studies have examined the transcriptomes of different organs and developmental stages of plants under different environmental conditions (Narsai et al. 2010; Zhou et al. 2008). Narsai et al. (2010) identified a new set of reference genes in rice of considerable significance, and analysis of their promoter sequences shows the prevalence of some stress-regulatory cis-elements (Zhou et al. 2008).

Different techniques exist to analyse transcriptomic changes in a system under different stress conditions and these are as follows:

  • RNA/gene expression profiling is mostly accomplished using microarray, RNA sequencing (RNAseq) through next-generation sequencing (NGS), serial analysis of gene expression (SAGE) and digital gene expression profiling (Kawahara et al. 2013; Duque et al. 2013; De Cremer et al. 2013).

  • Hybridization-based methods, such as those used in microarrays and GeneChips, have been well established for acquiring large-scale gene expression profiles for various species (De Cremer et al. 2013).

  • Next-generation DNA sequencing applications, such as deep sequencing of short fragments of expressed RNAs, including sRNAs, are quickly becoming an effective tool for use with genome-sequenced species (Harbers and Carninci 2005).

  • Quantitative PCR analyses only a few genes at a time, while microarray analysis allows the simultaneous measurement of transcript abundance for thousands of genes (Joshi et al. 2012).

  • Tiling arrays cover the genome at regular intervals to measure transcription without bias towards known or predicted gene structures, and also support the discovery of polymorphisms, the analysis of alternative splicing and the identification of transcription factor binding sites (Coman et al. 2013). Transcriptome analysis in Arabidopsis under abiotic stress conditions using a whole-genome tiling array resulted in the discovery of antisense transcripts induced by abiotic stresses (Matsui et al. 2008).
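
For the quantitative PCR measurements mentioned above, relative expression is often estimated with the widely used 2^-ΔΔCt calculation, in which a target gene is normalized to a reference gene in both stressed and control samples. The sketch below assumes roughly 100% amplification efficiency for both genes; the function name and the Ct values are hypothetical.

```python
def delta_delta_ct_fold_change(ct_target_stress, ct_ref_stress,
                               ct_target_control, ct_ref_control):
    """Relative expression of a target gene by the 2^-ddCt calculation.

    Ct is the PCR cycle at which the signal crosses the threshold, so a
    lower Ct means more transcript. Assumes ~100% amplification efficiency
    (one doubling per cycle) for both target and reference genes.
    """
    delta_ct_stress = ct_target_stress - ct_ref_stress
    delta_ct_control = ct_target_control - ct_ref_control
    delta_delta_ct = delta_ct_stress - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: the target crosses threshold two cycles earlier
# under stress while the reference gene is unchanged.
fold = delta_delta_ct_fold_change(23.0, 20.0, 25.0, 20.0)
print(f"fold change under stress: {fold:.1f}")
```

The result depends directly on the stability of the reference gene, which is why validated reference genes matter for stress experiments.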

In the post-genomic era, RNA-Seq provides a global transcriptome profile that can cover lncRNAs, coding genes and their alternatively spliced isoforms in the stress response, and it helps plant biologists to gain new insights into molecular mechanisms and responses to biotic and abiotic stress events. Several data portals contain a vast amount of plant RNA-Seq data, such as the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA). However, these data portals mainly serve as raw biological data archives. Large-scale stress-specific RNA-Seq databases that provide comprehensively visualized transcriptome expression profiles and statistical analysis of differential expression are listed in Table 3.2. Some of these databases are as follows:

  • Plant Stress RNA-Seq Nexus (PSRN): It is a comprehensive database which includes 12 plant species, 26 plant stress RNA-Seq datasets and 937 samples. PSRN is an open resource for intuitive data exploration, providing expression profiles of coding-transcript/lncRNA and identifying which transcripts are differentially expressed between different stress-specific subsets, in order to support researchers generating new biological insights and hypotheses in molecular breeding or evolution. PSRN was developed with the goal of collecting, processing, analysing and visualizing publicly available plant RNA-Seq data (Li et al. 2018).

  • PlantExpress: It is a web database that serves as a platform for gene expression network (GEN) analysis with the public microarray data of rice and Arabidopsis. PlantExpress has two functional modes: the single-species mode is specialized for GEN analysis within one of the species, while the cross-species mode is optimized for comparative GEN analysis between the species. It stores data obtained from three microarrays, namely, the Affymetrix Rice Genome Array, the Agilent Rice Gene Expression 4x44K Microarray and the Affymetrix Arabidopsis ATH1 Genome Array, with respective totals of 2,678, 1,206 and 10,940 samples. PlantExpress will facilitate understanding of the biological functions of plant genes (Kudo et al. 2017).

  • RiceArrayNet (RAN): It provides information on co-expression between genes in terms of correlation coefficients (r values). For example, it shows a correlation pattern between Os01g0968800, a drought-responsive element-binding transcription factor; Os02g0790500, a trehalose-6-phosphate synthase; and Os06g0219500, a small heat shock factor, reflecting the fact that genes responding to the same biological stresses are regulated together (Lee et al. 2009).

  • Transcriptome Encyclopedia of Rice (TENOR): It is a database that encompasses large-scale mRNA sequencing (mRNA-Seq) data obtained from rice under a wide variety of stress conditions. Since elucidating the ability of plants to adapt to various growing conditions is a key issue in plant sciences, it is of great interest to understand the regulatory networks of genes that respond to environmental changes. All the resources (novel genes identified from mRNA-Seq data, expression profiles, co-expressed genes and cis-regulatory elements) are available in TENOR (Kawahara et al. 2016).
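
Co-expression in terms of correlation coefficients, as reported by RiceArrayNet, can be illustrated with a plain Pearson correlation between expression profiles. In the sketch below the locus identifiers are the ones named above, but the expression values, the threshold of 0.8 and the function names are invented for illustration and do not reproduce RiceArrayNet data or its implementation.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.8):
    """List gene pairs whose profiles correlate at or above the threshold."""
    pairs = []
    for gene_a, gene_b in combinations(profiles, 2):
        r = pearson_r(profiles[gene_a], profiles[gene_b])
        if r >= threshold:
            pairs.append((gene_a, gene_b, r))
    return pairs


# Invented expression values across four conditions (illustration only).
profiles = {
    "Os01g0968800": [1.0, 2.0, 4.0, 8.0],
    "Os02g0790500": [2.0, 4.0, 8.0, 16.0],
    "Os06g0219500": [8.0, 4.0, 2.0, 1.0],
}
for gene_a, gene_b, r in coexpressed_pairs(profiles):
    print(gene_a, gene_b, round(r, 3))
```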

Table 3.2 Transcriptomic repositories and stress-related databases

3.2.3 Platforms and Resources in Proteomics of Plants Under Abiotic Stress/Plant Proteomics-Related Computational Tools and Databases

‘Proteome’ refers to the total protein expressed under certain circumstances in a given organism, organ, cell, tissue or microorganism population, and proteomics comprises all the techniques used in profiling the expressed proteins in a specific context (Tyers and Mann 2003). Like transcriptomics, it is an informative approach that reveals invaluable information when studying plant stress response and tolerance, either at whole-genome or sample scale (Nakagami et al. 2012). It is used for profiling all the proteins expressed under multiple stress conditions and cross-comparing these sets to identify the proteins that are specifically involved in stress tolerance (Yan et al. 2014). It is an evolving technology for the large-scale qualitative identification and quantification of all protein types in a cell or tissue, the analysis of post-translational modifications and associations with other proteins, and the characterization of protein activities and structures (Jorrín-Novo et al. 2009).

Proteomics is associated with two types of studies: proteome characterization (identification of all the proteins expressed) and differential proteomics (comparative proteome analysis of control and stressed plants). The proteomic approach has been widely adopted to explore protein profiles in plants in response to abiotic stress, which may lead to the development of new strategies for improving stress tolerance (Helmy et al. 2011). Several types of proteome can be measured, but the whole proteome and the phosphoproteome are the ones most commonly quantified in plant stress tolerance studies (Helmy et al. 2011, 2012a, b). The main focus of quantitative proteomics is to identify the proteins that are differentially expressed under a given stress condition (Liu et al. 2015), while phosphoproteomics is closely associated with the identification of proteins that are activated and functioning in response to a particular stress (Zhang et al. 2014). Whole proteomics and phosphoproteomics can be combined in one comprehensive study to provide a better understanding of the stress (Hopff et al. 2013). The initial goal of functional proteomics was the high-throughput identification of all proteins present in cells and/or tissues, but recent rapid technical advances have enabled progress to a second generation of functional proteomics, including quantitative proteomics, subcellular proteomics and the study of various modifications and protein-protein interactions (Jorrín-Novo et al. 2009).

The two main techniques used for quantitative and/or qualitative profiling are protein electrophoresis and protein identification by mass spectrometry. The technology of choice for proteomics is mass spectrometry (MS), including approaches such as liquid chromatography-tandem mass spectrometry (LC-MS/MS), ion trap-mass spectrometry (IT-MS) and matrix-assisted laser desorption/ionization-mass spectrometry (MALDI-MS) (Helmy et al. 2011, 2012a). These technologies measure the mass-to-charge ratio of small protein fragments (‘peptides’) that result from the enzymatic digestion of proteins (Helmy et al. 2011; Nakagami et al. 2012). Furthermore, several proteomics labs use protein electrophoresis technologies such as two-dimensional electrophoresis and difference gel electrophoresis (DIGE) in plant proteomics (Duque et al. 2013).
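
The quantity that a mass spectrometer reports for each peptide can be made concrete with a small calculation. The Python sketch below digests a protein in silico and computes the monoisotopic mass-to-charge ratio of each peptide. Trypsin, with the simple rule "cleave after K or R unless followed by P", is assumed as the digestion enzyme because the text does not name one; the example sequence is invented, and missed cleavages and modifications are ignored.

```python
import re

# Monoisotopic residue masses in daltons (standard values, rounded).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each charge-carrying proton


def tryptic_digest(protein):
    """Split after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mz(peptide, charge=1):
    """Monoisotopic m/z of an unmodified peptide at a given positive charge."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


# Invented protein fragment for illustration.
for peptide in tryptic_digest("AGKRPSLRGG"):
    print(peptide, round(peptide_mz(peptide, charge=2), 4))
```

Search engines used in proteomics compare measured values of this kind against theoretical peptides derived from a sequence database.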

As genome sequencing projects for several organisms have been completed, proteome analysis, the detailed investigation of the functions, functional networks and 3D structures of proteins, has gained increasing attention. The large-scale proteome datasets now available serve as an important resource for a better understanding of protein functions in cellular systems, which are controlled by the dynamic properties of proteins (Table 3.3). These properties reflect cell and organ states in terms of growth, development and response to environmental changes. Owing to the challenges of text and data mining, there is a large gap between the data available to researchers and the hundreds of published plant stress proteomics articles. A number of stress-related protein databases are available (Table 3.3):

  • Plant Stress Proteome Database (PlantPReS; www.proteome.ir): It is an open online proteomic database, which currently comprises >35,086 entries from 577 manually curated articles and contains >10,600 unique stress-responsive proteins (Mousavi et al. 2016).

  • Plant Stress Protein Database (PSPDB): It is one of the largest repositories and a web-accessible resource that covers 2064 manually curated plant stress proteins from a wide array of 134 plant species with 30 different types of biotic and abiotic stresses. Functional and experimental validation of proteins associated with biotic and abiotic stresses has been employed as the sole criterion for inclusion in the database (Singh et al. 2015).

Table 3.3 Proteomic databases and resources

‘Proteogenomics’ is another comprehensive approach that combines large-scale proteomic data with genomic and/or transcriptomic data to elucidate novel regulatory mechanisms (Helmy et al. 2012a). The data generated by MS-based proteomics, with its high throughput and accuracy, provide a rich source of translation-level information about the expressed proteins that can be used as large-scale experimental evidence for several kinds of prediction (Helmy et al. 2012a, b). In a proteogenomics study, the naturally expressed proteins are identified using MS-based proteomics and then mapped back to the genomic or transcriptomic data (Helmy et al. 2012a). This field has advanced our understanding of plant biology in general and of plant stress research in particular. For instance, a large-scale proteogenomics study of Arabidopsis thaliana identified 57 new genes and corrected the annotations of hundreds of genes using intensive sampling of Arabidopsis organs under several conditions (Baerenfaller et al. 2008). Another study reported corrections and new identifications in about 13% of the annotated genes in Arabidopsis (Castellana et al. 2008). Proteogenomics has also been applied to investigate host-pathogen relationships (Delmotte et al. 2009), to identify novel effectors in fungal diseases (Cooke et al. 2014) and to shed light on the mechanisms of environmental adaptation.
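
The step of mapping identified peptides back to the genome can be sketched as a six-frame translation followed by an exact search. The Python example below assumes the standard genetic code, an unspliced toy sequence and exact peptide matches; real proteogenomics pipelines also handle introns, sequencing variants and statistical scoring. The function names and sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(genome, peptide):
    """Find a peptide in all six reading frames.

    Returns (strand, frame, nucleotide offset) tuples, where the offset is
    measured from the 5' end of the strand that was translated.
    """
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                hits.append((strand, frame, frame + 3 * pos))
                pos = protein.find(peptide, pos + 1)
    return hits


# Toy example: the peptide MAK is encoded at the start of the forward strand.
print(map_peptide("ATGGCCAAGTGA", "MAK"))
```

A peptide that maps to a region without an annotated gene is the kind of evidence that supports new gene models or corrected annotations.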

3.2.4 Platforms and Resources in Metabolomics of Plants Under Abiotic Stress/Plant Metabolomics-Related Computational Tools and Databases

The metabolome is the complete pool of metabolites in a cell at any given time, and metabolomics refers to the techniques and methods used to study the metabolome (Duque et al. 2013). Plants synthesize a diverse group of chemical and biological compounds with different biological activities that are crucial for regulating the response to different types of biotic and abiotic stress (Bino et al. 2004). Identifying the metabolites produced by the plant under each stress condition therefore provides information not only about the phenotype but also about the changes induced in it by stress, thereby bridging the gap between phenotype and genotype (Badjakov et al. 2012). Metabolomics may prove particularly important in plants because it can elucidate plant cellular systems and supports molecular breeding and engineering to improve the growth and productivity of plants under stress (Fernie and Schauer 2009). Metabolomic approaches allow parallel assessment of multiple metabolites, and the plant metabolome represents an enormous chemical diversity owing to the complex set of metabolites produced in each plant species (Bino et al. 2004). A strong connection between stress metabolites and a particular protein indicates a role for the corresponding gene in the stress response process (Urano et al. 2010; Duque et al. 2013; Jogaiah et al. 2013). Metabolic profiling of plants combines several analytical and separation techniques with other omics analyses (e.g. transcriptomics or proteomics) to investigate the correlation between metabolite levels and the expression levels of genes and proteins (Jogaiah et al. 2013). Metabolomics thus provides a better understanding of the stress response and tolerance process in model plants such as Arabidopsis (Cook et al. 2004), in crops such as common bean (Phaseolus vulgaris) (Broughton et al. 2003) and in other food crops (Duque et al. 2013).

Metabolomics is one of the most rapidly developing technologies, and many notable advances have recently been made in its approaches, instrumentation and reporting standards; some of them are as follows:

  • Major approaches that are used in plant metabolomics research include metabolic fingerprinting, which involves separation of metabolites based on their physical and chemical properties using various analytical tools and technologies (Jogaiah et al. 2013).

  • The other major approaches are metabolite profiling, which studies the stress-induced alterations in the metabolite pool, and targeted analysis.

  • Capillary electrophoresis-mass spectrometry (CE-MS) is considered one of the most advanced metabolomics technologies (Soga et al. 2002).

  • Analytical instruments and separation technologies are employed in metabolomics such as gas chromatography (GC), mass spectrometry (MS) and nuclear magnetic resonance (NMR) (Duque et al. 2013).

  • Minimum Information About a Metabolomics Experiment (MIAMET) gives reporting requirements with the aim of standardizing experiment descriptions, particularly within publications (Ernst et al. 2014).

  • Standard Metabolic Reporting Structures (SMRS) working group has developed standards for describing the biological sample origin, analytical technologies and methods used in a metabolite profiling experiment (Chen et al. 2015).

  • ArMet (architecture for metabolomics) proposal gives a description of plant metabolomics experiments and their results along with a database schema (Castillo-Peinado and de Castro 2016).

  • Metabolic flux analysis measures the steady-state flow between metabolites. FluxAnalyzer is a package for MATLAB that integrates pathway and flux analysis for metabolic networks (Rocha et al. 2008).
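
The steady-state assumption behind metabolic flux analysis can be sketched in a few lines. The example below is an illustration only and does not use FluxAnalyzer or any other package: the toy network, the stoichiometric matrix S, the flux vector v and the helper net_production are all hypothetical. At steady state, the net production of every internal metabolite (the product of S and v) is zero.

```python
def net_production(stoich, fluxes):
    """Return S*v: the net production rate of each metabolite."""
    return [sum(coef * v for coef, v in zip(row, fluxes)) for row in stoich]


# Hypothetical toy network:
#   v1: uptake -> A,  v2: A -> B,  v3: A -> C,  v4: B -> export,  v5: C -> export
# Rows are metabolites (A, B, C); columns are reactions (v1..v5).
S = [
    [1, -1, -1,  0,  0],  # A: produced by v1, consumed by v2 and v3
    [0,  1,  0, -1,  0],  # B: produced by v2, consumed by v4
    [0,  0,  1,  0, -1],  # C: produced by v3, consumed by v5
]

v = [10.0, 6.0, 4.0, 6.0, 4.0]  # candidate flux distribution (arbitrary units)

residual = net_production(S, v)
is_steady = all(abs(x) < 1e-9 for x in residual)
print("net production per metabolite:", residual)
print("steady state:", is_steady)
```

Dedicated tools solve the inverse problem, estimating the unknown fluxes from measurements under this same constraint, rather than merely checking a given flux vector.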

A number of metabolic profiling studies in plant species have resulted in the publication of related databases (Table 3.4). For instance, metabolic pathways that act in response to environmental stresses in plants were investigated by metabolome analysis using various types of MS coupled with microarray analysis of overexpressors of genes encoding two TFs, DREB1A/CBF3 and DREB2A (Maruyama et al. 2009). Metabolomic profiling was also used to investigate chemical phenotypic changes between wild-type Arabidopsis and a knockout mutant of the NCED3 gene under dehydration stress conditions (Urano et al. 2010). The resulting databases are vast information resources and repositories of large-scale datasets, and they also serve as tools for integrating metabolic profiles with comprehensive data acquired from other omics research (Akiyama et al. 2008). One of the large metabolite databases is PlantMetabolomics.org (PM), a web portal and database for exploring, visualizing and downloading plant metabolomics data. Widespread public access to well-annotated metabolomics datasets (Table 3.4) is essential for establishing metabolomics as a functional genomics tool. PM can be used as a platform for deriving hypotheses by enabling metabolomic comparisons between genetically unique Arabidopsis (Arabidopsis thaliana) populations subjected to different environmental conditions (Bais et al. 2015).

Table 3.4 Metabolomic databases and resources

3.2.5 Micro RNAs: Attributes in Plant Abiotic Stress Responses and Bioinformatics Approaches on MicroRNA

MicroRNAs (miRs) represent a major subfamily of endogenously transcribed sequences (21–24 nt) and have been acknowledged as a major regulatory class that inhibits gene expression in a sequence-dependent manner (Eldem et al. 2013). miRs are small regulators of gene expression in numerous developmental and signalling pathways and are emerging as important post-transcriptional regulators that may regulate key plant genes responsible for stress tolerance. Plants combat environmental stresses by activating several gene regulatory pathways, and studies with different model plants have revealed the role of miRNAs in the response to abiotic stress (Zhou et al. 2010). Exposure of plants to abiotic stress causes over- or underexpression of certain miRNAs and might even lead to the synthesis of new miRNAs to withstand the stress (Khraiwesh et al. 2012). Several studies have identified species- and clade-specific miRNA families associated with plant stress-regulated genes (Zhang et al. 2013). The functions of stress-responsive miRNAs can only be studied by understanding the regulatory interactions within the network (Jeong and Green 2013). Identification of a large number of stress-responsive miRNAs might help in developing new strategies for improving stress tolerance in plants. With the rapid improvement in genomic tools and methods, the number of novel miRNAs known to be involved in abiotic stress responses in various plant species is increasing, providing a better understanding of miRNA-mediated gene regulation (Wang et al. 2014).
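
Over- or underexpression of miRNAs under stress is commonly summarized as a log2 fold change between stressed and control samples. The following minimal sketch assumes normalized read counts are already available; the miRNA names, the counts, the pseudocount and the twofold threshold are hypothetical choices made for illustration and do not reproduce the procedure of any cited study.

```python
from math import log2


def classify_mirnas(control, stress, threshold=1.0, pseudocount=1.0):
    """Label each miRNA as up, down or unchanged from its log2 fold change.

    control and stress map miRNA names to normalized counts. The pseudocount
    avoids division by zero for miRNAs that are absent in one condition.
    """
    result = {}
    for name, control_count in control.items():
        if name not in stress:
            continue  # skip miRNAs that were not measured in both conditions
        lfc = log2((stress[name] + pseudocount) / (control_count + pseudocount))
        if lfc >= threshold:
            label = "up"
        elif lfc <= -threshold:
            label = "down"
        else:
            label = "unchanged"
        result[name] = (lfc, label)
    return result


# Hypothetical normalized counts for three placeholder miRNAs.
control = {"miR-A": 100.0, "miR-B": 400.0, "miR-C": 50.0}
stress = {"miR-A": 450.0, "miR-B": 90.0, "miR-C": 55.0}

calls = classify_mirnas(control, stress)
for name, (lfc, label) in calls.items():
    print(f"{name}: log2FC = {lfc:+.2f} ({label})")
```

In practice such calls would be made on replicated libraries with an appropriate statistical test, and candidates would then be confirmed experimentally.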

Sequence-based profiling along with computational analysis has played a key role in the identification of stress-responsive miRs. sRNA blot and RT-PCR analysis have played an equally important part in systematically confirming the profiling data (Jagadeeswaran et al. 2010). This has also enabled quantification of their effect on the genetic networks, such that many of the stress-regulated miRs have emerged as potential candidates for improving plant performance under stress. The development and integration of plant computational biology tools and approaches have added new functionalities and perspectives to miR biology, making miRs relevant for genetic engineering programmes for enhancing abiotic stress tolerance. So far, three major strategies have been employed for the identification and expression profiling of stress-induced miRs:

  • The first approach follows the classical experimental route, which includes direct cloning, genetic screening or expression profiling.

  • The second approach involves computational predictions from genomic or EST loci.

  • The most recent approach combines both, as it is based on the prediction of miRs from high-throughput sequencing (HTS) data.

Each of these is followed by experimental validation by northern analysis, PCR or microarrays. In recent years, high-throughput sequencing and screening protocols have caused an exponential increase in the number of miRs identified and functionally annotated from various plant species (Jagadeeswaran et al. 2010). The first biological database generated for miRs was miRBase, which acts as an archive of miR sequences and annotations (Griffiths-Jones et al. 2008). As genomic tools and methods for identifying novel miRNAs in various plant species continue to advance, the number of miRNAs known to be involved in abiotic stress responses keeps increasing, providing a better understanding of miRNA-mediated gene regulation during various abiotic stresses (Table 3.5). Some of the databases comprising miRNA-related information are:

  • PASmiR: This database is a complete repository for miRNA regulatory mechanisms involved in plant response to abiotic stresses for the plant stress physiology community. It is a literature-curated and web-accessible database and was developed to provide detailed, searchable descriptions of miRNA molecular regulation in different plant abiotic stresses. It currently includes data from ~200 published studies, representing 1038 regulatory relationships between 682 miRNAs and 35 abiotic stresses in 33 plant species (Zhang et al. 2013).

  • PmiRExAt: It is a new online database resource that serves as a plant miRNA expression atlas. The web-based repository comprises miRNA expression profiles and a query tool for 1859 wheat, 2330 rice and 283 maize miRNAs. The database interface offers open and easy access to miRNA expression profiles and helps in identifying tissue-preferential, differentially expressed and constitutively expressed miRNAs (Table 3.5).

  • Plant MicroRNA Database (PMRD): miRNA expression profiles are provided in this database, including rice oxidative stress-related microarray data and the published microarray data for poplar, Arabidopsis, tomato, maize and rice. The plant miRNA database integrates available plant miRNA data deposited in public databases, gathered from the recent literature, and data generated in-house (Zhang et al. 2010).

  • WMP: It is a novel resource that provides data related to the expression of abiotic stress-responsive miRNAs in wheat. This database allows the query of small RNA libraries, including in silico predicted wheat miRNA sequences and the expression profiles of small RNAs identified from those libraries (Remita et al. 2016).

Table 3.5 Major microRNA repositories and stress databases

3.3 Role of Bioinformatics in Plant Disease Management

Omics studies focused on whole-genome analysis have unlocked a new era for biology in general and for agriculture in particular. The combination of bioinformatics and functional genomics has paved the way towards a better understanding of plant-pathogen interactions, which eventually leads to breakthroughs in the promotion of plant resistance to pests (Koltai and Volpin 2003). Bioinformatics has played a major role in plant disease management by helping to explain the molecular basis of the host-pathogen interaction (Koltai and Volpin 2003). Modern genomics tools, including applications of bioinformatics and functional genomics, allow scientists to interpret DNA sequence data and test hypotheses on a larger scale than previously possible (Anonymous 2005). Over the past few years, molecular biological technologies and genetic approaches have also identified numerous components of the plant signalling system that function downstream of the detection molecules, as well as the pathogen proteins that are used to suppress host defences and drive the infection process (so-called effector proteins) (Anonymous 2005). Disease resistance is only one of several traits under selection in a breeding programme. Thus, bioinformatics has to play an increasing role in integrating phenotypic and pedigree information for agronomic as well as resistance traits (Vassilev et al. 2006). Improved algorithms and increased computing power have made it possible to improve selection strategies as well as to model the epidemiology of pathogens (Michelmore 2003). Some of the key roles of bioinformatics in plant improvement have been listed by Vassilev et al. (2005): submitting all sequence data generated from experimentation into the public domain through repositories; providing rational annotation of genes, proteins and phenotypes; elaborating relationships both within the plants’ data and between plants and other organisms; providing data, including information on mutations, markers, maps and functional discoveries; and others.

Over the past few years, there have been many technological advances in the understanding of plant-pathogen interactions. Omics techniques (genomics, proteomics and transcriptomics) have provided a great opportunity to explore plant-pathogen interactions from a systems perspective and to study protein-protein interactions (PPIs) between plants and pathogens (Delaunois et al. 2014). Identification of the molecular components as well as the corresponding pathways has provided a relatively clear understanding of the plant immune system. The emergence of these omics techniques has in turn stimulated the study of plant-pathogen interactions (Schulze et al. 2015). With the availability of massive amounts of data generated from high-throughput omics techniques, network biology has become a powerful approach to further decipher the molecular mechanisms of plant-pathogen interactions:

  • Genomics is particularly important, and with the rapid development of next-generation sequencing (NGS) techniques, numerous plant and pathogen genomes have been fully sequenced.

  • Proteomics is a key technique for the analysis of the proteins involved in plant-pathogen interactions (Delaunois et al. 2014).

  • DNA microarray and RNA sequencing are two key transcriptomics techniques for acquiring the expression profile of genes on a large scale. Transcriptomics is also important to investigate plant-pathogen interactions and has been employed to learn how plants respond to the pathogen invasion and how pathogens counter the plant defence at the transcript level (Schulze et al. 2015).
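
As a small illustration of the network-level view described above, protein-protein interactions between a pathogen and its host can be stored as an undirected graph and ranked by the number of interaction partners (degree), a simple way to flag candidate hub proteins. This is a sketch under stated assumptions: the protein names and interaction pairs are invented placeholders, and degree is only one of many possible network measures.

```python
from collections import defaultdict


def build_network(interactions):
    """Build an undirected interaction network as a dict of neighbour sets."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # ignore self-interactions in this simple sketch
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def rank_by_degree(neighbours):
    """Return (protein, degree) pairs, highest degree first, ties by name."""
    return sorted(
        ((node, len(partners)) for node, partners in neighbours.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical interactions between pathogen effectors and host proteins.
interactions = [
    ("effector_1", "host_P1"),
    ("effector_1", "host_P2"),
    ("effector_2", "host_P2"),
    ("effector_3", "host_P2"),
    ("host_P2", "host_P3"),
    ("host_P3", "host_P4"),
]

network = build_network(interactions)
ranking = rank_by_degree(network)
for protein, degree in ranking:
    print(f"{protein}: {degree} interaction partner(s)")
```

In this toy network, a host protein contacted by several effectors ranks first, which is the kind of pattern a network analysis is meant to surface for follow-up experiments.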

Genomic approaches have a significant impact on efforts to improve plant disease resistance by increasing the definition of and access to gene pools available for crop improvement (Vassilev et al. 2005). Identifying key genes and understanding their function in this way will result in a quantum leap in plant improvement. Moreover, the ability to examine gene expression will allow us to understand how plants respond to and interact with the physical environment and management practices (Vassilev et al. 2006). This approach will involve the detailed characterization of the many genes that confer resistance, as well as technologies for the precise manipulation and deployment of resistance genes. Plant-pathogen interactions are sophisticated and dynamic, reflecting the continually evolving competition between pathogens and plants. Thus, genomic studies on pathogens are providing an understanding of the molecular basis of specificity and the opportunity to select targets for more durable resistance (Michelmore 2003). This understanding is fundamental to the efficient exploitation of plants as biological resources in the development of new cultivars with improved quality, reduced economic costs and improved pathogen and abiotic stress resistance, and it is also vital for the development of new plant diagnostic tools (Vassilev et al. 2006). When plants respond to biotic stress, a series of biological processes, rather than a single gene or protein, is altered. Therefore, it is necessary to explore plant-pathogen interactions from a systems perspective, for example at the network level (Mine et al. 2014). Bioinformatics thus plays several roles in breeding for disease resistance and is important for acquiring and organizing large amounts of information. Some of the databases/repositories (Table 3.6) of plant-pathogen interactions are:

  • PHI-base: A new database for pathogen-host interactions. It is designed for hosting any type of pathogen-host interaction, and its focus is on genes with functions that have been experimentally verified. These genes are compiled and curated in a way that can be used to bridge the genotype-phenotype gap underlying the interactions between hosts and pathogens (Winnenburg et al. 2006). The mission of PHI-base is to provide expertly curated molecular and biological information on genes proven to affect the outcome of pathogen-host interactions.

  • PathoPlant: A database on plant-pathogen interactions. This is a relational database to display relevant components and reactions involved in signal transduction related to plant-pathogen interactions. On the organism level, the tables ‘plant’, ‘pathogen’ and ‘interaction’ are used to describe incompatible interactions between plants and pathogens or diseases (Bülow et al. 2004).

Table 3.6 Abiotic stress databases

3.4 Conclusion and Future Prospects

Bioinformatics is a unique approach capable of exploiting and sharing a large amount of omics data. This approach has given a more holistic view of the molecular response in plants when exposed to biotic and abiotic stress, and the integration of various omics studies has revealed a new zone of interactions and regulation (Fig. 3.2). This systems biology approach has enabled the identification, characterization and functional analysis of plant genes that determine a plant’s response to various biotic and abiotic factors, as well as an understanding of the plant-stress interaction. However, considerable effort is still required for detailed analysis of the omic modulation induced by abiotic stress and its interacting partners. This requires the development of reliable and rigorous techniques for firm characterization of the spatiotemporal regulation of omics under stress conditions. The three main domains that must be addressed to take full advantage of systems biology are the development of omics technology, integration of data in a usable format and analysis of data within the domain of bioinformatics. Thus, the potential of computational/systems biology needs to be tapped for performing an extensive analysis among agriculturally important crops for improving crop tolerance to environmental stress. The current surge of affordable omics data encourages researchers to create improved, more integrated and easily accessible plant stress pathway databases. Despite the drawbacks, there is no doubt that bioinformatics is a field that holds great potential for transforming biological research in the coming decades. The expansion and integration of bioinformatics tools and approaches will certainly add new functionalities and perspectives to stress biology, making them applicable to genetic engineering programmes for enhancing stress tolerance.

Fig. 3.2

Pathway depicting molecular effects of abiotic and biotic stress at genomic, transcriptomic, metabolomic and microRNA levels inside the plant cell. Both biotic and abiotic stresses have to be first sensed by the plant cell, and then the information is transduced to the appropriate downstream-located pathway(s). Sensors as well as signal transducers might be shared by both types of stressors. After the signal is perceived by the sensors at the cell wall, it is transduced towards the nucleus, where modifications occur at the genome level, resulting in activation of stress-responsive genes and certain transcription factors (genomics). This activation can occur at the transcript level (transcriptomics), wherein changes in RNA expression levels, alternatively spliced forms of RNA and antisense transcripts occur. A few regulatory small RNAs (microRNAs) are also synthesized in response to specific types of stress conditions (miRNAs). Alterations that have been initiated at the cell wall propagate to the cytosol in the form of proteins, which can be identified by quantitative proteomics, protein-protein interaction and phosphoproteomics (proteomics). After stress signalling, the defence system comes to the rescue, and large amounts of secondary metabolites are synthesized (metabolomics); these metabolites can be analysed by metabolomic profiling and by changes in the type of metabolites synthesized.