
5.1 Introduction

Omics studies rely heavily on bioinformatics to store, mine, process, analyse, interpret, and curate biological big data. Bioinformatics combines computer science, statistics, and mathematical methods with computer programming to analyse the various types of sequence data generated in molecular biology. The term bioinformatics was introduced in 1970 to describe the study of information processes in biotic systems. Since the late 1980s, it has evolved into an interdisciplinary field that deals largely with computational methods for comparative genomic data analysis [1]. In general, bioinformatics refers to biological studies aided by computer programming in addition to data analysis pipelines, especially in genomics, as illustrated in previous chapters.

5.2 Different Aspects of Bioinformatics

Bioinformatics covers many aspects of fundamental and applied research, ranging from hypothesis-driven to data-driven approaches (Fig. 5.1). The hypothesis-driven, bottom-up approach is largely knowledge-based and depends strongly on modelling and computational simulation to understand biological processes. For example, mathematical modelling of enzyme kinetics in a reaction pathway, or simulation of flux distribution in a genome-scale model, can help identify rate-limiting enzymes or metabolites [2, 3].
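As a minimal sketch of how kinetic modelling can point to a rate-limiting step, the code below assumes a hypothetical three-step linear pathway with reversible mass-action kinetics. It uses flux control coefficients from metabolic control analysis, C_i = dlnJ/dln e_i, as the measure of control that each enzyme level e_i exerts over the steady-state flux J. All parameter values are invented for illustration; realistic pathway models would typically use Michaelis-Menten or more detailed rate laws.

```python
import numpy as np

# Hypothetical three-step linear pathway X0 -> S1 -> S2 -> X3, with X0 and X3 held
# constant. Each step follows reversible mass-action kinetics scaled by its enzyme
# level e_i: v_i = e_i * (kf_i * upstream - kr_i * downstream).
KF = np.array([1.0, 1.0, 1.0])  # forward rate constants (assumed)
KR = np.array([0.5, 0.5, 0.5])  # reverse rate constants (assumed)
X0, X3 = 10.0, 0.0              # fixed boundary concentrations (assumed)


def steady_state_flux(e):
    """Solve v1 = v2 = v3 for the intermediates S1, S2 and return the pathway flux J."""
    a = np.array([[-(e[0] * KR[0] + e[1] * KF[1]), e[1] * KR[1]],
                  [e[1] * KF[1], -(e[1] * KR[1] + e[2] * KF[2])]])
    b = np.array([-e[0] * KF[0] * X0, -e[2] * KR[2] * X3])
    s1, _s2 = np.linalg.solve(a, b)
    return e[0] * (KF[0] * X0 - KR[0] * s1)


def flux_control_coefficients(e, h=1e-4):
    """Estimate C_i = dlnJ / dln(e_i) by finite differences in log space."""
    e = np.asarray(e, dtype=float)
    coeffs = np.empty(len(e))
    for i in range(len(e)):
        up, down = e.copy(), e.copy()
        up[i] *= 1 + h
        down[i] *= 1 - h
        d_ln_j = np.log(steady_state_flux(up)) - np.log(steady_state_flux(down))
        coeffs[i] = d_ln_j / (np.log(1 + h) - np.log(1 - h))
    return coeffs


enzymes = [1.0, 0.1, 1.0]  # enzyme 2 present at a low level (assumed)
C = flux_control_coefficients(enzymes)
print([round(float(c), 3) for c in C])               # [0.16, 0.8, 0.04]
print("rate-limiting step:", int(np.argmax(C)) + 1)  # rate-limiting step: 2
```

Because flux control coefficients sum to 1 (the summation theorem of metabolic control analysis), the step with the largest coefficient, here the one catalysed by the scarce enzyme 2, exerts most of the control over pathway flux.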

Fig. 5.1

An overview of different aspects of bioinformatics, from the knowledge-based, hypothesis-driven bottom-up approach to the top-down, statistics- and data-driven approach

On the other hand, data-driven bioinformatics emerged in the mid-1990s, driven by the demands of the Human Genome Project, which led to an explosion of high-throughput omics data. Advances in sequencing technology have dominated the development of bioinformatics for the acquisition, analysis, and management of tremendous volumes of biological data. This has been paralleled by advances in information technology, algorithms, and computational and statistical methods. Computationally intensive techniques, such as data mining [4], machine learning, visualisation [5], and pattern recognition, are indispensable, and bioinformatics software and tools are continuously improved for efficient access, analysis, and curation of heterogeneous datasets. Bioinformatics even encompasses solving problems arising from database management. Common sequence analyses include sequence alignment, genome assembly, gene prediction, and functional annotation, whereas gene and protein expression studies are based on abundance analysis; protein expression studies additionally rely on mass spectrometry to identify protein fragments. Image analysis provides important automated techniques for tracing subcellular molecular movement under the microscope and for tracking the phenotype of growing organs in real time. Protein structure prediction, a field of structural bioinformatics, is important for inferring structure-function relationships to understand molecular mechanisms or protein-protein and protein-metabolite interactions, which can be applied to drug design.
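Sequence alignment underlies many of the analyses listed above. As an illustration only, the sketch below implements the classic Needleman-Wunsch global alignment by dynamic programming, with simple assumed scores (match +1, mismatch -1, gap -1). Production aligners use substitution matrices, affine gap penalties, and heuristics for speed, so this is not how any particular tool works internally.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # leading gaps in b
    for j in range(1, cols):
        score[0][j] = j * gap  # leading gaps in a

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),  # (mis)match
                              score[i - 1][j] + gap,             # gap in b
                              score[i][j - 1] + gap)             # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, top, bottom = len(a), len(b), [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```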

Nowadays, the field of bioinformatics is largely data-driven. Computational modelling and simulation in network analysis have become increasingly important for integrating multi-omics data in the context of systems biology. Table 5.1 summarises the different aspects of bioinformatics.

Table 5.1 Different aspects of bioinformatics with examples

5.3 Bioinformatics for Systems Biology

Essentially, systems biology constitutes a crossover between knowledge-based modelling and omics data-driven approaches. Bioinformatics is a broad multidisciplinary field that is indispensable for systems biology, which deals with omics data, mathematical modelling, and network analysis. This is because the dynamic behaviour of biological systems is beyond intuitive human grasp, owing to the sheer number of components (biomolecules, cells, and drugs) that interact with each other. System-level understanding is only possible through computational models and simulations. Metabolic, gene regulatory, and protein-protein interaction networks are at the core of common systems studies, with many examples in E. coli [6, 7] and yeast [8,9,10]. Detailed descriptions and discussion are beyond the scope of this chapter; readers can refer to recent literature [11,12,13] for further details on the bioinformatics tools available for systems biology.
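As a small illustration of network analysis, a common first step on a protein-protein interaction network is to rank proteins by degree (number of distinct interaction partners), since highly connected hubs are often prioritised for follow-up. The sketch below assumes a plain edge list with placeholder identifiers (P1 to P6) and is not tied to any particular database or tool; richer analyses would use other centrality measures or dedicated network libraries.

```python
from collections import defaultdict


def degree_hubs(edges, top_n=3):
    """Rank nodes of an undirected interaction network by degree."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {node: len(partners) for node, partners in neighbours.items()}
    # Sort by descending degree, breaking ties alphabetically for reproducibility.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical protein-protein interactions (placeholder identifiers).
ppi_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
             ("P2", "P3"), ("P5", "P1"), ("P4", "P6")]
print(degree_hubs(ppi_edges))  # [('P1', 4), ('P2', 2), ('P3', 2)]
```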

5.4 Applications of Bioinformatics

In this section, examples of bioinformatics applications in integrative omics are described for molecular medicine, systems metabolic engineering, and plant genome-scale modelling.

5.4.1 Integrative Omics in Network Pharmacology

Network pharmacology is a new paradigm for drug design and discovery in the postgenomic era of molecular medicine [14]. It is based on the premise that one drug often targets many proteins and one protein can be targeted by many drugs. Hence, a combination of different drugs could act synergistically in treating complex diseases. This has also led to the current trend of drug repositioning or repurposing, whereby known drugs or compounds are applied to treat new diseases.

Network pharmacology relies on a multi-omics systems biology approach, which analyses various omics data together using bioinformatics tools [5, 15] to develop disease networks, drug-target networks, or drug-disease networks [16, 17]. One good example is the use of this approach to discover multicomponent drugs from traditional Chinese medicine (TCM) for multi-target therapy [18,19,20]. To achieve this, the TCM pattern of a disease can be identified using molecular network biomarkers and integrated with the pharmacological network of herbal formulas (Fig. 5.2).

Fig. 5.2

A conceptual framework of network pharmacology for multiple-compound drug discovery from TCM

Constructing the disease-TCM pattern molecular network depends on multi-omics data analysis of patients categorised by TCM pattern, based on expert consensus or literature analysis. Text mining of the SinoMed database helps identify TCM herbal combinations for treating diseases with specific TCM patterns. The proteins targeted by the active compounds of a TCM herbal formula, obtained from PubChem, are used to construct drug-target networks. Potential multiple-compound drug candidates can then be shortlisted from compound combinations that match well between the disease-TCM pattern molecular network and the pharmacological network of herbal formulas. This was not possible with the reductionist approaches of the past, as it requires a systems approach to network analysis and the computing resources to support it. A good example of TCM drug repositioning was reported recently, in which a systems pharmacology approach was used to discover the therapeutic use of Liuweiwuling for liver failure [21].
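The shortlisting step can be pictured as matching a drug-target network against a disease network. The sketch below uses one simple, assumed criterion: rank every pair of compounds by the fraction of disease-network genes covered by their combined targets. The compound names, gene identifiers, and the coverage score itself are hypothetical; the studies cited above use richer network-based matching than plain set overlap.

```python
from itertools import combinations


def rank_combinations(compound_targets, disease_genes, size=2):
    """Rank compound combinations by the fraction of disease genes their targets cover."""
    disease = set(disease_genes)
    ranked = []
    for combo in combinations(sorted(compound_targets), size):
        targets = set().union(*(compound_targets[c] for c in combo))
        coverage = len(targets & disease) / len(disease)
        ranked.append((combo, coverage))
    # Highest coverage first; ties broken by compound names for reproducibility.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Hypothetical drug-target network (compound -> protein targets) and disease network genes.
compound_targets = {
    "cpdA": {"G1", "G2"},
    "cpdB": {"G2", "G3"},
    "cpdC": {"G3", "G4"},
}
disease_genes = {"G1", "G2", "G3", "G4"}
for combo, coverage in rank_combinations(compound_targets, disease_genes):
    print(combo, coverage)
# ('cpdA', 'cpdC') 1.0
# ('cpdA', 'cpdB') 0.75
# ('cpdB', 'cpdC') 0.75
```

Here cpdA and cpdC together reach every disease gene, whereas either of the other pairs leaves one gene untargeted.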

5.4.2 Integrative Omics for Systems Metabolic Engineering

The emergence of ethnomedicine as an alternative form of disease treatment has increased the demand for natural products and bioactive compounds as drugs [22]. For example, the antimalarial drug artemisinin, from the TCM herb Artemisia annua, has driven the engineered production of its precursor, artemisinic acid, in yeast [23].

There is a growing trend of employing synthetic biology approaches to genetically engineer metabolic pathways in microbial systems to produce natural and synthetic compounds. For this purpose, bioinformatics plays a key role in the selection, synthesis, assembly, and optimisation of the parts (enzymes and regulatory elements), devices (pathways), and systems [24]. Furthermore, systems metabolic engineering often employs genome-scale models for flux analysis of the metabolic reconstruction [25]. Hence, fluxomics plays an important role in optimising the flux distribution towards target compound production. Genome-scale metabolic reconstructions also allow the effects of gene knockouts to be modelled. However, such reconstructions are dominated by microbes such as E. coli and S. cerevisiae. Many curated or predicted metabolic reconstructions can be found in the MetaCyc and BioCyc databases [26]; see http://systemsbiology.ucsd.edu/InSilicoOrganisms/OtherOrganisms for an updated list. This systems approach has accelerated the development of metabolic engineering, as exemplified by the use of E. coli for the production of terpenoids [27] and bioethanol [28].

Recently, multi-omics has become a common approach for the comprehensive understanding of different microbial strains, as each omics layer compensates for the limitations of the others (Fig. 5.3). The ultimate aim is to improve the titre, yield, and productivity of engineered microbial cell factories. For that purpose, multi-omics systems biology contributes to understanding the cellular metabolic status, genome-wide identification of knockout or overexpression targets, pathway prediction, and even enzyme design through computational structure prediction. For further discussion of systems metabolic engineering that integrates systems and synthetic biology with evolutionary engineering, readers can refer to the next chapter and to a recent review [29] and the references therein. Fondi and Liò (2015) provide a good review of tools for integrating multi-omics data into metabolic modelling pipelines [30].

Fig. 5.3

Overall framework of a metabolic modelling/reconstruction pipeline with multi-omics integration, computational simulation, biological validation, and iterative model refinement. Pre-existing genome annotation in public repositories provides information on the presence or absence of metabolic pathways and the overall metabolic capabilities of a microbe. For a novel microbe, a metabolic model can be generated from the closest related species with publicly available data, based on taxonomic information, or from de novo genome sequencing and assembly. Next, the different layers of datasets generated by different omics technologies can be integrated for computational simulation and phenotype prediction. Multi-omics integration provides a more comprehensive perspective on the microbe under study, statistically grounded inferences, novel questions to be addressed, or new target genes to be manipulated, possibly by reiterating the pipeline with experimental data to further refine the model.

5.4.3 Integrative Omics for Genome-Scale Modelling in Plants

As mentioned above, a genome-scale metabolic model (GEM) is an in silico metabolic flux model constructed from a metabolic network derived from genome annotation, including the stoichiometry of all known metabolic reactions. GEMs are often built and analysed with constraint-based flux (reaction rate) methods within defined system boundaries to bridge the modelled metabolic network structure and the observed metabolic processes. Constraints limit the possible flux values (the solution space) of the studied system and include mass balance, physicochemical and thermodynamic constraints, and actual flux measurements [31]. Flux balance analysis (FBA) is the most popular mathematical method for exploring the phenotypic solution space; it uses linear programming to find a flux distribution that optimises an objective, such as biomass production.
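To make the linear programme concrete, FBA maximises an objective flux (here a biomass reaction, v_bio) subject to the steady-state mass balance S·v = 0 and flux bounds lb ≤ v ≤ ub. The sketch below solves this with SciPy's `linprog` on a hypothetical five-reaction toy network; the network, bounds, and flux units are assumptions, and real GEMs are usually handled with dedicated constraint-based modelling packages.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network with internal metabolites A, B, C:
#   R1: -> A (uptake)   R2: A -> B   R3: A -> C   R4: B -> C   R5: B + C -> biomass
REACTIONS = ["R1", "R2", "R3", "R4", "R5"]
S = np.array([
    [1, -1, -1,  0,  0],  # A
    [0,  1,  0, -1, -1],  # B
    [0,  0,  1,  1, -1],  # C
])
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]  # uptake capped at 10 (assumed)


def run_fba(stoich, bounds, objective):
    """Maximise the flux of `objective` subject to S v = 0 and lb <= v <= ub."""
    c = np.zeros(stoich.shape[1])
    c[REACTIONS.index(objective)] = -1.0  # linprog minimises, so negate to maximise
    res = linprog(c, A_eq=stoich, b_eq=np.zeros(stoich.shape[0]),
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return -res.fun, res.x


growth, fluxes = run_fba(S, BOUNDS, "R5")
print(round(growth, 6))  # 5.0
```

Several flux distributions reach the same optimum here, because C can be made via R3 or via R4. This is typical of FBA: the optimal objective value is unique, but the flux vector often is not.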

GEMs allow the essentiality of metabolic steps to be assessed. This enables the prediction of gene targets for knockout or overexpression and is useful for flux optimisation and for designing rational metabolic engineering strategies, especially for microbial systems. It is more challenging to construct GEMs for higher organisms, especially plants, owing to the complexity of plant cells, including photosynthesis and photorespiration, compartmentation, tissue differentiation, diverse metabolic processes, and responses to endogenous (phytohormone) and environmental stimuli [31]. The first plant GEM was reported in 2009 for Arabidopsis thaliana cell suspension cultures [32]. Other selected examples and their significance are provided in Table 5.2. Modelling of previously neglected secondary metabolism is also gaining momentum, as the latest omics approaches fill in gaps in metabolomics and proteomics data, especially in medicinal plants that produce important bioactive compounds [33].
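Essentiality can be assessed in silico by simulating each knockout as a reaction whose flux bounds are set to zero and re-running FBA. The sketch below does this on a hypothetical five-reaction toy network with SciPy's `linprog`; the network, the bounds, and the 1% growth cut-off for calling a reaction essential are assumptions. Mapping reactions back to genes would additionally require gene-protein-reaction rules, which are omitted here.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: R1: -> A, R2: A -> B, R3: A -> C, R4: B -> C, R5: B + C -> biomass
REACTIONS = ["R1", "R2", "R3", "R4", "R5"]
S = np.array([[1, -1, -1,  0,  0],   # A
              [0,  1,  0, -1, -1],   # B
              [0,  0,  1,  1, -1]])  # C
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
BIOMASS = REACTIONS.index("R5")


def max_growth(bounds):
    """Optimal biomass flux by FBA for the given bounds (0.0 if the LP fails)."""
    c = np.zeros(len(REACTIONS))
    c[BIOMASS] = -1.0  # maximise biomass by minimising its negative
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return -res.fun if res.success else 0.0


def essential_reactions(cutoff=0.01):
    """Knock out each non-biomass reaction in turn; call it essential if predicted
    growth falls below `cutoff` x wild-type growth (assumed threshold)."""
    wild_type = max_growth(BOUNDS)
    essential = []
    for i, rxn in enumerate(REACTIONS):
        if i == BIOMASS:
            continue
        knockout = list(BOUNDS)
        knockout[i] = (0, 0)
        if max_growth(knockout) < cutoff * wild_type:
            essential.append(rxn)
    return essential


print(essential_reactions())  # ['R1', 'R2']
```

R2 is predicted essential because it is the only route to B, whereas R3 and R4 are redundant routes to C, so removing either one leaves optimal growth unchanged.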

Table 5.2 Selected examples of plant GEMs

Although GEMs are now possible in plants, challenges remain in filling in missing metabolic information and in integrating regulatory and signalling components into dynamic simulations. In this respect, multi-condition, single-platform omics studies such as transcriptomics will be useful for mapping gene expression data onto GEMs to generate condition-specific models that depict actual metabolic states more realistically. Similarly, quantitative proteomics can be applied to model system-level metabolic changes following experimental perturbations, assuming that gene expression or protein abundance correlates with metabolic fluxes. Incorporating multi-condition transcriptomics and proteomics data will enable condition-based simulations that include gene or protein regulation switching a pathway on or off. Lastly, metabolomics profiling under different conditions allows comprehensive identification of changes in metabolite composition, narrowing down target pathways for further (13C-based) fluxomics analysis under different experimental conditions. With multi-omics, multi-condition data, a more realistic dynamic GEM can be simulated to predict outcomes for various scenarios. In plants, GEMs of different tissues, such as root and shoot, can be integrated for whole-plant simulation [46]. With the integration of regulation into GEMs, we can gain important insights into plant metabolic plasticity for rational metabolic engineering to improve plant biomass production through higher tolerance and resistance to biotic and abiotic stresses.
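The text does not prescribe a specific method for mapping expression data onto a GEM. One published family of approaches (for example, E-Flux) caps each reaction's upper flux bound in proportion to the expression of its associated genes. The sketch below is a simplified, E-Flux-style illustration on a hypothetical five-reaction toy network; the gene names, gene-protein-reaction (GPR) rules, expression values, and normalisation to the highest reaction score are all assumptions, and treating AND as minimum and OR as sum is one common convention rather than the only one.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network: R1: -> A, R2: A -> B, R3: A -> C, R4: B -> C, R5: B + C -> biomass
REACTIONS = ["R1", "R2", "R3", "R4", "R5"]
S = np.array([[1, -1, -1,  0,  0],   # A
              [0,  1,  0, -1, -1],   # B
              [0,  0,  1,  1, -1]])  # C
BASE_BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000), (0, 1000)]
# Hypothetical GPR rules: outer list = isozymes (OR, summed),
# inner list = subunits of one enzyme complex (AND, minimum).
GPR = {"R2": [["g1"]], "R3": [["g2"]], "R4": [["g3a", "g3b"]]}


def reaction_scores(expression):
    """Combine gene expression values into one score per gene-associated reaction."""
    return {rxn: sum(min(expression[g] for g in complex_) for complex_ in rule)
            for rxn, rule in GPR.items()}


def condition_specific_growth(expression, max_flux=10.0):
    """Cap each gene-associated reaction's upper bound in proportion to its score
    (relative to the highest score), then maximise biomass (R5) by FBA."""
    scores = reaction_scores(expression)
    top = max(scores.values())
    bounds = list(BASE_BOUNDS)
    for rxn, score in scores.items():
        bounds[REACTIONS.index(rxn)] = (0, max_flux * score / top)
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index("R5")] = -1.0
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    return -res.fun


control = {"g1": 100, "g2": 100, "g3a": 100, "g3b": 100}  # hypothetical expression values
stress = {"g1": 100, "g2": 20, "g3a": 10, "g3b": 80}
print(round(condition_specific_growth(control), 6))  # 5.0
print(round(condition_specific_growth(stress), 6))   # 3.0
```

Under the hypothetical stress profile, low expression of g2 and of the g3a subunit restricts both routes to metabolite C, so the condition-specific model predicts lower biomass flux than the control model built on the same network.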

5.5 Case Study: Integrating Multi-Omics in Polygonum minus

Over the past 10 years, extensive studies using different omics approaches have been performed on the aromatic herb Polygonum minus, as described in previous chapters. Much has been learnt about the transcriptomes [47,48,49] and metabolomes [50,51,52] of different P. minus tissues, as well as its molecular responses to elicitors [53,54,55]. The integration of transcriptomics and metabolomics studies [56] allows the reconstruction of secondary metabolite biosynthetic pathways. This also helps elucidate the global gene reprogramming that results in compositional changes of volatile organic compounds (VOCs) in response to elicitation or other environmental factors. Furthermore, the established transcriptome sequences provide a reference for identifying proteins in shotgun proteomics through the proteomics informed by transcriptomics (PIT) approach [57].
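In PIT-type workflows, assembled transcripts are translated into candidate protein sequences that serve as the search database for shotgun proteomics. A minimal sketch of that translation step is shown below, assuming the standard genetic code and keeping only the longest methionine-initiated open reading frame per transcript across all six frames. The function names, the 30-residue cut-off, and the FASTA layout are illustrative assumptions; dedicated ORF-prediction tools are normally used in practice.

```python
# Standard genetic code (NCBI table 1), listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon; codons containing ambiguous bases become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def longest_orf_peptide(transcript):
    """Longest M-initiated peptide over all six frames. Segments running off the 3' end
    without a stop codon are kept, since assembled transcripts may be incomplete."""
    seq = transcript.upper().replace("U", "T")
    best = ""
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best


def build_search_database(transcripts, min_length=30):
    """Return FASTA text with one predicted peptide per transcript (hypothetical cut-off)."""
    records = []
    for name, seq in transcripts.items():
        peptide = longest_orf_peptide(seq)
        if len(peptide) >= min_length:
            records.append(f">{name}\n{peptide}")
    return "\n".join(records)


print(longest_orf_peptide("ATGGCCTAA"))     # MA
print(longest_orf_peptide("TCAAAATTTCAT"))  # MKF (found on the reverse strand)
```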

The general research framework for integrating multi-omics results in P. minus is shown in Fig. 5.4. This framework is applicable to other plants or organisms without a reference genome, particularly tropical medicinal plants, which have scarce sequence information and limited knowledge on the production of bioactive compounds. By elucidating the genes and enzymes involved in secondary metabolite biosynthetic pathways, metabolic engineering in microbial systems becomes possible through a synthetic biology approach (described in the next chapter). Hence, integrative omics through a systems biology approach provides a fundamental blueprint for large-scale production of targeted compounds through microbial bioengineering.

Fig. 5.4

Research framework for the integration of multi-omics studies in P. minus