1 Background

Rapid industrialization has led to the production of thousands of chemical compounds that pollute air and soil. Some of these are highly toxic and persist in the environment, posing a major threat to living organisms. It is therefore important to find techniques that either remove these contaminants or convert them into nonhazardous, environmentally benign products. Bioremediation achieves this by exploiting the enzymatic capabilities of microorganisms, which break down toxic chemical compounds into end products or metabolites that are no longer toxic. Because such degradation is carried out by particular microbes, knowledge of the properties of the toxic chemicals, such as their classification, identification, environmental properties, toxicity, and distribution, can enhance the biodegradation process. The technique has the potential to restore contaminated environments effectively at low cost and with little labor. However, the factors that control microbial growth and metabolism are still not completely understood, which restricts its implementation.

Bioinformatics, which has become an essential part of life science research, has also given new direction to bioremediation. The development of software packages and computational biology tools has made it possible to integrate bioinformatics with bioremediation. Over the last few decades, branches of bioinformatics such as genomics, proteomics, transcriptomics, and metabolomics have contributed substantially to the exploration of bioremediation processes.

Hence, bioinformatics, with its multidisciplinary approach, has assisted in understanding bioremediation by unveiling previously undisclosed pathways and the chemistry of toxic chemicals, helping to make it a practical process for controlling environmental contamination. The aim of this chapter is to provide an overview of bioinformatic approaches and their applications in relation to bioremediation.

2 Introduction

Bioremediation is the deliberate use of microorganisms, which act as biological catalysts, to remove pollutants from the environment. More generally, it is an environmental science approach in which natural biological activity is used to remediate polluted groundwater and contaminated soil. A variety of pollutants, such as xenobiotics, polycyclic aromatic hydrocarbons (PAHs), and chlorinated and nitro-aromatic compounds, are present in the environment and can be carcinogenic and mutagenic to living organisms (Zhang and Bennett 2005; Samanta et al. 2002). Using microbes for biodegradation helps to maintain natural environmental conditions efficiently. The organisms involved in bioremediation (bacteria, fungi, insects, worms, etc.) thus help to keep the planet green.

Microbes generally carry out bioremediation by metabolizing a compound into another metabolite that is not harmful to the environment. Degradation of pollutants may proceed under biotic or abiotic conditions. Bioremediation can be implemented through a number of known processes, such as bioventing, biopiles, bioaugmentation, biostimulation, and bioattenuation. It is effective only where environmental conditions permit microbial growth and activity, and its application often involves manipulating environmental parameters so that microbial growth and degradation proceed at a faster rate (Kumar et al. 2011; Abatenh et al. 2017).

Being a natural process, it is cheap, harmless to the ecosystem, requires little labor, and is eco-friendly and sustainable (Dell Anno et al. 2012).

Thus, bioremediation is an environmentally friendly approach for restoring and sustaining a contamination-free environment for future generations.

2.1 Introduction to Bioinformatics

Bioinformatics combines biology and information technology and requires knowledge of both. The field involves the computer-based analysis of biological datasets, followed by their interpretation, using statistical tools and algorithms.

To understand bioinformatics and its applications, it is important to know the various approaches used for analysis. These include genomics, proteomics, data mining, biological databases, phylogenetic analysis, and systems biology (transcriptomics, metabolomics), all of which play a significant role together. Figure 27.1 shows these branches.

Fig. 27.1

Branches of bioinformatics

2.2 Integrating Bioinformatics with Bioremediation

The role of microbes in soil- and water-based biodegradation and environmental clean-up has shown us a way to maintain and sustain a greener earth. Applying bioinformatics to the study of bioremediation has already produced promising results. Bioinformatic applications make it possible to perform in silico studies and data analysis. Advancing bioremediation requires studying specific microbes at the molecular level, including gene-to-gene interactions and the conditions required for changes at the genetic level, and this is feasible only with bioinformatic strategies. The bioremediation process can also be enhanced by using databases for gene identification and for microbial degradation pathways of compounds (Ellis et al. 2001).

Thus, bioinformatics and its branches are transforming bioremediation and are expected to continue to do so. Figure 27.2 depicts the use of bioinformatic approaches to improve the bioremediation process.

Fig. 27.2

Pictorial representation of integrated approach of advanced technologies in biodegradation of xenobiotic compounds. (Source: Mishra, S., Lin, Z., Pang, S., Zhang, W., Bhatt, P., & Chen, S. (2021). Recent Advanced Technologies for the Characterization of Xenobiotic-Degrading Microorganisms and Microbial Communities. Frontiers in bioengineering and biotechnology, 9, 632059. https://doi.org/10.3389/fbioe.2021.632059. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC-BY). The use, distribution, or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice)

3 Bioinformatics in Improving Bioremediation

Although microbes are known for their potential to perform biodegradation, the process has its limitations. This is because of the scarcity of data on the factors that control the growth and metabolism of microbes with bioremediation potential (Dua et al. 2002). Bioinformatics helps here by making use of microarray data and by enhancing the structural characterization of microbial proteins capable of degrading contaminants (Singh 2006).

Hence, by understanding microbial processes at the molecular level through bioinformatic analyses, we can learn more about the following aspects of bioremediation:

  1. Prediction of Degradation Pathways

  2. Omic-Based Approaches

  3. Prediction of Toxicity of Chemicals

  4. Databases

3.1 Prediction of Degradation Pathways

In bioremediation, a microbe carries out enzymatic reactions that convert the pollutant into a harmless metabolite. Studying the enzyme kinetics is therefore important, including the physical and chemical characteristics of the degradation pathway (Okoh 2006).

Predicting the products and pathways of microbial degradation in silico requires a classification approach (Wicker et al. 2010). Such approaches can be knowledge-based or machine learning-based, and each has its own strengths and limitations. Consider first the machine learning approach:

This approach predicts whether a compound undergoes a biotransformation of a fairly general class (Gomez et al. 2007) or whether it is a substrate of some broad reaction class, e.g., oxidoreductase-catalyzed reactions (Mu et al. 2006).

Next are the knowledge-based approaches:

  • META: META is a knowledge-based expert system that simulates the biotransformation of xenobiotics. It operates with the help of a dictionary (knowledge base) to seek target fragments in a compound and transform them to products (Klopman et al. 1997).

  • METEOR: It is a knowledge-based expert system for prediction of metabolism (Marchant et al. 2008).

  • CATALOGIC: It is a platform for models targeting environmental fate of chemicals. It explicitly aims at probability estimates (Dimitrov et al. 2010).

  • UM-PPS: The University of Minnesota Pathway Prediction System (UM-PPS) is part of the UM-BBD (University of Minnesota Biocatalysis/Biodegradation Database). It is available at http://umbbd.msi.umn.edu/predict/. At the time of the cited report, it contained information on almost 1200 compounds, over 800 enzymes, almost 1300 reactions, and almost 500 microorganism entries (Gao et al. 2011).

The UM-PPS predicts plausible biodegradation pathways for organic compounds on the basis of sets of biotransformation rules derived from the UM-BBD database or from the scientific literature (Fenner et al. 2008). The user can predict both aerobic and anaerobic degradation pathways of chemicals and can choose to view all transformations or only the more likely aerobic ones. Predictions are most accurate for compounds that are similar to compounds whose biodegradation pathways have been reported in the scientific literature (Gao et al. 2011; Arora and Bae 2014).

Usage

  1. Predictions can be made for both aerobic and anaerobic degradation pathways of chemicals, and the user can choose to view all transformations or only the more likely aerobic ones.

  2. The most accurate predictions are obtained for compounds similar to those whose biodegradation pathways have been reported in the scientific literature.

  3. To run a prediction, users enter a compound into the system either by drawing the structure and generating SMILES or by entering SMILES directly.

  4. For example, the degradation pathways of 4-nitrophenol have been thoroughly investigated, while those of 2-fluoro-4-nitrophenol and 2-bromo-4-nitrophenol have not. However, the structures of 2-fluoro-4-nitrophenol and 2-bromo-4-nitrophenol are similar to that of 4-nitrophenol. Therefore, the PPS can provide very accurate predictions for the degradation of 2-fluoro-4-nitrophenol and 2-bromo-4-nitrophenol (Arora and Bae 2014).
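The rule-based idea behind knowledge-based systems such as META and the UM-PPS can be illustrated with a small sketch. It is only a toy under stated assumptions: compounds are reduced to tuples of functional-group tags, and the rule names and tags in `RULES` are hypothetical and are not taken from the UM-BBD rule set. The sketch repeatedly applies every matching rule and collects the resulting multi-step pathways.

```python
from collections import deque

# Hypothetical rule base: each rule rewrites one functional-group tag into
# another. Names and tags are illustrative, NOT actual UM-PPS or META rules.
RULES = {
    "nitro_reduction": ("nitro", "amino"),
    "dehalogenation": ("halide", "hydroxyl"),
    "deamination": ("amino", "hydroxyl"),
}


def apply_rules(compound):
    """Yield (rule_name, product) for every rule that matches the compound."""
    for name, (old, new) in sorted(RULES.items()):
        if old in compound:
            idx = compound.index(old)
            product = compound[:idx] + (new,) + compound[idx + 1:]
            yield name, tuple(sorted(product))


def predict_pathways(start, max_steps=3):
    """Breadth-first expansion of all rule applications up to max_steps."""
    pathways = []
    queue = deque([(tuple(sorted(start)), [])])
    while queue:
        compound, steps = queue.popleft()
        successors = list(apply_rules(compound)) if len(steps) < max_steps else []
        if not successors:
            # No rule applies (or depth limit reached): record the pathway.
            pathways.append((steps, compound))
            continue
        for name, product in successors:
            queue.append((product, steps + [name]))
    return pathways


if __name__ == "__main__":
    # Toy stand-in for a halogenated nitrophenol.
    for steps, end in predict_pathways(("hydroxyl", "nitro", "halide")):
        print(" -> ".join(steps), "=>", end)
```

A real prediction system matches rules against chemical substructures rather than tags and attaches likelihoods to each rule, which is how the number of predicted transformations is kept manageable.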

3.1.1 PathPred

PathPred is a knowledge-based prediction system that uses data derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) in the form of the KEGG REACTION and KEGG RPAIR databases. The KEGG RPAIR database contains a collection of biochemical structure transformation patterns, called RDM patterns, and chemical structure alignments of substrate–product pairs (reactant pairs) in all known enzyme-catalyzed reactions taken from the enzyme nomenclature and the KEGG PATHWAY database (Moriya et al. 2010).

It is a web-based server available at http://www.genome.jp/tools/pathpred/. It predicts plausible pathways of multi-step reactions starting from a query compound, based on the local RDM pattern match and the global chemical structure alignment against the reactant pair library. The server provides transformed compounds and reference transformation patterns in each predicted reaction and displays all predicted multi-step reaction pathways in a tree-shaped graph (Moriya et al. 2010). Its main aim is to predict pathways for the microbial biodegradation of environmental compounds and the biosynthesis of plant secondary metabolites.

3.1.1.1 Usage

The PathPred server can be used for predicting microbial biodegradation pathways of xenobiotics in bacteria and biosynthesis pathways of secondary metabolites in plants. This is done in the following steps:

  1. Selecting Reference Pathway: the user chooses the reference pathway, either biosynthesis or biodegradation, which determines the subset of RDM patterns to be used.

  2. Query Format: the query compound can be entered in the MDL mol file format, in the SMILES representation, or by its KEGG compound/drug identifier (C/D number). This compound, termed the initial compound, corresponds to the compound to be degraded or the compound to be synthesized.

  3. Output: The PathPred server shows the prediction results as a tree-shaped graph. For example, in the biodegradation prediction for 1,2,3,4-tetrachlorobenzene, the predicted pathway ends at glycolate (C00160). The output tree also shows other possible pathways, including biodegradation through known compounds such as 3,4,6-trichlorocatechol (C12831), 6-chlorobenzene-1,2,4-triol (C06328), and 1,2,4-trichlorobenzene (C06594) (Fig. 27.3).
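The three query formats accepted by PathPred can be told apart with simple checks before a query is submitted. The following is a minimal sketch, not part of PathPred itself: the function name `detect_query_format` is hypothetical, and the heuristics assume that KEGG compound/drug identifiers consist of the letter C or D followed by five digits and that MDL mol files contain a `V2000`/`V3000` counts line or an `M  END` terminator. Anything else is treated as SMILES without validation.

```python
import re

# KEGG compound (C) or drug (D) identifier: one letter followed by five digits.
KEGG_ID = re.compile(r"^[CD]\d{5}$")


def detect_query_format(query: str) -> str:
    """Classify a query string as 'kegg_id', 'molfile', or 'smiles'.

    Hypothetical helper for illustration; it does not check that a SMILES
    string is chemically valid.
    """
    text = query.strip()
    if KEGG_ID.match(text):
        return "kegg_id"
    if "M  END" in text or "V2000" in text or "V3000" in text:
        return "molfile"
    return "smiles"


if __name__ == "__main__":
    print(detect_query_format("C00160"))                # KEGG identifier
    print(detect_query_format("Clc1ccc(Cl)c(Cl)c1Cl"))  # SMILES string
```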

Fig. 27.3

Example of the predicted pathway tree of tetrachlorobenzene biodegradation (a) and the detail of the top green pathway from the query compound (query) to the final compound (C00160) (b). Structure images popup when the mouse is moved over nodes and edges in the tree if JavaScript is enabled in the web browser. (Source: Moriya Y, D. Shigemizu, M. Hattori, T. Tokimatsu, M. Kotera, S. Goto, Minoru Kanehisa, PathPred: an enzyme-catalyzed metabolic pathway prediction server, Nucleic Acids Research, Volume 38, Issue suppl_2, 1 July 2010, Pages W138–W143, https://doi.org/10.1093/nar/gkq318. Reused with Licence Number 5125780225626 dated August 11, 2021)

3.1.2 BNICE

BNICE stands for Biochemical Network Integrated Computational Explorer, a computational approach developed to generate every possible biochemical reaction from a set of starting compounds and a set of enzyme reaction rules based on the Enzyme Commission (EC) classification (Finley et al. 2009). In general, it predicts whether a particular compound is biodegradable and whether alternate routes can be engineered for compounds already known to be biodegradable.

BNICE screens all possible pathways for thermodynamic feasibility based on the Gibbs free energies of the reactions and selects the novel pathways that are thermodynamically feasible (Soh and Hatzimanikatis 2010). It has been used (1) to study the combinatorial nature of polyketide synthesis (Gonzalez-Lergier et al. 2005), (2) to provide a systematic framework for linking enzymatic chemistry and the reactive sites of metabolic compounds (Hatzimanikatis et al. 2004), and (3) to predict biodegradation pathways of compounds that represent various classes of xenobiotics.

Soh and Hatzimanikatis (2010) have also suggested that the pathways generated by BNICE can be further evaluated using established pathway analysis approaches, such as thermodynamics-based flux balance analysis (FBA) and GrowMatch, which allow investigation of the overall effects of these novel pathways on metabolic network performance in host organisms. FBA can help predict maximum yield, phenotypic changes, effects of gene knockouts, and changes in the bioenergetics of the system for metabolic engineering and synthetic biology (Soh and Hatzimanikatis 2010).

3.1.2.1 Usage

The BNICE framework searches for pathways by considering the starting compound and/or products, the requested length of the pathway, and the range of reactions to search over (Henry et al. 2010; Medema et al. 2012).

The user can also choose the scope of the search: enzyme reactions from known pathways, a combination of multiple pathways, or the whole metabolic network (Henry et al. 2010; Hatzimanikatis et al. 2005). A set of molecules is given as input, and every molecule is evaluated to determine whether it has the appropriate functionality to undergo reactions corresponding to the specified reaction classes (Bashir Sajo and Mohd 2015).

Because the system relies on few criteria, BNICE can predict more than 10,000 different pathways for the biosynthesis and degradation of a compound of interest. To address this, Henry et al. pioneered a prioritization approach within this framework, in which the generated pathways are ranked according to four criteria: pathway length, thermodynamic feasibility, maximum achievable yield, and maximum achievable activity.
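A ranking over the four criteria can be sketched as follows. This is only an illustration under assumptions: the source does not state how the criteria are combined, so the sketch discards pathways with a non-negative overall Gibbs free energy change and then sorts lexicographically. The `Pathway` fields and all numeric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pathway:
    name: str
    length: int          # number of reaction steps
    delta_g: float       # overall Gibbs free energy change, kJ/mol (made up)
    max_yield: float     # maximum achievable yield (made up)
    max_activity: float  # maximum achievable activity, arbitrary units


def rank_pathways(pathways):
    """Drop thermodynamically unfavourable pathways, then rank the rest.

    Assumed ordering: shorter first, then more negative delta_g,
    then higher yield, then higher activity.
    """
    feasible = [p for p in pathways if p.delta_g < 0]
    return sorted(
        feasible,
        key=lambda p: (p.length, p.delta_g, -p.max_yield, -p.max_activity),
    )


CANDIDATES = [
    Pathway("p1", 5, -40.0, 0.8, 1.2),
    Pathway("p2", 3, -25.0, 0.6, 0.9),
    Pathway("p3", 3, 12.0, 0.9, 2.0),   # infeasible: positive delta_g
    Pathway("p4", 3, -25.0, 0.7, 0.5),
]

if __name__ == "__main__":
    for p in rank_pathways(CANDIDATES):
        print(p.name, p.length, p.delta_g, p.max_yield, p.max_activity)
```

A weighted score would be an equally reasonable choice; the lexicographic key is used here only because it keeps the effect of each criterion easy to follow.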

Output: The output of BNICE is a graph-theoretic matrix representation of biochemical compounds, enzyme reaction rules, and molecules. Molecules are represented using the bond-electron matrix (BEM), in which each atom in a molecule is represented by a row and a column. The diagonal elements of the BEM denote the non-bonded valence electrons of each atom, and the off-diagonal elements give the connectivity between atoms and the order of the bond between them (Hatzimanikatis et al. 2005).
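As a concrete illustration of the bond-electron matrix, the sketch below writes out a BEM for formaldehyde (H2C=O) by hand and checks two properties that follow from the definition: the matrix is symmetric, and each row sums to the number of valence electrons of the corresponding atom. The molecule, the atom ordering, and the helper names are choices made for this example and are not taken from BNICE.

```python
# Atom order for formaldehyde, H2C=O: carbon, oxygen, two hydrogens.
ATOMS = ["C", "O", "H", "H"]
VALENCE_ELECTRONS = {"C": 4, "O": 6, "H": 1}

# Diagonal: non-bonded valence electrons. Off-diagonal: bond order.
BEM = [
    [0, 2, 1, 1],  # C: no lone electrons, C=O double bond, two C-H bonds
    [2, 4, 0, 0],  # O: two lone pairs (4 electrons), C=O double bond
    [1, 0, 0, 0],  # H
    [1, 0, 0, 0],  # H
]


def is_symmetric(matrix):
    n = len(matrix)
    return all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))


def rows_match_valence(atoms, matrix):
    """Each row sum should equal the atom's valence electron count."""
    return all(sum(row) == VALENCE_ELECTRONS[a] for a, row in zip(atoms, matrix))


def bonds(matrix):
    """List (i, j, bond_order) for every bonded atom pair with i < j."""
    n = len(matrix)
    return [(i, j, matrix[i][j])
            for i in range(n) for j in range(i + 1, n) if matrix[i][j] > 0]


if __name__ == "__main__":
    print(is_symmetric(BEM), rows_match_valence(ATOMS, BEM))
    print(bonds(BEM))
```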

3.1.3 DESHARKY

DESHARKY is a Monte Carlo algorithm that finds a metabolic pathway from a target compound by exploring a database of enzymatic reactions. Instead of selecting pathways by enumerating all possible metabolic routes, it predicts a possible route connecting the specified target metabolism with the host metabolism. It finds a pathway in a short time and computes the associated genetic burden. It can also be used in distributed computing to sample most of the solution space (Rodrigo et al. 2008).

3.1.3.1 Usage

The algorithm is implemented in C/C++, and it is easily compiled and run in a UNIX environment (e.g., in Linux or in Windows using Cygwin). The algorithm calculates thermodynamic favorability and the energy lost in transcription and translation.

Input: The input of the algorithm is usually the target compound.

Output: The output is the designed metabolic pathway together with quantification of the transcriptional, translational, and metabolic load. It also provides the amino acid sequences of the enzymes involved in the pathway. The sequences provided are usually the ones phylogenetically closest to Escherichia coli according to the KEGG classification of organisms (Rodrigo et al. 2008).
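The Monte Carlo idea of sampling routes instead of enumerating them can be sketched in a few lines. This is not the DESHARKY implementation: the reaction database `REACTIONS`, the host metabolite set `HOST`, and all compound and enzyme names are hypothetical, and the number of heterologous enzymes is used as a crude stand-in for the genetic burden that DESHARKY quantifies.

```python
import random

# Hypothetical database, read backwards from the target:
# compound -> list of (enzyme, precursor) pairs that can produce it.
REACTIONS = {
    "target": [("enzA", "m1"), ("enzB", "m2")],
    "m1": [("enzC", "host_pyruvate")],
    "m2": [("enzD", "m3")],
    "m3": [("enzE", "host_acetylCoA"), ("enzF", "m2")],
}
HOST = {"host_pyruvate", "host_acetylCoA"}  # metabolites native to the host


def random_route(target, rng, max_steps=10):
    """One random walk from the target back to any host metabolite."""
    route, current = [], target
    for _ in range(max_steps):
        if current in HOST:
            return route
        options = REACTIONS.get(current)
        if not options:
            return None  # dead end: no known reaction produces this compound
        enzyme, precursor = rng.choice(options)
        route.append((enzyme, current, precursor))
        current = precursor
    return route if current in HOST else None


def sample_routes(target, n_samples=200, seed=0):
    """Sample many walks and keep the one needing the fewest enzymes."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_samples):
        route = random_route(target, rng)
        if route is not None and (best is None or len(route) < len(best)):
            best = route
    return best


if __name__ == "__main__":
    for enzyme, product, precursor in sample_routes("target"):
        print(f"{precursor} --{enzyme}--> {product}")
```

Because each walk is independent, the sampling loop can be split across machines, which is the property that makes this kind of search suitable for distributed computing.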

3.1.4 FMM

FMM (From Metabolite to Metabolite) is a web server that is freely available at http://FMM.mbc.nctu.edu.tw/. It can reconstruct metabolic pathways from one metabolite to another among different species, based mainly on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database and other integrated biological databases (Chou et al. 2009). Even though KEGG maps are utilized in many metabolic tools, none of them can connect metabolites from different KEGG maps. FMM supports the connection of different KEGG maps.

FMM has many applications in synthetic biology and metabolic engineering. For example, the reconstruction of metabolic pathways to produce valuable metabolites or secondary metabolites in bacteria or yeast is a promising strategy for drug production. Through its comparative pathway analysis, FMM provides a highly effective way to elucidate which genes from which species should be cloned into those microorganisms (Chou et al. 2009).

3.1.4.1 Usage

  1. Data collection and integration: Reaction definitions, species-specific reactions, reaction maps, and enzyme lists can be obtained from recent releases of the KEGG/LIGAND and KEGG/PATHWAY databases. Information such as gene names, enzyme commission numbers, and species-specific enzymes can be retrieved from the UniProtKB/Swiss-Prot and NCBI taxonomy databases. The data in FMM are usually updated on a regular basis.

  2. Construction of reaction matrix: Information on reactions and enzymes can be obtained from KEGG maps, and the equation of each reaction can be determined. Reaction matrices can therefore be constructed based on maps, reactions, and enzyme data. The FMM workflow in Fig. 27.4 shows the reaction matrix, which was developed to identify numerous reaction processes from one metabolite to another. Enzyme annotations from UniProtKB/Swiss-Prot (Boutet et al. 2007) are employed to identify enzymes from different species in comparative analysis.

  3. Reconstruction of metabolic pathway from various KEGG pathway maps: After all possible reaction paths are identified, the number of pathway maps is calculated. The paths found usually occur not in a single pathway map but, in a complicated fashion, across several maps. Pathway maps that contain the most paths are selected, and a pathway map that contributes only one reaction is avoided. A matrix of maps versus reactions is employed to reconstruct the metabolic pathway from different KEGG maps.

  4. Comparative analysis: The comparative analysis provided in FMM is useful in synthetic biology because it offers an easy way to elucidate which genes from which species should be cloned into those microorganisms. First, the enzymes identified in the reconstructed pathway are processed to search for orthologous encoding genes from various species. The presence or absence of the pathway in a particular species can then be determined.
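The maps-versus-reactions matrix described in step 3 can be sketched as follows. The reaction path `PATH`, the map names, and the map memberships in `MAPS` are hypothetical, and the greedy selection is an assumption made for illustration (the source does not specify the exact selection procedure used by FMM).

```python
# Hypothetical reaction path found between two metabolites.
PATH = ["R1", "R2", "R3", "R4"]

# Hypothetical membership of those reactions in pathway maps.
MAPS = {
    "mapA": {"R1", "R2"},
    "mapB": {"R2", "R3", "R4"},
    "mapC": {"R4"},
}


def build_matrix(maps, path):
    """Rows are maps, columns are reactions; 1 means the map holds the reaction."""
    return {name: [1 if r in members else 0 for r in path]
            for name, members in maps.items()}


def select_maps(maps, path):
    """Greedy cover: repeatedly take the map with the most uncovered reactions."""
    uncovered = set(path)
    chosen = []
    while uncovered:
        best = max(sorted(maps), key=lambda name: len(maps[name] & uncovered))
        gained = maps[best] & uncovered
        if not gained:
            break  # remaining reactions are in none of the maps
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


if __name__ == "__main__":
    for name, row in build_matrix(MAPS, PATH).items():
        print(name, row)
    print(select_maps(MAPS, PATH))
```

In this toy case the path is covered by two maps, and the map that contributes only a single reaction is never needed.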

Fig. 27.4

Workflow of FMM. (Source: Chou, C. H., Chang, W. C., Chiu, C. M., Huang, C. C., & Huang, H. D. (2009). FMM: a web server for metabolic pathway reconstruction and comparative analysis. Nucleic acids research, 37(Web Server issue), W129–W134. https://doi.org/10.1093/nar/gkp264. Reproduced with License no 5125960782113 dated August 11, 2021)

3.1.5 RetroPath

RetroPath is a server that applies a retrosynthetic approach, a concept originally proposed for synthetic chemistry. Starting from the desired target compound, it uses reverse chemical transformations (reverse enzyme-catalyzed reactions in the metabolic space) to identify reactants (precursors) that are indigenous to the selected host (Carbonell et al. 2012). It is available at http://www.issb.genopole.fr/~faulon/retropath.php.

This method of metabolic pathway design is unique because it addresses the complexity problem by coding substrates, products, and reactions into molecular signatures. In RetroPath, metabolic maps are represented as hypergraphs. The complexity of the reactions is controlled by varying the specificity of the molecular signature. Each signature has a “height,” h, that corresponds to a level of structural detail. The height can be varied to reduce the number of reactions that can be generated (Carbonell et al. 2011).
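The effect of the signature height can be illustrated with a simplified sketch. It is not the canonical molecular signature algorithm used by RetroPath: here an atom's signature is just its element followed by the sorted elements found at each bond distance up to h, and the molecule is the heavy-atom skeleton of ethanol. The function names and the encoding are assumptions made for this example.

```python
from collections import Counter

# Heavy-atom graph of ethanol (hydrogens omitted): C0 - C1 - O2
ELEMENTS = {0: "C", 1: "C", 2: "O"}
BONDS = {0: [1], 1: [0, 2], 2: [1]}


def atom_signature(root, height):
    """Element of the root, then the sorted elements at distance 1..height."""
    layers, seen, frontier = [ELEMENTS[root]], {root}, [root]
    for _ in range(height):
        next_frontier = []
        for atom in frontier:
            for neighbour in BONDS[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        layers.append("".join(sorted(ELEMENTS[a] for a in next_frontier)))
        frontier = next_frontier
    return "|".join(layers)


def molecular_signature(height):
    """Multiset of atom signatures for the whole molecule."""
    return Counter(atom_signature(atom, height) for atom in ELEMENTS)


if __name__ == "__main__":
    for h in (0, 1):
        print(h, dict(molecular_signature(h)))
```

At h = 0 the two carbon atoms are indistinguishable, whereas at h = 1 every atom has its own signature. A more specific description matches fewer substrates, which is why increasing the height limits the number of reactions that are generated.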

The proliferation of metabolic databases with rich information is considered a significant breakthrough. KEGG, a database resource that integrates genomic, chemical, and systemic functional information, is linked to RetroPath, so information on the reactions predicted using this framework can be found in KEGG. BRENDA (Schomburg et al. 2013) is another database, containing one of the largest collections of functional enzyme data. Incomplete knowledge or gaps still exist in many cases, especially when looking for novel ways to synthesize a target compound of interest (Carbonell et al. 2013).

To successfully achieve a heterologous pathway design, the process needs to be rationalized by following the principles of synthetic biology: modeling of the biological system of interest, modular design through standardization, goal-oriented optimization, and experimental validation (Carbonell et al. 2013).

3.1.5.1 Usage

Carbonell et al. (2013) suggested that the retrosynthetic design of heterologous pathways requires the following steps: (1) host chassis selection, (2) in silico model selection for the chassis from BiGG (Schellenberger et al. 2010) or BioModels (Le Novere et al. 2006), (3) definition of the metabolic space, (4) pathway enumeration, (5) gene selection, (6) estimation of yields by metabolic analysis software, e.g., COBRA, OptFlux (Rocha et al. 2010), and COPASI (Hoops et al. 2006; Schaber 2012), (7) toxicity prediction of pathway metabolites, (8) definition of an objective function to select the best pathway to engineer, and (9) pathway implementation and validation (Fehér et al. 2014).

3.1.6 Metabolic Tinker

Metabolic Tinker is a web tool used to design synthetic metabolic pathways between user-defined target and source compounds. The interface is available at http://osslab.ex.ac.uk/tinker.aspx. It uses a tailored heuristic search strategy to search for thermodynamically feasible paths in the entire known metabolic universe. The program contains a directed graph known as the universal reaction network (URN), which represents the entire set of known reactions and compounds from the Rhea database (McClymont and Soyer 2013). Nodes and edges in this graph represent metabolites and reactions, respectively, and thus the entire graph represents the currently known metabolic universe. The tool searches for possible biochemical paths between two compounds within this URN using standard search algorithms developed in computer science and graph theory (McClymont and Soyer 2013). To run the search, the Rhea/ChEBI identification codes of both the source and target compounds are needed.
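A search over a directed reaction graph with a thermodynamic filter can be sketched with a plain breadth-first search. This is a simplification and not the tailored heuristic used by Metabolic Tinker: the graph `URN`, its compound and reaction labels, and the free energy values are hypothetical, and an edge is treated as feasible when its Gibbs free energy change does not exceed a threshold.

```python
from collections import deque

# Hypothetical universal reaction network:
# compound -> list of (reaction_id, product, delta_g in kJ/mol).
URN = {
    "A": [("r1", "B", -12.0), ("r2", "C", 8.5)],
    "B": [("r3", "D", -3.1)],
    "C": [("r4", "D", -20.0)],
    "D": [],
}


def find_path(graph, source, target, max_delta_g=0.0):
    """Shortest reaction path from source to target using feasible edges only."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for reaction, product, delta_g in graph.get(node, []):
            if delta_g <= max_delta_g and product not in seen:
                seen.add(product)
                queue.append((product, path + [reaction]))
    return None  # no feasible path exists


if __name__ == "__main__":
    print(find_path(URN, "A", "D"))
    print(find_path(URN, "A", "C"))
    print(find_path(URN, "A", "C", max_delta_g=10.0))
```

Relaxing the threshold admits the otherwise excluded reaction `r2`, which shows how the thermodynamic constraint prunes the search space.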

3.1.7 Carbon Search

Carbon Search is an algorithm-based approach that identifies pathways within existing metabolic networks by tracking the conservation of atoms moving through them. Two algorithms were developed on this basis; they find metabolic pathways by using atom mapping data to track the movement of atoms through metabolic networks. One algorithm finds linear pathways, and the other finds branched pathways. Both take as input atom mapping data, a start compound, a target compound, a minimum number of atoms to conserve, and a maximum number of pathways to return (Heath et al. 2010). The output is a set of metabolic pathways that conserve at least the given number of atoms from the start compound to the target compound. Heath et al. (2010) also demonstrated that the algorithms efficiently identify both linear and branched metabolic pathways in which a certain threshold of atoms is conserved. The resulting metabolic pathways were validated against known functional pathways. The algorithms have the potential to find novel or alternative pathways that may span multiple organisms (Heath et al. 2010).
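The linear variant of atom tracking can be sketched as a depth-first search that carries along the positions of the start compound's atoms. This is a minimal illustration and not the published algorithm: the atom mapping table `ATOM_MAPS`, the compound names, and the function `linear_pathways` are hypothetical, and each mapping simply sends atom indices of a substrate to atom indices of a product.

```python
# Hypothetical atom-mapping data:
# (substrate, product) -> {substrate atom index: product atom index}
ATOM_MAPS = {
    ("S", "X"): {0: 0, 1: 1, 2: 2},
    ("X", "T"): {0: 0, 1: 1},
    ("S", "Y"): {0: 0},
    ("Y", "T"): {0: 2},
}


def linear_pathways(start, target, start_atoms, min_atoms, max_paths):
    """Return up to max_paths pathways that keep >= min_atoms start atoms."""
    results = []

    def extend(compound, tracked, path):
        # tracked maps each surviving start atom to its index in `compound`.
        if len(results) >= max_paths:
            return
        if compound == target:
            results.append((path, len(tracked)))
            return
        for (substrate, product), mapping in sorted(ATOM_MAPS.items()):
            if substrate != compound or product in path:
                continue  # wrong substrate, or product already visited
            moved = {atom: mapping[pos]
                     for atom, pos in tracked.items() if pos in mapping}
            if len(moved) >= min_atoms:
                extend(product, moved, path + [product])

    extend(start, {atom: atom for atom in start_atoms}, [start])
    return results


if __name__ == "__main__":
    print(linear_pathways("S", "T", [0, 1, 2], min_atoms=2, max_paths=10))
    print(linear_pathways("S", "T", [0, 1, 2], min_atoms=1, max_paths=10))
```

Raising `min_atoms` removes the route through `Y`, which transfers only one atom of the start compound to the target.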

Earlier, Pitkänen et al. (2009) had used an atom-tracking approach in a graph-theoretic method for finding biologically meaningful linear and branched metabolic pathways in genome-scale metabolic networks.

3.1.8 The Furusawa Platform

The Furusawa platform is an in silico platform whose algorithm finds feasible heterologous pathways by which microorganisms can produce non-native target metabolites, using Escherichia coli, Corynebacterium glutamicum, and Saccharomyces cerevisiae as templates (Chatsurachai et al. 2012).

3.1.8.1 Usage

The use of this platform for heterologous pathway design comprises the following four steps:

  1. Construction of an in-house database of metabolic reactions: Known metabolic reactions are taken from the KEGG LIGAND database and BRENDA. These reactions are considered candidate heterologous reactions that could be added to the host metabolic networks (Chatsurachai et al. 2012). All metabolic reaction information regarding genes, enzymes, pathways, and organisms in the KEGG database can be collected, and the collected information is stored in a database constructed using PostgreSQL. The enzymatic information can be retrieved from BRENDA, and Python scripts can be used to access the in-house database (Chatsurachai et al. 2012).

  2. Genome-scale metabolic models of host microorganisms: Microorganisms that are widely used in industry were adopted as chassis templates to demonstrate the viability of the in silico platform. These are E. coli, C. glutamicum, and S. cerevisiae, which were selected on criteria such as high growth activity under various conditions and ease of genetic manipulation, and which are therefore considered ideal hosts for bioengineered products (Chatsurachai et al. 2012).

  3. Heterologous pathway identification for target production: The platform can be used to screen all producible target metabolites listed in the database by adding heterologous reactions to host microorganisms. For all producible target metabolites, the user can estimate the production yields using FBA, assuming steady-state conditions and the maximum biomass production rate (Chatsurachai et al. 2012). The entire list of producible target metabolites in different hosts can be analyzed, and a set of rational heterologous pathways and hosts that are likely to produce the desired targets can be selected.

  4. Flux balance analysis (FBA): FBA is based on a genome-scale metabolic model and the optimization of a specific objective flux by linear programming. FBA can be used to estimate the metabolic flux profile of metabolic networks expanded with heterologous reactions. All FBA simulations in this framework can be performed in MATLAB (Chatsurachai et al. 2012).
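For reference, the linear program solved in FBA is commonly written in the standard form below. The symbols are the usual textbook notation and are not taken from the cited study: S is the stoichiometric matrix (metabolites by reactions), v is the vector of reaction fluxes, c selects the objective flux (for example, biomass or target production), and v^min and v^max are the lower and upper flux bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\mathsf{T}} v \\
\text{subject to}\quad & S\,v = 0 \quad \text{(steady state)}, \\
& v^{\min} \le v \le v^{\max}.
\end{aligned}
```

Adding a heterologous reaction to a host model corresponds to appending a column to S and an entry to v, after which the same program is solved again for the expanded network.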

3.2 Omic-Based Approaches

3.2.1 Proteomics

According to Keller and Hettich (2009) and Aslam et al. (2017), proteomics has emerged as a fruitful technology for studying protein expression in the microbial world, including post-translational modifications, protein turnover, proteolysis, and changes in the corresponding gene expression. Proteomics has been used to identify microbial communities and microorganisms in various ecosystems, including soil and sediment, activated sludge, marine and groundwater sediment, acid mine biofilms, and wastewater plants (Williams et al. 2013; Colatriano et al. 2015; Grob et al. 2015; Bastida et al. 2016; Jagadeesh et al. 2017). Thus, a proteomic approach helps to identify the enzymes and metabolic pathways involved in the bioremediation of xenobiotics at various contaminated sites (Liu et al. 2017; Wei et al. 2017). Studies have also revealed important, previously hidden information about protein synthesis, gene expression stability, mRNA turnover, and protein–protein interaction networks in microbial communities in stress environments (Aslam et al. 2017). Hence, proteomic analysis plays an important role in the bioremediation process.

Protein Analysis: Proteomic analysis of microbial communities generally involves four primary steps:

  1. preparation of a biological sample;

  2. extraction and separation of proteins by two-dimensional gel electrophoresis (2D-GE);

  3. examination of protein gel images by means of image analysis software such as ImageMaster 2D or PDQuest; and

  4. identification of proteins by mass spectrometry (MS), such as MALDI-TOF/MS or LC-MS (Yates et al. 2009; Chakka et al. 2015; Velmurgan et al. 2017).

The workflow of proteomic analysis is shown below in Fig. 27.5.

Fig. 27.5

Workflow of proteomic analysis. (Source: Chandran, H., Meena, M., & Sharma, K. (2020). Microbial Biodiversity and Bioremediation Assessment Through Omics Approaches. Frontiers in Environmental Chemistry, 1, 9. https://doi.org/10.3389/fenvc.2020.570326. Frontiers is fully compliant with open-access mandates, by publishing its articles under the Creative Commons Attribution Licence (CC-BY))

3.2.1.1 Applications of Proteomics in Bioremediation
  1. 1.

    The bioremediation of compounds by microorganisms involves several proteins, as demonstrated by Vandera et al. (2015). They performed a comparative proteomic analysis of Arthrobacter phenanthrenivorans Sphe3 grown on the aromatic compounds phenanthrene and phthalate. The proteomic approach confirmed the involvement of several proteins in aromatic substrate degradation by identifying those mediating the initial ring hydroxylation and ring cleavage of phenanthrene to phthalate. The study also revealed the presence of both the ortho- and meta-cleavage pathways for the degradation of these aromatic compounds, and it identified all proteins that take part in these pathways and are highly upregulated upon growth on phthalate in comparison with growth on phenanthrene.

  2. 2.

    The proteomic analysis of the pyrene-degrading bacterium Achromobacter xylosoxidans PY4 by Nzila et al. (2018) identified a total of 1094 proteins, of which 95 were detected under glucose supplementation and 612 in the presence of pyrene. Furthermore, 25 upregulated proteins were found to be involved in stress response and the processing of genetic information. Two upregulated proteins, 4-hydroxyphenylpyruvate dioxygenase and homogentisate 1,2-dioxygenase, are implicated in the lower degradation pathway of pyrene. The enzyme 4-hydroxyphenylpyruvate dioxygenase may catalyze the conversion of 2-hydroxybenzalpyruvic acid (a metabolite of pyrene) to homogentisate. Homogentisate 1,2-dioxygenase incorporates two oxygen atoms to produce 4-maleylacetoacetate, which is an intermediate in several metabolic pathways (Nzila et al. 2018).

  3. 3.

    Lee et al. (2016) performed a proteomic analysis of the PAH-degrading bacterial isolate Sphingobium chungbukense DJ77, a strain with outstanding degradation capability for various aromatic compounds. In this study, the degradation of three xenobiotic compounds, i.e., phenanthrene, naphthalene, and biphenyls (PNB), and the associated proteins were analyzed by 2-DE and MALDI-TOF/MS. During PNB biodegradation, the bacterial cells altered their protein expression to cope with the stress condition.

  4. 4.

    In 2019, Chen et al. investigated the biodegradation mechanism of tetrabromobisphenol A (TBBPA) in Phanerochaete chrysosporium by using a proteomic approach. They found that, compared to the control, TBBPA stress caused 148 differentially expressed proteins in P. chrysosporium, of which 90 were upregulated and 58 were downregulated. The upregulation of cytochrome P450 monooxygenase, glutathione S-transferase, O-methyltransferase, and other oxidoreductases is responsible for the biotransformation of TBBPA via oxidative hydroxylation and reductive debromination.

  5. 5.

    Another proteomic study of bioremediation was performed by Yu et al. (2019), who explored the biotransformation of decabromodiphenyl ether (BDE-209) by Microbacterium Y2 in a polluted water-sediment system. The results showed that the overexpression of haloacid dehalogenases, glutathione S-transferase, and ATP-binding cassette (ABC) transporter might play important roles in BDE-209 biotransformation. Moreover, heat-shock proteins (HSPs), ribonuclease E, oligoribonuclease (Orn), and ribosomal proteins were activated to counter BDE-209 toxicity. Thus, it is suggested that these proteins are implicated in microbial degradation, antioxidative stress, and glycolysis.

  6. 6.

    Another application of proteomics in bioremediation was reported by Gregson et al. (2020), who used LC–MS/MS shotgun proteomics to determine variations in the proteome of the hydrocarbon-degrading psychrophile Oleispira antarctica RB-8 when grown on n-alkanes at cold temperatures.
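Several of the studies above report proteins as upregulated or downregulated under pollutant exposure. A common way to make that call is to compare protein abundances between control and treated samples as a log2 fold change. The sketch below assumes a cutoff of one log2 unit (a twofold change); the protein names, abundance values, and cutoff are invented for illustration and do not come from the cited studies, which also rely on replicates and statistical testing.

```python
import math


def classify_proteins(control, treated, min_log2_fc=1.0):
    """Label each protein 'up', 'down', or 'unchanged' from its log2 fold change.

    control and treated map protein names to positive abundance values.
    """
    result = {}
    for protein, ctrl in control.items():
        log2_fc = math.log2(treated[protein] / ctrl)
        if log2_fc >= min_log2_fc:
            label = "up"
        elif log2_fc <= -min_log2_fc:
            label = "down"
        else:
            label = "unchanged"
        result[protein] = (round(log2_fc, 2), label)
    return result


if __name__ == "__main__":
    # Hypothetical abundances (arbitrary units) without and with a pollutant.
    control = {"protein_A": 100.0, "protein_B": 80.0, "protein_C": 50.0}
    treated = {"protein_A": 400.0, "protein_B": 20.0, "protein_C": 55.0}
    for name, (fc, label) in classify_proteins(control, treated).items():
        print(name, fc, label)
```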

3.2.2 Genomics and Metagenomics

Genomics is the powerful computer technology used to understand the structure and function of all genes in an organism based on knowing the organism’s entire DNA sequence. This field includes intensive efforts to determine the entire DNA sequence of organisms and in-depth genetic mapping efforts (Fulekar and Sharma 2008).

Metagenomic studies, in contrast, overcome the limitations of traditional culture-based methods by giving access to uncultured microorganisms, and they explore the genetic advantages of these organisms in the process of bioremediation (Rahimi et al. 2018; Nascimento et al. 2020). Metagenomics uses the pool of environmental genomes of microorganisms, which increases the probability of discovering unique genes and diverse pathways with new enzymes that have highly specific catalytic properties (Scholz et al. 2012; Yergeau et al. 2017; Awasthi et al. 2020). This technology gives microbiologists a new paradigm for understanding unculturable microbiota and the genetic variability of microbial communities (Devarapalli and Kumavnath 2015; Zhu et al. 2018; Awasthi et al. 2020). Hence, metagenomic information will enable researchers to integrate pure culture study with genomics (Hodkinson and Grice 2015). Current metagenomic practices allow the identification of the whole-genome structure of microorganisms and of particular genes that encode degradative enzymes for the mineralization of xenobiotics (Zafra et al. 2016; Zhu et al. 2020). This clearly highlights the crucial role of novel genes in connecting the entire microbial population with functional diversity and structural identity. Metagenomics therefore involves the construction of metagenomic libraries, from which biological information can be retrieved by two types of analysis:

  1. 1.

    Sequence-Driven Analysis: This analysis is based on the sequencing of clones that carry phylogenetic anchors or conserved DNA sequences, which indicate the plausible origin of the DNA fragment (Wu et al. 2010; Felczykowska et al. 2015; Wong 2018). This type of analysis is increasingly used owing to the availability of several software packages for data analysis and the ease of assessing metagenomic sequencing data. The approach depends strongly on the precision of genome annotation and on the integrity of the data, algorithms, and facts available in databases for ascertaining the function of novel genes (Ferrer et al. 2009).

    The complete genome analysis or sequence analysis is progressed through three technical transformations:

    1. (a)

      First-Generation Sequencing: The sequencing techniques of Frederick Sanger and of Allan Maxam and Walter Gilbert are categorized as the first-generation DNA sequencing methods. Sanger sequencing uses a denatured DNA template, a radioactively labeled primer, DNA polymerase, and chemically modified nucleotides called dideoxynucleotides to generate DNA fragments of various lengths.

    2. (b)

      Next-Generation Sequencing: It is also called high-throughput sequencing. Next-generation sequencing involves library preparation, sequencing, base calling, alignment to a reference genome, and annotation. Library preparation begins with the fragmentation of DNA into multiple fragments by sonication, enzymatic digestion, or transposase, followed by ligation with adaptor sequences. The prepared library is then amplified using clonal amplification and PCR methods to generate DNA copies, which are then sequenced using different approaches (Samorodnitsky et al. 2015). The major platforms used for microbiome studies in next-generation sequencing are pyrosequencing (Roche/454 sequencing), Illumina, SOLiD, Ion Torrent, PacBio RS, etc.

      These high-throughput techniques sequence ribosomal genes to quantify community structures and functions at a higher resolution, e.g., 16S rRNA genes in prokaryotes, and 5S or 18S rRNA genes or the internal transcribed spacer (ITS) region in eukaryotes (Luo et al. 2012). The effectiveness of such NGS technologies in analyzing microbial communities from diverse environments has been elucidated, validated, and documented in many studies (Brown et al. 2013; Shokralla et al. 2014; Zhou et al. 2015; Niu et al. 2016; Scholer et al. 2017).

    3. (c)

      Third-Generation Sequencing: It is also called single-molecule long-read sequencing. It offers lower sequencing cost and simpler sample preparation without PCR amplification. Widely used platforms in third-generation sequencing include Pacific Biosciences, Oxford Nanopore Technology, and HeliScope technology.

    The comparative analysis of the platforms used in second- and third-generation sequencing is presented in Table 27.1.

    Shotgun Sequencing

    It is also called shotgun metagenomic sequencing. It is a powerful technique in microbial ecology because it provides a robust and reliable evaluation of microbial diversity (Hillmann et al. 2018). It does not depend on PCR amplification and is used to examine the functional potential and microbial composition of the community.

    Importance of shotgun sequencing in bioremediation

    1. (a)

      It is the only way to study members of the microbial community that lack universal marker genes, such as viruses (Quince et al. 2017; Vermote et al. 2018).

    2. (b)

      It allows strain-level resolution in the taxonomic analysis and pathway predictions for the functional annotation of the microbiome under study (Han et al. 2020).

    3. (c)

      It is an emerging molecular method that bridges the gap between community structure and functional competence.

    4. (d)

      It also helps in understanding the strategies adopted by microorganisms to thrive in adverse conditions (Sharpton 2014; Peabody et al. 2015; Ranjan et al. 2016).

    5. (e)

      The workflow of this technique for taxonomy analysis consists of quality trimming and comparison against a reference database of whole genomes or specifically designed marker genes to create a taxonomy profile. Since the data contain all genetic information in a sample, they can be used for supplementary analyses such as metagenomic assembly and binning, metabolic function profiling, and antibiotic resistance gene profiling (Chandran et al. 2020).

    6. (f)

      Shotgun metagenomic analysis of microbial communities from deep seabed petroleum seeps in the Eastern Gulf of Mexico revealed the presence of diverse communities of chemoheterotrophs and chemolithotrophs (Dong et al. 2019).

    7. (g)

      Whole-genome shotgun sequencing was employed to identify the taxonomic diversity and gene repertoire of bacteria isolated from tannery effluents and petrol-polluted soil samples for the degradation of persistent organic pollutants like naphthalene, toluene, petrol, and xylene (Muccee and Ejaz 2020).

  2. 2.

    Function-Driven Analysis: The function-driven analysis is based on the identification of clones that express a functional activity. It is useful when sequence similarity does not correspond to a functional association, when the gene of interest has little similarity to genes whose products have been investigated biochemically, or when a single gene can accomplish diverse tasks in the cell (Hallin et al. 2008). In such cases, function-driven screening is preferred for discovering genes with novel functions or for exploring the sequence variation of protein families (Singh et al. 2009; Meena et al. 2016). The general methodology used for metagenomic research is shown in Fig. 27.6.

Table 27.1 Comparative analysis of different platforms used for second- and third-generation Sequencing. Source: Chandran, H., Meena, M., & Sharma, K. (2020). Microbial Biodiversity and Bioremediation Assessment Through Omics Approaches. Frontiers in Environmental Chemistry, 1, 9. https://doi.org/10.3389/fenvc.2020.570326. Frontiers is fully compliant with open-access mandates, by publishing its articles under the Creative Commons Attribution Licence (CC-BY)
Fig. 27.6

Workflow of Metagenomic Research. (Source: Chandran, H., Meena, M., & Sharma, K. (2020). Microbial Biodiversity and Bioremediation Assessment Through Omics Approaches. Frontiers in Environmental Chemistry, 1, 9. https://doi.org/10.3389/fenvc.2020.570326. Frontiers is fully compliant with open-access mandates, by publishing its articles under the Creative Commons Attribution Licence (CC-BY))
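The taxonomy profile mentioned for the shotgun workflow can be pictured as relative abundances derived from per-read taxon assignments. The sketch below assumes that an upstream classifier has already compared each read against a reference database and returned either a taxon label or nothing for unclassified reads. The function name and the read counts are hypothetical; the genus names are borrowed from this section only as labels.

```python
from collections import Counter


def taxonomy_profile(read_assignments):
    """Turn per-read taxon labels into relative abundances.

    Unclassified reads are given as None and are excluded from the profile.
    """
    counts = Counter(t for t in read_assignments if t is not None)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.most_common()}


if __name__ == "__main__":
    # Invented assignments for 14 reads, 4 of them unclassified.
    reads = (["Rhodococcus"] * 5 + ["Burkholderia"] * 3
             + ["Nocardioides"] * 2 + [None] * 4)
    for taxon, fraction in taxonomy_profile(reads).items():
        print(f"{taxon}: {fraction:.2f}")
```

Excluding unclassified reads is one possible design choice; reporting them as a separate category keeps the fraction of unknown sequence visible.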

3.2.2.1 Applications of Metagenomics
  1. 1.

    Metagenomic analysis has broadened the scope of research on microbial communities, their genetic diversity, and their metabolic pathways. It has provided opportunities to discover microbial consortia and genes involved in the bioremediation of xenobiotic compounds. For example, phenol-degrading pathways of uncultivated bacteria in activated sludge were studied using metagenomics (Sueoka et al. 2009).

  2. 2.

    The metagenomic approach was used by Silva et al. (2013) to characterize genes and metabolic pathways associated with the degradation of phenol and other aromatic compounds in sludge from a petroleum refinery wastewater treatment system.

  3. 3.

    Jeffries et al. (2018) employed metagenomic analysis to outline the functional potential and taxonomic community composition of soils exposed to organophosphorus pesticides and to predict the breakdown of chemical compounds in these soils.

  4. 4.

    Gaytán et al. (2020) combined physical and chemical analyses with metagenomics to elucidate the probable metabolic pathways associated with polyurethane degradation, with a view to alleviating plastic and xenobiotic pollution.

  5. 5.

    Aubé et al. (2020) used metagenome and enriched mRNA metatranscriptome sequencing to study the persistent impact of petroleum pollutants on the taxonomic and metabolic structure of microbial mats.

  6. 6.

    Auti et al. (2019) demonstrated that 16S rRNA gene sequencing analysis is a highly recommended, cost-effective technique for the phylogenetic resolution and taxonomic profiling of microbial communities. The 16S rRNA gene sequence similarity between two strains provides a simple yet robust criterion for the identification of newly isolated strains, whereas phylogenetic analyses can be used to elucidate the overall evolutionary relationship between related taxa (Johnson et al. 2019).

  7. 7.

    Using a metagenomic approach, Zhu et al. (2020) explored the microbial assemblage and functional genes potentially involved in upstream and downstream phthalate degradation in soil. The results indicate that the bacterial taxon Actinobacteria (Pimelobacter, Nocardioides, Gordonia, Nocardia, Rhodococcus, and Mycobacterium) was a major degrader under aerobic conditions, whereas the bacterial taxa Proteobacteria (Ramlibacter and Burkholderia), Acidobacteria, and Bacteroidetes were involved under anaerobic conditions.

  8. 8.

    By metagenomic analysis, Hidalgo et al. (2020) revealed that the members of the Geobacteraceae and Peptococcaceae microbiota present in a jet-fuel-contaminated site could be exploited for their remarkable metabolic potential for the mitigation of toluene and benzene.
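The 16S rRNA gene sequence similarity criterion mentioned above can be illustrated as percent identity between two sequences that have already been aligned. The sketch below assumes that the alignment was produced elsewhere, with gaps written as "-", and it ignores gapped columns, which is only one of several conventions. The function name and the short sequences are invented for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned columns, skipping columns that contain a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned fragments, not real 16S rRNA gene sequences.
    isolate = "ACGTACGTAC"
    reference = "ACGTACGTAT"
    print(percent_identity(isolate, reference))
```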

3.2.3 Transcriptomics

Transcriptomics is the study of an organism’s transcriptome, i.e., the sum of all of its RNA transcripts. The information content of an organism is recorded in the DNA of its genome and expressed through transcription. Here, mRNA serves as a transient intermediary molecule in the information network, while noncoding RNAs perform additional diverse functions. A transcriptome captures a snapshot in time of the total transcripts present in a cell (Lowe et al. 2017).

It is also called gene expression profiling because it provides an understanding of the up- or downregulation of genes in microbial communities under various environments. mRNA analysis provides a direct view of cell- and tissue-specific gene expression, such as (1) the presence, absence, and quantification of a transcript, (2) the evaluation of alternative splicing to predict protein isoforms, and (3) the quantitative assessment of genotype influence on gene expression via expression quantitative trait loci analysis or allele-specific expression (Chandran et al. 2020). Thus, transcriptomic analysis provides a large amount of gene information about the potential function of microbial communities in adaptation and survival in extreme environments (Singh et al. 2018).

A number of techniques in transcriptomics support the review and evaluation of the mRNA expression of an organism. These include the following:

  1. 1.

    Microarrays: The DNA microarray is a powerful technique in transcriptomics that supports the review and evaluation of the mRNA expression of every single gene in an organism. The technique has been employed to evaluate variance in metabolic and catabolic gene expression, to analyze the physiology of microbial communities from diverse environments, to identify new bacterial species, etc. (Dennis et al. 2003; Greene and Voordouw 2003).

  2. 2.

    RNA Sequencing: RNA sequencing uses next-generation sequencing to determine the amount of RNA in a sample. It covers different types of RNA at much higher coverage and supports broad discovery studies (Shendure 2008; Nagalakshmi et al. 2010).

    The generation of raw transcriptome data involves purification of high-quality RNA of interest, conversion of the RNA to complementary DNA (cDNA), fragmentation of the cDNA to build a library for sequencing by synthesis (RNA sequencing), running the microarray or sequence data through a software platform, and carrying out ad hoc quality control (QC) (Chandran et al. 2020). Thus, it is a better approach for understanding the basic nature and mechanism of differentially expressed genes in the host and symbiotic microbes at the same time (Kaur and Kaur 2016).

  3. 3.

    GeoChip: It is a high-throughput tool, which analyzes microbial community composition, structure, and functional activity. It uses key enzymes or genes to spot various microbe-mediated mechanisms for biogeochemical cycles, resistance mechanism for heavy metals, and degradation pathways of xenobiotics (He et al. 2010; Xiong et al. 2010; Xie et al. 2011).

  4. 4.

    DNA and RNA-SIP: These are both stable isotope probing technologies. They are used for probing hydrocarbon degraders. They are also valuable to uncover the microbial taxa and catabolic genes that are important for the bioremediation of polluted environments (Lueders 2015).

  5. 5.

    microRNAs: The regulation of gene expression can also be studied by the combined analysis of mRNA and microRNA levels. MicroRNAs (miRNAs) are short, noncoding RNA molecules that regulate gene expression at the mRNA level. The precise binding of a miRNA to a target mRNA (by sequence complementarity) either impedes the binding of the mRNA to the ribosome or targets it for degradation. mRNA profiling along with miRNA expression can be used to explore variations in the transcriptome profile, particularly to identify the mRNA transcripts that are subject to miRNA regulation, emphasizing the probable molecular pathways supporting a particular trait or condition (Chandran et al. 2020).
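RNA sequencing estimates the amount of RNA from read counts, and those counts are usually normalized for gene length and sequencing depth before expression levels are compared. One widely used normalization, which is not named in the text above and is shown here only as an example, is transcripts per million (TPM). The gene names, counts, and lengths in the sketch are hypothetical.

```python
def tpm(read_counts, gene_lengths_bp):
    """Transcripts per million.

    Each count is divided by the gene length in kilobases, and the resulting
    rates are rescaled so that they sum to one million within the sample.
    """
    rates = {
        gene: read_counts[gene] / (gene_lengths_bp[gene] / 1000.0)
        for gene in read_counts
    }
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads were counted")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


if __name__ == "__main__":
    # Invented counts: gene_b has three times the reads of gene_a,
    # but it is also three times as long.
    counts = {"gene_a": 100, "gene_b": 300}
    lengths = {"gene_a": 1000, "gene_b": 3000}
    print(tpm(counts, lengths))
```

In this toy case both genes receive the same TPM value, which shows why raw read counts alone can overstate the expression of long genes.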

3.2.3.1 Applications of Transcriptomics
  1. 1.

    Comparative transcriptomics has been used to reveal highly upregulated degradation pathways and putative transporters for phenol to improve phenol tolerance and utilization by lipid-accumulating Rhodococcus opacus PD630 (Yoneda et al. 2016).

  2. 2.

    Hong et al. (2016) studied the hydrocarbon-degrading bacterium Achromobacter sp. using transcriptomics. The strain was isolated from seawater, and the study indicated that the upregulation of enzymes such as dehydrogenases and monooxygenases, and of novel genes associated with fatty acid metabolism, is responsible for its enormous capability for hydrocarbon degradation and survival.

  3. 3.

    Lima-Morales et al. (2016) used a transcriptomic approach to investigate the microbial organization and catabolic gene diversity of three types of contaminated soil under continuous long-term pollutant stress with benzene and benzene/toluene/ethylbenzene/xylene (BTEX). The results showed shifts in community structure and in the prevalence of key genes for catabolic pathways. Moreover, de novo transcriptome assembly gives new insights and reveals basic information about nonmodel species that lack a reference genome.

  4. 4.

    Metatranscriptomic analysis of the wheat rhizosphere identified dominant bacterial communities of diverse taxonomic phyla, including Acidobacteria, Cyanobacteria, Bacteroidetes, Streptophyta, Ascomycota, and Firmicutes, having functional roles in the degradation of various xenobiotic pollutants (Singh et al. 2018).

  5. 5.

    Sengupta et al. (2019) used transcriptomic and genomic approaches to gain mechanistic insights into the 4-nitrophenol (4-NP)-degrading bacterium Rhodococcus sp. strain BUPNP1. The study identified a catabolic cluster of 43 genes, named nph, that harbors not only the genes required for the breakdown of 4-NP into acetyl-CoA and succinate via nitrocatechol, but also genes for the degradation of other diverse aromatic compounds.

  6. 6.

    Transcriptome analysis of activated sludge microbiomes decoded the role of the nitrifying organisms in heavy oil degradation (Sato et al. 2019).

  7. 7.

    Das et al. (2020) carried out transcriptome analyses of crude oil-degrading Pseudomonas aeruginosa strains, which revealed the significance of differentially expressed genes implicated in crude oil degradation.

3.2.4 Metabolomics

A metabolome is the total set of metabolites in an organism, and the study of the metabolite profile of a cell under a given condition is called metabolomics (Beale et al. 2017). Metabolomics explores the relationships between organisms and the environment, such as organismal responses to abiotic stressors, including natural factors such as temperature and anthropogenic factors such as pollution. It is also used to investigate biotic–biotic interactions, such as infections, and metabolic responses (Lindon et al. 2006; Griffiths 2007; Mallick et al. 2019).

Metabolomics analyzes the metabolites produced by the cell in response to changing environmental conditions, which in turn provide information about the regulatory events in a cell (Krumsiek et al. 2015). A metabolomic analysis workflow starts with sample acquisition and preparation, followed by the separation and detection of analytes. The detection and quantification of metabolites are normally accomplished through a combination of chromatography techniques (liquid chromatography and gas chromatography) and detection systems such as mass spectrometry and nuclear magnetic resonance (Aldridge and Rhee 2014).

3.2.4.1 Applications of Metabolomics
  1. 1.

    Seo et al. (2013) investigated the degradation mechanism of carbaryl and other N-methyl carbamate pesticides in Burkholderia sp. strain C3 by using a metabolomic approach. The study characterized the metabolic adaptation of Burkholderia sp. C3 to carbaryl in comparison with glucose and nutrient broth. The metabolic changes were notably associated with the biosynthesis and metabolism of amino acids, sugars, PAH lipids, and cofactors. Thus, this metabolomic study could provide detailed insights into bacterial adaptation to different metabolic networks and the metabolism of toxic pesticides and chemicals.

  2. 2.

    Wang et al. (2019) applied a comparative metabolomic approach to study the microbial degradation of cyfluthrin by Photobacterium ganghwense. This approach explored the biotransformation pathway of cyfluthrin and identified 156 metabolites during the biodegradation process.

  3. 3.

    In 2018, Li et al., studying the responses of indigenous soil microorganisms to PAH contamination, showed that the majority of microbial metabolic functions were adversely affected under PAH pollution. The study combined enzyme activity and sequencing analysis with metabolomics, which further revealed the specific inhibition of soil metabolic pathways associated with carbohydrates, amino acids, and fatty acids due to microbial community shifts under PAH stress.

  4. 4.

    High-throughput sequencing and soil metabolomics were used by Song et al. in 2020 for investigating the differential structures and functions of soil bacterial communities in the pepper rhizosphere and bulk soil under plastic greenhouse vegetable cultivation (PGVC). In the study, a total of 245 metabolites were identified, among which 11 differential metabolites were detected between rhizosphere and bulk soil, including organic acids and sugars that were positively and negatively correlated with the relative abundances of the differential bacteria. A starch and sucrose metabolic pathway was the most differentially expressed pathway in rhizospheric soil. The main functional genes participating in this pathway were predicted to be downregulated in rhizosphere soil.

  5. 5.

    Wright et al. in 2020 also evaluated the metabolomic characterization of two potent marine bacterial isolates, Mycobacterium sp. DBP42 and Halomonas sp. ATBC28, capable of the degradation of phthalate and plasticizers such as ATBC, DBP, and DEHP. This research study presented the molecular analysis of metabolites generated during biodegradation. It also confirmed that DBP and ATBC were degraded through the sequential removal of ester side chains and generated monobutyl phthalate and phthalate in the case of DBP degradation and citrate in the case of ATBC degradation in Mycobacterium species.

  6. 6.

    Metabolite pathway databases and repositories are available, which can be used to manage and investigate information about metabolites and their pathways. They provide a databank of metabolic information and help in the unification of complex data into metabolic pathways. These databases and repositories also help in modeling metabolic pathways, which can then be investigated and simulated using mathematical modeling techniques (Chandran et al. 2020).
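Statements such as metabolites being positively or negatively correlated with bacterial relative abundances rest on a correlation measure computed across samples. The sketch below uses the Pearson correlation coefficient as one possible choice; the text above does not state which measure the cited study used. All values are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical values across four soil samples.
    metabolite_intensity = [1.0, 2.0, 3.0, 4.0]
    taxon_abundance = [0.10, 0.21, 0.29, 0.40]
    print(round(pearson(metabolite_intensity, taxon_abundance), 3))
```

A value near +1 indicates a positive association and a value near -1 a negative one; with few samples, such coefficients should be read together with a significance test.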

3.3 Prediction of Chemical Toxicity

Determining the level of chemical toxicity that is lethal to the degrading microbes is very important. Several tools and computational models are available that can predict the toxicity of the chemicals involved.

QSAR-Based Models: QSAR stands for quantitative structure–activity relationship. Such models calculate toxicity from the physical characteristics of the structure of chemicals, such as the molecular weight or the number of benzene rings (molecular descriptors), using mathematical algorithms (Eriksson et al. 2003). There are a number of tools based on QSAR:

  1. 1.

    VirtualToxLab—It is for prediction of the toxic potential of drugs, chemicals, and natural products. This includes endocrine and metabolic disruption, and some aspects of carcinogenicity and cardiotoxicity (Vedani et al. 2009).

  2. 2.

    Toxicity Estimation Software Tool (TEST)—This tool is for prediction of the acute toxicity of organic chemicals based on their molecular structures. It allows a user to estimate toxicity without requiring any external programs. Users input a chemical to evaluate by drawing it in an included chemical sketcher window, entering a structure text file, or importing it from an included database of structures. Once entered, the toxicity is estimated using one of several advanced QSAR methodologies (http://www.epa.gov/nrmrl/std/qsar/qsar.html).

  3. 3.

    Sarah Nexus—It is a statistical-based model used for prediction of the mutagenicity of chemicals (Barber et al. 2016).

  4. 4.

    TOPKAT—It is for prediction of the ecotoxicity, mutagenicity, and reproductive or developmental toxicity of chemicals (Prival 2001).

  5. 5.

    Ecological Structure–Activity Relationships (ECOSAR)—The Ecological Structure–Activity Relationships (ECOSAR) Class Program is a computerized predictive system that estimates aquatic toxicity. The program estimates a chemical’s acute (short term) toxicity and chronic (long term or delayed) toxicity to aquatic organisms, such as fish, aquatic invertebrates, and aquatic plants, by using computerized structure–activity relationships (SAR) (http://www.epa.gov/oppt/newchems/tools/21ecosar.htm). This software is available for free.

  6. 6.

    Estimation Programs Interface (EPI)—The Estimation Programs Interface (EPI) Suite is a Windows-based suite of physical/chemical property and environmental fate estimation programs. It is a screening-level tool. It uses a single input to run the following estimation programs: KOWWIN, AOPWIN, HENRYWIN, MPBPWIN, BIOWIN, BioHCwin, KOCWIN, WSKOWWIN, WATERNT, BCFBAF, HYDROWIN, KOAWIN, and AEROWIN, and the fate models WVOLWIN, STPWIN, LEV3EPI, and ECOSAR (http://www.epa.gov/opptintr/exposure/pubs/episuite.htm).

  7. 7.

    CAESAR—The CAESAR QSAR model is developed for assessment of chemical toxicity under the REACH (Cassano et al. 2010).

  8. 8.

    ToxiPred: It is a web server for the prediction of the aqueous toxicity of small chemical molecules in Tetrahymena pyriformis. It is available at http://crdd.osdd.net/raghava/toxipred. It is used for the environmental risk assessment of small chemical compounds based on a quantitative structure–toxicity relationship (QSTR) model (Mishra et al. 2014).

  9. 9.

    ACD/Tox Suite: It is a toxicity prediction tool that has been used in studies of bacterial systems with potential for textile dye decolorization and degradation (Srinivasan et al. 2017).
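The QSAR idea described above, predicting toxicity from molecular descriptors with a mathematical model, can be shown in its simplest form as a least-squares line fitted to one descriptor. The sketch below is not the algorithm of any tool in the list; those tools use many descriptors and validated training sets. The descriptor (number of benzene rings) is taken from the text, while the toxicity values and function names are invented for illustration.

```python
def fit_qsar(descriptor, toxicity):
    """Ordinary least squares fit of toxicity = intercept + slope * descriptor."""
    n = len(descriptor)
    if n != len(toxicity) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(descriptor) / n
    mean_y = sum(toxicity) / n
    sxx = sum((x - mean_x) ** 2 for x in descriptor)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(descriptor, toxicity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, descriptor_value):
    """Apply a fitted (intercept, slope) model to a new descriptor value."""
    intercept, slope = model
    return intercept + slope * descriptor_value


if __name__ == "__main__":
    # Hypothetical training set: ring counts and toxicity in arbitrary units.
    rings = [1, 2, 3, 4]
    toxicity = [1.5, 2.5, 3.5, 4.5]
    model = fit_qsar(rings, toxicity)
    print(predict(model, 5))
```

Predictions outside the range of the training data, such as the five-ring query here, are extrapolations and should be treated with caution.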

3.4 Databases

In relation to bioremediation, a number of databases have been developed to provide information regarding chemicals and their biodegradation. The chemical databases are listed below:

  1. 1.

    TOXNET: Developed by the National Library of Medicine (NLM), it is a Web-based system of databases providing information on toxicology, hazardous chemicals, and the environment. The databases fall under the general headings of Toxicology Data, Toxicology Literature, Toxic Releases, and Chemical Identification/Nomenclature (Wexler 2001). It comprises various databases, including:

    1. (a)

      CCRIS—It stands for Chemical Carcinogenesis Research Information System. The database contains chemical records with carcinogenicity, mutagenicity, tumor inhibition test results. It was developed by the National Cancer Institute (NCI). Data are derived from studies cited in primary journals, current awareness tools, NCI reports, and other sources. Test results have been reviewed by experts in carcinogenesis and mutagenesis (http://toxnet.nlm.nih.gov/cgibin/sis/htmlgen?CCRIS).

    2. (b)

      Developmental and Reproductive Toxicology Database (DART)—It provides references related to developmental and reproductive toxicology literature (http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?DARTETIC).

    3. (c)

      Genetic Toxicology Data Bank (GENE-TOX)—It provides genetic toxicology (mutagenicity) test data from expert peer review of open scientific literature for more than 3000 chemicals from the United States Environmental Protection Agency (EPA). It was established to select assay systems for evaluation, review data in the scientific literature, and recommend proper testing protocols and evaluation procedures for these systems (http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?GENETOX).

    4. (d)

      Integrated Risk Information System (IRIS)—This program supports the mission by identifying and characterizing the health hazards of chemicals found in the environment. Each IRIS assessment can cover a chemical, a group of related chemicals, or a complex mixture. IRIS assessments are an important source of toxicity information used by EPA, state and local health agencies, other federal agencies, and international health organizations (http://toxnet.nlm.nih.gov/cgi-bin/sis/htmlgen?IRIS).

  2. 2.

    Biodegradative Strain Database (BSD)—It is a Web-based database that provides detailed information about biodegradative bacteria and the hazardous chemicals that they degrade (Urbance et al. 2003). It is available at http://www.bsd.cme.msu.edu/.

  3. MetaRouter: It maintains diverse information on biodegradation networks and predicts biodegradative pathways for chemical compounds (Pazos et al. 2005). It is available at http://pdg.cnb.uam.es/MetaRouter.

  4. ECHA Classification & Labeling Inventory: It provides information on the classification and labeling of substances reported and registered by manufacturers and importers (Schöning 2011).

  5. N-CLASS: The Nordic N-Class Database on Environmental Hazard Classification. It describes chemicals that have been, or are currently being, considered by the European Commission for classification and labeling with respect to environmental effects (http://apps.kemi.se/nclass/default.asp).

  6. International Toxicity Estimates for Risk (ITER): It provides risk information for 600 chemicals from authoritative groups worldwide (Wullenweber et al. 2008).

  7. ProteoWizard: It is used for rapid analysis of proteomic data (Kessner et al. 2008). It is available at http://proteowizard.sourceforge.net/.

  8. SuperToxic: It is a Web database containing a collection of about 60,000 toxic compounds and their structures. Its built-in similarity searches can provide information about possible biological interactions. Links to the Protein Data Bank, UniProt, and the KEGG database allow identification of the targets and pathways in which the searched compounds are involved (Schmidt et al. 2009). This database is available online at http://bioinformatics.charite.de/supertoxic.

  9. Acutoxbase: It was developed within the ACuteTox project, which aims to optimize and prevalidate an in vitro testing strategy for predicting acute human toxicity. The database consists of two principal parts for archiving in vitro and in vivo data, respectively. The in vitro part, designed following the principles of Good Cell Culture Practice (GCCP), provides a standard format for collecting in vitro data, together with detailed descriptions of the methodologies (Standard Operating Procedures, SOPs) generated by the research laboratories participating in the project (Kinsner-Ovaskainen et al. 2009).

  10. Biodegradation Network-Molecular Biology (Bionemo): Bionemo was developed by the Structural Computational Biology Group at the Spanish National Cancer Research Center and is available at http://bionemo.bioinfo.cnio.es. It is a manually curated database of the proteins and genes involved in biodegradation metabolism, built by manually associating sequence database entries with biodegradation reactions based on information extracted from published articles. The protein information covers sequences, domains, and structures, whereas the genomic information covers sequences, regulatory elements, and transcription units (Carbajosa et al. 2009). It complements UM-BBD, which focuses on the biochemical aspects of biodegradation.

  11. OxDBase: It is an enzyme database that compiles literature-cited information on oxygenases (Arora et al. 2009). It is available at www.imtech.res.in/raghava/oxdbase/.

  12. PAHbase: The PAH database contains information on PAH-degrading bacteria, including their occurrence, phylogeny, metabolic pathways, and the genetic basis of their biodegradation capability (Surani et al. 2011). It is available at http://www.pahbase.in.

  13. BioRadBase: It is a comprehensive knowledge base that provides detailed information about the bioremediation of radioactive waste by microorganisms (Reena et al. 2012). It is available at http://biorad.igib.res.in.

  14. BiofOmics: It is a systematic, large-scale database for managing and analyzing biofilm data from high-throughput experimental studies of microorganisms (Lourenco et al. 2012). It is available at www.biofomics.org.

  15. Kyoto Encyclopedia of Genes and Genomes (KEGG): It provides information on the genetic, metabolic, enzymatic, and cellular processes of microorganisms (Kanehisa et al. 2017). It is available at http://genome.ad.jp/kegg/.

  16. Proteomics Identifications (PRIDE): It is the world's largest database of mass spectrometry-based proteomic data. It uses a generic, standards-based format that can be annotated to capture data generated by any proteomic pipeline (Vizcaino et al. 2016). It is available at http://www.ebi.ac.uk/pride/.

  17. MetaboLights: It is a database for metabolomic studies that provides primary research data and metadata for cross-platform and cross-species metabolomic studies (Kale et al. 2016). It is available at http://www.ebi.ac.uk.

  18. MetaCyc: It is a database of metabolic pathways derived from the experimental scientific literature, comprising more than 2097 experimentally determined metabolic pathways from more than 2460 different organisms. It is the largest curated database of metabolic pathways from all domains of life. It covers the metabolic pathways involved in primary and secondary metabolism, together with their associated compounds, enzymes, and genes (Caspi et al. 2016). This database is freely available at http://metacyc.org/. It has multiple scientific applications, namely to:

    (a) provide reference data for computational prediction of the metabolic pathways of organisms from their sequenced genomes,

    (b) support metabolic engineering,

    (c) facilitate comparison of biochemical networks, and

    (d) serve as an encyclopedia of metabolism.

  19. BioCyc: This database was developed and is curated by the BioCyc group at SRI International. It is available at http://biocyc.org/. It is a collection of more than 2988 organism-specific Pathway/Genome Databases (PGDBs). Each PGDB contains the full genome and the predicted metabolic pathways of a single organism. The Pathway Tools software predicts these pathways using MetaCyc as a reference database.

      The BioCyc PGDBs also contain information about predicted operons, transport systems, and pathway hole fillers. The BioCyc websites, which are based on Pathway Tools, offer multiple tools for querying and analyzing PGDBs, including analysis of gene expression, metabolomics, and other large-scale datasets (Caspi et al. 2016).

  20. Molecular Evolutionary Genetic Analysis (MEGA 7.0): It is used for sequence alignment, hierarchical classification, and construction of phylogenetic trees (Kumar et al. 2016). It is available at www.megasoftware.net.
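
The BioCyc entry in the list above notes that pathways are predicted for a genome by using MetaCyc as a reference and that "pathway holes" are recorded. The following is a minimal sketch of that general idea only, and not the actual Pathway Tools algorithm: a pathway is called present when a sufficient fraction of its enzymes is found among the enzymes annotated in a genome, and the missing enzymes are reported as holes. The pathway names, enzyme labels, function name `predict_pathways`, and the 0.75 threshold are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy reference: pathway name -> enzymes it requires.
# A curated reference such as MetaCyc is far richer; these labels are made up.
REFERENCE_PATHWAYS: Dict[str, Set[str]] = {
    "toy_pathway_A": {"enzA1", "enzA2", "enzA3", "enzA4"},
    "toy_pathway_B": {"enzB1", "enzB2"},
    "toy_pathway_C": {"enzC1", "enzC2", "enzC3"},
}


def predict_pathways(
    genome_enzymes: Set[str],
    reference: Dict[str, Set[str]],
    min_fraction: float = 0.75,  # assumed cutoff, not taken from any real tool
) -> List[Tuple[str, float, List[str]]]:
    """Return (pathway, fraction of enzymes found, missing enzymes) for each
    reference pathway whose enzyme coverage reaches min_fraction."""
    predictions = []
    for name, required in reference.items():
        if not required:
            continue  # skip empty pathway definitions
        fraction = len(required & genome_enzymes) / len(required)
        if fraction >= min_fraction:
            holes = sorted(required - genome_enzymes)  # the "pathway holes"
            predictions.append((name, fraction, holes))
    # Best-covered pathways first; ties are broken alphabetically by name.
    predictions.sort(key=lambda item: (-item[1], item[0]))
    return predictions


if __name__ == "__main__":
    # Hypothetical set of enzymes annotated in one genome.
    genome = {"enzA1", "enzA2", "enzA3", "enzB1", "enzB2", "enzC1"}
    for name, fraction, holes in predict_pathways(genome, REFERENCE_PATHWAYS):
        missing = ", ".join(holes) if holes else "none"
        print(f"{name}: {fraction:.0%} of enzymes found; holes: {missing}")
```

In this toy case, a pathway with one missing enzyme out of four is still predicted and its missing enzyme is flagged, which mirrors why a later hole-filling step that searches the genome for candidate genes is useful.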

4 Conclusion and Future Prospects

With the advent of bioinformatics, the range of applications of bioremediation has expanded. The steady growth in research over the last few decades has changed the field considerably. Genomics, proteomics, transcriptomics, and metabolomics have provided in-depth knowledge of genes, proteins, and enzymes, which has widened our understanding of the cellular mechanisms of microbes. It can therefore be concluded that this interdisciplinary approach supports bioremediation by providing the distinctive and comprehensive knowledge needed to build new biodegradative pathways at the molecular level and to develop new hypotheses, postulations, and paradigms for the bioremediation of contaminated habitats. Looking ahead, further research is still required to identify the specific genes and protein sequences of microbes that can eliminate contamination effectively. Studies are also needed on the homogeneity shared by the genes and proteins involved in bioremediation.