Introduction

Ecological processes are associated, directly or indirectly, with microbial activities: microbes perform vital functions in the biogeochemical cycling of nutrients and in the degradation of persistent environmental pollutants [1, 2]. The use of microbes to remove natural or synthetic pollutants is a prevailing, low-cost green technology for treating different kinds of polluted environments [3, 4]. Microbes make excellent pollutant degraders because they possess enzymes such as dehalogenases. These enzymes allow microbes to use environmental pollutants as nutrient or carbon sources, and the naturally small size of microbes enables facile contact with the pollutants [5]. Naturally occurring organohalide compounds are insignificant compared to anthropogenic ones [6]. Adding halogen atoms to organic molecules significantly alters their properties, such as solubility and toxicity [6]. The change can yield valuable improvements in commercial products but may impart serious effects on microbial metabolism [7]. At times, halogenation significantly decreases the vulnerability of chemicals to enzymatic attack and gives rise to persistent compounds [8].

Consequently, bioremediation technologies to remove such chemicals from the environment are still developing. Several microbes have been identified as promising in the bioremediation of these pollutants [5, 9, 10]. However, the main questions remain: (i) what are the enzymes involved in pollutant degradation pathways? (ii) how do they carry out their function? and (iii) how do they respond to various pollutants? Answering these questions is useful in forecasting the degradability of pollutants and the ability of the native microbial community, and provides valuable information about the enzymes involved in degradation. However, the identification of critical regulatory genes, the determination of their structure, and the comprehension of their genetic responses remain challenging.

The arrival of molecular microbial tools has profoundly changed the field of microbial ecology by providing direct access to the phylogeny and physiology of environmental microbes, without depending on their cultivability [11]. This new era of microbial ecology is defined by the creation of quick, high-throughput techniques that support the culture-independent study of microbial communities dwelling in ecosystems, and hence overcome the limitations of cultivation-dependent methods. Molecular methods for evaluating microbial ecology have facilitated the essential comprehension of the structural and functional behaviors of microbial communities [12, 13], offering excellent prospects for novel bioremediation approaches. Molecular approaches permit direct investigation of the transcripts, metabolites, genes, and proteins of natural microbial communities in situ [12, 14]. They also give insight into the interactions that affect the attenuation of environmental pollutants and thereby the restoration of polluted environments.

This review provides recent information on the bioremediation of environmental pollutants and on the genomic strategies used in pollutant degradation research, namely whole-genome sequencing (WGS), metagenomics, and single-cell genome analysis. An overview of the functional and structural arrangement of the relevant genes and the detailed application of these techniques to pollutant degradation research are also summarized. Information on gene arrangement, metabolic pathways, and the molecular mechanism of organohalide pollutant degradation is still lacking. These approaches may improve the understanding of catabolic pathways and offer useful information on the genome, transcriptome, and proteome of in situ microbial communities without needing cultivation [12, 14].

Contaminants in the natural environment

The extensive utilization of harmful organic compounds and the rise in pollutant concentrations in the environment are consequences of rapid industrialization, population growth, military activities, changing agricultural practices, and urbanization. Many pollutants and waste materials containing heavy metals, petroleum hydrocarbons, and halogenated organic compounds are released into the environment annually. The issue is exacerbated by their persistence and their accumulation over prolonged periods. Their widespread distribution into air and groundwater has resulted in critical ecological and health challenges globally [15].

Common environmental pollutants can be categorized as hydrocarbons, agricultural supplements (pesticides, herbicides, and synthetic insecticides), and heavy metals (Table 1). Petroleum hydrocarbons have multiple carbon bonds, which create strong, complex structures when bonded with other types of molecules. They exist in various forms, viz. short-, medium-, and long-chain aliphatics, aromatics, and polycyclic aromatic hydrocarbons (PAHs), in varying ratios [16]. Organohalides are also common environmental pollutants, released through their use as herbicides, fungicides, insecticides, hydraulic and heat-transfer fluids, plasticizers, and chemical intermediates. Chlorinated phenolic compounds, for example, are among the most abundant recalcitrant wastes discharged by the paper and pulp industry. Pesticides are applied globally in agriculture as well as in many public health sectors. Plastics and dyes are occasionally used, except for a few PAHs used in medicine [17]. However, these compounds are regarded as environmental pollutants in various terrestrial and aquatic ecosystems. Their detrimental impacts offset their usefulness, as they accumulate and resist degradation. Reported toxicities of such compounds include oxidative stress, changes in metabolic parameters, hepatic stress, and cellular necrosis [18, 19]. These biological consequences compromise the survival, growth, reproduction, and early-stage development of organisms [19,20,21].

Table 1 Common environmental pollutants

The techniques frequently used to treat polluted environments include physical and chemical technologies such as burial, combustion, soil vapor extraction, soil washing, and dispersion [22]. These methods, however, are incapable of totally decomposing pollutants and are costly. In some cases, chemicals more toxic than the original pollutants are formed [22]. Given the high cost and the lack of public acceptance of these methods, bioremediation is an excellent alternative clean-up technology. Bioremediation is a method that biologically degrades organic wastes, under regulated conditions, to harmless compounds or to levels below the concentration limits established by regulatory agencies [23]. The method is safe, cost-effective, and eco-friendly, and uses several enzymes from microbes to degrade toxic organic pollutants [10, 23, 24].

Classification of enzymes involved in bioremediation

Based on biodegradation ability, microbial enzymes are classified into oxidoreductases and hydrolases [24, 25]. Bacteria that produce oxidoreductases such as laccase, peroxidase, and oxygenase can degrade radioactive metals, chlorinated compounds, and petroleum hydrocarbons [21, 26, 27]. Hydrolases, in contrast, are hydrolytic enzymes that bioremediate and detoxify agrochemical pollutants. For instance, oil spills, organohalides, organophosphates, and carbamate insecticides have been effectively degraded by dehalogenases, lipases, cellulases, carboxylesterases, and phosphodiesterases [10, 28].

The study of pollutant-degrading microbes, the characterization of their genetics and biochemistry, and the development of techniques for their use have become crucial endeavors. In terms of cost, microbes are preferable to chemical and physical techniques as agents for cleaning up the environment. Nonetheless, microbial clean-up of polluted environments remains limited by the lack of knowledge of the factors that control and regulate microbial metabolism, growth, and community dynamics. The recent use of innovative tools like proteomics, fluxomics, transcriptomics, genomics, and metabolomics has produced more eco-friendly pollutant treatment strategies [14]. Microbes can use contaminants as energy sources, degrading them into simpler intermediates or mineralizing them. During this process, the microbes may transform toxic chemicals into nontoxic compounds, convert them from one phase to another, or completely degrade them into carbon dioxide and water. The expression of the genes encoding these microbial enzymes relies on the presence of the toxic chemicals, as the enzymes are substrate-specific and, owing to their small size, mobile [24, 29, 30]. Gene expression therefore depends on the types of substrate present in the medium (Fig. 1).

Fig. 1 An overview of gene expression in the biodegradation process in a bacterial system. Toxic halogenated compounds must be able to enter the cells. Inside the cells, these toxic chemicals trigger a specific gene in the chromosomal DNA, allowing transcription and translation of a protein that catalyzes the enzymatic biodegradation reaction. The contaminants serve as carbon and/or energy sources, and microbes can degrade or mineralize them into simpler intermediates

Bacterial dehalogenases

The degradation of organohalides by dehalogenases isolated from environmental bacteria, and the basic mechanisms of these enzymes, have been described in great detail [31]. Dehalogenases can be categorized into oxidative, reductive, hydrolytic, and thiolytic dehalogenases [10, 31]. Hydrolytic dehalogenases, which degrade aliphatic halogenated organic compounds, are further categorized into haloacid dehalogenases, haloalkane dehalogenases, fluoroacetate dehalogenases, and halohydrin dehalogenases. The haloacid dehalogenases are sub-divided into three kinds based on their substrate selectivity. The L-2-haloacid dehalogenases (L-DEX) and D-2-haloacid dehalogenases (D-DEX) act on L-2-haloalkanoates and D-2-haloalkanoates, respectively, and invert the configuration at the C2 carbon, producing D-2-hydroxyalkanoates and L-2-hydroxyalkanoates. D,L-2-haloacid dehalogenases (D,L-DEX) react with both D- and L-2-haloalkanoates, yielding the corresponding 2-hydroxyalkanoates [10, 31]. Haloacid dehalogenase (HAD) enzymes belong to a large phosphohydrolase superfamily, which includes hydrolases, oxygenases, and dehydrogenases that degrade polychlorinated biphenyls [10, 31]. Owing to the rapid development of WGS, new microorganisms or enzymes (such as dehalogenases) can be readily identified and studied further by analyzing full DNA sequences in genome databases.

Strategies for genome sequencing

Genome sequencing methods are classified into amplicon sequencing, shotgun metagenomics, single-cell genomic sequencing, and whole-genome sequencing of cultured microbes, depending on the question to be addressed (Fig. 2). Progress in WGS has provided a better understanding of how microbes degrade pollutants and of their environmental adaptation at the genetic level. Many enzymes responsible for pollutant degradation, for instance dehalogenases, and their pathways have been described. Complete genome sequencing has unveiled several enzymes that participate in pollutant degradation in different environments, as discussed in the “Application of whole-genome sequencing techniques to pollutants degradation research” section. Although the complete genomes of various pollutant-degrading bacteria are known, the genetic basis of pollutant degradation (especially the regulatory function of haloacid dehalogenases) and of environmental adaptation is yet to be clarified. Therefore, this article reviews the WGS approach in pollutant bioremediation, emphasizing the potential regulatory mechanism of haloacid dehalogenase, as a means of consolidating information on the complete bioremediation of organohalide pollutants. Identifying individual strains in complex microbial communities can improve this understanding and expose more novel enzymes and genes capable of degrading pollutants at the level of an individual cell.

Fig. 2 An overview of WGS approaches used in the bioremediation study of dehalogenase-producing microbes

Whole-genome sequencing (WGS) of cultured microbes

Whole-genome sequencing (WGS) is one of the best methods for recovering the genetic and metabolic diversity of microbes. A problem arises because many microbes evade cultivation [32], which makes their complete genomes inaccessible to traditional approaches. The technique uses sequencing platforms such as Illumina, PacBio, Nanopore, Qiagen, BGISEQ, IonTorrent, or other sequencers. This limitation can be mitigated with a targeted enrichment method, which enhances the isolation, from a specified community, of a particular microbe that has the biochemical property of interest. Here, the term ‘targeted enrichment’ refers to the isolation of a single microbe by physically enriching a cell population based on phenotypic traits such as size, density, shape, and native spectral traits [33]. The isolated microbes are subsequently used for complete genome sequencing and assembly.

However, this technique is not suitable for complex environmental samples, because the measured properties do not uniquely define the microbes in them. This issue can be overcome by limited enrichment, which enhances the outcome of efforts on varying data sets of organisms. The procedure has successfully overcome the problem of low yield in selected microbes [34]. Regardless of the availability of next-generation sequencing methods, improvements in bioinformatics, and the potential of sequencing data, the use of WGS of cultured microbes is still at a nascent stage. The method has allowed microbial ecologists to explore, compare, and characterize microbial communities, but ∼98% of bacteria in an environmental sample are unculturable by traditional laboratory methods. This limitation can be overcome by metagenomics and single-cell genomics, both of which give scientists access to unculturable microbial genomes.

Metagenomic sequencing

Metagenomic sequencing reads DNA directly from an environmental sample without cultivating individual colonies. The method involves the extraction of DNA from environmental samples, amplification, and high-throughput sequencing. The DNA fragments produced by the sequencer are then assembled into contigs and grouped (binned) by bioinformatic tools. Bins that pass quality checks are designated metagenome-assembled genomes (MAGs) [35, 36]. In the literature, metagenomic sequencing is divided into amplicon metagenomics and shotgun metagenomics.
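The idea of binning can be illustrated with a deliberately simplified sketch. It assumes contigs are available as plain strings and groups them by GC content alone; production binners typically also use tetranucleotide frequencies and read coverage. The function names, the number of bins, and the contig sequences are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def gc_bin(seq: str, n_bins: int = 10) -> int:
    """Return the index (0 .. n_bins-1) of the GC-content bin for a contig."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    # Integer arithmetic avoids floating-point edge cases at bin boundaries.
    return min(gc * n_bins // max(len(seq), 1), n_bins - 1)


def bin_contigs(contigs: dict, n_bins: int = 10) -> dict:
    """Group contig names by GC-content bin (a toy stand-in for real binning)."""
    bins = defaultdict(list)
    for name, seq in contigs.items():
        bins[gc_bin(seq, n_bins)].append(name)
    return dict(bins)


# Hypothetical contigs, invented for illustration only.
contigs = {
    "contig_1": "ATATATGC",  # 2 of 8 bases are G or C
    "contig_2": "TTAAGCAT",  # 2 of 8 bases are G or C
    "contig_3": "GCGCGCAT",  # 6 of 8 bases are G or C
}
bins = bin_contigs(contigs)
```

Contigs with similar GC content land in the same bin, which mimics, very roughly, how sequence composition helps attribute contigs to the same source genome.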

Amplicon sequencing

The partial 16S-based metagenomic technique, otherwise known as amplicon or targeted metagenomics, uses microbial marker genes such as 16S rRNA and the internal transcribed spacer. The most widely used marker gene for amplicon sequencing is the 16S rRNA gene. It serves as a taxonomic marker that answers one significant question in microbial ecology, “who is there”, by assigning sequence reads to a taxonomic lineage using curated 16S rRNA databases such as Greengenes, SILVA, or the Ribosomal Database Project (RDP) [37]. However, bacterial roles cannot be clearly defined when the 16S rRNA sequence reads fail to resolve or detect microbes at the species or strain level. In addition, marker gene surveys cover only a few common genes and therefore cannot precisely distinguish the functional or metabolic potential of the microbes [38,39,40].
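The assignment of reads to a taxonomic lineage can be sketched as a nearest-reference search. The example below is a minimal illustration, assuming a tiny in-memory reference set and a shared k-mer score; it is not the algorithm used by Greengenes, SILVA, or RDP tools. The taxon names, sequences, and the `min_shared` threshold are hypothetical.

```python
def kmers(seq: str, k: int = 4) -> set:
    """Return the set of overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_taxon(read: str, references: dict, k: int = 4,
                 min_shared: float = 0.5) -> str:
    """Assign a read to the reference sharing the largest fraction of its k-mers."""
    read_kmers = kmers(read, k)
    best, best_score = "unclassified", 0.0
    for taxon, ref_seq in references.items():
        score = len(read_kmers & kmers(ref_seq, k)) / max(len(read_kmers), 1)
        if score > best_score:
            best, best_score = taxon, score
    # Reads without a sufficiently similar reference stay unclassified.
    return best if best_score >= min_shared else "unclassified"


# Hypothetical marker-gene fragments, invented for illustration only.
references = {
    "Taxon_A": "ACGTACGTTGCAAGGCTTAA",
    "Taxon_B": "TTTTGGGGCCCCAAAATTTT",
}
```

A read such as `"ACGTACGTTGCA"` shares all of its 4-mers with the hypothetical `Taxon_A` reference, while a read with no close reference is left unclassified, which mirrors the resolution limit described above.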

Currently, predicting microbial functional capabilities from the 16S rRNA gene is a prevalent alternative to the shotgun metagenomic technique. The cheaper 16S rRNA gene sequencing offers taxonomic structure but not functional interpretation. Several software packages address this challenge by forecasting functional metagenomic abilities from known 16S rRNA gene sequences linked to genomes. Piphillin, PICRUSt (Phylogenetic Investigation of Communities by Reconstruction of Unobserved States), MelonnPan (Model-based Genomically Informed High-dimensional Predictor of Microbial Community Metabolic Profiles), and similar tools are publicly accessible computational software for predicting microbial functional capabilities from detected 16S rRNA genes. Such software uses direct nearest-neighbor matching or relies on reference phylogenetic trees of 16S rRNA gene amplicons to infer metagenomic function. The Kyoto Encyclopedia of Genes and Genomes (KEGG) and BioCyc are among the reference genome databases for predicting metagenomic functions [40, 41]. Shotgun metagenomic sequencing can then validate such predictions in terms of both phylogeny and function [42].
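The general principle behind such predictions can be sketched as follows. This is a simplified illustration, not the actual implementation of Piphillin, PICRUSt, or MelonnPan: it assumes each taxon has already been matched to a reference genome with a known 16S rRNA copy number and known gene-family copy numbers. All taxon names, counts, and copy numbers are hypothetical.

```python
def predict_functions(otu_counts: dict, rrna_copies: dict,
                      gene_copies: dict) -> dict:
    """Predict gene-family abundances from 16S rRNA read counts.

    Read counts are first normalized by the 16S rRNA copy number of each
    taxon, then multiplied by the gene copy numbers of its matched genome.
    """
    predicted = {}
    for taxon, count in otu_counts.items():
        cells = count / rrna_copies[taxon]  # copy-number normalization
        for gene, n_copies in gene_copies[taxon].items():
            predicted[gene] = predicted.get(gene, 0.0) + cells * n_copies
    return predicted


# Hypothetical inputs, invented for illustration only.
otu_counts = {"taxon_A": 40, "taxon_B": 10}
rrna_copies = {"taxon_A": 4, "taxon_B": 1}
gene_copies = {
    "taxon_A": {"haloacid_dehalogenase": 2, "laccase": 0},
    "taxon_B": {"haloacid_dehalogenase": 1, "laccase": 2},
}
predicted = predict_functions(otu_counts, rrna_copies, gene_copies)
```

The normalization step matters because taxa with many 16S rRNA copies would otherwise appear more abundant than they are, inflating the predicted abundance of their genes.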

Shotgun metagenomics

Shotgun metagenomic sequencing can reveal the potential of microbial communities and offers insights into their diversity, life cycle, and functions. Metagenomic reads encoding genes of interest are functionally annotated through gene fragment recruitment, de novo gene prediction, and protein family classification [43]. The technique answers two crucial questions in microbial ecology: “who is there” and “what are they doing.” Annotations are assigned to the reads to establish gene function, using databases such as Non-Redundant (NR), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and eggNOG [44]. This approach offers a complete understanding of community composition at high resolution and of the possible metabolic pathways of the microbial community [45]. However, functional annotation and metagenome assembly may prove challenging for shotgun metagenomic sequencing, and, unlike in 16S rRNA surveys, a microbial classification framework is not directly available. To address this setback, the RiboFR-Seq and epicPCR (emulsion, paired isolation, and concatenation PCR) techniques can simultaneously capture both the 16S rRNA variable regions and their flanking protein-coding genes. In addition, they link functional genes and phylogenetic markers in uncultured single cells, connecting metagenomic contigs to 16S rRNA profiles [46]. However, these methods only partly resolve the problem and cannot connect all the functional genes of the microorganisms to their phylogeny.

There are currently two main methods in shotgun metagenomics, which focus on different aspects of the microbial community within a defined environment. Firstly, structural metagenomics primarily studies the composition of the uncultivated microbial community and other properties, such as the complex metabolic network among community members [47]. Here, the microbial community composition is defined as the population structure, and its dynamics in a specific environment are related to factors such as pressures and spatiotemporal parameters. Greater insight is provided by observing community composition in terms of the interactions among individual microbes within the community, which are vital for the biochemical functions shared among group members [47]. 16S rRNA studies and shotgun metagenomics are not mutually exclusive. Together, the methods link 16S rRNA analysis to genes and metabolic pathways, which helps determine the functional potential of a microbiome [48]. The approaches complement each other and permit more in-depth investigation of pertinent biological questions in microbial ecosystems, such as “who are the community members?” and “what are their functional roles?”. Because of the limitations of amplicon and shotgun metagenomic sequencing, single-cell sequencing is recognized as an effective technique that provides sequence information on a target microbe at the single-cell level.

Single-cell genomics

Single-cell genomics refers to the study of the genome of an individual cell, which may or may not capture the complete genetic range of the microbiota. Single-cell sequencing of microbial cells is becoming an essential tool for the microbiologist. The technique complements existing techniques, including traditional culture-based methods and metagenomic sequencing. Genome sequencing of individual cells is a novel culture-independent technique that provides an evolutionary record of the microbe and enables studies of cell-to-cell variability in microbial populations. It links metabolic function to specific species and yields high-quality genomes for species of low abundance [49].

Single-cell genome sequencing begins with sample preparation and single-cell isolation by advanced isolation methods (such as micromanipulation, flow cytometry, microfluidics, and encapsulation in droplets). Subsequent steps include DNA extraction, phylogenetic classification by the 16S rRNA gene, whole-genome amplification (WGA) using Multiple Displacement Amplification (MDA), library preparation, sequencing, and data analysis [49]. In sequence analysis, the distinct steps comprise quality assurance of raw reads, genome assembly (using a single-cell-specific assembler), automated and/or manual contaminant identification and removal, annotation, genome quality inspection, categorization according to the Minimum Information about a Single Amplified Genome (MISAG) standards [49], and database submission. The method improves the recovery of a complete genome sequence from a single bacterium, enabling novel discoveries of genes and pathways capable of degrading pollutants. This is made possible by examining the putative operon to study how a specific gene is regulated. Single-cell sequencing can unveil novel bacteria with an alternative genetic code, identify gut microbial cells that use host-derived compounds, and quantify absolute taxon abundances in the gut microbiome [50].
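The first analysis step, quality assurance of raw reads, can be sketched as a mean-quality filter. The example assumes FASTQ-style quality strings in Phred+33 encoding and a hypothetical cut-off of 20; neither the encoding nor the threshold is specified in the workflow above, and the reads are invented for illustration.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_reads(reads: list, min_mean_q: float = 20.0) -> list:
    """Keep (name, sequence) pairs whose mean Phred score passes the cut-off."""
    return [(name, seq) for name, seq, qual in reads
            if mean_phred(qual) >= min_mean_q]


# Hypothetical reads as (name, sequence, quality string) tuples.
reads = [
    ("read_1", "ACGT", "IIII"),  # 'I' encodes Phred 40
    ("read_2", "ACGT", "!!!!"),  # '!' encodes Phred 0
    ("read_3", "ACGT", "5555"),  # '5' encodes Phred 20
]
kept = filter_reads(reads)
```

Only reads that pass this filter would proceed to assembly, contaminant removal, and annotation in a real pipeline, which would normally rely on dedicated quality-control tools rather than a hand-written filter.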

Application of whole-genome sequencing techniques to pollutants degradation research

The use of WGS has led to novel discoveries of genes and pathways from microbes capable of degrading environmental pollutants. Some of the most notable genes and pathways are summarized in Table 2. In 2009, Suenaga et al. found over 25 extradiol dioxygenase genes, whose products catalyze the ring cleavage of dihydroxylated central intermediates during the aerobic degradation of aromatic hydrocarbons [51]. Monooxygenases, dioxygenases, hydroxylases, and ring-hydroxylating dioxygenases (RHD) that degrade aromatic and polyaromatic hydrocarbons and related compounds (benzoate, phenol, biphenyl, polychlorinated biphenyl, hexadecane, naphthalene, and phenanthrene) were also reported [52,53,54,55]. Other researchers detected several potential genes encoding dehalogenases, laccases, and cutinases. These enzymes were found to degrade halogenated hydrocarbons, chlorinated solvents, industrial dyes, and polyethylene terephthalate (PET) [9, 48, 56,57,58,59,60,61].

Table 2 Novel biodegradation genes/pathways discovered through genome sequencing approaches

The complete genome sequences of pollutant-degrading bacteria have been extensively studied [62]. Bacterial isolates from oil- or petroleum-polluted sites, polluted marine water, wastewater, contaminated soils, and pharmaceutical-contaminated sites (Table 1) have been sequenced, and the genetic basis of their pollutant mineralization potential has been revealed [62, 63]. In such cases, genome analysis of pollutant-degrading bacterial strains, of their pollutant degradation and uptake mechanisms, and of their genetic adaptation to growth in pollutant-stressed environments might prove useful. Although the complete genomes of pollutant-degrading bacteria have been clarified, only a few reports have highlighted the degradation pathways [62]. Sequencing environmental or individual DNA by next-generation sequencing is rapid and inexpensive, and new sequencing equipment is produced nearly every year. A large amount of data is available from next-generation sequencing platforms, but advances in bioinformatic analysis are lagging. For instance, reconstructing a complete genome from most environments remains a challenge unless enrichment methods can reduce microbiome complexity in microcosms [64]. So far, only the genomes of dominant strains can be reconstructed from complex metagenomes [65]. The situation will change only with computational improvements that allow the complete genomes of rare taxa to be reconstructed. Genomic techniques that address biochemical roles can help assign functional and taxonomic units to assess biodegradation activity [66].

Structural and functional characteristics of dehalogenase genes

The structural and regulatory function of dehalogenase genes can be understood in terms of the operon, a cluster of structurally and functionally related genes whose arrangement allows coordinated regulation of expression in microbes. Since the discovery of the lac operon and various catabolic operons, the control strategies of microbes and their enzymes have been uncovered. Several microbial genomes show groups of genes within a single process that may be co-transcribed and co-regulated in classical operons or transcribed with distinct promoters and regulators. However, the level of operon arrangement and gene clustering varies among species. In some bacteria, operons are poorly conserved, and genes involved in one cellular process can be scattered across the genome [80]. Previously, gene cloning revealed the gene structure of the haloacid dehalogenases in Rhizobium sp. RC1 (Fig. 3), reported to be the only bacterium that produces three different haloacid dehalogenases, DehD, DehE, and DehL. A single regulatory gene (dehR) controls all three structural genes, dehD, dehE, and dehL, in Rhizobium sp. RC1. The Rhizobium sp. RC1 dehalogenases are involved in the degradation of organohalide compounds, and their expression is stimulated only in the presence of the pollutants in the environment [81].
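How putative operons are inferred from an annotated genome can be sketched with a common heuristic: adjacent genes on the same strand separated by a short intergenic gap are grouped together. The 100 bp threshold, the gene names, and the coordinates below are assumptions for illustration and do not describe the Rhizobium sp. RC1 locus.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def predict_operons(genes: list, max_gap: int = 100) -> list:
    """Group adjacent same-strand genes with small intergenic gaps."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            same_strand = gene.strand == prev.strand
            if same_strand and gene.start - prev.end <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])  # start a new putative operon
    return [[g.name for g in operon] for operon in operons]


# Hypothetical annotation, invented for illustration only.
genes = [
    Gene("regR", 100, 900, "-"),
    Gene("genA", 1000, 1800, "+"),
    Gene("genB", 1850, 2600, "+"),
    Gene("genC", 3500, 4200, "+"),
]
operons = predict_operons(genes)
```

In this toy annotation, `genA` and `genB` form one putative operon, while the oppositely oriented `regR` and the distant `genC` remain separate. Such predictions are only hypotheses until transcript evidence confirms co-transcription.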

Fig. 3 A proposed genetic structure of haloacid dehalogenase genes of Rhizobium sp. RC1. The dehR represents a regulatory gene that controls all three dehalogenases. The dehP is the permease gene encoding the dehalogenase uptake protein. P1 and P2 represent promoter regions of the structural genes dehE and dehD/dehL, respectively. The ? represents an unknown gap between the two sets of genes. It can only be resolved when the full genome sequence of RC1 is obtained

Dehalogenase-coding genes are clustered with regulatory genes, genes for transport or uptake proteins, accessory genes, and other genes that take part in organohalide catabolism. Regulatory genes are usually located near dehalogenase genes, often in the opposite orientation. This arrangement is a general feature of transcription factors in bacteria: producing the transcriptional regulator close to its target genes enables intensive expression and favors efficient recognition of the DNA-binding sites by limiting spatial diffusion [82]. The entire genetic organization can only be studied through analysis of the full genome sequence.
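A search for regulators located near dehalogenase genes on the opposite strand can be sketched as below. The example assumes annotation records with `name`, `start`, `end`, `strand`, and `product` fields; the field names, the 2000 bp window, and all gene entries are hypothetical.

```python
def nearby_divergent_regulators(genes: list, target: str = "dehalogenase",
                                window: int = 2000) -> list:
    """Return (regulator, target) name pairs on opposite strands within a window."""
    targets = [g for g in genes if target in g["product"]]
    regulators = [g for g in genes if "regulator" in g["product"]]
    pairs = []
    for t in targets:
        for r in regulators:
            # Distance between the two intervals (negative if they overlap).
            gap = max(t["start"], r["start"]) - min(t["end"], r["end"])
            if r["strand"] != t["strand"] and gap <= window:
                pairs.append((r["name"], t["name"]))
    return pairs


# Hypothetical annotation records, invented for illustration only.
annotation = [
    {"name": "regX", "start": 500, "end": 1400, "strand": "-",
     "product": "LysR-family transcriptional regulator"},
    {"name": "dhlA", "start": 1600, "end": 2500, "strand": "+",
     "product": "haloacid dehalogenase"},
    {"name": "regZ", "start": 2600, "end": 3400, "strand": "+",
     "product": "transcriptional regulator"},
    {"name": "regY", "start": 9000, "end": 9900, "strand": "-",
     "product": "transcriptional regulator"},
]
pairs = nearby_divergent_regulators(annotation)
```

Only `regX` is both close to the dehalogenase gene and on the opposite strand, so it is the single candidate reported. A keyword match on product names is a crude filter, and candidates would need experimental confirmation.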

Transcription factors involved in organohalide detection carry the helix-turn-helix (HTH) DNA-binding domain, which is found in almost all known bacterial transcriptional regulators. They make up the majority of regulators involved in bioremediation [83]. The binding of small ligands induces conformational changes in these transcription factors, thereby affecting their DNA-binding properties [84]. Some transcription factors have been associated with specific organohalide compounds, for instance, transcriptional regulators of the LysR family (such as CatR, ClcR, LinR, and PcpR). They typically detect aromatic or aliphatic organohalide pollutants such as chlorobenzoate, chlorophenol, chloroethane, and chloroethene [62]. The NtrC family of transcriptional regulators (DmpR, MopR, XylR) can recognize aromatic compounds both with and without halogen substituents, although with low affinity [84]. The MarR family transcription factors, on the other hand, regulate the response to aliphatic organohalide solvents [85]. The complete genome of Nocardia soli strain Y48 was found to contain the above-mentioned transcription factors, in addition to TetR-, GntR-, and IclR-family transcriptional regulators [86].

Dehalogenase expression may also be affected by transport or uptake proteins, which could be part of the same operon. Genes encoding transporters or uptake proteins for organohalides are often located near dehalogenase genes (Fig. 3) [87]. In general, many of the enzymes and genes involved in pollutant degradation, and the transcription factors that respond to organohalide pollutants, are not fully characterized [88]. The survival of bacteria and their catalytic ability under harsh conditions hinge on organohalide and nutrient uptake systems. With regard to organohalide biodegradation, genomic and metagenomic approaches have successfully identified genes in several microbes that code for the relevant enzymes. Information from whole-genome analysis on the structure and function of genes in pollutant-degrading bacteria is described in the “Structural and functional characteristics of dehalogenase genes” section.

Table 3 presents the complete genomic survey of bacteria capable of degrading organohalide pollutants. Many aspects of the regulatory pathways remain unexplored compared to the extensive information available on the genes and enzymes responsible for organohalide degradation. Hence, it is unsurprising that some aspects of the regulatory pathways limit the practical degradation of organohalides in unpolluted or polluted environments.

Table 3 The complete genomic survey of bacteria with the ability to degrade organohalide pollutants

Genomics is a powerful computer-based technology used to understand the structure and features of all genes in an organism [101]. In this review, the structure and regulation of haloacid dehalogenases are illustrated using the complete genomes of the organohalide-degrading Burkholderia caribensis MBA4, Pseudomonas aeruginosa N002, and Sphingobium chlorophenolicum strain L-1. Burkholderia caribensis MBA4 specifically degrades monochloroacetic acid and D,L-2-bromopropionic acid (D,L-2-BP), but acts weakly on D,L-2-chloropropionic acid (D,L-2-CP) [56, 58]. The haloacid-utilizing operon, comprising the dehalogenase gene deh4a and the permease gene deh4p, was discovered in replicon CP012747. Besides deh4a, eight other genes across the whole genome were annotated as haloacid dehalogenases or haloacid dehalogenase-like proteins (Fig. 4a) [58]. The role of the permease Deh4p is to transport monochloroacetic acid into the cell. Figure 4b relates to the oxidation of glycolate by glycolate oxidase, an enzyme with three subunits (GlcD, GlcE, and GlcF) whose genes are clustered as an operon. In Burkholderia caribensis MBA4, three glycolate oxidase operons were identified [58], one of which is located downstream of deh4a in replicon CP012747 (Fig. 4b). This operon has a downstream malate synthase gene (glcB) and an upstream regulatory gene (glcC) on the opposite strand. Another glcDEF containing an upstream glcC was discovered in replicon CP012746 with neither adjacent glcC nor glcB (Fig. 4b). Hence, the findings suggested that glycolate oxidase could utilize glycolate in three ways [58].

Fig. 4 Genomic organisation of (a) the haloacid dehalogenase deh4a operon and (b) the glycolate oxidase operon in Burkholderia caribensis MBA4 [56, 58]

In the second example, the complete genome of Pseudomonas aeruginosa N002 contained several genes involved in the degradation of alkanes, alkenes, aromatic hydrocarbons, other crude oil products, and organohalides (including chloroalkane, chloroalkene, chlorocyclohexane, and chlorobenzene) [62]. The 2-haloacid dehalogenase gene (had-2) at position A222_04245 was clustered with genes for other proteins such as alcohol dehydrogenases (frmA, Fe-adh, adhP), a transcriptional regulator (lysR), and other related proteins (Fig. 5). Other pollutant-degrading enzymes were also detected.

Fig. 5 Genome organisation of 2-haloacid dehalogenase (had-2) of Pseudomonas aeruginosa N002 in a cluster with other genes: NAD-dependent aldehyde dehydrogenases (exaC), transcriptional regulator (lysR), class III alcohol dehydrogenase (frmA), predicted hydrolases (dehH), NAD-dependent aldehyde dehydrogenases (aldedh), cytochrome C (cytC), alcohol dehydrogenase, class IV (Fe-adh) and Zn-dependent alcohol dehydrogenases (adhP)

Thirdly, whole-genome analysis of Sphingobium chlorophenolicum strain L-1 by Copley et al. (2012) [99] showed that pentachlorophenol (PCP) was metabolized by three different enzymes. The dehalogenase (PcpC) is clustered with structural proteins (PcpBD, PcpA and PcpE), transcriptional regulators (PcpR and PcpM), transporter systems, and other proteins that partake in PCP hydrolysis (Fig. 6).

Fig. 6 Genomic orientation of PCP degrading genes. PcpR and PcpM encode transcriptional regulators that control the expression of structural genes pcpB,D, pcpA, pcpE, and pcpC

Conclusion

Halogenated organic compounds present a critical global environmental problem due to their toxicity and persistence. Anthropogenic activities release an enormous quantity of pollutants, and a significant amount is expected to remain in the environment. The problem can be addressed by using natural microbes to restore polluted environments. Many newly discovered pollutant-degrading enzymes represent new tools for environmental biotechnology. The broad knowledge acquired has led to the discovery of numerous dehalogenase-producing bacteria, profoundly improving our understanding of the capabilities of different microbes to degrade pollutants in a wide range of environments. Scientists can now better comprehend the complete degradative potential, interactions, and functions of unculturable microbes. However, before whole-genome approaches are used to uncover microbial usability in bioremediation, certain areas must first be addressed, for instance, the structure, function, and regulation of specific genes. Combining these genomic techniques with the data delivered by high-throughput technologies is now possible and is capable of accelerating the discovery of regulatory pathways for bioremediation enzymes (e.g., dehalogenases). Consequently, integrating these techniques with mechanistic information on bioremediation processes, the elucidation of structure–function relationships, and knowledge of the microbes' regulatory pathways will provide the basis for successful biodegradation processes and lead to improved intervention strategies for bioremediation. The technology will allow a better understanding of the complete bioremediation process of organohalide pollutants.