
7.1 Introduction

Soil is a complex dynamic biological system where it is difficult to link specific taxa to metabolic processes (Burns et al. 2013). Soil metagenomics, which comprises isolation of soil DNA and the production and screening of clone libraries, can provide a cultivation-independent assessment of the largely untapped genetic reservoir of soil microbial communities (Daniel 2005). The term ‘metagenome’ was proposed by Handelsman et al. (1998) to describe the genomes of the total microbiota found in nature that can be understood as the whole collection of genomic information of all microorganisms in a given environment (Di Bella et al. 2013). The increasing availability of genes, genomes and metagenomes as well as the growing understanding of their functionalities are supported by considerations of microbial physiology, biochemistry, genetic regulation and engineering in pure strains or defined communities for implementation in bioremediation (Agathos and Boon 2015).

Metagenomics is a rapidly growing area of genome sciences that seeks to characterize the composition of microbial communities, their operations and their dynamically co-evolving relationships with their habitats, even when community members are unculturable (Franzosa et al. 2015; Taupp et al. 2011; Yong and Zhong 2010; Turnbaugh and Gordon 2008). Metagenomics involves sequencing the total DNA extracted from environmental samples (Thomas et al. 2015). It offers the possibility of retrieving unknown sequences or functions from the environment, in contrast to methods relying on PCR amplification, which are based on prior knowledge of gene sequences (Stenuit et al. 2008).

The primary objective of any metagenome sequencing project is the total characterization of a community: the taxonomic breakdown and relative abundance of the various species, and the genic composition of each member of the community, including the number, functional capacity and intra-species/intra-population heterogeneity of the genes (Scholz et al. 2012). Metagenomics can be employed to identify the functional potential and the taxonomic identity of all organisms in an environmental sample, but it provides no information on which members of the microbial community are active (El Amrani et al. 2015). In such cases, techniques such as stable-isotope probing can be applied to identify the active members of the microbial community and the associated genes essential for biodegradative processes (Uhlik et al. 2013).

Functional metagenomics includes screening of environmental-DNA libraries for enzymatic activities or metabolite synthesis (Tannieres et al. 2013). The application of metagenomics might aid in the isolation of novel catabolic pathways for degradation of xenobiotic compounds, indicating the functional genetic capacity for contaminant degradation and providing molecular tools useful for identification of the microbial taxa encoding the biodegradative gene (Kakirde et al. 2010). Metagenomic approaches enable the identification of several novel genes encoding cellulolytic, pectinolytic, proteolytic and lipolytic enzymes, and of many new enzymes, for the screening and identification of unexplored microbial consortia involved in soil xenobiotic degradation (Bashir et al. 2014).

Xenobiotics are foreign compounds to living organisms whose molecules are not easily recognized by existing degradative enzymes and tend to accumulate in soil and water. Xenobiotics include polyaromatic, chlorinated and nitroaromatic compounds, known to be toxic, carcinogenic and mutagenic for living organisms (Eyers et al. 2004). The toxicity of these compounds for the environment and for biota results from their resistance to natural degradation owing to their structural complexity (Ufarte et al. 2015). During microbial degradation of xenobiotics, all changes in the chemical structure are due to the action of enzymes. These enzymes possess a broad range of specificity to accommodate several molecules of similar structure. If such enzymes are identified and isolated, they can be engineered by directed evolution to improve their efficiency with respect to a particular compound (Theerachat et al. 2012).

It is estimated that the soil metagenome accommodates approximately 6000–10,000 Escherichia coli genomes in undisturbed organic soils and 350–1500 genomes in disturbed soils, of which only 5 % has been cultured and studied in the laboratory (Desai and Madamwar 2007). Metagenomic analyses have enabled researchers to explore previously uncultivable microorganisms and exploit their genetic potential in the bioremediation of contaminated soil (Martin et al. 2006; Malik et al. 2008; Simon and Daniel 2011). The metagenomic DNA of polluted environments is a potential genetic resource from which the phylogenetic affiliation of uncultured bacterial species can be determined and whose genetic potential can be tapped by identifying novel biocatalyst, xenobiotic and metal-detoxifying genes with utility in bioremediation processes (Meier et al. 2015, 2016; Desai and Madamwar 2007). Predicted relative metabolic turnover (PRMT) converts metagenomic sequence data into relative metrics for the consumption or production of specific metabolites (Larsen et al. 2011).

Metagenomics lacks the tools to determine whether sufficient coverage is available for the type of analysis planned or whether one can interpret data of a certain depth for a community of a given complexity. Therefore, the standard low coverage in metagenomic studies generates a dataset that reflects a random subsampling of the genomic content of the individual community members (Desai et al. 2012).
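The coverage problem described above can be made concrete with a back-of-the-envelope calculation. The sketch below is illustrative only: it assumes reads are drawn uniformly at random (a Lander-Waterman-style idealization), that the relative abundance of an organism equals its share of the sequenced DNA, and that all function names and numbers are hypothetical rather than taken from any cited study.

```python
import math


def expected_coverage(total_reads, read_length_bp, genome_size_bp, relative_abundance):
    """Expected per-base coverage of one community member.

    Assumes the member contributes `relative_abundance` (0..1) of all
    sequenced bases and that reads fall uniformly along its genome.
    """
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative_abundance must be between 0 and 1")
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return total_reads * read_length_bp * relative_abundance / genome_size_bp


def fraction_of_genome_sampled(coverage):
    """Expected fraction of bases covered at least once (Poisson approximation)."""
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Hypothetical run: 10 million reads of 150 bp; a 5 Mb genome at 0.1 % abundance.
    cov = expected_coverage(10_000_000, 150, 5_000_000, 0.001)
    print(f"expected coverage: {cov:.2f}x")
    print(f"fraction of genome sampled: {fraction_of_genome_sampled(cov):.1%}")
```

Under these assumptions a rare community member is sampled far below 1x coverage even in a sizeable run, which is the sense in which low-coverage datasets reflect a random subsample of each member's genomic content.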

7.2 Approaches to Metagenomics

There are different approaches to metagenomics: (1) shotgun metagenomics where all DNA is sheared and sequenced and functions and taxonomy are derived from homology search in databases, (2) activity-driven studies that are designed to search for specific microbial functions, (3) sequence-driven studies that link genome information with phylogenetic or functional marker genes of interest and (4) direct determination of the whole collection of genes within an environmental sample without constructing a metagenomic library (Suenaga 2012; Harismendy et al. 2009; Shendure and Ji 2008; Brulc et al. 2009). The basic steps involved in the metagenomics of soil-bound xenobiotic compounds are summarized in a schematic process workflow (Fig. 7.1).

Fig. 7.1 A schematic workflow of steps involved in soil xenobiotic metagenomics

The four metagenomic approaches described above based on their random and directed sequencing strategies can be characterized as unselective (shotgun analysis and next-generation sequencing) and targeted (activity-driven and sequence-driven studies) metagenomics, respectively. Unselective metagenomics is a simple and cost-effective DNA sequencing option (Chen and Pachter 2005). The number of metagenomic projects has exploded in recent years, and hundreds of environmental samples have been unravelled by shotgun sequencing (Ivanova et al. 2010). Whole-metagenome shotgun sequencing and amplicon sequencing have been applied to study diverse microbiomes, ranging from natural environments to the built environment and the human body (Tyson et al. 2004). In addition to enrichment culture approaches, isolated environmental DNA can be subjected to whole genome amplification, that is, multiple displacement amplification (MDA) to provide sufficient genetic substrate for library production (Taupp et al. 2011).

A shotgun metagenomic approach relies on sequencing of total DNA extracted from a given sample, without prior cloning into a vector (Jansson 2015). The sequence-driven approach, in contrast, involves the design of PCR primers or hybridization probes for target genes, derived from conserved regions of already known protein families, which a priori limits the chances of obtaining fundamentally new proteins (Ferrer et al. 2009). The activity-based approach involves construction of small- to large-insert expression libraries, especially those made in lambda phage, cosmid or copy-control fosmid vectors, which are then used for direct activity screening (Lorenz and Eck 2005). Three different function-driven approaches have been used to recover novel biomolecules: phenotypical detection of the desired activity, heterologous complementation of host strains or mutants, and induced gene expression (Simon and Daniel 2011). The major limitation of this approach to systems microbiology is that metagenomic libraries have a size limit. After constructing a library, a critical step is to screen for clones that contain target genes among a large number of clones. Here, tens of thousands of clones may be analysed in a single screen. Owing to the limited efficiency with which metagenome-derived genes are expressed in the selected host, the number of positive clones will not be high. Furthermore, in activity-based screening, it is necessary to develop specialized screening systems to detect the activity of the products of the gene of interest (Ferrer et al. 2009). Targeted metagenomic studies that combine metagenomic library screening and subsequent sequencing analysis appear to be a more effective means to understand the content and composition of genes for key ecological processes in microbial communities (Suenaga 2012).
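The library-size constraint mentioned above can be illustrated with the classical Clarke-Carbon estimate, N = ln(1 - P) / ln(1 - f), where f is the ratio of insert size to target DNA size and P is the desired probability that a given sequence is present in the library. The following sketch is not drawn from the cited studies: it assumes all DNA is equally represented and clonable, which real soil metagenomes violate, and the function name and example sizes are hypothetical.

```python
import math


def clones_needed(insert_size_bp, target_size_bp, probability=0.99):
    """Clarke-Carbon estimate of the clones needed so that any given
    sequence is represented in the library with the given probability.

    Assumes uniform, unbiased representation of the target DNA.
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    if insert_size_bp <= 0 or target_size_bp <= 0:
        raise ValueError("sizes must be positive")
    fraction = insert_size_bp / target_size_bp
    if fraction >= 1.0:
        return 1  # a single insert already spans the whole target
    # log1p keeps precision when the fraction is very small
    return math.ceil(math.log(1.0 - probability) / math.log1p(-fraction))


if __name__ == "__main__":
    fosmid_insert = 40_000  # hypothetical ~40 kb fosmid insert
    single_genome = 5_000_000  # hypothetical 5 Mb bacterial genome
    community = 1_000 * single_genome  # hypothetical 1000 equally abundant genomes
    print(clones_needed(fosmid_insert, single_genome))
    print(clones_needed(fosmid_insert, community))
```

Even under these optimistic assumptions, the required clone count scales roughly linearly with the amount of distinct DNA in the sample, so a library of tens of thousands of clones captures only a small part of a complex soil community.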

7.3 Metagenomics of Xenobiotics

7.3.1 PAHs

Polycyclic aromatic hydrocarbons (PAHs) are ubiquitous, persistent and toxic organic compounds in the environment (Cao et al. 2015; Srujana and Khan 2012). The advent of metagenomic approaches has revealed a higher degree of diversity in the degradation pathways and enzymes (Suenaga et al. 2007; Brennerova et al. 2009). Using a function-driven metagenomic approach, Sierra-Garcia et al. (2014) reported metagenomic fragments comprising genes belonging to different pathways and showing novel gene arrangements. These results reinforce the potential of the metagenomic approach for the identification and elucidation of new genes and pathways in poorly studied environments and contribute to a broader perspective on hydrocarbon degradation.

Ring-hydroxylating dioxygenases/oxygenases (RHDs) play a crucial role in the biodegradation of a range of aromatic hydrocarbons found on polluted sites, including PAHs (Chemerys et al. 2014; Peng et al. 2010). RHDs are multicomponent metalloenzymes, which catalyse the first step in the bacterial degradation of various aromatic hydrocarbons (Jouanneau et al. 2011). Hydroxylation of an aromatic ring is the essential catalytic reaction for aromatic-ring degradation by bacteria in nature. In most cases, the hydroxylation is catalysed by an oxygenase family, the Rieske oxygenases (ROs). ROs, which are composed of terminal oxygenase and electron transfer components, act on a broad range of aromatic-ring compounds including mono- and polycyclic aromatic and hetero-aromatic compounds. The terminal oxygenase component has a Rieske cluster as a redox component that receives electrons from the electron transfer components and mononuclear iron as a catalytic site for dioxygen activation (Inoue and Nojiri 2014). The nah genes for PAH catabolism of Pseudomonas strains are highly homologous and usually organized in two operons: the upper nah1 operon, which controls the initial oxidation of naphthalene and subsequent degradation to salicylate, and the nah2 operon for salicylate oxidation. However, the location of both operons and their relative expression may vary. New variants of salicylate hydroxylase genes have been found, and isofunctional genes for salicylate oxidation can often be detected within a single Pseudomonas strain. A unique genetic organization has been described for P. putida AK5, which degrades PAHs via salicylate and gentisate, combining the ‘classical’ nah1 operon with the newly described sgp operon (Boronin et al. 2010).

In addition to the enzyme-encoding genes involved in aromatic compound degradation, metagenomic libraries have been screened for regulatory elements that sense aromatic compounds (Suenaga et al. 2009). The implementation of stable-isotope probing (SIP) to track PAH degraders led to the detection of novel bacteria with remarkable biodegradation potential. SIP approaches also exposed the affiliation of uncultured microorganisms with PAH-degrading bacteria identified in contaminated soils (Chemerys et al. 2014; Uhlik et al. 2012; Singleton et al. 2005). A fluorescence-based reporter assay system termed substrate-induced gene expression (SIGEX) can be used for identification of transcriptional regulators that sense benzoate and naphthalene (Uchiyama and Miyazaki 2013).

Using a metagenomic approach, microbial communities were monitored in Alert biopiles over time to identify microorganisms and functional genes linked to the high hydrocarbon degradation rates in soils undergoing treatment, using an unbiased, culture- and PCR-independent method. Pseudomonas spp. expressing hydrocarbon-degrading genes were most abundant in diesel-contaminated Canadian High Arctic soils. The metagenome of soil biopiles was sequenced through a time course and compared with that of uncontaminated soil, and the expression and abundance of key functional genes were then quantified for the abundant microorganisms identified in the metagenomic datasets (Yergeau et al. 2012). A culture-independent approach to assess the microbial aerobic catabolome for PAH degradation was used to study the microbial community of a PAH-contaminated soil subjected to 10 years of in situ bioremediation, based on Illumina deep sequencing of amplicons targeting the V5–V6 region of the 16S rRNA gene. A metagenomic library was prepared in pCCFos, and 425,000 clones were subjected to activity-based screening for key catabolic ring-cleavage activities using 2,3-dihydroxybiphenyl as a substrate. Since most of the genes encoding extradiol ring-cleavage enzymes on 672 fosmids could not be identified using primers based on currently available sequence information, 200 fosmid inserts were sequenced using Illumina technology. To overcome misannotations in databases, manually curated databases for key catabolic gene families involved in the degradation of aromatics were developed and named AromaDeg. Sequence information of the fosmid inserts revealed not only the presence of novel extradiol dioxygenase genes but also additional key genes of aromatic metabolic pathways only distantly related to previously described variants (Duarte 2014).

An et al. (2013) compared 160 microbial community compositions in ten hydrocarbon resource environments (HREs) and sequenced 12 metagenomes to characterize their metabolic potential. In addition to common anaerobic communities, cores from oil sands and coal beds had unexpectedly high proportions of aerobic hydrocarbon-degrading bacteria. Likewise, most metagenomes had high proportions of genes for enzymes involved in aerobic hydrocarbon metabolism. Time-course analysis of microbial communities using a combination of metagenomics with metatranscriptomics, metaproteomics and stable-isotope probing will greatly contribute to the evaluation of the ecological functions of microbial genes at the community level (Muller et al. 2014; Kato et al. 2015). Loviso et al. (2015) investigated the PAH-degrading potential of yet-to-be-cultured bacterial populations from chronically polluted intertidal sediments. They identified uncultured microorganisms with the potential to degrade aromatic hydrocarbons of various chemical structures, thereby providing valuable information for the design of environmental molecular diagnostic tools and for the biotechnological application of RHO enzymes. When spatial and temporal variations of microbial communities and reconstructed metagenomes along the rice rhizosphere gradient were investigated during PAH degradation, distance from the root surface and PAH concentration were found to affect the microbial communities and metagenomes in the rice rhizosphere. The abundance of dioxygenase genes related to PAH degradation in the metagenomes mirrored the PAH degradation potential in the rice rhizosphere (Ma et al. 2015).
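Comparisons of gene abundance between metagenomes, such as the dioxygenase gene abundances discussed above, require read counts to be normalized for gene length and sequencing depth. The sketch below shows one common normalization (reads per kilobase per million reads, RPKM) and a log2 fold change between two samples. It is a generic illustration and not the method of any study cited here; the counts, gene length and function names are hypothetical.

```python
import math


def rpkm(hit_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene_length_bp and total_reads must be positive")
    return hit_reads * 1e9 / (gene_length_bp * total_reads)


def log2_fold_change(abundance_a, abundance_b, pseudocount=1e-3):
    """log2 ratio of two normalized abundances.

    The pseudocount (an arbitrary illustrative choice) avoids division
    by zero when a gene is absent from one sample.
    """
    return math.log2((abundance_a + pseudocount) / (abundance_b + pseudocount))


if __name__ == "__main__":
    gene_length = 1_350  # hypothetical dioxygenase alpha-subunit gene, bp
    near_root = rpkm(120, gene_length, 2_000_000)  # hypothetical sample A
    bulk_soil = rpkm(30, gene_length, 2_000_000)  # hypothetical sample B
    print(f"sample A: {near_root:.1f} RPKM, sample B: {bulk_soil:.1f} RPKM")
    print(f"log2 fold change: {log2_fold_change(near_root, bulk_soil):.2f}")
```

Such relative measures describe genetic potential only; as noted elsewhere in this chapter, they do not by themselves show which genes are expressed or which community members are active.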

7.3.2 Organochlorinated Compounds

Biphenyl dioxygenase (BphA) is a key enzyme in the aerobic catabolism of polychlorinated biphenyls (PCBs), carrying out the initial attack on the inert aromatic nucleus. It belongs to class II of the aryl-hydroxylating dioxygenases (ARHDOs), which typically hydroxylate substituted benzenes such as toluenes and biphenyls. This enzyme represents a catabolic bottleneck, as its substrate range is typically narrower than that of subsequent pathway enzymes. The feasibility of functionally characterizing dioxygenase activities of soil metagenomes via amplification of incomplete genes has been demonstrated (Standfuß-Gabisch et al. 2012).

γ-Hexachlorocyclohexane (γ-HCH/γ-BHC), also known as lindane, is a xenobiotic halogenated insecticide that was previously used worldwide; it still persists in the environment and causes serious environmental concern (Vijgen et al. 2011). Activity-based screening techniques were applied to clone a gene encoding γ-HCH dehydrochlorinase, together with its flanking regions, from a cosmid-based library of DNA extracted from a γ-HCH-amended suspension of HCH-contaminated soil. A total of 11 cosmid clones showing γ-HCH dehydrochlorinase activity were obtained through the screenings. All the clones had a linA gene identical to the known one, but its flanking regions showed some structural variations from known ones, suggesting a high likelihood of genetic divergence in the linA flanking regions (Ito et al. 2012).

7.3.3 Nitroaromatics

Nitroaromatic compounds such as nitrobenzene or nitrotoluene are widely used as pesticides, dyes, polymers or explosives and are considered priority pollutants (Kulkarni and Chaudhari 2007). The two main explosives, 2,4,6-trinitrotoluene (TNT) and hexahydro-1,3,5-trinitro-1,3,5-triazine [Royal Demolition Explosive (RDX)], are major nitroaromatic environmental pollutants and present distinct problems for bioremediation (Rylott et al. 2011). Rational design of enzymatic activity has been used to improve the degradation of nitroaromatic compounds. Nitrobenzene 1,2-dioxygenase catalyses the conversion of nitrobenzene to catechol and nitrite. The residues near the active site of this enzyme were modified to control substrate specificity. Substitution of the amino acid at position 293 (F293Q) expanded substrate specificity, resulting in a 2.5-fold faster oxidation rate against 2,6-dinitrotoluene (Singh et al. 2008). Lee et al. (2005) reported that, when enzyme residues near the active site were chosen for site-directed mutagenesis, replacement at position 258 significantly changed the enantiospecificity.

Biodegradation of para-nitrophenol (PNP) proceeds via two distinct pathways in Burkholderia sp. strain SJ98, having 1,2,3-benzenetriol (BT) and hydroquinone (HQ) as their respective terminal aromatic intermediates. A ~41 kb fragment from the genomic library of Burkholderia sp. strain SJ98 has been sequenced and analysed. This DNA fragment was found to harbour all the lower pathway genes. Later, the whole genome of strain SJ98 was sequenced and annotated, and two ORFs (viz. pnpA and pnpB) were found that showed maximum identity at the amino acid level with p-nitrophenol 4-monooxygenase (PnpM) and p-benzoquinone reductase. This is the first report on the genes for PNP degradation in strain SJ98, which are found to be arranged differentially in the form of non-contiguous gene clusters (Vikram et al. 2013).

7.4 Challenges and Future Prospects

Although metagenomics is revealing new information about phylogenetic and functional genes in some soils, it is not possible to adopt the information available to date to all soils (Jansson 2015). Owing to the complexity and heterogeneity of the biotic and abiotic components of soil ecosystems, the construction and screening of soil-based libraries is difficult and challenging (Daniel 2005). Soil metagenomics is susceptible to limitations that are common to all molecular techniques. Soil DNA extraction procedures are not fully efficient, where adsorption of cells and the adherence of DNA onto soil components cause losses of genetic information, and the DNA exploitation techniques currently in use provide access mainly to populations that dominate in soil (Lombard et al. 2011). Metagenomics provides little information on quantitative physiological characteristics such as maximum specific growth rate, saturation constant, pH, temperature for growth, susceptibility to predation and the speed of recovery after starvation. It is also difficult to draw meaningful information from correlations between the physicochemical characteristics of soil and metagenomic data (Prosser 2015).

Data analysis is the key limiting factor in metagenomic studies, as increased data volumes pose significant challenges to the existing analysis tools and indeed to the community providing analysis systems. This growth in dataset size, along with the computational complexity of analysis, has left the metagenomics community in an unsustainable position, in terms of both financial cost and the feasibility of analysis itself (Desai et al. 2012). While metagenome sequencing can provide useful estimates of the relative change in abundance of specific genes and taxa between environments or over time, it does not reveal the relative changes in the production or consumption of different metabolites (Larsen et al. 2011). Adapted sampling strategies and the combination of DNA extraction methods can help to recover minority populations, which are normally masked by the dominant ones. Pyrosequencing and Illumina/Solexa technologies also offer a chance to access the rare biosphere but are still affected by the overriding effect of the dominant biota. Methods such as prior separation of particular minority cells via flow cytometry, or separation/fractionation of DNA by G+C % or in a SIP-based approach, will certainly help to tease out specific minority populations (Lombard et al. 2011).