1 Introduction

Metabolomics may be defined as the analysis of thousands of naturally occurring small biomolecules (metabolites) that are the substrates and products of both primary and secondary cellular metabolism (Jones and Cheung 2007). Such molecules include sugars, organic acids, amino acids, flavonoids, lipids and nucleotides amongst many others. An organism’s “metabolome” is its full complement of metabolites, in the same way that its genome comprises its complete genetic content and a proteome entails the complete set of proteins expressed and modified following their expression by the genome (Wilkins 2009). A very basic outline of a metabolomics experimental setup is shown in Fig. 1.

Fig. 1 Simplified overview of a metabolomics experiment

The metabolome can be used to study the underlying biochemical responses of an organism/population/community to an environmental stimulus or combination of stimuli. However, because both populations and communities of microorganisms are often difficult to culture, metabolomic analysis of such assemblages must take a different approach from that taken by studies using tissue or biofluid(s) from larger organisms such as humans. A new and effective way to do this is to borrow techniques and approaches from the field of metagenomics.

Metagenomics is a rapidly growing area of the genome sciences founded by Venter et al. (2004), who hit upon the idea of analysing the total complement of microbial DNA extracted directly from entire microbial communities in their native habitats, without worrying about which species it came from. This approach allowed them to sequence such communities in their entirety, regardless of the ability of individual component members to be cultured in the laboratory (Handelsman 2004). Early studies had great success. For instance, when the technique was applied to the analysis of samples from the Sargasso Sea, over 1.2 million previously unknown genes were identified (Venter et al. 2004).

Metagenomics allows one to see, in ever-increasing detail, the vast microbial and metabolic diversity that exists in the biosphere. It has been utilised in several high-profile publications in studies ranging from the aforementioned study of the genetic diversity of microbes in seawater (Venter et al. 2004) to phosphorus and carbon management in sewage sludge (Garcia et al. 2006; Sales and Lee 2015), to the effects of warming on carbon sequestration and lignin oxidation in soil (Feng et al. 2008).

It is of note, however, that the other omic sciences have yet to replicate this success. The first detailed study on the NMR-based metabolomics of microbial communities in environmental samples was published by Jones et al. (2014) who looked at the effects of pollutants on soil metabolic profiles in old mine sites in the UK. There have since been a few similar studies including Swenson et al. (2015) and Rochfort et al. (2015) who both looked at metabolites in the extractable organic matter in soil, and Llewellyn et al. (2015) who used community metabolomics as a new approach to discriminate marine microbial particulate organic matter in sediments from various locations in the English Channel. Plant-associated fungal communities assessed by meta-omics techniques were also the subject of a recent review by Peršoh (2015).

Much remains to be learnt from this area of research through what might be termed “Meta-Metabolomics” or, as proposed by Jones et al. (2014), “Community Metabolomics”. Under this definition, in the same way as metagenomics indicates the analyses of all the DNA from a given sample, community metabolomic analysis uses all of the thousands of naturally occurring metabolites from the meta-population of a sample of a given environment such as soil or water, and perhaps even air (Castillo et al. 2012), as opposed to those from simple, laboratory based monoclonal cultures (Eisen 2007) or clinical isolates (Bundy et al. 2005).

Community metabolomics has the potential to influence a range of “bio” related disciplines, including medicine. For instance, Bundy et al. (2005) showed that one can use metabolomics to classify pathogens on the basis of their metabolic profile, even when it is not possible to infer a direct link to known virulence factors. This could also have a wide range of possible applications in both general microbiology and microbial ecology for distinguishing and identifying different functional/physiological ecotypes of bacterial strains or species. In a similar vein, pioneering work by Nicholson et al. (2005) at Imperial College London proposed that the appropriate consideration of individual human gut microbial activities is a necessary part of future personalised healthcare. This is because the gut microbiota of most species interacts extensively with the host through metabolic exchanges and cometabolism of substrates. Such interactions are presently poorly understood but are highly likely to be involved in the aetiology of many human diseases as well as the fate and toxicity of drugs taken to alleviate said diseases (Nicholson et al. 2005). This topic is expanded upon elsewhere in this book, so will not be explored further here.

2 Soil Science

Environmental science is another area where microbial metabolomics has great potential and it is this area on which this chapter will focus. Whilst most work carried out on this topic has been, thus far, relatively simple, it illustrates the potential of the approach. For instance, Fourier Transform-Infrared Spectroscopy (FT-IR) has been demonstrated to allow the chemically based discrimination of microbial genotypes, and to produce biochemical fingerprints which are both reproducible and distinct for different bacteria and fungi (Timmins et al. 1998). Scullion et al. (2003) also tested the potential of FT-IR spectroscopy for investigating microbial communities and their activities in soil. A range of samples including laboratory cultures of similar soil bacteria, plant materials and earthworm casts from worms of various ages and feeding regimes, were analysed using cluster analyses and this proved capable of differentiating between different bacterial, litter and cast samples. However, this work has not been followed up in detail and the amount of diagnostic information that can be obtained from FT-IR is limited.

A more complicated study was that mentioned above by Jones et al. (2014), which utilised 1H nuclear magnetic resonance (NMR) spectroscopy-based community metabolic profiling to assess the changes in biochemical profiles of communities living in contaminated soils from various sites in the UK, each with very different physicochemical characteristics (levels of metal contamination, underlying geology and soil type). Each site could be clearly distinguished on the basis of the metabolic profile of its microbial community. While some of these site differences may also have been caused by additional abiotic factors (such as soil type or pH), pattern recognition analysis of the data showed that both site- and contaminant-specific effects on the metabolic profiles could be discerned. The study therefore acts as a proof of principle for the use of community metabolomics of microbial populations from whole soil samples (rather than single isolates) as a diagnostic tool for pollution assessment. Assigning peaks in NMR spectra of soil extracts is difficult, but software tools such as Chenomx NMR Suite, which overlay library spectra of known compounds on the sample spectra, can help (an example is shown in Fig. 2). Recent developments in NMR software, such as Bayesil (http://bayesil.ca/), a web-based system developed at the University of Alberta (Canada) that automatically identifies and quantifies metabolites, may also be of help in future.
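The overlay idea described above can be illustrated with a toy least-squares fit, in which a sample spectrum is modelled as a weighted sum of two library spectra. This is only a sketch under stated assumptions: the peak positions, line widths and weights are hypothetical, the spectra are synthetic and noise-free, and the code is not a description of how Chenomx NMR Suite or Bayesil work internally.

```python
def lorentzian(ppm, centre, width=0.004):
    """Unit-height Lorentzian line at chemical shift `centre` (hypothetical width)."""
    return width ** 2 / ((ppm - centre) ** 2 + width ** 2)


def library_spectrum(axis, centres):
    """Build a synthetic reference spectrum as a sum of Lorentzian lines."""
    return [sum(lorentzian(x, c) for c in centres) for x in axis]


def fit_two_components(sample, ref_a, ref_b):
    """Ordinary least squares for sample ~ a*ref_a + b*ref_b via 2x2 normal equations."""
    saa = sum(a * a for a in ref_a)
    sbb = sum(b * b for b in ref_b)
    sab = sum(a * b for a, b in zip(ref_a, ref_b))
    sas = sum(a * s for a, s in zip(ref_a, sample))
    sbs = sum(b * s for b, s in zip(ref_b, sample))
    det = saa * sbb - sab * sab
    return (sas * sbb - sbs * sab) / det, (sbs * saa - sas * sab) / det


# Chemical shift axis from 0.90 to 1.40 ppm in 0.001 ppm steps.
axis = [0.90 + 0.001 * i for i in range(501)]

# Hypothetical library entries: one doublet for lactate, two doublets for valine.
lactate = library_spectrum(axis, [1.32, 1.34])
valine = library_spectrum(axis, [0.98, 0.99, 1.03, 1.04])

# Synthetic "soil extract": known mixture of the two references.
sample = [2.0 * l + 0.5 * v for l, v in zip(lactate, valine)]

coef_lactate, coef_valine = fit_two_components(sample, lactate, valine)
print(round(coef_lactate, 3), round(coef_valine, 3))
```

Real soil extracts contain many overlapping compounds, baseline distortions and shifted peaks, so practical tools need far more than a two-component fit; the sketch only shows why a library overlay can yield relative amounts as well as identities.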

Fig. 2 Section of a 1H NMR spectrum of soil extract (black) overlain with NMR spectra of lactate (red) and valine (blue)

Rochfort et al. (2015) also used NMR to assess metabolites in soils. They found similar results to Jones et al. (2014), although in the Rochfort study there were more resonances as a result of lipids and terpenes in the NMR spectra. The authors put this down to differences between the extraction methodologies employed in each study, although it seems equally possible that soil type, condition or land use could have been responsible. Both the Jones and Rochfort studies demonstrated that soil extracts could be measured by NMR, and there was no obvious interference or line broadening as a result of paramagnetic materials that might have been present in the samples. It is interesting to note that Rochfort et al. also used mid-infrared spectroscopy (MIR) to analyse their soil samples and found some differences. Whilst the NMR metabolomic data were associated more with land use than with location, the MIR data were correlated more with location and inorganic chemical analysis. The results may reflect the differences between NMR and MIR data (the latter technique is far less sensitive) but may also indicate that it is beneficial to combine two or more analytical methods in such studies to ensure a comprehensive soil analysis.

It should also be noted that some microbial metabolites rapidly adsorb to soil. Soil physical properties such as mineral composition, surface area, shape and porosity therefore present a notably different set of challenges in soil analysis compared to the cell and tissue extractions more common in metabolomics. Swenson et al. (2015) showed that even after a short incubation with water, many metabolites, for example amines and carboxylic acids, could be adsorbed to soil and/or chelated by metals or other ions in the soil or extraction medium. Cationic and anionic metabolites seem to be most affected, whilst the majority of neutral metabolites were recovered. This is an important finding since metabolites not recovered in the extraction procedure could be overlooked in the subsequent analysis. Other factors that might affect such binding include soil type, metabolite-soil contact time, temperature, moisture levels and the extraction solvents used. Swenson et al. (2015) point out that there is, however, a silver lining to this problem: understanding how specific metabolites bind to soil particles under differing environmental conditions may also increase our understanding of how such compounds will behave in response to future climatic changes (e.g. heat or precipitation) and thereby affect substrate availability and the structure of soil microbial communities.

Analysing small molecule metabolites, in addition to more abundant macromolecules, could also help to increase knowledge of biogeochemical cycles. For example, 13C-labelled carbon sources could be fed to plants or pumped into/through the soil and the bacterial metabolome studied to assess where the carbon is transported or to detect subtle shifts in microbial populations interacting with plant litter. Indeed, the use of NMR techniques has already provided valuable data supporting the occurrence, diversity and extent of carbon cycling in the carbohydrate metabolism of microorganisms; for a review see Portais and Delort (2002).

Targeted studies have also been undertaken in order to measure flux rates of specific metabolic pathways, for example through the use of isotope labelling studies. Scullion et al. (2003) used FT-IR-based metabolic analysis to assess several potential foods (oat grain, and fresh and aged tobacco) for the earthworm species Lumbricus terrestris and L. rubellus. As casts aged, there was a predictable microbial succession (from bacteria to fungi) in both earthworm species. The species were also found to differ in their cast microbiology in response to food type. Analysis of ageing casts by FT-IR spectroscopy indicated greater chemical changes in casts of L. terrestris than L. rubellus irrespective of food type.

Community metabolomics of soils is potentially very valuable, since the identification of biomarkers indicative of a defined response to a pollutant or pollutants, before major outward changes become apparent, would help to prevent damage to a variety of sensitive systems (e.g. through eutrophication). This could potentially have applications in testing of contaminated land prior to redevelopment (e.g. house-building on old industrial sites). It could also be useful for the water industry since the discharge of toxic substances to the sewage network can have negative effects on wastewater treatment plants, which rely heavily on bacteria and other microorganisms to function effectively. Similar research could also identify microorganisms with favourable metabolisms for bioremediation work.

Community metabolomics is not limited to the terrestrial environment. Llewellyn et al. (2015) used ‘non-targeted’ community metabolomics to determine patterns in metabolite profiles associated with particulate organic matter (POM) at four locations from two long-term monitoring stations in the western English Channel. They identified a range of compounds including amino-acid derivatives, diacylglyceryltrimethylhomoserine (DGTS) lipids, oxidised fatty acids (oxylipins), various glycosylated compounds, oligohexoses, phospholipids, triacylglycerides (TAGs) and oxidised TAGs. Metabolic profiles varied significantly across the four locations with the largest differences for both the polar and lipid fractions again being due to geographic location (and/or time). Smaller differences were also associated with depth.

The work demonstrated that community metabolomics is not only applicable to the aquatic, as well as the terrestrial, environment but that it has the potential to contribute significantly to the comprehensive and unbiased characterisation of marine microbial populations; particularly if linked to metagenomic data. The authors also speculated that the oxylipins could have a possible link to the formation of oxygenated volatile organic compounds produced by microbial species that are important in atmospheric chemistry.

3 Case Studies in Community Metabolomics

3.1 Bacteria and Metal Toxicity

As microorganisms form an extremely important, sessile component of ecosystems, it is little surprise that they have evolved numerous strategies to deal with environmental stressors. Laboratory investigations of individual bacterial strains provide well-controlled conditions from which to study mechanistic behaviour in response to stress. The ability of microbial strains to survive in the presence of relatively high levels of metal ions is an important environmental adaptation to toxicological stress. Recently, two specific approaches to microbial adaptation have been examined using metabolomics methods: (i) an innate metabolic ‘priming’ which confers adaptive advantages to bacteria via specific pathways, and (ii) morphological adaptation into biofilm communities which completely alter the bacterial metabolism and consequent response to metal toxicity.

The negative environmental consequences of heavy metal toxicity have long been known, and recent political and regulatory awareness has sparked a renewed vigour in attacking this problem scientifically. In living systems, such as microbial communities, heavy metal toxicity manifests itself through numerous mechanisms (reviewed in Harrison et al. 2007 amongst others). Briefly, toxic effects include: substitution for other essential inorganic elements, interaction with protein and metabolite thiol groups, production of reactive oxygen species through Fenton-type reactions, competitive inhibition of the membrane transport process and siphoning electrons directly from respiratory pathways.

3.1.1 Tellurite Resistance

Incorporation of tellurium (for example as tellurium dioxide, TeO2) into electronics and industrial applications has led to an increased interest in the study of the effects of environmental exposure (Chasteen and Bentley 2002). A picture of pure tellurium can be seen in Fig. 3, but this element is not often found in its native form. A common, soluble oxyanion form of this element, however, is tellurite (TeO3²⁻), which is highly toxic to most organisms even at low concentrations. Pseudomonas pseudoalcaligenes KF707 is a naturally Te-resistant (TeR) bacterial strain. The exact mechanisms by which KF707 is resistant to Te are poorly understood, although there are known interactions with thiol groups that mediate the bacterial interaction with Te (Zannoni et al. 2008).

Fig. 3 Pure tellurium

In a metabolomics study aimed at understanding the metabolic shifts associated with Te exposure, a hyper-resistant KF707 strain (T5) was isolated and studied using an NMR-based approach (Tremaroli et al. 2009). In an initial experiment using an established biochemical assay (Ellman’s reagent, 5,5′-dithiobis-2-nitrobenzoic acid) to measure cellular thiol content (RSH), it was found that exposure to 25 µg/ml K2TeO3 for 30 min resulted in an equivalent depletion of RSH content in both the wild-type KF707 and hyper-resistant T5 strains. Intriguingly, the viability of T5 cells did not decrease further with extended exposure and, in fact, the cells were able to replenish RSH. Similarly, T5 cells demonstrated resistance to perturbations of the membrane potential induced by metal exposure.

In order to better understand the metabolic adaptations of the T5 strain that counteract the effects of Te, a metabolite analysis was performed on both the wild-type and T5 strains ±TeO3²⁻ (Tremaroli et al. 2009). This consisted of quantitatively profiling 28 metabolites using a targeted profiling approach, which accounted for the majority of peaks in the NMR spectra of the samples. Orthogonal partial least squares-discriminant analysis (OPLS-DA) comparing two sample types at a time revealed that there were unique metabolite profiles for baseline KF707 versus (a) T5 cells not exposed to TeO3²⁻, (b) KF707 cells ±TeO3²⁻ and (c) T5 cells ±TeO3²⁻. Notably, there was no significant difference between KF707 and T5 cells exposed to TeO3²⁻, indicating that the post-exposure profiles of the two strains were similar, and that it is the differences in pre-exposure metabolism that confer the selective advantage on the T5 strain. One of the compounds elevated in the baseline T5 profile compared to wild-type was glutathione, which is important in responding to cellular oxidative stress. Levels of betaine, which plays a role in the response to hyperosmolar stress (Lisa et al. 1994), were also found to be elevated.

Collectively, the metabolomics experiments described above lead to the conclusion that the hyper-resistance of the T5 cells could be attributed to a series of basal adaptations that ‘prime’ the cells for exposure. This conclusion was supported by further experimental evidence using phenotype microarrays which provide a series of chemical stressors to evaluate the response of exposure to a host of toxic chemicals (Bochner et al. 2001). In addition to tellurite, T5 cells exhibited an increased tolerance to a number of other ions, including selenite, chromium, aluminium and caesium, while there was no altered response to other metals. Also noteworthy was the fact that T5 demonstrated increased resistance to compounds which interact with glutathione, glutathione-related proteins or RSH groups (namely diamide, 1-chloro-2,4-dinitrobenzene and phenylarsine oxide). This result provides a demonstrable link between the higher basal levels of glutathione in T5 observed in the metabolomics experiment and the phenotypic response of the bacteria to toxicants. This is a useful demonstration of the power of community metabolomics and the work could easily be extended to a range of other metals and metalloids.

3.1.2 Investigations of Biofilms and Morphological Variants

Biofilms are increasingly recognised as potentially important morphological communities by which bacteria are able to survive under conditions that would be otherwise detrimental. Recent evidence suggests that this transformation is accompanied by specialisation, analogous to differentiation in higher organisms (Nadell et al. 2009). Such adaptation into synergistically functional communities confers a wide range of physical and chemical characteristics unique to the biofilms compared to planktonic cultures of the same strain.

One of the interesting features of biofilms in both laboratory and ‘real-world’ settings is heterogeneity in colony morphologies when the bacteria are grown on solid media (Boles et al. 2004). Within Pseudomonas spp., two of these so-called phenotypic variants have been well characterised, namely small-colony variants (SCVs) and rugose small-colony variants [RSCVs, or wrinkly spreaders (WS)]. These morphological variants exhibit increased resistance to anti-microbial compounds and, under laboratory settings, they are selected by exposure to environmental stress agents such as oxidative agents, anti-microbials and metals.

One of the key genetic pathways associated with phenotypic morphological variation is the global activator of cyanide biosynthesis/regulator of the secondary metabolism (gac/rsm) signal transduction system (Petrova and Sauer 2009). In experiments examining the morphological response to metal ion exposure of P. fluorescens, four distinct populations were observed: (1) the parental wild-type (CHA0), which did not give rise to any morphological variants; (2) a gacS (CHA19) strain which gave rise to (3) SCV and (4) WS morphological variants upon exposure to toxicants. Using an NMR metabolomics approach, the metabolic difference between these four population types was characterised (Workentine et al. 2010).

Metabolites were extracted from the four groups from cultures grown to mid-log phase and subjected to 1D 1H-NMR analysis, and the results were analysed in detail by both principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA). In this case 32 metabolites were identified and quantified, revealing significant differences between all four populations, both in a combined PLS-DA approach and via comparisons of the various strains/morphological types. Clear metabolic differences were observed between the SCV and WS morphological variants and both the wild-type and CHA19 strains. For example, valine, phenylalanine and glycine were uniquely found to be important in the WS models, whereas acetate, pyruvate, aspartate, proline and glutamate were important in explaining the SCV metabolic differences. Other metabolites, such as tryptophan, were found to be important in both variants.
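As a rough illustration of the unsupervised step mentioned above, the sketch below computes the first principal component of a small metabolite table using power iteration on the covariance matrix. The concentrations and sample names are invented for illustration, only PCA is shown (not PLS-DA), and the code is not the workflow used by Workentine et al. (2010).

```python
import math


def mean_centre(rows):
    """Subtract each column (metabolite) mean from every sample."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(centred):
    """Sample covariance matrix of mean-centred data (metabolite x metabolite)."""
    n = len(centred)
    p = len(centred[0])
    return [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iterations=200):
    """Leading eigenvector of the covariance matrix by power iteration."""
    vec = [1.0] * len(cov)
    for _ in range(iterations):
        nxt = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


# Hypothetical concentrations (arbitrary units); columns follow `metabolites`.
metabolites = ["valine", "acetate", "tryptophan"]
samples = {
    "WT_1": [1.0, 2.0, 1.0],
    "WT_2": [1.1, 2.1, 0.9],
    "SCV_1": [1.0, 4.0, 2.0],
    "SCV_2": [0.9, 4.2, 2.1],
}

centred = mean_centre(list(samples.values()))
loadings = first_component(covariance(centred))

# Score of each sample = projection of its centred profile onto the loadings.
scores = {name: sum(l * v for l, v in zip(loadings, row))
          for name, row in zip(samples, centred)}

for name, score in scores.items():
    print(name, round(score, 2))
```

In this invented data set the two groups fall on opposite sides of the first component, and the loadings indicate which metabolites drive the separation. Real studies would normally scale the variables and validate any supervised model built afterwards.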

It is not entirely clear what types of advantages or disadvantages are conferred on morphological variants compared to the parental strains. Metal susceptibility (AgNO3, CuSO4 and NiSO4) was used to evaluate the quantitative survival of the bacteria in response to exposure. The gacS clone was susceptible to all three metals, while the SCV and WS variants were both tolerant to NiSO4; however, SCV was uniquely tolerant to CuSO4 and WS was uniquely tolerant to AgNO3. These results suggest that the morphological variants are imbued with unique properties. To further explore the link between the metabolite profiles and metal tolerance, another set of PLS-DA models was built which related specific metabolites to metal tolerance across all three strains (in other words, the basal metabolite states across all strains were considered together). In this analysis, tryptophan, glutathione, methionine, adenosine and glucose were elevated with sensitivity to both copper and silver, whereas proline was strongly correlated with sensitivity to copper only and lactate and NAD+ increased with sensitivity to silver.

Overall, the results from this study indicate that the biofilms under study had distinct metabolic states that conferred some advantages to stress, and that morphological variants further allow the bacteria to develop unique profiles. These laboratory results are particularly important to understanding the environmental survival and viability of bacterial strains.

3.1.3 Biofilm Versus Planktonic Response to Copper

One of the proposed mechanisms by which biofilms are known to differ from planktonic cultures is in sugar metabolism. In particular, exo-polysaccharide matrices are thought to provide structure for formation of the biofilms (Harrison et al. 2007). The NMR approach commonly employed for metabolomics analysis is advantageous in that it is robust, quantitative and highly reproducible. One disadvantage, however, is the relatively small number of metabolites (~30) that can be characterised from extracts, with specific metabolites related to sugar metabolism being poorly characterised. For this reason, in a study of P. fluorescens response to copper (Booth et al. 2011) a more sensitive gas chromatography-mass spectrometry (GC-MS) method was employed in addition to NMR. The two analytical platforms provided identification of 79 unique metabolites, and hence a much larger coverage of metabolic space. Pairwise multivariate comparisons indicated that while copper exposure induces a significant metabolic response in both planktonic and biofilm cultures, the nature of this response is highly different. Only three metabolites (NAD+, phosphoric acid and glutathione) were common to both responses.
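The benefit of combining platforms, and the comparison of the two copper responses, both come down to simple set operations. The sketch below uses invented detection and response lists; only the three shared responders (NAD+, phosphoric acid and glutathione) are taken from the text, and the sets are far smaller than the 79 metabolites reported.

```python
# Hypothetical detection lists for each analytical platform.
nmr_detected = {"glutathione", "NAD+", "lactate", "valine"}
gcms_detected = {"glutathione", "phosphoric acid", "glucose", "trehalose"}

# Metabolites seen on either platform (combined coverage).
coverage = nmr_detected | gcms_detected
# Metabolites detected by both platforms.
both_platforms = nmr_detected & gcms_detected

# Hypothetical sets of metabolites that changed on copper exposure.
planktonic_response = {"NAD+", "phosphoric acid", "glutathione", "citrate", "pyruvate"}
biofilm_response = {"NAD+", "phosphoric acid", "glutathione", "glucose", "trehalose"}

# Responders common to both culture types, and those unique to each.
shared_response = planktonic_response & biofilm_response
planktonic_only = planktonic_response - biofilm_response
biofilm_only = biofilm_response - planktonic_response

print(len(coverage), sorted(shared_response))
```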

Overall, the planktonic cultures were found to be far more ‘reactive’ to exposure, characterised by an oxidative stress response with changes to the tricarboxylic acid (TCA) cycle, glycolysis, pyruvate and nicotinate and niacinamide metabolism. On the other hand, the biofilm response was dominated by shifts in exo-polysaccharide metabolism, suggesting a ‘protective’ response. While alterations in levels of glutathione indicate that there was still oxidative stress in the biofilms, the lack of involvement of energy pathways suggests that the biofilms have alternate methods for enhancing protection, which is consistent with previous observations (González et al. 2010).

3.1.4 Methodological Considerations and Conclusions

Given the obvious advantages of being able to tightly control growth and exposure conditions in a laboratory setting, it is no surprise that this is one of the most popular methods used to study the effect of exposure to environmental toxicants. Studies of bacterial metabolism are exquisitely sensitive to a multitude of factors, including sampling conditions, sample processing and, in the context of metabolomics, data acquisition and analysis. Numerous studies have examined appropriate methods for quenching bacterial metabolism, and these have recently been assessed in the context of bacterial metabolomics (van Gulik 2010). The studies described above were accomplished using cold methanol quenching, which is the most common technique but may suffer from leakage of metabolites during the process. As the metabolomic scientist’s interest is often not in absolute quantification of metabolites in the cells, but in relative quantification between different populations, an inherent assumption in these experiments is that the quenching and metabolite extraction processes impact the different populations in the same way.
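A short worked example shows why this assumption matters. In the sketch below, the intracellular amounts and leakage fractions are hypothetical: when both populations lose the same fraction of a metabolite during quenching, the measured ratio equals the true ratio, but unequal leakage biases it.

```python
def measured(true_amount, leakage_fraction):
    """Amount remaining in the cell extract after a fraction leaks out during quenching."""
    return true_amount * (1.0 - leakage_fraction)


def fold_change(amount_a, amount_b):
    """Relative quantification: ratio of population A to population B."""
    return amount_a / amount_b


# Hypothetical true intracellular amounts (arbitrary units); true ratio is 2.0.
true_a, true_b = 10.0, 5.0

# Same leakage (30 %) in both populations: the ratio is preserved.
equal_leakage = fold_change(measured(true_a, 0.3), measured(true_b, 0.3))

# Different leakage (30 % versus 10 %): the ratio is underestimated.
unequal_leakage = fold_change(measured(true_a, 0.3), measured(true_b, 0.1))

print(round(equal_leakage, 2), round(unequal_leakage, 2))
```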

In spite of these limitations, our understanding of the highly specific alterations in microbial metabolism in response to metals has been greatly enhanced by metabolomics methods. The studies described here provide some clues as to how bacteria can adapt, or exist in a uniquely ‘primed’ state for exposure to toxicants. There remains much to do in the laboratory, including further testing of mixed microbial communities and testing of complex toxicity profiles consisting of multiple toxicants, which more closely reflect real-world situations under chronic exposure conditions. From a technical perspective, further expansion of the metabolome coverage would also be advantageous, for example using liquid chromatography-mass spectrometry (LC-MS) methods. In summary, metabolomics experiments in the laboratory have contributed significantly to our understanding of the mechanisms by which microbial populations are metabolically differentiated and, in fact, these types of studies could have important industrial applications.

4 Potential Industrial Uses of Community Metabolomics

4.1 Using Community Metabolomics to Elucidate Metabolic Pathways in Microbial Fuel Cells and Anaerobic Waste Conversion Applications

Effective waste treatment is critical to maintaining ecosystems and human health. Furthermore, in the context of increasing energy costs and the mitigation of greenhouse gas emissions, the development of efficient waste conversion systems is critical. Two related technologies, which promise to help address the challenge of efficient treatment of wastes and wastewaters, are anaerobic digestion (AD) and microbial fuel cells (MFCs).

Digestion (both anaerobic and aerobic) is an established waste treatment technology for residues from various sources, including industrial processes and agriculture (Lester and Birkett 1999). Anaerobic digestion can be defined as the biological mineralisation of organic material to biogas, in the absence of oxygen, by the sequential activity of several microbial groups (Lester and Birkett 1999). It has several advantages over aerobic biological treatment, including lower operating costs and, notably, biogas production (Lettinga 1995). Biogas is similar to natural gas and consists of a mixture of 50–85 % methane and 15–50 % carbon dioxide with trace amounts of other gases. It is a renewable energy source and is used for the generation of heat and electricity, or as vehicle fuel. A very interesting study by Beale et al. (2016) combined metagenomics and community metabolomics to characterise the microbiota in AD digesters and to investigate the resilience of these microbial populations when exposed to operational shocks in the form of temperature changes and the addition of further substrates in the form of fats, oils and grease. The results provided a good deal of useful information on the structure and function of microbial communities in AD units. It was also found that AD performance was not greatly affected by temperature shocks, but that the addition of fats, oils and grease significantly promoted biogas production.

The anaerobic reactors used for full-scale wastewater treatment and biogas production are high-rate systems based on self-immobilised, or granular, biofilms, e.g. the upflow anaerobic sludge blanket (UASB) reactor (Lettinga et al. 1980; Lettinga 1995). A biofilm is a potentially multi-species community of microbes embedded in a self-generated matrix of extracellular polymeric substance (EPS). Biofilms are often found on surfaces but sometimes exist as granules comprising solely EPS and microbes. In the case of AD applications, each granule theoretically contains all of the trophic groups required for the waste conversion. Increased application of AD could provide benefits both economically, by reducing energy and operational costs, and environmentally, by preserving fossil fuels and reducing emissions of CO2, particulate matter and other pollutants.

In recent years AD research has moved towards: (i) energy-efficient, low-temperature (psychrophilic) waste treatment applications (McHugh et al. 2003), (ii) biorefinery applications for the conversion of a range of non-food feedstocks, such as grass and solid wastes, to valuable products, including organic acids with industrial value, bioplastics and alcohols, and (iii) anaerobic fermentation processes for biohydrogen production (Oh and Logan 2005) and electricity generation in MFCs (Chaudhuri and Lovley 2003).

MFCs are bioreactors, divided into two compartments separated by means of a membrane that is permeable to cations but is impermeable to oxygen; electrodes are inserted into the compartments and connected through an electrical circuit (Min et al. 2005). Typically, anaerobic microbial consortia oxidise an organic substrate (e.g. glucose or secondary wastewater) in the anodic compartment; electrons generated in the reaction are transferred to the anode and delivered to the cathodic compartment in which they reduce an oxidised substrate (e.g. oxygen or oxidised metal). Thus, the MFC converts chemical energy directly into electrical energy.

The energy potentially available in MFCs is significant, but their efficiency is limited by several factors. These include the presence of electron acceptors in the medium, the incomplete oxidation of the substrate and the kinetic limitations in electron transfer from microorganisms to the anode. This latter process was previously thought to occur strictly through redox mediators that shuttle the electrons. However, only a decade ago, microbes capable of direct electron transfer were discovered (Min et al. 2005). Although not currently capable of large-scale electricity generation, MFCs present an attractive technology for self-sufficient wastewater treatment with modest electricity generation.

Complex microbial communities underpin AD and MFC applications, but the metabolic roles of the individual microbes are still largely undetermined. Moreover, the species are strongly connected through syntrophies, in which the waste products of one provide resources for another. Consequently, to understand, exploit and extend the application of these systems, the ‘ecophysiological’ roles of the individual populations and species must be determined. Questions such as ‘how do the resulting syntrophic interactions structure the system-level behaviour?’ can then be addressed to support full-scale engineering and operation. We propose that a unification of metabolomics and other ‘omics approaches, particularly metagenomics, could provide a high-throughput solution to link taxonomy with function. This approach could also enable the construction and validation of ‘ecosystems biology’ models operating at the level of the whole community.

4.2 State of the Science

The vast majority of organisms in anaerobic reactor biofilms and MFCs have not yet been cultured. Studying these organisms requires the application of techniques from the ‘molecular toolbox’ to characterise the complex, mixed microbial consortia present. Much prior research has focused on extracting DNA from these communities and amplifying 16S rRNA genes by polymerase chain reaction (PCR). The 16S rRNA gene is present in all prokaryotes and can be used as a marker for the taxa present (Clarridge 2004). To date, most 16S rRNA studies have used either gel-based fingerprinting techniques, such as DGGE, which provide a ‘barcode’ of community diversity but without direct sequence information, or clone libraries, which do provide direct sequence reads but, because of cost and time constraints, are limited in size (Phung et al. 2004). Both methods will fail to resolve rare taxa. Quantitative PCR assays can be used to obtain absolute abundances of even rare organisms but only for a predetermined target.

Until recently there was no economical method capable of resolving the complete community structure. Next-generation sequencing technologies have transformed this situation by providing orders-of-magnitude more sequence data at the same cost (Margulies et al. 2005). This is driving the ‘omics revolution’, allowing the sequencing of complete genomes from isolated organisms and environmental DNA.

Application to MFCs and AD systems is still in its infancy. The genomes of potentially important species have, however, been sequenced; for example, Geobacter sulfurreducens (Methe et al. 2003) and Shewanella oneidensis (Heidelberg et al. 2002), both of which appear capable of direct mediator-less electron transfer in MFCs. These two genomes have been extensively studied: their genes have been annotated with functions that have been confirmed by experiments, and regulatory networks have been inferred from transcription profiles. This information has also been integrated into metabolic pathway models. However, even for these two highly studied organisms, a complete systems-level understanding is still some way off (Fredrickson et al. 2008). Indeed, to extend this highly reductionist approach to all organisms that could be present in a system that is open to the environment, although potentially desirable, is impossible.

Metagenomics can be considered as the extension of genome sequencing to a community. In shotgun metagenomics, short reads are taken indiscriminately from all organisms and then assembled into longer individual genome fragments or contigs (Handelsman 2004). When only a few genomes are present, it is possible to obtain long contigs, and even complete genomes, from which the same inferences regarding gene function and putative metabolic pathways as for a single genome can be made. This technique can prove very effective for simple communities such as those associated with the breakdown of a particular compound in anaerobic bioreactors (Lykidis et al. 2011). Gene annotation is, however, only the suggestion of potential function; the methods are far from perfect, and an annotated gene may not be actively expressed, or its associated protein could be rendered inactive by post-translational processes. For more complex communities, metagenomics will fail to recover substantially sized contigs unless the sequencing depth is very high. This motivates the amplification of focal genes, most often marker genes such as the 16S rRNA gene. These amplified gene fragments can then be sequenced to very high depth without dissipating effort across the whole meta-genome, and sequences can be recovered from a substantial portion of the community. Unfortunately, the diversity of information associated with the genome fragments of metagenomics is lost and the connection of taxonomy to function is even more tenuous.
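
The assembly step described above can be illustrated with a toy example. The following is a minimal sketch of greedy overlap merging, assuming error-free reads and exact suffix-prefix overlaps; the function names and the `min_len` threshold are hypothetical, and production assemblers use far more elaborate graph-based methods.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs)
                   if n not in (best_i, best_j)] + [merged]


# Three overlapping toy reads from one short sequence, plus an unrelated read.
reads = ["ATGGCGTG", "GCGTGCAA", "TGCAATCC", "TTTTAAAA"]
contigs = greedy_assemble(reads)
```

With few genomes and deep coverage the reads collapse into long contigs; as diversity grows, fewer reads overlap and more fragments remain unmerged, which is the limitation noted in the text.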

Using a careful combination of molecular and analytical techniques, it is possible to link taxa to function in these systems. This can be illustrated with a case study: the detection of Crenarchaeota in sludge granules from AD bioreactors (Amann et al. 1990). These organisms were first observed in bioreactors using 16S rRNA gene cloning. Fluorescence in situ hybridisation (FISH), a microscopy technique based on probing mixed-species specimens with fluorescently tagged oligonucleotides, was used to determine the spatial location of Crenarchaeota cells, which were observed in close association with acetate-consuming methanogens (Collins et al. 2005). Based on this, it was hypothesised that the Crenarchaeota in AD bioreactors are H2-oxidising autotrophs that generate acetate for syntrophic, methanogenic partners. Incubations of granules with radiolabelled acetate (14C-sodium acetate), followed by biofilm cross-sectioning and beta-microimaging to detect isotope label uptake, revealed that the acetate was confined to zones dominated by the co-located Crenarchaeota and acetoclastic methanogens (Collins et al. 2005). Stable isotope probing (DNA-SIP) experiments using 13C-bicarbonate and sludge granules, with separate PCR-amplification of 16S rRNA genes from ‘heavy’ and ‘light’ DNA fractions, indicated the autotrophic (or at least mixotrophic) potential of crenarchaeal clones. This is a fascinating example of how culture-independent techniques, and disparate datasets, can together reveal the metabolic function of target groups, but it is neither a high-throughput nor a generically applicable approach. Conversely, while metagenomics is a high-throughput technique, its connection to function is not direct.

4.3 An Integrated Approach for ‘Ecosystems Biology’

An integration of ‘omics techniques could provide the high-throughput approach required to link taxonomy with function, and to deconstruct the complex microbial communities found in AD bioreactors and MFCs. Next-generation sequencing of marker genes will provide a high-resolution picture of community structure, generating relative frequencies of even low-abundance taxa. It is important to sample the minor community constituents, as it has been hypothesised that this so-called ‘rare biosphere’ may be important in ecosystem functioning (Sogin et al. 2006). This can be coupled to techniques capable of resolving the absolute abundance of higher-level taxonomic groups, for example microfluidics cell-counting coupled to quantitative PCR or fluorescence labelling of marker genes. This approach can determine ‘who’ is there, and in what numbers; metagenomics then allows determination of the functional genes they possess. Annotating those genes can provide a framework of potential metabolic pathways to which further information can be attached. Linking the amplified marker genes to functional genes, through their co-occurrence on contigs, could allow metabolic functions to be assigned to particular species and, consequently, connected to taxon abundance.
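
As a rough illustration of how relative frequencies from marker-gene sequencing might be combined with an absolute count, the sketch below scales read fractions by a total cell count for the group. The taxon labels, read counts and cell density are hypothetical, and the calculation assumes that read fractions reflect cell fractions, which ignores known complications such as amplification bias and variation in marker-gene copy number.

```python
def relative_frequencies(read_counts: dict[str, int]) -> dict[str, float]:
    """Fraction of marker-gene reads assigned to each taxon."""
    total = sum(read_counts.values())
    return {taxon: n / total for taxon, n in read_counts.items()}


def absolute_abundances(read_counts: dict[str, int],
                        total_cells_per_ml: float) -> dict[str, float]:
    """Scale relative frequencies by an independently measured cell count."""
    freqs = relative_frequencies(read_counts)
    return {taxon: f * total_cells_per_ml for taxon, f in freqs.items()}


# Hypothetical read counts, including one low-abundance ('rare') taxon.
counts = {"taxon_A": 600, "taxon_B": 390, "taxon_rare": 10}
cells = absolute_abundances(counts, total_cells_per_ml=2.0e8)
```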

The next step is to resolve from these potential pathways the true metabolic processes occurring in the community; central to this is metabolomics. Using methods such as NMR and GC-MS, it is possible to determine the concentrations of small metabolite intermediates along the active pathways. Even more powerful is to couple this with experiments that pulse stable isotope-labelled substrates through the community and, through the application of flux balance analysis, to calculate the rates of reaction along targeted pathways. At the same time, mass balance measurements based on resolving the concentrations of input substrates and ions in MFCs and bioreactors, and reaction endpoints, provide a useful corollary. Less obviously, further ‘omics’ information can be incorporated to resolve ambiguities, or to provide verification of the predictions. For example, transcriptomics can be used to determine which functional genes are expressed and proteomics can be utilised to search for enzymes. There is also a role for more targeted techniques, such as FISH to identify the spatial location of particular groups of organisms.
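
The steady-state mass balance that underlies flux balance analysis can be sketched for a toy methanogenic pathway. The example below is an illustration only: it checks whether a candidate flux vector leaves the internal intermediates (acetate and H2) balanced, and does not perform the optimisation step of a full flux balance analysis. The three-reaction network is a deliberately simplified assumption (glucose fermentation to acetate and H2, acetoclastic methanogenesis, hydrogenotrophic methanogenesis), and all names are hypothetical.

```python
# Rows: internal metabolites. Columns: reactions, in the order
#   v1: glucose + 2 H2O -> 2 acetate + 2 CO2 + 4 H2   (fermentation)
#   v2: acetate -> CH4 + CO2                          (acetoclastic)
#   v3: 4 H2 + CO2 -> CH4 + 2 H2O                     (hydrogenotrophic)
STOICHIOMETRY = {
    "acetate": [2.0, -1.0, 0.0],
    "H2": [4.0, 0.0, -4.0],
}


def net_production(fluxes: list[float]) -> dict[str, float]:
    """Net rate of accumulation of each internal metabolite (S . v)."""
    return {
        metabolite: sum(c * v for c, v in zip(row, fluxes))
        for metabolite, row in STOICHIOMETRY.items()
    }


def is_steady_state(fluxes: list[float], tol: float = 1e-9) -> bool:
    """True if no internal metabolite accumulates or is depleted."""
    return all(abs(rate) <= tol for rate in net_production(fluxes).values())


balanced = [1.0, 2.0, 1.0]      # per unit of glucose fermented
unbalanced = [1.0, 1.5, 1.0]    # acetate would accumulate
methane_per_glucose = balanced[1] + balanced[2]
```

A measured accumulation of an intermediate, such as the residual acetate in the unbalanced case, indicates either a non-steady state or a pathway missing from the model.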

In summary, metagenomics can be married to community metabolomics to directly elucidate the functional role of whole (but exceedingly diverse) microbial communities. This is an experimental strategy for obtaining the information necessary to resolve the pathways in individual samples and to link the components of those pathways to species. Its success, however, is dependent on coupling it to a computational pipeline capable of extracting the information and an experimental strategy that maximises the statistical power of the complete data set to give meaningful and valid results. The individual components of such a pipeline exist; it is possible to identify microbial taxa from marker genes using resources such as the Ribosomal Database Project (Cole et al. 2009); metagenomics reads can be annotated to functional genes using MG-RAST (Meyer et al. 2008); and potential metabolic pathways can be constructed from the KEGG database (Kanehisa et al. 2004). Similarly, tools exist for converting GC-MS peaks into predictions of molecule identity (Horai et al. 2010; Atherton et al. 2006). The key will be integrating the information in a single pipeline within a single statistical framework that allows for errors, mislabelling and potentially contradictory information.

Deriving the optimal or most likely set of possible pathways and their divisions amongst species will be a highly complex optimisation problem. A Bayesian probabilistic approach will most likely provide the best method to solve this. This will have the advantage of predicting not a single solution but a complete distribution of possible results. Finally, any statistical technique is limited by the quality of the data set. It is vital to link these tools to experiments that maximise the information available. This should be achieved by running replicate systems to determine the variability of the responses and, conversely, by using carefully planned perturbations to expand the range of system states explored. These perturbations could be through changing substrate concentrations or operating conditions, such as temperature. These requirements underpin the importance of the high-throughput nature of ‘omics techniques to obtain, at reasonable cost, multiple datasets, each containing huge amounts of information.
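
A minimal sketch of the Bayesian idea, assuming a discrete set of candidate hypotheses and conditionally independent evidence, is given below. The hypotheses (which of two taxa carries out a given pathway step), the prior and the likelihood values are all hypothetical; a realistic pipeline would need a far richer model of errors and dependencies.

```python
def posterior(prior: dict[str, float],
              likelihoods: list[dict[str, float]]) -> dict[str, float]:
    """Posterior over hypotheses: prior times the product of likelihoods.

    Each entry of `likelihoods` gives P(observation | hypothesis) for one
    data type; observations are assumed conditionally independent.
    """
    unnormalised = dict(prior)
    for likelihood in likelihoods:
        for hypothesis in unnormalised:
            unnormalised[hypothesis] *= likelihood[hypothesis]
    evidence = sum(unnormalised.values())
    return {h: p / evidence for h, p in unnormalised.items()}


# Hypothetical example: which taxon performs a given pathway step?
prior = {"taxon_A": 0.5, "taxon_B": 0.5}
contig_evidence = {"taxon_A": 0.8, "taxon_B": 0.2}      # gene co-occurs with A's marker
metabolite_evidence = {"taxon_A": 0.4, "taxon_B": 0.6}  # weakly contradictory signal
post = posterior(prior, [contig_evidence, metabolite_evidence])
```

The result is a distribution over assignments rather than a single answer, so contradictory evidence lowers confidence instead of being discarded.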

The final step of an ecosystems biology approach is integrating this information into mathematical models capable of predicting both community composition and bioreactor function. These will take the metabolic-taxonomic framework elucidated above and add dynamics. The actual mathematical structures to do this are well established: in a well-mixed bioreactor, a system of ordinary differential equations would be appropriate, possibly with the addition of environmental noise; more complex models will be necessary if spatial localisation of groups of organisms in anaerobic granules or on the anodes of MFCs is deemed important. Parameterisation will be a greater challenge than model structure. Once again, a Bayesian approach will be vital to propagate the resulting parameter uncertainty into predictions that formally account for it.
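
To make the well-mixed case concrete, the sketch below integrates a small system of ordinary differential equations for a two-member syntrophic food chain in a continuously fed reactor. It assumes Monod growth kinetics and forward Euler integration; every parameter value is hypothetical and chosen only for illustration, and the model omits noise, decay and spatial structure.

```python
def monod(mu_max: float, k_half: float, conc: float) -> float:
    """Monod specific growth rate at substrate concentration `conc`."""
    return mu_max * conc / (k_half + conc)


def simulate(days: float = 500.0, dt: float = 0.01,
             dilution: float = 0.1, s_in: float = 10.0):
    """Forward Euler integration; returns final (S, X1, P, X2).

    S: feed substrate, X1: primary degrader, P: intermediate (e.g. acetate),
    X2: syntrophic partner consuming P. All parameters are hypothetical.
    """
    mu1, k1, y1 = 0.5, 1.0, 0.1    # growth of X1 on S (yield y1)
    mu2, k2, y2 = 0.3, 0.5, 0.05   # growth of X2 on P (yield y2)
    y_p = 0.8                      # P produced per unit S consumed
    s, x1, p, x2 = s_in, 0.01, 0.0, 0.01
    for _ in range(int(days / dt)):
        g1 = monod(mu1, k1, s)
        g2 = monod(mu2, k2, p)
        ds = dilution * (s_in - s) - g1 * x1 / y1
        dx1 = (g1 - dilution) * x1
        dp = -dilution * p + y_p * g1 * x1 / y1 - g2 * x2 / y2
        dx2 = (g2 - dilution) * x2
        s, x1, p, x2 = s + dt * ds, x1 + dt * dx1, p + dt * dp, x2 + dt * dx2
    return s, x1, p, x2


s_final, x1_final, p_final, x2_final = simulate()
```

At steady state each growth rate equals the dilution rate, so the simulated substrate concentration can be compared with the analytical value K·D/(μmax − D); agreement is a useful sanity check before any parameters are fitted to data.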

4.4 Outlook for the Future

The holistic, polyphasic approach outlined above can be viewed as systems biology at the community level. It is not desirable to reconstruct and model every reaction and process in every microbial cell, but it is desirable to identify those pathways, syntrophies and ecological interactions that are critical for controlling the behaviour of the community and, by extension, of the process. The integration of ‘omics’ approaches, coupled to process monitoring and advanced microscopy, can generate comprehensive, integrated datasets at microorganism, biofilm and bioreactor level. This will enable processes occurring at the microorganism level (scale c. 1 µm–1 mm) to be linked to those occurring within bioreactors (scale > 1 m). This will, in turn, lead to improved reactor monitoring capability, reactor design, performance and operational stability.

5 Upcoming Challenges: What Can Community Metabolomics Learn from Metagenomics?

5.1 The Challenges Ahead

The past two decades have seen the rapid expansion of ‘omics’-oriented research, examining the abundance of the vast array of genes (metagenomics), RNA molecules (meta-transcriptomics), proteins (meta-proteomics) and small molecule metabolites (meta/community metabolomics) present in a wide range of different environments. To date, it is genomics that has been the poster child of the omics era, with the global media avidly following the progress of high-profile research projects that have enabled us to map the human genome (Human Genome Sequencing 2004) and the genomes of various economically important plants (Goff et al. 2002), animals (Hillier et al. 2004) and disease-causing organisms (Heidelberg et al. 2000). More recently, we have begun to describe the vast content of ‘metagenomic’ DNA, seemingly aiming to catalogue the entirety of life around us using the DNA barcode.

As our analysis of the global meta-genome continues to benefit from recent advances in robotics, DNA sequencing chemistries, analytical and computational procedures, the costs associated with sequencing such large volumes of DNA continue to tumble. For example, the consumable costs associated with sequencing the human genome have decreased from tens of millions of (US) dollars in 2007, to just a few thousand (US) dollars or under today (Drmanac et al. 2010). With so much data now being generated, the major challenges associated with most large-scale metagenomic studies concern data storage and how to make sense of the vast amounts of DNA sequence data generated.

There are several terms in use for the application of metabolomics in ecological/environmental studies. For instance, the term “Ecotoxicogenomics” was proposed by Snape et al. (2004) to describe the integration of genomic-based science into the field of ecotoxicology. “EcoGenomics” (or ecological genomics) was used by Chapman (2001) to describe the application of genomics-based techniques to ecology. In both cases the term genomics was taken to encompass all the ‘omic sciences’, namely genomics (genome sequencing and the annotation of function to genes), transcriptomics (gene expression at the transcription level), proteomics (protein abundance) and metabolomics (metabolite/small molecule levels). In addition, the phrase “Environmental metabolomics” was defined by Viant (2007) as the “application of metabolomics to characterize the metabolism of free-living organisms obtained from a natural environment, and of organisms reared under laboratory conditions, where those conditions serve to mimic scenarios encountered in the natural environment”.

Below, we describe how researchers in the field of metagenomics are meeting the new challenges provided by their new ‘mega-data sets’ and how researchers of community metabolomics might benefit from the many lessons learnt by its ‘bigger brother’, metagenomics.

5.2 The Need for Better Reference Samples

To date, most attempts to ‘sequence’ the meta-genome of varied environments such as the soil, marine waters or the human intestinal tract have been largely superficial, only exploring as much of the meta-genome as is necessary to enable some descriptions of the diversity and function of the community to be made, and with the volume of data analysed normally dictated by financial or technical constraints. There is therefore an urgent need for better reference data, comprising the complete metagenomic data of various sample media. Several years ago, an international Soil Meta-genome Consortium, ‘Terragenome’, (http://www.terragenome.org/) was established (Vogel et al. 2009) with the aim of generating the first complete meta-genome sequence of a complex environmental system (soil from Park Grass, Rothamsted, UK). This study will provide reference data, to which data collected from all other soils around the world can be compared. It is only with the provision of these ‘full’ datasets that we will be able to answer key questions including: What is the extent of microbial diversity and what is the diversity of functional genes within the sample media? What is the abundance and functional significance of each species within the soil, across different domains of life (e.g., bacteria, fungi, protozoa, etc.)? How does the composition and function of the community differ across spatial and temporal scales? And, what ‘core’ genes are present in most soils?

While it may take many years of data collection and analysis before the complexity of the soil meta-genome is fully understood, no serious efforts are currently underway to catalogue the entire community metabolome of similarly complex environmental samples. The generation of entire reference sets of community metabolomic data from complex communities is essential if we are to better understand the diversity of functional activities undertaken within them and to prevent community metabolomics being overshadowed by metagenomics for years to come.

5.3 The Importance of Meta-Data

Meta-data (or data about data) provides an essential environmental context to metagenomic data, detailing key factors including site and sample descriptions, chemical and physical properties and methods of analysis. To ensure that valid comparisons can be made between the metagenomic data collected by different researchers at different times and locations, and to reduce any unnecessary experimental duplication, it is prudent that researchers should adhere to a standard set of reporting requirements. Possibly inspired by the broadly accepted MIAME (Minimum Information About a Microarray Experiment) standard (Brazma et al. 2001), the Genomic Standards Consortium (GSC, http://gensc.org) has driven the development of the MIGS (Minimum Information about a Genome Sequence), MIMS (Minimum Information about a Metagenome Sequence) and MIENS (Minimum Information about an Environmental Sequence) standards. Similar initiatives have also been undertaken to provide comprehensive reporting standards for environmentally derived metabolomics data (Morrison et al. 2007).
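
In practice, adherence to a reporting standard can be checked automatically at submission. The sketch below validates a sample record against a list of required fields; the field names are hypothetical, derived only from the categories mentioned above, and are not the actual MIGS, MIMS or MIENS checklists.

```python
# Hypothetical required fields, based on the categories named in the text.
REQUIRED_FIELDS = (
    "site_description",
    "sample_description",
    "chemical_properties",
    "physical_properties",
    "analysis_methods",
)


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Illustrative record with two fields left out.
record = {
    "site_description": "anaerobic bioreactor, laboratory scale",
    "sample_description": "sludge granules",
    "analysis_methods": "GC-MS",
}
problems = missing_fields(record)
```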

However, despite the efforts of such groups, most current studies of the meta-genome (and community metabolome) do not attempt to adhere to such standards. If the vast increases in data generated by metagenomics and community metabolomics are to be used to their greatest potential, it is a matter of urgency that standards for the provision of metadata be adopted and required by major scientific journals, as they are already required for microarray experiments (e.g., see instructions for authors in the ISME Journal). Such a universal approach would help to avoid unnecessary duplication of effort, whilst also benefiting various fields of ecosystems biology, by helping to improve our understanding of the ‘connectivity’ between the parts lists generated by the ever-growing number of omics approaches (including metagenomics, meta-transcriptomics, meta-proteomics, and community metabolomics).

5.4 Data Storage and Sharing

Modern molecular methods have provided tools for the global scientific community to produce a continual deluge of DNA sequence data. These data, which are generated at an exponential rate, perhaps doubling in volume every 18 months (Lathe et al. 2008), now require terabyte-scale computation and ever-expanding storage facilities to reduce the widening gap between rates of data collection and interpretation. Whilst it is the aim of researchers to increase the volume of useful DNA data stored in their own databases and in online repositories such as NCBI’s GenBank (www.ncbi.nlm.nih.gov/genbank/), much of the data being stored by researchers are of limited value. Increasingly, it is being recognised that much, or even most, of the storage space is used to house data that could be thrown away. For this reason, it is now common practice to delete raw data files (e.g. electropherogram data) immediately after the data have been processed and condensed into far smaller text files containing the DNA sequence data. Due to recent technical advances, the amount of data produced by metabolomic studies is beginning to pose similar challenges, and new standards of data storage and interpretation are now required to minimise the cataloguing of redundant information.
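
The consequence of an 18-month doubling time for storage planning can be worked through directly. The short calculation below assumes simple exponential growth at the rate quoted above; the function name is hypothetical.

```python
def growth_factor(months: float, doubling_months: float = 18.0) -> float:
    """Factor by which data volume grows over `months` of exponential growth."""
    return 2.0 ** (months / doubling_months)


three_years = growth_factor(36)    # two doublings
ten_years = growth_factor(120)     # roughly a hundredfold increase
```

Under this assumption, an archive would need about one hundred times its current capacity after ten years.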

5.5 Knowledge Transfer to Applied Research Outcomes

Even once all of our data have been collected, adequately stored and compared to appropriate reference data and metadata, there remain many challenges in maximising the value of the large datasets characteristic of meta-omic studies. To date, many large-scale metagenomic and community metabolomic studies have primarily sought to address pure, or fundamental, research aims: cataloguing the complexity of various biological systems and providing new insights into the functional capabilities of many different microbial communities. Whilst these are worthy research aims, it remains crucial that our new understanding of microbial diversity and function be exploited to its full potential, addressing much broader, applied research questions of benefit to key industrial partners, healthcare providers and environmental protection agencies.

The development and application of new, more sensitive, biological indicators of environmental health will be a key growth area of metagenomics- and metabolomics-led research. For example, there is an increasing interest in the monitoring and restoration of freshwater systems, aimed at maximising their value for ecological, recreational and economic purposes. Macroinvertebrate communities have been used around the globe as biological indicators of stream health for decades. Extensive research has permitted classification of a wide range of macroinvertebrate taxa into pollution-tolerant, sensitive and facultative categories, which are used to provide indices of water quality (e.g., Hilsenhoff 1987), and a well-researched body of information now supports their use (Feld and Hering 2007; Trigal et al. 2007).
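
Indices of this kind are commonly computed as an abundance-weighted average of per-taxon tolerance scores. The sketch below assumes that general form; the taxon labels, counts and tolerance values are hypothetical placeholders and are not taken from any published scoring table.

```python
def biotic_index(counts: dict[str, int], tolerance: dict[str, float]) -> float:
    """Abundance-weighted mean tolerance score across all counted taxa."""
    total = sum(counts.values())
    return sum(n * tolerance[taxon] for taxon, n in counts.items()) / total


# Hypothetical sample: low scores denote sensitive taxa, high scores tolerant ones.
counts = {"sensitive_taxon": 30, "facultative_taxon": 50, "tolerant_taxon": 20}
tolerance = {"sensitive_taxon": 1.0, "facultative_taxon": 4.0, "tolerant_taxon": 8.0}
index = biotic_index(counts, tolerance)
```

A single summary score of this kind indicates overall condition but, as discussed next, says little about what is causing it.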

These macroinvertebrate community data provide little or no information as to the potential causes of any observed declines in freshwater ecological health. However, recent advances in meta-omics-centred research mean that it is now possible to rapidly characterise (i) the abundance of ‘functional groups’ of bacteria, such as those conferring resistance to heavy metals (e.g., cobalt, zinc and cadmium) or the resistance to and ability to degrade organic pollutants (e.g., PCBs), (ii) the abundance and activity of genes encoding these ‘key’ functions and (iii) the presence and abundance of functionally related metabolites. As such, the era of meta-omics research promises to revolutionise the application of biological indicators such that the rapid analysis of microbial communities may not only be used to provide an indication of general ecological stress, but also to identify specific drivers of environmental degradation within any site. This is of particular importance for the restoration of urban ecosystems, in which the primary drivers of environmental change are frequently hard to distinguish, using traditional chemical analysis, amongst the large background of other potentially harmful anthropogenic influences.

In contrast, the holistic approach adopted by metagenomic and metabolomic studies aids the identification of key drivers of microbial community function, akin to analysing the total chemical profile within a sample. This information can be used to design highly targeted restoration strategies on a site-specific basis, with no prior knowledge of the site’s history required. The field of biomedical sciences similarly stands to benefit from predicted increases in metagenomic and community metabolomic derived data. For example, a number of studies have already sought to catalogue the presence and abundance of microbial genes within the human gut, also identifying key differences in the gut microflora of healthy individuals and patients with inflammatory bowel disease (Qin et al. 2010). By improving our understanding of the links between the diversity and function of microbial communities, and their metabolites, on and in the human body, it is expected that we may devise better treatments and preventative care for a wide range of microbial diseases. For example, could changes in the abundance of certain bacterial genes or metabolites in the human gut be used as an early warning signal to detect the onset of inflammatory bowel disease? It is already evident that the primary legacy of the new ‘omics’ era of microbial research will not be the catalogue of parts that lists the diversity of microbial life on earth, but rather what becomes possible by applying our new knowledge of the complex relationships between the functional capability and versatility of microbial communities and the complex environments in which they live.