1 Introduction

1.1 Anaerobic digestion, a robust and complex process

Anaerobic Digestion (AD) of waste and effluents is a robust process that is nowadays used with success in full scale systems for the treatment of solid waste and urban and industrial wastewaters. Worldwide, there are thousands of high rate sludge bed reactors for industrial wastewater treatment, and millions of domestic biogas plants generating energy from waste (Ren 2013; Noyola et al. 2012). Many engineering enterprises are involved in the design and installation of these type of biological reactors. Moreover, new technologies are being tested at lab-scale using different reactor configurations (e.g. UASB, EGSB, anaerobic membrane reactors), integrating different treatments (N-removal, S-removal, micropollutants removal, etc.), or systems to recover nutrients (Batstone and Virdis 2014).

Anaerobic biological process is driven by a complex network of microorganisms belonging to Bacteria and Archaea domains, working together for a complete degradation of the organic compounds into CH4 and CO2. In the complete anaerobic chain, four main metabolic steps are involved: 1-hydrolysis, 2-fermentation, 3-acetogenesis, 4-methanogenesis (Zinder 1984) (Fig. 1). These four steps involve different microbial guilds who are specialized in different processes. A diverse number of hydrolytic and fermentative bacteria take part in the first two steps, whereas the oxidation of intermediate fermentation products to acetate is performed by either hydrogen- or formate- producing acetogens (Stams and Plugge 2009). Ultimately, methane is exclusively generated from acetate and hydrogen/CO2 by methanogenic archaea.

Fig. 1
figure 1

Schematic view of the anaerobic degradation of organic matter

In general, the microbial composition of anaerobic digestion sludge is highly diverse and shows high redundancy. This means that several microorganisms are metabolically flexible and capable of performing the same work. It has been postulated that this characteristic is one of the reasons for the robustness of anaerobic digestion processes (Fernández et al. 1999; Zumstein et al. 2000).

To study these very complex ecosystems there are nowadays different molecular tools available. It is very important to choose the adequate tool to answer the questions formulated in each experiment.

1.2 Questions to answer

Regarding the microbial composition and function of the communities in methanogenic bioreactors, four main questions are usually formulated:

1-Who is there?, 2-How does the community change over time?, 3-How many microorganisms from the different groups are present?, 4-What are the specific functions of the microorganisms within the community and their relation to each other?

Molecular ecology techniques have been evolving over the past years to give answers to these questions and an array of options are available and will be discussed in this review (Fig. 2).

Fig. 2
figure 2

Questions to be formulated and molecular techniques that could be used to answer them, results obtained and statistical analysis

1.2.1 Microbial diversity: Who is there?

Information on the diversity and identity of the microorganisms involved in the anaerobic digestion process is important to understand bioreactor functioning, especially when concerning new metabolic processes. The discovery of microorganisms involved in the anaerobic oxidation of ammonium (Anammox process) is a nice example (Jetten et al. 1999; Ni and Zhang 2013). Other important key process are syntrophic oxidation of organic acids (McInerney et al. 2008), degradation of recalcitrant compounds such as detergents (Okada et al. 2014), and anaerobic oxidation of methane (Fernández et al. 2008).

To assign the identity of the microorganisms involved in a microbial community the most frequently used technique is based on the analysis of the 16S rRNA gene. This gene was proposed as a housekeeping genetic marker to study bacterial and archaeal phylogeny and taxonomy for several reasons: (1) is present in almost all bacteria and archaea; (2) the function over time has not changed, suggesting that random sequence changes are a more accurate measure of time (evolution); and (3) the 16S rRNA gene (1500 bp) is large enough for informatics purposes (Patel 2001). Nowadays, it is possible to determine the genus and species of a bacteria or archaea by sequencing the 16S rRNA gene and compare the sequence with available databases. This technique was adapted to study the microbial composition of a sample as will be explained below.

1.2.2 Microbial dynamics: How does the community change over time?

Different reactor operation parameters, such as applied organic loading rate, hydraulic retention time or operating temperature, are frequently tested to determine the optimal operation conditions for the anaerobic digestion process. In order to explain how these operation conditions can change reactor performances it is necessary to monitor the microbial community composition during operation. When the objective of the research is to test different reactor configurations, different sources of inocula, or different substrates, comparing the microbiology of different systems is needed. In all these cases, numerous community structures could be obtained and the application of a fingerprinting technique such as DGGE, T-RFLP or SSCP is a suitable choice.

1.2.3 Microbial quantification: How many microorganisms from different groups are present?

In AD process it is important to quantify some key groups, in particular the density and proportion of methanogens because of their relevance for ensuring an efficient methanogenic process. For this, techniques that quantify different groups of microorganisms present in a complex community such as Fluorescent in situ Hybridization (FISH) or quantitative PCR (q-PCR) are adequate.

1.2.4 Microbial function: What are the roles of different microbial groups in anaerobic microbial communities?

Discovering specific microbial roles in anaerobic microbial communities is currently one of the most challenging issues for microbiologists and molecular ecologists. Metabolic pathways and interspecies relations involved in the anaerobic process are frequently unknown, in particular when a difficult process is going on, such as the degradation of recalcitrant compounds. Knowing the identity of the microorganisms involved in the process (Sect. 1.2.1) can give a hint on the metabolic potential of abundant players, but it is generally not enough to assign a function to those microorganisms. It is also possible that a single microorganism plays a role at different steps of the metabolic pathways, e.g. Clostridium sp. can encode both hydrolytic and fermentative enzymatic machineries. In these cases more sophisticated techniques, such as proteomics, metagenomics, metabolomics and techniques that link the identity with function, such as Stable Isotope Probing (SIP) or micro-autoradiography linked to FISH (MAR-FISH), are the most relevant choice.

2 Sampling, storage of samples and environmental data collection

2.1 Sampling

Sampling is one of the most important steps in microbial ecology analysis. A good sampling strategy is necessary to ensure the success of the study. It has to be taken into account that the use of statistical tools to compare microbial communities might not be correctly applied if a representative sampling is not performed from the beginning of the study.

Choice of sample amount is not trivial and has to be considered from the beginning of the study, both to ensure enough biomass concentration for molecular analysis and a representative selection of the microorganisms in the reactor sludge. The majority of the techniques and kits used to extract DNA or RNA from reactor’s biomass are designed for soil samples (Griffinths et al. 2000), e.g. PowerSoil DNA isolation (Mo Bio Laboratories Inc. Carlsbad, CA) and Nucleo-Spin Soil (Macherey–Nagel, Düren, Germany). Considering these example kits, manufacturers protocols suggest the use of 0.5 g to 1.0 g of soil. Reactor’s sludges are in general suspended in the liquid mixture and present fewer amounts of inorganic compounds and higher amount of cells than soil. According to that, the volume of the sample has to be recalculated. The amount and frequency of sampling might also have an impact in the biological reactor itself; volume of sampling needs to be considered to avoid volume changes in the reactor, especially when working with small lab-scale reactors. To guarantee bioreactor sample representativeness, it is recommended to take samples from different parts of the reactor and pool them, especially in the case of full scale reactors. The sampling of the biomass from solid waste digesters may present additional difficulties due to the heterogeneity of the system. If statistical analysis is going to be applied it is recommended to plan the sampling with a specialist. When the microbial community is analyzed in time series, it is important to adapt the sampling frequency with the objective that samples represent the studied period. Several samples taken during a phase of operation are recommended to be able to compare the different phases. Last but not least, sampling directly in the vial that will be used for further analysis spares some manipulations and limits the risk of sample loss.

Once the sample is taken it must be stored correctly to avoid microbial growth during storage. The recommended temperature of storage is −20 °C for further DNA extraction and −80 °C if it is planned to work with RNA. If a freezer is not available close to the place of sampling, it is recommended to store the samples on ice until you reach the laboratory. If the samples are going to be analyzed by FISH, they should be fixed with formaldehyde or ethanol before storing at −20 °C.

A fact to consider before DNA/RNA extraction is how to handle the sample. Depending on the kind of reactor, the biomass can be suspended, attached to an inert support, aggregated in granules or forming flocs. In some cases, before nucleic acids extraction, it will be necessary to detach the biomass from the support or disrupt the aggregates. Several protocols are available in literature using ultrasound treatment (Perna et al. 2013) or physical disruption (Bergmann et al. 2010).

2.2 Environmental data collection

In order to be able to link reactor performances with the microbial community structure, collecting reactor operation data at sampling time is absolutely necessary. In general, parameters such as temperature and pH are recorded daily, but other parameters that need further physicochemical analysis might be determined weekly or monthly depending on the duration of reactor operation. Information about reactor design and wastewater composition is also very important for interpreting the results. There is no general rule but the collection of a complete set of operating data is preferred.

3 Tools for studying microbial diversity (Who is there?)

3.1 Cloning and Sanger sequencing

The identity of the microorganisms present in a sample can be determined through PCR amplification and analysis of conserved marker genes. The 16S rRNA gene is the most widely used marker gene and has the most extensive reference database (Su et al. 2012). During the last decades cloning in a plasmid vector followed by Sanger sequencing has been widely used. The technique involves PCR amplification of the 16S rRNA genes using primers directed to either Bacteria or Archaea domains. In the anaerobic digestion research area, Bacteria and Archaea are both important, and both corresponding clone libraries can be made separately by choosing domain-specific PCR primer sets during the initial PCR amplification step (Vanwonterghem et al. 2014).

Several Bacteria and Archaea specific primers have been designed (Lane 1991; Grosskopf et al. 1998; Leclerc et al. 2004), e.g. the Bacteria specific PCR primer sets 27f and 1492r (Lane 1991) and the Archaea-specific primers A349f (Takai and Horikoshi 2000) and A915r (Stahl and Amann 1991).

The sequences obtained are then clustered into arbitrary Operational Taxonomic Units (OTUs), generally defined as sequences that shared more than 97 % identity. This value has been chosen considering that, in general, two strains from the same species share more than 97 % homology in their 16S rRNA gene sequences (Yarza et al. 2014). The taxonomic position of the representative sequences is determined by comparing the sequences with databases using BLAST at National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/) (Altschul et al. 1990), the Classifier at the Ribosomal Database Project (RDP) (https://rdp.cme.msu.edu/) (Wang et al. 2007) or the European database (http://www.ebi.ac.uk/ena).

Several ecological questions of the AD process have been answered by using 16S rRNA gene sequencing techniques. At the beginning, as the cost by sequence was high, the clones were previously analyzed by other techniques [Amplified Ribosomal DNA Restriction Analysis (ARDRA), Single Strand Conformation Polymorphism (SSCP)] and only representative clones were then sequenced. Studies were focused on describing the composition of the complex communities developed in lab-scale methanogenic reactors (Godon et al. 1997) or full scale systems (Roest et al. 2005). Other works focused on the stability of these systems (Fernández et al. 1999; Zumstein et al. 2000) or comparing communities developed under different temperatures (Sekiguchi et al. 1998) or feeding conditions (Goberna et al. 2009).

With the development of the automatic Sanger sequencing technology, higher number of clones could be sequenced. In a very extensive sequencing effort, Riviére et al. (2009) analyzed the microbial composition of seven anaerobic sludge digesters. A total of 9890 16S rRNA gene sequences were analyzed. From the comparison of all the retrieved information, the authors defined three component models within the bacterial communities: 1-a core group of phylotypes common to most of the digesters, 2-phylotypes shared among few digesters and 3-specific phylotypes (Table 1).

Table 1 Over view of the main bacterial phyla that composed the microbial communities from full scale anaerobic reactors

To give a global view of the microbial diversity involved in the AD process in general, Nelson et al. (2011), compiled the information produced by the cloning/sequencing approach. The authors performed a meta-analysis based on all publicly available 16S rRNA gene sequences generated by Sanger sequencing from various anaerobic digesters up to May 2010. A total of 19,388 sequences (16,519 bacterial and 2869 archaeal) were analyzed, representing 28 known bacterial phyla (e.g. Proteobacteria, Firmicutes, Bacteroidetes, Chloroflexi) (Table 1). Archaeal sequences were assigned to 296 OTUs, primarily Methanosaeta and the uncharacterized WSA2 group.

The cloning/sequencing approach is also very valuable to get sequence information for the design of signature oligonucleotides that are complementary to target groups, at the taxonomic level of family, genus, species, or strain. These newly designed oligonucleotides may serve in FISH studies and in the development of real-time PCR assays for quantification (Ariesyady et al. 2007).

The methodology might also be applied to study specific functional genes. As an example, one of the key genes involved in the methanogenic pathway was used to study the specific composition of this population (Zhu et al. 2011). A different strategy could be applied to study a specific population using 16S rRNA gene primers specifically designed to target that microbial group (Chouari et al. 2005).

3.2 Next generation sequencing of PCR amplicons

The development of Next Generation Sequencing (NGS) technologies has overcome three of the main limitations of the cloning/sequencing technology, as these new methods are: faster, cheaper and high-throughput. Millions of sequences can be obtained in a single run in a complete automatic method within a few hours or days. Sequencing technologies are based on the detection of incorporated nucleotides by different chemistries. In all cases, sequences are generated without the need of a conventional, vector-based cloning procedure (Shokralla et al. 2012). A complementary approach is the adoption of nucleotide barcodes in the amplification primers for multiplexing samples. In this way, up to 120 samples from different origins can be mixed in one run and after sequencing their data can be separated according to their barcode (Parameswaran et al. 2007; Meyer et al. 2007). This approach decreases the cost per sample since more samples can be pooled in the sequencing run rather than sequencing fewer samples to greater depth (Cardenas and Tiedje 2008).

The first step is the PCR of the desired gene, as the length of the sequences obtained is shorter than in the Sanger methods, specific primers for this technology are used (Cardenas and Tiedje 2008; Wang and Qian 2009). This step is followed by high-throughput sequencing of the resulting amplicons libraries by means of a selected NGS platform. There are four commercially available platforms which use PCR based sequencing systems: Roche 454 Pyrosequencing Genome Sequencer (Roche Diagnostics Corp. Branford, CT, USA), MiSeq and HiSeq 2000 (Illumina Inc. San Diego, CA, USA), AB SOLiD System (Life Technologies Corp. Carlsbad, CA, USA) and Ion Personal Genome Machine (Life Technologies, South San Francisco, CA, USA) (Shokralla et al. 2012).

These new sequencing technologies are based on different concepts: 454 Pyrosequencing, Ion Personal Genomic Machine (Ion PCM) and Illumina uses real-time sequencing-by-synthesis. In these technologies each nucleotide incorporated by DNA polymerase generates a signal detected by a specific detector system. In the case of 454 Pyrosequencing each nucleotide incorporated release a pyrophosphate molecule which is linked to the production of light by the action of the enzyme luciferase (Margulies et al. 2005). The Ion Personal Genome Machine detects the changes in the hydrogen ion concentration produced when a nucleotide is incorporated into a strand of DNA by the polymerase action (Rothberg et al. 2011). The Illumina platform sequencing is based on the detection of fluorescent signal release after incorporation of a modified nucleotide (Shendure and Ji 2008). These three techniques includes the immobilization of the library fragments on beads (454 and Ion PGM) or a surface flow cell (Illumina) whose surfaces carry oligonucleotides complementary to specific adapter sequences ligated or PCR-generated onto both ends of the fragmented library. After a step of amplification, the plate/chip/flow cell is sequenced en masse by the instrument.

Unlike the previous three platforms, SOLiD uses a sequencing by-oligo ligation technology. This process couples oligo adaptor-linked DNA fragments with complementary oligos immobilized on the surface of 1-mm magnetic beads. After the ligation step, a fluorescent readout identifies the ligated 8-mer oligo, which corresponds to one of the four possible bases (Shokralla et al. 2012).

Each technology has advantages and disadvantages, the major advantages of the 454 Pyrosequencing platform are the long read length obtained and its relatively short run time. Longer sequences generated through 454 provide higher flexibility in terms of accurate annotation of reads in ecological applications involving non model organisms. This has made 454 the most commonly used NGS platform for the analysis of environmental DNA for ecological applications. The Ion PGM present a cheap alternative with relative long reads (>200 bp) but lower coverage than 454-pyrosequencing. The advantage of both Illumina and SOLiD systems is the high output per run compared to 454 pyrosequencing and Ion PGM. The main drawback of these systems is the relative short-read length. Then, before choosing a platform, it is necessary to evaluate the need of higher coverage or higher sequences length (Shokralla et al. 2012).

The interpretation of the huge amount of produced data requires appropriate bioinformatic tools and a specific know-how in addition to efficient computational resources. Three bioinformatic pipelines are widely used to analyze high amounts of sequences: QIIME (Caporaso et al. 2010), Mothur (Schloss et al. 2009) and the online tool developed by the Ribosomal Database Project (http://pyro.cme.msu.edu/) (Cole et al. 2014).

Although this technology was developed 10 years ago (Margulies et al. 2005), several applications to the analysis of microbial communities from anaerobic digesters have been reported over the past years. As the sequencing cost has decreased markedly the technologies are more popular and several samples can be analyzed in a short time with reasonable costs. This has allowed to perform studies comparing the microbiology of high number of reactors operated during long time periods. Werner et al. (2011), analyzed the microbiology of 9 full scale methanogenic bioreactors treating brewery wastewater by 454-pyrosequencing. The high amount of datasets was analyzed using specialized bioinformatic tools and correlated with environmental data using statistical analysis. The authors found that each of the nine facilities had a unique community structure with an unprecedented level of stability. Syntrophic populations were highly stable, resilient, and specific for function and environment. This indicates that resilience, rather than dynamic competition, plays an important role in maintaining the necessary syntrophic population. They also found that communities with a greater phylogenetic variability functioned more efficiently.

Similarly, Sundberg et al. (2013) studied the bacterial and archaeal compositions of 21 full scale biodigesters operated in thermophilic or mesophilic conditions to produce energy using different solid waste. The reactors were fed with sewage sludge and various combinations of sewage sludge in codigestion with other wastes. The predominant populations in sewage sludge-fed reactors included Actinobacteria, Proteobacteria, Chloroflexi, Spirochetes, and Euryarchaeota, while Firmicutes was the most prevalent in the codigestion reactors fed with other waste (Table 1). The main bacterial class found in all reactors was Clostridia. Acetoclastic methanogens were detected in sewage sludge, and not in other reactors. Their absence suggests that methane formation from acetate takes place mainly via syntrophic acetate oxidation in these reactors, as previously reported (Karakashev et al. 2006). Statistical analysis revealed that the microbial composition was mainly governed by the type of substrate and the process temperature.

Other more specific projects studied the relationship of the microbial communities with different environmental variables as fed sources (Ziganshin et al. 2013) or temperature (Tian et al. 2015).

The principal advantage of the sequencing-based methods is that the identity of the microorganisms present in a sample can be determined. The Sanger sequencing method gives more accurate taxonomic assignments as full length sequences can be retrieved (1500 bp), but, the method is onerous, time-consuming and nowadays widely replaced by NGS methods. However, the recent improvements of the NGS equipment allow achieving longer sequences maintaining a low cost. For instance, in the new 454 GS FLX + System (Roche) reads up to 1000 bp can be obtained. The bottleneck of the new methodology is now to develop bioinformatic tools to analyze the high amount of data generated. The tools available require informatic skills and important computer capacity.

Functional insight into the process can be inferred from 16S-based studies by searching for closely related cultured species. However, this should be done with caution as closely related organisms can be functionally different (Jaspers and Overmann 2004). Metabolic functionality can also be inferred by correlating community composition and dynamics with operational conditions and performance parameters using multivariate statistical tools. This correlation is particularly important in anaerobic digestion, where niche differentiation occurs and syntrophic groups are responsible for fulfilling essential steps in the process (Vanwonterghem et al. 2014).

3.3 DNA microarrays

DNA microarrays or microchips, are a fast, semi-quantitative technique for phylogenetic identification of bacterial and archaeal species or for detection of functional genes. It is based on the hybridization between extracted DNA sample or 16S rRNA amplicons, which are fluorescently labeled, and complementary oligonucleotide probes that are immobilized on a glass slide (microchip). When hybridization occurs, fluorescence can be detected using a laser. DNA microarrays can detect thousands of genes at the same time (high-throughput) and enables screening of microbial structure and activities (Čater et al. 2013). Morover, it is fast and once the chip has been design and fabricated the cost of applying the technique is not high. The main disadvantage is that the genes detected are limited to genes previously included in the microchip. Taking into account this limitation, the technique is very useful to compare communities from samples taken under different conditions.

To study AD, an ANAEROCHIP was used targetting methanogens which includes 103 probes. This chip has been used to investigate the methanogenic community from a thermophilic continuously stirred tank reactor (Franke-Whittle et al. 2009) and to follow the start up of solid waste digesters (Goberna et al. 2015).

Functional genes can be studied using a special chip which particularly includes different genes from the environment (as the GeoChip) (Tu et al. 2014). This strategy was used to study changes in the microbial community during operation of a combined nitritation-anammox reactor established to treat anaerobic digestion supernatant (Zhao et al. 2014). As far as we know, there is no available microarray especially designed with functional genes present in anaerobic digesters communities. This functional microarray may have high potential application to monitor bioreactors operation and start up.

4 Microbial dynamics (How does the community change over time?)

4.1 Fingerprinting techniques: DGGE, T-RFLP, SSCP

The most common fingerprinting methods include Denaturing Gradient Gel Electrophoresis (DGGE), Single-Strand Conformation Polymorphism (SSCP), Terminal-Restriction Fragment Length Polymorphism (T-RFLP) and Ribosomal Intergenic Spacer Analysis (RISA). These methods are based on the analysis of PCR products amplified from 16S rRNA genes (DGGE, SSCP and T-RFLP) or from the ribosomal intergenic region between 16S and 23S rRNA genes (RISA). With these methods, a community fingerprint based on sequence polymorphism is generated which gives an assessment of community structure and fluctuation over time in ecological studies (Schmalenberger and Tebbe 2002; Rastogi and Sani 2011). Fingerprints of the active part of a community may also be obtained after RNA extraction and reverse transcription of the 16S rRNA (Delbès et al. 2000, 2001). The possibility to get fingerprinting patterns from both 16S rDNA and 16S rRNA confers a great advantage to DGGE, SSCP and T-RFLP with respect to the RISA method. Other genes can be targeted with fingerprinting methods and it is not restricted to ribosomal genes (Fig. 3). In anaerobic digestion, monitoring the activity of methanogens is a primary target. The mcrA gene carried by methanogens, which encodes the key enzyme called Methyl Coenzyme M Reductase catalysing the last step of methane formation, was established as a molecular marker (Lueders et al. 2001) since all known methanogens express the Mcr enzyme (Ferry 1999). Nowadays, a satisfactory number of databank entries for mcrA gene are available, and sequence analyses reveal a huge diversity of methanogens. The presence of this enzyme provides a reliable diagnostic of methanogenesis in diverse environments (Reeve et al. 1997; Luton et al. 2002; Steinberg and Regan 2009).

Fig. 3
figure 3

Methods to study microbial community structure and dynamics based on the amplification of 16S rRNA genes or functional genes: DGGE, T-RFLP and SSCP

4.1.1 DGGE as one of the most popular and used molecular tool for monitoring microbial community structure and dynamics in bioreactors

DGGE is one of the most popular and used techniques for biodiversity assessment in bioreactor samples (Boon et al. 2002; Arooj et al. 2007; Connaughton et al. 2006; Miura et al. 2007; Pereira et al. 2014; Ramos et al. 2014).

DGGE protocols are relatively simple and straightforward and produce results in relatively short time: after extraction of genomic DNA or RNA, target gene sequences (mostly 16S rRNA gene fragments, or functional genes) are amplified using specially designed primers with GC clamps in the PCR reaction, and PCR amplicons of equal length are separated electrophoretically in a denaturing gradient gel (Muyzer et al. 1993) (Fig. 3). The interpretation of DGGE gels is rapid and straightforward, if only band patterns and relative band intensities are of interest. This renders this method popular for fast comparative analysis of communities from different reactors (Boon et al. 2002; Ramos et al. 2014) or from the same reactor operated under different conditions (Arooj et al. 2007; Miura et al. 2007; Luo and Angelidaki 2012; Ramos et al. 2013; Shi et al. 2013; Pereira et al. 2012). Sequencing of the DNA bands excised from gels allows the identification of different members of the microbial community and further phylogenetic analysis (Arooj et al. 2007; Connaughton et al. 2006; Díaz et al. 2011; Luo and Angelidaki 2012; Ramos et al. 2013, 2014; Pereira et al. 2012).

Nevertheless, DGGE as any other techniques has some disadvantages and limitations. Reports on methodological difficulties of DGGE include issues such as comigration of DNA from different species in the same band (Sekiguchi et al. 2001b; Speksnijder et al. 2001) and formation of multiple bands in the amplification of genes from single genomes (Brosius et al. 1981; Nübel et al. 1996). Moreover, this method has limitations with respect to its resolving power, as typically only up to 50 bands can be distinguished in a gel lane and it is clearly limited to the dominant members of the microbial community (with a threshold of roughly 0.1 % of the total) (Van Elsas and Boersma 2011). In spite of these reports, many researchers employ this technique for the identification of important community members and dominant organisms found in bioreactors.

The number of bands is assumed to accurately reflect the diversity of microbes in the sample (Miura et al. 2007; Luo and Angelidaki 2012; Ziembinska-Buczynska et al. 2014) whilst the relative intensity of each band is thought to reflect the relative abundance of the particular organism represented by the band in the community (Dar et al. 2005; Connaughton et al. 2006; Díaz et al. 2006; Shi et al. 2013).

Since identification of important community members as well as numerically most dominant members of a community is a key aspect of microbial community analysis of bioreactor samples, it is very important to know whether DGGE is a reliable technique to obtain such community data. In a systematic study conducted by Araujo and Schneider (2008) with artificial consortia, this technique was tested under conditions where results would not be affected by differences in DNA extraction efficiency from cells. They demonstrated that DGGE was suitable for identification of all important community members in the three-membered artificial consortium, but not for identification of the dominant organisms in this small community. Multiple DGGE bands obtained for single organisms with the V3 primer pair (targeting 16S rRNA) could greatly confound interpretation of DGGE profiles.

Despite all the limitations mentioned above, PCR-DGGE has turned into a routine fingerprinting method to study microbial diversity because it has a reasonable resolving power and allows fast comparative analysis between different samples (Van Elsas and Boersma 2011).

DGGE has been used to analyze the biodiversity of granular sludge (Roest et al. 2005; Zhang et al. 2005), to determine the microbial composition and structure of different types of granules in a UASB reactor from an industrial brewery (Díaz et al. 2006) and in a UASB reactor treating domestic wastewater (Pereira et al. 2012), to compare the start-up and evolution of mesophilic and thermophilic UASB reactors (LaPara et al. 2000; Liu et al. 2002), to investigate changes in the microbial population in continuously stirred tank reactors (Rincon et al. 2006; Ueno et al. 2001), and to analyze the microbial population dynamics in anaerobic reactors treating landfill leachate or trichloroethane (Calli et al. 2006; Tresse et al. 2005). DGGE was also used to investigate the effect of operating temperatures on the microbial community profiles in a high cell density hybrid anaerobic reactor (Kundu et al. 2012), to analyze the effect of the carbon source on the microbial community structure and performance of an anaerobic reactor (Kundu et al. 2013a), and to evaluate changes in microbial communities in a hybrid anaerobic reactor with organic loading rate and temperature (Kundu et al. 2013b).

The correlation of microbial community structure with wastewater composition and reactor’s performances was investigated by Kundu et al. (2013a). Self-immobilized granules were developed in synthetic wastewaters based on different carbon sources (glucose, sugarcane molasses, and milk), in three hybrid anaerobic reactors operated at 37 °C. To study archaeal community structure, a polyphasic approach was used with both qualitative and quantitative analysis. While DGGE of 16S rRNA gene did not reveal major shifts in diversity of Archaea with change in substrate, quantification of different groups of methanogens and total bacteria by real-time PCR showed variations in relative abundances with the dominance of Methanosaetaceae and Methanobacteriales. These data were supported by differences in the ratio of total counts of Archaea and Bacteria analyzed by catalyzed reporter deposition—fluorescence in situ hybridization. During hydraulic and organic shocks, the molasses-based reactor showed the best performances followed by the milk-and glucose-based reactors. This study indicates that carbon source shapes the microbial community structure more in terms of relative abundance with distinct metabolic capacities rather than its diversity.

Ramos et al. (2014) used DGGE to evaluate the impact of O2 on microbial communities in an industrial-pilot scale sewage sludge reactor. Ziembinska-Buczynska et al. (2014) analyzed the bacterial community performing methanogenesis in a pilot scale anaerobic chamber during the shift from mesophilic to thermophilic conditions to point at the group of temperature tolerant microorganisms and their performances, by using DGGE. Taken together, these examples indicate that DGGE has turned into a routine fingerprinting method and is still one of the most used molecular methods to study microbial diversity in anaerobic reactors.

4.1.2 T-RFLP a molecular tool to monitor bacterial and archaeal communities in anaerobic reactors

T-RFLP emerged as a molecular tool for the study of microbial communities when Liu et al. (1997) adapted an existing method (RFLP) in order to perform a rapid analysis of complex environments. T-RFLP analysis involves amplification of target genes from whole-community DNA extracts by using specific primer pairs, one of which is fluorescently labeled. Subsequently, amplicons are digested with restriction enzymes, typically using 4-base cutters, and fragments are size separated and detected on automated sequencers (Fig. 3). As only terminal fragments (T-RFs) are detected the complexity of the profiles is considerably reduced. The polymorphism depends on the fragment length and an internal size standard labeled with a different fluorophore running with each sample allows a precise assignment of the fragment lengths with single base pair resolution. The patterns of T-RFLP for a sample can be converted into a numerical form and allows, after applying standardization procedures, to compare samples using a variety of statistical approaches. In this way, by using T-RFLP profiles, questions related to the diversity and dynamics of complex ecosystems might be revealed. Moreover, a presumptively phylogenetic assignment of T-RFs might be possible relating the T-RFLP profiles to in silico restriction of OTUs obtained from cloning libraries of the same or related samples. This is especially useful in simple ecosystems or when a specific population is targeted, where the number of in silico T-RFs retrieved from the OTUs of a cloning library is low. For more complex communities, the taxonomic assignment for all T-RF might not be possible. Nonetheless, the predominant T-RFs are generally identified. One shortcoming is that discrepancies between in silico and observed T-RF sizes might occur which makes the T-RF assignment difficult (Schütte et al. 2008). Another disadvantage of T-RFLP is the occurrence of pseudo T-RFs which leads to an overestimation of the diversity (Egert and Friedrich 2003). In order to improve the reproducibility and accuracy of the method and the taxonomic assignment of T-RFs several technical improvements have been suggested, e.g. the use of more than one restriction enzyme (Osborne et al. 2006; Padmasiri et al. 2007; Collins et al. 2003), an appropriate choice of restriction enzymes and primers (Schütte et al. 2008) and labeling both forward and reverse primers with two different fluorochromes (Collins et al. 2003). Technical improvements have also been proposed in the last years to improve data analysis (for more detailed information see Nocker et al. 2007; Schütte et al. 2008; Prakash et al. 2014).

Even though T-RFLP is a very simple method, data analysis and taxonomic T-RF assignment might require an additional effort. Several steps are needed before the data can be statistically analyzed. First, a fluorescence threshold needs to be established to remove baseline noise. Second, standardization between samples is needed to avoid the influence of variations in DNA concentration between samples. Finally, size binning is necessary between samples to be able to compare them. Further analysis is needed using database information or clone sequence in silico restriction for taxonomic T-RF assignment. As a consequence of the time-consuming analysis of T-RFLP data, several web-based tools have been developed to aid in T-RF assignment and identify plausible members of a microbial community based on T-RFLP data. These tools are also performing additional tasks such as profile comparison, statistical analysis of data and representation of similarity in the form of a dendrogram. Some examples are TAP-TRFLP (http://rdp.cme.msu.edu), torast (http://www.torast.de), MiCA (http://mica.ibest.uidaho.edu/) and T-RFPred (http://nodens.ceab.csic.es/trfpred/), with phylogenetic assignment tools (PAT) likeT-Align (http://inismor.ucd.ie/*talign/), ARB software integrated tool, TRF-CUT (http://www.mpi-marburg.mpg.de/braker/trfcut.zip), TRiFLe (http://cegg.unige.ch/trifle/trifle.jnlp), and with T-RFLP statistical data analysis software (http://www.ibest.uidaho.edu/tool/T-RFLP_stats/index.php). Afterwards, statistical analysis can be performed with standard softwares in order to interpret the results.

T-RFLP has been extensively used to monitor structural changes and dynamics of Archaea and Bacteria in anaerobic reactors (Collins et al. 2003; Carballa et al. 2011; Pycke et al. 2011; Feng et al. 2010; Pervin et al. 2013; Zhang et al. 2014a, b; Whang et al. 2015). In general, the community is monitored in order to relate changes in the microbial community by modifying one or several operational parameters. Bacterial and archaeal communities can be monitored in lab scale or full scale reactors and different statistical tools might be used to interpret T-RFLP results. In this sense, Blasco et al. (2014) studied changes in bacterial and archaeal communities by only using T-RFLP in two lab scale stirred tank reactors fed with autoclaved and untreated waste. Strong statistical analysis of the data was performed including PCoA, ANOVA and Spearman’s rank correlation coefficient. The authors were able to determine that autoclaving as a pretreatment as well as change of OLR influenced the microbial community structure, especially the bacterial one. Another investigation used T-RFLP to study the microbial community richness, dynamics, and organization of four full-scale anaerobic digesters during a time-course study of 45 days (Pycke et al. 2011). In this case the authors calculated the parameters established by Marzorati et al. (2008) [Richness (Rr), Functional organization (Fo), Dynamics (Dy)] and performed cluster analysis. The authors demonstrated that the temperature has a strong effect on both bacterial and archaeal communities. The dynamics of change was very high and varied for both Bacteria and Archaea at a rate of change between 20 and 50 % per 15 days. Moreover, the community organization of the Bacteria changed more in the thermophilic reactors, compared with the mesophilic ones and the Archaea community structure was more stable.

Several authors have used T-RFLP to determine the community structure and dynamics of methanogens, one of the key populations of the anaerobic process in anaerobic digesters (McHugh et al. 2004; Padmasiri et al. 2007; Ziganshin et al. 2013). By choosing the correct restriction enzymes it is possible to study the relation between hydrogenotrophic and acetoclastic methanogens. Padmasiri et al. (2007) were able to determine that hydrogen utilizing methanogens increased in abundance during a period of poor reactor performances when studying the effect of the degree of shear to which the biomass in an AnMBR is exposed. The authors used three restriction enzymes and were able to study the dynamics of Methanomicrobiales and Methanobacteriales (hydrogenotrophic) and Methanosaetaceae and Methanosarcinales (aceticlastic) without performing a cloning library for T-RF assignment. When combining T-RFLP (using one enzyme but forward and reverse primers labeled) with cloning and sequencing a more exact determination of the structure, diversity, dynamics and composition of the methanogenic community was achieved. Xu et al. (2010) studied the effect of two methanogenic inhibitors, BES and CHCl3, on the methanogenic community structure present in an anaerobic sludge. The phylogenetic analysis of the archaeal sequences obtained from the clone library allowed determining the archaeal community composition which consisted of Methanosaetaceae, Methanomicrobiales, and Methanobacteriales and of yet uncultured archaeal lineages such as RC-III. Correspondingly, six fragments were detected as major peaks in the T-RFLP profiles. In silico digestion of the representative clones allowed to assign three T-RFs to Methanomicrobiales, one T-RF to Methanobacteriales and one T-RF to Methanosaetaceae. Finally, one T-RF was associated with two lineages, which could be differentiated by forward digestion. By using this strategy the authors were able to determine that acetoclastic methanogens were more sensitive to inhibitors than hydrogenotrophic methanogens as the relative abundance of Methanosaetaceae decreased compared to the control experiment and the replacement of acetoclastic methanogens was more pronounced in the CHCl3 versus the BES incubations.

Another interesting option used to study methanogens is to perform T-RFLP profiles on the methyl coenzyme M reductase gene (mcrA) (Zhang et al. 2014a, b; Ma et al. 2013; Ács et al. 2013). One recent example revealed which methanogens were involved in tetramethylammonium hydroxide (TMAH) degradation in three full scale bioreactors (Whang et al. 2015).

T-RFLP has also been used to monitor microbial communities present in lab scale hydrogen producing reactors, combined with sequencing and strain isolation, and it has been shown that a mixed microbial community developed while non hydrogen producing bacteria were also present in the reactors (Castelló et al. 2009; Perna et al. 2013; Ferraz et al. 2014). For example, when using raw cheese whey as substrate a low hydrogen yield was obtained which could be explained by the selection of a mixed fermentative population with the presence of hydrogen-producing organisms (Clostridium, Ruminococcus and Enterobacter) and other non-hydrogen producing fermenters (Lactobacillus, Dialister and Prevotella) in the same reactor (Castelló et al. 2011). Knowing how these organisms outcompete might be important in order to further optimize hydrogen production in UASB reactors. These examples demonstrate that T-RFLP has become a rapid, inexpensive, sensitive, robust and reproducible technique for the study of bacterial and archaeal communities in anaerobic bioreactors.

4.1.3 SSCP fingerprinting as a rapid and accurate molecular tool for monitoring microbial dynamics in anaerobic bioprocesses

The SSCP-based analysis relies on the propensity for single-strand DNA (ssDNA) to take a folded secondary structure in non-denaturing conditions. The secondary structure is determined by the nucleotide sequence and intramolecular interactions. Therefore, the SSCP method involves a step of heat denaturation of the PCR products where the ssDNA is formed. An electrophoresis is then performed under non-denaturing conditions, where ssDNA fragments are separated into different bands according to their difference in electrophoretic motility (Lee et al. 1996; Delbès et al. 2001). Nowadays, the procedure was adapted to an automated capillary DNA sequencer and is so-called Capillary Electrophoresis-SSCP (CE-SSCP) (Zumstein et al. 2000). An internal SSCP ladder is added in each sample to compare data and eliminate capillary-to-capillary or run-to-run variability, which has been a real technical improvement as compared to the original SSCP analysis (Quéméneur et al. 2011b). Since automation of the CE-SSCP electrophoresis allows a rapid and easy process of a maximum of 96 samples at a time, CE-SSCP fingerprinting is particularly useful to study the dynamics of the microbial populations over a long time series (Zumstein et al. 2000). Considering this, further statistical analysis can be carried out. Usually, CE-SSCP are processed and computed either with the graphical interface Stat-Fingerprints program working under the R software environment (Michelland et al. 2009) or the SAFUM toolbox written in MATLAB (Zemb et al. 2007).

The V3 variable region of the 16S rRNA gene is one of the most common target for characterizing archaeal and bacterial communities in anaerobic digesters (Zumstein et al. 2000; Leclerc et al. 2001, 2004; Mnif et al. 2012; Jáuregui-Jáuregui et al. 2014). In literature, SSCP analysis is widely reported as useful to compare microbial diversity in response to changes in environmental or process operation conditions especially when the diversity is low (Leclerc et al. 2004; Ye et al. 2007). For instance, the archaeal diversity in anaerobic digesters is often limited to a dozen of peaks, and diversity can be rapidly characterised by simple peak counting on SSCP profiles (Leclerc et al. 2004). Bacterial diversity may also be monitored by SSCP in anaerobic ecosystems that are simplified by applying stringent conditions, e.g. in hydrogen-producing bioreactors where only few major bacterial species dominate and are mostly affiliated to the Clostridium genus (Rafrafi et al. 2013). Similarly, the CE-SSCP analysis is perfectly adapted to monitor the decrease of microbial diversity all along an enrichment procedure, e.g. from anaerobic sludge to a co-culture of anaerobic syntrophs (Trably et al. 2008). The diversity of more complex bacterial community can also be estimated but only when considering the background signal corresponding to the information that is not enclosed in peaks areas (Loisel et al. 2006).

Interestingly, Bauer et al. (2008) developed and validated successfully a functional PCR-SSCP method followed by a cloning step of the interesting bands and a sequence analysis. They evaluated this method in combination with direct PCR cloning and sequence analysis. A novel and highly degenerated mcrA/mrtA pair of PCR-primers targeting all known methanogens was developed and tested with reference methanogens. DNA extracts from biogas fermenters fed with maize silage and operated at different conditions were analyzed and the dominant residing methanogens were characterized to assess their functional importance during the process stages. Whereas direct PCR cloning was more suited to determine the relative abundances of methanogens, the SSCP analysis easily detected population changes and the involved sequences specifically. Although most of the bands were not sharp probably because of high degenerated primers, the authors concluded that the SSCP technique was more suitable than DGGE for monitoring methanogens. The SSCP method was further refined and employed to investigate the presence and dynamics of distinct sub-populations of methanogens during biogas process acidification (Munk et al. 2010). The authors concluded that technical details associated with degeneration of the primers should be improved since different sequences comigrated in one SSCP band and identical sequences were found in different bands. This method, in combination with classical chemical parameters, could help to quickly indicate changes in population structure due to process conditions, and thus prevent dysfunction of the process.

The correlation between the expression of hydA gene and H2 production in batch or continuous reactors has been well established (Tolvanen et al. 2008a, b; Wang et al. 2008a, b). Quéméneur et al. (2010) developed a successful CE-SSCP method based on functional hydA [Fe–Fe]-hydrogenase genes for monitoring hydrogen (H2)-producing clostridial populations. These authors designed and validated a set of non-degenerated primers, named hydAClosF/hydAClosR, by binding two conserved regions of the active site domain (H-cluster) of the [Fe–Fe]-H2ases-coding genes. These primers showed a high in silico specificity and selectivity for the detection of the hydA genes from a broad range of Clostridium strains belonging to different phylogenetic clusters. This method was validated using experimental H2-producing mixed cultures and was shown to be very effective in monitoring the shift of H2-producing bacterial populations in relation to environmental changes, such as temperature (Quéméneur et al. 2010), pH (Quéméneur et al. 2011b) and the type of carbohydrate as substrate (Quéméneur et al. 2011a). The hydA sequences amplified with this new set of primers clearly showed that the functional diversity of hydA-carrying Clostridium sp. strains impacted the H2 production yields. Interestingly, the 16S rDNA-based fingerprints were found to be less correlated to changes in H2 production performances. The difficulty in distinguishing Clostridium cluster I species in H2 dark fermentation systems by using universal bacterial primers was also reported by (Chang et al. 2008).

Hence, functional CE-SSCP tools provide highly reliable and throughput analysis of the microbial communities which could be used as a complement to 16S rDNA phylogenetic markers. Such first successful approach for determining functional microbial dynamics in bioprocesses by the CE-SSCP technique makes possible its rapid extent to other functional groups found in anaerobic digestion systems, e.g. methanotrophic (pmoA), and sulfate reducing (dsrAB) bacteria.

The three fingerprinting methods presented here are highly suitable to study the microbial diversity, structure and dynamics of microbial communities in AD reactors. Overall, the CE-SSCP method offers several advantages in terms of rapidity and ease of obtaining data, as well as a high reproducibility between runs and data processing. Regarding the DGGE-based methods, which relies on the use of clamped primers and gradient gels, with only a limited number of samples per gel that are analysed, SSCP and T-RFLP are simpler and more straight forward techniques (Marzorati et al. 2008; Rastogi and Sani 2011; Schütte et al. 2008). However, DGGE is widely used and has turned into a routine fingerprinting method to study microbial diversity because it has reasonable resolving power and allows fast comparative analysis between different samples (Van Elsas and Boersma 2011). On the other hand, T-RFLP has been shown to be a reproducible and accurate tool for community fingerprinting (Osborn et al. 2000) but, the data analysis requires an additional effort. Additionally, even though DGGE is widely used, special equipment, but very affordable, is required while for SSCP and T-RFLP the samples can be analysed in any sequencing service commercially available.

5 Tools for microbial quantification (How many microorganisms from the different groups are present in the reactor?)

5.1 Quantification and detection of particular organisms in anaerobic digesters by FISH

The FISH method involves application of fluorescently labeled probes to ribosomal rRNA in permeabilized whole microbial cells. The probes consist of short pieces of DNA (usually 15-25 nucleotides in length) that are designed to specifically hybridize to their complementary target sequence on the rRNA structures (16S and 23S subunits are typically used for Bacteria and Archaea) (Amann et al. 1995). From the composition of the probe, it is possible to design it to specifically target a narrow phylogenetic group (down to the species level) or any other higher phylogenetic hierarchical group (Amann et al. 1995). No probes will hybridize to those cells without target sequences. Cells containing the target sequence will on the other hand retain the hybridized probe and due to the large number of ribosomes only the active cells (from 103 to 105 ribosomes) become fluorescently labeled. FISH provides an effective means of identification and qualitative and/or quantitative microbial population analysis in natural and engineered environments (Amann et al. 2001).

The optimisation process in FISH method is generally carried out with pure cultures of microorganisms that contain rRNA with a perfect match to the probes (positive controls) and with microorganisms that contain rRNA with preferably one (optimally centrally located) or several mismatches to the probe (negative controls) (Manz et al. 1992). There are several variables that can be exploited to modify the strength of a hybrid between an oligonucleotide and its perfect target. These include temperature, ionic strength of the medium and organic solvent concentration (Amann et al. 1995).

Several researches applied FISH in order to study microbial ecology in aerobic and/or anaerobic reactors (Amann et al. 1992; Raskin et al. 1994; Power et al. 1999; Rocheleau et al. 1999; Sekiguchi et al. 1999; Araujo et al. 2000, 2004; Imachi et al. 2000; Zheng and Raskin 2000; Egli et al. 2003; Ginige et al. 2004; Díaz et al. 2006). One of the early studies (Raskin et al. 1994) is very important to mention because authors designed different oligonucleotide probes for ex situ hybridization in order to quantify different groups of methanogenic Archaea. The second study, which is also worth to mention, was of Sekiguchi et al. (1999). This was the first study that used FISH combined with confocal laser scanning microscopy (CLSM) to elucidate the spatial distribution of microbes within two types of methanogenic granular sludge (mesophilic and thermophilic) in UASB reactors fed with sucrose-, acetate-, and propionate-based artificial wastewater. In situ hybridization with archaeal- and bacterial-domain probes within granule sections clearly showed that both mesophilic and thermophilic granules had layered structures and that the outer layer harbored mainlybacterial cells while the inner layer consisted mainly of archaeal cells. Methanosaeta-, Methanobacterium-, Methanospirillum-, and Methanosarcina-like cells were detected with oligonucleotide probes specific for the different groups of methanogens, and they were found to be localized inside the granules, in both types of which dominant methanogens were members of the genus Methanosaeta.

In the study of Crocetti et al. (2006), 3000 Euryarchaeota 16S rRNA gene sequences were phylogenetically analyzed and previously published oligonucleotide probes were evaluated for target accuracy. Where necessary, modifications were introduced or new probes were designed.

Despite all the studies mentioned above that used FISH, there are some limitations associated with this method, such as: 1-poor cell permeability (a pretreatment is necessary to fix Gram positive cells or some archaea (Davenport et al. 2000; Zheng and Raskin 2000). 2-The detection limit is not very low (varies from 103 cells/mL (Li and Gu 2011) to 104 cells/mL (Amann 1995). 3-Insufficient ribosome content, as inactive or dormant cells will not give fluorescent signal or it can be very low (Morgenroth et al. 2000). 4-Sample autofluorescence, specially some methanogenic Archaea can produce autofluorescence that interfere with fluorescent labels.

Some of these limitations, especially the detection of cells with low ribosomal content present in oligotrophic conditions, can be overcome by applying the catalyzed reporter deposition FISH (CARD-FISH) (Pernthaler et al. 2002). In this approach (CARD-FISH) the critical step is the diffusion of the large molecules, in this case the horse radish peroxidase (HRP)-labelled probe, into whole cells embedded in an agarose matrix. A directed permeabilization of prokaryotic cell walls prior to the hybridization step is of crucial importance to enable the penetration of the probe (Pernthaler et al. 2002). In the study of Wilhartitz et al. (2007), CARD-FISH resulted in substantially higher recovery efficiency than the conventional FISH-approach and therefore it is more suitable for the enumeration of specific prokaryotic groups in ultra-oligotrophic environments such as ground- and drinking water samples.

Therefore, knowledge of the nature and applicability of the sample as a uniform protocol with application of the proper controls are of fundamental importance for obtaining solid and comparable information. By using FISH it is possible to observe the morphology and to quantify numbers of bacteria or equivalent biovolume. Although the method is very time consuming and tedious, the principal advantage is that it is possible to quantify and at the same time, determine the position of the different cells in the community.

5.2 Q-PCR for quantification in anaerobic digestion processes

The quantitative polymerase chain reaction (q-PCR) or real-time quantitative PCR (RT-PCR) is a molecular technique that detects and quantifies DNA molecules in solution, by means of fluorescence signal detection. Based on the end point or regular PCR, a target region of the template DNA is amplified with specific primers, although a key difference is that products are detected in each cycle during the exponential phase of the process (Higuchi et al. 1992).

The two detection methods most widely used in environmental microbiology are: (1) the addition of intercalating fluorescent dyes (e.g. SYBR Green I), which emits fluorescence only when binds to double-stranded DNA (Wittwer et al. 1997) or, (2) sequence-specific oligonucleotide probes (complementary to an internal sequence) labeled with a fluorophore and with a quencher (e.g. TaqMan) (Gudnason et al. 2007). The TaqMan technology (Heid et al. 1996) is quite more specific than SYBR Green I due to the third oligonucleotide probe used in the reaction that drastically reduces false positives. On the other hand, SYBR Green I is a less expensive method and sometimes the only choice, when the target sequence hinders the primer and probe design.

Hence, the quantification of the fluorescent signal can be correlated to the amount of target DNA molecules present in the solution. Quantification can be absolute or relative; the former method is currently the most used in microbial ecology and consists in the interpolation of the inquired samples in a standard curve constructed with known concentrations of target sequences, often cloned in plasmids or as PCR products.

The specificity, sensitivity but mainly the quantitative basis of the q-PCR filled a gap in microbial ecology. Since the pioneer studies in 2000s (Suzuki et al. 2000; Takai and Horikoshi 2000), this approach has been increasingly applied to investigate microbial ecology questions that could not be addressed with other techniques such as clone libraries or even the actual high-throughput DNA sequencing. It has been used to track genes and/or transcripts of ribosomal or functional genes across temporal or spatial scales in different environments such as biofilms (de Gregoris et al. 2011), rumen (Bekele et al. 2011) seawater (Mincer et al. 2007) and alpine soils (Brankatschk et al. 2011). A recent work published by Kim et al. (2013) describes the state of the art in the quantification of the different groups of microorganisms found in wastewater treatment systems.

In the anaerobic wastewater treatment research field, q-PCR has been applied mainly to monitor the methanogen community. The strategy consists in targeting the 16S rRNA gene (Yu et al. 2005) or the mcrA gene (Alvarado et al. 2014). By targeting the 16S rRNA gene, some studies have shown a swhich from aceticlastic to hydrogenotrophic methanogenesis when stressing conditions such as temperature or ammonia are applied to lab-scale reactors (Lee et al. 2009; McKeown et al. 2009; Song et al. 2010; Traversi et al. 2011; Jang et al. 2014; Town et al. 2014). On the other hand, a direct correlation between specific methanogenic activity and mcrA gene copy numbers was reported recently (Morris et al. 2014).

Some considerations must be stressed before interpreting or comparing q-PCR results. The first issue to consider is the DNA yield bias introduced by the different methods of nucleic acids extraction (Martin-Laurent et al. 2001). Another important aspect is the operon copy number variation (CNV) of the target gene, in the case of bacterial 16S rRNA, variations of 1-15 copies between different species or even in different phases of the cell cycle were reported (Klappenbach et al. 2000; Sun et al. 2013). In the case of organisms from the Archaea domain, ribosomal operon variation is less pronounced, ranging 1-4 copies per genome (Kim et al. 2013). As an attempt to cope with the gene CNV, alternative molecular markers with only one copy per genome have been used. The mcrA gene, mentioned above has been proved to be a good molecular marker for tracking methanogens in the archaeal community (Luton et al. 2002).

The specificity of the primers and probes used is another key aspect of this approach. However, there is a vast set of primers designed to monitor the wastewater treatment microbiology. For instance, a complete review of primers and probes designed to monitor methanogenic Archaea have been compiled by Narihiro and Sekiguchi (2011).

Due to the potential applications of this technique in wastewater treatment research, efforts must be done to jointly improve the robustness of the approach and standardize protocols to compare results between different works. Although far from a golden standard, q-PCR is currently a promising technique to assess quantitatively the structure of microbial communities. The principal advantages of this quantification technique are the low detection limits (it is possible to detect a single DNA molecule depending of the efficiency of the reaction), the possibility to perform several samples at the same time and the short time required to obtain results. q-PCR analysis give access to absolute quantification, and thus complement very well sequencing and FISH data that give relative abundances of targeted microorganisms.

6 Tools to analyze microbial function (What is the function of the microorganisms inside the reactor?)

Understanding how complex microbial communities function inside anaerobic digesters and how environmental conditions may affect interspecies relationships is currently one of the big challenges for both environmental biotechnologists and microbiologists. Only by getting more insight into microbial function we will be able to fully exploit and manage microbial communities for biotechnology purposes (Verstraete 2007). Over the last years, several culture-dependent and culture-independent techniques have been developed that allow to study community composition and function in complex ecosystems, such as detection of specific enzyme activity or of specific enzyme-coding genes, functional microarrays, radioactive and stable isotope labeling, probing and advanced imaging, and meta-omics approaches (Su et al. 2012; Vanwonterghem et al. 2014). These techniques, alone or in combination with each other and with differential bioreactor operation, can offer insights into the metabolic activities of microbial communities in AD processes and it is in this perspective that they are discussed in the following sections.

6.1 Stable isotope probing and in situ advanced microscopic methods

Stable isotope probing (SIP) is performed by amending a stable isotope (for example 13C-labeled substrate to microbial communities and further analyzing microbial composition in heavy DNA or rRNA fraction, e.g. by cloning and sequencing (Fig. 4a) (Radajewski et al. 2000). Heavy DNA/rRNA represents the active populations in the community because those are the ones incorporating the isotopes within their biomass.

Fig. 4
figure 4

Methods to study microbial diversity and function in complex ecosystems: a Stable isotope probing (SIP), and b in situ microscopic methods coupled to stable/radioactive spectroscopy techniques (microautoradiography coupled to FISH analysis: MAR-FISH; Raman microspectrometry coupled to FISH: Raman-FISH; nanoscale secondary ion mass spectrometry (NanoSIMS) coupled to FISH (SIMSISH)

In recent years, with advances in high resolution imaging and spectroscopy, SIP has been complemented with other techniques, such as FISH, microautoradiography combined with FISH (MAR-FISH), Raman microspectroscopy combined with FISH (Raman-FISH), or nanoscale secondary ion mass spectrometry (NanoSIMS) combined with FISH (SIMSISH) (Chapleur et al. 2013; Huang et al. 2007; Lee et al. 1999; Li et al. 2008). These techniques allow to study taxonomic identity and activity of microbial communities at single cell resolution (Fig. 4b). This is a field of fast evolution and complementary techniques are rapidly emerging. One example (not yet applied to anaerobic digestion systems) is the utilization of amino acid tagging and click chemistry for in situ visualization of newly synthesized proteins (biorthogonal non-canonical amino acid tagging, BONCAT) (Hatzenpichler et al. 2014). Combination of BONCAT technique with FISH (BONCAT-FISH) enables a direct link between taxonomic identity and translational activity. Comparison of BONCAT-FISH results and SIMISH for anabolic nitrogen assimilation showed concordance, with BONCAT-FISH offering a less expensive solution for these studies.

Several studies have addressed the key players in anaerobic digesters by performing separate feeding studies with different 13C-labelled carbon sources, such as 13C-cellulose, 13C-glucose, 13C-acetate and 13C-propionate, followed by DNA-SIP and FISH analysis (a summary of the main results obtained in these studies is shown in Table 2). In this way, the metabolic activity for independent steps in the cellulose/glucose to methane conversion pathway in methanogenic reactors could be assigned to specific taxonomic groups, e.g. cellulose degraders were shown to belong to Clostridia and Acetivibrio, while subsequent glucose conversion was mainly performed by bacteria belonging to the Clostridia and Porphyromonadacaea (Li et al. 2009; Li et al. 2008). The involvement in cellulose conversion of novel identified groups in anaerobic digesters, such as the WWE1 candidate division, could also be confirmed using these methods (Limam et al. 2014). The strategy to perform feeding studies with different 13C-labelled substrates, corresponding to the different levels of the conversion pathway, was also successfully applied in elucidating the microbial groups involved in each of the sequential steps. In a glucose-degrading anaerobic digester it was elegantly demonstrated that glucose was converted to propionate by members belonging to Actinobacteria, Bacteroidetes and Chloroflexi, and that propionate was further converted to acetate by Syntrophobacter and Smithella, and acetate finally utilized by members belonging to the Synergistes group 4 and the methanogen Methanosaeta (Ito et al. 2011, 2012).

Table 2 Overview of main functional groups detected in anaerobic digestion sludges by using SIP, probing and advanced imaging techniques

6.2 The meta-omics era

Omics techniques (genomics, transcriptomics, proteomics) were first used to study and characterize cultivable isolates but, with the development of more powerful instrumentation and bioinformatics tools, application of these techniques has been extended to more complex ecosystems. This is important because interspecies interaction and responses to environmental conditions, which result in multiple community behaviors, can only be identified if the ecosystem is assessed as a whole. Meta-omics techniques—metagenomics, metatranscriptomics, metaproteomics, and metabolomics—can be applied to ecosystems to unravel composition and function of uncultured microorganisms (Fig. 5). Methods such as SIP and advanced imaging techniques discussed in the previous section allow analyzing structure–function in ecosystems, but only associated with a particular physiological trait. Omics techniques, on the other hand, have the potential to reveal the complete picture of the microbial functionalities in an ecosystem.

Fig. 5
figure 5

Possibilities for multi-omics analysis of complex microbial communities. Metagenomics analysis provides information on taxonomy (based on retrieved 16S rRNA gene sequences) and on the metabolic genes (potential functional metabolic pathways) present in the metagenome. Metatranscriptomics enables the investigation of the actively transcribed ribosomal and messenger RNA from a community, giving more insight on the active fraction of the community; it can be used to study how communities and gene expression change in response to environmental changes. Metaproteomics is an excellent tool to for studying functionality in an ecosystem because proteins are the direct responsible for cell phenotype and are therefore more informative than the identification of functional genes or their corresponding messenger RNAs. Metabolomics has been recently introduced in systems biology approaches and studies the complete set of metabolites formed by the whole microbial community as a result of its interaction with biotic and abiotic factors of its environment

6.3 Metagenomics

Metagenomics has greatly facilitated the mining of microbial diversity and metabolic potential of highly diverse microbial communities. In particular, in recent years the advances in NGS technology and accompanying bioinformatic tools have enabled the generation of large sequence datasets and streamlined analysis of the complex sequence data. Through these methods, in-depth 16S rRNA gene-sequencing analysis of production-scale biogas digesters has generated many novel details on the microbial composition (and compositional changes) during anaerobic digestion. These studies confirmed that Clostridia comprise the most prevalent bacterial class in different types of AD reactors (Jaenicke et al. 2011; Li et al. 2013; Rademacher et al. 2012; Schlüter et al. 2008), but also identified hitherto neglected taxa such as Streptococcus, Acetivibrio, Garciella, Tissierella, Gelria (Jaenicke et al. 2011) or Psychrobacter and Anaerococcus (Li et al. 2013). Methanogenic archaea were mainly represented by the hydrogenotrophic Methanoculleus, Methanosarcina, Methanobacterium and Methanothermobacter, or the acetoclastic archaea Methanosaeta. This corresponded to results from previous studies. In addition one study also demonstrated the presence of Thermacetogenium (Rademacher et al. 2012). The type strain of this genus, Thermacetogenium phaeum, oxidizes acetate with Methanothermobacter thermoautotrophicus (Hattori et al. 2000).

Besides phylogenetic information, the metagenomics analysis also provides more details on the functional potential of the microbial community and its correlation to the anaerobic process under study. In AD reactors treating plant material waste a high number of sequence reads related to cellulose and hemicellulose conversion as well as other carbohydrate-degrading genes were detected, which in one study could be assigned to the predominant phylogenetic taxa identified (Jaenicke et al. 2011). Acetivibrio species had previously been recovered in the heavy DNA-fraction during SIP experiments with cellulose (Li et al. 2009); taxonomic and functional genes results obtained by Jaenicke et al. (2011) support once more that this genus most probably play a role in cellulose degradation. Some Clostridia may be involved in acetogenesis, as deduced by mapping bacteria taxa to metagenome hits representing the Wood-Ljungdahl pathway (Jaenicke et al. 2011). Reads encoding for enzymes required for hydrogenotrophic and acetoclastic methanogenesis pathways were also detected, corresponding to the taxonomic observations (Fig. 6). Enzymes necessary for catalyzing CO2 conversion to methane were all found within the metagenomes of the bioreactor sludges treating cellulosic materials and waste activated sludge (Rademacher et al. 2012; Wong et al. 2013). Acetate kinase and phosphotransacetylase, involved in the first step of acetoclastic methanogenesis, were not detected in these studies. Wong et al. (2013) proposed an alternative route in which an acetate transporter coupled to acetyl-CoA synthetase and the hydrolysis of pyrophosphate by inorganic pyrophosphatase drives this reaction forward, similar to the pathway present in Methanosaeta thermophila (Smith and Ingram-Smith 2007).

Fig. 6
figure 6

Hydrogenotrophic and acetoclasticmethanogenesis pathways. Genes detected in methagenomic analyses of anaerobic digesters are indicated in blue—data from Rademacher et al. (2012) and Wong et al. (2013); in red, genes that could not be detected in those studies. Dotted lines represent an alternative pathway for the activation of acetate to acetyl-CoA, as suggested by Wong et al. (2013). MF—methanofuran, MPT—methanopterin, CoM—coenzyme M, CoB—coenzyme B, CoA—coenzyme A, F420—coenzyme F420, Fd—ferredoxin. fmd—formylmethanofuran dehydrogenase, ftr—formylmethanofuran:H4MPT formyltransferase, mch—methenyl- H4MPT cyclohydrolase, hmd—H2-forming methylene- H4MPT dehydrogenase, mer—F420-dependent methylene- H4MPT reductase, mtr—methyl-H4MPT: coenzyme M methyltransferase, mcr—methyl coenzyme M reductase, ack—acetate kinase, pta—phosphotransacetylase, acs—acetyl-CoA synthetase, ppa—inorganic pyrophosphatase, chd—carbon monoxide dehydrogenase/acetyl-CoA synthase. (Color figure online)

One of the main drawbacks of metagenomics studies is still the large amount of sequence reads with unidentified microbial origin or gene prediction. This may indicate that the anaerobic digestion process is also conducted by yet unknown microorganisms or species for which no genomic information is available. Most metagenomics studies rely on complete or draft genomes to identify fragmentary sequences, which limits the ability to resolve metagenomics data deriving from unknown microbial diversity (Temperton and Giovannoni 2012). Therefore, it is important to continue the effort of isolating new microorganisms and sequencing the genome of representatives of the different groups of microorganisms present in bioreactors, following a similar strategy that was successfully applied with the human microbiote.

6.4 Metatranscriptomics

While metagenomics gives information on the metabolic and functional potential of a microbial community, it cannot provide information on the activity of the genes present or to differentiate between expressed and non-expressed genes. To get an estimate of the actual metabolic activity, total mRNA from a microbial environment needs to be retrieved and sequenced. Metatranscriptomics in general generates less complex datasets to analyze (only transcribed genes are retrieved) than metagenomics does. However, mRNA molecules have a short half-life, and ribosomal RNA represents the majority of RNA isolated, thereby reducing sequencing-depth for mRNA reads that represent the expressed genes and pathways.

Recently, a metatranscriptome study investigated the transcriptionally active microbial fraction of a biogas-producing reactor and identified the most abundant transcripts during biogas production (Zakrzewski et al. 2012). This is the only metatranscriptomics study performed on a full-scale anaerobic digester thus far. The most abundant mRNA-derived sequence reads from this metatranscriptome dataset were from transcripts encoding enzymes involved in substrate hydrolysis, acidogenesis, acetate formation and methanogenesis, indicating that these pathways were highly active in the anaerobic digestion process. Taxonomic profiling of the active community, based on 16S rRNA tags, revealed that members of the Firmicutes and Euryarchaeota were the most abundant bacteria and archaeal phyla in the system, respectively; low numbers of 16S rRNA tags were associated to the phyla Bacteroidetes, Actinobacteria, and Synergistetes. In addition, comparison of the metatranscriptome dataset with 16S rRNA-gene metagenomic sequence tags derived from the same community, indicated that the archaeal species present were highly active and that in general the most abundant species in the community contributed to the majority of the retrieved transcripts.

6.5 Metaproteomics

Where metagenomics identifies the microbial community and its metabolic/functional capacity, and metatranscriptomics the transcriptional activity of specific metabolic pathways/microbes, metaproteomics can be used to characterize highly expressed proteins within an environmental microbial consortium. This can provide a more functional evidence of key metabolic pathways that are active and important for anaerobic digestion processes. In contrast to metagenomics and metatranscriptomics, the main limitation of metaproteomics is to extract a sufficient quantity of high quality proteins that are representative of the sample. Extracting a representative protein fraction for analysis is complicated due to the complex microbial communities involved, the presence of interfering compounds, and the heterogeneity of natural environments. Nevertheless, metaproteomics has a great potential to link the genetic diversity and activities of microbial communities with their impact on ecosystem function (Su et al. 2012).

Metaproteomics studies in anaerobic digestion are still limited. One study analysing a lab-scale anaerobic bioreactor treating glucose-based wastewater assigned putative functions to the proteins detected, and demonstrated that they were mostly related to cellular processes such as methanogenesis from both CO2 and acetate, glycolysis and the pentose phosphate pathway (Abram et al. 2011). The number of identified proteins was very low though, only 18 distinct proteins were assigned to different functions (from 70 proteins excised from two-dimensional electrophoresis gel). No metagenomics data was available from the reactor, which decreases the rate of identification. The protein assignment also indicated the presence of specific microorganisms in the bioreactor (Actinobacteria, Firmicutes, Proteobacteria, Methanosarcinales and Methanobacteriales), though little overlap was observed with the 16S rRNA gene clone libraries constructed for phylogenetic analysis of the bioreactor. In a similar work, Hanreich et al. (2013) analyzed the metaproteome of biogas batch fermentation systems co-digesting straw and hay. Members of the orders Clostridiales, Bacteroidales and Flavobacteriales were identified as key players in this ecosystem from the analysis of metagenomics data. A higher number of gene sequences codifying for enzymes from the glycoside hydrolase families GH5, GH48 and GH94 (comprising cellulase, cellobiohydrolyse and cellobiose phosphorylase activities) were present at the beginning of the incubations (together with higher numbers of Clostridiales-related microorganisms); over time, an increase in Bacteroidales and Flavobacteriales was observed with an increase in the number of sequences encoding enzymes of the GH2, GH13, GH29, GH77 and GH92 families (comprising enzymes capable of cleaving a variety of monosaccharides and polysaccharides). Nevertheless, only three hydrolytic enzymes could be detected during the proteomics analysis—a putative α-amylase, a pectaselyase and a glycosyide hydrolase family protein). On the other hand, a high number of enzymes involved in methanogenesis could be detected in the protein fractions (e.g. methyl-coenzyme M reductase), although methanogens represented a minor part of the community as inferred from metagenomic data (orders Methanosarcinales, Methanomicrobiales, and Methanobacteriales). Recently, Lü et al. (2014) performed a much more comprehensive metaproteomics study on thermophilic anaerobic cellulose degradation, with the identification of more than 500 non-redundant protein functions. The taxonomic distribution of the identified proteins suggested the presence of a few dominant bacterial groups, namely Caldicellulosiruptor species (≈41 %), Coprothermobacter proteolyticus (≈20 %) and Clostridium thermocellum (≈16 %). Methanothermobacter thermoautotrophicus (≈5 %) was the most abundant methanogen. A large number of proteins potentially involved in cellulose and hemicellulose binding and hydrolysis or in oligosaccharide metabolism were identified, and their synthesis linked to bacteria related to Caldicellulosiruptor spp. and Clostridium thermocellum. A cooperation between these two groups of bacteria was suggested, in which Caldicellulosiruptor spp. are responsible for hydrolysing the cellulosic part of the substrate, whereas Clostridium thermocellum is also hydrolysing hemicellulose in addition to cellulose. Regarding methanogenesis, data generated by pyrosequencing, metaproteomics and isotopic analyses together strongly support the predominance of the hydrogenotrophic pathway by strains of the genus Methanothermobacter. None of the enzymes specific for acetoclastic methanogenesis were identified, which opens the hypothesis of the occurrence of syntrophic acetate oxidation (SAO) in this ecosystem. Nevertheless, the lack of known protein specific for the SAO pathways and the limited knowledge about thermophilic SAO microorganisms did not allow further confirming this. SAO is currently a topic of interest as new microorganisms and theories for this pathway are emerging (Nobu et al. 2015).

7 Statistical analysis to link results from molecular data to reactors performance

The results obtained by molecular methods can be linked to reactor performances using different statistical analysis. The link with environmental data is very important for the interpretation of the results. Although some conclusions can be drawn just by observing fingerprinting profiles or OTU tables, a deeper analysis is essential in order to arrive at strong and robust conclusions.

There are different kinds of analyses that can be performed with different levels of complexity. The most frequently used are the hierarchical clustering analysis, the ordination methods and the calculation of diversity indices (Talbot et al. 2008). The methods can be applied to either fingerprinting data or sequencing analysis datasets. The first step is to perform a matrix with the samples and the presence/absence or abundance of each “species” (represented by OTUs, chromatogram peaks or DGGE bands). With this species matrix several analyses can be performed.

7.1 Multivariable statistical tools

With the hierarchical cluster analysis the samples are grouped according to the dissimilarity between them, the results are visualized in a dendrogram constructed to represent how the samples are grouped and different coefficients and algorithms can be applied for the clustering (Ramette 2007). This analysis is very useful to determine the changes in the communities due to differences in operation conditions. The samples can also be ordinated according to the dissimilarities between them in two dimensional spaces using ordination methods as Principal Component Analysis (PCA), Non-metric Multidimensional Scaling (NMDS) or Canonical Correspondence Analysis (CCA). In constrained (canonical) ordination analysis methods it is possible to include in the original matrix additional environmental data such as: pH, temperature, organic loading rate (OLR), or other environmental data. The resulting ordination diagrams display samples, species, and environmental variables so that ‘fitted species samples’ and ‘species environment’ relationships can be derived as easily as possible from angles between arrows or distances between points and arrows (Ramette 2007).

Another objective may be to test whether differences between groups of objects (rows) in a multivariate table are significantly different based on the set of their attributes (columns), i.e. to test whether similarities within groups are higher than those between groups. Here, nonparametric multivariate ANOVA (NPMANOVA) and analysis of similarities (ANOSIM), which are commonly found in standard statistical packages, can be used (Ramette 2007).

An overview of the analysis that can be applied in microbial ecology could be found in the review published by Ramette (2007) and in the web tool developed by Buttigieg and Ramette (2014).

7.2 Diversity indices

Using the same matrix of species abundance, diversity indices can be calculated. The indices reflect the organization of the community and can be used to determine differences within microbial communities (Talbot et al. 2008). Three main diversity indices: richness, evenness (or equitability), and Shannon index, can be calculated from fingerprinting or sequencing data to define the structure of a microbial community (Hill et al. 2003). Richness is simply defined as the number of different OTUs obtained by the fingerprinting method or sequencing approach. Evenness is a measure of the equitability of abundance. In other words, an environmental sample with more even abundance is more diverse than a sample with predominant and sparse species. Finally, Shannon index (Ho) is calculated from the relative abundances of each band/peak/OTU and is also influenced by the number of different OTUs. There are other diversity indices described in the bibliography that could be also determined, more detailed information can be found at the work of Hill et al. (2003).

The clustering, ordination methods and diversity indices can be calculated using different free softwares that are easily downloaded from the web (R package; http://www.r-project.org/, PAST package: http://palaeo-electronica.org/2001_1/past/issue1_01.htm, Hammer 2001).

7.3 Microbial Resource Management concept

Marzorati et al. (2008) proposed a new concept to obtain valuable information from fingerprinting data that can be applied for the management of microbial communities for biotechnological applications. The idea of the authors was to define parameters that describe the organization of a community and reflects its functionality. These parameters can be used to explain and predict the performance of reactors.

Three parameters were proposed to describe the structure and assembly of the microbial communities: Richness (Rr), Functional organization (Fo), Dynamics (Dy).

The Rr represents the number of different individuals (species) in a sample and is calculated as the number of peaks in a fingerprinting profile (T-RFLP or SSCP), the number of bands in DGGE or the number of different OTUs in a cloning library. In the case of DGGE the authors propose a rarified coefficient to give different weight according the identity of the microorganisms represented by a band. This coefficient was calculated according to the migration of the bands in the gel.

The Fo coefficient is a single value that describes a specific degree of evenness, measuring the normalized area between a given Lorenz curve and the perfect evenness line. This coefficient can be determined by constructing Lorenz curves and calculating Gini coefficients (Rousseau et al. 1999) or by constructing Pareto-Lorenz curves (Wittebolle et al. 2008). The higher the Fo coefficient, the more uneven a community is.

The Dy is a coefficient to determine the rate of change of a community along time and is calculated from the similarity values within two samples taken in a frame of time (for example if the samples are taken weekly the frame of time is 1 week). The similarity between two samples is determined from the cluster analysis.

The authors propose that, according to the ecosystems functionality, the community will adapt to a particular assembly, represented by the three parameters that indicate if the community is in a “good or bad” organization to perform a function, deviation from these “normal” values may represent problems in the function (Marzorati et al. 2008).

These parameters were used to describe the communities from full scale anaerobic bioreactors (Pycke et al. 2011), to understand methanogenic bioreactors failures after overload (Carballa et al. 2011) and to correlate microbial communities with different operational conditions (Werner et al. 2011; Ciotola et al. 2013; Chen et al. 2014).

8 Lessons from the application of molecular tools to study the anaerobic digestion process and future challenges

During the last three decades molecular biology tools have been more and more applied to the study of the anaerobic digestion process. Each method present different advantages and limitation which have to be taken into account (Table 3). Important questions have been addressed and important information was generated. The key point now is how to use this information to improve the process.

Table 3 Advantages and limits of the molecular methods mostly used in anaerobic digestion

From the 16S rRNA gene approach we now know that the microbial communities involved in the anaerobic digestion present a general microbial composition at the phylum level that is shared in most cases. These following phyla: Chloroflexi, Firmicutes, Proteobacteria, Bacteroidetes, Synergistetes, Verrucomicrobia predominate within the Bacteria Domain (Nelson et al. 2011; Werner et al. 2011; Sundberg et al. 2013). Species of known anaerobic microorganisms with important functions for the anaerobic digestion (hydrolysis, fermentation, acetogenesis) are included in these phyla. However, three of these phyla (Chloroflexi, Verrucomicrobia and Synergistetes) are represented by very few isolates and large amount of environmental sequences, then, there is a big hole in the knowledge of these organisms and their role in anaerobic digestion.

In particular, the organisms from the Chloroflexi phylum presented high abundance in the biomass from methanogenic reactors, and form part of the “core group” of organisms always found in anaerobic digesters (Rivière et al. 2009; Werner et al. 2011). It was postulated that they play an important role in the hydrolytic and fermentative step and in the formation of granules or flocs (Sekiguchi et al. 2001a; Yamada et al. 2005). They were also related to bulking problems in some UASB reactors (Yamada et al. 2007; Borzacconi et al. 2008).These organisms are very difficult to culture and isolate; therefore, new isolation procedures and new molecular techniques should be applied to know the role of these organisms so abundant in AD systems.

On the other hand, it is known that the syntrophic organisms are a very specialized group of bacteria with low abundance in reactors and that they present high resilience (Werner et al. 2011). Important efforts have been made to understand the ecology of these key microorganisms (Stams et al. 2012) and there is still much work to be done in the physiology of these organisms with such a particular way of obtaining energy.

It is also known that Archaea from the Euryarcheota phylum are responsible for the methanogenic step. Within methanogens two metabolic pathways have been found: the hydrogenotrophic and acetoclastic. Although it was postulated for several years that the acetoclastic pathway predominate in the AD bioreactors (Garcia et al. 2000; Batstone et al. 2006), the molecular data now are questioning this paradigm as the hydrogenotrophic methanogens have been found to be more abundant in several investigations (Kampmann et al. 2012). This is not trivial because the different groups of methanogens present different kinetics and different tolerances to toxics compounds. Hence, there is still a need of knowledge within this important group.

Although the general composition of the biomass from AD process is, in general, maintained at the phylum level, the particular species that predominate in each system change. These changes during operation and between different systems demonstrate the high redundancy within the different guilds. It was postulated that this high redundancy is one of the keys to maintain the general function of the process in a changeable environment (Fernández et al. 1999). It was also demonstrated by several authors that the AD community is very dynamic (Fernández et al. 1999; Zumstein et al. 2000; Pycke et al. 2011). This high dynamic was proven at the species level (represented by OTUs, fingerprinting peaks or bands), but it is not known whether this dynamic indicates change in the metabolic pathways or not. Future works using metagenomic and metaproteomics approaches are necessary to answer this important question.

Several research groups have been focused in trying to explain the environmental factors that drive the AD process; from these works, it is possible to conclude that temperature of operation (Levén et al. 2007), type of substrate (Zhang et al. 2014a, b) and inoculum are important factors that select different populations (Ali Shah et al. 2014). This is an important issue which should be taken into account for practical applications as they could aid in decrease start-up periods of full scale reactors and predict failures due to changes during operation. There is a need of compilation of all the information to have an overview of the different variables that affect the microbial communities in AD processes and how the communities cope with the changes during the operation. In that sense, a web site where all the information on the physiology of all the organisms present in the AD process, as was performed for the activated sludge process (http://midasfieldguide.org), would be a good initiative to follow.

Finally, all the information generated should be applied to try to explain the function of reactors and help to solve engineering problems. Figure 7 gives an overview of the possible application of molecular biology tools to improve AD processes. The development of new molecular techniques focusing on the function, the application of deeper statistical analysis and the improvement of bioinformatic tools to manage the high amount of data generated will help to walk in that direction. An effort to integrate the four disciplines: microbial ecology, environmental engineering, mathematical modeling and bioinformatics will be necessary to achieve this goal.

Fig. 7
figure 7

Possible applications of molecular biology techniques results to improve AD processes. 1 Prevent failures—by the regular monitoring of the microbial community of a bioreactor, information about the dynamic and the organization of the community during bad and good performance could be retrieved and used to predict the failures. The knowledge of microorganisms that cause problems (bulking, acidification, etc.) could be also used to prevents their growth. 2 Select inoculum—with the NGS available, it is possible to have a complete view of the microbial composition of a sample in one or two days with low cost this information could help to choose an adequate source of inoculum for full scale reactors avoiding cost of long periods of start up or reinoculation. 3 Data for modeling—molecular data could be used to design systems of automatic control of reactors and for modeling. The AD model is based in data from kinetic and bulk measurements of the biomass. An effort of conceptualization of the data and integration of the two disciplines should be done. 4 Bioaugmentation—with the knowledge of key microorganisms important for the process it will be possible to design strategies for bioaugmentation and follow the growth of these organisms in the reactor. This could be very useful for the degradation of recalcitrant compounds (like detergents, pharmaceutical products) or to improve the degradation of difficult substrates (as lipids). 5 Design new process—with the discover of new metabolic functions of the microorganisms involved in the AD process it could be think different engineering process to solve some practical issues