A New Framework to Accurately Quantify Soil Bacterial Community Diversity from DGGE

Lalande, Jonathan; Villemur, Richard; Deschênes, Louise

doi:10.1007/s00248-013-0230-3

Methods
Published: 03 May 2013

Volume 66, pages 647–658, (2013)

Microbial Ecology

Jonathan Lalande¹,
Richard Villemur² &
Louise Deschênes¹

Abstract

Denaturing gradient gel electrophoresis (DGGE) has been and remains extensively used to assess and monitor the effects of various treatments on soil bacterial communities. Considering only abundant phylotypes, the diversity estimates produced by this technique have been proven to be uncorrelated to true community diversity. The aim of this paper was to develop a framework to estimate a community’s true diversity from DGGE. Developed using in silico DGGE profiles generated from published pyrosequencing datasets, this framework elongates the rank-abundance distributions (RADs) drawn by band quantification using the peak-to-signal ratio (PSR) parameter, which was proven to be related to bacterial richness. The ability to compare DGGE-based diversity estimates to the true diversity of communities led to a unique opportunity to identify potential pitfalls when analyzing DGGE gels with commercial analysis software programs and gain insight into the process of DNA band clustering in the profiles. Bacterial diversity was compared through richness, Shannon, and Simpson’s 1/D indices. Intermediate results demonstrated that, even though commercial gel analysis software programs were unable to produce consistent results throughout all samples, a newly developed Matlab-based framework unraveled the dominance profiles of communities from band quantification. Elongating these partial RADs using the PSRs extracted from the DGGE profiles chiefly made it possible to accurately estimate the true diversity of communities. For all the samples analyzed, the estimated Shannon and Simpson’s 1/D were accurate at ±10 %. Richness estimations were less accurate, ranging from −11 to 31 % of the expected values. The framework showed great potential to study the structure and diversity of soil bacterial communities.

Introduction

Considering the importance of diversity with regards to ecosystem functioning [1] and its resistance and resilience to perturbations [2], it is of great importance to be able to routinely assess and compare soil microbial community diversity across a large scale of management practices and treatments.

Indeed, soil microbial communities are among the most diverse and abundant on Earth [3] and play a key role in terrestrial ecosystems [4]. Even if these communities can be studied in great depth using modern metagenomics [5], researchers still require cost-effective methods to reliably estimate the diversity of multiple samples. Fingerprinting techniques such as denaturing gradient gel electrophoresis (DGGE) have been successfully used in many diversity studies [6].

For such complex communities, interpreting the outcome of DGGE-based diversity surveys—even though they are of widespread use—is not an easy task. Initially developed as a mutation detection tool, DGGE can theoretically separate DNA sequences that differ by a single base pair [7, 8]. Consequently, in estimating community diversity from DGGE profiles, visible bands are implicitly associated with unique phylotypes. Since authors including Schmalenberger and Tebbe [9] have clearly demonstrated that DGGE bands may contain many different operational taxonomic units (OTUs), this general consideration is known to be false. Generally speaking, the similarity level between the genetic markers of different organisms constitutes the baseline of OTU definition in modern microbial taxonomy. It has been stated that the minimal value of 97 % similarity between 16S rRNA gene sequences proposed by Stackebrandt and Goebel [10] may lack the resolving power to relate bacteria at the species level [11], especially for short partial sequences. If the similarity level that should be used is debatable, the concept still provides a useful and scientifically grounded basis to estimate and compare bacterial community diversity. Even though DNA band superposition on DGGE profiles has been shown to occur, it is not known whether this process can be related to DNA sequence clustering at any similarity level or if both processes will consistently yield a comparable representation of soil bacterial community dominance profiles.

Another poorly understood and seemingly overlooked aspect of DGGE is the profile analysis step itself. In the scientific literature, fingerprinting patterns were analyzed using many different software programs. Although convenient, it must be recognized that most were not specifically developed to analyze fingerprints as complex as those produced by soil bacterial communities (personal communications). It seems that the differences between software programs and more generally their capacity to quantitatively unravel the communities’ true dominance profile from banding patterns have never been assessed.

Furthermore, numerical simulations demonstrated that, by enabling the consideration of only the most abundant phylotypes, fingerprinting techniques provide an inaccurate estimation of the true diversity of microbial communities [12, 13]. The diversity indices routinely used to quantify diversity are influenced by both the richness (length) and the dominance pattern conveyed by the dataset used to plot the rank-abundance distribution (RAD) of the studied community [12, 14]. Traditional DGGE banding pattern analysis theoretically yields information on community dominance profiles but not on richness. However, based on simulated bacterial communities, Loisel et al. [15] showed that the subunit background percentage (SBP), an indicator extractable from DGGE profiles, was related to community richness. The SBP can be seen as a measure of the proportion of the studied community that is not accounted for when only visible peaks are considered. Unfortunately, the relationship between the SBP and richness is not straightforward and was shown to depend on the abundance model used to generate the simulated bacterial communities. Used in conjunction with theoretical abundance models, the SBP could make it possible to infer whole community RADs from DGGE banding patterns. The most commonly used models to describe soil bacterial communities are the lognormal [13, 15–18], the power law [19, 20], and the geometric [13, 15] distributions. These abundance models could be used to elongate the partial DGGE-based RADs to “add” the OTUs that are not abundant enough to produce a visible band on the gel. The SBP would provide a stopping criterion for the elongation process, indicating the length at which the distribution is complete. If such a methodology could be developed, it would allow for the accurate characterization of soil bacterial community diversity from DGGE fingerprints.

The main objective of this paper was to develop such a framework. In the process of developing this framework, the influence of certain analytical parameters (background noise subtraction and DGGE peak quantification) and the extent of the band clustering process were assessed. To do so, publicly available pyrosequencing datasets of soil bacterial communities [21] were used to generate in silico DGGE profiles. Knowing community composition made it possible to assess whether DGGE-based diversity estimations can theoretically lead to conclusions similar to those of more robust methods based on DNA sequencing and clustering.

Methods

In Silico DGGE Profiles Construction

In silico DGGE profiles were constructed using pyrosequencing datasets of soil bacterial communities downloaded from the NCBI Sequence Read Archive [22]. This methodology was selected to avoid the necessity of choosing a suitable theoretical model to derive the RADs used to create the profiles. Datasets were generated using DNA extracted from six different soils and targeting the V2–V3 region of the bacterial 16S rRNA gene (approximately 400 nt) [21]. The sequences were processed by the authors to remove the primers and the regions presenting low quality scores and ranged in length between 200 and 300 nt.

Presenting different initial richness and dominance patterns, three of these datasets were selected for further analysis: FUG3 (intensely fertilized grassland), BF2 (unmanaged beech forest), and SAF1 (spruce forest). Each dataset was aligned and clustered using three similarity levels (100, 97, and 95 %) with the RDP pyrosequencing pipeline [23]. Datasets were further simplified by associating the relative abundance of every OTU with a unique representative sequence, ensuring that every cluster generates a unique DGGE band in the in silico profiles. A total of nine communities were produced, ranging in richness between 1,895 and 17,552 OTUs and presenting a most dominant OTU relative abundance between 1.5 and 5.4 %.

OTUs were positioned in the profiles using DNA sequence theoretical melting temperatures (T_m) calculated with Khandelwal and Bhyravabhotla’s predictive model [24]. This model was selected because it gave good results over a wide range of sequence lengths (15-mers to genomic). Gel conditions (denaturing gradient) were adjusted considering that urea and formamide reduce DNA sequence melting temperatures by 2.25 °C/M and 0.6 °C/% [25], respectively, ensuring that all the sequences were included in the profiles. DGGE peaks were represented in the in silico profiles by Gaussian probability density functions (PDFs) (Eq. 1).
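The conversion from a melting temperature to a band position can be illustrated with a short sketch. It assumes the common DGGE convention that a 100 % denaturant solution contains 7 M urea and 40 % formamide, a linear gradient along the lane, and a hypothetical running temperature of 60 °C. None of these values are stated in the text, and the function names are illustrative only.

```python
UREA_C_PER_M = 2.25          # melting-temperature drop per mol/L of urea [25]
FORMAMIDE_C_PER_PCT = 0.6    # melting-temperature drop per % formamide [25]

# Assumed convention: 100 % denaturant = 7 M urea + 40 % (v/v) formamide.
DROP_AT_FULL_DENATURANT = UREA_C_PER_M * 7.0 + FORMAMIDE_C_PER_PCT * 40.0


def melting_denaturant_fraction(tm_celsius, run_temp_celsius=60.0):
    """Denaturant fraction (0-1) at which a sequence melts at the running temperature."""
    return (tm_celsius - run_temp_celsius) / DROP_AT_FULL_DENATURANT


def band_position(tm_celsius, grad_top, grad_bottom, gel_length_px=1024,
                  run_temp_celsius=60.0):
    """Pixel position of a band in a linear gradient from grad_top to grad_bottom."""
    d = melting_denaturant_fraction(tm_celsius, run_temp_celsius)
    return (d - grad_top) / (grad_bottom - grad_top) * (gel_length_px - 1)
```

Under these assumptions, widening the gradient (lower `grad_top`, higher `grad_bottom`) is what keeps every sequence inside the 1,024-pixel lane.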

$$ I(x)=\frac{A}{\sqrt{2\pi {\sigma^2}}}\,{e^{-\frac{{\left( x-{x_0} \right)}^2}{2{\sigma^2}}}} $$

(1)

The peak corresponding to each OTU was therefore completely represented by three parameters: central position on the gel (x₀ in pixels, determined from the T_m), amplitude (A in grayscale intensity proportional to the OTU relative abundance in its dataset RAD), and peak width (σ in pixels, standard deviation). Based on observations of experimental DGGE gels, the standard deviation was set at a mean value of 2.0 pixels and forced to vary randomly for every OTU between ±10 % of the mean value. Peak intensity I(x) was evaluated for all of the pixels (x values) over the whole gel vertical length (set at 1,024 pixels). The in silico profiles were obtained by summing up the intensity corresponding to every OTU contained within a given dataset for every pixel.
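The construction of a one-dimensional profile can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the standard Gaussian normalisation for Eq. 1, a uniform random draw for the ±10 % variation in σ, and hypothetical function names.

```python
import math
import random

GEL_LENGTH_PX = 1024  # vertical length of the in silico gel


def gaussian_peak(x, amplitude, x0, sigma):
    """Gaussian PDF scaled by `amplitude` (standard normalisation assumed)."""
    return amplitude / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(
        -((x - x0) ** 2) / (2.0 * sigma ** 2))


def in_silico_profile(otus, mean_sigma=2.0, jitter=0.10, seed=0):
    """Sum one Gaussian per OTU over every pixel of the lane.

    `otus` is a list of (x0_pixels, relative_abundance) pairs.
    """
    rng = random.Random(seed)
    profile = [0.0] * GEL_LENGTH_PX
    for x0, abundance in otus:
        # Peak width varies randomly within +/-10 % of the mean value.
        sigma = mean_sigma * (1.0 + rng.uniform(-jitter, jitter))
        for x in range(GEL_LENGTH_PX):
            profile[x] += gaussian_peak(x, abundance, x0, sigma)
    return profile
```

Because each peak integrates to its amplitude, the area under the summed profile equals the total abundance of the OTUs it contains, which is the property the PSR calculation relies on later.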

The image representing the in silico DGGE gel was created in 16-bit uncompressed TIFF format (maximum grayscale value of 65,535). The intensities of all the vertical profiles were first normalized so that the maximum grayscale value for every sample equals 50,000. This step was seen as analogous to adjusting the exposure time when photographing DGGE gels. Two-dimensional profiles were considered to be 175 pixels wide, all filled with the previously generated one-dimensional profiles. In order to reproduce some of the difficulties associated with the analysis of real DGGE gels, an additional background noise, calculated from the mean intensity level of the profiles and randomly adjusted, was added.

Profiles Analysis

In silico DGGE profiles were analyzed using four different software programs: TotalLab Quant (TotalLab Ltd., Newcastle upon Tyne, UK), GelCompar II (Applied Maths, Inc., Austin, TX, USA), BIO-1D advanced (Vilber Lourmat, Marne-la-Vallée, France), and a Matlab-based program (The MathWorks Inc., Natick, MA, USA) specifically developed for this paper. The main differences between software programs were mostly associated with their background subtraction or peak delimitation algorithms. These parameters were therefore further studied.

Background Subtraction

The background subtraction algorithms evaluated in this paper were limited to the popular rolling ball approaches included in TotalLab, GelCompar II, and BIO-1D, and the approach developed for the Matlab-based program. TotalLab Quant and GelCompar II include practically the same algorithm: a virtual ball whose diameter is chosen by the user rolls under the profiles and subtracts the signal located under the top of the ball. BIO-1D’s algorithm is slightly different since it first subtracts the signal from the center of the ball and then asks the user to define a threshold level to adjust profile baseline intensities to zero.
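The general rolling-ball idea can be sketched in one dimension as a grey-scale opening with a semicircular structuring element. The commercial implementations are not disclosed in the text, so this is a generic textbook formulation, not the algorithm of any of the programs named above. It also assumes that intensity and pixel units share the same scale, which a real implementation would have to choose.

```python
import math


def rolling_ball_background(profile, radius):
    """1-D rolling-ball baseline: grey-scale opening with a semicircular element."""
    n = len(profile)
    offsets = range(-radius, radius + 1)
    height = {j: math.sqrt(radius * radius - j * j) for j in offsets}

    # Erosion: highest position the ball centre can take under the profile.
    centre = [min(profile[i + j] - height[j] for j in offsets if 0 <= i + j < n)
              for i in range(n)]
    # Dilation: envelope traced by the top of the ball as it rolls.
    return [max(centre[i + j] + height[j] for j in offsets if 0 <= i + j < n)
            for i in range(n)]


def subtract_background(profile, radius):
    """Remove the rolling-ball baseline and clip negative values to zero."""
    background = rolling_ball_background(profile, radius)
    return [max(0.0, p - b) for p, b in zip(profile, background)]
```

A small radius lets the ball climb into broad clusters of overlapping bands and removes part of the signal, whereas a large radius leaves slowly varying background in place, which is one way to see why the ball size matters so much for complex profiles.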

GelCompar II and BIO-1D both propose an “optimal” ball size for a given gel (41 and 72 pixels for the synthesized image, respectively). Every profile was therefore analyzed with each software program using ball radii of 20, 41, 72, and 144 (94 for BIO-1D since it was the maximum size allowed by the software). TotalLab Quant and BIO-1D do not require the redefinition of the peaks when modifying the ball size, and additional radii of 5 and 10 were added for these software programs.

Background subtraction with the Matlab-based program worked differently. After trying to develop an automated calculation procedure to derive background noise profiles from DGGE gels, it was observed that a manual adjustment was the best and, perhaps, the only way to correctly draw a line between peaks area and background. This manual adjustment was based on the careful observation of the image of the gel. The background profiles were derived by qualitatively ranking neighboring peaks from very weak to very bright. It was observed that a background profile very close to the peaks’ root gives more weight to the brightest peaks and vice versa. Adjusting the background level closer to or farther from the peaks’ root made it possible to draw the most representative picture of what is visually conveyed by the image. It must be mentioned that this process is iterative. During the quantification process, if a peak is disproportionate as compared to its neighbors or if its optimized standard deviation is significantly different from all the other peaks, it may be necessary to adjust the background profile accordingly.

Peaks Delimitation and Quantification

In the four considered software programs, peaks were quantified using two different general approaches. The approach shared by TotalLab Quant and BIO-1D consists of delimiting peaks with two straight lines. Peaks are then quantified by summing up the intensity of the background-subtracted profile between these lines. In contrast, GelCompar II and the Matlab-based framework adjust Gaussian PDFs under the peaks (Eq. 1). This adjustment is done manually in GelCompar II, while the Matlab-based framework automatically and simultaneously optimizes many peaks. For the Matlab-based framework, profile analyses are conducted in many optimization rounds. In an interactive dialog box, the analyst enters information on the central position of the peaks to be quantified (fewer than 10 peaks for every round). Central positions are determined directly from the image of the DGGE gel viewed with any image editing software program. After the convergence of the algorithm, optimized peaks are plotted against the analyzed profile. The analyst can accept or reject the results. If rejected, the optimization routine can be run again with different initial central positions or with fewer or more peaks. The routine is run repeatedly until the peaks are accepted. The resulting PDF parameters are then saved and the optimization moves on to other peaks, until the entire profile is analyzed. Peak abundances are finally determined from the PDF amplitudes.
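The two quantification approaches differ mainly in what is summed. The sketch below illustrates the straight-line delimitation approach and the conversion of fitted Gaussian amplitudes into relative abundances. The Gaussian optimization itself is not shown, and both function names are hypothetical.

```python
def quantify_by_limits(profile, limits):
    """Relative abundance of each peak from the intensity summed between two limits.

    `profile` is a background-subtracted lane; `limits` holds inclusive
    (start, end) pixel pairs, one pair per peak.
    """
    areas = [sum(profile[start:end + 1]) for start, end in limits]
    total = sum(areas)
    return [area / total for area in areas]


def quantify_by_amplitudes(amplitudes):
    """Relative abundance of each peak from the amplitudes of fitted Gaussian PDFs."""
    total = sum(amplitudes)
    return [a / total for a in amplitudes]
```

With straight-line limits, intensity from a neighbouring band that spills between the lines is credited to the wrong peak, whereas fitted PDFs can in principle separate two almost co-migrating bands.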

Representativeness of DGGE-Based Dominance Profiles

While a similarity level of 97 % is chosen in almost all sequencing-based bacterial community diversity surveys, it is doubtful that the value has any meaning when analyzing DGGE banding patterns. If all the DGGE bands were generated by a single OTU, RADs drawn from peak quantification would be very similar to RADs produced by sequencing datasets clustered at the 100 % similarity level. Considering that DGGE bands are known to superpose to a certain extent, datasets used to generate in silico profiles were further clustered using the RDP pipeline [23], with similarity levels ranging from 96 to 100 %. This step aims to determine whether the DNA band superposition process is numerically similar to the clustering of DNA sequences at a specific similarity level. For simplicity, in the particular context of this publication, RADs and diversity indices calculated from the pyrosequencing datasets used to generate the in silico DGGE profiles will be referred to as the true RADs and true diversity for a certain similarity level.

The peak-to-signal ratio (PSR = 1 − SBP), a parameter analogous to the SBP introduced by Loisel et al. [15], was extracted from all the in silico DGGE profiles. For each sample, the PSR was calculated as the area under all the peaks divided by the area under the whole profile. The background noise added under the DGGE profiles when synthesizing the image was subtracted before calculating the PSRs. This parameter represents the percentage of all the DNA sequences loaded into a DGGE profile contained within the most abundant OTUs (the peaks). The remainder belongs to the OTUs not abundant enough to produce a visible band on the gel. These OTUs are incidentally unaccounted for in the diversity estimates produced through DGGE and will be dealt with by the elongation framework (introduced below).
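As a minimal sketch of this definition, the PSR can be computed from the quantified peak areas and the noise-corrected profile. The function name and argument layout are assumptions made for illustration.

```python
def peak_to_signal_ratio(peak_areas, profile, noise=None):
    """PSR = area under all peaks / area under the whole noise-corrected profile."""
    if noise is None:
        noise = [0.0] * len(profile)
    # The synthetic background noise is removed before taking the total area.
    signal_area = sum(p - n for p, n in zip(profile, noise))
    return sum(peak_areas) / signal_area


def subunit_background_percentage(psr):
    """SBP expressed as a fraction, from the relation PSR = 1 - SBP."""
    return 1.0 - psr
```

For example, peaks totalling 50 intensity units in a profile whose noise-corrected area is 100 units give a PSR of 0.5, meaning that half of the loaded sequences belong to OTUs too rare to form a visible band.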

Finally, considering that small peaks may not be very indicative of true community dominance profiles, DGGE-based RADs were truncated by subtracting all the peaks with relative abundances smaller than a certain cut-off. Since it was impossible to objectively choose an appropriate cut-off, percentages between 0 and 3.0 % (in 0.2 % increments) were successively used in order to identify the optimal value for subsequent analyses. This truncation was judged necessary because it was observed that DGGE-based and true RADs were deviating at a certain relative abundance value. The PSR values were modified to take into account the peaks that were removed from the DGGE-based RADs.
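The truncation step can be sketched as below. The text does not give the formula used to modify the PSR, so scaling it by the fraction of peak abundance that is retained is an assumption of this sketch, as are the function and constant names.

```python
def truncate_dgge_rad(relative_abundances, psr, cutoff):
    """Drop peaks below `cutoff` and rescale the PSR by the retained fraction."""
    ranked = sorted(relative_abundances, reverse=True)
    kept = [a for a in ranked if a >= cutoff]
    # Assumed adjustment: removed peaks no longer count as peak signal.
    retained_fraction = sum(kept) / sum(ranked)
    return kept, psr * retained_fraction


# Cut-off values from 0 to 3.0 % in 0.2 % increments, as relative abundances.
CUTOFFS = [round(0.002 * i, 3) for i in range(16)]
```

Running the function once per entry of `CUTOFFS` reproduces the sweep described above, so that the stability of each indicator can be examined across cut-off values.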

To compare DGGE-based and true community dominance profiles, true RADs had to be modified. For all the samples, the true distributions were truncated by keeping the same number of OTUs as the number of peaks above the cut-off percentage. True PSRs were then calculated as the number of sequences in the truncated RADs divided by the number of sequences in the complete distributions. Finally, since DGGE band quantification yields results in relative abundance, true RADs were transformed accordingly. This truncation led to the calculation of biased diversity indices and was solely used for comparative purposes. The representativeness of DGGE-based dominance profiles was evaluated using four indicators. The first indicator was used to verify whether PSRs can be accurately extracted from DGGE profiles:

1. ΔPSR: Deviation percentage of DGGE-based PSRs compared to true PSRs calculated from sequencing dataset clustering;
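The truncation of the true RADs and the resulting ΔPSR can be sketched as follows. The function names are hypothetical, and the deviation is assumed to be a signed percentage relative to the true value.

```python
def truncate_true_rad(otu_counts, n_peaks):
    """Keep the n_peaks most abundant OTUs; return relative abundances and true PSR."""
    ranked = sorted(otu_counts, reverse=True)
    head = ranked[:n_peaks]
    true_psr = sum(head) / sum(ranked)          # retained sequences / all sequences
    relative = [count / sum(head) for count in head]
    return relative, true_psr


def deviation_percent(estimate, expected):
    """Signed deviation of an estimate from its expected value, in percent."""
    return 100.0 * (estimate - expected) / expected
```

A dataset with counts of 50, 30, 10, 5, and 5 sequences truncated to two OTUs has a true PSR of 0.8, so a DGGE-based PSR of 0.88 corresponds to a ΔPSR of +10 %.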

Since the main objective of this paper was to determine whether DGGE could be confidently used to assess the diversity of soil bacterial communities, two ubiquitous diversity indices were calculated to characterize the RADs representing the communities’ dominance profiles. These indices were calculated using PAST software [26].

2. ΔH′: Deviation percentage of DGGE-based Shannon indices from corresponding expected values;
3. Δ1/D: Deviation percentage of DGGE-based Simpson’s 1/D indices from corresponding expected values.
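The two indices and their deviation percentages can be written compactly. The sketch uses the usual definitions, H′ = −Σ pᵢ ln pᵢ and 1/D with D = Σ pᵢ². The indices in the paper were computed with PAST, whose estimators may differ in detail, so the functions below are illustrative only.

```python
import math


def shannon(abundances):
    """Shannon index H' = -sum(p * ln p) over non-zero abundances."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def simpson_reciprocal(abundances):
    """Simpson's 1/D, with D = sum(p ** 2)."""
    total = sum(abundances)
    return 1.0 / sum((a / total) ** 2 for a in abundances)


def deviation_percent(estimate, expected):
    """Signed deviation of an estimate from its expected value, in percent."""
    return 100.0 * (estimate - expected) / expected
```

For a perfectly even community of four species, H′ equals ln 4 and 1/D equals 4. Both values fall as dominance increases, with 1/D reacting more strongly to the most abundant species.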

In order to further characterize the similarity between DGGE and clustering-based dominance profiles, the Euclidean distance [27] was calculated. Unlike the two previous diversity indices, this measure associated every DGGE peak to its principal underlying OTU. To do so, theoretical OTU migratory positions were associated with actual peak locations in the in silico profiles. Since the measure was calculated using the truncated RADs produced by both approaches, certain peaks did not match any OTU theoretical position and vice versa. In these cases, the relative abundance of the corresponding peak/OTU was set at 0 %.

4. D_EUCLIDEAN: Calculated using Eq. 2, where A_DGGE corresponds to peak relative intensities, A_OTU represents OTU relative abundances, and n corresponds to RAD lengths.

$$ {D_{\mathrm{EUCLIDEAN}}}=\sqrt{{\sum\nolimits_{i=1}^n {{{{\left( {{A_{{i\mathrm{DGGE}}}}-{A_{{i\mathrm{OTU}}}}} \right)}}^2}} }} $$

(2)
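Eq. 2 and the handling of unmatched peaks can be sketched as follows. Representing each truncated RAD as a dictionary keyed by migratory position is an assumption made for illustration, as is the function name.

```python
import math


def euclidean_distance(dgge_peaks, true_otus):
    """Eq. 2 over the union of positions; unmatched peaks or OTUs count as 0."""
    positions = set(dgge_peaks) | set(true_otus)
    return math.sqrt(sum(
        (dgge_peaks.get(pos, 0.0) - true_otus.get(pos, 0.0)) ** 2
        for pos in positions))
```

A peak with no corresponding OTU (or the reverse) contributes its full relative abundance to the distance, which is why a few important unmatched peaks can dominate the result.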

Using PSRs to Improve DGGE-Based Diversity Estimates

Using PSR values extracted from the profiles, an empirical framework to estimate true community diversity from DGGE was developed. Since DGGE and pyrosequencing-based dominance profiles were very similar when clustering the datasets at the 98 % level, the framework was developed to estimate true community diversity at that particular similarity level.

First of all, RADs produced by the Matlab-based framework were truncated using the optimal cut-off value (1.0 %). After truncation, RADs were normalized (sum = 1), multiplied by corresponding PSRs, and further multiplied by 35,000. This last step aimed to make it possible to work in terms of absolute rather than relative abundance. The value 35,000 was chosen because it was close to the number of reads per sample in the pyrosequencing datasets used in this publication.
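The conversion from relative to absolute abundance amounts to three multiplications, sketched below with a hypothetical function name.

```python
def to_absolute_abundance(truncated_rad, psr, reads=35000):
    """Normalize a truncated RAD to sum to 1, then scale by PSR and read count."""
    total = sum(truncated_rad)
    return [a / total * psr * reads for a in truncated_rad]
```

After this step, the retained peaks account for PSR × 35,000 sequences, and the remaining (1 − PSR) × 35,000 sequences are left to be distributed among the species added by the elongation.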

Distributions were then elongated to “add” the species that were not accounted for in the peak quantification process. This elongation framework was designed and calibrated using only the true community RADs. The method was then applied without further modification over the DGGE results. Although the lognormal, power law, and geometric distributions are the most commonly used abundance models to describe soil bacterial communities, they were unable to fit the true RADs correctly. The power law distribution (PLD, Eq. 3) provided an acceptable fit if a distinct model parameterization is used to predict mid- and low-abundance values. Starting right after the last DGGE peak above the optimal cut-off value, the elongation framework was therefore divided into two distinct steps, both based on Eq. 3.

$$ PLD(x)={x_{\min }}*{x^{-\alpha }} $$

(3)

The first step elongated the truncated DGGE-based RADs until a richness of 699 was reached using a PLD exponent (α) of 0.875 for all the samples. The abscissa (x value in Eq. 3) producing an abundance just below that of the last retained peak was selected as the starting x value for the elongation. This initial abscissa was distinct for all samples and once determined was increased by one each time a species was added to the RAD. The other parameter of Eq. 3, x_min, varied between 500 and 3,000 and was optimized for each sample to ensure continuity in the predicted abundance values at the junction of the two elongation steps. The second elongation step produced abundance values for species of rank 700 and more. In this second step, species rank corresponded to abscissa values. The PLD parameters α and x_min are functions of the PSR values and were determined using relationships derived from the true community RADs (Eqs. 4 and 5). For both elongation steps, predicted abundance values were rounded to the nearest integer. Considering that the elongation framework was designed to work in absolute abundance, when a value of 1 was reached, singletons were added until the sum of the abundance of all the species equaled 35,000. Figure 1 presents a schematic representation of the elongation framework.

$$ \alpha =0.267*\operatorname{PSR}-0.935 $$

(4)

$$ {x_{\min }}=-5{,}034*\operatorname{PSR}+3{,}656 $$

(5)
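The two-step elongation can be sketched as follows. This is one possible reading of the description, not the authors' Matlab code, and it rests on several assumptions that are not in the text: the PSR is a fraction between 0 and 1; because Eq. 4 is then negative, its magnitude is used as the decay exponent in Eq. 3; x_min for the first step is chosen by a coarse grid search; and the last added value is capped so that the total never exceeds 35,000.

```python
def pld(x, x_min, alpha):
    """Eq. 3: abundance predicted by the power law at abscissa x."""
    return x_min * x ** (-alpha)


def second_step_parameters(psr):
    """Eqs. 4 and 5. Eq. 4 is negative for a PSR between 0 and 1, so its
    magnitude is used as the decay exponent (assumption of this sketch)."""
    return abs(0.267 * psr - 0.935), -5034.0 * psr + 3656.0


def elongate(head, psr, reads=35000, junction_rank=700, alpha1=0.875):
    """Elongate a truncated RAD given in absolute abundance, sorted descending."""
    rad = [round(a) for a in head]
    total = sum(rad)
    alpha2, x_min2 = second_step_parameters(psr)
    n_step1 = max(0, junction_rank - 1 - len(rad))

    def start_abscissa(x_min1):
        # First abscissa whose predicted abundance falls below the last peak.
        x = 1
        while pld(x, x_min1, alpha1) >= rad[-1]:
            x += 1
        return x

    def junction_gap(x_min1):
        # Mismatch between the end of step 1 and the start of step 2.
        last_x = start_abscissa(x_min1) + n_step1 - 1
        return abs(pld(last_x, x_min1, alpha1)
                   - pld(junction_rank, x_min2, alpha2))

    # Step 1: x_min in [500, 3000] chosen for continuity (coarse grid, assumed).
    x_min1 = min(range(500, 3001, 10), key=junction_gap)
    x = start_abscissa(x_min1)
    for _ in range(n_step1):
        if total >= reads:
            break
        value = min(reads - total, max(1, round(pld(x, x_min1, alpha1))))
        rad.append(value)
        total += value
        x += 1

    # Step 2: abscissa equals species rank; once the prediction rounds to 1
    # or less, singletons are added until the read total is reached.
    while total < reads:
        rank = len(rad) + 1
        value = min(reads - total, max(1, round(pld(rank, x_min2, alpha2))))
        rad.append(value)
        total += value
    return rad
```

The length of the returned list is the richness estimate, and the list itself can be passed to any diversity index. Because the sketch relies on the assumptions listed above, its output should not be expected to reproduce the published values.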

It must be emphasized that the elongation framework produces distributions that do not follow a particular abundance model. Indeed, RAD heads are drawn from peak quantification and are distribution free. Mid- and low-abundance values are all predicted using the PLD, but with different model parameterization. Therefore, resulting RADs do not generally follow a power law.

Using these elongated RADs, bacterial community diversity was characterized through three indicators: community richness (number of species), Shannon, and Simpson’s 1/D indices. Estimated diversities were compared to true community diversity at the 98 % similarity level (untruncated RADs). All indices were calculated using PAST software [26].

Results

Representativeness of DGGE-Based Dominance Profiles

Pyrosequencing datasets of 16S rRNA gene sequences from three different environments were clustered at 95, 97, and 100 % similarity levels, generating nine “theoretical” bacterial communities. Based on the theoretical T_m of a unique representative sequence selected for each OTU, in silico DGGE profiles were derived (Fig. 2).

Each in silico profile was analyzed in 17 ways (software programs and ball sizes), and resulting RADs were truncated using various cut-off values and compared to corresponding pyrosequencing datasets clustered using five different similarity levels (from 96 to 100 %). This methodology generated a significant amount of data, and the complete results are presented as online resources (ESM1.xlsx). To synthesize the results, the parameters that were found to be optimal for all the sample–software pairs are presented in Table 1. Optimal parameters correspond to the ball size and similarity level that made it possible to meet predefined criteria (ΔPSR, ΔH′, and Δ1/D ≤ ±10 %) over the widest range of cut-off values. Indicator stability for different cut-off values was deemed to be a very important aspect to consider since chance alone can yield good results for a specific cut-off percentage. A more complete version of Table 1, which also presents the mean values of the indicators over the entire reported cut-off range, is presented as online resources (ESM2.pdf).

Table 1 Optimal parameters obtained for all the sample–software pairs

Table 2 also presents the parameters that were deemed optimal for each software program. However, since DGGE profile analysis traditionally involves the use of a single ball size for all the samples loaded on a given gel, the parameters presented in Table 2 were selected considering all nine samples simultaneously. The optimal cut-off percentage was also limited to a single value. Figure 3 presents the mean ΔPSR, ΔH′, Δ1/D, and D_EUCLIDEAN values produced by each software program when the parameters presented in Table 2 are used.

Table 2 Parameters that made it possible to minimize indicator values when simultaneously considering all the samples

The differences in the results presented in Tables 1 and 2 for the three commercial software programs are noteworthy. Taking TotalLab Quant as an example, a ball radius of 20 was never found to be optimal when considering samples individually. Still, this ball size was selected as the best compromise when simultaneously considering all samples. This observation is also true for the similarity level that made it possible to minimize the differences between DGGE and pyrosequencing-based RADs.

The Matlab-based framework was found to yield very stable results over all the samples. Indeed, DGGE-based and true RADs were very close when a similarity level of 98 % was chosen—the only exception being BF_97% for which the 97 % level gave better results. This framework allowed an accurate extraction of the sample PSRs from the profiles.

Among the indices tested, H′ presented a very good match between DGGE-based and true RADs. However, this index was found to be influenced by RAD lengths more than community dominance profiles. Considering that, for the sake of comparison, distribution lengths were forced to be equal for both approaches, it is not surprising that H′ exhibited very low variability for all the samples and software programs. As a dominance index, Simpson’s 1/D was much more dynamic and is therefore a better indicator than H′ to compare software programs. On average, the Matlab-based framework performed better than the other software programs with Δ1/D close to ±5 % for all samples, except BF_97%. All four software programs yielded a similar average Euclidean distance value around 10–12 %.

Using PSRs to Improve DGGE-Based Diversity Surveys

Based on simulated communities, Loisel et al. [15] showed that the subunit background percentage, an indicator analogous to PSR, was related to community richness. Figure 4 shows the relationship between PSRs extracted from DGGE profiles using the Matlab-based framework and true community richness at a 98 % similarity level. This similarity level was chosen since it was found to be the one that best corresponded to the extent of DGGE peak clustering. Considering that a relationship between PSR and richness is clearly visible, this parameter can be considered useful for estimating community diversity from partial DGGE-based RADs.

Calculated solely from the community dominance profile, the diversity indices presented so far are known to be uncorrelated with whole community diversity [16]. Table 3 illustrates the comparison between richness, Shannon H′, and Simpson’s 1/D indices calculated from DGGE-based dominance profiles or elongated using the PSRs with the corresponding true indices (untruncated RADs) at a 98 % similarity level.

Table 3 Deviation of DGGE-based diversity estimates from the indices calculated using the untruncated true RADs

Discussion

Representativeness of DGGE-Based Dominance Profiles

In light of the complete results produced for this paper (online resources ESM1.xlsx), it can be concluded that analytical parameters highly influence DGGE-based diversity surveys of complex bacterial communities. The algorithm used to implement the rolling ball background subtraction method, the chosen ball radius, and the way peaks are delimited and quantified all influence the results to a certain extent. The most influential step is undoubtedly background noise subtraction. Indeed, by sharing the same rolling ball algorithm, TotalLab Quant and GelCompar II behaved similarly while BIO-1D was completely different. Furthermore, indicator values were highly sensitive to changes in ball size for all the software programs. It must be stressed that no ball size fit all samples equally well.

As presented in Table 1, the closest matching similarity level was not the same for all the samples when DGGE profile background noise was subtracted using rolling ball approaches. This observation mostly implies that it is impossible to know on what basis samples are compared when DGGE profiles are analyzed using automated background subtraction algorithms. The conclusions that can be drawn from DGGE-based diversity studies of soil bacterial communities—at least when profiles are analyzed using the three commercial software programs tested in this publication—are therefore highly limited. It was not possible to identify any relationship between true community diversity and optimal ball radius.

The results produced by the Matlab-based framework that was developed were completely different. Indeed, mean indicator values associated with this methodology were all very close to zero (except for Euclidean distances, discussed later) and generated rather narrow error bars. Furthermore, results were stable over a wide range of cut-off values and consistent throughout all samples. Most importantly, it is the only methodology that made it possible to extract accurate PSRs from DGGE profiles. The ability of this framework to consistently match the samples’ true dominance profile at the 98 % similarity level should not be seen as a coincidence. Using optimized Gaussian PDFs to delimit and quantify peaks was an interesting feature since it made it possible to resolve the bands generated by two almost co-migrating OTUs. However, the consistency of the framework was mostly associated with the manual adjustment of the background profiles, which made it possible to treat every sample equally, while rolling ball approaches proved to be highly dependent on how peaks superposed in the profiles. It must be acknowledged that this approach was long and challenging at first and thus required some training. Working with in silico DGGE gels proved to be a very good way of producing such training sets. Indeed, real DGGE gels shared a lot of similarities with in silico profiles, even if they are imperfect and therefore more challenging.

The cut-off value that had to be applied to DGGE-based RADs in order to match true RADs was surprisingly high: 1.0 % for the Matlab-based framework. More than half of the quantified peaks are removed when using such a high cut-off value. Band superposition was found to happen quite locally, and background profiles varied irregularly along the profile length. For the datasets used for this publication, the background was high at the center of the profiles but low at their beginnings and ends. Consequently, rather rare OTUs generate distinct peaks in regions where the background is low, while more abundant OTUs migrating in regions of high background do not. These weak bands therefore generated divergences between DGGE-based RADs and true RADs and had to be removed.
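
The cut-off step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peak areas are invented, the cut-off is taken to apply to relative abundance, and the optional renormalization of the remaining peaks is an assumption rather than something stated in the text.

```python
def relative_abundances(peak_areas):
    """Convert peak areas to a rank-abundance distribution (descending)."""
    total = sum(peak_areas)
    return sorted((area / total for area in peak_areas), reverse=True)

def apply_cutoff(rad, cutoff=0.01, renormalize=False):
    """Drop peaks whose relative abundance is below the cut-off (1.0 % here)."""
    kept = [p for p in rad if p >= cutoff]
    if renormalize:
        # Assumption: rescale the remaining peaks so that they sum to 1.
        kept_total = sum(kept)
        kept = [p / kept_total for p in kept]
    return kept

# Hypothetical peak areas for one DGGE lane (arbitrary units).
areas = [50.0, 30.0, 15.0, 4.5, 0.3, 0.2]
rad = relative_abundances(areas)
partial_rad = apply_cutoff(rad)
print(len(rad), len(partial_rad))
```

In this invented lane, the two weakest bands fall below 1.0 % and are discarded, leaving a partial RAD of four peaks.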

An important objective of this paper was to evaluate the extent of DGGE band clustering. From the in silico DGGE profiles produced here, when studying complex soil bacterial communities, co-migration events can be seen as the norm rather than the exception. As unambiguously demonstrated by other authors [9], all DGGE bands can be expected to contain many different OTUs. Working with in silico profiles led to the observation that peaks are indeed formed by the addition of one dominant and many rare phylotypes. Even though a high number of OTUs had the exact same calculated T_m and therefore co-migrated on the in silico profiles (OTU positions were predicted from the calculated T_m of the sequences), cases in which two dominant OTUs shared the same T_m were not observed in any sample. It is important to keep in mind that this situation could happen in other samples. For all the samples, it was still possible to identify certain peaks formed by the exact co-migration of many mid-dominant phylotypes, leading to the appearance of a dominant DGGE band. Euclidean distance values presented in Fig. 3 were mostly driven by the presence of certain important peaks with no corresponding dominant OTUs. For some samples, the presence of a very abundant OTU in the true RADs associated with a DGGE band having a much lower relative abundance also had a significant impact on the resulting Euclidean distances. This happened when two OTUs, both dominant at 100 % similarity, were found to cluster when a lower similarity level was chosen, whereas the DGGE bands corresponding to these OTUs did not cluster on the gel. Considering the lack of resolving power of 16S rRNA gene sequences to identify bacteria at the species level [11], it is impossible to state unambiguously that these OTUs should be clustered. Still, the relatively small Euclidean distance values clearly indicate that most of the DGGE peaks were associated with a dominant OTU having a quantitatively comparable relative abundance.
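
As a rough illustration of how a Euclidean distance between a DGGE-based RAD and a true RAD can be computed, the Python sketch below compares two distributions rank by rank. The rank-wise pairing, the zero-padding of the shorter distribution, and the example abundances are assumptions made for illustration; the pairing of bands and OTUs used to produce Fig. 3 may differ.

```python
import math

def euclidean_distance(rad_a, rad_b):
    """Rank-wise Euclidean distance between two rank-abundance distributions.

    Assumption: both RADs are sorted in descending order and the shorter
    one is padded with zeros so that every rank has a counterpart.
    """
    n = max(len(rad_a), len(rad_b))
    a = sorted(rad_a, reverse=True) + [0.0] * (n - len(rad_a))
    b = sorted(rad_b, reverse=True) + [0.0] * (n - len(rad_b))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical relative abundances of dominant bands and dominant OTUs.
dgge_rad = [0.40, 0.30, 0.20, 0.10]
true_rad = [0.50, 0.30, 0.20]
print(round(euclidean_distance(dgge_rad, true_rad), 4))
```

Under this pairing, a peak with no corresponding dominant OTU contributes its full relative abundance to the distance, which is consistent with the observation that such peaks drive the values.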

It can therefore be concluded that DGGE peaks and DNA sequences cluster in two different ways. Peaks cluster on the basis of their melting properties more than on the basis of base-to-base sequence similarity. Of course, melting properties are linked to sequence composition, but a high base-to-base similarity does not guarantee that two sequences will migrate to a similar position, at least in in silico DGGE profiles. While different in nature, both processes consistently yielded comparable dominance profiles at the 98 % similarity level, a value that could change slightly for experimental DGGE gels.

Using PSRs to Improve DGGE-Based Diversity Surveys

Based on numerical simulations or pyrosequencing studies, a typical soil bacterial community RAD can be confidently described as long [3, 19, 28], rather steep for the most abundant phylotypes and then slowly decreasing toward an asymptotic relative abundance value (doubletons and singletons) [13, 17, 20]. The elongation framework developed for this paper aimed to reproduce these characteristics. As presented in Fig. 4, the ability to accurately extract PSRs from the profiles is a strong prerequisite for estimating true community diversity from DGGE.

The elongation framework presented here was developed using nine samples originating from only three distinct pyrosequencing datasets. The numerous clustering steps involved may have modified the shape of the resulting RADs. By using more datasets, covering many different environments and containing enough reads per sample to reach the plateau of the rarefaction curves, it would be possible to develop a more robust framework adjustable to the many different soil types and environments that researchers may study. Though highly empirical, this rather simple framework proved to be very effective in predicting true community diversity using both the Shannon and Simpson’s 1/D indices. As presented in Table 3, all diversity estimates were accurate to within ±10 % and highly correlated with true indices at 98 % similarity. These values are considerably better than those produced before RAD elongation, even though these partial distributions were found to be highly representative of true community dominance profiles at 98 % similarity.

For some samples, divergence in community richness was found to be higher than ±10 %. These differences per se were not considered to be a strong shortcoming since the elongation framework did not specifically aim to accurately predict community richness. The usefulness of the framework lies in its capacity to consider true community dominance and bring the calculation of the H′ and 1/D indices to higher and more realistic richness values. Results by Narang and Dunbar, among others, showed that diversity indices are less sensitive to richness at these high values [13]. For 1/D, the elongation framework presented here is somewhat similar to the methodology published by Loisel et al. [29]. These authors proposed the use of a correction factor, also based on the background noise level, to estimate accurate 1/D values from fingerprints. As a dominance index, 1/D was found to be highly sensitive to the OTUs with the highest abundances. Its calculation can therefore be seen as fairly robust to the elongation framework but must rely on an accurate peak quantification step.

As stated by Hill et al. [12], H′ gives more weight than 1/D to rare species and is essentially an intermediate between community richness and the Simpson index. This index is therefore less affected by the peak quantification step than 1/D but requires an acceptable estimation of community richness in order to be accurate. In the course of developing the elongation framework, it was observed that the accuracy of H′ depended on the number of rare species more than on the trajectory of the RADs. Therefore, as long as the predicted richness was fairly accurate, using one or another abundance distribution for the RAD elongation step did not change H′ by more than ±10 %, even though the correspondence between the elongated and true RADs was not very good. It will therefore be important to validate that the proposed elongation framework is able to yield acceptable community richness predictions on real (and more challenging) samples before using these values. However, it will always be possible to adapt the model parameterization to different situations (primer pairs, studied environments, etc.) whenever necessary.
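
The contrasting sensitivity of the two indices can be illustrated with a short Python sketch using the standard definitions H′ = −Σ pᵢ ln pᵢ and 1/D = 1/Σ pᵢ². The community below is invented: five dominant OTUs carry 70 % of the abundance, and the remaining 30 % is spread evenly over a tail of rare OTUs whose number is varied. The helper `with_tail`, the flat tail shape, and the use of the natural logarithm are assumptions for illustration and do not reproduce the elongation framework itself.

```python
import math

def shannon(ps):
    """Shannon index H' with natural logarithm."""
    return -sum(p * math.log(p) for p in ps if p > 0)

def inverse_simpson(ps):
    """Simpson's 1/D, the inverse of the sum of squared proportions."""
    return 1.0 / sum(p * p for p in ps)

def with_tail(head, tail_richness):
    """Append a flat tail of rare OTUs carrying the abundance not in `head`.

    Assumption: the tail is perfectly even, which is a simplification.
    """
    tail_mass = 1.0 - sum(head)
    return head + [tail_mass / tail_richness] * tail_richness

# Hypothetical dominant OTUs (relative abundances summing to 0.70).
head = [0.30, 0.20, 0.10, 0.05, 0.05]
small_tail = with_tail(head, 300)  # 305 OTUs in total
large_tail = with_tail(head, 600)  # 605 OTUs in total

for name, rad in (("300 rare OTUs", small_tail), ("600 rare OTUs", large_tail)):
    print(name, round(shannon(rad), 3), round(inverse_simpson(rad), 3))
```

In this toy case, doubling the number of rare OTUs changes 1/D by well under 1 % but raises H′ by several percent, which mirrors the point that 1/D depends mainly on accurate quantification of the dominant peaks while H′ also needs a reasonable richness estimate.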

A question that remains is the usefulness of traditional DGGE-based diversity surveys that rely solely upon the quantified peaks. Table 3 clearly shows that these surveys strongly underestimate the diversity of soil bacterial communities. More importantly, from the diversity indices presented as online resources (ESM3.pdf), it has been observed that these studies are liable to lead to erroneous ecological conclusions, often showing no differences between samples when important ones exist or sometimes predicting the opposite.

In conclusion, the framework presented in this paper proved to be very successful at estimating true community diversity for all nine in silico DGGE profiles analyzed. Though only Shannon and Simpson’s 1/D indices were evaluated, the very good correspondence between all the DGGE-based and true community RADs at 98 % similarity leads to the hypothesis that the framework will accurately estimate any diversity index influenced by community structure more than by richness. Imperfect in nature, experimental DGGE gels are much more challenging to analyze than the images synthesized here. Consequently, when working with experimental results, deviations from true community diversity can be expected to be higher. At the moment, it is not possible to provide a quantitative estimate of the expected deviation. Potentially important biases in sequencing datasets, often linked to sequence GC content, have been reported [30–33]. Until these issues are resolved, whether or not next-generation sequencing platforms offer a more solid ground than DGGE to quantitatively estimate the diversity of soil bacterial communities remains an open question that deserves further attention.

References

1. Hooper DU, Chapin FS, Ewel JJ, Hector A, Inchausti P, Lavorel S, Lawton JH, Lodge DM, Loreau M, Naeem S, Schmid B, Setälä H, Symstad AJ, Vandermeer J, Wardle DA (2005) Effects of biodiversity on ecosystem functioning: a consensus of current knowledge. Ecol Monogr 75(1):3–35. doi:10.1890/04-0922
2. Girvan MS, Campbell CD, Killham K, Prosser JI, Glover LA (2005) Bacterial diversity promotes community stability and functional resilience after perturbation. Environ Microbiol 7(3):301–313
3. Roesch LFW, Fulthorpe RR, Riva A, Casella G, Hadwin AKM, Kent AD, Daroub AH, Camargo FAO, Farmerie WG, Triplett EW (2007) Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J 1:283–290
4. Van der Heijden MGA, Bardgett RD, Van Straalen NM (2008) The unseen majority: soil microbes as drivers of plant diversity and productivity in terrestrial ecosystems. Ecol Lett 11(3):296–310. doi:10.1111/j.1461-0248.2007.01139.x
5. Simon C, Daniel R (2011) Metagenomic analyses: past and future trends. Appl Environ Microbiol 77(4):1153–1161. doi:10.1128/aem.02345-10
6. Nakatsu CH (2007) Soil microbial community analysis using denaturing gradient gel electrophoresis. Soil Sci Soc Am J 71:562–571. doi:10.2136/sssaj2006.0080
7. Myers RM, Maniatis T, Lerman LS (1987) Detection and localization of single base changes by denaturing gradient gel electrophoresis. In: Ray W (ed) Methods in enzymology, vol 155. Academic, San Diego, pp 501–527. doi:10.1016/0076-6879(87)55033-9
8. Sheffield VC, Cox DR, Lerman LS, Myers RM (1989) Attachment of a 40-base-pair G + C-rich sequence (GC-clamp) to genomic DNA fragments by the polymerase chain reaction results in improved detection of single-base changes. Proc Natl Acad Sci 86(1):232–236
9. Schmalenberger A, Tebbe CC (2003) Bacterial diversity in maize rhizospheres: conclusions on the use of genetic profiles based on PCR-amplified partial small subunit rRNA genes in ecological studies. Mol Ecol 12(1):251–262. doi:10.1046/j.1365-294X.2003.01716.x
10. Stackebrandt E, Goebel BM (1994) Taxonomic note: a place for DNA–DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. Int J Syst Bacteriol 44(4):846–849. doi:10.1099/00207713-44-4-846
11. Rosselló-Mora R, Amann R (2001) The species concept for prokaryotes. FEMS Microbiol Rev 25(1):39–67. doi:10.1111/j.1574-6976.2001.tb00571.x
12. Hill TCJ, Walsh KA, Harris JA, Moffett BF (2003) Using ecological diversity measures with bacterial communities. FEMS Microbiol Ecol 43:1–11
13. Narang R, Dunbar J (2004) Modeling bacterial species abundance from small community surveys. Microb Ecol 47(4):396–406. doi:10.1007/s00248-003-1026-7
14. Magurran AE (2004) Measuring biological diversity. Blackwell Science, Oxford
15. Loisel P, Harmand J, Zemb O, Latrille E, Lobry C, Delgenès J-P, Godon J-J (2006) Denaturing gradient electrophoresis (DGE) and single-strand conformation polymorphism (SSCP) molecular fingerprintings revisited by simulation and used as a tool to measure microbial diversity. Environ Microbiol 8(4):720–731. doi:10.1111/j.1462-2920.2005.00950.x
16. Blackwood CB, Hudleston D, Zak DR, Buyer JS (2007) Interpreting ecological diversity indices applied to terminal restriction fragment length polymorphism data: insights from simulated microbial communities. Appl Environ Microbiol 73(16):5276–5283. doi:10.1128/aem.00514-07
17. Doroghazi JR, Buckley DH (2008) Evidence from GC-TRFLP that bacterial communities in soil are lognormally distributed. PLoS One 3(8):e2910. doi:10.1371/journal.pone.0002910
18. Dunbar J, Barns SM, Ticknor LO, Kuske CR (2002) Empirical and theoretical bacterial diversity in four Arizona soils. Appl Environ Microbiol 68(6):3035–3045. doi:10.1128/aem.68.6.3035-3045.2002
19. Gans J, Wolinsky M, Dunbar J (2005) Computational improvements reveal great bacterial diversity and high metal toxicity in soil. Science 309(5739):1387–1390. doi:10.1126/science.1112665
20. İnceoğlu Ö, Al-Soud WA, Salles JF, Semenov AV, van Elsas JD (2011) Comparative analysis of bacterial communities in a potato field as determined by pyrosequencing. PLoS One 6(8):e23321. doi:10.1371/journal.pone.0023321
21. Nacke H, Thürmer A, Wollherr A, Will C, Hodac L, Herold N, Schöning I, Schrumpf M, Daniel R (2011) Pyrosequencing-based assessment of bacterial community structure along different management types in German forest and grassland soils. PLoS One 6(2):e17000. doi:10.1371/journal.pone.0017000
22. The NCBI Sequence Read Archive (SRA) (2012) Accessed August 1st 2012
23. Cole JR, Wang Q, Cardenas E, Fish J, Chai B, Farris RJ, Kulam-Syed-Mohideen AS, McGarrell DM, Marsh T, Garrity GM, Tiedje JM (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37(1):141–145. doi:10.1093/nar/gkn879
24. Khandelwal G, Bhyravabhotla J (2010) A phenomenological model for predicting melting temperatures of DNA sequences. PLoS One 5(8):e12433. doi:10.1371/journal.pone.0012433
25. Hutton JR (1977) Renaturation kinetics and thermal stability of DNA in aqueous solutions of formamide and urea. Nucleic Acids Res 4(10):3537–3555. doi:10.1093/nar/4.10.3537
26. Hammer O, Ryan P, Harper D (2001) PAST: Paleontological Statistics software package for education and data analysis. Palaeontol Electron 4(1):9
27. Legendre P, Legendre L (1998) Numerical ecology. Second English edition. Developments in Environmental Modelling 20. Elsevier, Amsterdam
28. Fierer N, Breitbart M, Nulton J, Salamon P, Lozupone C, Jones R, Robeson M, Edwards RA, Felts B, Rayhawk S, Knight R, Rohwer F, Jackson RB (2007) Metagenomic and small-subunit rRNA analyses reveal the genetic diversity of bacteria, archaea, fungi, and viruses in soil. Appl Environ Microbiol 73(21):7059–7066
29. Loisel P, Hamelin J, Godon J-J, Haegeman B, Harmand J (2009) A method for measuring the biological diversity of a sample. European Patent EP20553401
30. Dohm JC, Lottaz C, Borodina T, Himmelbauer H (2008) Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 36(16):e105. doi:10.1093/nar/gkn425
31. Jaenicke S, Ander C, Bekel T, Bisdorf R, Dröge M, Gartemann K-H, Jünemann S, Kaiser O, Krause L, Tille F, Zakrzewski M, Pühler A, Schlüter A, Goesmann A (2011) Comparative and joint analysis of two metagenomic datasets from a biogas fermenter obtained by 454-pyrosequencing. PLoS One 6(1):e14519. doi:10.1371/journal.pone.0014519
32. Pinard R, de Winter A, Sarkis G, Gerstein M, Tartaro K, Plant R, Egholm M, Rothberg J, Leamon J (2006) Assessment of whole genome amplification-induced bias through high-throughput, massively parallel whole genome sequencing. BMC Genomics 7(1):216
33. Pinto AJ, Raskin L (2012) PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PLoS One 7(8):e43093. doi:10.1371/journal.pone.0043093

Acknowledgments

The authors acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada. The CIRAIG would also like to thank its industrial partners for their financial support: ArcelorMittal, Bell Canada, Bombardier, Cascades, Mouvement des caisses Desjardins, Groupe Electricite de France/Gaz de France, Eco Entreprises Quebec, Hydro-Quebec, Johnson & Johnson, Groupe Louis Vuitton Moët Hennessy, Michelin, Nestlé, Recyc-Quebec, Rio Tinto Alcan, RONA, Societe des Alcools du Quebec, Solvay, Total, Umicore, and Veolia Environment.

Author information

Authors and Affiliations

École Polytechnique de Montréal, Chemical Engineering Department, CIRAIG, 2500 chemin de Polytechnique, Montréal, QC, Canada, H3T 1J4
Jonathan Lalande & Louise Deschênes
Environmental Microbiology Research Group, INRS–Institut Armand-Frappier Research Center, 531 boul. des Prairies, Laval, QC, Canada, H7V 1B7
Richard Villemur

Corresponding author

Correspondence to Jonathan Lalande.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1

(XLSX 1268 kb)

ESM 2

(PDF 119 kb)

ESM 3

(PDF 111 kb)

About this article

Cite this article

Lalande, J., Villemur, R. & Deschênes, L. A New Framework to Accurately Quantify Soil Bacterial Community Diversity from DGGE. Microb Ecol 66, 647–658 (2013). https://doi.org/10.1007/s00248-013-0230-3

Received: 19 December 2012
Accepted: 11 April 2013
Published: 03 May 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s00248-013-0230-3

Introduction